As Artificial Intelligence (AI) is used more widely to make key decisions in our everyday lives, from recommending parole decisions to approving loans, ensuring fairness in AI is paramount.
At the O’Reilly conference in London, Jaywing took a closer look at why producing predictions that can be explained and justified is critical, highlighting what needs to be considered to produce fair outcomes when machine learning systems are put into production.
Why the need for fairness in AI?
As AI is implemented into more areas of our lives, it becomes increasingly important for those systems to be able to explain how they arrive at their decisions. Many AI applications are still plagued by the ‘black box’ problem of explainability and transparency, limiting the scope of their application.
It can be difficult to understand how an AI system arrives at a decision, making it difficult to know when problematic behaviours, such as bias, are present. For example, in 2016 a high-profile debate raged in the media when ProPublica alleged that the COMPAS system, which is used to predict which criminals are likely to re-offend and which informs release decisions, was biased. When implementing black box machine learning models, care is needed to avoid undesirable outcomes, and models that we cannot explain or control have the potential to be extremely problematic.
How does bias creep into AI algorithms?
“Software is not free of human influence. Algorithms are written and maintained by people, and machine learning algorithms adjust what they do based on people’s behaviour. As a result, algorithms can reinforce human prejudices.” (Miller, 2015)
Human biases are, regrettably, fairly common and so are embedded in many data sets. The algorithms trained on these data sets, while entirely objective in themselves, may pick up and reproduce these biases. Where the algorithm is a black box, this might not be spotted before the model is put to use, and perhaps not even for many years afterwards.
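As a rough illustration of how such learned bias can be surfaced, the sketch below compares decision rates across two hypothetical groups. The column names and figures are purely illustrative assumptions, not taken from any real portfolio.

```python
# A minimal sketch of checking whether a model's decisions differ across
# groups. Column names ("group", "approved") and the data are hypothetical.
import pandas as pd

def approval_rate_by_group(decisions: pd.DataFrame) -> pd.Series:
    """Approval rate per group; large gaps can signal learned bias."""
    return decisions.groupby("group")["approved"].mean()

decisions = pd.DataFrame({
    "group":    ["A", "A", "A", "B", "B", "B"],
    "approved": [1,   1,   0,   1,   0,   0],
})

rates = approval_rate_by_group(decisions)
print(rates)                       # per-group approval rates
print(rates.min() / rates.max())   # a "disparate impact" style ratio
```

A check like this only flags a symptom; understanding why the model treats groups differently still requires being able to explain its decisions.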
It is therefore critical that those developing machine learning systems are cognisant of these issues and take steps to avoid them. Being able to understand and explain the outputs of these models and why they make the decisions they do is key and will be increasingly important as AI adoption impacts more areas of our lives.
In the worlds of finance, insurance and banking, where firms must be able to stand by each and every decision taken by a credit or fraud model, getting the balance of control and explainability right is essential.
Where does this fit in credit scoring?
Machine learning has been used in credit scoring for over three decades, making it one of the earliest commercial applications of the technology. It is used to automate lending decisions that are hugely material to customers. As a result, these decisions are subject to a high level of scrutiny and ensuring fair outcomes is essential.
While machine learning methods have evolved significantly since being adopted in credit scoring, with the introduction of techniques such as deep learning and ensemble methods, the industry has never moved away from the use of generalised linear models (GLMs). This is largely driven by the need to be able to explain and justify decisions to both consumers and regulators, which is straightforward for GLMs due to their inherent simplicity, but very difficult for newer, black box techniques.
When developing credit scores, firms take huge care to ensure that model parameters follow intuitive patterns: for example, increasing salary should mean an increasing score, and being in full-time employment should always produce a higher score than being unemployed. This helps to ensure fair assignments and, crucially, it allows lenders to be confident, in advance, that the models being deployed will produce sensible decisions. It both reduces the likelihood that they will be challenged on what appears to be an unjustified decision and provides a mechanism by which the decision can be justified should that happen.
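For readers who want a concrete picture, the sketch below shows one generic way such sign constraints can be imposed when fitting a logistic regression score. It is an illustration only; the features, synthetic data and fitting approach are assumptions, not a description of any particular lender's process.

```python
# A minimal sketch of fitting a logistic regression credit score while
# constraining coefficient signs, so that e.g. higher salary can only ever
# raise the score. Data and features are synthetic and illustrative.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.normal(30_000, 8_000, n),   # salary        -> coefficient constrained >= 0
    rng.integers(0, 2, n),          # employed flag -> coefficient constrained >= 0
])
# Synthetic "good payer" outcome loosely driven by the two features.
logits = 0.0001 * (X[:, 0] - 30_000) + 1.2 * X[:, 1]
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(float)

Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardise features
Xd = np.column_stack([np.ones(n), Xs])      # add an intercept column

def neg_log_likelihood(beta):
    z = Xd @ beta
    return np.sum(np.log1p(np.exp(z)) - y * z)

# Intercept unconstrained; salary and employment coefficients must be >= 0.
bounds = [(None, None), (0, None), (0, None)]
result = minimize(neg_log_likelihood, x0=np.zeros(3),
                  bounds=bounds, method="L-BFGS-B")
print("fitted coefficients:", result.x)     # signs respect the constraints
```

Because the constraints are imposed at fit time, the lender knows before deployment that the score can never penalise a customer for earning more or for being in work.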
So, while it has been evident for some time that black box techniques such as deep learning can produce more powerful models, most lenders have not adopted them because of these concerns.
How can deep learning be applied to credit scoring?
In non-linear models, the way the model responds to changes in an input variable can differ from case to case. For one customer it might say that a higher salary is good and should attract a higher score, but for another it may say the opposite. This is highly problematic from the perspective of being able to justify decisions.
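The sketch below illustrates the kind of per-customer check that exposes this behaviour: nudge one input for each customer and look at the direction in which the predicted score moves. The model, features and data are all hypothetical, used only to show the diagnostic.

```python
# A minimal sketch of a per-customer sensitivity check on an unconstrained
# non-linear model: bump one input (salary) and see whether the predicted
# score moves up or down. With a black box model the sign of this effect
# can differ from customer to customer. Data and features are illustrative.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)
n = 1_000
X = np.column_stack([
    rng.normal(30_000, 8_000, n),   # salary
    rng.integers(0, 2, n),          # employed flag
    rng.normal(5, 2, n),            # years at current address
])
y = (rng.random(n) < 0.5).astype(int)   # noisy synthetic outcome

model = GradientBoostingClassifier().fit(X, y)

def salary_effect(model, customer, delta=1_000.0):
    """Change in predicted 'good' probability when salary rises by delta."""
    bumped = customer.copy()
    bumped[0] += delta
    return (model.predict_proba(bumped.reshape(1, -1))[0, 1]
            - model.predict_proba(customer.reshape(1, -1))[0, 1])

effects = np.array([salary_effect(model, X[i]) for i in range(20)])
print(np.sign(effects))   # mixed +1/-1 signs reveal inconsistent behaviour
```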
Our AI modelling software, Archetype, was designed to solve the black box problem of AI in credit scoring by allowing constraints to be placed on how the model behaves with respect to each of its inputs, just as lenders have historically done for GLMs. Additionally, a suite of reports is produced that shines a light on which variables contribute most to the model's predictions and details how they do so. This enables our clients to produce explainable and controllable models powered by deep learning.
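Archetype's internals are proprietary, so the sketch below is not its implementation; it simply illustrates one well-known way a monotone response can be built into a neural network, by keeping the relevant weights non-negative and using monotone activations. For simplicity this toy constrains every input in the same direction, whereas a real system would set constraints input by input.

```python
# A generic sketch (not Archetype) of a neural network that is monotone
# non-decreasing in its inputs by construction: effective weights are kept
# non-negative via softplus and the activations are monotone.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotoneMLP(nn.Module):
    """Small MLP whose output never decreases when any input increases."""
    def __init__(self, n_inputs: int, n_hidden: int = 16):
        super().__init__()
        self.w1 = nn.Parameter(torch.randn(n_hidden, n_inputs) * 0.1)
        self.b1 = nn.Parameter(torch.zeros(n_hidden))
        self.w2 = nn.Parameter(torch.randn(1, n_hidden) * 0.1)
        self.b2 = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        # softplus keeps the effective weights >= 0, so monotonicity
        # holds by construction, whatever values training finds.
        h = torch.tanh(x @ F.softplus(self.w1).T + self.b1)
        return torch.sigmoid(h @ F.softplus(self.w2).T + self.b2)

model = MonotoneMLP(n_inputs=3)
x = torch.tensor([[0.1, 1.0, 0.5]])
x_higher_salary = torch.tensor([[0.6, 1.0, 0.5]])   # only the first input rises
assert model(x_higher_salary) >= model(x)           # the score cannot go down
```

Because the constraint is structural rather than checked after the fact, the guarantee holds for every possible customer, which is what allows the resulting decisions to be explained and defended in advance.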
Archetype is the first and, to our knowledge, the only commercially available product of its kind. The software enables analysts to squeeze much more insight from predictive models, while being confident that the models will behave appropriately when deployed in the real world. We have run a large number of trials on behalf of our clients and have found that Archetype consistently yields uplifts in predictive power of between 5% and 18% - which generally amounts to millions in bad debt savings and better customer outcomes - while still producing fully justifiable decisions.