The author

Nick Sime

Director of Fraud & Credit Risk Modelling


07 October 2019

Fairness in AI: applying deep learning to credit scoring

As Artificial Intelligence (AI) is increasingly used to make key decisions in our everyday lives, from recommending whether prisoners should be granted parole to deciding whether to lend, ensuring fairness in AI is paramount.

At the O’Reilly conference in London, Jaywing took a closer look at why producing predictions that can be explained and justified is critical, highlighting what needs to be considered to produce fair outcomes when machine learning systems are put into production.

Why the need for fairness in AI?

As AI is adopted in more areas of our lives, it becomes increasingly important for these systems to be able to explain how they arrive at their decisions. Many AI applications are still plagued by the ‘black box’ problem: a lack of explainability and transparency that limits the scope of their application.

It can be difficult to understand how an AI system arrives at a decision, which makes it difficult to know when problematic behaviours, such as bias, are present. For example, in 2016 a high-profile debate raged in the media when ProPublica alleged that COMPAS, a system used to predict which defendants are likely to re-offend and to inform release decisions, was biased. Care is therefore needed when implementing black box machine learning models to avoid undesirable outcomes: models that we cannot explain or control have the potential to be extremely problematic.

How does bias creep into AI algorithms?

“Software is not free of human influence. Algorithms are written and maintained by people, and machine learning algorithms adjust what they do based on people’s behaviour. As a result, algorithms can reinforce human prejudices,” Miller 2015.

Human biases are, regrettably, fairly common and so are embedded in many data sets. Algorithms trained on these data sets, although entirely objective in their own right, may pick up on and reproduce these biases. Where the algorithm is a black box, this might not be spotted before the model is put to use, and possibly not for many years afterwards.

It is therefore critical that those developing machine learning systems are cognisant of these issues and take steps to avoid them. Being able to understand and explain the outputs of these models and why they make the decisions they do is key and will be increasingly important as AI adoption impacts more areas of our lives.

In the worlds of finance, insurance and banking, where firms must be able to stand by each and every decision taken by a credit or fraud model, getting the balance of control and explainability right is essential.

Where does this fit in credit scoring?

Machine learning has been used in credit scoring for over three decades, making it one of the earliest commercial applications of the technology. It is used to automate lending decisions that are hugely material to customers. As a result, these decisions are subject to a high level of scrutiny and ensuring fair outcomes is essential.

While machine learning methods have evolved significantly since they were first adopted in credit scoring, with the introduction of techniques such as deep learning and ensemble methods, the industry has never moved away from generalised linear models (GLMs). This is largely driven by the need to explain and justify decisions to both consumers and regulators, which is straightforward for GLMs because of their inherent simplicity but very difficult for newer, black box techniques.
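To illustrate why GLMs are easy to explain, the sketch below scores one applicant with a logistic regression (a common GLM in credit scoring) and breaks the result into per-variable contributions. The variable names, coefficients, the assumption that the model predicts the probability of repaying, and the applicant are all hypothetical, chosen only for illustration. Because each contribution is simply coefficient times value, every variable pushes the score in the same direction for every applicant, and the contributions can be ranked to give reasons for a decision.

```python
import math

# Hypothetical logistic-regression (GLM) scorecard; names and values are assumptions.
INTERCEPT = -1.0
COEFFICIENTS = {
    "salary_thousands": 0.03,   # each extra £1k of salary adds 0.03 to the log-odds
    "full_time_employed": 0.8,  # 1 if in full-time employment, 0 otherwise
    "missed_payments": -0.6,    # each missed payment subtracts 0.6 from the log-odds
}


def explain_glm(applicant):
    """Score an applicant and return (assumed probability of repaying, contributions).

    Each contribution is coefficient * value, so the log-odds is just the
    intercept plus the sum of the contributions.
    """
    contributions = {
        name: coef * applicant[name] for name, coef in COEFFICIENTS.items()
    }
    log_odds = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))  # logistic link
    return probability, contributions


applicant = {"salary_thousands": 40, "full_time_employed": 1, "missed_payments": 2}
probability, contributions = explain_glm(applicant)
# Rank variables from most negative to most positive contribution (reason codes).
reasons = sorted(contributions, key=contributions.get)
print(f"P(repay) = {probability:.3f}")
print("Main negative factor:", reasons[0])
```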

When developing credit scores, firms take great care to ensure that model parameters follow intuitive patterns. For example, an increasing salary should mean an increasing score, and being in full-time employment should always produce a higher score than being unemployed. This helps to ensure fair score assignments and, crucially, allows lenders to be confident in advance that the models being deployed will produce sensible decisions. It both reduces the likelihood that they will be challenged over a decision that appears unjustified and provides a way to justify the decision should that happen.
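One simple way to test whether a scoring function follows such intuitive patterns is to raise one input for a set of applicants and look for any score that falls. The sketch below is a sampling-based check under stated assumptions: the function name `find_monotonicity_violations`, the feature names and both example scores are hypothetical. An empty result only means no violation was found among the applicants tested; it does not prove the model is monotonic.

```python
from typing import Callable, Dict, Iterable, List

Applicant = Dict[str, float]


def find_monotonicity_violations(
    score: Callable[[Applicant], float],
    applicants: Iterable[Applicant],
    feature: str,
    step: float,
) -> List[Applicant]:
    """Return the applicants whose score falls when `feature` rises by `step`."""
    violations = []
    for applicant in applicants:
        bumped = dict(applicant)  # copy so the original applicant is unchanged
        bumped[feature] += step
        if score(bumped) < score(applicant):
            violations.append(applicant)
    return violations


# Hypothetical scores for illustration only.
def linear_score(a: Applicant) -> float:
    return 0.03 * a["salary"] + 0.8 * a["full_time"]


def bumpy_score(a: Applicant) -> float:
    # Peaks at a salary of 40, so it penalises higher salaries: an unintuitive pattern.
    return -((a["salary"] - 40.0) ** 2) + 0.8 * a["full_time"]


applicants = [
    {"salary": s, "full_time": f} for s in (10.0, 30.0, 50.0, 70.0) for f in (0.0, 1.0)
]
unemployed = [a for a in applicants if a["full_time"] == 0.0]

print(len(find_monotonicity_violations(linear_score, applicants, "salary", 5.0)))   # -> 0
print(len(find_monotonicity_violations(linear_score, unemployed, "full_time", 1.0)))  # -> 0
print(len(find_monotonicity_violations(bumpy_score, applicants, "salary", 5.0)))    # -> 4
```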

So, while it has been evident for some time that black box techniques such as deep learning can produce more powerful models, most lenders have not adopted them because of these concerns.

How can deep learning be applied to credit scoring?

In non-linear models, the way that the model responds to changes in an input variable can change from case to case. For one customer it might say that an increasing salary is good and should attract a higher score, but for another it may say the opposite. This is highly problematic from the perspective of being able to justify decisions.
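The case-to-case behaviour described above can be seen in a toy example. The two-unit ReLU network below has hand-picked, hypothetical weights (it is not a trained credit model), with salary and debt assumed to be in thousands. A £1k salary rise increases the score of a customer with no debt but decreases the score of a heavily indebted customer on the same salary.

```python
# Hypothetical two-unit ReLU network with hand-picked weights (not a trained model),
# chosen only to show that a non-linear model's response to salary can flip sign.


def relu(z: float) -> float:
    return max(0.0, z)


def toy_score(salary: float, debt: float) -> float:
    """Toy non-linear score; salary and debt are assumed to be in thousands."""
    h1 = relu(0.05 * salary - 0.1 * debt)
    h2 = relu(0.05 * salary + 0.1 * debt - 6.0)
    return h1 - 2.0 * h2


def salary_effect(salary: float, debt: float, step: float = 1.0) -> float:
    """Change in score when salary rises by `step`, holding debt fixed."""
    return toy_score(salary + step, debt) - toy_score(salary, debt)


customer_a = (30.0, 0.0)   # no debt
customer_b = (30.0, 50.0)  # heavily indebted, same salary
print(salary_effect(*customer_a))  # positive: more salary, higher score
print(salary_effect(*customer_b))  # negative: more salary, lower score
```

In a GLM the salary effect would be the same fixed coefficient for both customers; here it depends on which hidden units are active, which is exactly what makes individual decisions hard to justify.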

Our AI modelling software, Archetype, was designed to solve the black box problem of AI in credit scoring by allowing constraints to be placed on how the model behaves with respect to each of its inputs, in just the same way that lenders have historically done for GLMs. Additionally, it produces a suite of reports that shine a light on which variables contribute most to the model's predictions and how they do so. This enables our clients to produce explainable and controllable models powered by deep learning.
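Archetype's internals are not described in this article, so the sketch below does not show how it works. It illustrates one generic, publicly known way of placing this kind of constraint on a neural network: keeping the weights on every path from a constrained input to the output positive, and using an increasing activation function. The class name `PartiallyMonotoneNet`, the layer sizes and the random weights are hypothetical, and training is omitted.

```python
import math
import random


def softplus(z: float) -> float:
    """Numerically stable log(1 + e^z); maps any real number to a positive one."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


class PartiallyMonotoneNet:
    """One-hidden-layer network whose output cannot fall when a monotone input rises.

    Raw parameters are unconstrained (so they could be trained by ordinary
    gradient descent), but the weights carrying a monotone input into the hidden
    layer, and all hidden-to-output weights, pass through softplus so they are
    always positive. With an increasing activation (tanh), every path from a
    monotone input to the output is non-decreasing. Other inputs keep
    unconstrained weights and can still have non-linear, two-way effects.
    """

    def __init__(self, n_inputs: int, n_hidden: int, monotone, seed: int = 0):
        rng = random.Random(seed)
        self.monotone = set(monotone)
        self.w_in = [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b_in = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
        self.w_out = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
        self.b_out = 0.0

    def _input_weight(self, h: int, i: int) -> float:
        raw = self.w_in[h][i]
        return softplus(raw) if i in self.monotone else raw

    def __call__(self, x) -> float:
        out = self.b_out
        for h in range(len(self.w_in)):
            pre = self.b_in[h] + sum(self._input_weight(h, i) * xi for i, xi in enumerate(x))
            out += softplus(self.w_out[h]) * math.tanh(pre)
        return out


# Input 0 stands in for salary (constrained); inputs 1 and 2 are unconstrained.
net = PartiallyMonotoneNet(n_inputs=3, n_hidden=8, monotone=[0], seed=1)
rng = random.Random(42)
for _ in range(1000):
    x = [rng.uniform(-3.0, 3.0) for _ in range(3)]
    raised = [x[0] + 0.5] + x[1:]
    assert net(raised) >= net(x) - 1e-12, "monotonicity violated"
print("no violations found for input 0")
```

A binary input such as full-time employment (coded 1 if employed, 0 otherwise) could be constrained in the same way, so that employment never lowers the score. The trade-off is some loss of flexibility: because every hidden-to-output weight is positive, the network cannot represent every pattern an unconstrained network could.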

Archetype is the first and, to our knowledge, the only commercially available product of its kind. The software enables analysts to squeeze much more insight from predictive models while remaining confident that the models will behave appropriately when deployed in the real world. We have run a large number of trials on behalf of our clients and have found that Archetype consistently yields uplifts in predictive power of between 5% and 18%, which generally amounts to millions in bad debt savings and better customer outcomes, while still producing fully justifiable decisions.
