According to recent research, lenders embracing AI are projected to benefit from a 31% increase in profitability, and most lenders believe that AI will completely transform areas like fraud detection within the next five years. But how are lenders practically benefiting from it now? In this write-up of our recent webinar with UK Finance, we share learnings from work we have done with over 20 UK lenders.
Through our own work in this area, we’ve seen the evidence time and time again: AI can deliver impressive results compared to traditional scoring approaches, with some lenders seeing an 18% uplift in their credit models, even with a fully constrained model. But adopting these techniques brings challenges – most obviously the problem of explaining and justifying the scores generated by "black box" models.
We have also found that a subtler challenge arises during reject inference: because these models are so powerful, they can fit very tightly to inferred outcomes in a way that linear models cannot.
During the webinar, we talked about how we meet the regulatory challenge and use explainable, controllable AI to overcome the barriers to adoption within the lending world. We covered key questions like:
- What increase in power can I expect?
- How stable are the models?
- What’s the typical cost of constraining the model to ensure sensible, explainable behaviour? (The answer surprised us.)
- How complex is modelling and the deployment of the resulting score?
- What are the governance considerations around deep learning models?
- What’s the regulator’s view?
- How do you handle reject inference in the new world?
By surveying the c. 100 attendees, we gained some interesting insights. Here's a sample:
- The majority (c. 85%) have either not progressed with using AI in their credit models at all, or have tested it but haven’t got machine learning properly embedded.
- Only 14.7% have built or are actively using AI-based models in credit scoring.
- Most (80%) say it is essential to be able to explain machine learning-based models, so it is likely that the lenders not yet using machine learning are held back by concerns about explainability.
At the end of the webinar, we were asked several questions which we weren’t able to answer in full during the session, so we followed up with individuals separately. We felt these questions would be of interest to other lenders…
Will there be limitations of using deep learning for credit scoring if the company’s customer data set is not big, i.e. the sample is too small? How can we work around this?
This is a familiar question; we have often found that start-up lenders expect AI to solve problems from very limited samples of data. But as with any scoring project, there needs to be enough data to produce a robust model, and in that respect deep learning is no different from traditional linear techniques such as logistic regression. Typically, we recommend a minimum of around 10,000 records, of which around 500-1,000 exhibit the outcome you are trying to predict; that outcome would typically emerge over at least a six-month period.
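As a rough illustration, a pre-build sanity check against these rules of thumb might look something like the sketch below – the thresholds are the guidance figures above, not hard limits:

```python
# Illustrative sanity check of the sample-size guidance above.
# The thresholds are rules of thumb, not hard limits.
def sample_is_sufficient(n_records: int, n_bads: int,
                         min_records: int = 10_000, min_bads: int = 500) -> bool:
    return n_records >= min_records and n_bads >= min_bads

print(sample_is_sufficient(n_records=25_000, n_bads=800))  # True
```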
More data is always preferable, and the best results (the highest-performing and most stable models) come where there are many cases to train the model and a large number of candidate variables to choose from. Newer organisations with rapidly growing customer bases can often progress very quickly from an initial model based on limited data to one based on a full outcome window and far more data.
There are some techniques that can be used where a company has limited data. The most widespread approach within a lending context is to get a look-alike sample of data from a credit reference agency, augmenting your own data with a retrospective sample of similar accounts from other (anonymous) providers.
Similarly, the use of pooled data in non-competitive situations (such as fraud detection) can be a way forward. Another approach that we’ve used effectively is ‘bootstrapping’: building the model on multiple resampled cuts of the same data extract.
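As a minimal sketch of the bootstrapping idea (assuming scikit-learn, with logistic regression standing in for whatever model type is actually being built):

```python
# Minimal sketch of 'bootstrapping': build the model on several resampled
# cuts of the same extract and average the predictions.
# LogisticRegression is a stand-in for any model type.
import numpy as np
from sklearn.linear_model import LogisticRegression

def bootstrap_scores(X, y, n_cuts=10, seed=42):
    rng = np.random.default_rng(seed)
    n = len(y)
    models = []
    for _ in range(n_cuts):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        models.append(LogisticRegression(max_iter=1000).fit(X[idx], y[idx]))
    # Average the predicted probabilities across the resampled models
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
```

Averaging predictions across the resampled cuts tends to smooth out the noise that any single small sample would introduce.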
Ultimately, AI is not a magic bullet, and deep learning needs data to learn from in order to be most effective.
You mentioned around 500 'bad records' as a requirement – is this a standard baseline for neural network modelling, or does the number of bad records depend on the number of inputs to the model?
It’s guidance more than a standard requirement. With too few bad records, an application model will tend to overfit: performance on the build sample is not matched by the validation sample, because there is too little consistency among the records featuring the modelled outcome, and the model struggles to perform on unseen data.
However, each model development is different, and there will be cases where providing a more diverse set of input characteristics will help. Constraining the topology of the model build in these cases can also help.
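A simple way to spot the symptom described above is to compare Gini on the build and validation samples. A sketch, assuming scikit-learn and any fitted classifier that exposes predict_proba:

```python
# Sketch of a build-versus-validation comparison. A large gap between the
# two Gini values is the overfitting symptom described above.
from sklearn.metrics import roc_auc_score

def gini(y_true, y_score):
    # The Gini coefficient is a simple rescaling of the ROC AUC
    return 2 * roc_auc_score(y_true, y_score) - 1

def build_validation_gap(model, X_build, y_build, X_valid, y_valid):
    g_build = gini(y_build, model.predict_proba(X_build)[:, 1])
    g_valid = gini(y_valid, model.predict_proba(X_valid)[:, 1])
    return g_build, g_valid, g_build - g_valid
```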
How do you establish what is reliable data for the models? Do you need a kind of data tuning or data mapping exercise prior to activating the models in production?
There is no substitute for good data governance and analytical oversight in determining which data to include within a modelling project, whether undertaken using AI or by more traditional means. We would always recommend that source data is considered in terms of its coverage and quality, and its potential to introduce bias within the model (for example, avoiding the use of any fields which subtly encode gender or race).
And within Archetype, our recommendation is to apply some analytical expertise to sense-check the choice of variables ultimately used for modelling.
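As a crude illustration of one such sense check (assuming a labelled protected attribute is available for analysis only, never for modelling, and that candidate fields are numeric), a correlation screen for proxy variables might look like:

```python
# Illustrative proxy screen: flag candidate fields whose association with a
# protected attribute looks suspiciously strong. Correlation is a crude
# measure; flagged fields deserve manual review, not automatic exclusion.
import pandas as pd

def flag_proxy_fields(candidates: pd.DataFrame, protected: pd.Series,
                      threshold: float = 0.3) -> list:
    return [col for col in candidates.columns
            if abs(candidates[col].corr(protected)) > threshold]
```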
Once you’ve agreed a data set for modelling, the reliability of the data is usually evident from some fairly standard reporting and analysis. For example, a model built on fields with very sparsely populated data values can be unstable. A better approach is to merge similarly performing values together so that the model has a better chance of exploiting the interactions between variables, perhaps treating continuous data more like a categorical field.
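A minimal sketch of that kind of pre-processing, assuming pandas (the function names here are our own, for illustration):

```python
# Pool sparsely populated category values and coarse-class a continuous
# field into bands, so the model sees fewer, better-supported values.
import pandas as pd

def merge_sparse_values(s: pd.Series, min_count: int = 100) -> pd.Series:
    counts = s.value_counts()
    rare = counts[counts < min_count].index
    return s.where(~s.isin(rare), other="OTHER")

def coarse_class(s: pd.Series, bands: int = 5) -> pd.Series:
    # Treat a continuous field more like a categorical one
    return pd.qcut(s, q=bands, duplicates="drop")
```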
For use in production, you need to ensure that the data used in the development stage aligns with what the model will receive in a live environment. Archetype produces execution code based on development data field names, and typically there is a mapping exercise so that the client decision engine passes its data into the corresponding fields.
There is also a need to adjust any inbound data that differs from what the model is expecting, although Archetype models will handle unexpected data cleanly.
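To make the mapping exercise concrete, here is a hypothetical sketch (all field names invented) of passing decision-engine data into the fields the model was developed on, with safe defaults when inbound data is missing:

```python
# Hypothetical field-mapping step between a decision engine and a model.
# FIELD_MAP keys are engine field names; values are the development names.
FIELD_MAP = {"app_income_gross": "income", "cra_delinquency_ct": "num_delinq"}
DEFAULTS = {"income": 0.0, "num_delinq": 0}

def map_inbound(record: dict) -> dict:
    mapped = {}
    for engine_name, model_name in FIELD_MAP.items():
        value = record.get(engine_name)
        if value is None:  # missing or unexpected inbound data
            value = DEFAULTS[model_name]
        mapped[model_name] = value
    return mapped
```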
Are you also developing deep learning models for credit or fundamental analysis for wholesale/corporate customers?
We are happy to consider any use case: Archetype is agnostic about exactly what it is modelling and about the data that it bases its outcomes on. The key requirement is for enough data to achieve a good result (e.g. 10,000 or so records, including around 1,000 of the outcome being modelled).
For corporate customers, e.g. in the commercial lending space, we often recommend a simpler approach based on slotting, using more traditional statistical techniques, because these lending decisions are often a little more judgemental in nature. However, for an organisation with lots of data, there is no reason why AI can’t be used to generate a scoring model.
How important is the currency of data to model performance? For example, an Open Banking data feed rather than Credit Reference Agency (CRA) data?
Either source of data would be good for creating credit models. We think Open Banking is an interesting development that is likely to generate additional uplift within a credit model, because it brings more detailed insights into customer income and expenditure. For instance, spend on gambling or the use of expensive forms of credit are likely to be indicators of risk that would not routinely be evident from CRA data.
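As a hypothetical example of the kind of feature Open Banking transaction data could add (the column names and category labels here are our own assumptions):

```python
# Share of outgoing spend at gambling merchants over a recent window -
# the sort of risk indicator not routinely visible in CRA data.
import pandas as pd

def gambling_spend_share(tx: pd.DataFrame) -> float:
    """tx has columns: 'amount' (negative = outgoing) and 'category'."""
    outgoing = tx[tx["amount"] < 0]
    gambling = outgoing[outgoing["category"] == "GAMBLING"]
    total = outgoing["amount"].sum()
    return float(gambling["amount"].sum() / total) if total else 0.0
```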
Open Banking also promises a much more comprehensive and accurate view of affordability and income than has been available from any of the credit reference agencies to date, since most affordability solutions are rich in assumptions and are heavily tilted towards exclusive use by current account providers.
As things stand, there are few organisations that can couple sufficient coverage of Open Banking data with performance/outcome data, as (ideally at least) it relies on customer permissions being in place for the capture of data over an extended period of time.
Additionally, CRA data remains the best source of outcome data as it considers industry-standard outcomes such as default or missed payments, which are readily visible within a standard bureau call. We therefore think that a combination of Open Banking and CRA data is the pragmatic route to an improved result.
However, an organisation that could crack the customer permissions threshold on Open Banking and use their own performance data to predict outcomes would be in a very strong position.
Wrapping it up
While the underlying techniques are sophisticated, AI is more straightforward to implement than many lenders expect. It can deliver significant uplifts in Gini, support regulatory compliance through a transparent approach, and significantly increase profitability.
If you have any additional questions about AI in risk, don’t hesitate to contact one of our experts today. You can read our latest case studies to see how we’re helping lenders to pioneer AI models, or download our latest guide to AI in risk.