AI adoption is rising across UK banks, but the PRA’s October 2025 roundtable showed how often governance frameworks lag behind. Following a recent review, the regulator found that several core expectations in SS1/23 are not being applied consistently, particularly where AI models introduce opacity, uncertainty, and faster rates of change than traditional approaches.
The main takeaway? Without defined appetites, accurate tiering, robust validation, and more adaptive monitoring, firms risk operating models that behave beyond their intended tolerance.

What this article covers
The PRA’s observations centred on several areas where control frameworks need more precision. The sections below summarise the issues raised and what they mean in practice:
1. The threshold trap (risk appetite)
PRA observation
The PRA noted that Boards are often approving AI and ML models without setting clear boundaries for acceptable performance or uncertainty. In several cases, models were signed off without a defined Model Risk Appetite, leaving no quantitative basis for later challenge or intervention.
Jaywing’s view
AI models bring more uncertainty and less predictable behaviour than traditional approaches, so a defined tolerance is essential before deployment. Without it, firms cannot determine whether a model is operating within acceptable limits.
The fix
Avoid broad, qualitative appetite statements. Set measurable thresholds that reflect the model’s purpose. For example: “We accept a 5% increase in false-positive fraud alerts if it reduces detection time by 50%.”
Revalidation triggers should link directly to these limits so that any movement beyond tolerance prompts action.
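As an illustration of how such limits can be made operational, the sketch below encodes an appetite statement as quantitative thresholds and checks observed metrics against them. The metric names, limits, and structure are hypothetical, not a prescribed SS1/23 format.

```python
# Illustrative sketch only: hypothetical metric names and limits,
# not a prescribed SS1/23 structure.
from dataclasses import dataclass

@dataclass
class AppetiteThreshold:
    metric: str   # e.g. "false_positive_rate_uplift"
    limit: float  # tolerated value
    action: str   # what happens on breach

# Example appetite from the article: accept up to a 5% rise in
# false-positive fraud alerts in exchange for faster detection.
APPETITE = [
    AppetiteThreshold("false_positive_rate_uplift", 0.05, "trigger_revalidation"),
    AppetiteThreshold("detection_time_reduction_min", 0.50, "trigger_review"),
]

def check_appetite(observed: dict[str, float]) -> list[str]:
    """Return the actions triggered by any metric outside tolerance."""
    breaches = []
    for t in APPETITE:
        value = observed.get(t.metric)
        if value is None:
            continue
        # For a "_min" metric the breach is falling below the limit.
        breached = value < t.limit if t.metric.endswith("_min") else value > t.limit
        if breached:
            breaches.append(f"{t.metric}={value:.2%} breaches {t.limit:.0%}: {t.action}")
    return breaches

print(check_appetite({"false_positive_rate_uplift": 0.07,
                      "detection_time_reduction_min": 0.55}))
```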
2. Is your inventory telling the truth? (Tiering)
PRA observation
The PRA highlighted cases where firm-wide policy stated that AI and ML models should be classified as at least Medium Complexity, yet the corresponding model inventory recorded them as Low. This inconsistency limits the firm’s ability to understand its overall model risk and align controls to actual complexity.
Jaywing’s view
A single “low-risk” model might appear benign, but deploying the same black-box technique across multiple portfolios can introduce substantial aggregate risk that is not reflected in reporting.
When firms classify AI as Low to reduce governance requirements, they are bypassing the level of scrutiny SS1/23 expects. Tiering must reflect genuine complexity, not operational convenience.
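A simple way to surface this in practice is to scan the model inventory for AI/ML techniques recorded as Low and count how widely each one is deployed. The sketch below assumes a flat inventory extract with hypothetical field names; real inventories and tiering policies will differ.

```python
# Minimal sketch, assuming a flat inventory extract with hypothetical
# column names; real inventories will differ.
from collections import Counter

inventory = [
    {"model_id": "M001", "technique": "gradient_boosting",   "tier": "Low", "portfolio": "Cards"},
    {"model_id": "M002", "technique": "gradient_boosting",   "tier": "Low", "portfolio": "Loans"},
    {"model_id": "M003", "technique": "logistic_regression", "tier": "Low", "portfolio": "Mortgages"},
]

AI_ML_TECHNIQUES = {"gradient_boosting", "neural_network", "random_forest"}

# Flag AI/ML models whose recorded tier contradicts a policy of
# "at least Medium" complexity.
misclassified = [m for m in inventory
                 if m["technique"] in AI_ML_TECHNIQUES and m["tier"] == "Low"]

# Count how many portfolios share the same black-box technique to
# surface aggregate, firm-wide exposure.
aggregate = Counter(m["technique"] for m in misclassified)

for technique, count in aggregate.items():
    print(f"{technique}: {count} model(s) tiered Low despite policy - review tiering")
```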
3. Theory vs. reality (validation)
PRA observation
The PRA questioned the reliance on standard validation techniques such as cross-validation, noting that these methods assume independence and stationarity, assumptions that often don’t hold when working with large, complex datasets. As a result, models can appear stable in development but behave differently once deployed.

[Source: PRA event slides 2025]
Jaywing’s view
This is a familiar issue: a model may perform well on curated, historical data yet deteriorate quickly in real conditions. Validation that does not account for non-stationarity may reach a misleading conclusion.
The fix
Validation should test how the model behaves when underlying relationships change. This includes assessing performance under non-stationary conditions and incorporating stress scenarios that challenge the model’s stability and boundary conditions. Without these checks, firms risk relying on models that have not been tested for the conditions they are most likely to encounter.
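One practical way to approximate this is to validate on time-ordered, out-of-sample folds rather than random cross-validation splits, so the model is always judged on data drawn from later conditions than those it was trained on. The sketch below uses synthetic data with a deliberately drifting relationship and scikit-learn's TimeSeriesSplit; the features and metric are purely illustrative.

```python
# Minimal sketch of time-ordered validation on synthetic, drifting data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(2_000, 5))
# Introduce drift: the relationship between feature 0 and the target
# weakens over time, mimicking non-stationarity.
weights = np.linspace(2.0, 0.2, len(X))
y = (rng.random(len(X)) < 1 / (1 + np.exp(-weights * X[:, 0]))).astype(int)

# Time-ordered splits: train on earlier data, test on later data, so the
# model is always assessed on conditions it has not seen.
for fold, (train_idx, test_idx) in enumerate(TimeSeriesSplit(n_splits=4).split(X)):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    auc = roc_auc_score(y[test_idx], model.predict_proba(X[test_idx])[:, 1])
    print(f"fold {fold}: out-of-time AUC = {auc:.3f}")
```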
4. The lie of “independence” (Explainability)
PRA observation
One of the PRA’s strongest technical warnings related to explainability. Many firms rely on tools such as SHAP and LIME to justify model decisions. These post-hoc methods, however, generate attributions by approximating local model behaviour around a prediction, often assuming feature independence or stable sampling—which rarely holds in correlated financial datasets.

[Source: PRA event slides 2025]
Jaywing’s view
Variables like income, loan size and debt-to-income ratios are inherently correlated, causing SHAP and LIME to misallocate importance. A truly causal driver, such as high existing debt, may be downplayed while a proxy such as low income is overstated. We can counter this by checking the data for high feature correlations and validating attributions against expert-defined causal relationships (e.g. employment -> income -> debt capacity -> default risk) before regulatory or customer use.
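The pre-deployment check described above can be as simple as flagging highly correlated feature pairs before per-feature attributions are relied upon. The sketch below uses synthetic data and a hypothetical correlation cut-off; the 0.7 threshold is illustrative, not a regulatory figure.

```python
# Minimal sketch: flag highly correlated feature pairs before relying
# on SHAP/LIME attributions. Column names and threshold are hypothetical.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
income = rng.lognormal(mean=10, sigma=0.4, size=1_000)
features = pd.DataFrame({
    "income": income,
    "loan_size": income * rng.uniform(0.2, 0.6, size=1_000),  # correlated with income
    "debt_to_income": rng.uniform(0.1, 0.9, size=1_000),
})

CORRELATION_THRESHOLD = 0.7  # illustrative cut-off

corr = features.corr().abs()
flagged = [(a, b, corr.loc[a, b])
           for i, a in enumerate(corr.columns)
           for b in corr.columns[i + 1:]
           if corr.loc[a, b] > CORRELATION_THRESHOLD]

for a, b, rho in flagged:
    print(f"|corr({a}, {b})| = {rho:.2f} - treat per-feature attributions with caution")
```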
The risk
A firm could present a SHAP-based explanation for a credit decline, only to learn later that the output was unstable because the underlying assumptions were not met. That weakness becomes difficult to defend in a regulatory review.
5. Using a sledgehammer to crack a nut (Complexity)
PRA observation
The PRA questioned the tendency to default to complex model architectures. Firms were encouraged to consider whether any incremental performance gain is sufficient to justify the added opacity and governance burden.
Jaywing’s view
If a neural network delivers only a modest uplift compared with a logistic regression, the additional governance required may not be warranted. In many cases, simpler and more transparent approaches are more appropriate and easier to manage under SS1/23.
Here’s when to question black-box AI:
| Scenario | Question to ask | PRA expectation |
| --- | --- | --- |
| Marginal performance (<5% lift) | Is the complexity justified? | Consider simpler, transparent models (GLM/GAM). |
| High feature correlation | Can SHAP be trusted? | Flag standard explainability tools as unreliable. |
| Rapid deployment (>3 portfolios) | Is aggregate risk measured? | Re-assess Tiering classification immediately. |
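The first row of the table can be turned into a routine benchmark comparison: fit a transparent model and a more complex challenger on the same data, then ask whether the relative uplift clears the agreed threshold. The sketch below uses synthetic scikit-learn data; the 5% figure mirrors the table and is illustrative, not a regulatory limit.

```python
# Minimal sketch of a "marginal performance" check on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

simple = LogisticRegression(max_iter=1_000).fit(X_tr, y_tr)
challenger = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

auc_simple = roc_auc_score(y_te, simple.predict_proba(X_te)[:, 1])
auc_challenger = roc_auc_score(y_te, challenger.predict_proba(X_te)[:, 1])
uplift = (auc_challenger - auc_simple) / auc_simple

print(f"AUC simple={auc_simple:.3f}, challenger={auc_challenger:.3f}, uplift={uplift:.1%}")
if uplift < 0.05:  # illustrative threshold from the table above
    print("Uplift below threshold - consider the simpler, transparent model.")
```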
6. The need for advanced monitoring techniques
PRA observation
The PRA noted that monitoring cycles for AI and ML models are often too infrequent. In several firms, monitoring occurred on a six-monthly basis, which the regulator considered insufficient given how quickly performance can deteriorate. In some cases, degradation could occur faster than governance processes can respond.
Jaywing’s view
These models require monitoring that can detect abrupt changes and act immediately. Automated controls (effectively a circuit breaker) allow the system to revert to a challenger model when performance moves outside set thresholds. This avoids delays associated with committee cycles and ensures the model continues to operate within its agreed tolerance.
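One minimal shape for such a control is sketched below: a monitoring job feeds a live metric to a breaker object that, on breach, routes scoring to the challenger until the model risk team intervenes. The class, metric, and thresholds are all hypothetical.

```python
# Minimal sketch of an automated "circuit breaker" for champion/challenger
# scoring. All names and thresholds are hypothetical.
class CircuitBreaker:
    def __init__(self, champion, challenger, metric_floor: float):
        self.champion = champion
        self.challenger = challenger
        self.metric_floor = metric_floor
        self.tripped = False

    def record_metric(self, live_metric: float) -> None:
        """Called by the monitoring job; trips the breaker on breach."""
        if live_metric < self.metric_floor:
            self.tripped = True

    def score(self, features):
        """Route to the challenger whenever the breaker has tripped."""
        model = self.challenger if self.tripped else self.champion
        return model(features)

# Usage with stand-in scoring functions.
breaker = CircuitBreaker(champion=lambda x: 0.91, challenger=lambda x: 0.75,
                         metric_floor=0.80)
breaker.record_metric(live_metric=0.72)  # monitored metric drops below tolerance
print(breaker.score({"amount": 250}))    # now served by the challenger: 0.75
```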
Your next Board meeting: three questions to ask
As we’ve just seen, there’s quite a lot to unpack. Here’s a helpful place to start. At your next board meeting, ask your Model Risk function these three questions:
- The appetite test: “Can we articulate the specific, quantitative risk appetite for our live AI models?”
- The aggregate test: “Do we have AI models marked as ‘Low Complexity’ in our inventory? If so, have we measured the cumulative risk?”
- The circuit breaker test: “If our fraud model drifts tonight, does it turn itself off, or do we wait for next month’s report?”
If these questions surface uncertainty, Jaywing’s regulatory experts can run a targeted E/I Stability Audit to assess whether your current governance and explainability approach meets the standard expected under SS1/23.
Strengthening control where it’s most needed
This sounds obvious, but the PRA’s observations point to one key message: AI models require the same level of discipline as any other high-impact model, supported by controls that reflect how these models behave in practice. Clear appetites, accurate tiering, realistic validation, and monitoring that responds quickly all contribute to a framework that withstands supervisory scrutiny.
One of the tips we give to Boards and senior risk teams is to prioritise ensuring these elements all work together. When governance is aligned, firms gain a clearer view of model performance, a stronger basis for approval, and fewer surprises as AI and ML techniques become more widely embedded across portfolios.
If you need guidance, we’d love to hear from you.