AI Governance

Governance is the road, not the speed bump.

The most common failure mode in regulated AI adoption is treating governance as friction to be reduced. Flipping that framing is what makes adoption work.

By Bamidele Aly 6 min read

Two camps that cannot work together

Most "AI for finance" advice falls into one of two camps. The first is engineers who do not understand why regulators are nervous. They write proofs of concept that show what is technically possible and assume the deployment problem is "managing change". The second is risk professionals who do not understand what the technology can actually do. They write frameworks that prohibit categories of use the technology has already moved past.

Both camps then publish frameworks the other camp cannot use. The first camp's frameworks treat every model as a single deployable artefact and ignore the supervisory pipeline. The second camp's frameworks treat every model as a black box and demand controls that, applied literally, prohibit the use of any non-deterministic system. Both fail at the first regulator visit, in opposite ways.

The way through is neither speed nor caution; it is the right framing of what the controls are for.

The actual question

The question that matters when generative AI enters a regulated finance workflow is not "is this model good enough?" It is "can a supervisor follow what happened?" That changes the control surface.

In a pre-AI workflow, controls live around the human. The reviewer's name is on the file; the file references the source documents; the source documents are versioned. A supervisor can reconstruct, after the fact, what was considered and what was decided. Re-reviewing the human's work is the unit of control.

When a model drafts the disclosure, this breaks. The model's output is not the model's reasoning. The prompt is one input among many; the corpus the model was fine-tuned on is another; the system prompt is a third; the temperature setting is a fourth. None of these are visible in the disclosure itself. So the control surface has to shift from reviewing the work to reviewing the model's behaviour around the work.

Three things have to become traceable: input provenance, the decision path, and the output's revision history. Anything short of that fails the first time a supervisor sits down with the artefact and asks "how did this paragraph come to read this way?"
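As a rough illustration, the three traceable elements could be captured in one record per drafted paragraph. This is a minimal sketch, not a reference design: every class, field and identifier below (ParagraphTrace, fingerprint, "drafting-model-v2", "liquidity-report@v7") is a hypothetical name chosen for the example.

```python
from dataclasses import dataclass, field
from hashlib import sha256


def fingerprint(text: str) -> str:
    """Short stable hash, so a prompt can be referenced without being stored inline."""
    return sha256(text.encode("utf-8")).hexdigest()[:12]


@dataclass(frozen=True)
class InputProvenance:
    model_version: str                  # which model or fine-tune produced the draft
    system_prompt_hash: str
    user_prompt_hash: str
    temperature: float
    source_documents: tuple[str, ...]   # versioned identifiers of the documents consulted


@dataclass(frozen=True)
class Revision:
    author: str   # "model" or a named human reviewer
    text: str
    reason: str


@dataclass
class ParagraphTrace:
    paragraph_id: str
    provenance: InputProvenance
    decision_path: list[str] = field(default_factory=list)
    revisions: list[Revision] = field(default_factory=list)

    def explain(self) -> str:
        """Answer "how did this paragraph come to read this way?" in plain text."""
        p = self.provenance
        lines = [
            f"Paragraph {self.paragraph_id}",
            f"Inputs: model {p.model_version}, temperature {p.temperature}, "
            f"sources {', '.join(p.source_documents)}",
        ]
        lines += [f"Step {i}: {step}" for i, step in enumerate(self.decision_path, 1)]
        lines += [f"Revision {i} by {r.author}: {r.reason}" for i, r in enumerate(self.revisions, 1)]
        return "\n".join(lines)


# Hypothetical example data, for illustration only.
trace = ParagraphTrace(
    paragraph_id="risk-factors-3",
    provenance=InputProvenance(
        model_version="drafting-model-v2",
        system_prompt_hash=fingerprint("You draft regulatory disclosures."),
        user_prompt_hash=fingerprint("Summarise liquidity risk for Q3."),
        temperature=0.2,
        source_documents=("liquidity-report@v7",),
    ),
)
trace.decision_path.append("retrieved liquidity-report@v7")
trace.revisions.append(Revision("model", "Initial draft text.", "first draft"))
trace.revisions.append(Revision("J. Reviewer", "Edited text.", "softened forward-looking claim"))
print(trace.explain())
```

The point of the sketch is that the record sits alongside the disclosure rather than inside it, so a supervisor can read the trail without needing access to the model itself.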

Practical instruments

Three instruments are doing real work right now.

Risk-tiered classification: not all AI is equal. An LLM that drafts a memo for a senior to approve sits in a different risk class than a model that scores credit, which sits in a different class again than one that influences capital adequacy. The first needs draft-and-review controls; the second needs ongoing performance monitoring; the third needs model risk governance proper. The error is to design one set of controls and apply it everywhere.
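The tiering can be made concrete with a small lookup. The sketch below assumes just two yes/no questions decide the tier, which is a simplification for illustration; the tier names and the classify function are hypothetical, and the control labels are taken directly from the paragraph above.

```python
from enum import Enum


class RiskTier(Enum):
    DRAFTING = "drafts for senior approval"
    DECISIONING = "scores credit"
    PRUDENTIAL = "influences capital adequacy"


# One control set per tier, rather than one set applied everywhere.
REQUIRED_CONTROLS = {
    RiskTier.DRAFTING: "draft-and-review controls",
    RiskTier.DECISIONING: "ongoing performance monitoring",
    RiskTier.PRUDENTIAL: "model risk governance",
}


def classify(scores_credit: bool, influences_capital: bool) -> RiskTier:
    """Assign the highest tier that applies (assumption: tiers are ordered by impact)."""
    if influences_capital:
        return RiskTier.PRUDENTIAL
    if scores_credit:
        return RiskTier.DECISIONING
    return RiskTier.DRAFTING


memo_tool = classify(scores_credit=False, influences_capital=False)
print(memo_tool.name, "->", REQUIRED_CONTROLS[memo_tool])
```

A real classification would use more criteria than two booleans; what matters is that the tier is decided first and the controls follow from it.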

The AI Project Canvas: a one-page artefact that forces the team to declare, before code is written, what the model will do, who will check it, what the auditable trail looks like, and what happens when it gets the answer wrong. It is not novel as a document type (every regulator has something analogous), but the discipline of completing it as the first design step, not the last compliance step, changes which projects start at all.
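One way to enforce "canvas first" is to treat the four declarations as required fields and refuse to start until none is blank. This is a sketch under that assumption; the field names are hypothetical paraphrases of the four questions above, not the layout of any published canvas.

```python
from dataclasses import dataclass, fields


@dataclass
class ProjectCanvas:
    model_purpose: str = ""      # what the model will do
    checker: str = ""            # who will check it
    audit_trail: str = ""        # what the auditable trail looks like
    failure_response: str = ""   # what happens when it gets the answer wrong

    def missing(self) -> list[str]:
        """Names of declarations that are still blank, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def ready_to_build(self) -> bool:
        return not self.missing()


# Hypothetical example: the audit trail has not been thought through yet.
canvas = ProjectCanvas(
    model_purpose="Draft quarterly disclosure paragraphs",
    checker="Senior reporting manager",
    failure_response="Draft is discarded and written manually",
)
print(canvas.ready_to_build(), canvas.missing())
```

In this example the project would not start, because the audit trail is undeclared, which is exactly the kind of gap the canvas is meant to surface before code is written.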

Supervisory alignment: design controls that match what supervisors are already looking for, in the language they already use. The PRA, the FCA, the ECB and the OCC have each published expectations that bear on model risk, and the oldest of them, such as SR 11-7 (2011), pre-date generative AI by a decade. The controls for an LLM-based advisory tool should map cleanly onto the vocabulary of SS1/23 or SR 11-7. If they do not, the gap between what the team thinks it has built and what the supervisor expects to see is the gap the audit will land in.
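The mapping exercise can be run as a simple gap check. In the sketch below the expectation labels are hypothetical placeholders, not quotations from SS1/23 or SR 11-7, and would need to be replaced with the wording of the supervisory text the firm is actually subject to; the control names are likewise invented for the example.

```python
from typing import Optional

# Placeholder labels only; substitute the supervisor's own terms.
SUPERVISOR_EXPECTATIONS = {
    "model inventory",
    "independent validation",
    "ongoing monitoring",
    "governance and accountability",
}

# Each team control is mapped to the supervisory term it claims to satisfy, or None.
TEAM_CONTROLS: dict[str, Optional[str]] = {
    "prompt and output logging": "ongoing monitoring",
    "senior sign-off on drafts": "governance and accountability",
    "red-team prompt suite": None,
}


def alignment_gaps(controls: dict[str, Optional[str]], expectations: set[str]):
    """Return (expectations with no control, controls with no supervisory term)."""
    mapped = {term for term in controls.values() if term is not None}
    uncovered = sorted(expectations - mapped)
    unmapped = sorted(name for name, term in controls.items() if term not in expectations)
    return uncovered, unmapped


uncovered, unmapped = alignment_gaps(TEAM_CONTROLS, SUPERVISOR_EXPECTATIONS)
print("Expectations without a control:", uncovered)
print("Controls without a supervisory term:", unmapped)
```

Both lists matter: the first is what the supervisor will ask for and not find, the second is work the team has done that it cannot yet describe in the supervisor's language.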