Financial institutions are being urged to treat large language models not as isolated productivity tools, but as part of a wider governed operating architecture that must be continuously tested, monitored and controlled.
In a new paper exploring AI governance across banking, asset management and enterprise risk management, Joshua Shatz, Lead Compliance Officer at Wells Fargo, argues that the future of AI adoption in financial services will depend less on deploying the most powerful models and more on building disciplined orchestration frameworks around them.
“The financial services industry should not think of large language models as a single tool,” Shatz wrote in a public analysis on LinkedIn. “It should think of them as a coordinated decision-support architecture.”
For QA, software testing and quality engineering teams inside banks and financial institutions, the paper reinforces the view that AI adoption is rapidly becoming a testing, governance and operational resilience challenge rather than simply an innovation exercise.
Shatz stated that “the larger opportunity is to build an AI operating layer that connects data, policies, models, controls, workflows, people, and evidence,” adding that the “winning financial institution will not be the one that adopts the most powerful model first. It will be the one that builds the most disciplined orchestration system.”
‘LLM symphony’ architecture
Central to the paper is the idea of an “LLM symphony architecture”, where multiple specialised models, deterministic systems, retrieval tools, approval workflows and monitoring dashboards operate together under strict governance controls.
“The real question is whether the institution can use AI in a way that is accurate, controlled, explainable, auditable, compliant, and economically valuable,” Shatz explained.

Rather than relying on autonomous AI systems, the paper recommends tightly controlled workflows where AI outputs are grounded in approved enterprise data sources and subject to human review, validation and audit logging.
“The central thesis is simple: Financial institutions should not deploy one autonomous LLM. They should deploy a governed portfolio of specialised models and workflows,” he stated.
For software testing teams, the architecture resembles a continuous assurance model, where AI systems, workflows and integrations must be validated across multiple layers simultaneously.
The paper repeatedly stresses that AI systems in banking should operate inside a “controlled architecture where source data, tools, permissions, approvals, and outputs are logged.”
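As a rough illustration of what logging source data, permissions and outputs could look like in code, the sketch below wraps a model call in an allow-list check and an audit record. It is a minimal sketch, not the paper's design: `APPROVED_SOURCES`, `AuditLog`, `governed_call` and the source names are all hypothetical, and the model is any plain Python callable.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical allow-list of approved enterprise data sources (assumption).
APPROVED_SOURCES = {"credit_policy_v3", "customer_master"}


@dataclass
class AuditLog:
    """In-memory stand-in for an append-only audit store."""
    entries: list = field(default_factory=list)

    def record(self, **event):
        event["timestamp"] = datetime.now(timezone.utc).isoformat()
        # Store each event as a JSON string so it can be persisted as-is.
        self.entries.append(json.dumps(event, sort_keys=True))


def governed_call(model_fn, prompt, sources, user, log):
    """Run model_fn only if every source is approved; log either way."""
    unapproved = sorted(set(sources) - APPROVED_SOURCES)
    if unapproved:
        log.record(user=user, prompt=prompt, sources=sorted(sources),
                   status="blocked", unapproved=unapproved)
        raise PermissionError(f"Unapproved sources: {unapproved}")
    output = model_fn(prompt)
    # The output is logged as pending review: a human still has to sign off.
    log.record(user=user, prompt=prompt, sources=sorted(sources),
               status="pending_review", output=output)
    return output
```

Blocked requests are logged as well as successful ones, so the audit trail shows attempted use of unapproved data rather than only the calls that went through.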
AI testing frameworks
The report places heavy emphasis on AI testing disciplines that increasingly mirror traditional software assurance and model validation practices already familiar to QA teams inside banks.
Shatz outlined a dedicated “LLM Testing Framework for Financial Institutions” covering hallucination testing, bias testing, stability testing, explainability testing, data-leakage testing, security testing and human override testing.
The paper asks whether institutions can consistently explain “why the output was generated and what data supported it,” while also measuring “how often do humans reject, revise, or override the output?”
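The question of how often humans reject, revise or override an output can be turned into a simple metric. The following is a minimal sketch under stated assumptions: the outcome labels in `OUTCOMES` and the function names are hypothetical, and a real programme would break the figures down by workflow, model version and time window.

```python
from collections import Counter

# Hypothetical labels a human reviewer can attach to an AI output (assumption).
OUTCOMES = ("accepted", "revised", "rejected", "overridden")


def review_rates(outcomes):
    """Return each review outcome as a fraction of all reviewed outputs."""
    if not outcomes:
        raise ValueError("no review outcomes recorded")
    unknown = set(outcomes) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unknown outcomes: {sorted(unknown)}")
    counts = Counter(outcomes)
    total = len(outcomes)
    return {name: counts[name] / total for name in OUTCOMES}


def intervention_rate(outcomes):
    """Share of outputs that a human did not accept as-is."""
    return 1.0 - review_rates(outcomes)["accepted"]
```

A rising intervention rate after a model or prompt change is the kind of signal a monitoring dashboard could surface for investigation.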
Among the proposed controls are confidence scoring, exception queues, approval routing, version tracking and continuous monitoring dashboards.
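Confidence scoring, exception queues and approval routing could be combined along the following lines. This is only a sketch: the threshold, the queue names, the `Draft` type and the set of high-risk workflows are assumptions made for illustration and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Draft:
    workflow: str
    text: str
    confidence: float  # assumed to be a score between 0 and 1


# Hypothetical policy values (assumptions, not taken from the paper).
HIGH_RISK_WORKFLOWS = {"credit_memo", "aml_narrative"}
MIN_CONFIDENCE = 0.80


def route(draft):
    """Pick a human review queue for a draft; no path skips human review."""
    if not 0.0 <= draft.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if draft.confidence < MIN_CONFIDENCE:
        return "exception_queue"
    if draft.workflow in HIGH_RISK_WORKFLOWS:
        return "senior_approval"
    return "standard_review"
```

Every branch ends in a human queue, so the score only decides who reviews the draft, never whether it is reviewed.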
“A mature institution should be able to answer: ‘Which LLMs are being used, by whom, for what decisions, with what data, under what controls, and with what error rate?’” Shatz wrote.
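One way to make that question answerable is to keep a structured inventory record for every model use. The sketch below assumes a hypothetical `ModelUse` record whose fields mirror the question (model, owner, decision type, data, controls, error rate); the field names and the report format are illustrative only.

```python
from dataclasses import asdict, dataclass


@dataclass
class ModelUse:
    model: str            # which LLM
    version: str
    owner: str            # by whom
    decision_type: str    # for what decisions
    data_sources: tuple   # with what data
    controls: tuple       # under what controls
    outputs_reviewed: int = 0
    errors_found: int = 0

    @property
    def error_rate(self):
        """Errors per reviewed output, or None if nothing was reviewed yet."""
        if self.outputs_reviewed == 0:
            return None
        return self.errors_found / self.outputs_reviewed


def inventory_report(uses):
    """Flatten inventory records into dicts with the error rate included."""
    return [{**asdict(use), "error_rate": use.error_rate} for use in uses]
```

Returning `None` rather than zero for an unreviewed model keeps "no errors found" distinct from "never checked".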
The paper also warns that financial institutions face a fundamentally different AI risk environment from most industries because of “regulatory scrutiny, financial materiality, fiduciary, conduct, and model-risk obligations.”
“A bank cannot allow an LLM to hallucinate a credit memo,” Shatz wrote. “A compliance function cannot permit customer-impacting decisions without evidence, traceability, and escalation.”
Human oversight
The paper repeatedly argues against fully autonomous AI decision-making in core banking workflows, particularly in high-risk areas such as lending, portfolio management and compliance.
“The better model is ‘LLM proposes, human disposes,’” Shatz wrote.
The report identifies most near-term banking value in what it calls “Levels 2 and 3” AI deployments, including analyst copilots, complaint classification, credit memo drafting, AML narrative support, portfolio commentary generation and operational-risk summarisation.
Under the proposed framework, higher-risk autonomous functions would require substantially greater validation, governance and monitoring controls.
“Most near-term value sits in Levels 2 and 3,” the paper notes, adding that autonomous decision engines are something institutions should “generally avoid unless heavily governed, validated, monitored, and legally permissible.”
For QA and software testing teams, this increasingly pushes AI assurance into the centre of enterprise risk management, requiring coordination between model risk, operational resilience, compliance, cyber security and software engineering functions.
The paper concludes that financial institutions will ultimately compete not on raw AI adoption speed, but on governance maturity.
“The future financial institution will not replace risk management with AI,” Shatz stated. “It will use AI to make risk management faster, more consistent, more transparent, and more evidence-based.”
He added: “The institutions that succeed will share one trait: they will not merely adopt LLMs. They will govern them.”