
UK financial regulators are moving AI testing and cyber assurance further up the agenda after reports that officials from the Bank of England, the Financial Conduct Authority and HM Treasury are in urgent talks with the National Cyber Security Centre and a number of banks over major IT and software risks exposed by a new AI model.
The discussions are reportedly focused on potential vulnerabilities in critical financial systems highlighted by the latest release of U.S. AI company Anthropic, with representatives from major British banks, insurers and exchanges expected to be briefed in the next two weeks.
Anthropic’s new model has reportedly shown that AI is now powerful enough to break existing testing, security and regulatory controls.
Anthropic is an AI company that builds advanced language models. Its latest model, Claude Mythos Preview, was developed under Project Glasswing, a controlled programme that allows organisations, including many banks, insurance companies and other financial services firms, to use the system to detect vulnerabilities and strengthen cybersecurity in critical software and infrastructure. In effect, the programme serves as a live stress test of advanced AI models.
The regulatory developments in the UK follow a meeting last week between U.S. Treasury Secretary Scott Bessent and major Wall Street banks to discuss the model’s risk potential.
Many executives, including JPMorgan’s Jamie Dimon, Bank of America’s Brian Moynihan, Citigroup’s Jane Fraser, Goldman Sachs CEO David Solomon, Morgan Stanley’s Ted Pick and Wells Fargo CEO Charlie Scharf, attended the meeting, according to a Bloomberg report.
Joined-up assurance approach
Anthropic stressed that its new model had already identified “thousands of major vulnerabilities across operating systems, web browsers and other widely used software,” which largely explains why UK regulators appear to be treating the issue as a sector-wide resilience concern rather than a niche AI story.

That is where the significance lies for QA and software testing teams. It is not just that regulators are reacting to a powerful new model that exposes IT risks, but that UK supervisors are increasingly treating AI, cyber resilience, operational risk and third-party oversight as one joined-up assurance problem.
That direction has already been visible in the FCA’s AI Live Testing work and in the Bank of England’s growing focus on model governance, cyber recovery and evidence-based resilience.
Anthropic’s model, Claude Mythos Preview, is relevant because it is not being framed as merely a general chatbot or productivity assistant.
Anthropic itself describes it as ‘a new general-purpose language model’ that is ‘strikingly capable at computer security tasks’, which allows it to scrutinise security elements, gaps and flaws throughout a bank’s digital infrastructure.
The firm claims the solution is able “to secure the world’s most critical software”, as the model is used for local vulnerability detection, black box testing of binaries, securing endpoints, and penetration testing of systems.
That matters directly to financial services QA because the kinds of systems the model is designed to probe are the same ones banks depend on every day: operating systems, browsers, endpoints, connected platforms and shared infrastructure.
Immediate trigger
The immediate trigger for the UK meeting is regulators’ belief that the new AI model has exposed a host of direct vulnerabilities in critical IT systems.
Anthropic, for its part, has tried to limit access. The company said the model is being deployed only as part of “Project Glasswing”, a controlled initiative under which companies are permitted to use the unreleased Claude Mythos Preview model for defensive cyber security purposes.
That controlled release itself is now part of the story. A model that can accelerate vulnerability discovery may improve the defence of complex banking software, but it also compresses the time available to test, patch, evidence and govern weaknesses before they become exploitable.
That is one of the UK regulators’ concerns. For QA teams, it changes the practical meaning of AI assurance. The question is no longer just whether a model is accurate, safe or explainable in isolation.
It is whether firms can test how AI-enabled tools behave against live infrastructure, across dependencies, under time pressure, and with enough documentation to satisfy both security and supervisory scrutiny. Regulators are not convinced that they can.
Real-life environments
That broader focus on real-world environments has been building for months in the UK market. In its AI Live Testing initiative, the FCA has explicitly argued that firms need to validate AI in real-world conditions rather than leave systems stuck in pilots.
As Ed Towers, head of department in the FCA’s advanced analytics and data science unit, put it: “We’re providing a structured but flexible space where firms can test AI-driven services in real-world conditions, all with our regulatory support and oversight and help from our technical partner, Advai.”

Towers stressed that “through live testing we want to help UK innovators move safely beyond ‘POC paralysis’, or what is often described as ‘perpetual pilots’.”
That line now looks even more relevant in light of the Anthropic discussions. If regulators are worried that a new model can expose critical weaknesses across core IT systems, then banks will be under greater pressure to show they are not relying on paper controls or theoretical reviews. They will need evidence from testing that reflects production reality.
The FCA has also been unusually clear that firms should not treat AI as just a model.
Towers said: “We broadly define the AI system as: the actual AI model, information on the deployment context and core risks … governance and human in the loop considerations, evaluation techniques as well as the input and output controls.”
That is a crucial distinction for banks using advanced models in security, surveillance, fraud, compliance or operations. Once the system includes deployment context, controls and human oversight, QA expands from model testing into full-system assurance.
The same direction has been reflected in how regulators themselves are approaching AI. The shift towards applying AI to live data and real workflows has been described as “moving into production-style validation of AI systems, applying them directly to live intelligence workflows in a controlled test environment.”
In that context, AI is no longer something validated in isolation. It is being tested as a full system under real-world conditions, including governance, controls and human oversight.
Resilience, incident reporting and third-party risk are converging

The UK supervisory backdrop also makes clear why an Anthropic cyber model would land as a resilience issue, not just an AI one.
FCA director Mark Francis has warned that “digital resilience is being tested like never before, with firms facing growing online threats and increasing reliance on third parties to deliver the essential financial services consumers rely on.”
He added that new rules “give firms clearer rules and practical guidance to better manage disruption, while supporting our ambition to be a smarter regulator, giving us better data to spot risks, share insights and strengthen sector wide resilience.”
Incident reporting is increasingly being treated as a testable capability rather than a compliance afterthought, with banks needing to prove that detection systems, escalation workflows and reporting pipelines work reliably under real conditions.
More than 40 percent of reported incidents in 2025 were linked to third-party providers, pushing QA further into supplier controls, shared infrastructure and external dependencies.

Michael Murphy, deputy CTO at Arqit, explained that “as banks rely more heavily on third party providers, resilience is no longer just about protecting internal systems, it extends across a much wider and often more complex digital supply chain.”
He added: “If a growing share of incidents originate outside a firm’s direct control, then reporting alone can only go so far.”
Those dynamics are directly relevant to the Anthropic model. A system designed to uncover vulnerabilities across widely used software increases the pressure on firms to demonstrate control not only over their own systems, but across the full ecosystem they depend on.
Bank of England policy
The Bank of England and its Prudential Regulation Authority have also been stepping up engagement with firms on AI governance, model risk and testing.
Discussions with industry participants have focused on how organisations are applying validation, explainability and oversight as AI becomes embedded into core functions.
At the same time, regulators have made clear that resilience must be demonstrated in practice. “Cyber-attacks remain a major threat to the financial sector,” the Bank said recently, adding that resilience “can no longer be assumed, it must be proven.”
Supervisory expectations now require resilience to be “continuously tested, measured and evidenced,” with firms expected to validate how systems behave under stress and produce documentation that can withstand regulatory scrutiny.
For QA and software testing teams, that extends the scope of assurance beyond traditional software quality into model performance, system behaviour and governance.
It also introduces a need to validate how systems respond to evolving threats, including those surfaced by advanced AI tools.

For banks, insurers and market infrastructure providers, the Anthropic episode sharpens several existing QA priorities at once. AI testing has to include adversarial and resilience thinking, not just model quality metrics.
Cyber testing and AI governance are no longer separable, especially when models are being used to discover or reason about vulnerabilities.
Firms need evidence that controls work across the whole stack, including third-party services, operational workflows and reporting processes.
As the FCA has emphasised: “We focus on both quantitative and qualitative factors to get a truly holistic understanding of the AI system.”
That is a useful shorthand for what financial services QA is now being asked to deliver: not just whether something works, but whether it behaves safely, predictably and defensibly in real-world conditions.
The Anthropic model is relevant because it accelerates the discovery side of the problem. If vulnerability finding becomes faster and more scalable, then regression testing, patch validation, control testing, incident escalation, audit trails and third-party verification all have to move faster too.
In practice, that means QA teams in financial services are moving closer to the centre of cyber resilience, model governance and operational assurance.
The UK regulators’ reported meeting with banks is therefore about more than one powerful model. It fits a wider supervisory pattern: test AI in real conditions, treat the whole system as in scope, and prove resilience with evidence rather than assumptions.
When approached by QA Financial, the FCA declined to comment “at this stage”, while no one at Anthropic was available to comment.