Artificial intelligence did not just creep into quality assurance in 2025, it arrived at full speed, dragging QA and testing teams into the centre of strategy conversations at banks and financial-services firms.
Across QA Financial’s reporting this year, a clear storyline emerged: generative AI and agentic AI moved from pilot experiments to production planning, while regulators, vendors and practitioners warned that the risks of moving too fast without robust testing are mounting.
For teams in software testing, the shift was impossible to ignore. AI is no longer something happening around QA. It is redefining what quality means, which engineering skills matter, and how assurance must be evidenced in an era where models, not just code, can break a bank.
Our December QA Financial year-in-review series, published earlier this month, summed up the sentiment circulating through banks and vendors alike: 2025 was the year in which AI became “the central force in QA transformation”.
That shift is visible in day-to-day engineering, where natural-language interfaces now sit between business requirements and the test suites they drive.
One senior QA leader explained earlier this year that natural language processing is now “being used to convert business requirements into test scenarios, catching inconsistencies and gaps early,” signalling a move from manual execution to orchestration and oversight of AI-augmented pipelines.
This is the heart of the change. Instead of writing tests from scratch, QA teams are supervising intelligent systems that produce them, steering behaviour, policing output and validating reasoning. It is no longer enough to confirm what software does; testers must increasingly understand how and why intelligent systems make decisions.
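What that supervision can look like in practice is sketched below, under the assumption of a hypothetical pipeline in which a model emits candidate test scenarios as structured records. The schema and field names are invented for illustration, not any vendor's format: the point is that AI output passes through a validation gate before it is admitted to the suite.

```python
# Illustrative QA gate for AI-generated test scenarios (hypothetical schema,
# not any specific vendor's format): scenarios the model emits are checked
# before they are admitted to the regression suite.
REQUIRED_FIELDS = {"title", "preconditions", "steps", "expected_result"}

def review_scenario(scenario: dict) -> list[str]:
    """Return a list of problems; an empty list means the scenario passes."""
    problems = [f"missing field: {f}"
                for f in sorted(REQUIRED_FIELDS - scenario.keys())]
    if not scenario.get("steps"):
        problems.append("no executable steps")
    if "expected_result" in scenario and not scenario["expected_result"].strip():
        problems.append("expected result is empty")
    return problems

# A model-generated scenario with a gap a human reviewer should see:
candidate = {
    "title": "Reject transfer above daily limit",
    "steps": ["log in", "submit transfer of 50,001 GBP"],
    "expected_result": "transfer blocked with limit error",
}
print(review_scenario(candidate))  # flags the missing preconditions
```

The gate does not replace the reviewer; it turns review into triage, so humans spend their time on the scenarios the checks cannot judge.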
Data quality
If AI is now embedded in QA, the question running through QA Financial’s 2025 coverage was whether the industry can test these systems as rigorously as regulators, boards and customers expect.
That question came into focus when Celent published data showing that 54% of surveyed institutions expect to have agentic AI in production next year, despite only 26% having it live today. The acceleration is significant, and so are the foundations required to support it.

Janey Speed, Capital Markets Analyst at Celent, argued that the bottleneck is no longer compute power or model capability, but the basic integrity of training data.
“For AI use cases to not only be effective but also robust firms must ensure that they are training their models with high-quality data,” she said, warning that with the internet full of AI-generated noise, newer models “run the risk of being trained on bad data and poor responses.”
The implication for QA is immediate: test data engineering has become a quality gate, not an operational detail.
Her Celent colleague, Principal Analyst Alenka Grealish, described the commercial pressure behind this adoption. In her view: “GenAI and specifically agentic AI represent the next milestone in digitizing trade finance,” driven by the need to reduce processing delays and uncover fraud more effectively.
But she cautioned that banks cannot rely on the market to mature before engaging.
“While it is early days for GenAI and agentic AI applications,” Grealish explained, “it is not too early for trade finance banks to begin exploring trade finance use cases.”
The logic is clear: if financial institutions wait until systems are stable, they will already be behind.
Hallucinations and the new QA workload
As investment picks up, so do the consequences of getting things wrong. Testlio’s decision to launch an AI testing platform in 2025 was driven by a spike in hallucination-related failures across banking.
Summer Weisberg, COO and Interim CEO at Testlio, noted that “trust, quality, and reliability of AI-powered applications rely on both technology and people.”
A review of early deployments revealed that 82% of the AI issues Testlio saw involved hallucinations or misinformation, particularly in chatbot and retrieval-augmented generation systems. Most of those bugs, close to 79%, were classified as medium or high severity.
The issue is not just that systems fail; it is that they fail with confidence.
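One common mitigation for retrieval-augmented systems is a grounding check: flag an answer when it is not supported by the passages the system retrieved. The sketch below uses a deliberately crude word-overlap heuristic to show the shape of the idea; production systems use entailment models or citation verification, and every threshold and word list here is an illustrative assumption.

```python
# Toy grounding check for a retrieval-augmented chatbot: flag answers whose
# content words barely overlap the retrieved passages. Real systems use
# entailment models or citation checks; this heuristic only shows the shape.
STOPWORDS = {"the", "a", "an", "is", "are", "was", "of", "to", "in", "for", "and"}

def content_words(text: str) -> set[str]:
    return {w.strip(".,") for w in text.lower().split()} - STOPWORDS

def is_grounded(answer: str, passages: list[str], threshold: float = 0.5) -> bool:
    """True if at least `threshold` of the answer's content words appear in context."""
    answer_words = content_words(answer)
    context_words = set().union(*(content_words(p) for p in passages))
    if not answer_words:
        return True
    return len(answer_words & context_words) / len(answer_words) >= threshold

context = ["Standard transfers settle in two business days."]
print(is_grounded("Transfers settle in two business days.", context))    # grounded
print(is_grounded("Transfers settle instantly with no fees.", context))  # flagged
```

The second answer is exactly the failure mode described above: fluent, confident, and unsupported by anything the system actually retrieved.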
“Testing AI systems demands a new level of sophistication,” stressed Testlio co-founder Kristel Kruustük, who argued that evaluating systems now means going beyond bug detection “to evaluate fairness, reasoning, and trust.”
The linkage between Celent’s data concerns and Testlio’s accuracy warnings is what defines the 2025 QA landscape: quality in AI workloads is now inseparable from judgement.
“If you can’t validate it, you probably shouldn’t automate it.”
– Dan Shimmerman
If generative AI dominated 2023 and 2024, agentic AI was the phenomenon QA teams had to confront in 2025. These systems plan, reason and execute tasks with autonomy rather than follow deterministic scripts, a leap in capability matched by a leap in risk.
Hugo Farinha, CTO and co-founder of Virtuoso QA, described agentic AI as “on track, but yet to pull into the station,” calling it “the missing link” that can combine different AI types and automation to tackle judgement-heavy processes. It is a vision that has shifted from theory to engineering roadmap.
That shift was the subject of a recent panel moderated by Blueprint CEO Dan Shimmerman, with Douglas Heintzman (CEO of Syncura), Dr. Jenya Doudareva (AI Governance Lead, Canada Life) and GenAI strategist Dr. Pramila Nathan. Their collective view was that the opportunity is immense, and so is the room for error.

Dr. Nathan illustrated the complexity with a real-world scenario: a network of HR agents that share and build on one another’s insights. In her words, “agentic AI isn’t just about automation anymore, it’s about collaboration… testing them requires a whole new level of scrutiny.” Autonomy, she argued, multiplies QA responsibility rather than reduces it.
Heintzman echoed that concern, noting that “there’s a temptation to throw agentic AI at everything,” even though “many of these models still hallucinate logic chains or fabricate reasoning.”
In financial services, where decisions must be explainable, that kind of failure is not a bug; it is a governance breach.
Dr. Doudareva put it in operational terms: “If your data pipelines are weak or your business rules are poorly defined, agentic AI will only expose those flaws faster, and on a larger scale.”
Her point ties back to the Celent research: the road to agentic AI is paved not with novel algorithms, but with good data and stable rules.
The discussion closed with Shimmerman offering what became possibly the unofficial rule of 2025 QA: “If you can’t validate it, you probably shouldn’t automate it.”
‘Gold rush without a map’
And yet, the market pressure to adopt continues. Sauce Labs’ annual survey put numbers to the tension. Nearly 95% of companies reported setbacks in AI projects this year, even as investment increased and leadership expectations rose.

CEO Prince Kohli called out the disconnect: “The next great challenge isn’t building more powerful AI, it’s creating quality and testing frameworks to control it.”
With 61% of participants saying their leadership does not understand software testing fundamentals, Kohli warned that “this leadership blind spot results in teams being tasked with implementing powerful AI without proper support.”
The survey also found that 72% of respondents believe agentic AI will be capable of fully autonomous testing by 2027, but 60% admit they have yet to establish accuracy benchmarks.
Most leaders prefer a hybrid testing model, where humans supervise AI. And 80% believe companies should disclose when software has been tested only by AI. These numbers indicate a maturing understanding: quick adoption without assurance is no longer considered acceptable.
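The accuracy benchmarks most teams still lack can start small. The sketch below, with every name and threshold invented for illustration, scores a hypothetical AI test agent's pass/fail verdicts against human-labelled ground truth and routes low-confidence verdicts to a human reviewer: the hybrid model the survey's respondents prefer.

```python
# Hypothetical benchmark harness for an AI test agent: compare its verdicts
# against human-labelled ground truth, and route anything the agent is
# unsure about to a human reviewer (the hybrid model described above).
def benchmark(verdicts: list[dict], ground_truth: dict, min_confidence: float = 0.8):
    correct, needs_review = 0, []
    for v in verdicts:
        if v["confidence"] < min_confidence:
            needs_review.append(v["test_id"])      # human takes over
        elif v["verdict"] == ground_truth[v["test_id"]]:
            correct += 1
    scored = len(verdicts) - len(needs_review)
    accuracy = correct / scored if scored else None
    return accuracy, needs_review

truth = {"T1": "pass", "T2": "fail", "T3": "pass"}
agent_output = [
    {"test_id": "T1", "verdict": "pass", "confidence": 0.95},
    {"test_id": "T2", "verdict": "pass", "confidence": 0.91},  # confidently wrong
    {"test_id": "T3", "verdict": "pass", "confidence": 0.40},  # low confidence
]
accuracy, review_queue = benchmark(agent_output, truth)
print(accuracy, review_queue)  # 0.5 accuracy; T3 queued for a human
```

Note what the harness exposes: the confidently wrong verdict on T2 drags accuracy down even though the agent never asked for help, which is precisely why self-reported confidence alone is not a benchmark.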
Where regulation and QA finally met
The QA Financial Forum in London made it clear that 2025 was the year regulators began talking about AI in the language of quality assurance.
Santosh Pandit, Senior Regulator at the Bank of England’s Prudential Regulation Authority, opened the event by stating that “software is the single most important risk that businesses will need to manage in the future,” setting the tone for the day.
At Nasdaq, Senior Director of Product Framework & Quality Engineering Sudeepta Guchhait explained how proprietary constraints require synthetic testing environments: “Believe me, we have not achieved 100% coverage… our customers will never give their data to us, so we generate our own.”
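Nasdaq has not published how it builds those environments, but the general pattern Guchhait describes is well established: generate records that match the production schema and value ranges without containing any customer data. A minimal sketch follows; the schema, field names and ranges are all invented here, and real generators also preserve statistical distributions and referential integrity across tables.

```python
import random

# Minimal synthetic-test-data sketch: produce records matching a
# production-like schema without touching real customer data. The schema,
# field names and value ranges are invented for illustration.
def synthetic_trades(n: int, seed: int = 42) -> list[dict]:
    rng = random.Random(seed)  # seeded so test runs are reproducible
    symbols = ["AAPL", "MSFT", "NVDA"]
    return [
        {
            "trade_id": f"SYN-{i:06d}",            # clearly non-production IDs
            "symbol": rng.choice(symbols),
            "quantity": rng.randint(1, 10_000),
            "price": round(rng.uniform(10.0, 500.0), 2),
            "side": rng.choice(["BUY", "SELL"]),
        }
        for i in range(n)
    ]

batch = synthetic_trades(1_000)
print(len(batch), batch[0]["trade_id"])
```

Seeding the generator matters more than it looks: a failing test against synthetic data is only useful if the same data can be regenerated to reproduce the failure.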
Nationwide’s Head of Testing Practice Tim Gould highlighted the fragility of speed: “AI looks like an accelerant until it becomes an accelerant for a fire.”
And Amdocs Quality Engineering’s Limor Gueta offered a path forward: “Only once we have enough trust in it, we will move it to the agentic experience.”

The pattern held across borders. Under DORA, the EU AI Act, CPS 230 in Australia and the European Accessibility Act, quality assurance is now defined as part of compliance: not a process that happens before deployment, but a continuous obligation.
Jens Kunz, partner at Noerr, described this moment as a “fundamental change” in how IT penetration testing is being approached. Infinity Tech Consulting’s Paul Mowat argued that AI systems must be validated “across functional, performance, security, and stress layers,” with the additional burden of proving algorithmic integrity.
And KPMG UK partner Daryl Elfield made the consequence explicit: if an AI system falls under a high-risk classification, “regular testing must take place to ensure accuracy, reliability, and security.”

Moreover, Jessica Rusu, Chief Data, Information and Intelligence Officer at the UK’s Financial Conduct Authority, signalled the next phase: cooperation across borders.
Regulators, she said, will be “championing safe and responsible AI innovation across UK and Singapore markets,” setting expectations for shared standards in model validation.
In summary, GenAI and agentic AI are no longer experimental technologies; they are already embedded in critical banking workflows. The innovation is real. So is the fallout.
This year showed that the winners in AI will not be the institutions that deploy the most models or agents. They will be the institutions whose QA teams can trace inputs to outcomes, prove that models behave predictably, demonstrate explainability to regulators, and establish trust before scale.
2025 began like a gold rush. It ended like a governance agenda. Many industry insiders agree that 2026 will belong to the organisations whose QA and testing teams can turn ambition into assurance, not by resisting autonomy, but by validating it.