For QA and software testing teams inside banks, the biggest risk in AI-driven testing is not whether the technology is exciting.
The risk is whether it is dependable enough to be trusted inside critical systems where errors are not theoretical, but operational, regulatory, and financial.
In financial services, testing failures do not stay contained in development environments. They surface in production incidents, broken customer journeys, compliance breaches, and resilience gaps.
That is why AI in testing is increasingly being judged not by novelty, but by whether it can deliver stability, predictability, and audit-worthy outcomes.
Leapwork co-founder Claus Topholt has been blunt about the current limits. “The technology is a forward-facing future thing. It’s not in a state today where we can put it into our stable category and say, this is the stuff that you can rely on,” he said.
The concern is not abstract. Topholt warned that even state-of-the-art large language models still introduce randomness and inconsistency that QA teams cannot afford at scale.
“In normal situations, the precision and accuracy and robustness of large language models to solve difficult problems decrease from 90% to 65, maybe 70%,” Topholt added.
For regulated firms running thousands of automated tests every week, that gap is not just an engineering annoyance. It becomes a risk multiplier.
“We basically write down what we want and then we let the LLM go do it. And we do that ten thousand times,” Topholt explained. With the current state-of-the-art models, he stressed, this would lead to thousands of random errors.
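The arithmetic behind that warning is simple to sketch. The figures below are illustrative only, taken from the accuracy range Topholt cites, and assume each run fails independently:

```python
# Illustrative only: expected number of spurious failures when every
# test run depends on an LLM with a given per-run accuracy.
def expected_errors(runs: int, accuracy: float) -> float:
    """Expected number of runs that go wrong, assuming independent runs."""
    return runs * (1.0 - accuracy)

runs = 10_000  # the volume Topholt describes
for accuracy in (0.90, 0.70, 0.65):
    errors = expected_errors(runs, accuracy)
    print(f"{accuracy:.0%} per-run accuracy -> ~{errors:,.0f} random errors")
```

At 65–70% per-run accuracy, ten thousand runs yield roughly three thousand or more spurious results, which is the scale of noise Topholt is pointing to.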
Wider industry data now confirms what many QA leaders already experience daily: enthusiasm is real, but confidence remains fragile.
AI in testing
Leapwork this week announced the results of research examining how software teams approach AI in testing and what determines confidence in its use.
The company said the findings showed “broad optimism about AI’s role in testing,” but also a clear warning for teams responsible for quality across critical systems: confidence depends on accuracy, reliability, and the ability to keep tests current as applications change.

The study gathered responses from more than 300 software engineers, QA leaders, and IT decision-makers at large and mid-size organisations worldwide. Leapwork said most financial organisations now view AI as a priority for their future testing strategy, but the practical constraints remain rooted in stability, trust, and manual workload.
“It is no longer a question of whether testing teams will leverage agentic capabilities in their work,” said Kenneth Ziegler, CEO of Leapwork. “The question is how confidently and predictably they can rely on it.”
That tension between ambition and reliability is emerging as one of the defining themes for QA teams in banking and financial services.
AI is increasingly positioned as the next layer of automation maturity, but many teams are discovering that automation itself has long suffered from a deeper trust deficit.
Leapwork engineering leader Rohit Raghuvansi argued that the problem is rarely that teams are not trying hard enough.
“For QA and software engineering teams, test automation is a little like new year’s resolutions,” Raghuvansi pointed out. “It’s easy to set lofty goals, and even to pursue them nominally. But it’s much harder to achieve meaningful results.”
In banking environments, where quality assurance functions as a safeguard for systemic resilience, surface-level progress does not always translate into delivery confidence.
Raghuvansi observed that teams often define success in narrow terms. “QA engineers might say, for example, that they want to automate a certain percentage of their tests, and they might even make nominal progress toward that goal by writing more test cases or executing automated tests more frequently.”
But the deeper issue, he argued, is that activity does not always equal value. “When you drill down into what’s actually happening, it’s often the case that the nominal gains translate to very little in the way of true value creation,” he explained.
The Leapwork study results echo that same concern. While 88% of respondents said AI is a priority for their organisation’s future testing strategy, and 80% said AI will have a positive impact over the next two years, only a small minority have embedded AI deeply into key workflows today.
“Test automation is like new year’s resolutions. It’s easy to set lofty goals, but much harder to achieve meaningful results.”
– Rohit Raghuvansi
Leapwork said 65% of respondents currently use or explore AI across one or more testing activities, but only 12.6% use AI across key test workflows.
The gap is being driven by what matters most in financial services QA: accuracy and stability. Leapwork found that 54% cited concerns about accuracy and quality as factors that hold back broader use of AI in testing.
Topholt has framed that concern in stark operational terms, noting that ten thousand runs “is not an unreasonable amount of test runs over the course of a week or over the course of a month in a real enterprise setup.”
Yet the current reliability curve still produces noise. “That’s kind of problematic because most of those errors would show up in random places and it would clog up your ability to do kind of bug hunting,” he noted.
More damaging still is the possibility of silent failure. “But maybe the worst part of it is that some of them would be false positives,” Topholt said.
“That the large language model implementation of the test case would just basically go and say, yes, everything’s fine, but it ended up on a weird sidetrack that it wasn’t supposed to end up on.”
The risk of false confidence
In banking, false confidence is often more dangerous than visible defects. A failed test that breaks loudly can be fixed. A test that passes incorrectly can mask risk until it reaches production.
That is why Topholt stressed the accuracy threshold financial services teams should demand. “Right now where we want to be is somewhere above ninety-nine percent accuracy,” he said.
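Why the bar needs to sit above ninety-nine percent becomes clearer when per-test reliability is compounded across a suite. A minimal sketch, with illustrative figures only and assuming each test behaves independently:

```python
# Illustrative only: probability that an entire suite completes with
# no spurious results, assuming independent tests.
def clean_suite_probability(per_test_accuracy: float, suite_size: int) -> float:
    """Chance that every test in the suite behaves correctly."""
    return per_test_accuracy ** suite_size

for acc in (0.70, 0.99, 0.999):
    p = clean_suite_probability(acc, 100)
    print(f"per-test accuracy {acc:.1%}: clean 100-test suite {p:.1%} of the time")
```

Even at 99% per-test accuracy, a 100-test suite runs cleanly only about a third of the time; reliability has to climb well past that threshold before results become trustworthy at enterprise volume.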
Leapwork’s findings suggest many teams are still far from that level of trust. Test fragility, difficulty automating flows across systems, and the time required to update tests ranked as top reasons teams struggle to automate more testing.
Nearly half of respondents said it takes three days or more to update tests after a change in a critical system.
For banks operating in continuous delivery environments, three days is not a small lag. It is a widening exposure window where test suites fall behind application reality.
Manual effort remains another structural constraint. Leapwork said that on average, only 41% of testing is automated today.
Teams also reported that test creation remains the biggest bottleneck. Leapwork found 71% said test creation slows their teams down the most, followed by test maintenance at 56%.
These burdens help explain why AI is attractive, but also why its adoption is cautious. AI promises acceleration, but only if it can reduce effort without introducing instability.
“Our research shows teams want AI to help them move faster, expand coverage, and reduce effort,” Ziegler said, “but accuracy remains table stakes.”

Raghuvansi warned that automation without trust leads directly back to manual fallback. “They don’t trust automated tests to be reliable and so, even if the tests suggest that code is bug-free and ready for release, no one is actually confident pressing the ‘go’ button,” he said.
That dynamic can erase automation’s value entirely.
“This can also lead to scenarios where engineers end up testing everything manually, even if they already tested it automatically, because they won’t trust test results until they obtain them by hand,” Raghuvansi explained.
“At this point, test automation has achieved no real value at all,” he stressed.
In regulated financial environments, this is not just inefficiency. It is a governance problem. QA leaders must demonstrate not only that testing is happening, but that results are meaningful, repeatable, and defensible under scrutiny.
Raghuvansi believes the root causes often extend beyond tooling. “On the surface, it can be tempting to blame technical factors alone,” he said, “but the root causes of test automation shortcomings usually boil down to cultural and organizational challenges at least as much as technical barriers.”
He pointed to “a heavy reliance on ‘heroes’ to drive test automation strategies” and “the expectation that test automation will result in fast, easy wins” as patterns that undermine sustainable progress.
“Increasing automation for its own sake is of no value if it doesn’t improve overall outcomes,” Raghuvansi argued.
That is also shaping how AI is being positioned inside mature QA organizations. Raghuvansi said teams are “much likelier to trust AI-driven automated tests when the role of the tests is to reduce toil, not remove humans from feedback loops.”
Used carefully, “AI helps to reduce noise and effort, but without undercutting confidence in automated testing because humans remain in charge,” he remarked.
Topholt, too, has emphasised that testing AI systems still comes back to the same fundamentals. “You have input that you need answers to and it has to be robust in the way it responds,” he said.
Leapwork said the opportunity now lies in pairing AI capabilities with strong automation foundations, allowing teams to scale testing with confidence as systems evolve.
“The real opportunity lies in applying and integrating AI alongside stable automation, so teams gain speed and scale without sacrificing trust in outcomes,” Ziegler concluded.