David Colwell, VP, AI & Machine Learning at Tricentis
As banks and financial services firms accelerate the use of generative AI to speed up software delivery, quality assurance teams are facing a widening gap between development velocity and operational resilience.
Industry leaders warn that while AI-generated code is helping teams move faster, it is also amplifying testing blind spots at a time when regulators are scrutinising outages, change management and digital resilience more closely than ever.
That tension has been a recurring theme in recent warnings from Tricentis leadership, including concerns that some firms are now releasing code with insufficient testing and risking production failures.
Against that backdrop, David Colwell, VP of AI & ML at Tricentis, says generative AI is reshaping not just how software is written, but how quality must be governed inside complex, regulated environments.
“With generative AI now promised to make software development faster,” Colwell stated. “It has, but it has also introduced an unexpected new problem: productivity gains on paper, followed by very real slowdowns as engineers and testers clean up after AI-generated mistakes.”
For banks under pressure to modernise legacy platforms, migrate to the cloud and comply with regulations such as DORA, the cost of those slowdowns can be significant.
“If AI writes code like a teenager, then testers need to be the adults in the room.”
– David Colwell
Tricentis executives have previously warned that an imbalance between speed and resilience is emerging across banking QA, with outages increasingly traced back to poorly tested code changes rather than infrastructure failures. Colwell’s analysis adds colour to why that imbalance is growing.
“I often describe AI as writing code like a teenager,” Colwell said. “AI is impatient; it wants to reach an answer quickly. It’s overconfident, even when it’s wrong. It will happily make things up if it thinks that’s what you want. And yes, sometimes it even puts emojis into production code.”
AI coding tools are now pervasive across large organisations, including financial institutions. “Most large organisations are already using them in some form, whether officially sanctioned or not,” he noted.
While teams may see faster output, he pointed out that true delivery metrics tell a different story. “When you measure real productivity, how long it takes to get a change safely into production, for example, the results are mixed.”
That safety dimension is increasingly critical as banks face regulatory expectations to demonstrate control over software change, testing coverage and third-party risk.
Compliance hurdles
Recent case studies, including Tricentis’ work with Nordic banks, have shown how automated testing is being used to support DORA compliance by embedding traceability, risk-based testing and continuous validation into delivery pipelines.
Colwell argued that generative AI complicates that picture because it behaves fundamentally differently from human developers.
“AI doesn’t understand your system the way a human does,” he continued. “It doesn’t remember that outage you had last year. It doesn’t know why a particular edge case is radioactive.”
Instead, “it works within a limited context window and fills the gaps with plausible-sounding guesses.”
For QA teams, that creates new classes of defects that traditional testing was never designed to catch.
“The problem is that modern testing practices were built around human behaviour,” Colwell explained. “Humans are lazy in predictable ways… AI does” invent new logic. In regulated financial systems, those inventions can quickly become production risks.
Colwell described a real incident where an AI tool quietly introduced a business rule that no one had asked for.
“The generated code quietly invented a new rule for people aged 43, based on a regulation that it found on the Internet,” he said.
“No one asked for it, no one reviewed it and the tests didn’t catch it because no one thought to look there. That’s the danger.”
Automation push
The implications are particularly stark as firms adopt more automation across core banking, payments and ERP environments.
Tricentis claimed an industry first last year with an autonomous ERP testing platform, signalling a broader move toward agentic systems that can generate and execute tests at scale. Colwell cautioned that autonomy must be balanced with visibility and control.
“AI doesn’t just make mistakes at the boundaries we expect,” he said. “It creates entirely new boundaries.”
In some cases, “if tests fail, it may skip them,” or “rewrite the function to return the expected value instead.” From a quality standpoint, he adds, “that’s catastrophic.”
Those risks reinforce why Tricentis has been investing in what it describes as Quality Intelligence, combining AI-driven analysis with human oversight.
The appointment of Sealights co-founder Eran Sher to lead AI vision at Tricentis reflects a strategic focus on linking code changes, test coverage and risk exposure, an approach aimed squarely at large, regulated enterprises.
Colwell stressed that this shift fundamentally changes the tester’s role.
“If AI writes code like a teenager, then testers need to be the adults in the room,” he said. That means moving beyond late-stage defect detection.
“Testers are no longer just finding bugs after the fact; they are preventing defects before they propagate.”
“Modern testing practices were built around human behaviour [and] humans are lazy in predictable ways.”
– David Colwell
Colwell urged testers to actively use AI tools themselves. “Ask an AI assistant to explain a pull request in plain language. Ask why a particular condition exists,” he said.
This, he stressed, “isn’t about turning testers into full-time developers; it’s about giving them leverage.”
In practice, “AI-assisted code reviews caught logic that had passed human peer review but made no sense in the broader system context.”
Next phase
Looking ahead, Colwell sees agentic AI as inevitable but warns against unchecked autonomy. “Autonomy without oversight is a mistake,” he shared.
“Fully autonomous agents… can move very fast in the wrong direction.” Instead, “the future is not testers serving AI agents; it’s testers leading them.”
That philosophy aligns with broader industry moves as QA demand grows globally, including Tricentis’ expansion into Latin America and increased investment in cloud-based test data capabilities.
As financial institutions scale testing across regions and platforms, the ability to analyse change, assess risk and explain outcomes is becoming as important as raw automation.
“Adopting this approach starts with fundamentals, not tools,” Colwell concluded.
“If your processes are unclear, AI will scale the chaos.” As AI-generated code becomes unavoidable, he stated, quality teams must ensure “the right adults in the room who know when to trust the machine, when to question it and when to step in.”
Looking for more news on regulations and compliance requirements driving developments in software quality engineering at financial firms? Visit our dedicated Regulation & Compliance page here.