Banking on AI: why QA teams in finance are in dire need of new skills

Banks are racing to bolt AI onto everything from test automation to release pipelines. Speed is the story. But the real story is measurement, and most firms are measuring the wrong things. That’s why so many AI pilots feel impressive in demos and disappointing in production. The output is up; the outcomes are not.

“The potential of AI to deliver customer value is far greater than just code generation,” stressed Uplevel CEO Joe Levy, based in Seattle.

“Leaders should consider new use cases that clarify customer needs and automate time-consuming tasks like reviews, deployments and testing. That’s where AI begins to deliver real customer value, in the outcomes, not just the code.”

A new survey of 101 engineering leaders lays out the contradiction. Nearly nine in ten say they’re ready for AI, yet almost all admit they lack the capabilities to manage it. They talk about business impact but track developer activity.

Most know cross-team coordination and architecture are the real bottlenecks, yet they still optimise for lines of code, pull requests and story points. In banking, where resilience and compliance rule, that mismatch is not a quirk. It’s a risk.

Measure what matters

The report’s through-line is blunt: teams are using AI while judging it with pre-AI yardsticks. That skews incentives towards visible, individual productivity and away from the messy middle where delivery succeeds or fails: handoffs, integration, documentation, approvals, controls. In large institutions, that “middle” is where operational risk lives.

Leaders know this, at least instinctively. When asked what throttles delivery speed, they point to cross-team dependencies, technical complexity and unclear requirements.

Yet the dashboards stay fixed on individual output. The result is productivity theatre. Code moves faster; cycle time doesn’t. AI accelerates what’s already happening in the system. If collaboration is weak and architecture is brittle, AI just hits the gas.

Matthew Kirk, an AI/ML researcher quoted in the study, cuts to the chase: “A lot of the excitement over AI is really excitement over the idea of a hero programmer. The trouble is, people love the idea of the 10x programmer that AI could be, but then complain about the tech debt and communication issues it creates. Turns out 10x programmers can write 10x more software but also create 10x more problems.”

But banks do not need heroics; they need evidence. Did AI reduce incident rates? Shrink approval queues? Improve traceability for auditors? Tighten feedback loops between product and testing? If you can’t see those effects, you’re not measuring AI; you’re measuring motion, the report stated.

Financial services twist

Boardrooms want velocity, while regulators want assurance. Those priorities can align, but only if QA rewires how it works, and how it proves what works.

The survey showed 30% of leaders ranking data security and privacy as their top AI concern, with 27% calling AI-driven technical debt the biggest strategic risk. That’s not conservatism; it’s context. Payment rails, core banking, trading, KYC: none of them tolerate flaky automation or opaque models, the authors wrote.

Amy Carrillo Cotten, Director of Client Transformation at Uplevel, described the ground truth: “Quality and security risks are the most important considerations on people’s minds in 2025, not proving ROI. From the top, people are focused on velocity, and there’s little dialogue about real, meaningful quality and security risks.”

In finance, “meaningful” has a precise translation, Carrillo Cotten said: can you demonstrate that AI-enabled changes are safe, repeatable and compliant, and do it on demand?

That requires moving beyond pass/fail thinking, she continued. It means testing not just code paths, but socio-technical paths: how a model’s suggestion moves through a pipeline; who reviews it; what approvals apply; where provenance is captured; which controls block release if drift, bias or hallucination risk spikes. Traditional QA can’t shoulder that alone. It’s a skills problem and a mandate problem.

From test cases to system cases

Leaders in the survey say they are prepared for AI, and then list the gaps that say otherwise. They prioritise QA and validation of AI outputs, performance monitoring and system integration. They also flag a shortage of strategy and of people who can steer change, not just tool it.

In banking, that translates into a QA talent blueprint that looks different to 2019. Banks need engineers who can test models as well as applications, measure drift and fairness over time, and instrument pipelines for observability at every gate.
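
To make “measure drift over time” concrete, here is a minimal sketch that is not taken from the report. It assumes a team compares the distribution of a model’s scores at sign-off against the scores seen in a later period, using the Population Stability Index (PSI). The function names, the equal-width binning and the 0.25 gate are illustrative assumptions; 0.25 is a common rule of thumb, not a regulatory threshold.

```python
import math
from typing import Sequence


def population_stability_index(
    baseline: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between two score samples, using equal-width bins from the baseline."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def proportions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values to edge bins
            counts[idx] += 1
        # Floor each share at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    p = proportions(baseline)
    q = proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def drift_gate(psi: float, threshold: float = 0.25) -> bool:
    """Hypothetical release control: True means the change may proceed."""
    return psi < threshold


# Illustrative data only: scores at sign-off versus a visibly shifted later sample.
baseline_scores = [i / 100 for i in range(100)]
shifted_scores = [0.5 + i / 200 for i in range(100)]

psi_same = population_stability_index(baseline_scores, baseline_scores)
psi_shifted = population_stability_index(baseline_scores, shifted_scores)
```

In this sketch an unchanged distribution passes the gate and the shifted one is blocked. In a real pipeline the PSI value, the threshold and the decision would presumably be logged alongside the release record so the evidence exists when an auditor asks for it.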

They also need architects who can integrate AI into legacy cores without creating shadow change paths. They need leads who can turn that technical posture into regulator-ready evidence.

That’s why Levy’s second warning hits home: “Until leaders modernize their measurement frameworks, the very outcomes they hope AI will deliver may remain stubbornly out of reach. The organizations that get it right will look beyond activity metrics, tracking how AI improves teamwork, accelerates delivery, and drives business results that matter.”

Modernising measurement is the hinge. It forces new skills into scope. It also forces hiring managers to rethink the mix: fewer “framework polyglots” whose value peaks at tool adoption; more “systems translators” who can connect AI, process and business risk.

Business outcomes

Here’s the tell. Half the leaders surveyed put “achievement of business outcomes” among their top AI success metrics, but only three per cent say they actually use business impact metrics to judge engineering performance. The rest fall back on developer productivity and uptime because they’re available, familiar and chart-friendly.

That gap gives QA an opening, and a responsibility, to lead.

Jim Pafford, senior VP of R&D at Alvaria, frames it in delivery terms bankers appreciate: “Focusing on team-level delivery outcomes improves engineering’s impact on business metrics. Take AI implementation: will it make the team deliver software faster, more reliably, in a well-documented way? Will it allow us to automatically create unit tests for our code? Will it enhance greenfield development by getting us 80% of the way there so we don’t spend too much time ideating?”

He added that “all of these result in increased operational efficiency and accelerating development, which roll up directly to business outcomes.”

Make that concrete for a bank: fewer false positives in fraud rules after AI-assisted test tuning; shorter regulatory sign-off cycles because evidence is generated as a by-product of delivery; lower mean time to restore because AI-generated docs keep pace with change. If your measurement framework can’t surface outcomes like those, your hiring and upskilling plan won’t either.

Rethinking quality

As QA teams are finding, AI is changing the texture of code: it’s quicker to write and, if needed, quicker to throw away. That unsettles a long-standing QA instinct: protect maintainability at all costs.

The study argued for a shift in mindset that will feel familiar to anyone who lived the move from “pets” to “cattle” in the cloud era. If AI-generated artefacts are easier to replace than to nurture, quality becomes the ability to detect risk early and swap safely, not to preserve every line indefinitely.

That does not lower the bar. It raises it, just at a different layer. It demands release processes with less friction and more control points, comprehensive automated regression, and teams fluent in rapid, well-governed iteration.

“AI-generated code might follow similar patterns: optimised for speed and replaceability rather than long-term maintainability,” Carrillo Cotten noted in the report.

“But this transformation requires rethinking everything, from quality metrics to architecture decisions and development processes,” she added.

Get that wrong and technical debt climbs. Get it right and AI can actually help you pay debt down. The study quoted Pafford again to that effect: “Technical debt often results from organisational dysfunction rather than tooling choices. AI has improved our technical debt by empowering our engineers to feel more confident in changing areas of the code that they may not have known as well without AI input.”

Banking QA hiring

Based on the report’s findings, a profile emerges. Banks still need seasoned automation engineers. They also need model-literate testers who can validate behaviour beyond static outputs, who understand bias and drift, and who can design adversarial tests for LLM-powered features.

They further need SRE-minded QA who treat observability and incident learnings as first-class inputs to test design, and they need compliance-fluent leads who can attach every change to an auditable chain of reasoning and evidence, without turning delivery into a paperwork mill.

Upskilling will do some of that work. Hiring will do the rest. The mix will vary, but the direction is consistent: more system thinkers, fewer tool jockeys; more translators, fewer silo specialists.

The report flags one more trap: the urge to put AI everywhere. That’s not strategy; that’s saturation, Carrillo Cotten stressed. She described the reality inside many firms as “a greenhouse versus a tornado”: controlled AI experiments on one side, chaotic deployment on the other.

Strategic teams close the gap. They pick use cases where AI’s failure modes are tolerable, instrument them deeply, prove value, then expand. They keep AI out of the places where its ambiguity is a non-starter, at least until controls catch up.

That approach plays to QA’s strengths. Testers already think in terms of risk, scope, guardrails. With AI, that becomes the roadmap: where AI can help reviewers and release managers, where it can generate scaffolding but not final artefacts, where it must be fenced by policy until governance is mature. It’s not “no”; it’s “not yet”, with a clear path to “yes”.

The report concludes by warning that the AI wave won’t slow. In finance, it can’t be allowed to crash, because the sector’s advantage has always been discipline: controlled change at scale. AI can strengthen that discipline or shred it, the authors stated. The difference is measurement and talent.

Levy’s point is the north star: outcomes over output. “The financial organisations that get it right will look beyond activity metrics, tracking how AI improves teamwork, accelerates delivery, and drives business results that matter.”

