In banking and financial services, software testing has moved decisively beyond release validation and functional correctness.
QA teams are now expected to demonstrate that complex, cloud-native systems can survive failure, degrade gracefully under stress, and recover predictably when the unexpected occurs.
As regulators sharpen their focus on operational resilience and customers grow less tolerant of outages, resilience testing is becoming a board-level concern rather than a purely technical one.
Against this backdrop, chaos testing, once viewed as an experimental practice confined to hyperscalers and digital-native firms, is now being revisited by banks as a practical method for uncovering hidden system weaknesses.
The idea that systems should be designed to fail safely, rather than engineered around the assumption of stability, is reshaping how QA teams think about coverage, risk and confidence.
One of the clearest early examples of this shift in a regulated banking environment came from Starling Bank, a London-based digital challenger bank founded in 2014 by Anne Boden, a former COO of Allied Irish Banks.
In a detailed retrospective published on InfoQ, Greg Hawkins, an independent consultant on technology, fintech, cloud and DevOps and a former CTO of Starling Bank, explained how chaos testing was introduced not as a theoretical exercise, but as a pragmatic response to real operational risk.
Hawkins said the initial motivation was to confront a class of risks that traditional testing routinely ignores.
“When Starling Bank started out with chaos, they started simply, and quite unscientifically, by setting out to remove risks from the abyss of ignorable,” he explained.
Hawkins described this as the space where failure scenarios are neither impossible nor common, but are often dismissed because they feel unlikely or uncomfortable to address. “In between the absurd and the mundane lurks what I call the abyss of ignorable.”
Rather than adopting heavyweight tooling, Starling focused on simplicity and direct exposure to failure.
“Starling implemented their own simple chaos daemon, just as they implemented their own core banking system. The reason was the same one that drove so much of the decision making then and today: simplicity.”
The internal tool relied on AWS APIs to randomly terminate servers, forcing the platform to prove it could survive real infrastructure loss rather than simulated test conditions.
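The retrospective does not reproduce Starling's daemon, so the following is only a minimal sketch of the general idea: pick one running server at random and terminate it. The names `pick_victim`, `run_chaos_round`, `list_instances` and `terminate` are hypothetical. The cloud calls are passed in as plain functions so the sketch runs without credentials; in a real deployment they would presumably wrap the AWS EC2 API (for example boto3's `describe_instances` and `terminate_instances`), which is an assumption rather than a description of Starling's code.

```python
import random
from typing import Callable, Optional, Sequence


def pick_victim(instance_ids: Sequence[str], rng: random.Random) -> Optional[str]:
    """Choose one instance ID uniformly at random, or None if the fleet is empty."""
    if not instance_ids:
        return None
    return rng.choice(list(instance_ids))


def run_chaos_round(
    list_instances: Callable[[], Sequence[str]],
    terminate: Callable[[str], None],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Run one chaos round: list the fleet, terminate one random member.

    list_instances and terminate are injected (hypothetical) hooks; a real
    daemon would back them with cloud API calls. Returns the terminated
    instance ID, or None if there was nothing to terminate.
    """
    if rng is None:
        rng = random.Random()
    victim = pick_victim(list_instances(), rng)
    if victim is not None:
        terminate(victim)
    return victim


if __name__ == "__main__":
    # Stand-in fleet and terminate hook so the sketch runs locally.
    fleet = ["i-aaa", "i-bbb", "i-ccc"]
    terminated = []
    victim = run_chaos_round(lambda: fleet, terminated.append)
    print("terminated:", victim)
```

Injecting the two hooks keeps the random-selection logic testable in isolation, and leaves decisions such as scheduling, tag-based exclusions and blast-radius limits to the surrounding tooling.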
For QA and engineering teams, Hawkins argued that the real value of chaos testing lies less in individual experiments and more in the permanent removal of doubt around specific failure classes.
“From this moment onward, forever, you know your system is not vulnerable to a certain class of problems.”
He added that this fundamentally alters engineering behaviour: “Even more importantly, from this moment, you are by default building a system in a way that expects this failure condition, not just paying lip service to it. You have removed the temptation to discard it.”
This shift, from documenting resilience to demonstrating it continuously, is increasingly relevant to financial services QA teams in 2025. Regulatory regimes now emphasise evidence of resilience across end-to-end services, not just individual components.
As banks modernise architectures around microservices, APIs and event-driven platforms, failure modes multiply in ways that scripted testing alone cannot realistically cover.
Focus on chaos
Hawkins framed chaos testing as part of a maturing discipline rather than an ad-hoc practice.
“Chaos engineering is acquiring the rigour and trappings of a discipline, experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production,” he explained.
For QA teams, this aligns with a broader industry move toward continuous assurance, where confidence is built incrementally through live experimentation, observability and repeatable evidence.
In 2025, resilience testing is increasingly intertwined with QA strategy, DevOps pipelines and regulatory reporting. Banks are under pressure to show not only that systems work, but that they fail predictably and recover within defined tolerances.
The experience Hawkins described at Starling illustrates why chaos testing is gaining renewed attention: it turns uncomfortable hypotheticals into observable facts, and replaces assumptions with proof.
For QA leaders navigating today’s resilience expectations, the lesson is clear. Chaos testing is no longer about breaking systems for sport. It is about shrinking the “abyss of ignorable”, and ensuring that what cannot be ignored is tested, understood and designed for, long before customers or regulators find it.