As regulators, boards and customers demanded greater transparency around artificial intelligence, banks were under growing pressure to prove not just how AI worked, but how it was tested, controlled and trusted in production.
The conversation around AI in banking had shifted decisively over the past year. What had once been framed as a race to deploy models and features had increasingly become a question of assurance, validation and risk ownership.
Financial institutions were no longer being judged on how quickly they could roll out AI capabilities, but on how well they could evidence control over them.
Against that backdrop, Commonwealth Bank moved to publish a detailed, organisation-wide account of how it had been ideating, developing, deploying and managing AI systems.
The move marked a notable shift in tone for the sector, signalling that transparency around AI risk and testing had become a strategic priority rather than a compliance afterthought.
Chief executive Matt Comyn said the decision to release the report had been driven by growing stakeholder demand for clarity.
“We’ve heard that stakeholders want to better understand how AI is being used across the Bank and our approach to managing the risks associated with its adoption.”
“This report outlines our progress and the safeguards we have in place to support responsible use,” Comyn added.
What emerged most clearly from the bank’s approach was that AI testing had expanded far beyond traditional model validation. It had become embedded across the entire lifecycle of an AI system, from early ideation through to live monitoring in production.

The bank described a structured process in which models were tested for accuracy, bias and data quality before deployment, and then subjected to independent review depending on their risk classification.
Validation was no longer a one-off exercise. It had become an ongoing requirement tied to governance and regulatory expectations.
Security testing had also been integrated into this lifecycle. AI-enabled systems were subjected to penetration testing prior to release, with scenarios designed to simulate both real-world and adversarial attack conditions.
This reflected a growing recognition that AI systems introduced new threat surfaces that could not be addressed through traditional software testing alone.
After deployment, models were monitored continuously to ensure they remained fit for purpose. Performance drift, changes in data patterns and emerging risks were assessed through both quantitative and qualitative reviews, reinforcing the idea that AI systems were never truly “finished.”
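The report did not specify which drift metrics the bank relied on. As a minimal sketch of one common quantitative check, the Population Stability Index (PSI) compares the distribution of a model score at validation time with the distribution seen in production. The function name, bin count and smoothing constant below are illustrative assumptions, not details drawn from the report.

```python
import math
from typing import Sequence


def population_stability_index(
    baseline: Sequence[float], live: Sequence[float], bins: int = 10
) -> float:
    """Return the PSI between a baseline sample and a live sample.

    Bins are equal-width over the baseline range; live values outside
    that range are counted in the first or last bin.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant data

    def bin_shares(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor keeps the logarithm defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    expected, actual = bin_shares(baseline), bin_shares(live)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))
```

A monitoring job could compute this on a schedule and raise a review when the value crosses an agreed threshold. The threshold itself is a governance decision; a PSI of zero means the two samples fall into the bins in identical proportions.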
Real-time validation
The introduction of generative AI had accelerated this shift, particularly in customer-facing environments where incorrect or hallucinated responses could create immediate risk.
Within its chatbot environment, the bank implemented what it described as groundedness guardrails. These mechanisms checked whether model-generated responses were supported by verified data before being delivered to customers. If a response could not be validated, it could be flagged or blocked.
This approach reflected a fundamental change in testing philosophy. Instead of relying solely on pre-release validation, controls were increasingly being pushed into runtime environments.
AI systems were effectively being tested in real time, with validation layers operating alongside the models themselves.
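The report did not describe how the bank's groundedness guardrails were built. As a rough illustration of the idea, the sketch below blocks a chatbot response when any of its sentences shares too few words with the verified source passages. The word-overlap heuristic, the `check_groundedness` name and the 0.6 threshold are all assumptions made for this example; a production guardrail would more likely use an entailment model or a dedicated evaluation service.

```python
import re
from dataclasses import dataclass, field


def _tokens(text: str) -> set[str]:
    """Lower-case alphanumeric tokens in a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class GuardrailResult:
    allowed: bool
    unsupported: list[str] = field(default_factory=list)


def check_groundedness(
    response: str, sources: list[str], min_support: float = 0.6
) -> GuardrailResult:
    """Flag sentences whose words are poorly covered by every source."""
    source_tokens = [_tokens(s) for s in sources]
    sentences = [
        s.strip() for s in re.split(r"(?<=[.!?])\s+", response) if s.strip()
    ]
    unsupported = []
    for sentence in sentences:
        words = _tokens(sentence)
        if not words:
            continue
        # Best coverage of this sentence by any single source passage.
        best = max(
            (len(words & src) / len(words) for src in source_tokens),
            default=0.0,
        )
        if best < min_support:
            unsupported.append(sentence)
    # An empty response is treated as not deliverable.
    return GuardrailResult(
        allowed=bool(sentences) and not unsupported, unsupported=unsupported
    )
```

In this sketch the check runs between the model and the customer: a response is delivered only when `allowed` is true, and the `unsupported` sentences can be logged for later review.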
The bank acknowledged that large language models did not guarantee factual accuracy and could generate outputs based on probabilities rather than certainty.
As a result, testing had to account for uncertainty, context and the potential for error in ways that traditional deterministic systems did not require.
The report made clear that AI had been formally recognised as a material risk category within the bank’s enterprise risk framework. This classification had significant implications for QA and testing teams.

Governance structures had been established to oversee AI risk, including dedicated committees responsible for reviewing higher-risk use cases and approving models before deployment.
The board retained ultimate accountability, with executive leadership tasked with ensuring that AI risks were managed in line with the bank’s risk appetite.
Alex Matthews, executive general manager and lead on the report, said the bank had been balancing opportunity with caution.
“As Australia’s largest bank, trust is fundamental to how we use AI. Our approach is focused on our risk management foundations and guided by our AI principles.”
Those principles, which included fairness, transparency, privacy, reliability and accountability, had effectively been translated into testable requirements. QA teams were expected to demonstrate that models met these criteria, turning abstract governance concepts into measurable outcomes.
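To illustrate what a testable requirement might look like, the sketch below turns a fairness principle into a single number that a test suite can assert on: the gap in approval rates between customer groups, often called the demographic parity difference. The report did not say which fairness metrics the bank applied, so the metric, the function names and the 0.3 tolerance in the usage lines are assumptions for this example only.

```python
from collections import defaultdict
from typing import Iterable


def approval_rates(decisions: Iterable[tuple[str, bool]]) -> dict[str, float]:
    """Share of approved decisions per group, from (group, approved) pairs."""
    totals: dict[str, int] = defaultdict(int)
    approved: dict[str, int] = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += int(ok)
    return {group: approved[group] / totals[group] for group in totals}


def demographic_parity_gap(decisions: Iterable[tuple[str, bool]]) -> float:
    """Largest difference in approval rate between any two groups.

    Raises ValueError when there are no decisions to evaluate.
    """
    rates = approval_rates(decisions)
    if not rates:
        raise ValueError("no decisions supplied")
    return max(rates.values()) - min(rates.values())


# Hypothetical sample: group A approved 2 of 4, group B approved 1 of 4.
sample = [
    ("A", True), ("A", True), ("A", False), ("A", False),
    ("B", True), ("B", False), ("B", False), ("B", False),
]
gap = demographic_parity_gap(sample)
assert gap <= 0.3, f"approval-rate gap {gap:.2f} exceeds tolerance"
```

A check of this kind covers only one narrow definition of fairness. Its value in a QA setting is that it fails loudly and reproducibly, which is what allows a principle to be evidenced rather than asserted.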
Testing at scale
The scale at which AI systems were operating introduced additional complexity for testing teams. The bank had been processing more than 20 million payments per day, using AI models to detect anomalies and generate tens of thousands of alerts for customers.
These systems operated in environments where transaction volumes were high and threat patterns were constantly evolving. Testing had to reflect that reality.
High-frequency transaction environments required systems to be validated under sustained load and real-time conditions.
Dynamic threat patterns, including increasingly sophisticated AI-driven scams, required testing scenarios to evolve continuously.

Continuous learning models introduced the risk of behavioural drift, meaning outputs could change over time without explicit code changes.
Operational drift had to be monitored to ensure models remained aligned with their intended purpose and regulatory expectations.
In this context, QA became less about pre-release assurance and more about ongoing system surveillance.
The bank’s fraud and cyber security use cases highlighted another emerging challenge. AI was being used both to defend against attacks and, increasingly, by attackers themselves.
Models were deployed to detect unusual transaction patterns, identify phishing domains and support scam prevention initiatives. At the same time, the bank acknowledged that malicious actors were using AI to scale and refine their attacks.
This dynamic required testing approaches that could simulate adversarial conditions. Systems had to be evaluated not just for functional performance, but for resilience under attack.
Testing scenarios increasingly included simulated phishing campaigns, synthetic fraud behaviours and attempts to exploit model vulnerabilities.
This reflected a broader industry shift towards threat-led testing, where systems were assessed based on their ability to withstand real-world attack patterns.
Engineering workflows
Beyond testing practices, the report pointed to a deeper structural change. Governance was no longer being applied after the fact. It was being embedded directly into development processes.
The bank introduced toolkits and frameworks to guide developers through responsible AI practices, including data quality assessment, fairness evaluation and explainability requirements.
Pre-screening processes were used to identify higher-risk use cases early, ensuring that additional oversight could be applied before development progressed.
Documentation requirements were strengthened to support transparency and auditability, enabling teams to trace how models had been built and how decisions were made. Approval processes ensured that models could not be deployed without appropriate oversight.

This approach effectively turned governance into an operational discipline, tightly coupled with engineering and testing workflows.
Industry observers framed the move as part of a wider shift in how AI maturity was being defined across banking.
Melbourne-based Dr. Daghan Lemi Acay said the industry had been moving away from a focus on speed of deployment towards a more balanced view of risk and control. “Innovation in banking is racing forward, but who is setting the rules for the road?”
He argued that the competitive landscape had changed fundamentally.
“For years, AI in the sector felt like a race to see who could deploy the fastest feature. But a new era is emerging where trust, not just speed, is the ultimate competitive advantage.”
That shift placed governance and testing at the centre of AI strategy. “AI maturity is no longer measured by how many models you have in production, but by how effectively you govern them,” Lemi Acay said.
He added that long-term leadership in the sector would depend on control rather than complexity.
“True AI leadership in banking is built on the strength of your governance, not just the sophistication of your code,” Lemi Acay concluded.