Agentic artificial intelligence is moving quickly from concept to deployment across enterprise environments, but for banks and financial institutions, the shift is exposing a new layer of testing and governance complexity.
In banking, the technology is advancing from experimental pilots into core workflows, where the tolerance for failure is close to zero.
Sukrit Kalia, an expert in artificial intelligence and machine learning and a former AI/ML specialist at McKinsey & Company, puts it this way: “Agentic artificial intelligence represents a fundamental shift from assistive AI toward autonomous digital actors capable of planning, reasoning, and executing complex enterprise tasks.”
For QA and software testing teams, that shift is not just technical, but structural. Traditional testing frameworks built around deterministic systems are being stretched by agents that can interpret goals, adapt to changing environments and execute multi-step workflows across core banking infrastructure.

“Unlike traditional automation or generative AI tools, agentic systems operate with multi-step reasoning capabilities, dynamic decision-making, tool and API integration, inter-agent collaboration, and continuous environmental adaptation,” Kalia wrote in a recent analysis.
In banking environments, where systems interact with payments infrastructure, customer data and regulatory controls, this autonomy introduces a fundamentally different risk profile.
“Agents may access sensitive data, initiate transactions, or influence operational outcomes without continuous human supervision,” he noted.
That is forcing a rethink of QA itself. Instead of validating outputs alone, testing teams must now assess how systems behave over time, across workflows and under stress.
Kalia framed this shift clearly: “Risk management must therefore focus not only on model accuracy but also on behavioral control.”
Testing decisions, not merely systems
The implications for financial services QA teams are significant. Autonomous agents blur the line between software functionality and operational decision-making, meaning failures are no longer confined to bugs, but can cascade across systems.
Kalia highlighted that “autonomy fundamentally changes risk exposure,” pointing to risks such as “autonomous planning errors cascading across workflows,” “incorrect tool or API usage,” and “emergent system behavior.”
In highly regulated banking environments, those risks translate directly into operational resilience concerns. A misfiring agent could trigger unauthorised transactions, misuse privileged access or propagate incorrect decisions across interconnected systems.
This is why QA frameworks are being redefined around behaviour. “Traditional AI testing focuses on outputs; agentic QA evaluates behavior,” Kalia explained.
India-based Kalia outlined four key dimensions that testing teams must now consider: execution (task completion accuracy), compliance (adherence to policies and permissions), integration (correct system interaction) and resilience (safe recovery from failures).
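To make the four dimensions concrete, here is a minimal Python sketch of scoring a single recorded agent run against each of them. The `Step` and `Run` structures, the tool names and the pass/fail criteria are hypothetical illustrations. They are not Kalia's framework or any bank's test harness, and a real harness would pull these fields from agent execution traces.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One action in an agent's trace (hypothetical schema)."""
    tool: str                # tool or API the agent invoked
    allowed: bool            # was the call within the agent's granted permissions?
    ok: bool                 # did the target system accept and complete the call?
    recovered: bool = True   # if the call failed, did the agent recover safely?


@dataclass
class Run:
    """A full agent run: whether the goal was met, plus every step taken."""
    goal_met: bool
    steps: list = field(default_factory=list)


def evaluate(run: Run) -> dict:
    """Pass/fail per dimension; the criteria are illustrative, not a standard."""
    return {
        "execution": run.goal_met,                                   # task completed?
        "compliance": all(s.allowed for s in run.steps),             # stayed within permissions?
        "integration": all(s.ok for s in run.steps),                 # every system call succeeded?
        "resilience": all(s.recovered for s in run.steps if not s.ok),  # failures handled safely?
    }


# Usage: the payment call failed once, but the agent recovered safely.
run = Run(
    goal_met=True,
    steps=[
        Step("get_balance", allowed=True, ok=True),
        Step("post_payment", allowed=True, ok=False, recovered=True),
    ],
)
print(evaluate(run))
# {'execution': True, 'compliance': True, 'integration': False, 'resilience': True}
```

Scoring each dimension separately, rather than producing one overall grade, shows a tester which kind of behaviour failed.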
For banks, this aligns closely with regulatory expectations around observability, auditability and control. Testing is no longer a pre-deployment checkpoint, but an ongoing process embedded across the lifecycle.
“Lifecycle oversight must detect performance drift, anomalous behavior, and emerging risks,” Kalia stressed, reinforcing the need for continuous monitoring and real-time validation.
This is particularly relevant as banks adopt AI in areas such as fraud detection, customer operations and internal automation, where silent failures or gradual drift can go undetected without robust QA instrumentation.
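A simple way to instrument for gradual drift is to compare an agent's recent task success rate against a baseline measured during testing. The sketch below works under stated assumptions: the `DriftMonitor` class, its window size and its tolerance are hypothetical. Production monitoring would typically use richer statistics and more signals than a single success flag.

```python
from collections import deque


class DriftMonitor:
    """Flags drift when the recent success rate falls below a baseline.

    A deliberately simple sketch: the window size and tolerance are
    assumptions a team would tune, not recommended values.
    """

    def __init__(self, baseline_rate: float, window: int = 200, tolerance: float = 0.05):
        self.baseline = baseline_rate
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # keeps only the most recent outcomes

    def record(self, success: bool) -> bool:
        """Record one outcome; return True if drift should raise an alert."""
        self.outcomes.append(success)
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough evidence yet to judge
        rate = sum(self.outcomes) / len(self.outcomes)
        return self.baseline - rate > self.tolerance


# Usage: 100 good outcomes, then a run of 20 silent failures.
monitor = DriftMonitor(baseline_rate=0.95, window=100)
print(any(monitor.record(True) for _ in range(100)))    # False
print(any([monitor.record(False) for _ in range(20)]))  # True
```

The rolling window makes the monitor sensitive to recent behaviour, which is what catches a slow decline that no single failed request would reveal.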
Governance enters testing
The rise of agentic AI is also collapsing the traditional divide between governance and testing. Controls such as access management, audit logging and human oversight are no longer purely policy concerns, but must be actively validated through testing frameworks.

Kalia is convinced that “governance models must therefore evolve from model governance to autonomy governance,” requiring structured oversight across development, testing and production environments.
For QA teams, this means validating not just system performance, but governance controls themselves.
“Agents should operate under least-privilege access, secure authentication, activity logging, and constrained execution environments,” he explained.
Equally, accountability must be testable. “Each agent must have designated business and technical owners. Humans retain ultimate responsibility and must be able to supervise, intervene, or override decisions.”
This introduces new testing scenarios, from validating kill-switch mechanisms to simulating edge cases where human intervention is required.
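A kill-switch test of this kind might look like the minimal Python sketch below. It assumes a hypothetical `ToolGateway` that routes every agent tool call through a single checkpoint. The gateway enforces an explicit allow-list (least privilege), records each attempt in an audit log and honours a human override. None of these names come from the source or from a specific agent framework.

```python
class KillSwitchEngaged(Exception):
    """Raised when a human operator has halted the agent."""


class PermissionDenied(Exception):
    """Raised when an agent calls a tool outside its allow-list."""


class ToolGateway:
    """Hypothetical choke point that every agent tool call passes through."""

    def __init__(self, agent_id: str, allowed_tools: set):
        self.agent_id = agent_id
        self.allowed_tools = frozenset(allowed_tools)  # least privilege: explicit allow-list
        self.killed = False
        self.audit_log = []  # (agent_id, tool, outcome) tuples for later investigation

    def kill(self) -> None:
        """Human override: refuse every subsequent call from this agent."""
        self.killed = True

    def call(self, tool: str, fn, *args):
        if self.killed:
            self.audit_log.append((self.agent_id, tool, "blocked:kill_switch"))
            raise KillSwitchEngaged(tool)
        if tool not in self.allowed_tools:
            self.audit_log.append((self.agent_id, tool, "blocked:permission"))
            raise PermissionDenied(tool)
        result = fn(*args)
        self.audit_log.append((self.agent_id, tool, "ok"))
        return result


def test_kill_switch_stops_payment():
    gw = ToolGateway("payments-agent", {"get_balance", "post_payment"})
    assert gw.call("get_balance", lambda acct: 100, "ACC-1") == 100
    gw.kill()  # simulate a human intervening mid-workflow
    try:
        gw.call("post_payment", lambda amount: "sent", 50)
    except KillSwitchEngaged:
        pass
    else:
        raise AssertionError("payment executed after kill switch was engaged")
    assert gw.audit_log[-1] == ("payments-agent", "post_payment", "blocked:kill_switch")


def test_least_privilege_blocks_unlisted_tool():
    gw = ToolGateway("support-agent", {"get_balance"})
    try:
        gw.call("post_payment", lambda amount: "sent", 50)
    except PermissionDenied:
        pass
    else:
        raise AssertionError("agent used a tool outside its allow-list")
    assert gw.audit_log == [("support-agent", "post_payment", "blocked:permission")]


test_kill_switch_stops_payment()
test_least_privilege_blocks_unlisted_tool()
```

Both tests check the audit trail as well as the blocked call. An override that works but leaves no record would still fail the accountability requirement.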
In practice, banks are being pushed toward more advanced testing techniques. Kalia points to approaches such as “reasoning trace analysis,” “multi-agent red teaming,” and “high-fidelity sandbox testing” as necessary to validate complex agent behaviour.
These approaches mirror threat-led testing and resilience frameworks already familiar to financial institutions, but extend them into AI-driven decision systems.
Deployment models are also shifting. Instead of traditional releases, agentic systems require progressive rollout and continuous observability.
“Agent deployment should follow progressive rollout strategies,” Kalia noted, including “canary releases to controlled user groups” and “restricted operational scope during early deployment.”
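A canary release with a restricted early scope could be sketched as follows. The sketch assigns users to the canary group by hashing their ID, so each user stays in the same group across sessions. Early-stage canary users can only reach read-only agent tools. The scope names and the `route` helper are hypothetical.

```python
import hashlib

# Hypothetical tool scopes: early canary users get read-only capabilities.
EARLY_SCOPE = frozenset({"get_balance", "explain_transaction"})
FULL_SCOPE = EARLY_SCOPE | {"post_payment"}


def in_canary(user_id: str, percent: float) -> bool:
    """Deterministically place a user in the canary group.

    Hashing the ID (rather than random sampling) keeps each user in the same
    group across sessions, so their experience and the test data stay stable.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < percent * 100  # e.g. percent=5 -> buckets 0..499


def route(user_id: str, percent: float, stage: str):
    """Return which path serves the user and which tools the agent may use."""
    if not in_canary(user_id, percent):
        return "legacy", frozenset()
    scope = EARLY_SCOPE if stage == "early" else FULL_SCOPE
    return "agent", scope


users = [f"customer-{i}" for i in range(10_000)]
share = sum(in_canary(u, 5) for u in users) / len(users)
print(f"canary share: {share:.1%}")  # close to 5%; exact value depends on the hash
```

Widening the rollout then means raising `percent` or moving `stage` from "early" to a later value. Because bucketing is deterministic, users already in the canary stay in it as the percentage grows.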
Real-time validation
For QA teams, this reinforces the move toward live-environment validation. “Real-time telemetry capturing decisions and actions” and “automated alerts triggering human intervention” become core testing artefacts, not just operational tools, Kalia said.
This is particularly critical in financial services, where high-risk actions such as payments, data changes and access controls must be monitored continuously.
“Continuous monitoring must prioritize high-risk actions such as financial operations, data modification, and privileged access,” he wrote.
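In principle, prioritising high-risk actions can be as simple as routing telemetry by category. The minimal sketch below assumes each agent decision or action is emitted as an event tagged with a category. The category labels mirror the three Kalia lists, while the event schema and the `triage` function are hypothetical. High-risk events become alerts for human review, and everything else is logged as routine.

```python
from dataclasses import dataclass

# The three high-risk categories named in the guidance; the labels are illustrative.
HIGH_RISK = frozenset({"financial_operation", "data_modification", "privileged_access"})


@dataclass(frozen=True)
class AgentEvent:
    """One telemetry record of a decision or action taken by an agent."""
    agent_id: str
    category: str
    detail: str


def triage(events):
    """Split a telemetry stream into alerts for human review and routine logs."""
    alerts, routine = [], []
    for event in events:
        (alerts if event.category in HIGH_RISK else routine).append(event)
    return alerts, routine


stream = [
    AgentEvent("ops-agent", "read_only", "fetched statement for ACC-1"),
    AgentEvent("ops-agent", "financial_operation", "initiated transfer of 5,000 GBP"),
    AgentEvent("ops-agent", "privileged_access", "requested admin role"),
]
alerts, routine = triage(stream)
for event in alerts:
    print(f"ALERT [{event.category}] {event.agent_id}: {event.detail}")
```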
Post-deployment validation also becomes central. “Post-deployment validation is essential to detect performance drift and silent failures,” Kalia added.
In effect, testing is no longer a phase, but a permanent layer of control embedded in the system.
Ultimately, the success of agentic AI in banking will depend not just on technical performance, but on trust. That trust must be built through transparency, accountability and robust QA processes.
Kalia emphasised that “users must be informed when interacting with AI agents” and that “systems should maintain traceable logs supporting audit and investigation.”
For QA teams, this introduces a new responsibility: ensuring that systems are not only functional, but explainable and auditable under regulatory scrutiny.
At the same time, human oversight remains central. “Trust in agentic AI depends on transparency, education, and shared responsibility between humans and machines,” Kalia argued.
For banks, this reinforces a broader shift already underway. As AI systems become more autonomous, QA and software testing teams are moving from validating code to safeguarding outcomes.