Visual testing with AI: why banks must rethink their strategies

Stefan Dirnstorfer, the CTO of Testup

In the complex world of banking and financial services, visual accuracy in software interfaces isn’t just an aesthetic concern; it’s critical.

Whether it’s ensuring regulatory information is presented correctly or validating that transactional elements behave consistently across screens and platforms, the importance of reliable visual testing continues to grow.

But Stefan Dirnstorfer, the chief technology officer of Munich, Germany-based Testup, warns that current AI tools are far from perfect when it comes to spotting the differences that matter.

Unlike back-end logic checks, visual testing targets what the user actually sees. In financial systems, where even a slight misalignment or a missing confirmation message can lead to compliance breaches or customer confusion, visual accuracy is essential.

Yet, detecting visual regressions, especially subtle shifts or layout changes, remains a persistent challenge.

Dirnstorfer noted: “Spotting differences between two images is an important task in visual test automation when a screenshot needs to be compared to a previous version or a reference design.”

However, he explained, pixel-based comparison algorithms can be overly sensitive, flagging false positives from even minor misalignments, while generative AI models often overlook layout issues altogether.
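The sensitivity of pixel-based comparison is easy to reproduce. The following is a minimal sketch, not Testup's tooling: the striped test image and the helper names `make_image` and `pixel_diff_ratio` are assumptions made purely for illustration.

```python
def make_image(width, height, shift=0):
    """Synthetic grayscale image (hypothetical test data):
    vertical stripes 4 px wide, moved right by `shift` pixels."""
    return [[255 if ((x - shift) // 4) % 2 == 0 else 0 for x in range(width)]
            for _ in range(height)]


def pixel_diff_ratio(a, b):
    """Fraction of pixels whose values differ between two same-sized images."""
    total = len(a) * len(a[0])
    changed = sum(1 for row_a, row_b in zip(a, b)
                  for pa, pb in zip(row_a, row_b) if pa != pb)
    return changed / total


reference = make_image(64, 16)
shifted = make_image(64, 16, shift=2)  # same content, displaced by 2 px

ratio = pixel_diff_ratio(reference, shifted)
print(f"{ratio:.0%} of pixels differ after a 2-pixel shift")  # 50%
```

Nothing about the content changed, yet a naive pixel diff reports that half the image is different.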

This mismatch between what AI can process and what matters in high-stakes environments like banking exposes a fundamental gap in current testing frameworks.


“Pixel-based algorithms report major differences when only a minor displacement by a few pixels occurred.”

– Stefan Dirnstorfer

Dirnstorfer offered the example of two nearly identical map screenshots that were only a few pixels off in alignment, with one small but significant difference: a missing street.

All leading generative AI models failed to detect the change. “Claude and Gemini fail directly and answer that there is no substantial change,” he observed.

Even ChatGPT-4, after analysing the images using Python code and multiple libraries, concluded “nothing substantive has changed.”

This inability to identify visual regressions accurately, especially in maps, charts, and irregular layouts, is a serious limitation in fintech QA, where dashboards, customer portals, and trading interfaces often involve dense visual data.

To combat these shortcomings, Dirnstorfer and his team turned to convolutional neural networks (CNNs), which compare small image segments instead of individual pixels. This allows for some tolerance to displacement while reducing the risk of false positives.

A 9×9 pixel region, he explained, “is large enough to consider minor displacements when determining equality and small enough for a lightweight neural network.”
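The idea of comparing 9×9 regions with some tolerance to displacement can be illustrated without a neural network. The sketch below is a hand-written stand-in, not the CNN Dirnstorfer describes: it treats a reference patch as unchanged if it reappears within a couple of pixels in the new screenshot. The `max_shift` and `threshold` values, the helper names and the synthetic "screen" are all assumptions chosen for illustration.

```python
PATCH = 9  # region size quoted in the article


def patch(img, top, left, size=PATCH):
    """Cut a size x size region out of a 2D list of grayscale values."""
    return [row[left:left + size] for row in img[top:top + size]]


def mean_abs_diff(p, q):
    """Average absolute pixel difference between two equally sized patches."""
    n = len(p) * len(p[0])
    return sum(abs(a - b) for rp, rq in zip(p, q) for a, b in zip(rp, rq)) / n


def patch_matches(ref, cand, top, left, max_shift=2, threshold=10.0):
    """True if the reference patch at (top, left) reappears in `cand`
    within +/- max_shift pixels (both values are illustrative assumptions)."""
    h, w = len(cand), len(cand[0])
    target = patch(ref, top, left)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            t, l = top + dy, left + dx
            if 0 <= t <= h - PATCH and 0 <= l <= w - PATCH:
                if mean_abs_diff(target, patch(cand, t, l)) <= threshold:
                    return True
    return False


def changed_patches(ref, cand, **kwargs):
    """Top-left corners of reference patches with no nearby match in `cand`."""
    h, w = len(ref), len(ref[0])
    return [(top, left)
            for top in range(0, h - PATCH + 1, PATCH)
            for left in range(0, w - PATCH + 1, PATCH)
            if not patch_matches(ref, cand, top, left, **kwargs)]


def draw_screen(size=36, dx=0, with_street=True):
    """Hypothetical screenshot: a filled 'button' and a thin 'street' line."""
    img = [[0] * size for _ in range(size)]
    for y in range(5, 12):
        for x in range(6 + dx, 20 + dx):
            img[y][x] = 200
    if with_street:
        for y in range(18, 34):
            img[y][22 + dx] = 255
    return img


reference = draw_screen()
shifted = draw_screen(dx=2)                      # everything moved 2 px right
missing = draw_screen(dx=2, with_street=False)   # moved, and the street is gone

print(changed_patches(reference, shifted))  # []
print(changed_patches(reference, missing))  # [(18, 18), (27, 18)]
```

The 2-pixel displacement alone produces no findings, while the missing line is flagged in the two patches it used to cross. A trained network would replace the fixed threshold with a learned notion of equality.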

This method is promising for high-frequency financial applications, where screens constantly shift and renderings vary by device. Still, larger distortions, such as entire UI elements moving or resizing, remain a challenge. Scaling CNNs to detect these requires complex and computationally expensive algorithms.


“Knowing that a button ‘only’ moved has different implications than if it moved and changed.”

– Stefan Dirnstorfer

Dirnstorfer outlined a more advanced solution: combining CNNs with multi-scale image processing. By training AI to estimate displacement vectors and recursively refining comparisons on progressively smaller image regions, the system starts to behave more like the human visual cortex.

“This suggested solution helps to find relevant differences even after the layout changed significantly,” he explained. But even this system has its limits: it still cannot fully explain what changed or why, nor can it track objects that swap positions.
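The coarse-to-fine principle behind multi-scale processing can be shown in a heavily simplified form. The sketch below is an assumption-laden illustration, not the system described above: it uses no neural network and estimates only a single global displacement vector, first on a downsampled image and then refined at each finer scale. All function names and the synthetic test image are hypothetical.

```python
def downsample(img):
    """Halve the resolution by averaging each 2x2 block."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4
             for x in range(w)] for y in range(h)]


def shift_error(ref, cand, dy, dx):
    """Mean absolute difference between ref[y][x] and cand[y+dy][x+dx],
    measured over the region where both are defined."""
    h, w = len(ref), len(ref[0])
    total, count = 0.0, 0
    for y in range(max(0, -dy), min(h, h - dy)):
        for x in range(max(0, -dx), min(w, w - dx)):
            total += abs(ref[y][x] - cand[y + dy][x + dx])
            count += 1
    return total / count if count else float("inf")


def estimate_shift(ref, cand, levels=3):
    """Coarse-to-fine estimate of (dy, dx) with cand[y+dy][x+dx] ~ ref[y][x].
    Each level only searches +/-1 pixel around the doubled coarser estimate."""
    pyramid = [(ref, cand)]
    for _ in range(levels - 1):
        r, c = pyramid[-1]
        pyramid.append((downsample(r), downsample(c)))
    dy = dx = 0
    for r, c in reversed(pyramid):  # coarsest scale first
        dy, dx = 2 * dy, 2 * dx     # carry the estimate to the finer scale
        candidates = [(dy + a, dx + b) for a in (-1, 0, 1) for b in (-1, 0, 1)]
        dy, dx = min(candidates, key=lambda s: shift_error(r, c, *s))
    return dy, dx


def draw_block(size=32, top=8, left=8):
    """Hypothetical screenshot containing one filled UI element."""
    img = [[0] * size for _ in range(size)]
    for y in range(top, top + 8):
        for x in range(left, left + 12):
            img[y][x] = 200
    return img


reference = draw_block()
moved = draw_block(top=12, left=4)  # element moved 4 px down, 4 px left

print(estimate_shift(reference, moved))  # (4, -4)
```

Once a displacement is known, the two images can be compared after compensating for it, which is what lets a tool distinguish an element that "only" moved from one that moved and changed. Estimating separate vectors for progressively smaller regions, as the article describes, would extend this single-vector sketch.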

In financial QA, such precision is often the difference between a passed and a failed release. While these hybrid solutions can assist testers by narrowing down regions of interest, human oversight remains indispensable.

Implications for banks

As banking interfaces grow more dynamic, with real-time dashboards and responsive UI designs, the need for robust visual testing tools will only increase. The tolerance for regression errors, particularly those that affect legal disclosures, pricing, or transaction visibility, is near zero.

Dirnstorfer concluded that while generative AI has made great strides in areas like code generation and chat-based workflows, it “totally fails” in domains requiring nuanced spatial analysis.

For now, the visual QA process in financial services must combine smart automation with human judgement, especially when the consequences of missing a small visual cue could be costly.



