Financial-services firms are increasingly turning to synthetic data to solve long-standing problems in software testing, privacy protection and risk-system validation.
As fraud detection, AML monitoring, payments infrastructure and insurance-claims processing become more complex, access to real production data has become more restricted.
Synthetic data, generated through machine-learning models, is emerging as a viable substitute, allowing QA and engineering teams to test realistically at scale without exposing sensitive information.

One of the clearest demonstrations of synthetic data’s impact comes from ING Belgium, where test teams adopted machine-learning-generated payment data to validate their SEPA systems.
Wim Blommaert, Head of Test Data Management at ING Belgium, part of the Dutch banking multinational ING Group, describes how the shift removed one of the bank’s biggest bottlenecks.
He noted that “generating as many as 10K synthetic SEPA payments takes only 2 minutes,” a level of speed and volume impossible with manually created or masked data.
Blommaert recalls showing a live audience two payment screens side by side and asking them to identify the real transaction. “The vote was 50/50,” he said, “but both were synthetic.”
For him, it was a moment that captured “how mature this technology has become.” By using synthetic data, the bank achieved what Blommaert calls “100x the test coverage in 1/10th the time,” while staying within strict privacy and data-handling rules.
Not ‘fake data’
While QA teams have relied on artificial or dummy data for decades, ML-generated synthetic data represents a fundamental shift in capability.
Neha Patki, VP of Product and co-founder of DataCebo, draws a clear line between the two worlds. “Creating fake data is an old concept … but machine learning is a whole new ballgame,” she says.
Patki argues that synthetic data's power lies in its ability to learn real-data behaviour (correlations, distributions, and structural constraints) and reproduce it without copying any underlying personal or sensitive information.
This allows teams to work with datasets that behave like production, even when dealing with heavily regulated or privacy-sensitive domains such as payments, customer profiles, claims records or fraud alerts.
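The core idea can be illustrated with a minimal, stdlib-only sketch: fit a simple statistical model to "real" data, then sample brand-new records from the learned parameters rather than from the data itself. This is an illustrative toy (a lognormal fit to hypothetical payment amounts), not DataCebo's actual method, which uses far richer machine-learning models.

```python
import math
import random
import statistics

random.seed(7)

# Stand-in for "real" production data (hypothetical): payment amounts in EUR.
# Transaction amounts are often roughly lognormal, so we simulate some here.
real_amounts = [round(random.lognormvariate(4.0, 0.8), 2) for _ in range(5000)]

# "Training": learn the distribution's parameters from the real data.
logs = [math.log(a) for a in real_amounts]
mu = statistics.fmean(logs)
sigma = statistics.stdev(logs)

# "Generation": sample entirely new synthetic amounts from the learned model.
synthetic_amounts = [round(random.lognormvariate(mu, sigma), 2)
                     for _ in range(5000)]

# The synthetic data mirrors the real data's statistics without replaying
# any specific customer's transaction.
real_median = statistics.median(real_amounts)
synth_median = statistics.median(synthetic_amounts)
```

Real generators additionally learn cross-column correlations and structural constraints (for example, that a SEPA settlement date never precedes the booking date), which is what makes the output usable for end-to-end testing.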
Financial institutions are increasingly using synthetic data not only to test core payment flows, but also for risk-model calibration and validation.
Synthetic data has enabled banks to generate massive AML datasets, containing millions of artificial transactions and thousands of alerts, so that detection models can be evaluated under realistic but privacy-safe conditions.
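The value of such datasets is that the alert rate is controllable: teams can dial in a realistic class imbalance at any volume. A minimal sketch, with hypothetical field names and an assumed alert rate of roughly 2 per 1,000 transactions:

```python
import random

ALERT_RATE = 0.002  # hypothetical: ~2 alert-worthy transactions per 1,000

def synthetic_transaction(rng):
    """One artificial transaction; 'suspicious' rows get outlier traits."""
    suspicious = rng.random() < ALERT_RATE
    # Suspicious transactions drawn from a higher-amount distribution.
    amount = rng.lognormvariate(9.0, 1.2) if suspicious else rng.lognormvariate(5.0, 1.0)
    return {
        "amount_eur": round(amount, 2),
        "cross_border": suspicious or rng.random() < 0.1,
        "alert": suspicious,
    }

rng = random.Random(3)
dataset = [synthetic_transaction(rng) for _ in range(100_000)]
alerts = sum(t["alert"] for t in dataset)
```

Because no real customer appears in the output, the full dataset can be shared freely across environments, vendors and model-validation teams.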
Insurers have taken a similar approach, using synthetic data to address the scarcity of fraud examples in homeowner-insurance claims.
By augmenting model-training datasets with synthetic records that follow the same statistical patterns as genuine claims, insurers have been able to improve fraud-detection accuracy while avoiding the risk of exposing policyholder information.
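One common augmentation technique for scarce fraud examples is SMOTE-style interpolation: new minority-class records are created between random pairs of existing ones, so they follow the same statistical patterns without duplicating any single claim. A minimal sketch with hypothetical claim features (this names the general technique, not any specific insurer's pipeline):

```python
import random

random.seed(1)

# Hypothetical claim features: (claim_amount_eur, days_to_file, prior_claims).
genuine = [(random.uniform(500, 5000), random.uniform(1, 30), random.randint(0, 2))
           for _ in range(950)]
fraud = [(random.uniform(4000, 20000), random.uniform(0, 3), random.randint(2, 6))
         for _ in range(50)]  # fraud examples are scarce

def synthesize(minority, n):
    """SMOTE-style augmentation: interpolate between random pairs of
    minority-class records to create new, statistically similar ones."""
    out = []
    for _ in range(n):
        a, b = random.sample(minority, 2)
        t = random.random()
        out.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return out

# Balance the training set: 950 genuine vs 50 real + 900 synthetic fraud.
synthetic_fraud = synthesize(fraud, 900)
training_set = [(f, 0) for f in genuine] + [(f, 1) for f in fraud + synthetic_fraud]
```

The interpolated records stay within the range of the observed fraud cases, so a detection model trained on the balanced set sees many more fraud-like patterns without any policyholder's actual claim being exposed.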
These examples all point toward the same shift: synthetic data is becoming a central ingredient in validating the next generation of automated detection systems, especially where real data is limited, imbalanced or sensitive.
A new discipline for QA teams
The rise of synthetic data brings new responsibilities for QA and engineering teams. Synthetic datasets must be evaluated for fidelity to real-world patterns, particularly in financial systems where correlations and edge cases matter.
As Patki put it, synthetic data "build[s] on previous work done … but machine learning is a whole new ballgame," which means testing teams must adopt new forms of validation beyond traditional functional checks.
Blommaert’s experience at ING also highlights the operational side of adoption. Synthetic data removed the dependency on manually curated or masked datasets, enabling full-system testing earlier in the lifecycle and at a scale not previously possible.
But it also required new governance to ensure that the generated data continued to reflect real-world business logic and constraints.
In summary, synthetic data is quietly becoming foundational to how banks and insurers test critical systems. It allows teams to generate high-volume, statistically realistic datasets for payments, fraud detection, AML alerts and claims workflows, without breaching privacy or regulatory requirements.
The experiences shared by Blommaert and Patki show how synthetic data can expand test coverage, accelerate integration cycles, reduce dependence on production data and unlock new capabilities for model testing.
For QA and software-testing teams across financial services, the message is clear: synthetic data is no longer an experiment. It is becoming essential infrastructure for ensuring accuracy, stability and resilience in the systems underpinning modern banking and insurance.