Synthetic data rapidly gaining momentum in financial services QA

Synthetic data, once a niche concept, is now moving into the mainstream of banking technology. For QA teams under pressure to deliver faster releases while ensuring compliance, it offers a way to test applications and train AI systems without exposing sensitive customer records.

With regulators sharpening their focus on data governance, many financial institutions see synthetic data as a bridge between innovation and regulation. It is fast becoming a cornerstone of software testing and AI development across the heavily regulated financial services sector, helping firms meet demands for compliance, resilience, and speed.

“Synthetic data is created algorithmically rather than collected from actual events, allowing companies to develop and test AI models while protecting privacy,” explained Poland-based Kacper Rafalski of Netguru.

“These computer-generated datasets can be tailored to specific needs: larger, smaller, or more diverse than the original data.”

For QA teams under pressure from frameworks such as GDPR, the EU AI Act, and DORA, this privacy-preserving quality is particularly appealing. “Synthetic data enables AI development while preserving privacy by algorithmically generating information that resembles real data without exposing sensitive details,” Rafalski noted.

“The technology helps companies comply with data regulations while still allowing them to share information and develop competitive AI applications.”

Testing without exposing production data

Financial institutions often struggle to test applications safely without breaching confidentiality obligations. Synthetic data provides a practical answer. “Software developers use synthetic data to thoroughly test applications before deployment,” Rafalski explained. “This practice ensures systems function properly under various conditions without risking real user information.”

The ability to scale is also critical in financial QA. According to Rafalski: “Organizations can customize synthetic datasets to address specific needs like increasing diversity, removing bias, or expanding limited training data.”

Fraud detection and risk-management teams are also turning to synthetic data to compensate for scarce or incomplete datasets.

“Organisations use synthetic data to train fraud detection systems and develop new detection methods,” Rafalski said. “These artificial datasets mimic patterns found in real financial transactions, helping AI models learn to identify suspicious activities.”

By generating edge-case scenarios on demand, QA teams can better prepare AI systems for rare but high-impact events.
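As a loose illustration of that idea (not taken from the article), a QA team might inject rare, high-impact edge cases into an otherwise ordinary synthetic transaction stream before training or testing a detection model. The field names, amounts, and fraud rate below are hypothetical, chosen only to show the pattern:

```python
import random

random.seed(42)  # reproducible sketch

def synthetic_transactions(n, fraud_rate=0.02):
    """Generate n synthetic card transactions.

    A small share are injected edge cases that mimic an unusual
    pattern (large amount at an odd hour). These values are
    illustrative placeholders, not real fraud signatures.
    """
    rows = []
    for i in range(n):
        if random.random() < fraud_rate:
            # Edge case: unusually large amount in the small hours
            rows.append({
                "id": i,
                "amount": round(random.uniform(5_000, 20_000), 2),
                "hour": random.choice([2, 3, 4]),
                "label": "fraud",
            })
        else:
            # Typical daytime transaction
            rows.append({
                "id": i,
                "amount": round(random.uniform(5, 300), 2),
                "hour": random.randint(8, 22),
                "label": "legit",
            })
    return rows

data = synthetic_transactions(10_000)
fraud = [r for r in data if r["label"] == "fraud"]
print(f"{len(data)} transactions, {len(fraud)} injected edge cases")
```

Because no record corresponds to a real customer, a dataset like this can be shared across teams or vendors without the confidentiality restrictions Rafalski describes, and the edge-case rate can be dialled up on demand to stress-test a model far beyond what production data would ever supply.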

“Synthetic datasets can be shared freely without confidentiality restrictions that typically limit real data exchange,” Rafalski added. “This promotes collaboration and innovation across organisations and borders.”


Despite its advantages, synthetic data introduces risks that testing teams must evaluate closely. “Synthetic data often struggles with accuracy and realism issues,” Rafalski warned. “When algorithms generate artificial data points, they may create patterns that don’t truly reflect real-world scenarios.”

The source of training data is also key. “The generation process heavily depends on the quality of real data used as a foundation,” Rafalski stressed.

“If the original dataset contains flaws or biases, these issues may be amplified in the synthetic version, creating more significant problems downstream,” he said.

Bias is particularly concerning in regulated industries where fairness is scrutinised.

“Bias represents one of the most serious ethical challenges in synthetic data,” Rafalski pointed out. “If the original data contains societal biases, these will likely transfer to the synthetic version unless specifically addressed.”

For banks operating across multiple jurisdictions, synthetic data can also simplify cross-border testing. “Synthetic data simplifies cross-border data sharing by removing personally identifiable information from the equation,” Rafalski said.

“Since no real personal data is involved, many of the restrictions on international data transfers may not apply.”

Rafalski emphasised that adoption is accelerating. “Market forecasts show expansion from $381.3 million in 2022 to $2.1 billion by 2028, reflecting its increasing importance across industries,” he said.

For QA leaders in financial services, the message is clear: synthetic data offers a compliance-safe route to more robust testing and AI development, provided its limitations are addressed with rigorous governance and validation.

