Stanford study offers critical insights for QA and software testing teams


AI agents are accelerating workplace transformation, but a new Stanford study reveals their deployment often misaligns with what workers actually want, offering important guidance for QA, testing, and engineering teams in financial services.

A team of Stanford researchers, including Diyi Yang, Assistant Professor in the Computer Science Department, has released one of the first comprehensive audits of worker attitudes toward AI agent adoption.

The study, called Future of Work with AI Agents, draws on survey data from 1,500 U.S. workers across 104 occupations and analyzes 844 tasks to uncover where AI should be deployed, and where it shouldn’t.

The report introduced the AI Agent Worker Outlook & Readiness Knowledge Bank (WORKBank), a new database that combines worker preferences with expert assessments of AI capabilities.

It offered a structured framework for understanding how automation and augmentation should evolve, particularly in complex, compliance-heavy sectors like finance.

“We address this gap by conducting a nationwide audit to understand what workers want AI agents to automate or augment, and how those desires align with the current technological capabilities,” explained Yang.

This growing disconnect between workers’ actual pain points and the focus of AI startups was bluntly summarised by David Villalón, Co-founder and CEO of Maisa, in a widely shared LinkedIn post: “41% of YC AI startups are automating tasks that workers don’t actually want automated.”

Automating the ‘wrong tasks’

One of the study’s most significant findings is the mismatch between what workers want automated and where AI investment is currently directed.

The researchers classified occupational tasks based on whether workers wanted them automated, and whether AI was capable of doing so.

However, much of the current AI investment, particularly from high-profile startup incubators like Y Combinator, is not aligned with the tasks workers most want help with.

Instead, 41% of Y Combinator-funded AI companies are developing tools for tasks that workers say they want to keep.


According to Yang: “The desire-capability landscape of AI agents at work reveals critical mismatches of current AI agent research and investment. 41.0% of Y Combinator company-task mappings are concentrated in the Low Priority Zone and Automation ‘Red Light’ Zone.”
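The study's framework can be pictured as a two-by-two grid: each task is scored on how much workers want it automated (desire) and how well current AI can perform it (capability), and the resulting pair places the task in a zone. The sketch below is illustrative only; the scores, thresholds, and example tasks are invented, and the zone names follow the study's framing as reported here.

```python
# Illustrative sketch of the desire-capability zones described in the study.
# Scores and the 0.5 threshold are hypothetical, not the study's actual scoring.

def classify_task(desire: float, capability: float, threshold: float = 0.5) -> str:
    """Map a task's worker desire and AI capability (both 0-1) to a zone."""
    if desire >= threshold and capability >= threshold:
        return "Automation 'Green Light' Zone"  # wanted and feasible
    if desire < threshold and capability >= threshold:
        return "Automation 'Red Light' Zone"    # feasible but unwanted
    if desire >= threshold and capability < threshold:
        return "R&D Opportunity Zone"           # wanted but not yet feasible
    return "Low Priority Zone"                  # neither wanted nor feasible

# Hypothetical example tasks with (desire, capability) scores:
tasks = {
    "expense report data entry":   (0.9, 0.8),
    "exploratory testing":         (0.2, 0.7),
    "regulatory horizon scanning": (0.8, 0.3),
}
for name, (d, c) in tasks.items():
    print(f"{name}: {classify_task(d, c)}")
```

The 41% figure quoted above refers to company-task mappings landing in the two zones on the low-desire or low-capability side of this grid.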

Villalón added further commentary on this disconnect: “New Stanford research just exposed the gap: Workers want AI for repetitive tasks that free them for higher-value work. Meanwhile, startups are trying to automate the parts of jobs people actually enjoy. The disconnect is almost comical.”

This matters especially for QA and automation leads at financial firms: testing tools that attempt to automate high-stakes or high-enjoyment tasks, such as exploratory testing, compliance oversight, or client-facing risk reviews, may not match what teams actually need or trust.

Villalón captured this misalignment humorously: “Workers: ‘Please automate my expense reports.’ AI Startups: ‘We’ve built AI to replace your creative strategy work!’ Workers: ‘Can you handle these 500 daily data entries?’ AI Startups: ‘Our AI will do your client relationships for you!’”

Rather than proposing a one-size-fits-all automation model, the Stanford framework emphasizes human-AI collaboration. To capture this nuance, the researchers developed a Human Agency Scale, ranging from H1 (no human involvement) to H5 (human involvement essential).

This new metric complements existing automation frameworks by quantifying how much workers believe they should be involved in specific tasks.

“We introduce the Human Agency Scale (H1–H5) to quantify the degree of human involvement required for completing occupational tasks and ensuring their quality,” the study explained.

Among the occupations studied, workers most commonly selected H3 (equal partnership between human and AI) as the ideal model. This supports the idea that AI should be a collaborative tool rather than a full replacement.

However, the study also found that workers and experts often disagreed on how much human involvement is really needed. In nearly half of the tasks studied, workers preferred more human involvement than AI experts believed was necessary. In more than 16% of cases, this preference gap was two full levels higher on the Human Agency Scale.
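The gap described here can be made concrete by treating the H1–H5 levels as integers and comparing worker-preferred against expert-assessed levels per task. The function and sample ratings below are a hypothetical sketch, not the study's data or code.

```python
# Illustrative sketch: measure the preference gap between worker-preferred and
# expert-assessed Human Agency Scale levels (H1-H5, stored as ints 1-5).

def agency_gaps(ratings):
    """ratings: list of (worker_level, expert_level) pairs, each 1-5.

    Returns the share of tasks where workers prefer MORE involvement than
    experts deem necessary, and the share where that gap is two-plus levels.
    """
    n = len(ratings)
    more = sum(1 for w, e in ratings if w > e)       # workers want more involvement
    wide = sum(1 for w, e in ratings if w - e >= 2)  # gap of two or more levels
    return more / n, wide / n

# Hypothetical ratings for six tasks:
sample = [(4, 3), (3, 3), (5, 3), (2, 3), (4, 4), (3, 1)]
more_share, wide_share = agency_gaps(sample)
print(f"workers prefer more involvement: {more_share:.0%}")  # 3 of 6 tasks
print(f"gap of two or more levels: {wide_share:.0%}")        # 2 of 6 tasks
```

In the study's actual data, the first share was close to half of tasks and the second exceeded 16%, as reported above.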

For QA teams, this translates into a need for tools that enhance rather than replace tester autonomy: for example, using AI for test coverage analysis or log summarisation while preserving manual ownership of high-risk areas such as fraud detection or client transaction workflows.

Villalón’s take is clear: “I see this daily. Founders so obsessed with ‘10x disruption’ they forget to ask users what’s actually painful.”


“The unsexy automation opportunities are massive. Every knowledge worker loses 10+ hours a week to mind-numbing tasks. But that’s not venture-scale sexy, right?”


The Stanford team also explored how AI is reshaping which skills are most valuable in the workplace. They used data from the U.S. Department of Labor’s O*NET database to match tasks to their underlying skills and then analysed how those skills ranked in terms of both average wage and required human agency.

Their findings suggest that the demand for information-processing skills, such as data analysis, categorizing information, or researching, may decline as AI becomes more capable of handling these tasks.

In contrast, interpersonal and organizational skills, such as coordination, team leadership, and decision-making, are more closely associated with tasks that workers believe require high levels of human involvement.

This is a key insight for engineering and QA teams in financial services, where the ability to interpret complex results, coordinate across teams, and respond to fast-changing regulatory demands often outweighs the value of raw technical speed.

As Yang and her co-authors wrote: “Human agency level and average wage analysis reveals a potential shift in valued human competencies, from information-processing skills to interpersonal skills.”

Villalón summed it up powerfully: “The biggest wins aren’t human vs. AI. They’re human + AI. But that requires admitting that workers might actually know what they need better than founders in a garage.”

The Stanford study ultimately reinforces a message that software testers and quality engineers are already beginning to understand. Successful AI implementation depends not just on what’s technically feasible but on what users value, trust, and want to work with.

QA and testing teams should prioritize the automation of repetitive, low-enjoyment tasks, such as test execution, logging, documentation, and data prep, while preserving human involvement in areas where judgment, compliance, and creativity are key. Augmentation, not replacement, should be the default.

As Yang emphasised: “AI agents are reshaping the workplace, but that reshaping must be done with the worker’s voice front and center.”

Villalón echoes this with a final challenge to the industry: “What if we built AI that workers actually asked for?”

