US lawmaker working on bill to introduce third-party AI testing

Senator John Hickenlooper
John Hickenlooper

A US senator is planning to propose new rules that would harmonise and expand testing standards and procedures for AI-powered applications and solutions.

Senator John Hickenlooper, a Democrat from the state of Colorado, is currently in the process of finalising the ‘Validation and Evaluation for Trustworthy Artificial Intelligence Act’, which primary aim is to speed up and standardise the validation for AI systems.

Third-party testing and external audits of AI applications should become standard, according to proposals in the draft bill.

The political veteran is said to consider introducing the bill when the US Senate returns from recess at the end of this summer, because Hickenlooper thinks “it’s about time” AI platforms and automated systems should be tested before being deployed.

Impact on financial services

If introduced and approved, the bill could have a significant impact on banks, financial services firms, insurance companies and other finance players as they are rolling out and integrating AI-powered tools and features at an unprecedented rate.

That is exactly why Hickenlooper decided to introduce this law now. “AI is moving faster than any of us thought it would two years ago,” he stressed.

“But we have to move just as fast to get sensible guardrails in place to develop AI responsibly before it’s too late. Otherwise, AI could bring more harm than good to our lives.”

He thinks it’s “paramount” that external auditors and third-party developers and engineers verify risk management claims of AI companies and compliance with AI guardrails.


“We have to move just as fast to get sensible AI guardrails in place.”

– Senator John Hickenlooper

Hickenlooper, who is chair of the Senate Subcommittee on Consumer Protection, Product Safety, and Data Security, said the act would direct the US National Institute of Standards and Technology (NIST) to work with federal agencies and stakeholders across industry.

Collectively, they should develop detailed specifications, guidelines, and recommendations for the certification of third-party evaluators to work with AI companies to provide robust independent external assurance and verification of their systems.

Currently, many AI companies make claims about how they train, conduct safety red-team exercises, and carry out risk management on their AI models without any external verification, Hickenlooper said.

“My bill would create a pathway for independent evaluators, with a function similar to those in the financial industry and other sectors, to work with companies as a neutral third-party to verify their development, testing, and use of AI is in compliance with established guardrails,” he argued.

As Congress moves to establish AI regulations, benchmarks to independently validate AI companies’ claims will only become more essential, Hickenlooper continued.

In a February speech at Silicon Flatirons, Hickenlooper proposed his “Trust, but Verify Framework”, establishing auditing standards for Artificial Intelligence (AI) to increase transparency and adoption of AI.

Details of the bill

Hickenlooper’s bill would direct NIST, in coordination with the US Department of Energy and National Science Foundation, to develop specifications and guidelines for developers and deployers of AI systems to conduct internal assurance and work with third parties on external assurance regarding the verification and red-teaming of AI systems.

The act would also establish a collaborative Advisory Committee to review and recommend criteria for individuals or organisations seeking to obtain certification of their ability to conduct internal or external assurance for AI systems.

Finally, the law would require NIST to conduct a study examining various aspects of the ecosystem of AI assurance, including the current capabilities and methodologies used, facilities or resources needed, and overall market demand for internal and external AI assurance.

Global efforts

Standard third-party tests are practically non-existent and there are no uniform testing models for AI yet, despite banks, insurers and other financial services firms rushing to roll out and implement AI-powered software in their digital infrastructure.

As a result, many AI solution providers have gone onto developing their own standards and testing practices, which are far from transparent or uniform, and often lack impartiality.

However, Hickenlooper’s bill is not the only initiative to introduce oversight int he crowded, fragmented AI space as the lack of global, uniform AI standard testing has not gone entirely unnoticed.

In both the U.S., UK and EU legislators are gradually starting to call for a set of standards or are even actively looking into quality assurance issues with regards to AI.

In the U.S. and Britain, for example, collaborations are underway to jointly test AI models and develop common standards.

Groups in both countries have said they are working to use the same tools, infrastructure and approach when it comes to AI testing.

Meanwhile, the European Parliament approved earlier this year what is believed the world’s most comprehensive regulatory framework for the use and rollout of artificial intelligence.

However, the Act did fail to spell out any stipulations or rules with regards to testing AI applications or monitoring the implementation of AI tech in the financial services space.

The Artificial Intelligence Act bans certain AI applications, with new obligations for high-risk AI systems, which include banking and insurance and certain systems in law enforcement.

“Such systems must maintain use logs, be transparent and accurate, and ensure human oversight,” said MEP Dragos Tudorache, who worked on the new EU legislation.

Therefore, Tudorache, who is also the EP’s Civil Liberties Committee co-rapporteur, did acknowledge “much work lies ahead that goes beyond the AI Act itself.”

He told QA Financial that: “AI will push us to rethink. The AI Act is a starting point for a new model of governance built around technology. We must now focus on putting this law into practice with testing being a major element.”

Meanwhile, in Britain, the UK AI Safety Institute developed recently a testing platform designed to measure and determine the quality and reliability of artificial intelligence-powered applications.

Called Inspect, the open-source solution is said to be the world’s first government-funded AI safety testing toolkit to be made publicly available.

Ian Hogarth

The UK’s Connect solution allows for a standardised way to assess the capabilities of individual AI models across various aspects, including technical capabilities, core knowledge, reasoning abilities and autonomous functionalities.

The toolset functions through three core components, namely datasets provide sample test scenarios, solvers execute the tests and scorers analyse the results and generate metrics.

“The platform is extensible, with a permissive MIT licence Third-party developers can create and integrate additional testing methods using Python, allowing Inspect to adapt and evolve alongside the rapidly-changing AI landscape,” AI Safety Institute chair Ian Hogarth explained.

He stressed that when it comes to testing the quality and reliability of artificial intelligence applications, and the larger language models that power them, the QA space still has a long way to go.

Hogarth added that “collaboration on AI safety testing means having a shared, accessible approach to testing and evaluations, and we hope Inspect can be a building block.”

QA firms roll out tests

As the current AI testing landscape is still so fragmented, this has prompted a number of young developers in California to set up a new startup to develop a standard test for artificial intelligence applications and the larger language models they use.

The founder said the firm, called Vals.ai, aims to create a global test for AI apps, with a specific focus on the financial services industry, corporate finance and legal services such as contract law and tax law.

CTO and founder Langston Nashold, based in Palo Alto, California, launched up the company with Red Havaei and Rayan Krishnan after the three completed Stanford’s masters program in artificial intelligence.

Nashold and his team are working to develop a comprehensive, third-party test to review large language models.

“Model benchmarks today are currently self-reported: there’s a concerning amount of dataset cherry-picking when results are shared. Moreover, the models are often inadvertently trained on evaluation sets, compromising the integrity of results,” Nashold stressed.

He continued: “To combat this, our first initiative is a public benchmark. We’ve rigorously tested 15 LLMs on four domains, ensuring that two of the datasets remain completely private to prevent any data leakage.”

GenAI testing

While Vals.ai takes a global approach and targets various sectors, including financial services, California-based Akto also spotted the need for AI application testing within the rapidly-evolving banking and insurance space.

Chief technology officer Ankush Jain told QA Financial that Akto’s new platform, GenAI Security Testing, is “the world’s first proactive generative artificial intelligence security testing platform” as he stressed the system has been designed to specifically target the security challenges many banks, insurers investors and other big finance players face.

The software developer said the new solution should better protect the security of large language models by testing for flaws and vulnerabilities in LLMs and their security layers, thereby picking up on attempts, or detected attempts, to link malicious codes for remote access, which often lead to outright hacks.

The system also focuses on cross-site scripting and other potential hacks that may allow the hackers to obtain access or information.

The primary aim is to constantly test the LLMs the financial institution uses in order to confirm whether the models are vulnerable to creating or producing incorrect or irrelevant outputs.


“New threats have emerged due to overreliance on AI outputs without proper verification.”

– Ankush Jain

Jain pointed out that the solution includes a range of features, including over 60 test cases that cover a range of elements that may confirm system vulnerabilities in the GenAI infrastructure.

Examples include overdependency on specific data sets and prompt injection of untrustworthy data.

“Our generative AI security experts have developed these test cases to ensure protection for financial firms looking to deploy generative AI models,” he explained.

“The tests try to exploit LLM vulnerabilities through different encoding methods, separators and markers,” he continued.

“This specially detects weak security practices where developers encode the input or put special markers around the input.”

Jain agreed with the Vals.ai team that there is a clear demand for a more uniform and standardised approach to AI testing in the wider financial services space as banks, insurers and other firms are rolling out AI and LLMs as never before, “driven by a desire for more efficient, automated workflows.”

However, “new threats have emerged, such as unauthorised prompt injections, denial-of-service attacks and data inaccuracies due to overreliance on AI outputs without proper verification,” he said.

“As hackers continue to find more creative ways to exploit LLMs, the need has arisen for security teams to discover a new, automated way to secure LLMs at scale,” Jain noted.

Akto is a venture-capital back startup that was launched by current CEO Milin Desai in 2022. Among its investors are Accel Partners, Notion Labs’ founder Akshay Kothari and Tenable’s founder Renaud Deraison.


UPCOMING QA FINANCIAL EVENTS

SECURE YOUR SPOT TODAY

READ MORE


Become a QA Financial subscriber – for FREE

News and interviews * Receive our weekly newsletter * Get priority invitations to our Forum events

REGISTER HERE TODAY