Leapwork co-founder warns ‘AI is not in a state we can rely on’

Claus Topholt

The vast majority of banks and financial services firms consider their current testing practices inefficient. As a result, many QA teams are racing to embrace AI and roll out large language models left, right and centre, primarily to boost efficiency, reduce costs and automate more testing processes.

While the reasons are understandable, and the technology deserves full integration into the banktech space, AI still has a long way to go, and LLMs remain far from accurate and precise, warns Claus Topholt, CPO and co-founder of Denmark-based Leapwork.

The automation and RPA platform, which broke into the U.S. market about five years ago after raising $30 million, serves clients that include BNP Paribas, PayPal and Visma.

The industry veteran stressed the importance, and potential, of improved natural language understanding to transform software requirements into actionable models.

During a recent fireside chat, he argued that this could lead to the automatic generation of relevant test cases from those requirements, allowing dynamic adaptation to change, and stressed the importance of prioritising the “what” over the “how.”

Zooming in on large language models, Topholt questioned their quality and effectiveness.

“In normal situations, the precision and accuracy and robustness of large language models to solve difficult problems decrease from 90% to 65, maybe 70%,” he said.

“That is still the state of the art today. That is also the state of the art when it comes to this agentic AI that now everyone is talking about, which is incredibly interesting, incredibly exciting,” Topholt continued.

He firmly believes AI will transform the way QA teams work. “It’s going to transform the way we think about testing.”

Topholt was nevertheless keen to point out that “the technology is a forward-facing future thing. It’s not in a state today where we can put it into our stable category and say, this is the stuff that you can rely on. In fact, a little thought experiment.”


“AI is not in a state today where we can put it into our stable category and say, this is the stuff that you can rely on.”

– Claus Topholt

He urged the industry to imagine it had the state-of-the-art model and let it do testing for us “in an agentic way.”

“We basically write down what we want and then we let the LLM go do it. And we do that ten thousand times.”

“Which is not an unreasonable amount of test runs over the course of a week or over the course of a month in a real enterprise setup,” Topholt explained.

With the current state-of-the-art models, he stressed, this would lead to thousands of random errors.

“And that’s kind of problematic because most of those errors would show up in random places and it would clog up your ability to do kind of bug hunting,” he noted.
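The arithmetic behind the thought experiment is straightforward. A rough back-of-envelope sketch, assuming the 65–90% per-run accuracy range Topholt cites:

```python
# Back-of-envelope for Topholt's thought experiment: an agentic LLM
# executes 10,000 test runs at a given per-run accuracy.
def expected_failures(runs: int, accuracy: float) -> int:
    """Expected number of runs where the LLM goes wrong."""
    return round(runs * (1 - accuracy))

runs = 10_000
for accuracy in (0.90, 0.70, 0.65):
    print(f"{accuracy:.0%} accurate -> ~{expected_failures(runs, accuracy)} bad runs")
# 90% accurate -> ~1000 bad runs
# 70% accurate -> ~3000 bad runs
# 65% accurate -> ~3500 bad runs
```

Even at the optimistic end of the range, roughly a thousand runs per month would fail for reasons unrelated to the software under test, which is the "clogged bug hunting" Topholt describes.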

“But maybe the worst part of it is that some of them would be false positives. That the large language model implementation of the test case would just basically go and say, yes, everything’s fine, but it ended up on a weird sidetrack that it wasn’t supposed to end up on.”

Topholt said that a tester would not be able to know this, because the test run would simply report that everything was fine.

“Right now where we want to be is somewhere above ninety-nine percent accuracy. So language models are expanding and they’re coming up with new features, we’re moving in that direction,” he continued.

“There’s a lot of interesting stuff happening because as soon as you get to ninety-eight percent or ninety-nine percent, you can start thinking about bridging the gaps in precision with something else,” Topholt shared.

“You can start thinking about solving it with a visual language where you have to acknowledge that what the LLM has done makes sense before you allow it to do more, but then it can repeat the stuff that you’ve already approved,” he said.

Topholt stressed, however, that “these are things that we can only do once we reach a certain level of accuracy. They are unsolved roadblocks. That’s not a reason to give up on this exciting new technology. It’s actually more of a reason to invest in it.”

Addressing his firm’s capabilities to test AI use cases, such as chatbots or similar agents, he pointed out that Leapwork has “an AI validate block in the product today.”

“We’re going to mark it more clearly as experimental, but it’s there. If AI bots always answer in exactly the same way, you can just compare the two pieces of text, like what the AI bot responded with and what you expected it to respond with,” he elaborated.

“But that’s not how they work. They elaborate on language and so on. So, what you can do with this building block is you can kind of say, do an AI compare the stuff that you got and the stuff you expected, do they mean the same thing?” he asked.
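The article does not spell out how such an “AI compare” building block is implemented. A minimal sketch of the idea, where `means_the_same` is a hypothetical stand-in (a crude word-overlap heuristic here; a real implementation would call an LLM judge or an embedding-similarity model):

```python
# Sketch of the "AI compare" idea: instead of exact string matching,
# judge whether the bot's answer and the expected answer mean the same
# thing. `means_the_same` is an illustrative stand-in, not a real API.
def means_the_same(expected: str, actual: str, threshold: float = 0.5) -> bool:
    """Crude word-overlap proxy for semantic equivalence (illustration only)."""
    a, b = set(expected.lower().split()), set(actual.lower().split())
    return len(a & b) / max(len(a | b), 1) >= threshold

expected = "your account balance is 100 euros"
actual = "the balance on your account is 100 euros"

print(expected == actual)                # False: exact match fails on a paraphrase
print(means_the_same(expected, actual))  # True: meaning-level check passes
```

The point of the pattern is exactly what Topholt describes: chatbots paraphrase, so the oracle has to compare meaning rather than characters.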

Topholt argued, however, that testers should remember that testing AI is just like testing any other type of software.

“You have input that you need answers to and it has to be robust in the way it responds,” he said.

Beyond AI

Taking the long view on what may lie beyond AI and ChatGPT in the technological landscape, Topholt said it comes down to understanding software requirements and being able to turn those requirements into a model.

“Instead of just throwing the hardest problem you had at an LLM, which can’t do logical reasoning, you can chop the problem up into various pieces and you can solve 85% of it with known algorithms, with good stuff that you can just compute,” he said.

“And then you only use the large language model for exactly what it’s good at. Then you can achieve something really special.”
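Topholt’s “chop the problem up” point can be sketched as a pipeline in which the deterministic parts of a test oracle are plain computable checks, and only the fuzzy remainder is handed to a language model. All names here are illustrative assumptions, and `llm_judge` is a hypothetical hook, not a real API:

```python
# Illustrative decomposition: solve the computable share of a test
# oracle with ordinary code, and reserve the LLM for the fuzzy part.
def check_response(resp: dict, llm_judge=None) -> list[str]:
    issues = []
    # Deterministic checks: known algorithms, "stuff you can just compute".
    if resp.get("status") != 200:
        issues.append(f"unexpected status {resp.get('status')}")
    if "balance" in resp and resp["balance"] < 0:
        issues.append("negative balance")
    # Fuzzy check: only the free-text message goes to the (hypothetical) LLM.
    text = resp.get("message", "")
    if llm_judge is not None and not llm_judge(text):
        issues.append("message failed semantic check")
    return issues

ok = {"status": 200, "balance": 100, "message": "all good"}
bad = {"status": 500, "balance": -5, "message": "oops"}
print(check_response(ok))   # [] -- passes without touching the LLM
print(check_response(bad))  # two deterministic findings, no LLM call needed
```

The design choice matches the quote: the unreliable component only ever sees the slice of the problem it is genuinely good at, so its error rate applies to a fraction of the work rather than all of it.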

He also thinks testers can expect to have “a conversation” with the AI about testing issues in the future, and to visualize this on the screen.

“I think that’s super exciting,” Topholt concluded.

