
Research Review: Reinforcement learning for LLMs


“Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation”, a paper by Benjamin Steenhoek [pictured] and researchers at Microsoft, published on arXiv on 3 October 2023, proposes Reinforcement Learning from Static Quality Metrics (RLSQM), a method that uses reinforcement learning to improve the quality of unit tests generated by large language models (LLMs).

Reinforcement learning is a type of machine learning in which a programme, referred to as an agent, learns to make decisions by performing actions in its environment and adapting its behaviour to maximise the reward it receives. The reward is designed to encourage desirable behaviour, so that over repeated interactions the agent improves its performance on the given task.
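To make the agent and reward loop concrete, here is a toy sketch in Python. It uses an epsilon-greedy multi-armed bandit, a deliberately simple setting chosen purely for illustration and not taken from the paper; the function name `run_bandit` and its parameters are hypothetical. The agent repeatedly picks an action (an "arm"), receives a reward from the environment, and updates its estimate of how rewarding each action is.

```python
import random


def run_bandit(true_rewards, steps=1000, epsilon=0.1, seed=0):
    """Toy epsilon-greedy agent (illustrative only, not from the paper).

    With probability `epsilon` the agent explores a random arm; otherwise it
    exploits the arm with the highest estimated reward so far.
    """
    rng = random.Random(seed)
    n = len(true_rewards)
    counts = [0] * n        # how often each arm was chosen
    estimates = [0.0] * n   # running mean reward per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore
        else:
            arm = max(range(n), key=lambda a: estimates[a])  # exploit
        # Environment: reward 1 with probability true_rewards[arm], else 0
        reward = 1.0 if rng.random() < true_rewards[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean for the chosen arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return estimates, counts, total


estimates, counts, total = run_bandit([0.1, 0.5, 0.9], steps=2000)
print(counts)
```

With arm reward probabilities of 0.1, 0.5 and 0.9, the agent should end up choosing the third arm most often, because exploration reveals that it yields the highest reward. RLSQM applies the same reward-driven idea to a far larger agent (an LLM), which this sketch does not attempt to reproduce.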

Large language models (LLMs) are advanced AI programmes that process and generate text, learning from vast amounts of data to understand and predict language patterns. LLMs are also able to generate code and have been employed in the automatic generation of software test scripts.

Steenhoek et al. begin by reviewing how LLMs are used to generate unit tests and showing that LLM-generated tests can contain undesirable patterns known as test smells, before proposing RLSQM. RLSQM scores generated tests using static quality metrics, which capture test smells and testing best practices, and uses these scores as the reward signal for reinforcement learning. This feedback updates the parameters of the agent (here, the LLM) so that it produces higher-quality test cases.

Steenhoek et al. report that the RL-optimised model improved test quality by up to 21% over the base LLM, and that RLSQM also outperformed GPT-4 on four of seven pre-defined quality metrics. The authors conclude that these results point to scope for further work in this field, and for reinforcement learning as a method of improving the quality of code produced by LLMs.

[Image Source: Iowa State University]