18 May 2024 | Zhenyu Mao¹, Jialong Li¹*, Dongming Jin², Munan Li³, and Kenji Tei⁴
This paper introduces a multi-role approach to enhance vulnerability detection using large language models (LLMs). The approach simulates a real-life code review process by having LLMs act in different roles, including both developer and tester, to engage in discussion and reach a consensus on the existence and classification of vulnerabilities. The method consists of three stages: initialization, discussion, and conclusion. In the initialization stage, the tester provides an initial judgment with reasoning. The discussion stage involves iterative exchanges between the tester and the developer to resolve differing opinions. The conclusion stage summarizes the discussion and outputs the final judgment. Preliminary evaluations using a C/C++ dataset show a 13.48% increase in precision, an 18.25% increase in recall, and a 16.13% increase in F1 score compared to a single-role approach. The results highlight the effectiveness of the multi-role approach, especially when the dataset contains a higher proportion of vulnerable code. Future work aims to integrate in-context learning to further improve collaboration among LLMs.
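The three-stage protocol described above can be sketched compactly. Below is a minimal Python sketch assuming a generic chat-completion wrapper `ask(system_prompt, transcript)` supplied by the caller; the role prompts, the `AGREE` consensus signal, and the round budget are illustrative assumptions, not details taken from the paper.

```python
from typing import Callable

# ask(system_prompt, transcript) -> model reply. Wraps any chat-completion
# API (e.g., an OpenAI-style client); hypothetical, supplied by the caller.
LLM = Callable[[str, str], str]

TESTER = (
    "You are a software tester reviewing C/C++ code for vulnerabilities. "
    "State whether the code is vulnerable, the suspected vulnerability class, "
    "and your reasoning. Reply AGREE if you accept the other side's position."
)
DEVELOPER = (
    "You are the developer of the code under review. Challenge or confirm "
    "the tester's judgment with reasoning. Reply AGREE if you accept it."
)
SUMMARIZER = (
    "Summarize the discussion and output the final judgment: vulnerable or "
    "not, plus the vulnerability class if any."
)

def review(code: str, ask: LLM, max_rounds: int = 3) -> str:
    # Stage 1: initialization -- the tester gives an initial judgment with reasoning.
    transcript = f"Code under review:\n{code}\n"
    transcript += f"\n[Tester] {ask(TESTER, transcript)}\n"

    # Stage 2: discussion -- tester and developer exchange views iteratively
    # until one side concedes (signals AGREE) or the round budget runs out.
    for _ in range(max_rounds):
        dev = ask(DEVELOPER, transcript)
        transcript += f"\n[Developer] {dev}\n"
        if "AGREE" in dev:
            break
        tst = ask(TESTER, transcript)
        transcript += f"\n[Tester] {tst}\n"
        if "AGREE" in tst:
            break

    # Stage 3: conclusion -- a summarizer turns the exchange into a final judgment.
    return ask(SUMMARIZER, transcript)
```

The explicit `AGREE` token is one simple way to detect consensus between roles; the paper's actual stopping criterion may differ. Keeping the full transcript as shared context is what lets each role respond to the other's reasoning rather than judging the code in isolation.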