Multi-role Consensus through LLMs Discussions for Vulnerability Detection
18 May 2024 | Zhenyu Mao, Jialong Li, Dongming Jin, Munan Li, and Kenji Tei
This paper introduces a multi-role approach for vulnerability detection using large language models (LLMs), simulating a real-life code review process by having LLMs act as different roles (e.g., testers and developers) to reach a consensus on the existence and classification of vulnerabilities. The approach involves three stages: initialization, discussion, and conclusion. In the initialization stage, the tester provides an initial judgment based on a prompt. In the discussion stage, the tester and developer engage in iterative dialogue to refine their judgments. The conclusion stage records the final consensus.

Preliminary evaluations show that this approach improves vulnerability detection performance, with a 13.48% increase in precision, an 18.25% increase in recall, and a 16.13% increase in F1 score compared to a single-role approach. However, the multi-role approach also increases computational costs by 484% due to the need for role-based discussions. The results indicate that the approach is particularly effective when the proportion of vulnerable data in the testing dataset is higher, as the discussions allow LLMs to explore a broader range of potential vulnerabilities. The paper concludes that this approach enhances vulnerability detection by integrating diverse perspectives, and that future work should focus on improving collaboration through in-context learning.