Source Code Clone Detection Using Unsupervised Similarity Measures

6 Feb 2024 | Jorge Martinez-Gil
This paper presents a comparative analysis of unsupervised similarity measures for source code clone detection. The goal is to give an overview of current state-of-the-art techniques and their strengths and weaknesses, and to guide software engineers in selecting appropriate methods for their specific use cases. The study evaluates various unsupervised strategies, including token comparison and embedding comparison, on a benchmark dataset, assessing their accuracy, time consumption, and practical feasibility. The results indicate that several measures can be effective for source code clone detection, but the choice of measure depends on the specific requirements and constraints of the task. The paper also discusses the importance of unsupervised measures in addressing the challenges of code duplication and outlines future research directions, such as hybrid approaches and transfer learning techniques.
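The summary names token comparison and embedding comparison without fixing a particular formula for either. As a minimal sketch, assuming a simple regex tokenizer, Jaccard overlap of token sets as the token-comparison measure, and cosine similarity as the embedding-comparison measure (none of these choices is attributed to the paper), the two families can be illustrated as follows. The tokenizer pattern, the function names, and the `threshold` value of 0.7 are all hypothetical; in a real embedding setup the vectors would come from a pretrained code model rather than being supplied by hand.

```python
import math
import re

# Hypothetical tokenizer: identifiers, integer literals, or any single
# non-whitespace symbol. Not the tokenizer used in the paper.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code: str) -> list[str]:
    """Split a source fragment into a flat list of lexical tokens."""
    return TOKEN_RE.findall(code)


def jaccard_similarity(a: str, b: str) -> float:
    """Token-comparison measure: |A & B| / |A | B| over the two token sets."""
    ta, tb = set(tokenize(a)), set(tokenize(b))
    if not ta and not tb:
        return 1.0  # two empty fragments are treated as identical
    return len(ta & tb) / len(ta | tb)


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Embedding-comparison measure: cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def is_clone(score: float, threshold: float = 0.7) -> bool:
    """Hypothetical decision rule: an unsupervised score still needs a cut-off."""
    return score >= threshold


if __name__ == "__main__":
    f1 = "def add(a, b):\n    return a + b"
    f2 = "def plus(x, y):\n    return x + y"  # same logic, renamed identifiers
    score = jaccard_similarity(f1, f2)
    print(round(score, 3))  # prints 0.538
    print(is_clone(score))  # prints False
```

Renaming identifiers, as in the `plus(x, y)` variant, lowers the token overlap to about 0.54 even though the two functions implement the same logic, so a purely lexical measure with this threshold misses the clone. This illustrates the kind of trade-off the paper weighs: lexical measures are cheap to compute but sensitive to renaming, while embedding-based measures depend on a model to produce the vectors that `cosine_similarity` compares.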