Investigating the Efficacy of Large Language Models for Code Clone Detection

April 2024 | Mohamad Khajezade, Jie JW Wu, Fatemeh Hendijani Fard, Gema Rodríguez-Pérez, Mohamed Sami Shehata
This paper investigates the effectiveness of Large Language Models (LLMs) for Code Clone Detection (CCD), a non-generative task. The study focuses on Type-4 code clones, which are challenging to detect because the paired snippets differ syntactically and structurally while implementing the same functionality. The researchers built mono-lingual and cross-lingual CCD datasets from CodeNet and evaluated ChatGPT's ability to detect Type-4 clones in Java-Java and Java-Ruby pairs in a zero-shot setting. They designed two prompts to guide ChatGPT: one that directly asks whether two code snippets are clones, and another that asks whether they solve the same problem with the same inputs and outputs; a sketch of this setup follows below.

The results show that ChatGPT outperformed the baselines in cross-language CCD, reaching an F1-score of 0.877, and performed comparably to fully fine-tuned models in mono-lingual CCD, with an F1-score of 0.878. Its performance also varied with the choice of prompt and with problem difficulty. The study highlights the potential of LLMs for CCD, particularly for Type-4 clones, while noting that they still face challenges in cross-language CCD and may require additional training or fine-tuning for better results. The authors emphasize the role of problem complexity and call for further investigation into the underlying causes of the observed performance differences.
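The summary does not reproduce the authors' exact prompt wording, but the described zero-shot setup maps naturally onto a chat-completion call. Below is a minimal Python sketch of that setup, assuming the OpenAI Python client; the prompt templates, the detect_clone helper, and the model name are illustrative stand-ins, not the paper's exact artifacts.

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Paraphrases of the two zero-shot prompts described in the paper;
    # the authors' exact wording may differ.
    PROMPT_CLONE = (
        "Are the following two code snippets clones of each other? "
        "Answer yes or no.\n\nSnippet 1:\n{code1}\n\nSnippet 2:\n{code2}"
    )
    PROMPT_SAME_PROBLEM = (
        "Do the following two code snippets solve the same problem, "
        "with the same inputs and outputs? Answer yes or no."
        "\n\nSnippet 1:\n{code1}\n\nSnippet 2:\n{code2}"
    )

    def detect_clone(code1: str, code2: str, prompt_template: str) -> bool:
        """Zero-shot Type-4 clone check: True if the model answers yes."""
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",  # stand-in for the ChatGPT model evaluated
            messages=[{
                "role": "user",
                "content": prompt_template.format(code1=code1, code2=code2),
            }],
            temperature=0,  # deterministic answers for evaluation
        )
        answer = response.choices[0].message.content.strip().lower()
        return answer.startswith("yes")

In an evaluation loop, each candidate pair (Java-Java or Java-Ruby) would be scored with both templates and the yes/no predictions compared against the ground-truth labels to compute F1-scores like those reported in the paper.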