30 Jan 2024 | Mohamad Khajezade, Jie JW Wu, Fatemeh Hendijani Fard, Gema Rodriguez-Perez, Mohamed Sami Shehata
This study investigates the efficacy of Large Language Models (LLMs) for Code Clone Detection (CCD), specifically focusing on Type-4 code clones, which are functionally identical but syntactically different. The researchers used ChatGPT to detect Type-4 code clones in Java-Java and Java-Ruby pairs in a zero-shot setting. They built a mono-lingual and cross-lingual CCD dataset from CodeNet and evaluated ChatGPT's performance using different prompts. The results show that ChatGPT outperforms baselines in cross-language CCD with an F1-score of 0.877 and achieves comparable performance to fully fine-tuned models for mono-lingual CCD with an F1-score of 0.878. The study also found that the prompt and problem difficulty level significantly impact ChatGPT's performance. The researchers discuss the implications of their findings and suggest future directions, including using other LLMs pre-trained on code and exploring the relationship between performance and specific programming languages.
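The zero-shot setup described above can be sketched as follows. This is a hypothetical illustration, not the paper's actual code: the exact prompt wording, the example snippets, and the function names (`build_clone_prompt`, `parse_verdict`) are assumptions; the paper evaluates several prompt variants, and only one plausible shape is shown here.

```python
def build_clone_prompt(code_a: str, code_b: str) -> str:
    """Assemble a zero-shot prompt asking whether two snippets are
    functionally equivalent (i.e., Type-4 clones). Illustrative wording only;
    the paper tests multiple prompt formulations."""
    return (
        "Do the following two code snippets solve the same problem? "
        "Answer with yes or no.\n\n"
        f"Snippet 1:\n{code_a}\n\n"
        f"Snippet 2:\n{code_b}"
    )


def parse_verdict(reply: str) -> bool:
    """Map the model's free-text reply to a binary clone label."""
    return reply.strip().lower().startswith("yes")


# A cross-language (Java-Ruby) pair: syntactically different,
# functionally identical (both sum an array) -- a Type-4 clone.
java_snippet = "int sum(int[] a) { int s = 0; for (int x : a) s += x; return s; }"
ruby_snippet = "def sum(a) a.reduce(0, :+) end"

prompt = build_clone_prompt(java_snippet, ruby_snippet)
# In the study, `prompt` would be sent to ChatGPT; the reply is then
# reduced to a yes/no clone label, e.g.:
# is_clone = parse_verdict(chatgpt_reply)
```

Reducing the model's free-text answer to a binary label is what makes F1-score evaluation against the CodeNet-derived clone/non-clone ground truth possible.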