This paper presents a novel method for detecting AI-generated code assignments using perplexity analysis and targeted perturbations. The proposed AIGCode detector combines masking and scoring techniques to distinguish AI-generated from human-written code. The method masks code segments with higher perplexity and uses a fine-tuned CodeBERT model to fill in the masked portions. The resulting code is then scored on three signals: overall perplexity, the standard deviation of perplexity across lines, and burstiness. A higher score for the original code suggests it is more likely to be AI-generated, because AI-generated code typically has lower perplexity and is less affected by such perturbations.
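The scoring step can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the equal weighting of the three signals and the burstiness definition (spread of per-line perplexity) are assumptions, and the per-token log-probabilities are assumed to come from an external language model such as the one the detector queries.

```python
import math
from statistics import mean, pstdev

def perplexity(log_probs):
    """Perplexity = exp(mean negative log-likelihood) over tokens."""
    return math.exp(-mean(log_probs))

def score_snippet(line_log_probs):
    """Combine the three signals described in the paper: overall
    perplexity, std-dev of per-line perplexity, and burstiness.
    line_log_probs: list of lists of token log-probabilities, one
    inner list per source line. Weights and the burstiness formula
    below are assumptions for illustration."""
    line_ppl = [perplexity(lp) for lp in line_log_probs]
    overall = perplexity([t for line in line_log_probs for t in line])
    spread = pstdev(line_ppl)                       # variation across lines
    burstiness = max(line_ppl) - min(line_ppl)      # assumed definition
    return overall + spread + burstiness
```

Under this sketch, uniform low-perplexity code (typical of model output) scores lower than code whose per-line perplexity fluctuates, which is the direction of the decision rule the paper describes.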
The AIGCode detector outperforms existing open-source and commercial text detectors, particularly on code submissions generated by OpenAI's text-davinci-003, raising the average AUC from 0.56 (GPTZero baseline) to 0.87. The method is robust against various adversarial attacks, including regular rewrites, sampling techniques, smarter prompts, and code blending. Its performance is evaluated across six programming languages and compared with other detectors, showing consistently high AUC and low false-positive and false-negative rates.
The paper also discusses the limitations of the current approach, including its reliance on a single generative model and the potential for bias. Future work aims to evaluate the method across a broader range of code generation models and address data imbalance issues. The AIGCode detector is an innovative tool for maintaining academic integrity and promoting the responsible use of AI in programming education.