[slides] Detecting AI-Generated Code Assignments Using Perplexity of Large Language Models

The paper "Detecting AI-Generated Code Assignments Using Perplexity of Large Language Models" by Zhenyu Xu and Victor S. Sheng addresses the challenge of identifying AI-generated code in programming assignments. The authors propose a method that leverages targeted masking perturbation and comprehensive scoring to distinguish between AI-generated and human-composed code. Their approach involves: 1. **Perturbation Mechanism**: Masking code segments with higher perplexity (PPL) more intensely, using a fine-tuned CodeBERT model to fill in the masked portions. 2. **Scoring Mechanism**: Evaluating the overall perplexity, variation of code line perplexity, and burstiness to generate a unified score. Higher scores indicate a higher likelihood of AI-generated code. The paper compares the proposed method with existing detectors, including GPT2-Detector, DetectGPT, RoBERTa-QA, GPTZero, and The Writer AI Detector. The AIGCode Detector outperforms these detectors, achieving higher AUC scores and lower FPR and FNR. The authors also evaluate the detector's robustness against various adversarial attacks, such as regular rewrites, sampling techniques, smarter prompts, and code blending. The study uses the CodeNet dataset and a specialized AIGCode dataset to train and test the detector. The results show that the AIGCode Detector is effective in detecting AI-generated code, with high sensitivity and specificity. The authors discuss limitations, such as the dependency on a single generative model and the need for further evaluation across a broader set of models. Future work aims to improve the detector's efficiency and reliability in real-world applications.The paper "Detecting AI-Generated Code Assignments Using Perplexity of Large Language Models" by Zhenyu Xu and Victor S. Sheng addresses the challenge of identifying AI-generated code in programming assignments. The authors propose a method that leverages targeted masking perturbation and comprehensive scoring to distinguish between AI-generated and human-composed code. Their approach involves: 1. **Perturbation Mechanism**: Masking code segments with higher perplexity (PPL) more intensely, using a fine-tuned CodeBERT model to fill in the masked portions. 2. **Scoring Mechanism**: Evaluating the overall perplexity, variation of code line perplexity, and burstiness to generate a unified score. Higher scores indicate a higher likelihood of AI-generated code. The paper compares the proposed method with existing detectors, including GPT2-Detector, DetectGPT, RoBERTa-QA, GPTZero, and The Writer AI Detector. The AIGCode Detector outperforms these detectors, achieving higher AUC scores and lower FPR and FNR. The authors also evaluate the detector's robustness against various adversarial attacks, such as regular rewrites, sampling techniques, smarter prompts, and code blending. The study uses the CodeNet dataset and a specialized AIGCode dataset to train and test the detector. The results show that the AIGCode Detector is effective in detecting AI-generated code, with high sensitivity and specificity. The authors discuss limitations, such as the dependency on a single generative model and the need for further evaluation across a broader set of models. Future work aims to improve the detector's efficiency and reliability in real-world applications.

Detecting AI-Generated Code Assignments Using Perplexity of Large Language Models

2024 | Zhenyu Xu, Victor S. Sheng