18 Jan 2024 | Hao Li*, Gopi Krishnan Rajbahadur†, Dayi Lin†, Cor-Paul Bezemer*, and Zhen Ming (Jack) Jiang†
This paper addresses the challenge of overfitting in deep learning (DL) models used for critical tasks in software engineering, such as bug detection and code review. Overfitting can lead to inaccurate predictions, misleading feature importance, and wasted resources. Current methods for preventing and detecting overfitting often require modifying the model structure or consuming high computational resources. The authors propose OverfitGuard, a novel approach that leverages training history to both detect and prevent overfitting. OverfitGuard trains a time series classifier on simulated training histories of overfit models to identify overfitting in trained models. The classifier can also be used to prevent overfitting by identifying the optimal point to stop training. The approach is evaluated on real-world samples, showing an F1 score of 0.91, which is at least 5% higher than the best non-intrusive overfitting detection approach. Additionally, OverfitGuard can stop training at least 32% earlier than early stopping while maintaining or improving the rate of reaching the optimal model. The paper provides a replication package containing the trained classifiers and labeled training histories for researchers to use.
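To illustrate the core idea of classifying overfitting from training history, the sketch below simulates labeled train/validation loss curves (overfit curves show the validation loss turning upward) and fits an off-the-shelf classifier on the flattened curves. This is a minimal illustration under assumed curve shapes and a substituted scikit-learn classifier, not the paper's actual time series classifier, simulation procedure, or replication package.

```python
# Minimal sketch: classify "overfit" vs. "not overfit" from simulated training histories.
# The curve shapes, dataset size, and RandomForest classifier are illustrative assumptions,
# not the approach described in the paper.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
EPOCHS = 50

def simulate_history(overfit: bool) -> np.ndarray:
    """Return a (2, EPOCHS) array of simulated train/validation losses."""
    epochs = np.arange(EPOCHS)
    train_loss = np.exp(-epochs / 15) + rng.normal(0, 0.02, EPOCHS)
    val_loss = np.exp(-epochs / 15) + rng.normal(0, 0.02, EPOCHS)
    if overfit:
        # In an overfit run, validation loss turns upward partway through training.
        turn = rng.integers(15, 35)
        val_loss[turn:] += 0.02 * (epochs[turn:] - turn)
    return np.stack([train_loss, val_loss])

# Build a labeled dataset of simulated histories (1 = overfit, 0 = not).
labels = rng.integers(0, 2, size=500)
X = np.array([simulate_history(bool(y)).ravel() for y in labels])

X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("F1 on held-out simulated histories:", f1_score(y_te, clf.predict(X_te)))
```

In the same spirit, such a classifier could be queried on the partial history observed so far during training to decide when to stop, which is how the paper frames overfitting prevention; the stopping criterion itself is not shown here.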