18 Jan 2024 | Hao Li, Gopi Krishnan Rajbahadur, Dayi Lin, Cor-Paul Bezemer, and Zhen Ming (Jack) Jiang
This paper proposes OverfitGuard, a history-based approach to detecting and preventing overfitting in deep learning (DL) models. Overfitting is a critical issue for DL models used in software engineering (SE), as it leads to poor generalization, inaccurate predictions, and wasted resources. Current approaches to overfitting detection and prevention, such as correlation-based methods and early stopping, have limitations, including high computational costs, intrusive modifications, and suboptimal performance. OverfitGuard instead leverages training history (i.e., validation loss curves): a time series classifier is trained on labeled training histories of overfit and non-overfit DL models, and is then used both to detect overfitting in trained models and to determine the optimal point at which to stop training.
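To make the idea concrete, the sketch below illustrates the detection step; it is not the paper's implementation. Validation-loss curves are resampled to a fixed length and fed to a simple time series classifier (a 1-nearest-neighbour baseline from scikit-learn stands in for the dedicated time series classifiers the paper trains). The synthetic curves, the resampling length, and the classifier choice are all illustrative assumptions.

```python
# A minimal sketch of the detection step (not the paper's implementation):
# classify validation-loss curves as "overfit" vs. "non-overfit" with a simple
# time series classifier. The curves, resampling length, and 1-NN baseline are
# illustrative assumptions; the authors train dedicated time series classifiers
# on their labeled training histories.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def resample_curve(curve, length=50):
    """Linearly resample a validation-loss curve to a fixed length so that
    histories with different numbers of epochs become comparable."""
    curve = np.asarray(curve, dtype=float)
    old_x = np.linspace(0.0, 1.0, num=len(curve))
    new_x = np.linspace(0.0, 1.0, num=length)
    return np.interp(new_x, old_x, curve)

# Hypothetical labeled training histories: each entry is a validation-loss
# curve plus a label (1 = overfit, 0 = non-overfit).
histories = [
    (np.concatenate([np.linspace(1.0, 0.3, 30), np.linspace(0.3, 0.7, 30)]), 1),  # loss rises again
    (np.linspace(1.0, 0.25, 60), 0),                                               # keeps decreasing
]

X = np.stack([resample_curve(curve) for curve, _ in histories])
y = np.array([label for _, label in histories])

# 1-NN on the resampled curves is a classic time series classification baseline.
clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(X, y)

# Detection: feed the validation-loss curve of a newly trained model.
new_curve = np.concatenate([np.linspace(0.9, 0.2, 40), np.linspace(0.2, 0.5, 20)])
prediction = clf.predict(resample_curve(new_curve).reshape(1, -1))[0]
print("overfit" if prediction == 1 else "non-overfit")
```

In the paper's setting, the classifier is trained once on the labeled training histories and can then be applied to any trained model's validation loss curve without modifying the model or its training process.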
The approach is evaluated on a real-world dataset of labeled training histories collected from papers published at top AI venues over the past five years. OverfitGuard achieves an F1 score of 0.91 for overfitting detection, at least 5% higher than the current best-performing non-intrusive detection approach. It also stops training earlier than early stopping at least 32% of the time, while maintaining or improving the rate of returning the best model.
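The prevention side can be sketched in the same spirit: while a model trains, the partial validation-loss curve is periodically handed to the classifier, and training halts once overfitting is predicted. The loop below is a self-contained illustration, not the paper's stopping criterion; `predict_overfit` is a hypothetical stand-in (a toy threshold heuristic) for a trained time series classifier such as the one in the sketch above, and the epoch budget, check interval, and synthetic loss curve are assumptions.

```python
# A minimal sketch of the prevention side: periodically feed the partial
# validation-loss curve to a classifier and stop training once overfitting is
# predicted. `predict_overfit` is a hypothetical stand-in for a trained time
# series classifier; the heuristic, epoch budget, and synthetic curve are toys.
import numpy as np

def predict_overfit(partial_curve):
    """Placeholder for a trained time series classifier: flag overfitting once
    the latest validation loss exceeds the minimum so far by 20%
    (a toy heuristic, not the paper's method)."""
    partial_curve = np.asarray(partial_curve, dtype=float)
    return partial_curve[-1] > 1.2 * partial_curve.min()

val_losses = []
best_epoch = 0
for epoch in range(200):                      # illustrative epoch budget
    # ... one epoch of training would happen here ...
    val_loss = 1.0 / (epoch + 1) + 0.002 * max(0, epoch - 50)  # synthetic curve
    val_losses.append(val_loss)
    if val_loss <= min(val_losses):           # track the best checkpoint so far
        best_epoch = epoch
    if epoch >= 10 and epoch % 5 == 0 and predict_overfit(val_losses):
        print(f"stopping at epoch {epoch}; best model was at epoch {best_epoch}")
        break
```

The contrast with early stopping is that the decision is driven by the shape of the loss history rather than a fixed patience window, which is the intuition behind the earlier stops reported above.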
The paper also discusses the limitations of the approach, including the computational cost of time series classifiers and the difficulty of collecting authoritative examples of overfit training histories. However, the authors suggest that future work could optimize time series classifiers for real-time overfitting detection and prevention with smaller delays, and could explore additional real-world examples or online training to improve performance. OverfitGuard is released as a replication package that provides the trained classifiers and labeled training histories for reuse by other researchers.