December 12, 1997 | Lutz Prechelt (prechelt@ira.uka.de)
The paper by Lutz Prechelt explores the use of cross-validation to detect overfitting during neural network training and thereby implement early stopping. Overfitting occurs when a model performs well on the training data but poorly on unseen data; early stopping helps mitigate this issue. However, the stopping criterion is often chosen ad hoc or interactively, leading to inconsistent results. To address this, Prechelt evaluates 14 automatic stopping criteria from three classes: generalization loss (GL), progress quotient (PQ), and upcrossing (UP) criteria. These are tested on 12 classification and approximation tasks using multi-layer perceptrons trained with RPROP.
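The three families can be sketched roughly as follows. The formulas follow the paper's definitions of generalization loss and of training progress measured over strips of k epochs, but the thresholds shown (alpha, k, s) are illustrative defaults rather than the paper's tuned values, so treat this as a reading aid, not a reference implementation.

```python
def gl(val_errors, alpha=5.0):
    """GL_alpha: stop once the validation error exceeds the best value
    seen so far by more than alpha percent (generalization loss)."""
    e_opt = min(val_errors)                       # lowest validation error so far
    loss = 100.0 * (val_errors[-1] / e_opt - 1.0)
    return loss > alpha

def pq(val_errors, train_errors, alpha=0.5, k=5):
    """PQ_alpha: stop when the generalization loss per unit of training
    progress (measured over a strip of the last k epochs) exceeds alpha."""
    if len(train_errors) < k:
        return False
    strip = train_errors[-k:]
    progress = 1000.0 * (sum(strip) / (k * min(strip)) - 1.0)
    if progress <= 0.0:                           # assumption: treat stalled training as a stop signal
        return True
    e_opt = min(val_errors)
    loss = 100.0 * (val_errors[-1] / e_opt - 1.0)
    return loss / progress > alpha

def up(strip_val_errors, s=3):
    """UP_s: stop when the validation error, measured once per strip,
    has increased in s successive strips."""
    if len(strip_val_errors) < s + 1:
        return False
    recent = strip_val_errors[-(s + 1):]
    return all(b > a for a, b in zip(recent, recent[1:]))
```

In this form, "slower" criteria simply correspond to larger alpha or s: they tolerate more apparent overfitting before stopping, which is what produces the longer training times discussed next.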
The study aims to provide quantitative data to guide the selection of an appropriate stopping criterion. The results show that slower criteria generally lead to better generalization (about 4% improvement on average), but they also require much longer training (roughly four times as long). The paper discusses this trade-off between training time and generalization error, concluding that the best criterion depends on the goal: GL criteria maximize the probability of finding a good solution, while PQ and UP criteria maximize the average quality of solutions.
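For context, one way such a criterion plugs into training is a strip-wise loop that checks the criterion once per strip and keeps the weights with the lowest validation error. The helpers passed in (train_one_epoch, validation_error, get_weights, set_weights) are hypothetical placeholders for the user's own training code, not code from the paper:

```python
def train_with_early_stopping(train_one_epoch, validation_error,
                              get_weights, set_weights,
                              criterion, strip=5, max_epochs=3000):
    """Run training epochs, check the stopping criterion once per strip,
    and restore the weights that achieved the lowest validation error."""
    best_val = float("inf")
    best_weights = get_weights()
    train_history, val_history = [], []
    for epoch in range(1, max_epochs + 1):
        train_history.append(train_one_epoch())        # training-set error this epoch
        if epoch % strip == 0:
            e_va = validation_error()                  # validation-set error at strip end
            val_history.append(e_va)
            if e_va < best_val:
                best_val, best_weights = e_va, get_weights()
            if criterion(val_history, train_history):
                break
    set_weights(best_weights)
    return best_val
```

With the sketches above, criterion=lambda va, tr: gl(va, alpha=5.0) gives a GL-style run, criterion=lambda va, tr: pq(va, tr, k=5) a PQ-style one, and criterion=lambda va, tr: up(va, s=3) a UP-style one; the choice among them is exactly the trade-off the paper quantifies.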
The study concludes by recommending fast stopping criteria unless small improvements in network performance are worth large increases in training time. To maximize the probability of finding a good solution, GL criteria are suggested; to maximize the average quality of solutions, PQ or UP criteria are recommended depending on the extent of overfitting (PQ when the network overfits only slightly, UP otherwise). Future work should validate these findings across different training algorithms, error functions, and problem domains.