2012 September | Sjors H.W. Scheres and Shaoxia Chen
In single-particle cryo-EM structure determination, overfitting is a major concern. Overfitting occurs when the reconstruction fits the noise in the data rather than the true structure, which inflates resolution estimates. Low-pass filtering is commonly used to prevent it, but the effective frequencies of these filters are often derived from suboptimal Fourier Shell Correlation (FSC) procedures. In the suboptimal procedure, a single model is used to determine the relative orientations of all particles, so the two half-set reconstructions compared by the FSC are not independent: noise aligned to the same reference correlates between them and inflates the resolution estimate. To illustrate this, for a simulated dataset of GroEL particles the reported resolution was 4.6 Å, while the true resolution was 7.8 Å. The authors consider two solutions to prevent overfitting: refining two models independently (gold-standard FSC) or limiting the data used for orientation determination to a user-specified frequency. However, these methods are not widely used because of concerns that they reduce orientation accuracy. Analysis of simulated data with realistic signal-to-noise ratios showed that neither excluding high-frequency terms nor using a model reconstructed from only half of the particles affected orientation accuracy. The authors then tested the procedures on three cryo-EM datasets and found that the gold-standard procedure did not result in lower resolution than the conventional procedure. In fact, for the β-galactosidase data, the gold-standard procedure yielded a structure that correlated with the crystal structure up to higher frequencies than the structure from the conventional procedure, which suffered from severe overfitting. The frequency at which the gold-standard FSC drops below 0.143 is a good indicator of the true resolution. The authors conclude that overfitting driven by suboptimal FSCs leads to worse orientations and structures, while gold-standard FSCs provide a more accurate estimate of the true signal.
The proposed procedures are straightforward to implement and will help eliminate the hazards of overfitting in cryo-EM structure determination.
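
To make the FSC criterion concrete, the following is a minimal sketch in Python with NumPy of how an FSC curve between two half-set reconstructions could be computed and read off at the 0.143 threshold. It is an illustration under stated assumptions, not the authors' implementation: the function names `fourier_shell_correlation` and `resolution_at_threshold` are hypothetical, the volumes are assumed to be cubic arrays of equal size, and shells are binned by rounding the frequency radius to the nearest integer. The code cannot tell whether the two maps were refined independently, so the result is only a gold-standard FSC if the half-sets were kept separate throughout refinement.

```python
import numpy as np


def fourier_shell_correlation(vol_a, vol_b):
    """FSC per integer-radius Fourier shell for two equally sized cubic volumes."""
    if vol_a.shape != vol_b.shape or len(set(vol_a.shape)) != 1:
        raise ValueError("expected two cubic volumes of identical shape")
    n = vol_a.shape[0]
    fa = np.fft.fftn(vol_a)
    fb = np.fft.fftn(vol_b)

    # Integer frequency index along each axis (cycles per box).
    freq = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int).ravel()

    # Keep only shells up to the Nyquist frequency (index n // 2).
    n_shells = n // 2 + 1
    keep = shell < n_shells
    idx = shell[keep]

    # Per-shell sums of the cross term and of the two power spectra.
    cross = np.bincount(idx, weights=(fa * np.conj(fb)).real.ravel()[keep],
                        minlength=n_shells)
    pow_a = np.bincount(idx, weights=(np.abs(fa) ** 2).ravel()[keep],
                        minlength=n_shells)
    pow_b = np.bincount(idx, weights=(np.abs(fb) ** 2).ravel()[keep],
                        minlength=n_shells)

    denom = np.sqrt(pow_a * pow_b)
    # Shells without power are reported as 0 instead of dividing by zero.
    return np.divide(cross, denom, out=np.zeros_like(cross), where=denom > 0)


def resolution_at_threshold(fsc, box_size, pixel_size, threshold=0.143):
    """Resolution (in units of pixel_size) at the first shell below threshold."""
    below = np.nonzero(fsc[1:] < threshold)[0]  # skip the zero-frequency shell
    if below.size == 0:
        return 2.0 * pixel_size  # curve never drops: limited by Nyquist
    first_shell = below[0] + 1
    # Shell k corresponds to spatial frequency k / (box_size * pixel_size).
    return box_size * pixel_size / first_shell
```

For example, `resolution_at_threshold(fourier_shell_correlation(half1, half2), box_size=half1.shape[0], pixel_size=1.2)` would return the estimate in Å for two half-maps sampled at an assumed 1.2 Å per pixel. Reporting the first shell below the threshold, rather than interpolating between shells, is a simplification chosen to keep the sketch short.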