Do ImageNet Classifiers Generalize to ImageNet?

12 Jun 2019 | Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, Vaishaal Shankar
The paper investigates how well image classification models generalize by creating new test sets for CIFAR-10 and ImageNet that closely follow the original dataset creation processes. The authors find that current models suffer significant accuracy drops on these new test sets (3% to 15% on CIFAR-10 and 11% to 14% on ImageNet), yet the relative ranking of models remains largely unchanged. This suggests the drops stem not from adaptive overfitting to the original test sets but from the models' difficulty generalizing to slightly harder images. The study further shows that accuracy gains on the original test sets translate into larger gains on the new test sets, indicating that robustness to this distribution shift improves as accuracy increases. The authors conclude that the distribution gap between the original and new test sets, rather than adaptivity, is the primary cause of the accuracy drops, underscoring the importance of careful data cleaning and annotation processes in machine learning research. The paper closes with suggestions for future work: understanding adaptive overfitting, characterizing the distribution gap, and developing more robust models.
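To make the "gains translate to larger gains" point concrete, the short sketch below fits a linear trend between original and new test-set accuracies across a set of models. The accuracy values here are hypothetical placeholders, not the paper's measurements, and the fit is done directly on raw accuracies for simplicity (the paper performs its trend analysis on probit-scaled accuracies). A slope greater than 1 is the quantitative form of the claim.

```python
import numpy as np

# Illustrative (made-up) top-1 accuracies for five hypothetical models,
# measured on the original test set and on a reproduced test set.
# These numbers are NOT the paper's measurements.
orig_acc = np.array([0.630, 0.700, 0.745, 0.765, 0.780])
new_acc = np.array([0.500, 0.580, 0.635, 0.660, 0.675])

# Fit new-test accuracy as a linear function of original-test accuracy.
slope, intercept = np.polyfit(orig_acc, new_acc, deg=1)
print(f"new_acc ~= {slope:.2f} * orig_acc {intercept:+.2f}")

# A slope above 1 means each additional point of accuracy on the
# original test set comes with more than one additional point on the
# new test set, i.e. higher-accuracy models close part of the gap.
assert slope > 1.0
```

On data with the structure the paper reports, such a fit tracks the measurements closely, which is part of what supports the distribution-gap interpretation over adaptive overfitting.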