4 Mar 2021 | Dan Hendrycks, Kevin Zhao*, Steven Basart*, Jacob Steinhardt, Dawn Song
The paper introduces two challenging datasets, ImageNet-A and ImageNet-O, that reliably degrade the performance of machine learning models. Both are built with a simple adversarial filtration technique that limits the spurious cues available in the data, so models cannot score well by relying on shortcuts and standard techniques do little to improve results. ImageNet-A is similar to the ImageNet test set but far harder for existing models: a DenseNet-121 achieves only about 2% accuracy on it. ImageNet-O is an out-of-distribution detection dataset, the first of its kind for ImageNet models, on which models struggle to detect anomalies. The paper shows that existing data augmentation techniques and additional training data have limited impact on performance, while architectural changes, such as increasing model size or using different architectures, show promise. The datasets highlight weaknesses shared across current models and give researchers a tool for studying robustness and generalization in computer vision.
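
The adversarial filtration idea can be illustrated with a minimal sketch. It assumes that filtration means keeping only the candidate examples that a fixed reference classifier gets wrong; the summary above does not spell out the procedure, so the function name `adversarial_filter`, the toy classifier `toy_predict`, and the data are all hypothetical and chosen for illustration only.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

X = TypeVar("X")


def adversarial_filter(
    examples: Iterable[Tuple[X, int]],
    predict: Callable[[X], int],
) -> List[Tuple[X, int]]:
    """Keep only the (input, label) pairs the reference model misclassifies.

    Assumption: this mirrors the general idea of adversarial filtration,
    not the exact pipeline used to build ImageNet-A or ImageNet-O.
    """
    return [(x, y) for x, y in examples if predict(x) != y]


# Hypothetical stand-in for a trained classifier:
# predicts class 1 for positive inputs and class 0 otherwise.
def toy_predict(x: float) -> int:
    return 1 if x > 0 else 0


# Candidate pool of (input, true label) pairs.
candidates = [(2.0, 1), (-1.0, 0), (0.5, 0), (-3.0, 1)]

# Examples the toy model already gets right are discarded.
hard_examples = adversarial_filter(candidates, toy_predict)
print(hard_examples)  # [(0.5, 0), (-3.0, 1)]
```

By construction, the reference model's accuracy on the filtered set is zero. The question of interest is then how well other models do on the same examples, which is where shared weaknesses would show up.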