4 Mar 2021 | Dan Hendrycks, Kevin Zhao*, Steven Basart*, Jacob Steinhardt, Dawn Song
We introduce two challenging datasets, IMAGENET-A and IMAGENET-O, that reliably degrade the performance of machine learning models. Both are built with adversarial filtration: candidate images that models already classify correctly via simple spurious cues are removed, so the remaining examples require robust features to classify accurately. IMAGENET-A contains real-world, unmodified images that existing classifiers get wrong, while IMAGENET-O contains anomalies from classes not found in ImageNet-1K, testing models' ability to detect out-of-distribution examples. Both datasets expose weaknesses shared across computer vision models and show that these models generalize poorly to such unseen data.
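As a rough illustration of the adversarial filtration idea (not the exact pipeline used to build the datasets), the sketch below keeps only candidate images that a fixed, pretrained ResNet-50 misclassifies. The candidate file paths and label indices are hypothetical placeholders.

```python
# Minimal sketch of adversarial filtration, assuming a torchvision ResNet-50
# as the fixed "filter" model. Paths and the candidate list are hypothetical.
import torch
from torchvision import models, transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet50(pretrained=True).eval()

def is_hard_example(image_path: str, true_class_idx: int) -> bool:
    """Return True if the filter model misclassifies the image,
    i.e. the example survives adversarial filtration."""
    img = Image.open(image_path).convert("RGB")
    x = preprocess(img).unsqueeze(0)
    with torch.no_grad():
        pred = model(x).argmax(dim=1).item()
    return pred != true_class_idx

# Hypothetical candidate pool: (path, ImageNet-1K class index) pairs.
candidates = [("candidates/snail_001.jpg", 113), ("candidates/fox_002.jpg", 277)]
kept = [(path, label) for path, label in candidates if is_hard_example(path, label)]
```

In practice the filtration step only discards easy examples; human review is still needed to confirm that the surviving images are correctly labeled and natural.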
IMAGENET-A examples look natural yet consistently induce classification errors, and these errors transfer reliably across different models, indicating unappreciated shared weaknesses. IMAGENET-O examples are out-of-distribution images that models nonetheless assign to ImageNet-1K classes with high confidence. To our knowledge, IMAGENET-O is the first out-of-distribution detection dataset collected specifically for ImageNet models.
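One common way to quantify this high-confidence failure mode is the maximum softmax probability (MSP) baseline: treat the classifier's top softmax probability as an in-distribution score and measure how well it separates ImageNet-O anomalies from in-distribution images. The snippet below is a hedged sketch of that evaluation; `in_dist_loader` and `ood_loader` are assumed data loaders yielding preprocessed image batches.

```python
# Sketch of the maximum-softmax-probability (MSP) anomaly score and an
# area-under-the-precision-recall-curve (AUPR) style evaluation.
# The two data loaders are assumptions, not part of the released code.
import torch
import torch.nn.functional as F
from sklearn.metrics import average_precision_score
from torchvision import models

model = models.resnet50(pretrained=True).eval()

@torch.no_grad()
def msp_scores(loader):
    """Top softmax probability per image; lower values suggest anomalies."""
    scores = []
    for x, _ in loader:
        probs = F.softmax(model(x), dim=1)
        scores.append(probs.max(dim=1).values)
    return torch.cat(scores)

in_scores = msp_scores(in_dist_loader)   # assumed in-distribution loader
ood_scores = msp_scores(ood_loader)      # assumed ImageNet-O loader

# AUPR with anomalies as the positive class, scored by negative MSP.
labels = torch.cat([torch.zeros_like(in_scores), torch.ones_like(ood_scores)])
scores = -torch.cat([in_scores, ood_scores])
aupr = average_precision_score(labels.numpy(), scores.numpy())
print(f"AUPR (anomaly = positive): {aupr:.3f}")
```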
Our experiments show that data augmentation techniques and additional training data yield only modest gains on these datasets. Changes to model architecture offer a more promising path: larger, higher-capacity models achieve noticeably better performance on both benchmarks, suggesting that architectural improvements can enhance robustness.
We also show that vision Transformers such as DeiT transfer to these datasets, but their performance remains far from ceiling. This indicates that the challenges posed by these datasets are substantial and that further research is needed to develop more robust models. Overall, our datasets serve as a benchmark for evaluating model performance under distribution shift, highlighting the need for more robust and reliable machine learning systems.
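To give a concrete sense of the evaluation protocol, the sketch below loads a DeiT model through the `timm` library and measures top-1 accuracy on an ImageNet-A style folder. The dataset path and the 200-class index mask are placeholders; the official ImageNet-A release provides the exact mapping from its 200 classes to ImageNet-1K indices.

```python
# Sketch: top-1 accuracy of a DeiT model on an ImageNet-A style folder.
# Assumes the timm library; the dataset path and class-index mask below are
# placeholders standing in for the official ImageNet-A class mapping.
import timm
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

model = timm.create_model("deit_base_patch16_224", pretrained=True).eval()
cfg = model.default_cfg
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(cfg["input_size"][1]),
    transforms.ToTensor(),
    transforms.Normalize(cfg["mean"], cfg["std"]),
])

dataset = datasets.ImageFolder("imagenet-a/", transform=preprocess)  # placeholder path
loader = DataLoader(dataset, batch_size=32)

# Placeholder: indices of the 200 ImageNet-A classes within ImageNet-1K,
# ordered to match the ImageFolder class order.
imagenet_a_mask = list(range(200))

correct = total = 0
with torch.no_grad():
    for x, y in loader:
        logits = model(x)[:, imagenet_a_mask]  # restrict to the 200 classes
        correct += (logits.argmax(dim=1) == y).sum().item()
        total += y.numel()
print(f"top-1 accuracy: {correct / total:.3f}")
```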