The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization


24 Jul 2021 | Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, Dawn Song, Jacob Steinhardt, Justin Gilmer
This paper introduces four new real-world distribution shift datasets and a new data augmentation method for evaluating out-of-distribution (OOD) generalization. The datasets are ImageNet-Renditions (ImageNet-R), StreetView StoreFronts (SVSF), DeepFashion Remixed (DFR), and Real Blurry Images, each capturing a different kind of natural distribution shift. The new augmentation method, DeepAugment, improves robustness on these datasets and outperforms models pretrained on 1000× more labeled data.

The study evaluates four approaches to improving robustness: larger models, self-attention, diverse data augmentation, and pretraining. Larger models and diverse data augmentation improve robustness on real-world distribution shifts, contrary to some prior claims, but no approach helps consistently across all shift types; geographic shifts, for example, remain largely unaffected. The study also finds that gains on synthetic robustness benchmarks can transfer to real-world distribution shifts, contradicting some prior work: the synthetic corruptions in ImageNet-C correlate with real-world corruptions, so ImageNet-C can serve as a proxy for real-world robustness.

The paper argues that robustness is not a single scalar property but varies across distribution shifts; methods such as diverse data augmentation help on some shifts and not others. It concludes that no single method consistently improves robustness across all scenarios, that future work must evaluate multiple distribution shifts simultaneously, and that robustness research requires more thorough evaluation on diverse and realistic datasets. These findings challenge some prior assumptions and underscore the importance of considering multiple distribution shifts in robustness studies.
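The summary only names DeepAugment without describing how it works. The core idea in the paper is to pass clean training images through an image-to-image network whose weights are randomly perturbed, producing diverse but semantically consistent distortions. The sketch below illustrates that idea only; the SmallAutoencoder module and the weight-noise scheme are hypothetical simplifications for illustration, not the authors' released implementation (which perturbs pretrained image-to-image networks in many more ways).

```python
# Minimal sketch of a DeepAugment-style augmentation (hypothetical simplification).
# A clean image is passed through an image-to-image network whose weights have
# been randomly perturbed, yielding a distorted but semantically consistent copy
# that can be added to the classifier's training data.
import torch
import torch.nn as nn

class SmallAutoencoder(nn.Module):
    """Stand-in image-to-image network; the paper uses pretrained networks
    such as an autoencoder or a super-resolution model."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Conv2d(16, 3, 3, padding=1)

    def forward(self, x):
        return self.decoder(self.encoder(x))

def deepaugment_style(image_batch, img2img, weight_noise=0.1):
    """Return a distorted copy of image_batch by perturbing a copy of the
    image-to-image network's weights before the forward pass."""
    with torch.no_grad():
        perturbed = SmallAutoencoder()
        perturbed.load_state_dict(img2img.state_dict())
        for p in perturbed.parameters():
            p.add_(weight_noise * torch.randn_like(p))  # random weight perturbation
        return perturbed(image_batch).clamp(0.0, 1.0)

if __name__ == "__main__":
    net = SmallAutoencoder()              # would be a pretrained network in practice
    images = torch.rand(4, 3, 224, 224)   # dummy batch in [0, 1]
    augmented = deepaugment_style(images, net)
    print(augmented.shape)                # torch.Size([4, 3, 224, 224])
```

Because the perturbation is re-sampled for every batch, each pass yields a different distortion, which is what gives the augmented training set its diversity.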
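Since robustness is not a single scalar, the paper's evaluation protocol amounts to reporting accuracy separately on each distribution-shift benchmark rather than averaging them away. Below is a minimal evaluation-loop sketch under stated assumptions: the directory paths are placeholders, the datasets must be downloaded and arranged in ImageFolder layout, and ImageNet-R covers only a 200-class subset of ImageNet, so a real evaluation would also remap or mask the 1000-way predictions to those classes, a step this sketch omits.

```python
# Minimal sketch: report top-1 accuracy per distribution-shift benchmark
# instead of a single robustness number. Paths are placeholders.
import torch
import torchvision.models as models
import torchvision.transforms as T
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

def accuracy(model, loader, device):
    """Plain top-1 accuracy over a DataLoader."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / max(total, 1)

if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).to(device)
    preprocess = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])

    # One entry per shift: robustness is reported per benchmark, not averaged.
    benchmarks = {
        "imagenet-val": "data/imagenet/val",        # in-distribution baseline
        "imagenet-r":   "data/imagenet-r",          # renditions (art, cartoons, ...)
        "real-blurry":  "data/real-blurry-images",  # naturally blurred photos
    }
    for name, root in benchmarks.items():
        loader = DataLoader(ImageFolder(root, transform=preprocess),
                            batch_size=64, num_workers=4)
        print(f"{name}: top-1 accuracy = {accuracy(model, loader, device):.3f}")
```

Reporting the per-benchmark numbers side by side is what reveals the paper's central point: a method can raise accuracy on one shift (e.g., renditions) while leaving another (e.g., geographic shifts) unchanged.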