9 Nov 2022 | Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A. Wichmann, Wieland Brendel
The paper tests the hypothesis that Convolutional Neural Networks (CNNs) recognize objects primarily by learning complex representations of object shapes, against the alternative view that textures play the more significant role. Through a series of experiments, the authors find that ImageNet-trained CNNs are strongly biased towards recognizing textures rather than shapes, in contrast to human behavioral evidence. They then show that a standard CNN architecture (ResNet-50) can learn a shape-based representation instead when trained on 'Stylized-ImageNet,' a stylized version of ImageNet in which the original textures are replaced with artistic styles. This shape-based representation matches human behavior in the authors' psychophysical experiments more closely, and it brings further benefits, including improved object detection performance and greater robustness to a wide range of image distortions.
The findings show how heavily standard ImageNet-trained CNNs rely on texture and suggest that a shape-based representation can improve the robustness and accuracy of deep learning models.