19 Feb 2014 | Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, Rob Fergus
Deep neural networks have achieved state-of-the-art performance in speech and visual recognition tasks, but their expressive power also leads to counter-intuitive properties. This paper reports two such properties:
1. **Semantic Information in High Layers**: The paper finds no distinction between individual high-level units and random linear combinations of those units when inspecting the inputs that most strongly activate a given direction. This suggests that the semantic information in the high layers resides in the space of activations as a whole rather than in individual units.
2. **Discontinuity in Input-Output Mappings**: Deep neural networks learn input-output mappings that are discontinuous to a significant extent: small, imperceptible perturbations can cause the network to misclassify an image. These perturbations are not random artifacts; they can be found by maximizing the network's prediction error (a sketch of this procedure follows the list). The resulting adversarial examples often transfer across networks, even ones trained on different subsets of the dataset.
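In the paper, the perturbation is found by solving a box-constrained optimization (minimize c·|r| + loss_f(x + r, l) subject to x + r staying in the valid pixel range) with L-BFGS. The toy PyTorch sketch below is not the paper's code: the model is untrained, the input is random, and plain gradient ascent on the loss of the current label stands in for L-BFGS, purely to illustrate how a small perturbation that maximizes the prediction error can be computed.

```python
# Toy sketch (not the paper's method verbatim): maximize the network's prediction
# error with respect to a small additive perturbation r. The paper solves a
# box-constrained L-BFGS problem; plain gradient ascent is used here instead.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a trained classifier: 28x28 "image" -> 10 classes.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()

x = torch.rand(1, 1, 28, 28)                 # clean input in [0, 1]
y = model(x).argmax(dim=1)                   # the label the network currently assigns
loss_fn = nn.CrossEntropyLoss()

r = torch.zeros_like(x, requires_grad=True)  # perturbation to be optimized
for _ in range(50):
    loss = loss_fn(model(x + r), y)          # error w.r.t. the original label
    loss.backward()
    with torch.no_grad():
        r += 5e-3 * r.grad.sign()            # step in the direction that increases the loss
        r.clamp_(-0.1, 0.1)                  # keep the perturbation visually negligible
        r.grad.zero_()                       # (box constraint on x + r omitted for brevity)

print("original label:", y.item(),
      "label after perturbation:", model(x + r.detach()).argmax(dim=1).item())
```

With a trained network and a real image, the same loop typically flips the predicted label while the perturbation stays imperceptible, which is the phenomenon the paper reports.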
The paper also presents its optimization framework and experimental results, including the use of adversarial examples as additional training data to improve model generalization (a second sketch below illustrates this). The findings suggest that deep neural networks have intrinsic blind spots and non-intuitive characteristics that are connected to the data distribution in a non-obvious way.
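For the generalization experiments, one way to read "training on adversarial examples" is as a data-augmentation step inside an ordinary training loop. The sketch below is a hypothetical, self-contained toy version (random data, illustrative hyperparameters), not the paper's experimental setup:

```python
# Hypothetical toy sketch of adversarial training: each batch is augmented with
# adversarially perturbed copies of itself before the usual gradient step.
# Data, model, and hyperparameters are illustrative only.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def perturb(x, y, eps=0.1):
    """One gradient step that increases the loss, clipped to a small eps-ball."""
    x_adv = x.clone().requires_grad_(True)
    loss_fn(model(x_adv), y).backward()
    return (x + eps * x_adv.grad.sign()).clamp(0.0, 1.0).detach()

for step in range(100):                      # stand-in for iterating over a real dataset
    x = torch.rand(32, 1, 28, 28)            # random "images"
    y = torch.randint(0, 10, (32,))          # random labels
    x_aug = torch.cat([x, perturb(x, y)])    # clean + adversarial copies
    y_aug = torch.cat([y, y])
    opt.zero_grad()                          # clear grads accumulated by perturb()
    loss_fn(model(x_aug), y_aug).backward()
    opt.step()
```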