Intriguing properties of neural networks

19 Feb 2014 | Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, Rob Fergus
Deep neural networks, despite their high expressiveness and success in speech and visual recognition, can exhibit uninterpretable and counter-intuitive properties. This paper reports two such properties: (1) individual high-level units are no more semantically meaningful than random linear combinations of units, suggesting that semantic information resides in the activation space as a whole rather than in individual units; and (2) networks are highly sensitive to small, imperceptible input perturbations, which can be crafted into adversarial examples that are reliably misclassified.

The first property challenges the assumption that neural networks disentangle factors of variation across coordinates. Experiments show that random directions in activation space yield sets of maximally activating images that are just as semantically interpretable as those found along the natural basis, so the semantic information is carried by the space, not by any single unit (a sketch of this comparison appears below).

The second property highlights the instability of neural networks with respect to small input perturbations. Adversarial examples, which are imperceptibly different from the original inputs, cause networks to misclassify images. They are not random artifacts of training: the same examples often remain adversarial across networks with different architectures and across models trained on different subsets of the data, suggesting that deep neural networks have intrinsic blind spots (a simplified version of the search procedure is sketched below).

Finally, the paper discusses the implications of these findings for model interpretation and robustness.
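As a concrete illustration of the first property, the sketch below compares the images that most strongly activate a single hidden unit with those that maximize a random direction in the same activation space; the paper reports that the two sets look equally semantically coherent. This is a minimal sketch, not the paper's setup: the pretrained torchvision model, the choice of layer, and the FakeData placeholder dataset are all assumptions, and a real image collection would be needed to actually observe the effect.

```python
import torch
import torchvision.models as models
import torchvision.transforms as transforms
from torchvision.datasets import FakeData  # placeholder; swap in a real image set
from torch.utils.data import DataLoader

# Assumed setup: any pretrained network and image collection will do.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

# Capture activations of a late hidden layer via a forward hook.
activations = {}
def hook(module, inp, out):
    activations["feat"] = out.flatten(1)  # (batch, features)
model.avgpool.register_forward_hook(hook)

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
loader = DataLoader(FakeData(size=256, transform=transform), batch_size=32)

feats, images = [], []
with torch.no_grad():
    for x, _ in loader:
        model(x)
        feats.append(activations["feat"])
        images.append(x)
feats = torch.cat(feats)   # (N, D) activation vectors
images = torch.cat(images)

# Natural-basis direction: a single coordinate of the activation vector.
unit = 0
top_unit = feats[:, unit].topk(8).indices

# Random direction: a random unit vector in the same activation space.
v = torch.randn(feats.shape[1])
v /= v.norm()
top_random = (feats @ v).topk(8).indices

# The paper's observation: on real data, the images indexed by `top_random`
# tend to be as semantically coherent as those indexed by `top_unit`,
# so no special meaning attaches to the natural basis.
print("top images for unit 0:          ", top_unit.tolist())
print("top images for random direction:", top_random.tolist())
```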
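The paper finds adversarial examples by minimizing c|r| + loss(x + r, target) over the perturbation r, subject to x + r staying in [0, 1], using box-constrained L-BFGS. The sketch below substitutes plain gradient descent with pixel clamping for L-BFGS, and an L1 penalty for the paper's norm term; the pretrained torchvision model, the target class, and the hyperparameters are likewise assumptions made for brevity.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()
for p in model.parameters():
    p.requires_grad_(False)  # optimize only the perturbation, not the weights

def adversarial_perturbation(x, target, c=0.1, steps=100, lr=0.01):
    """Search for a small r such that the model assigns `target` to x + r.

    Simplified stand-in for the paper's box-constrained L-BFGS procedure.
    """
    r = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.SGD([r], lr=lr)
    for _ in range(steps):
        adv = (x + r).clamp(0.0, 1.0)  # enforce the [0, 1] pixel-box constraint
        logits = model(adv)
        # L1 size penalty on r (an assumption) plus the target-class loss.
        loss = c * r.abs().sum() + F.cross_entropy(logits, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return r.detach()

# Usage with a random tensor as a placeholder input (assumption: any
# correctly preprocessed image tensor in [0, 1] works here).
x = torch.rand(1, 3, 224, 224)
target = torch.tensor([1])  # hypothetical target class
r = adversarial_perturbation(x, target)
pred_before = model(x).argmax(1).item()
pred_after = model((x + r).clamp(0, 1)).argmax(1).item()
print(f"prediction: {pred_before} -> {pred_after}, max |r| = {r.abs().max():.4f}")
```

Adversarial examples challenge the assumption that neural networks are robust to small input changes, which is crucial for applications where safety and reliability are important. The results suggest that deep neural networks have non-intuitive characteristics connected to the data distribution in non-obvious ways, and the paper concludes that further research is needed to understand and address these properties.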