Network Dissection: Quantifying Interpretability of Deep Visual Representations

19 Apr 2017 | David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, and Antonio Torralba
Network Dissection is a framework for quantifying the interpretability of latent representations in deep neural networks by measuring the alignment between individual hidden units and human-interpretable semantic concepts. The method scores the semantics of hidden units in each intermediate convolutional layer against a broad dataset of labeled visual concepts, identifying units that align with objects, parts, scenes, textures, materials, and colors.

The framework is used to compare the latent representations of networks trained on different tasks, including supervised and self-supervised objectives, and to examine how training iterations, network depth and width, dropout, and batch normalization affect interpretability. Applied to several CNN architectures, including AlexNet, VGG, GoogLeNet, and ResNet, it shows that interpretability varies across architectures and training conditions; for example, batch normalization significantly reduces it. The study also finds that interpretability is axis-aligned: randomly rotating a learned representation can destroy its interpretability without changing its discriminative power.

Validated against human evaluations, the analysis concludes that interpretability is not an inevitable by-product of discriminative power but a distinct property that must be measured separately.
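The alignment between a unit and a concept is summarized with an intersection-over-union (IoU) score between the unit's thresholded activation maps and the concept's segmentation masks. Below is a minimal sketch of that scoring step, assuming activation maps have already been upsampled to the mask resolution; the function names, array shapes, and the `quantile` and `min_iou` values mirror the paper's description but are written here purely for illustration.

```python
import numpy as np

def unit_concept_iou(activation_maps, concept_masks, quantile=0.995):
    """Score how well one unit aligns with one concept.

    activation_maps : float array (N, H, W) of the unit's activations over N
                      images (assumed already upsampled to the mask size).
    concept_masks   : bool array (N, H, W) of the concept's ground-truth
                      segmentation on the same images.
    quantile        : per-unit activation quantile used as the threshold
                      (the paper keeps roughly the top 0.5% of activations).
    """
    # Threshold chosen over the unit's activations across the whole dataset.
    threshold = np.quantile(activation_maps, quantile)
    unit_mask = activation_maps > threshold

    intersection = np.logical_and(unit_mask, concept_masks).sum()
    union = np.logical_or(unit_mask, concept_masks).sum()
    return intersection / union if union > 0 else 0.0

def best_concept(activation_maps, masks_by_concept, min_iou=0.04):
    """Report the concept with the highest IoU for this unit, if any
    score clears a small cutoff (0.04 in the paper)."""
    scores = {c: unit_concept_iou(activation_maps, m)
              for c, m in masks_by_concept.items()}
    concept, score = max(scores.items(), key=lambda kv: kv[1])
    return (concept, score) if score >= min_iou else (None, score)
```

A layer's interpretability can then be summarized by counting how many of its units are assigned a concept this way, which is the quantity the comparisons across architectures and training conditions are based on.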