12 Aug 2024 | Sukrut Rao*, Sweta Mahajan*, Moritz Böhle, and Bernt Schiele
The paper "Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery" introduces a novel approach called Discover-then-Name CBM (DN-CBM) to address the "black-box" issue of deep neural networks. DN-CBM leverages sparse autoencoders to discover and automatically name concepts learned by CLIP, a vision-language model, and then uses these concepts as a concept bottleneck for classification. The key contributions of the paper are:
1. **Concept Discovery**: Sparse autoencoders are trained on CLIP features to extract disentangled concepts, with each learned dictionary vector representing one concept (see the first sketch after this list).
2. **Automated Concept Naming**: Each discovered concept is named by matching its dictionary vector to the closest text embedding in CLIP space, drawn from a large concept vocabulary (second sketch below).
3. **Performant, Interpretable, Task-Agnostic CBMs**: The named concepts form a bottleneck on which linear classifiers are trained for downstream tasks (third sketch below). Because discovery and naming precede any specific task, the method requires neither task-specific concept sets nor external language models, and it yields semantically meaningful concepts, appropriate names, and performant, interpretable CBMs across multiple datasets and CLIP architectures.
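To make the discovery step concrete, here is a minimal PyTorch sketch of a sparse autoencoder trained on precomputed CLIP image features. The dimensions, the ReLU non-linearity, and the L1 coefficient are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Discovers concepts as sparse directions in CLIP feature space."""

    def __init__(self, feature_dim: int = 512, n_concepts: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(feature_dim, n_concepts)
        # Decoder columns act as the concept dictionary vectors.
        self.decoder = nn.Linear(n_concepts, feature_dim)

    def forward(self, x: torch.Tensor):
        z = torch.relu(self.encoder(x))  # sparse, non-negative concept activations
        x_hat = self.decoder(z)          # reconstruction of the CLIP feature
        return x_hat, z

def sae_loss(x, x_hat, z, l1_coef: float = 1e-3):
    # Reconstruction error plus an L1 penalty that encourages sparse activations.
    return ((x - x_hat) ** 2).mean() + l1_coef * z.abs().mean()
```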
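Naming then reduces to a nearest-neighbor lookup in CLIP's joint embedding space. The sketch below assumes `decoder_weight` is the SAE decoder weight from above and that `vocab_embeddings` holds CLIP text embeddings of a large word list; both inputs are hypothetical placeholders for whatever vocabulary and encoder the reader uses.

```python
import torch
import torch.nn.functional as F

def name_concepts(decoder_weight: torch.Tensor,
                  vocab_embeddings: torch.Tensor,
                  vocab: list[str]) -> list[str]:
    # decoder_weight: (feature_dim, n_concepts); column j is the dictionary
    # vector for concept j.
    # vocab_embeddings: (n_words, feature_dim) CLIP text embeddings of the
    # concept vocabulary.
    dict_vecs = F.normalize(decoder_weight.T, dim=-1)  # (n_concepts, feature_dim)
    text_vecs = F.normalize(vocab_embeddings, dim=-1)  # (n_words, feature_dim)
    sims = dict_vecs @ text_vecs.T                     # cosine similarities
    best = sims.argmax(dim=-1)                         # nearest word per concept
    return [vocab[i] for i in best.tolist()]
```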
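Finally, a linear probe is trained on the sparse concept activations to form the concept bottleneck. The L1 penalty on the probe weights in this sketch is an assumption, included to illustrate how each class can be made to depend on only a few named concepts; at inference time, a prediction can then be read off as the concepts with the largest weight-times-activation contributions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_concepts, n_classes = 4096, 10  # illustrative sizes
probe = nn.Linear(n_concepts, n_classes)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)

def probe_step(z: torch.Tensor, labels: torch.Tensor, l1_coef: float = 1e-4):
    # z: sparse concept activations (batch, n_concepts) from the frozen SAE.
    logits = probe(z)
    # Cross-entropy plus L1 on the probe weights keeps explanations sparse.
    loss = F.cross_entropy(logits, labels) + l1_coef * probe.weight.abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```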
The paper evaluates the discovered and named concepts through qualitative and quantitative assessments, including user studies and comparisons with state-of-the-art methods. The results show that DN-CBM outperforms baselines in classification accuracy while providing interpretable explanations for model decisions. The method is also robust across datasets and feature extractors, demonstrating its generalizability and efficiency.