12 Aug 2024 | Sukrut Rao*, Sweta Mahajan*, Moritz Böhle, and Bernt Schiele
Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery
This paper introduces a novel concept bottleneck model (CBM) called Discover-then-Name-CBM (DN-CBM), which automatically discovers and names concepts learned by CLIP without prior knowledge of the task. The approach involves three steps: (1) using sparse autoencoders (SAEs) to extract disentangled concepts from CLIP feature extractors, (2) automatically naming the extracted concepts by matching dictionary vectors with the closest text embeddings in CLIP space, and (3) using the named concept extractor layer as a concept bottleneck to create CBMs for classification on different datasets. The method is efficient and agnostic to the downstream task, and uses concepts already known to the model. The paper evaluates the method across multiple datasets and CLIP architectures, showing that it yields semantically meaningful concepts, assigns appropriate names, and produces performant and interpretable CBMs. The code is available at https://github.com/neuroexplicit-saar/discover-then-name.
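To make the three steps concrete, here is a minimal toy sketch of the pipeline, with random tensors standing in for CLIP image features and CLIP text embeddings; all module names, dimensions, and the vocabulary handling are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d, k, V, C = 512, 2048, 5, 10      # toy feature / concept / vocab / class sizes
feats = torch.randn(32, d)          # stand-in for CLIP image features

# Step 1: discover -- a sparse autoencoder whose decoder columns form the
# concept dictionary (training omitted; see the SAE sketch further below).
encoder, decoder = nn.Linear(d, k), nn.Linear(k, d, bias=False)
concepts = F.relu(encoder(feats))   # sparse concept activations

# Step 2: name -- match each dictionary vector to its closest text embedding.
vocab_emb = F.normalize(torch.randn(V, d), dim=-1)          # stand-in text embeddings
dictionary = F.normalize(decoder.weight.T, dim=-1)          # (k, d) dictionary vectors
concept_names = (dictionary @ vocab_emb.T).argmax(dim=-1)   # index into the vocabulary

# Step 3: classify -- train only a linear probe on the frozen concept layer.
classifier = nn.Linear(k, C)
logits = classifier(concepts)       # a CBM: logits are linear in named concepts
```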
Concept Bottleneck Models (CBMs) are inherently interpretable models that express their prediction as a linear combination of simpler but human-interpretable concepts detected from the input features. While typically constrained by the need for a labelled attribute dataset for training, recent CBMs leverage large language models (LLMs) such as GPT-3 to generate class-specific concepts and vision-language models (VLMs) such as CLIP to learn the mapping from inputs to concepts without attribute labels. However, such methods still require querying LLMs based on the classification task, and it is unclear whether the concepts one wants the model to detect can be detected at all; in fact, recent work has suggested that while plausible, explanations from such CBMs may not be faithful.
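Because a CBM's class scores are linear in the concept activations, every prediction decomposes exactly into per-concept contributions. The toy snippet below (hypothetical sizes, random activations) illustrates this additive explanation.

```python
import torch
import torch.nn as nn

num_concepts, num_classes = 8, 3
classifier = nn.Linear(num_concepts, num_classes)   # the CBM's linear head

concepts = torch.relu(torch.randn(num_concepts))    # one sample's concept activations
logits = classifier(concepts)
pred = logits.argmax().item()

# Exact additive explanation: logit = sum of (weight * concept activation) + bias.
contributions = classifier.weight[pred] * concepts
assert torch.allclose(contributions.sum() + classifier.bias[pred], logits[pred])
top = contributions.argsort(descending=True)[:3]    # concepts driving this prediction
```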
To address this, the paper inverts the typical CBM paradigm: it first discovers the concepts the model already knows, names them, and only then performs classification. It uses CLIP feature extractors so that vision-language alignment can be leveraged for automated concept naming. While the raw features of a network are typically uninterpretable, sparse autoencoders (SAEs) have proven a promising tool in the context of language models, where they disentangle learned representations into a sparse set of human-understandable concepts by decomposing them into a sparse linear combination of learned dictionary vectors. The paper extends this to vision, finds it similarly promising, and, surprisingly, observes that the dictionary vectors align well with the CLIP text embeddings of the concepts they represent, making those concepts nameable. Finally, the paper uses this latent concept space as a concept bottleneck and shows that, once learnt, it can be frozen and used 'as is' to train classifiers, yielding performant CBMs for a variety of downstream classification tasks.
The approach is also computationally efficient, since the concept space is task-agnostic: the SAE and the concept naming need to be computed only once, after which only a linear classifier is trained for each downstream task.
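SAEs of this kind are typically trained with a reconstruction loss plus a sparsity penalty; the following is a hedged sketch under that standard formulation (the paper's exact architecture, hyperparameters, and vocabulary are not reproduced here), together with the cosine-similarity naming step using stand-in text embeddings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    """Decomposes features into a sparse linear combination of dictionary vectors."""
    def __init__(self, d: int, k: int):
        super().__init__()
        self.encoder = nn.Linear(d, k)
        self.decoder = nn.Linear(k, d, bias=False)  # decoder columns = dictionary vectors

    def forward(self, x):
        z = F.relu(self.encoder(x))                 # sparse concept activations
        return self.decoder(z), z

d, k, sparsity_weight = 512, 4096, 1e-3             # illustrative sizes and weight
sae = SparseAutoencoder(d, k)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)

feats = torch.randn(256, d)                         # stand-in for precomputed CLIP features
for _ in range(100):                                # toy training loop
    recon, z = sae(feats)
    loss = F.mse_loss(recon, feats) + sparsity_weight * z.abs().mean()  # L1 sparsity
    opt.zero_grad()
    loss.backward()
    opt.step()

# Naming: match each dictionary vector to the closest text embedding of a
# candidate word (random stand-ins here, not the paper's vocabulary).
words = ["sky", "wheel", "fur", "water", "brick"]   # hypothetical candidate words
text_emb = F.normalize(torch.randn(len(words), d), dim=-1)
dictionary = F.normalize(sae.decoder.weight.T, dim=-1)          # (k, d)
names = [words[i] for i in (dictionary @ text_emb.T).argmax(dim=-1).tolist()]
```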