26 Oct 2024 | Sonia Laguna, Ričards Marcinkevičs, Moritz Vandenhirtz, Julia E. Vogt
Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable?
Sonia Laguna, Ričards Marcinkevičs, Moritz Vandenhirtz, Julia E. Vogt
Abstract: This paper introduces a method to perform concept-based interventions on pretrained neural networks, which are not interpretable by design, given only a small validation set with concept labels. We formalise the notion of intervenability as a measure of the effectiveness of concept-based interventions and leverage this definition to fine-tune black boxes. Empirically, we explore the intervenability of black-box classifiers on synthetic tabular and natural image benchmarks. We focus on backbone architectures of varying complexity, from simple, fully connected neural nets to Stable Diffusion. We demonstrate that the proposed fine-tuning improves intervention effectiveness and often yields better-calibrated predictions. To showcase the practical utility of our techniques, we apply them to deep chest X-ray classifiers and show that fine-tuned black boxes are more intervenable than CBMs. Lastly, we establish that our methods are still effective under vision-language-model-based concept annotations, alleviating the need for a human-annotated validation set.
Introduction: Interpretable and explainable machine learning has seen renewed interest in concept-based predictive models and post hoc explanation techniques. This work focuses on concept bottleneck models (CBMs), which allow for human-model interaction by enabling users to intervene on predicted concept values. We focus on instance-specific interventions, i.e., performed individually for each data point. To this end, we explore two questions: (i) Given a small validation set with concept labels, how can we perform instance-specific interventions directly on a pretrained black-box model? (ii) How can we fine-tune the black-box model to improve the effectiveness of interventions performed on it?
Our contributions include: (1) A simple procedure to perform concept-based instance-specific interventions on a pretrained black-box neural network by editing its activations at an intermediate layer. (2) Formalising intervenability as a measure of the effectiveness of interventions performed on the model. (3) Evaluating the proposed procedures on synthetic tabular, natural image, and medical imaging data, demonstrating that concept-based interventions improve the predictive performance of pretrained black-box models. We also show that our methods are effective on datasets where concept labels are acquired using vision-language models (VLMs), alleviating the need for human annotation.
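The intervention procedure in contribution (1) can be illustrated with a minimal NumPy sketch: a linear probe (assumed fit on the small labelled validation set) maps intermediate activations to concepts, and an intervention edits the activations by gradient descent so that the probed concepts match user-specified values while staying close to the original activations. All names and hyperparameters here (`W_q`, `lam`, `lr`) are illustrative assumptions, not the authors' exact formulation.

```python
# Sketch of a concept-based intervention on a black box f = h(g(x)):
# edit the activations z = g(x) so a concept probe q(z) matches edited
# concept values, then predict with the unchanged head h.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Toy "pretrained" black box: backbone g(x) = x @ W_g, head h(z) = z @ w_h.
W_g = rng.normal(size=(4, 8))
w_h = rng.normal(size=8)

# Linear concept probe, assumed fit on the labelled validation set
# (3 hypothetical binary concepts).
W_q = rng.normal(size=(8, 3))

def intervene(z0, c_edit, lam=0.1, lr=0.1, steps=300):
    """Find z' near z0 whose probed concepts match c_edit by gradient
    descent on BCE(sigmoid(z' @ W_q), c_edit) + lam * ||z' - z0||^2."""
    z = z0.copy()
    for _ in range(steps):
        p = sigmoid(z @ W_q)
        # Gradient of the BCE term plus the proximity penalty.
        grad = (p - c_edit) @ W_q.T + 2.0 * lam * (z - z0)
        z -= lr * grad
    return z

x = rng.normal(size=4)
z0 = x @ W_g
c_edit = np.array([1.0, 0.0, 1.0])  # user-specified concept values
z_new = intervene(z0, c_edit)

print("probed concepts after intervention:",
      np.round(sigmoid(z_new @ W_q), 2))
print("prediction before / after:",
      float(sigmoid(z0 @ w_h)), float(sigmoid(z_new @ w_h)))
```

The proximity weight `lam` trades off how faithfully the edited activations realise the requested concepts against how far they drift from what the backbone actually produced; the head `h` is never retrained, which is what makes this applicable to a frozen pretrained model.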
Related Work: The use of high-level attributes in predictive models has been well-explored in computer vision. Recent efforts have focused on explicitly incorporating concepts in neural networks, producing high-level post hoc explanations by quantifying the network's sensitivity to the attributes. Other works have studied the use of auxiliary external attributes in out-of-distribution settings. To alleviate the assumption of being given interpretable concepts, some have explored concept discovery prior to post hoc explanation. Another relevant line of work investigated concept-based counterfactual explanations.
Methods: We define a measure for the effectiveness of concept-based interventions