pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

2024 | Zhengxuan Wu, Atticus Geiger, Aryaman Arora, Jing Huang, Zheng Wang, Noah D. Goodman, Christopher D. Manning, Christopher Potts
pyvene is an open-source Python library that enables customizable interventions on a wide range of PyTorch models. It supports complex intervention schemes through an intuitive configuration format, and interventions can be static or carry trainable parameters. The library provides a unified and extensible framework for performing interventions on neural models and for sharing intervened models with others, and it supports interpretability analyses such as causal abstraction and knowledge localization. pyvene is published on the Python Package Index (PyPI); code, documentation, and tutorials are available at https://github.com/stanfordnlp/pyvene.

Interventions can target multiple locations at once, involve arbitrary subsets of neurons, and be applied in parallel or in sequence. pyvene supports both recurrent and non-recurrent architectures, including simple feed-forward networks, Transformers, and convolutional neural models. Available intervention types include zero-out, interchange, addition, and activation collection. The library also supports trainable interventions, such as Distributed Alignment Search (DAS), and can be used to train models to be robust to certain noising processes.
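As a concrete illustration of the configuration format, the sketch below sets up a static zero-out intervention on GPT-2. It follows the dict-based API shown in the pyvene README; the helper create_gpt2 and the exact keyword arguments are taken from that documentation and should be treated as assumptions that may differ across library versions.

```python
# Minimal sketch of a static zero-out intervention with pyvene
# (based on the library's README; argument names are assumptions).
import torch
import pyvene as pv

# Helper that loads GPT-2 and returns (config, tokenizer, model).
_, tokenizer, gpt2 = pv.create_gpt2()

# One intervention: overwrite the layer-0 MLP output with a zero vector.
pv_gpt2 = pv.IntervenableModel({
    "layer": 0,
    "component": "mlp_output",
    "source_representation": torch.zeros(gpt2.config.n_embd),
}, model=gpt2)

# Run the base prompt and intervene on the 4th token (position 3).
orig_outputs, intervened_outputs = pv_gpt2(
    base=tokenizer("The capital of Spain is", return_tensors="pt"),
    unit_locations={"base": 3},
    output_original_output=True,  # also return the un-intervened run
)

# Differences appear only downstream of the intervened location.
print((intervened_outputs.last_hidden_state
       - orig_outputs.last_hidden_state).norm())
```

Trainable interventions such as DAS are configured in the same spirit, with the static source representation replaced by an intervention module whose parameters are learned.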
Two case studies illustrate the library in practice. In Case Study I, pyvene replicates the main result of Meng et al. (2022) on locating factual associations in GPT2-XL. In Case Study II, pyvene is used for intervention and probe training with Pythia-6.9B to localize gender information in hidden representations.

The library is designed to support complex intervention schemes, and this flexibility comes at some cost in computational efficiency. As language models grow larger, a goal for future work is to scale interventions with multi-node and multi-GPU training.

In short, pyvene is an open-source Python library for intervention-based research on neural models. It supports customizable interventions with complex schemes across different families of model architectures, and intervened models can be shared with others through online model hubs such as HuggingFace. The hope is that pyvene becomes a powerful tool for discovering new ways in which interventions can help explain and improve models.
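As a final illustration, the sketch below performs the kind of interchange intervention (activation patching) that underlies analyses like Case Study I: an activation from a source run is swapped into a base run at a chosen layer and token position. It again assumes the dict-based API from the pyvene README; the layer, position, and prompts are purely illustrative.

```python
# Minimal sketch of an interchange intervention (activation patching);
# argument names follow the pyvene README and should be treated as
# assumptions rather than a verified recipe.
import pyvene as pv

_, tokenizer, gpt2 = pv.create_gpt2()

# Intervene on the block (residual-stream) output of layer 8.
pv_gpt2 = pv.IntervenableModel({
    "layer": 8,
    "component": "block_output",
}, model=gpt2)

base = tokenizer("The capital of Spain is", return_tensors="pt")
sources = [tokenizer("The capital of Italy is", return_tensors="pt")]

# Copy the source activation into the base run at token position 3.
_, intervened_outputs = pv_gpt2(
    base=base,
    sources=sources,
    unit_locations={"sources->base": 3},
)
```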