12 Mar 2024 | Zhengxuan Wu†, Atticus Geiger‡, Aryaman Arora†, Jing Huang‡, Zheng Wang‡, Noah D. Goodman†, Christopher D. Manning†, Christopher Potts‡
**pyvene** is an open-source Python library for performing interventions on the internal states of PyTorch models across a wide range of neural architectures. It supports customizable, complex interventions with either static or trainable parameters, and provides a unified framework for performing and sharing them. Key features of **pyvene** include:
1. **Intervention as Primitive**: Interventions are specified using a dict-based format, making it easier to share and maintain intervention schemes.
2. **Complex Intervention Schemes**: Interventions can target multiple locations and arbitrary subsets of neurons, and can be applied in parallel or in sequence.
3. **Support for Various Models**: Compatible with simple feed-forward networks, Transformers, and recurrent and convolutional neural models.
4. **Serialization and Sharing**: Intervention schemes and models are serializable, allowing for easy sharing through platforms like HuggingFace.
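
The features listed above can be illustrated with a small, library-free sketch. This is not pyvene's API: it is a plain-Python toy, and the network, the layer labels (`h1`, `out`), the spec keys (`layer`, `type`, `neurons`), and the `run` helper are all hypothetical names chosen for illustration. It shows an intervention scheme written as a list of dicts, applied to chosen neurons at a chosen location, composed in sequence, and serialized to JSON so it could be shared.

```python
import json

# Toy two-layer network. Weights and layer labels are hypothetical.
LAYERS = {
    "h1": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # 2 inputs -> 3 hidden units
    "out": [[1.0, 1.0, 1.0]],                    # 3 hidden units -> 1 output
}


def linear(weights, xs):
    """Multiply a weight matrix (list of rows) by an input vector."""
    return [sum(w * x for w, x in zip(row, xs)) for row in weights]


def run(xs, scheme=None, source_acts=None, collect=None):
    """Forward pass that applies a dict-based intervention scheme.

    scheme: list of dicts with keys "layer", "type", "neurons" (assumed format).
    source_acts: activations recorded from another input, used by "interchange".
    collect: optional dict that receives each layer's pre-intervention activations.
    """
    acts = list(xs)
    for name in ("h1", "out"):
        acts = linear(LAYERS[name], acts)
        if name == "h1":
            acts = [max(0.0, a) for a in acts]  # ReLU on the hidden layer
        if collect is not None:
            collect[name] = list(acts)
        # Apply every spec that targets this layer, in the order listed.
        for spec in scheme or []:
            if spec["layer"] != name:
                continue
            for i in spec["neurons"]:
                if spec["type"] == "zero":
                    acts[i] = 0.0
                elif spec["type"] == "interchange":
                    acts[i] = source_acts[name][i]
    return acts


# Record activations from a source input, then patch them into a base run.
source = {}
run([0.0, 5.0], collect=source)

scheme = [
    {"layer": "h1", "type": "interchange", "neurons": [0, 2]},
    {"layer": "h1", "type": "zero", "neurons": [1]},
]

clean = run([1.0, 2.0])
patched = run([1.0, 2.0], scheme=scheme[:1], source_acts=source)
both = run([1.0, 2.0], scheme=scheme, source_acts=source)

# The scheme is plain data, so it round-trips through JSON unchanged.
restored = json.loads(json.dumps(scheme))
```

Because the scheme is plain data rather than code, the same list can be stored, compared, or reapplied to another run without touching the model definition, which is the property the dict-based format is meant to provide.
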
The library includes detailed documentation, tutorials, and examples, such as reproducing the findings from Meng et al. (2022) on locating factual associations in GPT2-XL and demonstrating intervention and probe training with Pythia-6.9B to localize gender information. **pyvene** aims to be a powerful tool for researchers and practitioners in the field of AI, particularly in areas like model editing, steering, robustness, and interpretability.