Multi-property Steering of Large Language Models with Dynamic Activation Composition

25 Jun 2024 | Daniel Scalena, Gabriele Sarti, Malvina Nissim
This paper introduces Dynamic Activation Composition (Dyn), a method for multi-property steering of large language models (LLMs) that modulates steering intensity throughout generation. Activation steering conditions LLM generation by modifying intermediate representations. It has been shown to be effective, but previous evaluations have focused on single properties and synthetic settings.

The authors evaluate several activation steering strategies (Start, Fixed, and Dim) and find that the optimal parameters depend on the property being steered. This points to two limitations of existing techniques: they need property-specific calibration, and they can oversteer.

Dyn is proposed as a solution to these limitations. It is an information-theoretic approach: steering vectors are extracted from the activations of contrasting examples, one set per property, and the intensity applied to each vector is adjusted during generation based on the information gain from that vector, that is, on its expected steering effect. The property-specific vectors are then composed, which keeps conditioning high while minimizing the impact on generation fluency.

In the multi-property experiments, Dyn achieves strong conditioning for all selected properties while maintaining high fluency, and it provides the best balance between conditioning accuracy and generation fluency among the compared strategies. The paper concludes that Dyn is a promising approach for multi-property steering of LLMs.
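The two steps described above (building a steering vector from contrasting examples, then scaling it by an information-gain signal) can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not state the exact formula, so the sketch assumes the steering vector is a mean difference of activations, that information gain is measured as the KL divergence between the steered and unsteered next-token distributions, and that the intensity is capped at a hypothetical `alpha_max`. All function and parameter names are illustrative.

```python
import math


def mean_difference(pos_acts, neg_acts):
    """Steering vector: mean activation on positive examples minus mean on negative ones."""
    dim = len(pos_acts[0])
    pos_mean = [sum(a[i] for a in pos_acts) / len(pos_acts) for i in range(dim)]
    neg_mean = [sum(a[i] for a in neg_acts) / len(neg_acts) for i in range(dim)]
    return [p - n for p, n in zip(pos_mean, neg_mean)]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def dynamic_alpha(p_steered, p_plain, alpha_max=2.0):
    """Assumed intensity rule: KL between steered and plain distributions, capped at alpha_max."""
    return min(kl_divergence(p_steered, p_plain), alpha_max)


def compose(hidden, vectors, alphas):
    """Add each property's steering vector to the hidden state, scaled by its own intensity."""
    out = list(hidden)
    for vec, alpha in zip(vectors, alphas):
        out = [h + alpha * v for h, v in zip(out, vec)]
    return out


# Toy usage: two properties, two-dimensional hidden states.
formal = mean_difference([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [2.0, 2.0]])
alpha_formal = dynamic_alpha([0.99, 0.01], [0.01, 0.99])  # steering changes a lot
alpha_other = dynamic_alpha([0.5, 0.5], [0.5, 0.5])       # steering changes nothing
steered = compose([1.0, 1.0], [formal, [0.5, 0.5]], [alpha_formal, alpha_other])
```

Under these assumptions, a property whose vector no longer changes the next-token distribution receives an intensity near zero, so it stops perturbing the hidden state, while a property that still shifts the distribution strongly is applied at the capped intensity.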