Show, Don’t Tell: Aligning Language Models with Demonstrated Feedback

2 Jun 2024 | Omar Shaikh*, Michelle Lam*, Joey Hejna*, Yijia Shao, Michael Bernstein, Diyi Yang
The article introduces DITTO, a novel method for aligning large language models (LLMs) with specific user behaviors using a small number (fewer than ten) of demonstrations. Unlike traditional methods that require large preference datasets, DITTO uses the user-provided examples to generate online comparison data, enabling effective customization of LLMs. The approach is grounded in online imitation learning: user demonstrations are treated as preferred over the model's own outputs, allowing the model to learn fine-grained style and task alignment across various domains. DITTO outperforms existing methods such as few-shot prompting, supervised fine-tuning, and self-play methods by an average of 19 percentage points in benchmark tests and user studies. The method is evaluated on author-specific writing tasks and real-world scenarios, demonstrating its effectiveness in aligning LLMs to individual preferences. DITTO's sample efficiency is a key result: it requires significantly fewer demonstrations than traditional pairwise preference methods to reach comparable alignment. The paper also discusses the broader implications of using demonstrations for alignment, noting potential risks and the need for careful consideration of feedback collection practices. Overall, DITTO offers a cost-effective and efficient way to customize LLMs for specific tasks and users.
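The core mechanism described above, treating each user demonstration as preferred over the model's own output for the same prompt, maps naturally onto standard preference optimization. The sketch below shows that pairing step together with a DPO-style loss. The summary does not name a specific optimizer or API, so everything here is an illustrative assumption rather than the authors' implementation: the Hugging Face-style `generate`/`tokenizer` calls, the `build_comparison_pairs` and `dpo_loss` helper names, and the `beta` value are all hypothetical.

```python
import torch.nn.functional as F

def build_comparison_pairs(prompts, demonstrations, model, tokenizer):
    """Pair each user demonstration ("chosen") with a fresh sample from the
    current policy ("rejected") for the same prompt, yielding the online
    comparison data the summary describes."""
    pairs = []
    for prompt, demo in zip(prompts, demonstrations):
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        out = model.generate(**inputs, do_sample=True, max_new_tokens=256)
        # Keep only the newly generated tokens, not the echoed prompt.
        completion = tokenizer.decode(
            out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
        )
        pairs.append({"prompt": prompt, "chosen": demo, "rejected": completion})
    return pairs

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO-style objective: increase the policy's log-probability margin for
    the demonstration over its own sample, relative to a frozen reference
    model. Inputs are summed per-sequence log-probabilities (tensors)."""
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```

Because the summary stresses that the comparison data is generated online, a full training loop would repeat this pairing after each model update, re-sampling the improving policy rather than building the pairs once.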