23 Nov 2024 | Ge Gao, Alexey Taymanov, Eduardo Salinas, Paul Mineiro, Dipendra Misra
This paper introduces a framework for aligning large language model (LLM) agents with user preferences by learning from user edits. The authors propose PRELUDE, a framework that infers a description of the user's latent preference from historical edit data; the inferred preference is then used to generate future responses, avoiding costly fine-tuning of the agent. To handle complex, context-dependent user preferences, the authors propose CIPHER, a simple yet effective algorithm that uses the LLM itself to infer a preference for a given context from the user's edits. CIPHER retrieves the preferences inferred for the k closest contexts in the interaction history and aggregates them into a single preference for response generation. The authors evaluate the method on two interactive tasks (summarization and email writing) using a GPT-4-simulated user. CIPHER outperforms several baselines, achieving the lowest edit-distance cost with only a small overhead in LLM query cost. Analysis shows that the preferences CIPHER learns closely resemble the ground-truth latent preferences. The authors conclude that learning from user edits offers a cheap, effective, and interpretable way to align LLM agents with user preferences.
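To make the retrieve-and-aggregate loop concrete, here is a minimal sketch of how CIPHER's cycle could look in code. This is illustrative only, not the authors' implementation: `llm(prompt) -> str` and `embed(text) -> vector` are hypothetical stand-ins for the agent LLM and an embedding model, and the prompts and class names are assumptions.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

class CipherSketch:
    """Illustrative sketch of CIPHER's loop (not the authors' code).

    llm(prompt) -> str and embed(text) -> np.ndarray are hypothetical
    stand-ins for the agent LLM and a text-embedding model.
    """

    def __init__(self, llm, embed, k=5):
        self.llm, self.embed, self.k = llm, embed, k
        self.history = []  # list of (context_embedding, inferred_preference)

    def respond(self, context: str) -> str:
        # Retrieve preferences inferred for the k most similar past contexts.
        q = self.embed(context)
        nearest = sorted(self.history, key=lambda h: -cosine(q, h[0]))[: self.k]
        prefs = [pref for _, pref in nearest]
        # Aggregate the retrieved preferences into one description via the LLM.
        preference = self.llm(
            "Summarize these writing preferences into one description:\n"
            + "\n".join(prefs)
        ) if prefs else ""
        # Condition response generation on the aggregated preference.
        return self.llm(f"Preference: {preference}\nTask: {context}")

    def learn(self, context: str, response: str, edited: str) -> None:
        # If the user edited the response, infer the latent preference the
        # edit reveals and store it keyed by the context embedding.
        if edited != response:
            inferred = self.llm(
                "A user edited the draft below. Describe the writing "
                f"preference the edit reveals.\nDraft: {response}\n"
                f"Edit: {edited}"
            )
            self.history.append((self.embed(context), inferred))
```

Note how no model weights are updated anywhere: all learning lives in the growing `history` of natural-language preference descriptions, which is what makes the approach cheap and interpretable relative to fine-tuning.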