12 Jun 2024 | Chris Lu, Samuel Holt, Claudio Fanconi, Alex J. Chan, Jakob Foerster, Mihaela van der Schaar, Robert Tjarko Lange
This paper introduces DiscoPOP, a novel preference optimization algorithm discovered through large language model (LLM)-driven objective discovery. Preference optimization is a key method for enhancing and controlling the quality of LLM outputs. Traditionally, it is approached as an offline supervised learning task using manually crafted, convex loss functions over pairs of preferred and dispreferred responses. However, these methods are constrained by human creativity, leaving the large space of possible loss functions underexplored. To address this, the authors propose an LLM-driven discovery process that automatically generates new state-of-the-art preference optimization algorithms without expert human intervention.
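To make the search space concrete, the objectives considered in this line of work can be written in a DPO-style form; the notation below is a reconstruction for illustration rather than a quotation from the paper:

$$
\mathcal{L}(\theta) = \mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}\big[\, f(\beta \rho)\, \big],
\qquad
\rho = \log\frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \log\frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)},
$$

where $(y_w, y_l)$ are the preferred and dispreferred responses, $\pi_{\mathrm{ref}}$ is a frozen reference policy, and $\beta$ controls the strength of regularization toward it. Choosing the logistic loss $f(z) = -\log \sigma(z)$ recovers DPO, while $f(z) = e^{-z}$ gives an exponential loss; the discovery process effectively searches over the scalar function $f$.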
The process iteratively prompts an LLM to propose and implement new preference optimization loss functions, conditioning on the performance metrics of previously evaluated candidates (see the sketch below). This yields previously unknown, performant preference optimization algorithms. The best-performing discovery, DiscoPOP, adaptively blends logistic and exponential losses. Experiments demonstrate that DiscoPOP achieves state-of-the-art performance and transfers successfully to held-out tasks.
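The discovery loop can be summarized with a short sketch. This is an illustrative Python outline under stated assumptions, not the authors' released implementation; the helpers `propose_objective_with_llm`, `compile_objective`, `train_with_objective`, and `evaluate_policy`, as well as `preference_dataset` and `validation_task`, are hypothetical placeholders.

```python
# Illustrative sketch of LLM-driven objective discovery (hypothetical helpers).
num_generations = 20
archive = []  # (candidate code, validation score) pairs fed back into the prompt

for generation in range(num_generations):
    # Ask the LLM for a new candidate loss function, conditioned on past results.
    candidate_code = propose_objective_with_llm(history=archive)
    try:
        loss_fn = compile_objective(candidate_code)        # turn code string into a callable
        policy = train_with_objective(loss_fn, preference_dataset)
        score = evaluate_policy(policy, validation_task)   # e.g. a reward or win-rate metric
    except Exception:
        score = float("-inf")  # invalid or non-running candidates are scored poorly
    archive.append((candidate_code, score))

best_code, best_score = max(archive, key=lambda pair: pair[1])
```

The key feedback signal is the archive of previously proposed objectives and their scores, which the LLM conditions on when proposing the next candidate.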
DiscoPOP is a weighted sum of the logistic and exponential losses, with the blending weight determined by the difference of policy and reference log-ratios. The resulting objective is non-convex, yet it performs well across multiple held-out evaluation tasks, including multi-turn dialogue (AlpacaEval 2.0), controlled sentiment generation (IMDb), and summarization (TL;DR). On these tasks, DiscoPOP outperforms existing preference optimization algorithms or performs competitively with them.
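A minimal PyTorch sketch of such a blended loss is shown below. The sigmoid gating, the temperature scaling, and the default values of `beta` and `temperature` are assumptions made purely for illustration; the exact form ships with the paper's open-source code.

```python
import torch
import torch.nn.functional as F

def blended_preference_loss(policy_chosen_logps, policy_rejected_logps,
                            reference_chosen_logps, reference_rejected_logps,
                            beta=0.1, temperature=0.05):
    """Sketch of a DiscoPOP-style log-ratio modulated loss (illustrative, not the released code)."""
    # Difference of policy and reference log-ratios (chosen vs. rejected responses).
    logits = (policy_chosen_logps - policy_rejected_logps) - \
             (reference_chosen_logps - reference_rejected_logps)

    # Mixing weight: a sigmoid of the (temperature-scaled) log-ratio difference.
    mix = torch.sigmoid(logits / temperature)

    logistic_loss = -F.logsigmoid(beta * logits)  # DPO-style logistic term
    exponential_loss = torch.exp(-beta * logits)  # exponential term

    # Adaptively blend the two losses based on the log-ratio difference.
    return (1.0 - mix) * logistic_loss + mix * exponential_loss
```

Because the mixing weight itself depends on the log-ratio difference, the blended objective is non-convex, consistent with the description above.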
The authors also analyze the limitations of DiscoPOP, noting that it struggles to converge when the regularization parameter β is set too low or too high. They suggest that future work could explore alternative forms of the objective with multiple tunable floating-point parameters. Additionally, the paper discusses the broader impact and ethical considerations of using LLMs to discover preference optimization objectives, emphasizing the need for content filters to prevent harmful outputs. The work is supported by various funding sources and is made available through an open-source repository.