LangProp: A code optimization framework using Large Language Models applied to driving

2024 | Shu Ishida, Gianluca Corrado, George Fedoseev, Hudson Yeo, Lloyd Russell, Jamie Shotton, João F. Henriques & Anthony Hu
LangProp is a code optimization framework that uses large language models (LLMs) to iteratively improve generated code for tasks such as Sudoku, CartPole, and autonomous driving. The framework prompts an LLM to generate and refine code based on performance metrics and feedback computed on a dataset of input-output pairs. Because it operates in a data-driven, metric-based training paradigm, traditional machine learning techniques such as imitation learning, DAgger, and reinforcement learning can be adapted directly to code optimization. LangProp automatically evaluates the performance of candidate code, catches exceptions, and feeds the results back to the LLM, which rewrites the code in the next iteration (a sketch of this loop is given below). The outcome is interpretable, transparent policies that can be verified and improved in a data-driven manner.

Notably, the LLM is used only inside the training loop to optimize code; it is not required at inference time. The framework is implemented with a modular structure that makes it easy to adapt to different tasks and domains, and the code is released as open source. LangProp has been evaluated on generalized Sudoku, CartPole, and autonomous driving in CARLA, where it generates driving policies that outperform existing ones. These results show that LangProp can produce policies that perform well on complex tasks, and that established machine learning training paradigms carry over directly to LLM-driven code optimization, which represents a significant step forward in code generation and optimization.
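To make the idea concrete, here is a minimal, hypothetical sketch of a LangProp-style training loop: candidate programs are executed on a labelled batch, scored with a metric, and exceptions are caught and turned into feedback that is passed back to the LLM, which proposes a rewritten candidate. Names such as `query_llm`, `score_fn`, and the prompt text are illustrative assumptions and not the framework's actual API.

```python
# Hypothetical sketch of a LangProp-style code optimization loop.
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Candidate:
    code: str                      # policy as a Python source string defining predict(x)
    score: float = float("-inf")   # average metric over the training batch
    feedback: str = ""             # tracebacks / errors fed back to the LLM


def run_policy(code: str, x: Any) -> Any:
    """Execute generated code that defines `predict(x)` and return its output."""
    namespace: dict = {}
    exec(code, namespace)          # assumes trusted / sandboxed execution for this sketch
    return namespace["predict"](x)


def evaluate(cand: Candidate, batch: List[Tuple[Any, Any]],
             score_fn: Callable[[Any, Any], float]) -> None:
    """Score a candidate on (input, label) pairs; exceptions become feedback, not crashes."""
    scores, errors = [], []
    for x, y in batch:
        try:
            scores.append(score_fn(run_policy(cand.code, x), y))
        except Exception as exc:
            scores.append(0.0)
            errors.append(f"input={x!r}: {exc!r}")
    cand.score = sum(scores) / len(scores)
    cand.feedback = "\n".join(errors[:5])   # truncate feedback for the prompt


def train(seed_code: str, batch, score_fn,
          query_llm: Callable[[str], str],
          iterations: int = 10, population: int = 4) -> Candidate:
    """Keep a small population of candidates; ask the LLM to rewrite the weakest."""
    candidates = [Candidate(seed_code)]
    for _ in range(iterations):
        for cand in candidates:
            evaluate(cand, batch, score_fn)
        candidates.sort(key=lambda c: c.score, reverse=True)
        candidates = candidates[:population]
        worst = candidates[-1]
        prompt = (
            "Improve this policy so it scores higher on the metric.\n"
            f"Current score: {worst.score:.3f}\n"
            f"Errors:\n{worst.feedback}\n\nCode:\n{worst.code}"
        )
        candidates.append(Candidate(query_llm(prompt)))   # LLM proposes a rewrite
    return max(candidates, key=lambda c: c.score)
```

Under this framing, swapping in different datasets, metrics, or relabelling schemes is what lets imitation learning, DAgger, or reinforcement-learning-style objectives be reused for code optimization.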