6 Jun 2024 | Wenqi Zhang, Ke Tang, Hai Wu, Mengna Wang, Yongliang Shen, Guiyang Hou, Zeqi Tan, Peng Li, Yueting Zhuang, Weiming Lu
**Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization**
**Authors:** Wenqi Zhang, Ke Tang, Hai Wu, Mengna Wang, Yongliang Shen, Guiyang Hou, Zeqi Tan, Peng Li, Yueting Zhuang, Weiming Lu
**Institution:** Zhejiang University, Chinese Academy of Sciences, Nanjing University of Posts and Telecommunications, Nanjing University of Information Science and Technology, Beijing University of Technology, University of Chinese Academy of Sciences, Nanjing
**Abstract:**
Large Language Models (LLMs) excel at diverse tasks, but most LLM-based agents are built as solvers for one specific task and rely on sophisticated prompt engineering. This paper introduces Agent-Pro, an LLM-based agent that learns and evolves through interaction. Agent-Pro pairs dynamic belief generation with a policy-level reflection process, allowing it to "fine-tune" its beliefs and iteratively optimize its behavior policy. A depth-first search over candidate policies retains only updates that improve performance. Evaluations on Blackjack and Texas Hold'em show that Agent-Pro outperforms vanilla LLMs and specialized models, demonstrating its ability to learn and evolve in complex, dynamic scenarios.
**Key Contributions:**
1. **Belief-Aware Decision-Making:** Agent-Pro updates its self-belief and world-belief to make more coherent decisions in dynamic and imperfect game scenarios.
2. **Policy-Level Reflection:** Agent-Pro reflects on past trajectories and beliefs, correcting irrational beliefs and optimizing its policy.
3. **DFS-based Policy Evolution:** Agent-Pro searches candidate policies depth-first, adopting an update only when it yields better returns, so the policy keeps improving over time.
**Methods:**
- **Belief-Aware Decision-Making:** Before each action, Agent-Pro generates a self-belief (its own situation and plan) and a world-belief (opponents and the environment) and conditions its decision on both, improving coherence in dynamic, imperfect-information games (see the first sketch after this list).
- **Policy-Level Reflection:** After each game, Agent-Pro reflects on the whole trajectory and the beliefs it held, corrects irrational beliefs, and distills the reflection into behavioral guidelines and world modeling that augment its prompt.
- **DFS-based Policy Evolution:** Candidate policies produced by reflection are evaluated on new games; a depth-first search keeps a candidate only if it outperforms the current policy, so updates generalize rather than overfit to recent games (see the second sketch after this list).
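The loop below is a minimal Python sketch of belief-aware decision-making, under stated assumptions: `llm` stands for any text-in/text-out model callable, and the `Belief` dataclass and prompt wording are illustrative stand-ins, not the paper's actual prompts.

```python
# Hypothetical sketch of belief-aware decision-making: form beliefs first,
# then decide conditioned on them rather than on the raw observation alone.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Belief:
    self_belief: str   # the agent's view of its own hand, plan, and state
    world_belief: str  # its view of opponents and the environment

def act(observation: str, policy_prompt: str, llm: Callable[[str], str]) -> str:
    """One decision step: generate self- and world-beliefs, then choose an action."""
    belief = Belief(
        self_belief=llm(f"{policy_prompt}\nObservation: {observation}\n"
                        "Summarize your own situation and plan:"),
        world_belief=llm(f"{policy_prompt}\nObservation: {observation}\n"
                         "Infer the opponents' likely hands and intentions:"),
    )
    # The action prompt sees the beliefs, not just the raw observation.
    return llm(
        f"{policy_prompt}\n"
        f"Self-belief: {belief.self_belief}\n"
        f"World-belief: {belief.world_belief}\n"
        "Choose one action (e.g., fold / call / raise):"
    )
```

In use, `policy_prompt` would carry the behavioral guidelines and world modeling produced by reflection, so this same decision loop improves as the policy evolves.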
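The second sketch, also hypothetical, combines policy-level reflection with DFS-based policy evolution. Here a "policy" is just the prompt text; `reflect` stands in for an LLM call that rewrites that text after examining full trajectories, and `evaluate` for the mean payoff over a batch of fresh games. Both names and the search bounds are assumptions, not the paper's implementation.

```python
# Hypothetical sketch of DFS-based policy evolution: descend into a candidate
# branch only when it beats the current policy on new games; backtrack otherwise.
from typing import Callable, List

def evolve(policy: str,
           reflect: Callable[[str], List[str]],  # trajectory reflection -> candidate policies
           evaluate: Callable[[str], float],     # mean payoff over fresh evaluation games
           depth: int = 3,
           branch: int = 2) -> str:
    """Depth-first search over reflection-generated candidate policies."""
    if depth == 0:
        return policy
    base = evaluate(policy)  # score on games the candidates were NOT tuned on
    for candidate in reflect(policy)[:branch]:
        if evaluate(candidate) > base:
            # Improvement found: continue the search from the better policy.
            return evolve(candidate, reflect, evaluate, depth - 1, branch)
    # No candidate improved on held-out games: prune and keep the old policy.
    return policy
```

Evaluating candidates on games other than those that triggered the reflection is what enforces the generalization property the bullet above describes.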
**Results:**
- **Blackjack:** Agent-Pro significantly outperforms baseline agents, demonstrating improved rationality and decision-making.
- **Texas Hold'em:** Agent-Pro consistently outperforms both RL-based and other LLM-based agents, demonstrating advanced strategic skill.
**Conclusion:**
Agent-Pro is a novel LLM-based agent capable of learning and evolving in complex interactive tasks. It constructs dynamic beliefs, reflects on past experiences, and optimizes its policy, leading to significant improvements in decision-making capabilities.