This paper presents a method for improving the performance of large language models (LLMs) used as agents by incorporating negative examples during fine-tuning. The authors argue that negative examples, i.e., failed interaction trajectories between an LLM and its environment, contain valuable information that can help LLMs learn more effectively. They propose a negative-aware training (NAT) paradigm that explicitly signals to the model whether the trajectory that follows is correct or incorrect by adding a prefix or suffix to the query. This allows LLMs to learn from both positive and negative examples, leading to better performance on tasks such as mathematical reasoning, multi-hop question answering, and strategic question answering.
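As a concrete illustration of how such a prefix-based scheme might be implemented, the sketch below turns each trajectory into a (prompt, target) fine-tuning pair whose prefix states whether the trajectory succeeded or failed. The prefix wording, the `Trajectory` fields, and the `build_nat_example` helper are assumptions made for illustration, not the paper's exact format.

```python
from dataclasses import dataclass

# Hypothetical sketch of negative-aware training (NAT) data construction:
# each query is prefixed with a hint telling the model whether the trajectory
# that follows succeeded or failed. Wording and field names are assumptions.

@dataclass
class Trajectory:
    query: str     # the task instruction given to the agent
    steps: str     # the agent's interaction trace (thoughts, actions, observations)
    success: bool  # whether the trajectory solved the task

POSITIVE_HINT = "The following is a successful trajectory. Solve the task in the same way:"
NEGATIVE_HINT = "The following is a failed trajectory. Do not imitate its mistakes:"

def build_nat_example(traj: Trajectory) -> dict:
    """Turn one trajectory into a (prompt, target) fine-tuning pair."""
    hint = POSITIVE_HINT if traj.success else NEGATIVE_HINT
    return {
        "prompt": f"{hint}\n{traj.query}",
        "target": traj.steps,
    }

if __name__ == "__main__":
    good = Trajectory("What is 12 * 7?", "Thought: 12 * 7 = 84. Answer: 84", True)
    bad = Trajectory("What is 12 * 7?", "Thought: 12 * 7 = 74. Answer: 74", False)
    for t in (good, bad):
        print(build_nat_example(t))
```

At inference time, presumably only the positive-style prefix would be used, steering the model toward the behavior it has associated with successful trajectories.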
The authors conducted experiments on mathematical reasoning and question answering tasks and found that NAT outperforms baselines that use only positive examples or that naively mix positive and negative examples. They also found that NAT offers a better trade-off between exploiting the useful information in negative examples and avoiding the errors those examples contain. The results show that NAT is particularly effective in data-scarce scenarios, where the number of positive examples is limited.
The paper also discusses the importance of data quality in NAT, finding that higher-quality negative examples lead to better performance. In addition, the authors explore extensions of NAT, including a fine-grained variant that divides negative examples into groups according to their quality. They also demonstrate that NAT is applicable to various prompting strategies, including Chain-of-Thought (CoT) prompting.
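A fine-grained variant could extend the same idea by bucketing negative trajectories by a quality score and giving each bucket its own prefix. The thresholds, score source, and prefix wording below are illustrative assumptions, not the paper's configuration.

```python
# Hypothetical fine-grained NAT: negative trajectories are split into quality
# buckets, each with its own prefix. Thresholds and wording are assumptions.

FINE_GRAINED_HINTS = {
    "positive":    "This is a correct trajectory:",
    "near_miss":   "This failed attempt is almost correct and contains a small mistake:",
    "low_quality": "This failed attempt is largely incorrect:",
}

def bucket(success: bool, quality_score: float) -> str:
    """Map a trajectory to a bucket; quality_score could come from a reward
    model or a partial-credit metric (an assumption for illustration)."""
    if success:
        return "positive"
    return "near_miss" if quality_score >= 0.5 else "low_quality"

def build_fine_grained_example(query: str, steps: str,
                               success: bool, quality_score: float) -> dict:
    """Build a (prompt, target) pair whose prefix reflects trajectory quality."""
    hint = FINE_GRAINED_HINTS[bucket(success, quality_score)]
    return {"prompt": f"{hint}\n{query}", "target": steps}
```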
The authors conclude that NAT is a simple and effective method for integrating failed trajectories when fine-tuning agents. They argue that NAT is agent-agnostic and reasoning-strategy-agnostic, making it compatible with various agent strategies. The paper highlights the potential of NAT for improving the performance of LLMs as agents and for developing better agent-tuning methods and low-resource data usage techniques.