SaySelf is a training framework that teaches large language models (LLMs) to express more accurate and fine-grained confidence estimates and generate self-reflective rationales. The framework consists of two stages: supervised fine-tuning and reinforcement learning from task supervision. In the supervised fine-tuning stage, a model-specific dataset is constructed by analyzing multiple sampled responses from LLMs. This dataset includes self-reflective rationales and confidence estimates. In the reinforcement learning stage, a reward function is designed to further calibrate the confidence estimates, encouraging LLMs to produce accurate and high-confidence predictions while penalizing overconfidence in incorrect responses.
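To make the reward design concrete, below is a minimal sketch of a reward function consistent with this description: correct answers earn a reward that grows with the stated confidence, while incorrect answers are penalized in proportion to it. The exact functional form and the `penalty_weight` knob are assumptions for illustration, not necessarily the paper's formulation.

```python
def confidence_reward(is_correct: bool, confidence: float, penalty_weight: float = 1.0) -> float:
    """Hypothetical calibration reward for the RL stage.

    - Correct answer: reward increases with the expressed confidence (in [0, 1]).
    - Incorrect answer: penalty increases with confidence, discouraging overconfidence.
    `penalty_weight` is an assumed hyperparameter balancing the two cases.
    """
    if is_correct:
        return confidence
    return -penalty_weight * confidence


# Example: a confidently wrong answer is punished more than a cautious one.
print(confidence_reward(True, 0.9))   #  0.9
print(confidence_reward(False, 0.9))  # -0.9
print(confidence_reward(False, 0.2))  # -0.2
```

A reward of this shape can then be optimized with a standard policy-gradient method; the key property is that the maximum reward is reached only by being both correct and confident.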
The supervised fine-tuning stage involves clustering multiple sampled responses based on semantic similarity and retaining one instance per cluster. GPT-4 is then used to summarize the uncertainty about specific knowledge from a first-person perspective, yielding self-reflective rationales. The confidence estimates are derived from the consistency of the sampled reasoning chains. The reinforcement learning stage then applies the reward function described above, which encourages accurate predictions and penalizes overconfidence, leading to improved confidence calibration.
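The following sketch illustrates one way the clustering and consistency-based confidence could be implemented: greedy clustering of response embeddings by cosine similarity, keeping one representative per cluster, and scoring confidence as the fraction of samples that land in the retained response's cluster. The embedding source, the greedy clustering scheme, and the similarity threshold are all assumptions; the paper may use a different clustering method.

```python
import numpy as np


def cluster_responses(embeddings: np.ndarray, threshold: float = 0.9) -> list[list[int]]:
    """Greedy semantic clustering: each response joins the first cluster whose
    representative has cosine similarity >= threshold, else it starts a new cluster.
    Returns clusters as lists of sample indices."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    reps: list[np.ndarray] = []
    clusters: list[list[int]] = []
    for i, vec in enumerate(normed):
        for k, rep in enumerate(reps):
            if float(vec @ rep) >= threshold:
                clusters[k].append(i)
                break
        else:
            reps.append(vec)
            clusters.append([i])
    return clusters


def consistency_confidence(clusters: list[list[int]], selected_idx: int, n_samples: int) -> float:
    """Confidence for the retained response = share of all sampled reasoning
    chains that fall into its cluster."""
    for members in clusters:
        if selected_idx in members:
            return len(members) / n_samples
    return 0.0


# Example with toy embeddings: 5 samples, 4 of which agree semantically.
emb = np.array([[1.0, 0.0], [0.99, 0.05], [0.98, 0.1], [0.97, 0.08], [0.0, 1.0]])
clusters = cluster_responses(emb)
print(consistency_confidence(clusters, selected_idx=0, n_samples=len(emb)))  # 0.8
```

In this scheme, a response that most sampled reasoning chains agree with receives high confidence, while an answer produced by only a few outlier chains receives low confidence; the retained representative, its rationale, and this confidence score form one supervised fine-tuning example.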
Experiments on multiple knowledge-intensive question-answering tasks show that SaySelf significantly reduces confidence calibration error while maintaining task performance. The generated self-reflective rationales are reasonable and further improve calibration. The framework has potential applications in both academic research and real-world scenarios, including enhancing AI trustworthiness, guiding LLMs toward better interactions, and improving training protocols. The code for SaySelf is publicly available.