SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales

5 Jun 2024 | Tianyang Xu, Shujin Wu, Shizhe Diao, Xiaoze Liu, Xingyao Wang, Yangyi Chen, Jing Gao
SaySelf is a training framework that teaches large language models (LLMs) to express accurate, fine-grained confidence estimates and to generate self-reflective rationales. The framework consists of two stages: supervised fine-tuning and reinforcement learning from task supervision.

In the supervised fine-tuning stage, a model-specific dataset is constructed by analyzing multiple responses sampled from the LLM. The sampled responses are clustered by semantic similarity, and one instance is retained per cluster. GPT-4 is then used to summarize, from a first-person perspective, the uncertainties in specific knowledge that the inconsistent responses reveal, producing the self-reflective rationales. The confidence estimates are derived from the consistency of the reasoning chains across samples.

In the reinforcement learning stage, a reward function further calibrates the confidence estimates: it encourages the LLM to produce accurate, high-confidence predictions and penalizes overconfidence in incorrect responses.

Experiments on multiple knowledge-extensive question-answering tasks show that SaySelf significantly reduces confidence calibration error while maintaining task performance. The generated self-reflective rationales are reasonable and can further improve calibration. The framework has potential applications in both academic research and real-world scenarios, including enhancing AI trustworthiness, guiding LLMs toward better interactions, and improving training protocols.
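To make the data-construction step concrete, here is a minimal Python sketch of the supervised fine-tuning pipeline as summarized above. It is not the authors' released code: token-overlap Jaccard similarity stands in for the embedding-based semantic clustering, the similarity threshold and sample count are illustrative, and rather than calling GPT-4 the sketch only assembles the first-person summarization prompt that would be sent to it.

```python
# Sketch of the SaySelf SFT data-construction step (illustrative, not the
# released implementation). Jaccard token overlap is a cheap stand-in for
# the semantic similarity used to cluster sampled responses.

def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two responses."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)

def cluster_responses(responses: list[str], threshold: float = 0.7) -> list[list[int]]:
    """Greedy clustering: each response joins the first cluster whose
    representative it resembles, otherwise it starts a new cluster."""
    clusters: list[list[int]] = []
    for i, r in enumerate(responses):
        for c in clusters:
            if jaccard(r, responses[c[0]]) >= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def build_sft_example(question: str, responses: list[str]) -> dict:
    clusters = sorted(cluster_responses(responses), key=len, reverse=True)
    representatives = [responses[c[0]] for c in clusters]  # one per cluster
    # Confidence from consistency: the share of samples in the largest cluster.
    confidence = len(clusters[0]) / len(responses)
    # First-person prompt for GPT-4 to summarize the uncertainty revealed by
    # the inconsistent responses (the self-reflective rationale).
    rationale_prompt = (
        "Speaking as the model in the first person, summarize what specific "
        f"knowledge I am uncertain about, given that for the question "
        f"{question!r} I gave these mutually inconsistent answers: "
        f"{representatives!r}"
    )
    return {"question": question,
            "answer": representatives[0],
            "confidence": round(confidence, 2),
            "rationale_prompt": rationale_prompt}

# Example: 10 sampled answers, 7 of which agree, yields confidence 0.7.
samples = ["paris is the capital"] * 7 + ["lyon is the capital"] * 3
print(build_sft_example("What is the capital of France?", samples)["confidence"])
```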
The code for SaySelf is publicly available.
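For intuition before consulting the released code: the summary does not spell out the exact reward formula, so the following is only an illustrative reward with the stated properties, rewarding correct answers in proportion to the confidence expressed and penalizing incorrect answers in the same proportion. The signed linear form and the function name are assumptions for this sketch; in training, such a scalar reward would be fed to a standard policy-gradient method.

```python
# Illustrative calibration reward (an assumption, not necessarily the
# paper's exact function): confident correct answers earn the most, and
# confident wrong answers cost the most, pushing the policy toward
# accurate, high-confidence predictions and away from overconfident errors.

def calibration_reward(is_correct: bool, confidence: float) -> float:
    """confidence lies in [0, 1]; reward is +confidence when the answer is
    right and -confidence when it is wrong."""
    assert 0.0 <= confidence <= 1.0
    return confidence if is_correct else -confidence

# A correct answer at 0.9 confidence earns 0.9; the same confidence on a
# wrong answer costs 0.9, while a hedged wrong answer at 0.2 costs only 0.2.
print(calibration_reward(True, 0.9), calibration_reward(False, 0.9),
      calibration_reward(False, 0.2))
```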