SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales


5 Jun 2024 | Tianyang Xu1*, Shujin Wu3*, Shizhe Diao1, Xiaoze Liu1, Xingyao Wang2, Yangyi Chen2†, Jing Gao1†
The paper introduces SaySelf, a training framework designed to teach large language models (LLMs) to generate more accurate and fine-grained confidence estimates, along with self-reflective rationales. SaySelf addresses the common issues of LLMs producing inaccurate or fabricated information and failing to indicate their confidence levels. The framework consists of two main stages: supervised fine-tuning and reinforcement learning from task supervision. In the supervised fine-tuning stage, SaySelf constructs a model-specific dataset by analyzing multiple sampled responses from LLMs, clustering them based on semantic similarity, and generating self-reflective rationales and confidence estimates. In the reinforcement learning stage, a carefully designed reward function is used to calibrate the confidence estimates, encouraging LLMs to produce accurate, high-confidence predictions while penalizing overconfidence in incorrect outputs. Experimental results on various datasets demonstrate that SaySelf effectively reduces calibration errors, maintains task performance, and generates reasonable self-reflective rationales. The code for SaySelf is made publicly available.
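
To make the two stages more concrete, below is a minimal, illustrative sketch in Python of the ideas the abstract describes: sample several answers, group near-duplicates, read off a confidence estimate from cluster agreement, and score confidence statements with a calibration-style reward. The paper clusters by semantic similarity and uses its own carefully designed reward; the string-similarity proxy, the threshold, and all function names here are assumptions for illustration only, not the authors' implementation.

```python
# Sketch of sampling-based confidence estimation and a calibration-style reward.
# Assumptions: string similarity stands in for the paper's semantic clustering,
# and the reward below is a plausible shape, not SaySelf's exact formula.
from difflib import SequenceMatcher
from typing import List


def cluster_responses(responses: List[str], threshold: float = 0.8) -> List[List[str]]:
    """Greedily group responses whose pairwise string similarity exceeds `threshold`."""
    clusters: List[List[str]] = []
    for resp in responses:
        for cluster in clusters:
            if SequenceMatcher(None, resp.lower(), cluster[0].lower()).ratio() >= threshold:
                cluster.append(resp)
                break
        else:  # no existing cluster matched, start a new one
            clusters.append([resp])
    return clusters


def confidence_from_samples(responses: List[str]) -> float:
    """Confidence = fraction of sampled answers that fall in the largest cluster."""
    clusters = cluster_responses(responses)
    return max(len(c) for c in clusters) / len(responses)


def calibration_reward(confidence: float, is_correct: bool) -> float:
    """Reward confident correct answers, penalize confident wrong ones."""
    return confidence if is_correct else -confidence


if __name__ == "__main__":
    samples = ["Paris", "paris", "Paris, France", "Lyon", "Paris"]
    conf = confidence_from_samples(samples)
    print(f"estimated confidence: {conf:.2f}")        # agreement among samples
    print("reward if correct:", calibration_reward(conf, True))
    print("reward if wrong:  ", calibration_reward(conf, False))
```

In this toy run the majority cluster covers three of five samples, so the model would be trained to verbalize a moderate confidence; the reward then pushes confidence up when the clustered answer is correct and down when it is not, which is the calibration pressure the abstract attributes to the reinforcement learning stage.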