When to Trust LLMs: Aligning Confidence with Response Quality


9 Jun 2024 | Shuchang Tao, Liuyi Yao, Hanxing Ding, Yuexiang Xie, Qi Cao, Fei Sun, Jinyang Gao, Huawei Shen, Bolin Ding
This paper proposes CONQORD, a confidence-quality-order-preserving alignment approach for large language models (LLMs). The method uses reinforcement learning with a dual-component reward function to align expressed confidence with response quality: a quality reward that evaluates how good a response is, and an order-preserving alignment reward that enforces consistency between the two by encouraging the model to assign higher confidence to higher-quality responses (see the sketch below).
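The paper's exact reward formulation is not reproduced in this summary; the following is a minimal sketch of such a dual-component reward, where the pairwise sign-agreement form, the `quality` inputs, and the weighting coefficient `alpha` are illustrative assumptions rather than the authors' definitions.

```python
from typing import List

def sign(x: float) -> int:
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)

def alignment_reward(qualities: List[float], confidences: List[float]) -> float:
    """Order-preserving term: positive when the confidence ordering across
    response pairs agrees with the quality ordering, negative when inverted.
    (Hypothetical pairwise form, not necessarily the paper's.)"""
    total, pairs = 0.0, 0
    n = len(qualities)
    for i in range(n):
        for j in range(i + 1, n):
            total += sign(qualities[i] - qualities[j]) * sign(confidences[i] - confidences[j])
            pairs += 1
    return total / max(pairs, 1)

def dual_component_reward(quality: float, qualities: List[float],
                          confidences: List[float], alpha: float = 0.5) -> float:
    """Quality reward plus the order-preserving alignment reward,
    weighted by an assumed coefficient alpha."""
    return quality + alpha * alignment_reward(qualities, confidences)
```

Under this sketch, a batch of sampled responses whose verbalized confidences rank in the same order as their quality scores earns the maximum alignment term, while inverted rankings are penalized.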
Experiments on benchmark datasets, including TruthfulQA and NQ, show that CONQORD significantly improves the alignment between confidence and response accuracy without inducing over-cautiousness, and that it outperforms existing methods at confidence alignment. The aligned confidence indicates when to trust an LLM and can act as a trigger for retrieving external knowledge: in adaptive retrieval tasks, confidence scores guide whether retrieval is activated (a sketch follows below), and the results confirm that CONQORD's confidence reliably reflects the trustworthiness of the model's outputs. The method is also shown to be more robust and generalizable than previous approaches. By aligning confidence with response quality, CONQORD yields more transparent, reliable, and trustworthy responses, contributing a new approach to confidence calibration in LLMs.
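As an illustration of the adaptive retrieval use case, here is a minimal sketch of confidence-gated retrieval; the `llm` and `retriever` callables, the prompt format, and the 0.7 threshold are hypothetical placeholders, not the paper's implementation.

```python
from typing import Callable, Tuple

def answer_with_adaptive_retrieval(
    question: str,
    llm: Callable[[str], Tuple[str, float]],   # returns (answer, aligned confidence)
    retriever: Callable[[str], str],           # returns external context
    threshold: float = 0.7,                    # assumed trust threshold
) -> str:
    """Use the model's aligned confidence to decide whether to retrieve."""
    answer, confidence = llm(question)
    if confidence >= threshold:
        # High aligned confidence: trust the model's parametric answer.
        return answer
    # Low confidence: fetch external knowledge and answer again grounded in it.
    context = retriever(question)
    grounded_answer, _ = llm(f"Context: {context}\n\nQuestion: {question}")
    return grounded_answer
```

The design point is that retrieval is only as useful as the signal that triggers it: a well-aligned confidence score lets the system skip retrieval when the model already knows the answer and fall back to external knowledge only when it does not.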