Can AI Assistants Know What They Don’t Know?

28 Jan 2024 | Qinyuan Cheng, Tianxiang Sun, Xiangyang Liu, Wenwei Zhang, Zhangyue Yin, Shimin Li, Linyang Li, Zhengfu He, Kai Chen, Xipeng Qiu
This paper explores whether AI assistants can recognize, and express in natural language, what they do not know. The authors construct a model-specific "I don't know" (Idk) dataset for an AI assistant, containing both questions the assistant knows the answers to and questions it does not. After aligning the assistant with this Idk dataset, they examine whether it can refuse to answer the questions it does not know. The results show that after alignment, the assistant refuses most unknown questions, and the accuracy of the answers it does attempt is significantly higher than before alignment. The study also compares how different methods (prompting, supervised fine-tuning, and preference-aware optimization) affect the assistant's ability to recognize and express its lack of knowledge. The findings suggest that larger models are better at distinguishing known from unknown questions, and that preference-aware optimization can mitigate the model's tendency to refuse questions it actually knows. Overall, the paper shows that aligning AI assistants with an Idk dataset can make them more truthful and reduce hallucinations.
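The dataset-construction step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a question counts as "known" when the fraction of sampled answers judged correct reaches a threshold (10 samples and a threshold of 1.0 are assumed here), and it uses a hypothetical `sample_answers` callable standing in for the assistant, a hypothetical `is_correct` checker, and an illustrative refusal string. The names `IdkExample`, `build_idk_dataset`, `toy_sampler`, and `exact_match` are all hypothetical. Each example carries a supervised fine-tuning target plus a dispreferred response that could be used to form preference pairs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical refusal template; the exact wording is an assumption.
REFUSAL = "I don't know the answer to this question."


@dataclass
class IdkExample:
    question: str
    known: bool
    target: str              # SFT target: gold answer if known, else the refusal
    rejected: Optional[str]  # dispreferred response for a preference pair


def build_idk_dataset(
    qa_pairs: List[Tuple[str, str]],                  # (question, gold_answer)
    sample_answers: Callable[[str, int], List[str]],  # hypothetical model sampler
    is_correct: Callable[[str, str], bool],           # hypothetical answer checker
    num_samples: int = 10,                            # assumed sample count
    threshold: float = 1.0,                           # assumed "known" threshold
) -> List[IdkExample]:
    dataset = []
    for question, gold in qa_pairs:
        samples = sample_answers(question, num_samples)
        if not samples:
            raise ValueError(f"no samples returned for {question!r}")
        # Fraction of sampled answers that the checker judges correct.
        accuracy = sum(is_correct(s, gold) for s in samples) / len(samples)
        if accuracy >= threshold:
            # Known question: prefer answering, disprefer refusing
            # (a pair of this kind targets over-refusal).
            dataset.append(IdkExample(question, True, gold, REFUSAL))
        else:
            # Unknown question: prefer refusing, disprefer one of the
            # model's own wrong answers.
            wrong = next((s for s in samples if not is_correct(s, gold)), None)
            dataset.append(IdkExample(question, False, REFUSAL, wrong))
    return dataset


# Toy stand-ins for illustration: a "model" that always answers "Paris".
def toy_sampler(question: str, n: int) -> List[str]:
    return ["Paris"] * n


def exact_match(answer: str, gold: str) -> bool:
    return answer.strip().lower() == gold.strip().lower()


if __name__ == "__main__":
    data = build_idk_dataset(
        [("Capital of France?", "Paris"), ("Capital of Peru?", "Lima")],
        toy_sampler,
        exact_match,
    )
    for ex in data:
        print(ex.known, ex.target, "|", ex.rejected)
    # Prints:
    # True Paris | I don't know the answer to this question.
    # False I don't know the answer to this question. | Paris
```

Pairing known questions with a refusal as the rejected response is one plausible way to discourage the over-refusal that preference-aware optimization is said to mitigate; the resulting (question, target, rejected) triples could feed a preference method such as DPO, though the exact recipe used in the paper may differ.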