Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models

14 Mar 2024 | Chirag Agarwal, Sree Harsha Tanneru, Himabindu Lakkaraju
This paper explores the tension between faithfulness and plausibility in self-explanations generated by Large Language Models (LLMs). While LLMs are capable of producing explanations that appear logical and coherent to humans (plausibility), these explanations may not accurately reflect the model's internal reasoning process (faithfulness). The authors argue that faithfulness is critical in high-stakes applications where accurate decision-making is essential. They emphasize the need for a systematic understanding of the faithfulness and plausibility requirements of different real-world applications to ensure that explanations meet these needs. While there are many approaches to improving plausibility, improving faithfulness remains an open challenge. The paper calls for the development of novel methods to enhance the faithfulness of self-explanations, enabling transparent deployment of LLMs in diverse high-stakes settings. The authors highlight the importance of balancing plausibility and faithfulness, noting that in some applications faithful explanations matter more than plausible ones, and vice versa. They propose three potential directions for future research: fine-tuning approaches, in-context learning, and mechanistic interpretability. The paper concludes by calling on the community to prioritize the development of reliable metrics and strategies to enhance the faithfulness of LLM explanations, ensuring that they are both accurate and transparent.
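
The paper does not prescribe a specific faithfulness metric, but one common family of probes is perturbation (erasure) testing: if a self-explanation cites certain input words as the basis for the answer, removing those words should change the model's answer. The sketch below illustrates this idea; the `query_model` callable and the toy model are hypothetical stand-ins, not part of the paper.

```python
from typing import Callable

def erasure_faithfulness(
    query_model: Callable[[str], str],
    question: str,
    cited_words: list[str],
) -> bool:
    """Necessary-but-not-sufficient faithfulness check: if the explanation
    cites `cited_words` as the basis for the answer, erasing them from the
    input should flip the model's answer."""
    original_answer = query_model(question)

    # Erase every word the explanation claims the answer depends on.
    perturbed = question
    for word in cited_words:
        perturbed = perturbed.replace(word, "")

    return query_model(perturbed) != original_answer

# Toy usage with a stand-in "model" whose answer hinges on the word "aspirin".
toy_model = lambda text: "yes" if "aspirin" in text else "unsure"
print(erasure_faithfulness(toy_model, "Should the patient take aspirin daily?", ["aspirin"]))
# -> True: erasing the cited word changes the answer, so the explanation is at
#    least behaviorally consistent with the model on this input.
```

A passing check does not prove the explanation is faithful; it only rules out explanations that cite inputs the model demonstrably ignores, which is why the paper calls for more reliable metrics.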