Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models

14 Mar 2024 | Chirag Agarwal, Sree Harsha Tanneru, Himabindu Lakkaraju
This paper explores the dichotomy between faithfulness and plausibility in self-explanations (SEs) generated by Large Language Models (LLMs). While LLMs are adept at producing plausible explanations that align with human reasoning, these explanations may not accurately reflect the model's internal decision-making processes, raising concerns about their faithfulness. The paper highlights the importance of ensuring faithfulness in high-stakes applications, such as healthcare, finance, and legal contexts, where incorrect explanations can lead to severe consequences. It discusses the current trend of prioritizing plausibility over faithfulness and the need for a systematic characterization of the faithfulness-plausibility requirements of different applications. The paper also reviews techniques for generating SEs, including chain-of-thought reasoning, token importance, and counterfactual explanations, and proposes methods to enhance faithfulness, such as fine-tuning on domain-specific datasets, in-context learning, and developing more interpretable models. The authors call on the community to develop novel methods that improve the faithfulness of SEs, enabling transparent and trustworthy deployment of LLMs in diverse high-stakes settings.
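To make the faithfulness concern concrete, below is a minimal sketch of one way a token-importance self-explanation can be probed with a counterfactual (ablation) test: if removing the tokens the model claims to have relied on does not change its prediction, the explanation is likely unfaithful. This is an illustrative example, not the paper's evaluation protocol; `query_llm` is a hypothetical stand-in for whatever LLM completion call is available, and the sentiment-classification task and prompts are assumptions.

```python
from typing import Callable, List


def counterfactual_faithfulness_check(
    query_llm: Callable[[str], str],   # hypothetical stand-in for an LLM completion call
    text: str,
    claimed_important_tokens: List[str],
) -> bool:
    """Rough faithfulness probe for a token-importance self-explanation.

    If the tokens the model *says* it relied on are ablated and the
    prediction does not change, the self-explanation is likely unfaithful
    to the model's actual decision process. Passing this check is a
    necessary, not sufficient, condition for faithfulness.
    """
    prompt = "Classify the sentiment of: {}\nAnswer with Positive or Negative."

    # Prediction on the original input.
    original_pred = query_llm(prompt.format(text)).strip()

    # Ablate the tokens the self-explanation marked as important.
    claimed = set(claimed_important_tokens)
    ablated_text = " ".join(tok for tok in text.split() if tok not in claimed)

    # Prediction on the counterfactual (ablated) input.
    ablated_pred = query_llm(prompt.format(ablated_text)).strip()

    # The explanation survives this probe only if removing its claimed
    # evidence actually flips the model's prediction.
    return original_pred != ablated_pred
```

A usage sketch: ask the model for a prediction plus the tokens it considered most important, then call `counterfactual_faithfulness_check` with those tokens; a return value of `False` indicates the stated evidence was not behaviorally necessary, i.e., a plausible but potentially unfaithful explanation.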