From Understanding to Utilization: A Survey on Explainability for Large Language Models

22 Feb 2024 | Haoyan Luo, Lucia Specia
This survey examines the importance of explainability in Large Language Models (LLMs) and presents a comprehensive overview of current methods for understanding and utilizing these models. As LLMs become integrated into a growing range of applications, their "black-box" nature raises concerns about transparency and ethical use. The paper focuses on pre-trained Transformer-based LLMs, such as LLaMA, which pose unique interpretability challenges due to their scale and complexity.

The survey categorizes existing explainability methods into local and global analyses according to their explanatory objectives. Local analysis covers feature attribution and the analysis of individual Transformer blocks, while global analysis encompasses probing-based methods and mechanistic interpretability.

Beyond understanding, the paper explores how these insights can be used to debug and improve models, focusing on three applications: model editing, capability enhancement, and controlled generation. It also addresses the challenges of hallucination and ethical alignment in LLMs, discussing techniques that leverage explainability to reduce hallucination and mitigate social biases.

Finally, the paper examines representative evaluation metrics and datasets for assessing the plausibility and truthfulness of explanations, highlighting their advantages and limitations. The overall goal is to bridge the gap between theoretical and empirical understanding on one side and practical implementation on the other. The survey concludes that explainability is crucial for ensuring that LLMs are transparent, fair, and beneficial, and it highlights open problems and directions for future research.
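To make the survey's local-versus-global distinction concrete, the sketch below illustrates one local-analysis technique it covers: input-times-gradient feature attribution, which scores each input token by how strongly its embedding influences the model's next-token prediction. This is a generic illustration, not code from the paper; it assumes a Hugging Face causal LM, and the model choice ("gpt2", a small stand-in for LLaMA-scale models) and the decision to score the top next-token logit are illustrative assumptions.

```python
# Minimal sketch of input-x-gradient feature attribution for a causal LM.
# Illustrative only: "gpt2" is a small stand-in for a LLaMA-scale model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("The capital of France is", return_tensors="pt")

# Embed the tokens ourselves so we can take gradients w.r.t. the embeddings.
embeds = model.get_input_embeddings()(inputs["input_ids"]).detach()
embeds.requires_grad_(True)

logits = model(inputs_embeds=embeds,
               attention_mask=inputs["attention_mask"]).logits
# Attribution target: the logit of the most likely next token.
logits[0, -1].max().backward()

# Input-x-gradient, summed over the embedding dim -> one score per token.
scores = (embeds.grad * embeds).sum(dim=-1).squeeze(0)
for tok, s in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]),
                  scores.tolist()):
    print(f"{tok:>12}  {s:+.4f}")
```

By contrast, a probing-based method from the survey's global-analysis category would freeze the model, collect hidden states over a labeled corpus, and train a small classifier on those states to test which linguistic or factual properties the representations encode.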