HILL: A Hallucination Identifier for Large Language Models


May 11–16, 2024, Honolulu, HI, USA | Florian Leiser, Sven Eckhardt, Valentin Leuthe, Merlin Knaeble, Alexander Maedche, Gerhard Schwabe, Ali Sunyaev
**Abstract:** Large language models (LLMs) are prone to generating hallucinations: nonsensical, unfaithful, and undesirable responses. Users often overrely on these responses, leading to misinterpretations and errors. To address this issue, the authors propose HILL, the *Hallucination Identifier for Large Language Models*. They first identified design features for HILL through a Wizard of Oz (WoZ) study with nine participants, then implemented HILL based on these features and evaluated its interface design through a survey of 17 participants. HILL's functionality was further tested on an existing question-answering dataset and in five user interviews. The results show that HILL can correctly identify and highlight hallucinations in LLM responses, enabling users to handle these responses with more caution. The authors propose an easy-to-implement adaptation to existing LLMs and emphasize the importance of user-centered design in AI artifacts.

**Keywords:** ChatGPT, Large Language Models, Artificial Hallucinations, Wizard of Oz, Artifact Development

**Contributions:**

- **Design Features:** HILL incorporates design features such as confidence scores, source links, monetary interest indicators, and political spectrum disclosures to help users identify potential hallucinations.
- **Implementation:** HILL is developed as a web application that integrates with OpenAI's ChatGPT API, providing an intuitive user interface for identifying and addressing hallucinations (see the integration sketch below).
- **Evaluation:** HILL is evaluated through a survey, a performance validation on the Stanford Question Answering Dataset (SQuAD 2.0), and user interviews, demonstrating its effectiveness in reducing overreliance on LLMs (see the evaluation sketch below).

**Conclusion:** HILL addresses overreliance on LLMs by giving users tools to identify and handle hallucinations. The proposed design features and implementation offer a practical way to improve the reliability and trustworthiness of LLM responses.
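The summary describes HILL as a web application wrapped around OpenAI's ChatGPT API that annotates responses with confidence scores. The following is a minimal sketch of what such an integration could look like, assuming the current `openai` Python SDK; the per-sentence self-rating heuristic in `ask_with_confidence` is a hypothetical placeholder, not the confidence mechanism described by the authors.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment


def ask_with_confidence(question: str, model: str = "gpt-3.5-turbo") -> list[dict]:
    """Forward a question to the ChatGPT API and annotate each sentence
    of the answer with a rough 0-100 confidence score."""
    answer = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content

    annotated = []
    for sentence in (s.strip() for s in answer.split(".") if s.strip()):
        # Hypothetical heuristic: ask the model to rate its own statement.
        # HILL's actual confidence score may be computed differently.
        rating = client.chat.completions.create(
            model=model,
            messages=[{
                "role": "user",
                "content": "On a scale of 0-100, how confident are you that this "
                           f"statement is factually correct? Reply with a number only.\n\n{sentence}",
            }],
        ).choices[0].message.content
        try:
            score = int(rating.strip())
        except ValueError:
            score = 0  # unparsable rating -> treat as lowest confidence
        annotated.append({"sentence": sentence, "confidence": score})
    return annotated


# Sentences with low scores could then be highlighted as potential hallucinations.
for item in ask_with_confidence("Who wrote the novel Moby-Dick?"):
    flag = "POSSIBLE HALLUCINATION" if item["confidence"] < 50 else "ok"
    print(f"[{flag}] ({item['confidence']}) {item['sentence']}")
```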
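The performance validation is reported on SQuAD 2.0. As a rough illustration, the sketch below loads the dataset via the Hugging Face `datasets` library and applies the standard normalized exact-match comparison; this is an assumed, generic evaluation loop, not necessarily the exact protocol the authors used.

```python
import re
import string

from datasets import load_dataset  # Hugging Face `datasets` package


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace (SQuAD convention)."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """True if the prediction matches any gold answer; SQuAD 2.0 marks
    unanswerable questions with an empty gold-answer list."""
    if not gold_answers:
        return normalize(prediction) == ""
    return any(normalize(prediction) == normalize(g) for g in gold_answers)


squad = load_dataset("squad_v2", split="validation")
example = squad[0]
prediction = "France"  # stand-in for an LLM answer to example["question"]
print(example["question"])
print(exact_match(prediction, example["answers"]["text"]))
```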