The paper "ClashEval: Quantifying the tug-of-war between an LLM’s internal prior and external evidence" by Kevin Wu explores the challenges and behaviors of large language models (LLMs) when presented with conflicting information from their internal knowledge and retrieved external content. The study introduces ClashEval, a dataset of over 1200 questions across six domains, each with perturbed answers ranging from subtle to blatant errors. Six top-performing LLMs, including GPT-4o, are benchmarked on this dataset to assess their ability to handle conflicting information.
Key findings include:
1. **Context Bias**: LLMs override their correct prior knowledge with incorrect external content more than 60% of the time.
2. **Error Magnitude**: The more the retrieved content deviates from the truth, the less likely the model is to adopt it.
3. **Model Confidence**: Models are more likely to adopt incorrect external content when they are less confident in their initial responses.
4. **Improvement Methods**: A simple method that uses token probabilities to arbitrate between the model's prior and the retrieved context improves overall accuracy by 14% and reduces context bias by 20% (a minimal sketch follows this list).
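The token-probability idea in point 4 can be sketched as follows. This is a minimal illustration under assumptions: confidence is proxied by the mean token log probability of each candidate answer, and the `Response` container, `resolve_conflict` helper, and `margin` parameter are hypothetical; this is not the authors' exact implementation.

```python
# Minimal sketch of a token-probability heuristic for resolving prior vs.
# context conflicts. Hypothetical helpers and numbers; not the paper's exact method.

from dataclasses import dataclass


@dataclass
class Response:
    answer: str
    mean_token_logprob: float  # average log probability of the answer's tokens


def resolve_conflict(prior: Response, contextual: Response,
                     margin: float = 0.0) -> str:
    """Prefer the context-based answer only when the model is not clearly more
    confident in its prior answer (confidence proxied by mean token log prob)."""
    if prior.mean_token_logprob > contextual.mean_token_logprob + margin:
        return prior.answer       # model trusts its internal knowledge
    return contextual.answer      # defer to the retrieved context


# Example usage with made-up numbers:
prior = Response(answer="1969", mean_token_logprob=-0.05)
contextual = Response(answer="1971", mean_token_logprob=-1.20)
print(resolve_conflict(prior, contextual))  # -> "1969"
```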
The paper highlights the need for LLMs to better discern and reject incorrect external content while maintaining their internal knowledge, and provides a benchmark dataset and evaluations to facilitate future research in this area. The dataset and evaluations are open-sourced to encourage further development and improvement of LLMs' robustness and calibration.