This paper introduces two novel approaches for identifying knowledge gaps in large language models (LLMs): COOPERATE and COMPETE, which leverage multi-LLM collaboration to improve the accuracy of abstaining from questions that fall outside a model's knowledge. The authors first review existing methods for LLM abstention, spanning calibration-based, training-based, prompting-based, and self-consistency-based approaches; these often depend on held-out sets and remain unreliable because of hallucinations and confirmation bias.

To address these limitations, the authors propose COOPERATE and COMPETE, which enlist multiple LLMs to assess the reliability of a generated answer. In COOPERATE, several expert LLMs provide feedback on the proposed answer, and a judge LLM decides whether to abstain. In COMPETE, the answering LLM is challenged by other LLMs presenting conflicting knowledge, and it abstains if it fails to maintain its original answer under the challenge.

The authors evaluate both approaches on four knowledge-intensive QA tasks with three LLMs and find that COOPERATE and COMPETE achieve up to 19.3% improvement in abstain accuracy over the strongest baseline. The results further show that the approaches are effective at identifying knowledge gaps in retrieval-augmented and multi-hop reasoning settings. The authors also discuss limitations, including computational overhead and potential biases in the collaborating LLMs. Overall, the study highlights the importance of identifying knowledge gaps in LLMs to improve their reliability and reduce hallucinations.
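
To make the two mechanisms concrete, here is a minimal sketch of how they might be orchestrated. It assumes a generic `ask(model, prompt)` helper wrapping an arbitrary chat-completion API; the prompts, the ABSTAIN/ANSWER verdict format, and the answer-containment check are illustrative assumptions, not the paper's exact implementation.

```python
# Hedged sketch of multi-LLM abstention in the spirit of COOPERATE and COMPETE.
# `ask` is a hypothetical helper wrapping any chat-completion API; prompts and
# decision rules below are illustrative assumptions, not the paper's method.

from typing import Callable, List

Ask = Callable[[str, str], str]  # (model_name, prompt) -> model response


def cooperate_abstain(ask: Ask, question: str, answer: str,
                      experts: List[str], judge: str) -> bool:
    """Ask several 'expert' LLMs to critique a proposed answer, then let a
    judge LLM read the feedback and decide whether to abstain."""
    feedback = [
        ask(expert,
            f"Question: {question}\nProposed answer: {answer}\n"
            "Briefly state whether this answer is factually correct and why.")
        for expert in experts
    ]
    verdict = ask(judge,
                  f"Question: {question}\nProposed answer: {answer}\n"
                  "Expert feedback:\n" + "\n".join(feedback) +
                  "\nBased on the feedback, should the answer be withheld? "
                  "Reply ABSTAIN or ANSWER.")
    return "ABSTAIN" in verdict.upper()


def compete_abstain(ask: Ask, question: str, answer: str,
                    answerer: str, challengers: List[str]) -> bool:
    """Have challenger LLMs generate conflicting evidence; abstain if the
    answering LLM changes its answer when confronted with it."""
    for challenger in challengers:
        conflicting = ask(challenger,
                          f"Question: {question}\n"
                          f"Write a short passage supporting an answer "
                          f"different from: {answer}")
        revised = ask(answerer,
                      f"Context: {conflicting}\nQuestion: {question}\n"
                      "Answer concisely.")
        if answer.strip().lower() not in revised.strip().lower():
            return True  # the model was swayed, so its knowledge is shaky
    return False  # the model held its ground across all challenges
```

In this sketch the substring containment test standing in for answer comparison is a deliberate simplification; any answer-equivalence check (exact match, embedding similarity, or an LLM grader) could be substituted without changing the overall abstention logic.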