This paper introduces a new benchmark, RetrievalQA, to evaluate adaptive retrieval-augmented generation (ARAG) methods for short-form open-domain question answering (QA). RetrievalQA consists of 1,271 questions covering new world and long-tail knowledge, where the information needed to answer is absent from large language models (LLMs). The authors find that calibration-based methods rely heavily on threshold tuning, while vanilla prompting is inadequate for guiding LLMs to make reliable retrieval decisions. They propose Time-Aware Adaptive REtrieval (TA-ARE), a simple yet effective method that helps LLMs assess the necessity of retrieval without calibration or additional training. TA-ARE uses in-context learning to enhance models' awareness of time and provides relevant demonstrations, significantly improving both retrieval accuracy and answer match accuracy.
The paper also discusses the dataset's limitations and future work, emphasizing the need for more efficient and rigorous filtering methods and further research on prompt tuning.
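The time-aware prompting idea can be illustrated with a minimal sketch: inject today's date into the prompt and include demonstrations of questions that do or do not require retrieval. The function name, demonstration wording, and prompt template below are illustrative assumptions, not the paper's actual template.

```python
from datetime import date
from typing import Optional

def build_retrieval_decision_prompt(question: str,
                                    today: Optional[date] = None) -> str:
    """Hypothetical TA-ARE-style prompt builder: asks an LLM whether
    answering `question` requires external retrieval, with today's date
    (time awareness) and two in-context demonstrations."""
    today = today or date.today()
    demonstrations = (
        "Question: Who won the most recent FIFA World Cup?\n"
        "Retrieval needed: Yes (the question concerns recent events)\n\n"
        "Question: What is the capital of France?\n"
        "Retrieval needed: No (a stable, well-known fact)\n\n"
    )
    return (
        f"Today's date is {today.isoformat()}. Decide whether answering "
        "the following question requires retrieving external documents.\n\n"
        + demonstrations
        + f"Question: {question}\nRetrieval needed:"
    )
```

The resulting string would be sent to the LLM, whose "Yes"/"No" completion decides whether the retrieval step is invoked; no threshold calibration or extra training is involved.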