This paper compares Retrieval Augmented Generation (RAG) and long-context (LC) Large Language Models (LLMs) to leverage their strengths in handling lengthy contexts. RAG has been effective in processing long contexts by retrieving relevant information, while recent LLMs like Gemini-1.5 and GPT-4 can understand long contexts directly. The study benchmarks RAG and LC on various public datasets using three recent LLMs: Gemini-1.5-Pro, GPT-4o, and GPT-3.5-Turbo. Results show that LC consistently outperforms RAG when resources are sufficient, but RAG remains cost-efficient. To combine the benefits of both, the authors propose SELF-ROUTE, a method that routes queries to RAG or LC based on model self-reflection, significantly reducing computational costs while maintaining performance comparable to LC. The findings provide guidelines for practical applications of long-context LLMs using RAG and LC.
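The routing idea described above can be sketched in a few lines: first attempt a cheap RAG answer over retrieved chunks, and let the model itself declare the question unanswerable from those chunks, in which case the query falls back to the full long context. This is a minimal illustration, not the authors' exact implementation; the `llm` callable, the prompt wording, and the "unanswerable" sentinel are assumptions for the example.

```python
def self_route(query, retrieved_chunks, full_context, llm):
    """Route a query to RAG or long-context (LC) answering.

    `llm` is assumed to be a callable taking a prompt string and
    returning a string; all names here are illustrative.
    """
    # Step 1 (cheap): try to answer from the retrieved chunks only,
    # asking the model to self-reflect on whether that is possible.
    rag_prompt = (
        "Answer the question using only the provided chunks. "
        "If it cannot be answered from them, reply exactly 'unanswerable'.\n"
        f"Chunks: {retrieved_chunks}\nQuestion: {query}"
    )
    rag_answer = llm(rag_prompt)
    if "unanswerable" not in rag_answer.lower():
        return rag_answer, "rag"  # RAG sufficed; avoided the long-context call

    # Step 2 (expensive fallback): feed the entire long context.
    lc_prompt = f"Context: {full_context}\nQuestion: {query}"
    return llm(lc_prompt), "lc"
```

Because most queries in the benchmarks can be answered from retrieved chunks alone, the expensive long-context call is only paid for the minority of queries that need it, which is where the cost savings come from.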