7 Mar 2024 | Heydar Soudani, Evangelos Kanoulas, Faegheh Hasibi
This paper compares Retrieval-Augmented Generation (RAG) and fine-tuning (FT) for improving the performance of large language models (LLMs) on low-frequency factual knowledge. The study evaluates the two approaches on the PopQA dataset, which contains questions about less popular entities. The results show that fine-tuning improves performance across all entity popularity levels, with the largest gains on the most and least popular entities, while RAG consistently outperforms fine-tuning alone, particularly when the two are combined in smaller models. The effectiveness of both approaches increases with better retrieval and data augmentation models. The study also highlights the importance of synthetic data quality, showing that higher-quality synthetic data leads to better fine-tuning results. Overall, the findings suggest that RAG is the more effective approach for less popular knowledge, especially in combination with fine-tuning, although its advantage diminishes as model size grows. Future work will focus on developing more effective methods for synthetic data generation.
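To make the RAG side of the comparison concrete, here is a minimal retrieve-then-generate sketch. It is not the paper's implementation: the corpus, the lexical-overlap retriever, the prompt format, and the `generate` function are all illustrative stand-ins (in practice the paper pairs a real retriever with a base or fine-tuned LLM).

```python
# Minimal sketch of retrieval-augmented generation: retrieve the most
# relevant passage(s) for a question, prepend them to the prompt, and
# let the model (base or fine-tuned) answer. All names here are
# hypothetical placeholders, not the paper's setup.

from collections import Counter

CORPUS = [
    "George Rankin is an Australian politician.",
    "PopQA contains questions about long-tail entities.",
    "RAG prepends retrieved passages to the model's prompt.",
]

def score(query: str, passage: str) -> int:
    """Crude lexical overlap: count shared lowercase tokens."""
    q = Counter(query.lower().split())
    p = Counter(passage.lower().split())
    return sum((q & p).values())

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k passages with the highest overlap score."""
    return sorted(CORPUS, key=lambda p: score(query, p), reverse=True)[:k]

def generate(prompt: str) -> str:
    # Stand-in for a real LLM call; swap in your model client here.
    return f"[model output for prompt of {len(prompt)} chars]"

def rag_answer(question: str) -> str:
    """Prepend retrieved context to the question before generation."""
    context = "\n".join(retrieve(question))
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)

if __name__ == "__main__":
    print(rag_answer("What is George Rankin's occupation?"))
```

The design point the paper probes is exactly this split of responsibilities: RAG injects low-frequency facts at inference time through the retrieved context, whereas fine-tuning bakes them into the model's weights, which is why retrieval quality drives the former and synthetic-data quality drives the latter.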