16 Mar 2024 | Antonio Jimeno Yepes, Yao You, Jan Milczek, Sebastian Laverde, and Leah Li
The paper "Financial Report Chunking for Effective Retrieval Augmented Generation" by Antonio Jimeno Yepes, Yao You, Jan Milczek, Sebastian Laverde, and Leah Li explores the importance of effective chunking in Retrieval Augmented Generation (RAG) for financial reports. The authors propose an expanded approach to chunking by focusing on structural elements of documents rather than just paragraph-level chunking. They introduce a novel framework that evaluates how chunking based on element types annotated by document understanding models contributes to the overall context and accuracy of the information retrieved.
The study uses the FinanceBench dataset, which includes 84 unique reports with 141 questions, to evaluate different chunking strategies. The results show that element-based chunking strategies outperform basic chunking methods in terms of retrieval accuracy and question-answering (Q&A) accuracy. The element-based method also demonstrates better efficiency, requiring fewer chunks to achieve superior retrieval scores. The authors conclude that their approach, which they call "element-based chunking," improves state-of-the-art Q&A performance and provides a more generalizable solution for RAG tasks. Future work includes evaluating the method in other domains and studying the impact of RAG configuration and additional element types.
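To make the idea of chunking by structural elements more concrete, here is a minimal sketch, not the authors' implementation. It assumes a document understanding model has already labelled each piece of the report with an element type. The `Element` class, the labels `"Title"` and `"Table"`, the `element_chunks` function, and the `max_chars` default are all hypothetical names and values chosen for illustration. The assumed rule is that a title starts a new chunk, a table is kept whole as its own chunk, and other elements are merged until a size limit is reached.

```python
from dataclasses import dataclass


@dataclass
class Element:
    kind: str  # hypothetical labels, e.g. "Title", "NarrativeText", "Table"
    text: str


def element_chunks(elements, max_chars=2048):
    """Group consecutive elements into chunks, breaking at titles and tables.

    max_chars is an assumed limit for illustration, not a value taken
    from the summary above.
    """
    chunks = []
    current = []  # texts collected for the chunk being built
    size = 0      # total characters collected so far

    def flush():
        nonlocal current, size
        if current:
            chunks.append("\n".join(current))
        current, size = [], 0

    for el in elements:
        if el.kind == "Table":
            # Keep each table intact as a chunk of its own.
            flush()
            chunks.append(el.text)
            continue
        # A title, or exceeding the size limit, closes the current chunk.
        if el.kind == "Title" or (current and size + len(el.text) > max_chars):
            flush()
        current.append(el.text)
        size += len(el.text)

    flush()
    return chunks
```

For example, a sequence of a title, a narrative paragraph, a table, another paragraph, a second title, and a final paragraph yields four chunks under these assumptions: the first title with its paragraph, the table alone, the lone paragraph, and the second title with its paragraph. A fixed-size splitter, by contrast, could cut the table or separate a heading from the text it introduces.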