FinTextQA: A Dataset for Long-form Financial Question Answering


16 May 2024 | Jian Chen, Peilin Zhou, Yining Hua, Yingxin Loh, Kehui Chen, Ziyuan Li, Bing Zhu, Junwei Liang
**FinTextQA: A Dataset for Long-form Financial Question Answering**

This paper introduces *FinTextQA*, a novel dataset for long-form question answering (LFQA) in the finance domain. The dataset comprises 1,262 high-quality, source-attributed QA pairs drawn from finance textbooks and government agency websites, covering six question types with an average text length of 19.7k words. *FinTextQA* is the first LFQA dataset tailored specifically to finance, addressing the limited scope diversity and question complexity of existing financial QA datasets.

The authors also develop a Retrieval-Augmented Generation (RAG)-based LFQA system consisting of an embedder, retriever, reranker, and generator, and conduct a comprehensive evaluation using human ranking, automatic metrics, and GPT-4 scoring to benchmark different system configurations under noisy conditions. Key findings include:

1. **Model performance**: Baichuan2-7B competes closely with GPT-3.5-turbo in accuracy.
2. **Best system configuration**: The most effective configuration pairs Ada2 as the embedder, Automated Merged Retrieval as the retriever, Bge-Reranker-Base as the reranker, and Baichuan2-7B as the generator.
3. **Noise resistance**: Models become less susceptible to noise once context length reaches a certain threshold.

The paper highlights the importance of integrating financial regulations and policies into LFQA tasks and provides a rich framework for building and assessing general finance LFQA systems. The experimental analysis underscores the need to improve both the precision and explainability of current financial question-answering methodologies.
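The four-stage pipeline described above (embed, retrieve, rerank, generate) can be sketched as follows. This is a minimal toy illustration of the stage interfaces only: every function here is a hypothetical stand-in (a bag-of-characters embedder, cosine-similarity retrieval, term-overlap reranking, a templated generator), not the paper's actual components such as Ada2, Automated Merged Retrieval, Bge-Reranker-Base, or Baichuan2-7B.

```python
def embed(text, dim=16):
    """Toy bag-of-characters embedder (stand-in for a real embedding model)."""
    vec = [0.0] * dim
    for ch in text.lower():
        vec[ord(ch) % dim] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]

def retrieve(query_vec, corpus, corpus_vecs, k=3):
    """Rank corpus passages by cosine similarity to the query vector."""
    scored = sorted(
        zip(corpus, corpus_vecs),
        key=lambda dv: -sum(q * d for q, d in zip(query_vec, dv[1])),
    )
    return [doc for doc, _ in scored[:k]]

def rerank(query, candidates, k=2):
    """Toy reranker: score by query-term overlap (stand-in for a cross-encoder)."""
    terms = set(query.lower().split())
    return sorted(candidates, key=lambda d: -len(terms & set(d.lower().split())))[:k]

def generate(query, contexts):
    """Stand-in generator: splice the top contexts into a templated answer."""
    return f"Q: {query}\nBased on: " + " | ".join(contexts)

if __name__ == "__main__":
    corpus = [
        "Bonds pay fixed interest.",
        "Stocks represent equity shares.",
        "Banks hold customer deposits.",
    ]
    query = "what interest do bonds pay"
    vecs = [embed(d) for d in corpus]
    candidates = retrieve(embed(query), corpus, vecs, k=2)
    print(generate(query, rerank(query, candidates, k=1)))
```

The point of the sketch is the data flow: the retriever narrows the corpus cheaply, the reranker re-scores the survivors with a more precise (in practice, more expensive) model, and only the final few contexts reach the generator, which is what limits noise in long-context settings.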