Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering

17 Oct 2022 | Pan Lu, Swaroop Mishra, Tony Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, Ashwin Kalyan
The paper introduces SCIENCEQA, a new benchmark for science question answering that contains 21,208 multimodal multiple-choice questions spanning diverse science topics, each annotated with the correct answer, a lecture, and an explanation. The dataset addresses gaps in existing benchmarks by providing rich domain diversity together with detailed explanations. The authors design language models that learn to generate the lecture and explanation as a *chain of thought* (CoT), mimicking the human reasoning process. Experiments show that CoT improves language-model performance on SCIENCEQA: UnifiedQA gains 3.99% when fine-tuned and GPT-3 gains 1.20% in few-shot learning. The authors also probe an upper bound by feeding the gold explanations into the input, which improves GPT-3's few-shot performance by 18.96%. Further analysis shows that, with explanations, language models learn effectively from less data. The paper contributes to the development of reliable AI systems capable of multi-hop reasoning and of generating coherent explanations.
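The CoT setup can be pictured as a prompting format: the model is shown the question and its answer options, and is expected to produce the answer followed by a lecture and an explanation. Below is a minimal sketch of such a prompt builder; the template, field names, and example content are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of a chain-of-thought (CoT) prompt for a ScienceQA-style
# multiple-choice question. The template, the in-context demonstration, and
# the field names are illustrative assumptions, not the paper's released code.

def format_example(question, choices, answer=None, lecture=None, explanation=None):
    """Render one question; if answer/lecture/explanation are given, the
    example also shows the CoT target the model should learn to produce."""
    options = " ".join(f"({chr(ord('A') + i)}) {c}" for i, c in enumerate(choices))
    text = f"Question: {question}\nOptions: {options}\n"
    if answer is not None:
        text += (f"Answer: The answer is ({answer}).\n"
                 f"Lecture: {lecture}\n"
                 f"Explanation: {explanation}\n")
    return text

# One in-context demonstration (hypothetical content) followed by a test question.
demo = format_example(
    question="Which of these is a source of light?",
    choices=["the Moon", "the Sun"],
    answer="B",
    lecture="Some objects, like the Sun, give off their own light.",
    explanation="The Moon only reflects sunlight, so the Sun is the light source.",
)
test = format_example(
    question="Which property do these objects share?",
    choices=["hard", "soft"],
)

prompt = demo + "\n" + test + "Answer:"
print(prompt)  # This string would be sent to a few-shot language model such as GPT-3.
```

In this sketch, the few-shot demonstration ends with the answer, lecture, and explanation in sequence, so the model is encouraged to emit the same CoT-style continuation for the unanswered test question.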