13 Jun 2024 | Zhaochen Su, Juntao Li, Jun Zhang, Tong Zhu, Xiaoye Qu, Pan Zhou, Bowen Yan, Yu Cheng, Min Zhang
Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning?
This paper introduces CoTempQA, a comprehensive co-temporal question-answering benchmark with four scenarios (Equal, Overlap, During, Mix) and 4,748 samples for evaluating large language models (LLMs) on co-temporal reasoning. Current LLMs struggle with co-temporal tasks, even when using Chain-of-Thought (CoT) prompting. The authors find that mathematical reasoning plays a key role in handling co-temporal events and propose a math-based strategy (MR-CoT) to improve LLMs' co-temporal reasoning. Experiments show that MR-CoT improves performance by 10.8 points over existing baselines; however, LLMs still fall short of human-level performance, indicating the need for further improvement. The paper also discusses the limitations of current datasets and the importance of modeling concurrent temporal relationships in real-world scenarios. The authors hope that their CoTempQA dataset will encourage further research into improving LLMs' co-temporal reasoning abilities.