Large Language Models Can Learn Temporal Reasoning

11 Jun 2024 | Siheng Xiong, Ali Payani, Ramana Kompella, Faramarz Fekri
This paper introduces TG-LLM, a framework for language-based temporal reasoning (TR) that improves the ability of large language models (LLMs) to reason about time. The framework works in two steps: (1) translate the input text into a temporal graph (TG) representation, and (2) perform reasoning over that TG. To fine-tune LLMs on the text-to-TG translation step, the authors construct TGQA, a synthetic dataset that is fully controllable and requires minimal supervision; training on it allows the learned TR capability to transfer to other tasks and benchmarks.

For the reasoning step, the framework combines Chain-of-Thought (CoT) bootstrapping with graph data augmentation to improve the reliability of the generated CoTs and the final answers. CoT bootstrapping generates multiple CoTs and selects the most useful ones for training, while graph data augmentation introduces controlled variations into the TG to strengthen model performance.

The proposed methods are evaluated on multiple datasets, including TGQA, TimeQA, and TempReason, where TG-LLM outperforms existing approaches. An ablation study shows that explicit temporal graphs and external knowledge both contribute significantly to the gains, and the framework generalizes well to tasks with different data distributions, indicating that the learned TR capability is transferable. Overall, the study highlights the importance of structured reasoning and suggests that the two-step framework offers a more effective approach to complex temporal reasoning and a promising direction for future research.
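To make the two-step idea concrete, the sketch below shows a toy temporal-graph representation together with simplified stand-ins for graph data augmentation (time shifting and entity renaming) and CoT selection. This is not the authors' implementation: all class names, fields, and the answer-matching selection heuristic are assumptions made purely for illustration.

```python
# Hypothetical sketch of the TG-LLM data flow (not the authors' released code).
# The TGEdge structure, augmentation functions, and CoT selection rule are
# illustrative assumptions only.

from dataclasses import dataclass

@dataclass
class TGEdge:
    """One temporal fact: (subject, relation, object) holding over [start, end]."""
    subject: str
    relation: str
    obj: str
    start: int
    end: int

# Step 1 output (in the framework, produced by the fine-tuned LLM from source text).
temporal_graph = [
    TGEdge("Alice", "worked_at", "AcmeCorp", 2005, 2010),
    TGEdge("Alice", "worked_at", "Globex", 2011, 2016),
]

def shift_times(graph, offset):
    """Graph data augmentation: shift every interval by a constant offset."""
    return [TGEdge(e.subject, e.relation, e.obj, e.start + offset, e.end + offset)
            for e in graph]

def rename_entities(graph, mapping):
    """Graph data augmentation: swap entity names to discourage surface memorization."""
    return [TGEdge(mapping.get(e.subject, e.subject), e.relation,
                   mapping.get(e.obj, e.obj), e.start, e.end)
            for e in graph]

def answer_before(graph, subject, relation, year):
    """Step 2 (toy reasoning): objects related to `subject` strictly before `year`."""
    return [e.obj for e in graph
            if e.subject == subject and e.relation == relation and e.end < year]

def select_cots(candidates, gold_answer):
    """Toy CoT bootstrapping: keep sampled chains of thought whose final answer
    matches the gold label (a stand-in for the paper's usefulness-based selection)."""
    return [cot for cot, answer in candidates if answer == gold_answer]

if __name__ == "__main__":
    augmented = rename_entities(shift_times(temporal_graph, 3), {"Alice": "Bob"})
    print(answer_before(temporal_graph, "Alice", "worked_at", 2011))  # ['AcmeCorp']
    sampled = [("Alice left AcmeCorp in 2010, so ...", "AcmeCorp"),
               ("Alice joined Globex in 2011, so ...", "Globex")]
    print(select_cots(sampled, gold_answer="AcmeCorp"))
```

In TG-LLM itself, both steps are carried out by the fine-tuned LLM with the temporal graph serialized as text; the sketch above only makes the intermediate data flow and the role of the augmentation and CoT-selection strategies explicit.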