Language Models Still Struggle to Zero-shot Reason about Time Series

2024-04-17 | Mike A. Merrill, Mingtian Tan, Vinayak Gupta, Tom Hartvigsen, Tim Althoff
Language models still struggle to reason about time series in zero-shot settings. This study introduces a novel evaluation framework for time series reasoning, including formal tasks and a dataset of multi-scale time series paired with text captions across ten domains. The framework assesses three forms of time series reasoning: etiological reasoning (identifying the most likely scenario that created a time series), question answering (answering factual questions about time series), and context-aided forecasting (using textual context to improve time series forecasts). Results show that despite their strong performance in other tasks, language models perform poorly in these time series reasoning tasks, scoring marginally above random on etiological and question answering tasks and showing modest success in using context to improve forecasting. These findings indicate that time series reasoning is an underdeveloped area for language model research. The dataset and code are publicly available at https://github.com/behavioral-data/TSandLanguage.