TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods

2024 | Xiangfei Qiu, Jilin Hu, Lekui Zhou, Xingjian Wu, Junyang Du, Buang Zhang, Chenjuan Guo, Aoying Zhou, Christian S. Jensen, Zhenli Sheng, Bin Yang
Time series are generated in diverse domains such as economics, traffic, health, and energy, where forecasting of future values has numerous important applications. Many forecasting methods are being proposed. To ensure progress, it is essential to be able to study and compare such methods empirically in a comprehensive and reliable manner. We propose TFB, an automated benchmark for Time Series Forecasting (TSF) methods. TFB addresses shortcomings related to datasets, comparison methods, and evaluation pipelines: 1) insufficient coverage of data domains, 2) stereotype bias against traditional methods, and 3) inconsistent and inflexible pipelines. To achieve better domain coverage, we include datasets from 10 different domains: traffic, electricity, energy, the environment, nature, economics, stock markets, banking, health, and the web. We also provide a time series characterization to ensure that the selected datasets are comprehensive. To remove biases against some methods, we include a diverse range of methods, including statistical learning, machine learning, and deep learning methods, and we also support a variety of evaluation strategies and metrics to ensure a more comprehensive evaluation of different methods. To support the integration of different methods into the benchmark and enable fair comparisons, TFB features a flexible and scalable pipeline that eliminates biases. Next, we employ TFB to perform a thorough evaluation of 21 Univariate Time Series Forecasting (UTSF) methods on 8,068 univariate time series and 14 Multivariate Time Series Forecasting (MTSF) methods on 25 datasets. The results offer a deeper understanding of the forecasting methods, allowing us to better select the ones that are most suitable for particular datasets and settings. Overall, TFB and this evaluation provide researchers with improved means of designing new TSF methods.
TFB is a comprehensive and fair benchmarking tool for time series forecasting methods. It includes a diverse set of datasets from 10 different domains, covering a wide range of characteristics, and supports a variety of evaluation strategies and metrics. Its user-friendly, flexible, and scalable pipeline offers robust evaluation support and eliminates biases, enabling researchers to evaluate new forecasting methods more rigorously across diverse datasets, which is crucial for advancing the state of the art in time series forecasting. TFB has the following key characteristics: a comprehensive collection of datasets organized according to a taxonomy, broad coverage of existing methods with extended support for evaluation strategies and metrics, and a flexible and scalable pipeline. Based on the experiments conducted using TFB, we make the following key observations: (1) The statistical methods VAR and LinearRegression perform better than recently proposed SOTA methods on some datasets. (2) Linear-based methods perform well when datasets exhibit an increasing trend or significant shifts. (3) Transformer-based methods outperform linear-based methods on datasets with marked seasonality.
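To make the idea of an evaluation strategy with metrics concrete, the following is a minimal sketch of one common strategy, rolling-origin evaluation, applied to a least-squares linear forecaster and a naive last-value baseline. It is an illustration only and does not use or reproduce TFB's pipeline or API: the function names (`make_windows`, `fit_linear`, `rolling_eval`), the synthetic series, and the window sizes are all assumptions chosen for the example.

```python
import numpy as np


def make_windows(series, lookback, horizon):
    """Slice a 1-D series into (lookback window -> horizon targets) pairs."""
    n = len(series) - lookback - horizon + 1
    X = np.stack([series[i:i + lookback] for i in range(n)])
    Y = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n)])
    return X, Y


def fit_linear(X, Y):
    """Least-squares linear map from a lookback window (plus bias) to the horizon."""
    A = np.hstack([X, np.ones((len(X), 1))])
    W, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return W


def predict_linear(W, window):
    """Forecast the next `horizon` values from one lookback window."""
    return np.append(window, 1.0) @ W


def rolling_eval(series, train_len, lookback, horizon, stride):
    """Fit once on the training prefix, then slide the forecast origin over the rest."""
    W = fit_linear(*make_windows(series[:train_len], lookback, horizon))
    errors = {"linear": [], "naive": []}
    for origin in range(train_len, len(series) - horizon + 1, stride):
        window = series[origin - lookback:origin]
        truth = series[origin:origin + horizon]
        errors["linear"].append(predict_linear(W, window) - truth)
        # Naive baseline: repeat the last observed value over the horizon.
        errors["naive"].append(np.full(horizon, window[-1]) - truth)
    return {
        name: {
            "mae": float(np.mean(np.abs(errs))),
            "mse": float(np.mean(np.square(errs))),
        }
        for name, errs in errors.items()
    }


# Hypothetical synthetic series: upward trend + period-24 seasonality + noise.
rng = np.random.default_rng(0)
t = np.arange(400)
series = 0.05 * t + np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(400)

scores = rolling_eval(series, train_len=300, lookback=48, horizon=12, stride=12)
print(scores)
```

Evaluating every method over the same forecast origins with the same metrics is what keeps such a comparison fair; a real benchmark would add further methods, datasets, and metrics on top of this loop.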