23 May 2024 | Peiyuan Liu, Hang Guo, Tao Dai, Naiqi Li, Jigang Bao, Xudong Ren, Yong Jiang, Shu-tao Xia
The paper introduces CALF, a novel cross-modal fine-tuning framework for multivariate time series forecasting (MTSF) that addresses the distribution discrepancy between textual and temporal input tokens. Unlike existing methods that focus on adapting large language models (LLMs), CALF emphasizes aligning the input distributions of textual and temporal data through three key techniques: the Cross-Modal Match Module, Feature Regularization Loss, and Output Consistency Loss. The Cross-Modal Match Module aligns time series and textual inputs using principal word embeddings and cross-attention. The Feature Regularization Loss ensures intermediate features between the two branches are aligned for better weight updates, while the Output Consistency Loss ensures effective correspondence between output representations of both modalities. CALF achieves state-of-the-art performance in both long-term and short-term forecasting tasks with low computational complexity and exhibits strong few-shot and zero-shot abilities similar to those in LLMs. Extensive experiments on eight real-world datasets demonstrate that CALF outperforms existing methods in terms of accuracy and efficiency. The framework is designed to bridge the modality gap between textual and temporal data, enabling more effective and accurate time series forecasting.
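The pieces described above (cross-attention against principal word embeddings, plus the two auxiliary losses combined with a supervised forecasting loss) can be sketched roughly in PyTorch. Everything here is an illustrative assumption, not the paper's actual code: the module and function names, the SVD-based reduction used to obtain principal word embeddings, the choice of L1 distances, and the loss weights are all hypothetical.

```python
# Hedged sketch of a CALF-style cross-modal match and training objective.
# All names, shapes, and weights below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def principal_word_embeddings(vocab_emb: torch.Tensor, k: int) -> torch.Tensor:
    # Reduce the full LLM vocabulary embedding matrix (V x d) to k "principal"
    # word embeddings (k x d) via truncated SVD -- a PCA-like reduction,
    # assumed here as one plausible reading of "principal word embeddings".
    _, S, Vh = torch.linalg.svd(vocab_emb, full_matrices=False)
    return S[:k].unsqueeze(1) * Vh[:k]


class CrossModalMatch(nn.Module):
    """Aligns temporal tokens with principal word embeddings via cross-attention."""

    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, ts_tokens: torch.Tensor, principal_words: torch.Tensor):
        # Query: time-series tokens (B, L, d); Key/Value: principal words (k, d).
        if principal_words.dim() == 2:
            principal_words = principal_words.unsqueeze(0).expand(
                ts_tokens.size(0), -1, -1
            )
        aligned, _ = self.attn(ts_tokens, principal_words, principal_words)
        return aligned


def calf_style_loss(y_pred_ts, y_pred_text, y_true, feats_ts, feats_text,
                    lam_feat: float = 0.01, lam_out: float = 1.0):
    # Supervised forecasting loss on the temporal branch.
    forecast = F.l1_loss(y_pred_ts, y_true)
    # Feature Regularization: pull intermediate temporal features toward the
    # (frozen) textual branch features, layer by layer.
    feat_reg = sum(F.l1_loss(a, b.detach()) for a, b in zip(feats_ts, feats_text))
    # Output Consistency: keep the two branches' outputs in correspondence.
    out_cons = F.l1_loss(y_pred_ts, y_pred_text)
    return forecast + lam_feat * feat_reg + lam_out * out_cons
```

In this reading, only the cross-attention query path and lightweight adapters would be trained, while the LLM word embeddings stay frozen; the two auxiliary terms transfer knowledge from the textual branch to the temporal one during fine-tuning.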