TOTEM: TOkenized Time Series EMbeddings for General Time Series Analysis


26 Feb 2024 | Sabera Talukder, Yisong Yue, Georgia Gkioxari
**Abstract:** This paper explores the unification of modeling in general time series analysis, focusing on cross-task and cross-domain training. TOTEM, or TOkenized Time Series EMbeddings, proposes a simple tokenizer architecture that embeds time series data from various domains using a discrete vectorized representation learned in a self-supervised manner. The method enables generalist training across multiple tasks and domains with minimal to no tuning. Extensive evaluations on 17 real-world datasets across three tasks (imputation, anomaly detection, and forecasting) show that TOTEM matches or outperforms state-of-the-art methods in both in-domain and zero-shot testing regimes.

**Introduction:** Time series analysis has traditionally been conducted via specialist training, where models are trained on a single domain. Recent advances have motivated generalist training, where models are trained across multiple domains. TOTEM addresses this with a VQVAE-based tokenizer that operates directly on time steps, eliminating the need for domain-specific data engineering. The tokenizer handles varying dimensionality and can be applied to different tasks, including imputation, anomaly detection, and forecasting.

**Method:** TOTEM's VQVAE architecture consists of an encoder, quantizer, latent codebook, and decoder. It operates on univariate time series, creating discrete, non-overlapping tokens along the time dimension. The codebook captures maximal information within a large receptive field, enabling generalist training and zero-shot testing; a minimal sketch of this tokenizer follows below.
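To make the method description concrete, here is a minimal PyTorch sketch of a TOTEM-style VQVAE tokenizer. It is an illustration under stated assumptions, not the authors' implementation: the class names, the strided 1D-convolutional encoder/decoder, the codebook size (256), latent dimension (64), and compression factor of 4 time steps per token are all assumptions chosen for clarity.

```python
# Minimal sketch of a TOTEM-style VQVAE tokenizer for univariate time series.
# Hyperparameters (codebook size, latent dim, compression factor) are
# illustrative assumptions, not the paper's exact settings.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VectorQuantizer(nn.Module):
    """Maps each latent vector to its nearest codebook entry (straight-through)."""
    def __init__(self, num_codes: int = 256, dim: int = 64, beta: float = 0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta

    def forward(self, z):                                   # z: (B, D, T')
        z_flat = z.permute(0, 2, 1).reshape(-1, z.shape[1]) # (B*T', D)
        dists = torch.cdist(z_flat, self.codebook.weight)   # (B*T', K)
        ids = dists.argmin(dim=1)                            # discrete token ids
        z_q = self.codebook(ids).view(z.shape[0], z.shape[2], -1).permute(0, 2, 1)
        # Codebook + commitment losses; straight-through estimator for gradients.
        loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())
        z_q = z + (z_q - z).detach()
        return z_q, ids.view(z.shape[0], z.shape[2]), loss


class TimeSeriesVQVAE(nn.Module):
    """1D-conv encoder/decoder that turns a univariate series into discrete,
    non-overlapping tokens along time (compression factor of 4 assumed here)."""
    def __init__(self, dim: int = 64, num_codes: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, dim, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=4, stride=2, padding=1),
        )
        self.quantizer = VectorQuantizer(num_codes, dim)
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(dim, dim, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose1d(dim, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, x):                       # x: (B, 1, T), T divisible by 4
        z = self.encoder(x)
        z_q, token_ids, vq_loss = self.quantizer(z)
        x_hat = self.decoder(z_q)
        recon_loss = F.mse_loss(x_hat, x)       # self-supervised reconstruction
        return x_hat, token_ids, recon_loss + vq_loss


if __name__ == "__main__":
    model = TimeSeriesVQVAE()
    series = torch.randn(8, 1, 96)              # batch of univariate windows
    x_hat, tokens, loss = model(series)
    print(tokens.shape)                          # torch.Size([8, 24]): 4 steps per token
```

Because the quantizer acts on fixed-size, non-overlapping spans of time steps rather than on domain-specific features, the same frozen codebook can in principle tokenize series of any length and from any domain, which is what makes generalist training and zero-shot transfer possible.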
**Experimental Setup:** Experiments cover 17 benchmark datasets in both specialist and generalist settings. TOTEM is compared against state-of-the-art multi-task and single-task models on imputation, anomaly detection, and forecasting.

**Results:** TOTEM outperforms or matches the best methods on several popular benchmarks, demonstrating its effectiveness as a generalist model. Ablation studies and exploratory results further validate the benefits of TOTEM's tokenized representation.

**Conclusion:** TOTEM provides a simple and effective solution for general time series analysis, enabling unified, cross-domain training with minimal tuning. Future work could explore dynamic token lengths and the impact of data size and diversity on generalist performance.