The paper introduces TESTAM, a novel Mixture-of-Experts (MoE) model designed to improve traffic forecasting accuracy by capturing both recurring and non-recurring traffic patterns. TESTAM employs three experts, each using a different spatial modeling method: no spatial modeling, a learnable static graph, and a dynamic graph (attention). Each expert is built from transformer-based blocks combined with its own spatial modeling technique. The gating network, reformulated as a classification task trained with pseudo labels, routes each traffic condition to the most appropriate expert. Experimental results on three real-world datasets (METR-LA, PEMS-BAY, and EXPY-TKY) show that TESTAM outperforms 13 existing methods in accuracy, particularly in complex traffic scenarios and long-term predictions. The authors attribute this effectiveness to the model's ability to adjust its spatial modeling method dynamically based on traffic context, which improves overall forecasting performance.
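To make the "gating as classification with pseudo labels" idea more concrete, here is a minimal sketch. It is not the authors' implementation. The summary does not say how the pseudo labels are built, so the sketch assumes each training sample is labeled with the index of the expert whose forecast has the lowest mean absolute error. It also assumes the gate is trained with cross-entropy against that label and used for top-1 routing at inference. The function names (`pseudo_label`, `routing_loss`, `route`) are hypothetical, and the expert forecasts are given as plain lists rather than produced by real transformer experts.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the gate's raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(expert_preds: Sequence[Sequence[float]], target: Sequence[float]) -> int:
    """Assumed labeling rule: index of the expert with the lowest mean absolute error."""
    errors = [
        sum(abs(p - y) for p, y in zip(pred, target)) / len(target)
        for pred in expert_preds
    ]
    return min(range(len(errors)), key=errors.__getitem__)


def routing_loss(gate_logits: Sequence[float], label: int) -> float:
    """Cross-entropy between the gate's distribution and the pseudo label."""
    return -math.log(softmax(gate_logits)[label])


def route(expert_preds: Sequence[Sequence[float]], gate_logits: Sequence[float]) -> Tuple[int, List[float]]:
    """Top-1 routing: return the chosen expert index and that expert's forecast."""
    probs = softmax(gate_logits)
    k = max(range(len(probs)), key=probs.__getitem__)
    return k, list(expert_preds[k])


# Hypothetical forecasts from the three experts (no spatial modeling,
# static graph, dynamic attention graph) for one sensor over 3 future steps.
preds = [[1.0, 1.0, 1.0], [2.0, 2.1, 1.9], [3.0, 3.0, 3.0]]
truth = [2.0, 2.0, 2.0]
logits = [0.2, 1.5, -0.3]  # hypothetical gate scores for this traffic condition

label = pseudo_label(preds, truth)
print(label, routing_loss(logits, label), route(preds, logits))
```

In this toy case the second expert has the smallest error, so it becomes the pseudo label. Minimizing the cross-entropy pushes the gate's probability mass toward that expert, and at inference time the gate alone decides which expert's forecast to use.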