FuseMoE: Mixture-of-Experts Transformers for Fleximodal Fusion

23 May 2024 | Xing Han, Huy Nguyen, Carl Harris, Nhat Ho, Suchi Saria
FuseMoE is a mixture-of-experts (MoE) framework designed for FlexiModal data: inputs with a variable number of modalities, potential missingness, and irregular sampling. Its MoE fusion layer uses a routing mechanism to categorize multimodal inputs and direct them to appropriate combinations of MLP experts; the expert outputs are then weighted by a gating function to produce fused embeddings for downstream processing. When a modality is absent, FuseMoE dynamically adjusts the influence of the experts responsible for that modality, which makes it effective in scenarios with missing modalities and irregularly sampled data.

A central component is a novel Laplace gating function. Theoretical analysis shows that it attains better convergence rates than the standard Softmax gating function in MoE, which translates into improved predictive performance on multiple downstream tasks. Evaluations on MIMIC-III, MIMIC-IV, CMU-MOSI, MOSEI, PAM, and CIFAR-10 demonstrate superior performance in handling multimodal data with irregular sampling and missing modalities, making FuseMoE a promising approach for multimodal fusion.
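To make the fusion mechanism concrete, here is a minimal sketch of an MoE fusion layer with a Laplace-style gate. It is an illustration under assumptions, not the paper's implementation: the gate here scores each expert by exp(-||x - w_i||) normalized across experts, and the names LaplaceGate, MoEFusionLayer, d_model, and num_experts are illustrative choices rather than identifiers from the FuseMoE codebase.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LaplaceGate(nn.Module):
    """Assumed Laplace-style gate: score each expert with exp(-||x - w_i||)
    and normalize across experts. The exact form in FuseMoE may differ."""

    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_experts, d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model) -> pairwise L2 distances to expert centers
        dist = torch.cdist(x, self.centers)      # (batch, num_experts)
        return F.softmax(-dist, dim=-1)          # Laplace-style expert weights


class MoEFusionLayer(nn.Module):
    """Minimal MoE fusion: route embeddings to MLP experts, weight their
    outputs with the gate, and sum them into a fused embedding."""

    def __init__(self, d_model: int, num_experts: int, d_hidden: int = 256):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        )
        self.gate = LaplaceGate(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = self.gate(x)                                          # (batch, E)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)   # (batch, E, d_model)
        return (weights.unsqueeze(-1) * expert_out).sum(dim=1)          # fused embedding


# Usage: fuse a batch of 8 multimodal embeddings of width 64.
layer = MoEFusionLayer(d_model=64, num_experts=4)
fused = layer(torch.randn(8, 64))
print(fused.shape)  # torch.Size([8, 64])
```

Handling a missing modality in this sketch would amount to suppressing the gate weights of the experts assigned to it before normalization, which mirrors (at a high level) how FuseMoE reduces the influence of experts responsible for absent data.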