August 19-23, 2018, London, United Kingdom | Jiaqi Ma, Zhe Zhao, Xinyang Yi, Jilin Chen, Lichan Hong, Ed H. Chi
This paper introduces a novel multi-task learning approach called Multi-gate Mixture-of-Experts (MMoE), which explicitly models task relationships from data. The authors adapt the Mixture-of-Experts (MoE) structure to multi-task learning by sharing expert submodels across all tasks, while a separate gating network per task learns how to combine the experts for that task. The proposed method is evaluated on synthetic datasets with controlled task relatedness, showing improved performance over baseline methods when tasks are less related. Additionally, the MMoE structure is found to enhance trainability, especially in scenarios with high randomness in training data and model initialization. The effectiveness of MMoE is further demonstrated on real-world benchmarks, including a binary classification benchmark and a large-scale content recommendation system at Google, where it outperforms state-of-the-art multi-task learning models in both offline metrics and live experiment results. The key contributions of the paper are the introduction of MMoE, its superior performance in handling varying degrees of task relatedness, and its improved trainability.
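To make the architecture concrete, below is a minimal sketch of an MMoE layer in PyTorch, based on the description above: a set of shared expert networks, one softmax gating network per task that produces mixture weights over the experts, and a task-specific tower head on top of each task's mixed expert output. The layer sizes, single-linear-layer experts and towers, and the `MMoE` class itself are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class MMoE(nn.Module):
    """Sketch of a Multi-gate Mixture-of-Experts layer (hypothetical sizes)."""

    def __init__(self, input_dim, expert_dim, num_experts, num_tasks):
        super().__init__()
        # Shared experts: each is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(input_dim, expert_dim), nn.ReLU())
            for _ in range(num_experts)
        )
        # One gating network per task, producing softmax weights over experts.
        self.gates = nn.ModuleList(
            nn.Linear(input_dim, num_experts) for _ in range(num_tasks)
        )
        # Task-specific "tower" heads on top of the mixed expert outputs.
        self.towers = nn.ModuleList(
            nn.Linear(expert_dim, 1) for _ in range(num_tasks)
        )

    def forward(self, x):
        # Stack expert outputs: (batch, num_experts, expert_dim).
        expert_out = torch.stack([expert(x) for expert in self.experts], dim=1)
        task_outputs = []
        for gate, tower in zip(self.gates, self.towers):
            # Per-task softmax weights over experts: (batch, num_experts, 1).
            weights = torch.softmax(gate(x), dim=-1).unsqueeze(-1)
            # Weighted sum of expert outputs, then this task's tower.
            mixed = (weights * expert_out).sum(dim=1)
            task_outputs.append(tower(mixed))
        return task_outputs


# Example: two binary-classification tasks sharing four experts.
model = MMoE(input_dim=16, expert_dim=8, num_experts=4, num_tasks=2)
logits = model(torch.randn(32, 16))  # list of two (32, 1) logit tensors
```

Because every task has its own gate, weakly related tasks can learn to rely on different experts, while closely related tasks naturally converge to similar mixture weights, which is how the model adapts to the degree of task relatedness.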