Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts


August 19-23, 2018 | Jiaqi Ma, Zhe Zhao, Xinyang Yi, Jilin Chen, Lichan Hong, Ed H. Chi
This paper proposes a multi-task learning approach called Multi-gate Mixture-of-Experts (MMoE), which explicitly models task relationships learned from data. MMoE shares a set of expert submodels across all tasks while training a separate gating network for each task to decide how those experts are combined.

The model is inspired by the Mixture-of-Experts (MoE) model and by recent MoE layers, and it builds on the Shared-Bottom multi-task DNN structure that is widely used in multi-task learning. The Shared-Bottom model passes the input through a single shared bottom network and then through an individual "tower" network for each task. MMoE replaces the single shared bottom with a group of bottom networks, each called an expert, and adds one gating network per task that controls how the experts are used for that task. Parameters can thus be allocated automatically to capture either shared or task-specific information, without adding many new parameters per task.
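Concretely, for task k the model computes y_k = h_k(Σ_i g_k(x)_i · f_i(x)), where the f_i are the shared experts, h_k is the task-specific tower, and the gate g_k(x) is a softmax over a linear transformation of the input. The code below is a minimal PyTorch-style sketch of this structure; the single-layer ReLU experts, two-layer towers, and all layer sizes are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of an MMoE layer (illustrative sizes and depths, not the
# paper's exact configuration).
import torch
import torch.nn as nn


class MMoE(nn.Module):
    def __init__(self, input_dim, num_experts, num_tasks, expert_dim, tower_dim):
        super().__init__()
        # Expert networks shared by all tasks.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(input_dim, expert_dim), nn.ReLU())
             for _ in range(num_experts)]
        )
        # One gating network per task: a linear map over the input,
        # turned into mixture weights by a softmax across experts.
        self.gates = nn.ModuleList(
            [nn.Linear(input_dim, num_experts) for _ in range(num_tasks)]
        )
        # One task-specific "tower" per task.
        self.towers = nn.ModuleList(
            [nn.Sequential(nn.Linear(expert_dim, tower_dim), nn.ReLU(),
                           nn.Linear(tower_dim, 1))
             for _ in range(num_tasks)]
        )

    def forward(self, x):
        # expert_outputs: (batch, num_experts, expert_dim)
        expert_outputs = torch.stack([e(x) for e in self.experts], dim=1)
        outputs = []
        for gate, tower in zip(self.gates, self.towers):
            # Per-task mixture weights over the experts: (batch, num_experts)
            weights = torch.softmax(gate(x), dim=-1)
            # Weighted sum of expert outputs: (batch, expert_dim)
            mixed = torch.einsum('be,bed->bd', weights, expert_outputs)
            outputs.append(tower(mixed))
        return outputs  # one prediction per task


model = MMoE(input_dim=100, num_experts=8, num_tasks=2, expert_dim=16, tower_dim=8)
preds = model(torch.randn(32, 100))  # list of two (32, 1) tensors
```

Because each task has its own gate, tasks that share structure can converge to similar mixture weights, while weakly related tasks can route to different experts instead of fighting over a single shared representation.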
The approach is validated on synthetic data generated with controlled task relatedness: MMoE is easier to train and converges to a better loss than the Shared-Bottom model, and it outperforms baseline methods, with the largest gains when tasks are less related. MMoE also improves performance on real-world tasks, including a binary classification benchmark and a large-scale content recommendation system at Google. Compared with other multi-task learning models, MMoE handles tasks with low relatedness more effectively and is more efficient in its use of computational resources. The paper concludes that explicitly modeling task relationships in this way yields a more effective and efficient approach to multi-task learning.
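To make the controlled-relatedness setup concrete, the sketch below shows one simple way to generate two regression tasks whose relatedness is governed by the cosine similarity p between their weight vectors. The paper's generator additionally applies sinusoidal nonlinearities and measures relatedness via the Pearson correlation of the labels, so this is only an illustration of the idea, not the exact procedure; the function name and parameters are hypothetical.

```python
# Illustrative sketch (not the paper's exact generator): two linear regression
# tasks whose relatedness is set by the cosine similarity p of their weights.
import numpy as np


def make_tasks(num_samples=10000, dim=100, p=0.5, noise=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # Two orthonormal direction vectors.
    u1 = rng.normal(size=dim)
    u1 /= np.linalg.norm(u1)
    u2 = rng.normal(size=dim)
    u2 -= u1 * (u1 @ u2)          # remove the component along u1
    u2 /= np.linalg.norm(u2)
    # Unit-norm task weights with cosine similarity exactly p.
    w1 = u1
    w2 = p * u1 + np.sqrt(1.0 - p ** 2) * u2
    x = rng.normal(size=(num_samples, dim))
    y1 = x @ w1 + noise * rng.normal(size=num_samples)
    y2 = x @ w2 + noise * rng.normal(size=num_samples)
    return x, y1, y2


x, y1, y2 = make_tasks(p=0.9)   # p close to 1 -> highly related tasks
```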