Multi-Task Deep Neural Networks for Natural Language Understanding

30 May 2019 | Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao
This paper presents a Multi-Task Deep Neural Network (MT-DNN) for learning representations across multiple natural language understanding (NLU) tasks. MT-DNN combines multi-task learning with pre-trained language models such as BERT: training on cross-task data provides a regularization effect that yields more general representations, which in turn helps the model adapt to new tasks and domains. MT-DNN extends the model proposed in Liu et al. (2015) by incorporating BERT as its shared text encoding layers. It achieves new state-of-the-art results on ten NLU tasks, including SNLI, SciTail, and eight of the nine GLUE tasks, pushing the GLUE benchmark score to 82.7% (a 2.2% absolute improvement), and it adapts to new domains with substantially fewer in-domain labels than pre-trained BERT representations require. The code and pre-trained models are publicly available at https://github.com/namisan/mt-dnn.

The architecture handles four types of NLU tasks: single-sentence classification, pairwise text classification, text similarity scoring, and relevance ranking. Shared text encoding layers initialized from BERT feed into task-specific output layers, and the model is trained in two stages: pre-training of the shared encoder followed by multi-task learning across all tasks.

MT-DNN is evaluated on three popular NLU benchmarks: GLUE, SNLI, and SciTail. It outperforms existing state-of-the-art models on all GLUE tasks except WNLI, setting new state-of-the-art results on eight of them, and it surpasses the previous best results on SNLI and SciTail. An ablation against a Single-Task DNN (ST-DNN) baseline shows that the stochastic answer network (SAN) answer module is effective for pairwise text classification tasks, underscoring the importance of the model's design choices and problem formulation.

In domain adaptation experiments on SNLI and SciTail, MT-DNN outperforms BERT in every setting, even with very few in-domain labels, indicating that the representations learned through multi-task learning are more consistently effective for domain adaptation than pre-trained BERT representations.
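To make the "shared encoder plus task-specific layers" layout concrete, here is a minimal PyTorch sketch of an MT-DNN-style model. It is a hypothetical rendering, not the authors' released implementation: head names, label counts, and the use of plain linear output layers (rather than the paper's SAN answer module for pairwise classification) are illustrative assumptions.

```python
# Minimal sketch of an MT-DNN-style model: a shared BERT encoder with
# task-specific output heads. Head names, sizes, and the plain linear heads
# are illustrative assumptions, not the authors' exact implementation.
import torch.nn as nn
from transformers import BertModel


class MultiTaskDNN(nn.Module):
    def __init__(self, num_labels_per_task: dict, model_name: str = "bert-base-uncased"):
        super().__init__()
        # Shared text encoding layers, initialized from pre-trained BERT.
        self.encoder = BertModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.dropout = nn.Dropout(0.1)
        # One task-specific head per task; regression tasks (e.g. STS-B)
        # get a single output unit, classification tasks get one per label.
        self.heads = nn.ModuleDict({
            task: nn.Linear(hidden, n_labels)
            for task, n_labels in num_labels_per_task.items()
        })

    def forward(self, task: str, input_ids, attention_mask, token_type_ids=None):
        out = self.encoder(input_ids=input_ids,
                           attention_mask=attention_mask,
                           token_type_ids=token_type_ids)
        # Use the [CLS] pooled representation as the shared semantic vector.
        pooled = self.dropout(out.pooler_output)
        return self.heads[task](pooled)


# Example head configuration (hypothetical): 2-way single-sentence
# classification, 3-way NLI-style pairwise classification, scalar similarity.
model = MultiTaskDNN({"cola": 2, "mnli": 3, "stsb": 1})
```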
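The multi-task stage interleaves mini-batches from all tasks so that every update trains the shared encoder together with one task head. The loop below is a simplified sketch under assumed batch and loss conventions (cross-entropy for classification, MSE for similarity scoring); the released code covers additional cases, such as a pairwise ranking loss for relevance ranking.

```python
# Sketch of the multi-task training procedure: merge mini-batches from all
# tasks, shuffle them, and let each batch update the shared encoder plus its
# own task head. Batch layout and per-task losses are assumptions.
import random
import torch.nn.functional as F


def train_epoch(model, task_loaders, optimizer, device="cpu"):
    # task_loaders: dict mapping task name -> iterable of batches for that task.
    batches = [(task, batch)
               for task, loader in task_loaders.items()
               for batch in loader]
    random.shuffle(batches)  # interleave tasks within the epoch

    for task, batch in batches:
        optimizer.zero_grad()
        logits = model(task,
                       input_ids=batch["input_ids"].to(device),
                       attention_mask=batch["attention_mask"].to(device))
        labels = batch["labels"].to(device)
        if task == "stsb":
            # Text similarity scoring is treated as regression (MSE loss).
            loss = F.mse_loss(logits.squeeze(-1), labels.float())
        else:
            # Classification tasks use cross-entropy.
            loss = F.cross_entropy(logits, labels)
        loss.backward()
        optimizer.step()
```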