This paper introduces a Multi-Task Deep Neural Network (MT-DNN) for learning representations across multiple natural language understanding (NLU) tasks. MT-DNN combines multi-task learning (MTL) and language model pre-training, leveraging large amounts of cross-task data and a regularization effect to enhance generalization. The model incorporates a pre-trained bidirectional transformer language model (BERT) to improve text representation. MT-DNN achieves state-of-the-art results on ten NLU tasks, including SNLI, SciTail, and eight out of nine GLUE tasks, pushing the GLUE benchmark to 82.7%. The model demonstrates superior domain adaptation capabilities, requiring fewer in-domain labels than pre-trained BERT representations. The code and pre-trained models are publicly available.
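The core idea, a shared pre-trained encoder whose lower layers are fine-tuned jointly across tasks while each task keeps a lightweight task-specific output head, can be sketched as below. This is a minimal illustration rather than the authors' released implementation: it uses the Hugging Face transformers library, the class names, the interleave helper, and the hyperparameters are hypothetical, and it reduces every task to single-label classification, whereas the actual MT-DNN also uses regression and ranking output modules for similarity and relevance tasks.

```python
# Minimal sketch (not the authors' released code): a shared BERT encoder with
# one illustrative classification head per task, fine-tuned jointly by
# interleaving mini-batches across tasks. Task names and sizes are assumptions.
import random
import torch
import torch.nn as nn
from transformers import BertModel


class MultiTaskModel(nn.Module):
    def __init__(self, task_num_labels, model_name="bert-base-uncased"):
        super().__init__()
        self.encoder = BertModel.from_pretrained(model_name)  # shared layers
        hidden = self.encoder.config.hidden_size
        # One task-specific head per task; the shared encoder is updated by all.
        self.heads = nn.ModuleDict(
            {task: nn.Linear(hidden, n) for task, n in task_num_labels.items()}
        )

    def forward(self, task, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] token representation
        return self.heads[task](cls)       # task-specific logits


def interleave(loaders):
    # Collect (task, batch) pairs from all task loaders and shuffle them,
    # so each training step samples a mini-batch from one task at a time.
    pairs = [(task, batch) for task, loader in loaders.items() for batch in loader]
    random.shuffle(pairs)
    return pairs


def train(model, loaders, epochs=1, lr=5e-5):
    # Joint fine-tuning: every step updates the shared encoder plus the head
    # of whichever task the current mini-batch belongs to.
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for task, batch in interleave(loaders):
            logits = model(task, batch["input_ids"], batch["attention_mask"])
            loss = loss_fn(logits, batch["labels"])
            opt.zero_grad()
            loss.backward()
            opt.step()
```

Because the encoder parameters are shared across every task's mini-batches, each task acts as a regularizer on the others, which is the generalization benefit the abstract attributes to multi-task learning on top of pre-training.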