ERNIE (Enhanced Representation through kNowledge IntEgration) is a novel language representation model that integrates prior knowledge into the pre-training process. Building on BERT's masking strategy, ERNIE adds entity-level and phrase-level masking to strengthen language representation: entity-level masking masks whole named entities, which often consist of multiple words, while phrase-level masking treats an entire phrase as a single conceptual unit and masks it as one. Because the model must recover these multi-word units from their context, it implicitly learns prior knowledge during pre-training, which improves its ability to capture semantic dependencies and to generalize.
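The following is a minimal sketch contrasting BERT-style token masking with ERNIE-style knowledge masking, not the authors' implementation: the entity and phrase boundaries are assumed to come from an external NER tagger or chunker, and the function names and masking probabilities are illustrative.

```python
# Minimal sketch contrasting BERT-style token masking with ERNIE-style
# entity-/phrase-level masking. Span boundaries are assumed to be supplied
# by an external chunker or NER tagger; names and probabilities are illustrative.
import random

MASK = "[MASK]"

def token_level_mask(tokens, mask_prob=0.15, seed=0):
    """BERT-style basic masking: each token is masked independently."""
    rng = random.Random(seed)
    return [MASK if rng.random() < mask_prob else t for t in tokens]

def span_level_mask(tokens, spans, mask_prob=0.5, seed=0):
    """ERNIE-style masking: entities and phrases are masked as whole units.

    `spans` is a list of (start, end) index pairs (end exclusive) marking
    the entities and phrases detected in the sentence.
    """
    rng = random.Random(seed)
    tokens = list(tokens)
    for start, end in spans:
        if rng.random() < mask_prob:
            for i in range(start, end):
                tokens[i] = MASK  # mask the whole unit, not isolated tokens
    return tokens

if __name__ == "__main__":
    # Example sentence used in the ERNIE paper.
    tokens = "Harry Potter is a series of fantasy novels written by J. K. Rowling".split()
    entity_spans = [(0, 2), (10, 13)]   # "Harry Potter", "J. K. Rowling"
    phrase_spans = [(3, 8)]             # "a series of fantasy novels"
    print(token_level_mask(tokens, mask_prob=0.3))
    print(span_level_mask(tokens, entity_spans + phrase_spans, mask_prob=1.0))
```

With span-level masking, the model cannot reconstruct "J. K. Rowling" from its own remaining pieces and must instead infer the entity from the surrounding context, which is the intuition behind the implicit knowledge learning described above.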
The model is pre-trained on heterogeneous Chinese data, including Chinese Wikipedia, Baidu Baike, Baidu News, and Baidu Tieba, and is then fine-tuned on five Chinese NLP tasks: natural language inference, semantic similarity, named entity recognition, sentiment analysis, and question answering. Experimental results show that ERNIE outperforms the baseline methods, achieving state-of-the-art results on all five tasks. ERNIE also demonstrates stronger knowledge inference capacity on a cloze test, outperforming BERT at predicting missing named entities from the relationships expressed in the surrounding context.
The paper also includes ablation studies to validate the effectiveness of the knowledge masking strategies and the Dialogue Language Model (DLM) task. The results confirm that ERNIE's knowledge integration and pre-training on heterogeneous data significantly enhance its performance. Future work will explore integrating other types of knowledge and validating the approach in other languages.
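For the DLM task mentioned above, ERNIE pre-trains on query-response dialogue data: dialogue embeddings mark each speaker's turn, and the model additionally learns to judge whether a dialogue is real or fake after one turn has been replaced with a randomly sampled sentence. The sketch below builds such training examples under those assumptions; the helper name `build_dlm_example` and the exact input layout are illustrative, not taken from the authors' code.

```python
# Minimal sketch of constructing Dialogue Language Model (DLM) training
# examples, assuming query-response dialogues, per-turn role ids standing in
# for ERNIE's dialogue embeddings, and a real/fake dialogue label produced by
# occasionally swapping the response for a random sentence. Illustrative only.
import random

def build_dlm_example(query, response, corpus, fake_prob=0.5, rng=None):
    """Return (tokens, role_ids, is_real) for one DLM training example.

    `corpus` is a pool of sentences used to sample replacements for fakes.
    Role ids play the part of dialogue embeddings: 0 for the query turn,
    1 for the response turn.
    """
    rng = rng or random.Random()
    is_real = rng.random() >= fake_prob
    if not is_real:
        # Corrupt the dialogue by replacing the response with a random sentence.
        response = rng.choice(corpus)
    tokens = ["[CLS]"] + query.split() + ["[SEP]"] + response.split() + ["[SEP]"]
    role_ids = [0] * (len(query.split()) + 2) + [1] * (len(response.split()) + 1)
    return tokens, role_ids, int(is_real)

if __name__ == "__main__":
    corpus = ["the weather is nice today", "I will travel to Beijing tomorrow"]
    example = build_dlm_example("how are you", "I am fine thanks", corpus,
                                fake_prob=0.5, rng=random.Random(42))
    print(example)
```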