BERT REDISCOVERS THE CLASSICAL NLP PIPELINE

9 Aug 2019 | Ian Tenney, Dipanjan Das, Ellie Pavlick
This paper examines how BERT processes linguistic information and finds that it recapitulates the traditional NLP pipeline in an interpretable and localizable way: the steps emerge in the expected order, with POS tagging first, followed by parsing, NER, semantic roles, and coreference. Qualitative analysis further shows that the pipeline is applied dynamically, with the model revising lower-level decisions on the basis of higher-level information.

The analysis uses edge probing, in which lightweight classifiers are trained on span representations drawn from the frozen encoder, to measure what each layer encodes. Lower layers encode mostly local syntax, while higher layers capture more complex semantics. Two metrics track where task-relevant information lives: scalar mixing weights, which reveal which layers a learned probe draws on, and cumulative scoring, which measures how much each additional layer improves probe performance and yields an "expected layer" for each task. Both metrics show that performance generally improves as more layers are included, and that the expected layer differs across tasks.

The results reveal a natural progression: basic syntactic information appears earlier in the network, while high-level semantic information appears at higher layers. Syntactic information is also more localizable, whereas semantic information is spread across the network; semantic tasks improve gradually across layers, suggesting that the relevant information is harder to pin down. Comparing BERT-base and BERT-large shows the same pattern in both models: the representations for a given task tend to concentrate at similar layers relative to the top of the model.

The paper concludes that BERT represents the kinds of syntactic and semantic abstractions traditionally believed necessary for language processing and can model complex interactions between levels of hierarchical information. The findings provide new evidence that deep language models capture the structure of natural language in a way that aligns with traditional NLP pipelines.
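To make the scalar-mixing metric concrete, the sketch below shows how a softmax-weighted combination of per-layer activations could be implemented, together with the "center of gravity" that summarizes which layers a trained probe relies on. This is a minimal PyTorch sketch based on the description above, not the authors' released code; the class and attribute names are illustrative.

```python
import torch
import torch.nn as nn


class ScalarMix(nn.Module):
    """Softmax-weighted mix of per-layer representations (ELMo-style),
    as used by probing models over a frozen encoder. Illustrative sketch."""

    def __init__(self, num_layers: int):
        super().__init__()
        self.scalars = nn.Parameter(torch.zeros(num_layers))  # one weight per layer
        self.gamma = nn.Parameter(torch.ones(1))               # overall scale

    def forward(self, layer_states):
        # layer_states: list of [batch, seq_len, hidden] tensors, one per layer.
        alphas = torch.softmax(self.scalars, dim=0)
        mixed = sum(a * h for a, h in zip(alphas, layer_states))
        return self.gamma * mixed

    def center_of_gravity(self) -> float:
        # Weight-averaged layer index: sum_l l * softmax(s)_l.
        # Lower values mean the probe relies mostly on earlier layers.
        alphas = torch.softmax(self.scalars, dim=0)
        layers = torch.arange(alphas.numel(), dtype=alphas.dtype)
        return float((layers * alphas).sum())
```

After a probe is trained for a given task, the center of gravity gives one number per task that can be compared across tasks, which is how the "syntax earlier, semantics later" ordering is summarized.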
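The cumulative-scoring metric can be summarized in a similar way. The sketch below, again only an illustration with made-up scores rather than numbers from the paper, computes differential scores between probes that see successively more layers and the resulting expected layer at which a task is resolved.

```python
def expected_layer(cumulative_scores):
    """cumulative_scores[l] = probe score (e.g. F1) when the probe sees
    layers 0..l, with index 0 being the embedding layer.

    The differential score delta_l = score(0..l) - score(0..l-1) measures
    how much layer l adds; the expected layer is the delta-weighted
    average of l, i.e. where most of the improvement arrives.
    """
    deltas = {
        l: cumulative_scores[l] - cumulative_scores[l - 1]
        for l in range(1, len(cumulative_scores))
    }
    total = sum(deltas.values())
    return sum(l * d for l, d in deltas.items()) / total


# Made-up scores for a six-layer model: most of the gain arrives early,
# so the expected layer is low (roughly 1.9 here).
print(expected_layer([0.60, 0.72, 0.81, 0.86, 0.88, 0.88]))
```

A task whose scores keep improving into the upper layers would get a higher expected layer, matching the observation that semantic tasks show gradual gains across many layers while syntactic tasks saturate early.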