What does BERT learn about the structure of language?

July 28 - August 2, 2019 | Ganesh Jawahar, Benoît Sagot, Djamé Seddah
BERT is a bidirectional Transformer language model that has achieved strong performance across a range of language understanding tasks, suggesting that it captures structural information about language. This study investigates what kinds of linguistic structure BERT learns by analyzing the representations produced at its different layers. The results show that the lower layers capture phrase-level information, and that the layers as a whole encode a rich hierarchy of linguistic features: surface features at the bottom, syntactic features in the middle, and semantic features at the top. Deeper layers are required to handle long-distance dependencies such as subject-verb agreement. The study also shows that BERT's representations are compositional and mimic classical tree-like structures: using Tensor Product Decomposition Networks (TPDNs), the authors find that BERT implicitly captures tree-based structure. These findings indicate that BERT learns and represents complex linguistic information, with implications for understanding and improving neural network architectures in NLP. The work also underscores the importance of interpretability in neural networks, with the evidence for BERT's syntactic knowledge drawn from a battery of probing tasks and analyses of its representations.
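To illustrate the layer-wise probing methodology described above, the sketch below shows one common way to extract per-layer sentence representations from a pre-trained BERT model and train a simple probing classifier on top of them. This is a minimal illustration using the Hugging Face transformers library, not the authors' original code; the sentences, labels, and chosen layer are hypothetical placeholders for a real probing dataset (e.g., a surface-level task such as predicting sentence length).

import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

# Load a pre-trained BERT and ask it to return all hidden states (one per layer).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

def layer_sentence_vectors(sentence: str):
    """Return one mean-pooled sentence vector per layer (embeddings + 12 layers)."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs.hidden_states is a tuple of tensors of shape (1, seq_len, 768)
    return [h.mean(dim=1).squeeze(0).numpy() for h in outputs.hidden_states]

# Toy probing setup (hypothetical data, for illustration only):
# train a linear probe on representations from a single chosen layer.
sentences = ["The cat sat on the mat .", "Dogs bark ."]
labels = [1, 0]          # placeholder labels for a surface-level property
layer_k = 3              # which layer to probe; varied across 1..12 in practice
X = [layer_sentence_vectors(s)[layer_k] for s in sentences]
probe = LogisticRegression(max_iter=1000).fit(X, labels)

In a real probing study, the same linear probe is trained separately on each layer's representations, and the per-layer accuracies are compared to see where a given type of linguistic information (surface, syntactic, or semantic) is most strongly encoded.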