A Primer in BERTology: What We Know About How BERT Works


9 Nov 2020 | Anna Rogers, Olga Kovaleva, Anna Rumshisky
This paper surveys over 150 studies of the BERT model, aiming to understand how it works, what information it learns, and how that information is represented. It reviews the current state of knowledge, covering linguistic aspects, technical aspects, and common modifications to BERT's training objectives and architecture. The paper also discusses overparameterization, compression techniques, and pruning as a model analysis technique. Finally, it outlines future research directions, such as benchmarks that require verbal reasoning, methods for teaching reasoning, and learning what happens at inference time. The paper highlights the need for more comprehensive stress tests covering different aspects of linguistic knowledge, and the importance of focusing on what knowledge the model actually uses.