VL-BERT: PRE-TRAINING OF GENERIC VISUAL-LINGUISTIC REPRESENTATIONS

18 Feb 2020 | Weijie Su*, Xizhou Zhu*, Yue Cao, Bin Li, Lewei Lu, Furu Wei, Jifeng Dai
This paper introduces VL-BERT, a new pre-trainable generic representation for visual-linguistic tasks. VL-BERT extends the Transformer model to take both visual and linguistic elements as input: each input element is either a word from the input sentence or a region-of-interest (RoI) from the input image. The model is pre-trained on the Conceptual Captions dataset together with text-only corpora (BooksCorpus and English Wikipedia) to better align visual and linguistic clues. Extensive empirical analysis shows that this pre-training improves performance on downstream tasks, including visual commonsense reasoning, visual question answering, and referring expression comprehension. Notably, VL-BERT achieved first place on the VCR benchmark leaderboard at the time of submission. The code is available at https://github.com/jackroos/VL-BERT.
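The core idea, a single Transformer attending jointly over word tokens and RoI features, can be sketched roughly as below. This is a minimal illustration rather than the authors' implementation: the hidden size, the linear projection of 2048-d RoI features, and the segment embeddings distinguishing text from image regions are assumptions made for clarity.

```python
import torch
import torch.nn as nn

class ToyVLBERTInput(nn.Module):
    """Minimal sketch: build a joint sequence of word and RoI embeddings.

    Assumptions (not taken from the official code): hidden size 768,
    2048-d pooled RoI features, and learned segment embeddings with
    id 0 for text tokens and id 1 for visual tokens.
    """
    def __init__(self, vocab_size=30522, hidden=768, roi_feat_dim=2048):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, hidden)
        self.visual_proj = nn.Linear(roi_feat_dim, hidden)  # RoI feature -> token space
        self.segment_emb = nn.Embedding(2, hidden)           # 0 = text, 1 = visual

    def forward(self, token_ids, roi_features):
        # token_ids:    (batch, num_words)       word indices from the sentence
        # roi_features: (batch, num_rois, 2048)  pooled region-of-interest features
        text = self.word_emb(token_ids) + self.segment_emb(torch.zeros_like(token_ids))
        vis = self.visual_proj(roi_features) + self.segment_emb(
            torch.ones(roi_features.shape[:2], dtype=torch.long))
        # Concatenate along the sequence axis; a standard Transformer encoder
        # (e.g. nn.TransformerEncoder) would then attend over both modalities.
        return torch.cat([text, vis], dim=1)

# Usage: a batch of 2 sentences with 5 words and 3 RoIs each.
seq = ToyVLBERTInput()(torch.randint(0, 30522, (2, 5)), torch.randn(2, 3, 2048))
print(seq.shape)  # torch.Size([2, 8, 768])
```

In the actual model, this joint sequence (plus special tokens and positional information) is fed to a BERT-style Transformer, so attention can flow freely between words and image regions during both pre-training and fine-tuning.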