Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning

July 15 - 20, 2018 | Piyush Sharma, Nan Ding, Sebastian Goodman, Radu Soricut
The paper introduces a new dataset called Conceptual Captions, which contains an order of magnitude more images (roughly 3.3 million) and a wider variety of image and caption styles than the MS-COCO dataset. The dataset is created by extracting and filtering image alt-text annotations from billions of webpages, balancing cleanliness, informativeness, fluency, and learnability. The authors evaluate several image captioning models, including ones based on Inception-ResNet-v2 for image feature extraction and the Transformer for sequence modeling, and find that the Transformer-based models achieve the best performance when trained on the Conceptual Captions dataset. The paper also discusses the challenges and recent improvements in automatic image captioning, highlighting the importance of large annotated datasets and powerful modeling mechanisms.
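The encoder-decoder setup described above, a CNN such as Inception-ResNet-v2 producing a grid of image features that a Transformer decoder attends to while generating the caption, centers on cross-attention: each partially decoded caption token queries the image feature grid. A minimal numpy sketch of that single step is below; the dimensions, weight matrices, and random inputs are illustrative placeholders, not the paper's actual configuration:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(tokens, image_feats, Wq, Wk, Wv):
    """Single-head cross-attention: caption tokens (queries) attend
    to CNN image features (keys/values)."""
    Q = tokens @ Wq            # queries from decoded caption tokens
    K = image_feats @ Wk       # keys from the image feature grid
    V = image_feats @ Wv       # values from the image feature grid
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # scaled dot-product
    return softmax(scores, axis=-1) @ V       # weighted sum of image values

rng = np.random.default_rng(0)
d = 16                                        # toy feature dimension
image_feats = rng.normal(size=(64, d))        # e.g. an 8x8 grid of CNN features
tokens = rng.normal(size=(5, d))              # 5 caption tokens decoded so far
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

out = cross_attention(tokens, image_feats, Wq, Wk, Wv)
print(out.shape)  # one attended image summary per caption token: (5, 16)
```

In the full model this step is stacked with self-attention and feed-forward layers per the standard Transformer decoder; the sketch isolates only the image-to-text attention that distinguishes captioning from plain language modeling.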