Microsoft COCO Captions: Data Collection and Evaluation Server


3 Apr 2015 | Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollár, C. Lawrence Zitnick
The paper introduces the Microsoft COCO Caption dataset and the evaluation server for automatic caption generation. The dataset will contain over 1.5 million captions for 330,000 images, with five human-generated captions provided for each image. The evaluation server uses metrics such as BLEU, METEOR, ROUGE, and CIDEr to score candidate captions. The paper details the data collection process, which involves using Amazon's Mechanical Turk (AMT) to gather captions from human subjects. It also describes the evaluation metrics, including their tokenization and preprocessing steps, and provides instructions for using the evaluation server. The paper concludes with a discussion on the challenges of creating image caption datasets and the importance of aligning automatic evaluation metrics with human judgment.
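To make the multi-reference scoring concrete, here is a minimal sketch of how a BLEU-style metric compares one candidate caption against several human references (e.g. the five captions per image in the dataset). This is an illustrative simplification, not the official COCO evaluation code: the function name `bleu`, the whitespace tokenization, and the brevity-penalty choice are assumptions for the sketch, whereas the actual server applies its own tokenization and preprocessing before scoring.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, references, max_n=4):
    """Simplified BLEU for one candidate caption against multiple references.

    Uses clipped n-gram precision (counts capped at the maximum count seen
    in any single reference) and a brevity penalty against the closest
    reference length, mirroring the standard BLEU recipe in spirit.
    """
    cand = candidate.lower().split()          # naive tokenization (assumption)
    refs = [r.lower().split() for r in references]

    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        if not cand_counts:
            return 0.0
        # For each n-gram, the most times it appears in any one reference.
        max_ref = Counter()
        for r in refs:
            for g, c in Counter(ngrams(r, n)).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        precisions.append(clipped / sum(cand_counts.values()))

    if min(precisions) == 0:
        return 0.0

    # Brevity penalty relative to the reference closest in length.
    ref_len = min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

A candidate that exactly matches one of the references scores 1.0, while a paraphrase that shares only some n-grams scores lower; having five references per image gives a candidate more chances to match, which is one motivation for collecting multiple captions.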