CIDEr: Consensus-based Image Description Evaluation

3 Jun 2015 | Ramakrishna Vedantam, C. Lawrence Zitnick, Devi Parikh
The paper introduces a novel paradigm for evaluating image descriptions based on human consensus. It consists of a new triplet-based method for collecting human annotations, an automated metric called CIDEr, and two new datasets (PASCAL-50S and ABSTRACT-50S) with 50 sentences per image. CIDEr measures how similar a generated sentence is to a set of human-written reference sentences, and in doing so captures properties such as grammaticality, saliency, and accuracy. The authors evaluate five state-of-the-art image description approaches using this protocol and provide a benchmark for future comparisons. They also introduce CIDEr-D, a modified version of CIDEr that is more robust to gaming, and make it available on the MS COCO evaluation server. The results show that CIDEr captures human judgments of consensus better than existing metrics, and that the proposed datasets, with their larger number of reference sentences per image, enable more accurate automated evaluation.
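To make "similarity to a set of human-written reference sentences" concrete, here is a minimal Python sketch of the basic (non-D) CIDEr score. It assumes captions are already tokenized and lowercased and skips stemming. Each sentence becomes a TF-IDF-weighted vector of n-grams (n = 1 to 4), the candidate's cosine similarity to each of its image's references is averaged, and the per-n averages are combined with uniform weights. The function names (`ngrams`, `tfidf_vector`, `cosine`, `cider`) are hypothetical and are not taken from the MS COCO evaluation server's code.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the length-n n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def tfidf_vector(counts, doc_freq, num_images):
    """Weight n-gram counts by term frequency times inverse document frequency."""
    total = sum(counts.values())
    vec = {}
    for gram, count in counts.items():
        # Clamp df to 1 so n-grams absent from every reference set avoid log(x / 0).
        df = max(1, doc_freq.get(gram, 0))
        vec[gram] = (count / total) * math.log(num_images / df)
    return vec


def cosine(a, b):
    """Cosine similarity of two sparse dict vectors (0.0 if either has zero norm)."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cider(candidates, references, max_n=4):
    """Return one CIDEr score per image.

    candidates: one token list per image.
    references: for each image, a list of reference token lists.
    """
    num_images = len(references)
    # Document frequency: number of images whose reference set contains each n-gram.
    doc_freq = Counter()
    for refs in references:
        seen = set()
        for ref in refs:
            for n in range(1, max_n + 1):
                seen.update(ngrams(ref, n))
        doc_freq.update(seen)

    scores = []
    for cand, refs in zip(candidates, references):
        score = 0.0
        for n in range(1, max_n + 1):
            cand_vec = tfidf_vector(ngrams(cand, n), doc_freq, num_images)
            # CIDEr_n: mean cosine similarity between the candidate and each reference.
            sims = [cosine(cand_vec, tfidf_vector(ngrams(ref, n), doc_freq, num_images))
                    for ref in refs]
            # Uniform weights w_n = 1 / max_n across n-gram lengths.
            score += (sum(sims) / len(refs)) / max_n
        scores.append(score)
    return scores


if __name__ == "__main__":
    references = [
        [["a", "dog", "runs"], ["a", "dog", "is", "running"]],
        [["a", "cat", "sleeps"], ["the", "cat", "is", "asleep"]],
    ]
    print(cider([["a", "dog", "runs"], ["a", "cat", "sleeps"]], references))
```

Because IDF is computed over the reference sets of all images, an n-gram that appears in every image's references (such as "a" in the usage example) gets zero weight. This is how the metric discounts uninformative words and rewards the content that distinguishes one image's consensus from another's. CIDEr-D, as described in the paper, additionally applies a Gaussian penalty on the length difference between candidate and reference and clips candidate n-gram counts. This sketch omits those refinements.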