16 Nov 2017 | Steven J. Rennie, Etienne Marcheret, Youssef Mroueh, Jerret Ross and Vaibhava Goel
This paper introduces self-critical sequence training (SCST), a new approach to image captioning that improves performance by directly optimizing non-differentiable test metrics such as CIDEr. SCST is a variant of the REINFORCE algorithm that uses the output of its own test-time inference procedure to normalize the rewards it experiences, avoiding the need to estimate a separate baseline and reducing gradient variance. The method is applied to captioning systems built on recurrent neural networks (RNNs) with attention mechanisms. Because training directly optimizes the same decoding procedure used at test time, the resulting models generalize better.

The method is evaluated on the MSCOCO dataset, where SCST significantly improves performance and achieves a new state-of-the-art CIDEr score. The approach is most effective when combined with greedy decoding at test time, and it outperforms alternatives such as MIXER and REINFORCE with learned baselines. SCST trains efficiently with mini-batches and stochastic gradient descent, and it produces more accurate and detailed captions, especially for images with objects in uncommon contexts. The paper also discusses the benefits of attention models and of ensembling multiple models to further improve performance.
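The self-critical training signal described above can be sketched in miniature. The toy below is an illustrative assumption, not the paper's implementation: the "captioner" is a single categorical policy over three made-up words, and `reward` is a hypothetical stand-in for a sentence-level metric such as CIDEr. It shows the core SCST idea: sample from the policy, greedily decode as the baseline, and weight the REINFORCE gradient by the difference `reward(sample) - reward(greedy)`.

```python
import math
import random

# Toy "captioner": a categorical policy over a tiny vocabulary,
# parameterized by per-token logits (hypothetical stand-in for an RNN decoder).
logits = {"cat": 0.0, "dog": 0.0, "car": 0.0}

def probs(logits):
    # Softmax over the logits.
    z = sum(math.exp(v) for v in logits.values())
    return {w: math.exp(v) / z for w, v in logits.items()}

def reward(word, reference="cat"):
    # Hypothetical stand-in for a sentence-level metric such as CIDEr.
    return 1.0 if word == reference else 0.0

def scst_step(logits, lr=0.5, rng=random.Random(0)):
    p = probs(logits)
    # Greedy (test-time) decode serves as the baseline: b = reward(greedy).
    greedy = max(p, key=p.get)
    # Sample a "caption" from the current policy.
    words, weights = zip(*p.items())
    sampled = rng.choices(words, weights=weights)[0]
    # Self-critical advantage: reward of the sample minus reward of the
    # greedy decode. Samples that beat the test-time output are reinforced;
    # samples that fall short are suppressed.
    advantage = reward(sampled) - reward(greedy)
    # REINFORCE update: advantage times the gradient of log p(sampled)
    # with respect to the logits (the standard softmax gradient).
    for w in logits:
        grad = (1.0 if w == sampled else 0.0) - p[w]
        logits[w] += lr * advantage * grad
    return sampled, greedy, advantage

rng = random.Random(0)
for _ in range(200):
    scst_step(logits, rng=rng)
```

Note that no separate baseline network is trained: the model "critiques" itself by comparing sampled rollouts against its own greedy decode, which is exactly the quantity that matters at test time.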