The paper "Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks" by Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer from Google Research addresses the issue of discrepancies between training and inference in sequence prediction tasks using recurrent neural networks (RNNs). The authors propose a curriculum learning strategy to gradually transition from using the true previous token during training to primarily using the model's own generated token, thereby reducing the accumulation of errors that can occur during sequence generation.
In sequence prediction tasks, RNNs are trained to maximize the likelihood of each token given the current state and the previous token. However, during inference, the previous token is replaced by a token generated by the model, leading to a mismatch that can result in cumulative errors. The proposed approach, called Scheduled Sampling, involves randomly deciding whether to use the true previous token or a token sampled from the model's distribution during training. This decision is made using a probability parameter \(\epsilon_i\) that decreases over time, gradually shifting the training process towards more realistic conditions.
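The per-step decision can be sketched as below. This is a minimal illustration, not the authors' implementation: the inverse-sigmoid decay \(\epsilon_i = k / (k + \exp(i/k))\) is one of the schedules the paper proposes, and the parameter value `k=1000.0` here is an illustrative choice.

```python
import math
import random

def inverse_sigmoid_epsilon(step, k=1000.0):
    """One decay schedule from the paper: epsilon_i = k / (k + exp(i / k)).
    Starts near 1 (almost always feed the true token) and decays toward 0
    (almost always feed the model's own prediction). k is illustrative."""
    return k / (k + math.exp(step / k))

def choose_prev_token(true_token, model_token, epsilon, rng=random):
    """Coin flip per time step: with probability epsilon use the ground-truth
    previous token, otherwise use a token sampled from the model."""
    return true_token if rng.random() < epsilon else model_token

# Example: early in training the gold token dominates; later, the model's own.
eps_early = inverse_sigmoid_epsilon(0)       # close to 1.0
eps_late = inverse_sigmoid_epsilon(20000)    # close to 0.0
prev = choose_prev_token("gold", "sampled", eps_early)
```

During training, `choose_prev_token` would be applied at each time step of the unrolled RNN, so early epochs behave like teacher forcing while later epochs increasingly match inference-time conditions.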
Experiments on various sequence prediction tasks, including image captioning, constituency parsing, and speech recognition, demonstrate significant improvements in performance compared to baseline methods. The authors also report that their approach was successful in winning the 2015 MSCOCO image captioning challenge. The paper concludes by highlighting the benefits of Scheduled Sampling and suggesting future directions, such as backpropagating errors through sampling decisions and exploring better sampling strategies.