SEQUENCE LEVEL TRAINING WITH RECURRENT NEURAL NETWORKS

6 May 2016 | Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, Wojciech Zaremba
This paper proposes a novel sequence-level training algorithm for recurrent neural networks (RNNs) that directly optimizes the metrics used at test time, such as BLEU or ROUGE. The method addresses two key issues in text generation: exposure bias and the use of word-level loss functions. Exposure bias arises because models are trained on ground-truth prefixes but must condition on their own predictions at test time, which causes errors to accumulate. The proposed algorithm, called MIXER, uses a hybrid loss that combines REINFORCE with cross-entropy, and applies an incremental (curriculum-style) schedule that gradually shifts training from the word-level loss to the sequence-level reward. MIXER outperforms several strong baselines under greedy generation and is competitive with beam search while being significantly faster; it is also complementary to beam search, and the two can be combined to further improve performance. The paper evaluates MIXER on three tasks, text summarization, machine translation, and image captioning, demonstrating gains in both generation quality and efficiency.
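To make the hybrid loss and the incremental schedule concrete, here is a minimal sketch of the idea in PyTorch. It is an illustration under stated assumptions, not the authors' implementation: `decoder`, `reward_fn` (e.g. sentence-level BLEU), `init_state`, and the `<bos>` token id are all hypothetical placeholders. The first `xent_prefix_len` steps are trained with teacher-forced cross-entropy; the remaining steps are sampled from the model and scored with REINFORCE. The incremental schedule corresponds to starting with `xent_prefix_len` equal to the full sequence length and gradually lowering it as training progresses.

```python
import torch
import torch.nn.functional as F
from torch.distributions import Categorical

def mixer_loss(decoder, init_state, target, reward_fn, xent_prefix_len, baseline=0.0):
    """Hybrid MIXER-style loss (sketch): cross-entropy on the first
    `xent_prefix_len` steps, REINFORCE on the remaining steps.

    `decoder(prev_token, state) -> (logits, state)` is an assumed interface;
    `target` is a 1-D LongTensor of gold token ids for one sequence.
    """
    state = init_state
    prev = target.new_full((1,), fill_value=0)  # assumed <bos> token id = 0
    xent_terms, log_probs, sampled = [], [], []

    for t in range(target.size(0)):
        logits, state = decoder(prev, state)    # logits: (1, vocab_size)
        if t < xent_prefix_len:
            # Word-level phase: teacher forcing with the ground-truth token.
            xent_terms.append(F.cross_entropy(logits, target[t:t + 1]))
            prev = target[t:t + 1]
        else:
            # Sequence-level phase: condition on the model's own samples,
            # exactly the regime the model faces at test time.
            dist = Categorical(logits=logits)
            tok = dist.sample()
            log_probs.append(dist.log_prob(tok))
            sampled.append(tok)
            prev = tok

    loss = torch.stack(xent_terms).sum() if xent_terms else logits.new_zeros(())
    if log_probs:
        # REINFORCE: score the sampled suffix with a sequence-level reward
        # (e.g. BLEU/ROUGE) and subtract a baseline to reduce variance.
        reward = reward_fn(torch.cat(sampled), target[xent_prefix_len:])
        loss = loss - (reward - baseline) * torch.stack(log_probs).sum()
    return loss
```

In this sketch, annealing `xent_prefix_len` from the full target length down toward zero reproduces the paper's incremental learning idea: the model first learns reliable word-level predictions, then is progressively exposed to, and rewarded on, its own generated continuations.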