This paper presents a theoretically grounded application of dropout in recurrent neural networks (RNNs). The authors derive a new variational-inference-based dropout technique for LSTM and GRU models that outperforms existing approaches, improving the state of the art in language modelling on the Penn Treebank with a test perplexity of 73.4. The technique rests on a Bayesian interpretation of dropout as a variational approximation to the posterior of a Bayesian neural network, which yields a principled and more effective way to regularize RNNs and control overfitting. Concretely, the proposed dropout variant samples a single dropout mask per sequence and applies it at every time step, including on the recurrent connections, in contrast to existing ad hoc techniques that resample a different mask at each step and drop only the inputs and outputs. The method is evaluated on language modelling and sentiment analysis tasks, where it improves on baselines that rely on early stopping or deliberately under-sized models to avoid overfitting. The results demonstrate that the proposed approach is an effective regularizer for RNNs and improves model performance.
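
To make the core idea concrete, the following is a minimal PyTorch sketch of the tied-mask ("variational") dropout described above: one Bernoulli mask per sequence, reused at every time step on both the input and the recurrent hidden state. The class name, wrapper structure, and hyperparameters are illustrative assumptions, not the authors' code, and the sketch omits parts of the full method (e.g. tied masks on embeddings and MC dropout at test time).

```python
import torch
import torch.nn as nn

class VariationalLSTM(nn.Module):
    """Illustrative tied-mask dropout around an LSTMCell.

    The same dropout masks are sampled once per sequence and reused at
    every time step for the input and the recurrent hidden state,
    rather than resampling a new mask at each step.
    """

    def __init__(self, input_size, hidden_size, dropout=0.5):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.dropout = dropout

    def forward(self, x):  # x: (seq_len, batch, input_size)
        seq_len, batch, _ = x.shape
        h = x.new_zeros(batch, self.cell.hidden_size)
        c = x.new_zeros(batch, self.cell.hidden_size)

        if self.training and self.dropout > 0:
            keep = 1.0 - self.dropout
            # One mask per sequence, shared across all time steps.
            mask_x = x.new_empty(batch, x.size(-1)).bernoulli_(keep) / keep
            mask_h = x.new_empty(batch, self.cell.hidden_size).bernoulli_(keep) / keep
        else:
            mask_x = mask_h = None

        outputs = []
        for t in range(seq_len):
            x_t = x[t] * mask_x if mask_x is not None else x[t]
            h_in = h * mask_h if mask_h is not None else h
            h, c = self.cell(x_t, (h_in, c))
            outputs.append(h)
        return torch.stack(outputs), (h, c)
```

The key design point the sketch captures is that the masks are sampled outside the time loop, so the same units are dropped across the whole sequence, including on the hidden-to-hidden path; standard (naive) dropout would instead resample inside the loop and typically skip the recurrent connections.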