This paper presents a theoretically grounded application of dropout in recurrent neural networks (RNNs). The authors derive a new variational-inference-based dropout technique for LSTM and GRU models that outperforms existing approaches, improving the state of the art in language modelling on the Penn Treebank with a test perplexity of 73.4. The technique rests on a Bayesian interpretation of dropout as a variational approximation to the posterior of a Bayesian neural network, which yields a principled and more effective way to regularize RNNs and control overfitting. Concretely, the proposed dropout variant samples a single dropout mask per sequence and applies it at every time step, including on the recurrent connections, in contrast to existing ad hoc techniques that resample a different mask at each step and drop only the inputs and outputs. The method is evaluated on language modelling and sentiment analysis tasks, where it improves on baselines that rely on early stopping or deliberately under-sized models to avoid overfitting. The results demonstrate that the proposed approach is an effective regularizer for RNNs and improves model performance.
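
To make the core idea concrete, the following is a minimal PyTorch sketch of the tied-mask ("variational") dropout described above: one Bernoulli mask per sequence, reused at every time step on both the input and the recurrent hidden state. The class name, wrapper structure, and hyperparameters are illustrative assumptions, not the authors' code, and the sketch omits parts of the full method (e.g. tied masks on embeddings and MC dropout at test time).

```python
import torch
import torch.nn as nn

class VariationalLSTM(nn.Module):
    """Illustrative tied-mask dropout around an LSTMCell.

    The same dropout masks are sampled once per sequence and reused at
    every time step for the input and the recurrent hidden state,
    rather than resampling a new mask at each step.
    """

    def __init__(self, input_size, hidden_size, dropout=0.5):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.dropout = dropout

    def forward(self, x):  # x: (seq_len, batch, input_size)
        seq_len, batch, _ = x.shape
        h = x.new_zeros(batch, self.cell.hidden_size)
        c = x.new_zeros(batch, self.cell.hidden_size)

        if self.training and self.dropout > 0:
            keep = 1.0 - self.dropout
            # One mask per sequence, shared across all time steps.
            mask_x = x.new_empty(batch, x.size(-1)).bernoulli_(keep) / keep
            mask_h = x.new_empty(batch, self.cell.hidden_size).bernoulli_(keep) / keep
        else:
            mask_x = mask_h = None

        outputs = []
        for t in range(seq_len):
            x_t = x[t] * mask_x if mask_x is not None else x[t]
            h_in = h * mask_h if mask_h is not None else h
            h, c = self.cell(x_t, (h_in, c))
            outputs.append(h)
        return torch.stack(outputs), (h, c)
```

The key design point the sketch captures is that the masks are sampled outside the time loop, so the same units are dropped across the whole sequence, including on the hidden-to-hidden path; standard (naive) dropout would instead resample inside the loop and typically skip the recurrent connections.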