This paper explores the integration of latent random variables into the hidden state of a recurrent neural network (RNN) by combining elements of the variational autoencoder (VAE). The proposed model, called the *Variational RNN* (VRNN), aims to better model the variability observed in highly structured sequential data, such as natural speech. The VRNN extends the VAE to a recurrent framework, allowing it to model complex multimodal distributions and capture temporal dependencies between latent random variables across timesteps. Empirical evaluations on four speech datasets and one handwriting dataset show that the VRNN outperforms standard RNNs and other related models, particularly in generating natural speech with less noise and more diverse samples. The results highlight the importance of incorporating latent random variables and temporal dependencies in the RNN dynamics for effective sequence modeling.
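
To make the core idea concrete, the sketch below shows one possible VRNN step in PyTorch: a latent variable z_t is sampled at every timestep from a prior conditioned on the previous hidden state, an approximate posterior conditions on both x_t and h_{t-1}, and features of x_t and z_t jointly drive the recurrence. This is a minimal illustration under simplifying assumptions (single-layer networks, a GRU cell, Gaussian outputs); the class and variable names are our own, and the original paper uses LSTM recurrence and deeper feature-extractor networks.

```python
import torch
import torch.nn as nn


class VRNNCell(nn.Module):
    """Minimal VRNN step sketch: a latent z_t is drawn at every timestep,
    and features of both x_t and z_t feed the recurrence."""

    def __init__(self, x_dim, z_dim, h_dim):
        super().__init__()
        self.phi_x = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())  # feature extractor for x_t
        self.phi_z = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU())  # feature extractor for z_t
        self.prior = nn.Linear(h_dim, 2 * z_dim)        # p(z_t | h_{t-1})
        self.encoder = nn.Linear(2 * h_dim, 2 * z_dim)  # q(z_t | x_t, h_{t-1})
        self.decoder = nn.Linear(2 * h_dim, 2 * x_dim)  # p(x_t | z_t, h_{t-1})
        self.rnn = nn.GRUCell(2 * h_dim, h_dim)         # h_t = f(x_t, z_t, h_{t-1})

    def forward(self, x_t, h):
        # Prior over z_t, conditioned on the previous hidden state.
        prior_mu, prior_logvar = self.prior(h).chunk(2, dim=-1)
        # Approximate posterior, conditioned on x_t and h_{t-1}.
        x_feat = self.phi_x(x_t)
        enc_mu, enc_logvar = self.encoder(torch.cat([x_feat, h], -1)).chunk(2, -1)
        # Reparameterized sample of z_t.
        z_t = enc_mu + torch.randn_like(enc_mu) * (0.5 * enc_logvar).exp()
        z_feat = self.phi_z(z_t)
        # Parameters of the output distribution p(x_t | z_t, h_{t-1}).
        dec_mu, dec_logvar = self.decoder(torch.cat([z_feat, h], -1)).chunk(2, -1)
        # Recurrence sees features of both x_t and z_t, so the latent variables
        # influence the temporal dynamics at later steps.
        h_next = self.rnn(torch.cat([x_feat, z_feat], -1), h)
        return h_next, (enc_mu, enc_logvar), (prior_mu, prior_logvar), (dec_mu, dec_logvar)
```

Training such a model maximizes a sequence-level evidence lower bound: at each timestep, the reconstruction log-likelihood of x_t under the decoder, minus the KL divergence between the approximate posterior and the state-dependent prior, summed over the sequence.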