20 Dec 2015 | Diederik P. Kingma, Tim Salimans, and Max Welling
This paper introduces a local reparameterization technique that reduces the variance of stochastic gradients in variational Bayesian inference, enabling faster convergence while remaining parallelizable. The method translates global uncertainty over the model parameters into local noise that is independent across data points in a minibatch, yielding a gradient variance inversely proportional to the minibatch size. Under a scale-invariant prior and a posterior whose variance is fixed proportionally to the squared mean, the approach is shown to be equivalent to Gaussian dropout. Relaxing these constraints allows more flexible posterior distributions and leads to variational dropout, in which dropout rates are learned from the data rather than fixed by hand. Experiments show that variational dropout performs as well as or better than standard binary dropout and Gaussian dropout, with adaptively learned dropout rates improving performance further. The estimator is computationally efficient and low-variance, making it suitable for large-scale neural network training. The paper also makes the connection between dropout and Bayesian inference explicit, showing that dropout can be interpreted as a variational method with a specific prior and posterior distribution, and the results demonstrate improved efficiency and effectiveness of variational inference in neural networks.
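The core idea of the local reparameterization trick can be sketched numerically: for a linear layer with a factorized Gaussian posterior over the weights, instead of sampling one weight matrix shared by the whole minibatch, one samples the pre-activations directly from their implied Gaussian, whose noise is independent per data point. The sketch below is illustrative, not the paper's implementation; all variable names and sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumed for illustration): minibatch M, inputs D, outputs K
M, D, K = 4, 5, 3
A = rng.normal(size=(M, D))        # minibatch of input features
mu = rng.normal(size=(D, K))       # posterior means of the weights
sigma2 = 0.1 * np.ones((D, K))     # posterior variances of the weights

# Naive reparameterization: sample one weight matrix per minibatch.
# The same weight noise is shared by all M data points, which
# correlates the per-example gradient contributions.
W = mu + np.sqrt(sigma2) * rng.normal(size=(D, K))
B_naive = A @ W

# Local reparameterization: sample the pre-activations directly.
# Each entry of B is Gaussian with mean (A @ mu) and variance
# (A**2 @ sigma2), so the noise is independent across the M rows,
# and the gradient variance shrinks with the minibatch size.
gamma = A @ mu                 # pre-activation means, shape (M, K)
delta = (A ** 2) @ sigma2      # pre-activation variances, shape (M, K)
B_local = gamma + np.sqrt(delta) * rng.normal(size=(M, K))
```

Note that both paths produce a sample of the layer's pre-activations with the same marginal mean and variance; the local version simply decorrelates the noise across examples, which is what drives the variance reduction described above.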