Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation

15 Aug 2013 | Yoshua Bengio, Nicholas Léonard and Aaron Courville
The paper "Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation" by Yoshua Bengio, Nicholas Leonard, and Aaron Courville explores the challenges and solutions for estimating gradients through stochastic neurons in deep learning models. Stochastic neurons and hard non-linearities can pose significant challenges for gradient-based learning, but they are useful for various reasons, such as modeling biological neurons and achieving sparse representations. The authors examine four families of solutions: the minimum variance unbiased gradient estimator for stochastic binary neurons, a decomposition approach for binary stochastic neurons, the injection of additive or multiplicative noise, and the straight-through estimator. They also discuss the application of these estimators in conditional computation, where sparse stochastic units gate computation, potentially reducing computational costs. The paper includes theoretical results and experimental validation, demonstrating the effectiveness of the proposed methods in training large deep networks.The paper "Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation" by Yoshua Bengio, Nicholas Leonard, and Aaron Courville explores the challenges and solutions for estimating gradients through stochastic neurons in deep learning models. Stochastic neurons and hard non-linearities can pose significant challenges for gradient-based learning, but they are useful for various reasons, such as modeling biological neurons and achieving sparse representations. The authors examine four families of solutions: the minimum variance unbiased gradient estimator for stochastic binary neurons, a decomposition approach for binary stochastic neurons, the injection of additive or multiplicative noise, and the straight-through estimator. They also discuss the application of these estimators in conditional computation, where sparse stochastic units gate computation, potentially reducing computational costs. The paper includes theoretical results and experimental validation, demonstrating the effectiveness of the proposed methods in training large deep networks.