Training Restricted Boltzmann Machines using Approximations to the Likelihood Gradient


2008 | Tijmen Tieleman
This paper introduces Persistent Contrastive Divergence (PCD), a new algorithm for training Restricted Boltzmann Machines (RBMs) that draws samples from the model distribution more accurately than the standard Contrastive Divergence (CD) algorithms, while remaining equally fast and simple. RBMs are neural network models for unsupervised learning, but they are also widely used as feature extractors for supervised learning. Training RBMs is challenging because the likelihood gradient is intractable, so gradient approximations are used; the most popular of these, CD-1, is not necessarily the best. PCD improves on CD-1 by maintaining a Markov chain whose state is not reset between parameter updates, which yields more accurate gradient estimates. Experiments show that PCD outperforms CD-1 and related algorithms on a variety of tasks, including MNIST digit classification, email classification, and modeling artificial data, and that it also works well for fully visible Markov Random Fields (MRFs). The algorithm is efficient and compatible with mini-batch learning, and the results show that it produces better feature detectors and better classification performance. The paper also discusses the role of weight decay and the importance of the learning rate when training RBMs. Overall, the experiments demonstrate that PCD is a promising algorithm for training RBMs and related models.
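To make the difference from CD-1 concrete, the following is a minimal sketch of one PCD update for a binary-binary RBM, assuming a NumPy-only setup; the variable names (W, b_v, b_h, persistent_v) and hyperparameters are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_bernoulli(p):
    return (rng.random(p.shape) < p).astype(p.dtype)

def pcd_update(W, b_v, b_h, data_batch, persistent_v, lr=0.01, k=1):
    """One PCD parameter update on a mini-batch.

    Unlike CD-1, the negative-phase Gibbs chain (persistent_v) is carried
    over from the previous update instead of being restarted at the data.
    """
    # Positive phase: hidden-unit probabilities given the training data.
    pos_h = sigmoid(data_batch @ W + b_h)

    # Negative phase: run k Gibbs steps starting from the persistent chain.
    v = persistent_v
    for _ in range(k):
        h = sample_bernoulli(sigmoid(v @ W + b_h))
        v = sample_bernoulli(sigmoid(h @ W.T + b_v))
    neg_h = sigmoid(v @ W + b_h)

    # Approximate likelihood gradient: data statistics minus model statistics.
    n = data_batch.shape[0]
    W += lr * (data_batch.T @ pos_h - v.T @ neg_h) / n
    b_v += lr * (data_batch - v).mean(axis=0)
    b_h += lr * (pos_h - neg_h).mean(axis=0)

    return v  # updated persistent chain state, reused at the next call
```

The only structural change relative to CD-1 is the returned chain state: passing `v` back in as `persistent_v` on the next mini-batch keeps the negative-phase samples closer to the current model distribution, which is the source of PCD's more accurate gradient estimates.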