Training Restricted Boltzmann Machines using Approximations to the Likelihood Gradient


2008 | Tijmen Tieleman
This paper introduces Persistent Contrastive Divergence (PCD), a new algorithm for training Restricted Boltzmann Machines (RBMs) that draws samples from the model distribution more accurately than the standard Contrastive Divergence (CD) algorithms, while remaining equally fast and simple. RBMs are neural network models for unsupervised learning, but they are also widely used as feature extractors for supervised learning. Training RBMs is challenging because the likelihood gradient is intractable, so gradient approximations are used; the most popular of these, CD-1, is not necessarily the best. PCD improves on CD-1 by maintaining a Markov chain whose state is not reset between parameter updates, which yields more accurate gradient estimates. Experiments show that PCD outperforms CD-1 and related algorithms on a variety of tasks, including MNIST digit classification, email classification, and modeling artificial data, and that it also works well for fully visible Markov Random Fields (MRFs). The algorithm is efficient and compatible with mini-batch learning, and the results show that it produces better feature detectors and better classification performance. The paper also discusses the role of weight decay and the importance of the learning rate when training RBMs. Overall, the experiments demonstrate that PCD is a promising algorithm for training RBMs and related models.
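To make the difference from CD-1 concrete, the following is a minimal sketch of one PCD update for a binary-binary RBM, assuming a NumPy-only setup; the variable names (W, b_v, b_h, persistent_v) and hyperparameters are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_bernoulli(p):
    return (rng.random(p.shape) < p).astype(p.dtype)

def pcd_update(W, b_v, b_h, data_batch, persistent_v, lr=0.01, k=1):
    """One PCD parameter update on a mini-batch.

    Unlike CD-1, the negative-phase Gibbs chain (persistent_v) is carried
    over from the previous update instead of being restarted at the data.
    """
    # Positive phase: hidden-unit probabilities given the training data.
    pos_h = sigmoid(data_batch @ W + b_h)

    # Negative phase: run k Gibbs steps starting from the persistent chain.
    v = persistent_v
    for _ in range(k):
        h = sample_bernoulli(sigmoid(v @ W + b_h))
        v = sample_bernoulli(sigmoid(h @ W.T + b_v))
    neg_h = sigmoid(v @ W + b_h)

    # Approximate likelihood gradient: data statistics minus model statistics.
    n = data_batch.shape[0]
    W += lr * (data_batch.T @ pos_h - v.T @ neg_h) / n
    b_v += lr * (data_batch - v).mean(axis=0)
    b_h += lr * (pos_h - neg_h).mean(axis=0)

    return v  # updated persistent chain state, reused at the next call
```

The only structural change relative to CD-1 is the returned chain state: passing `v` back in as `persistent_v` on the next mini-batch keeps the negative-phase samples closer to the current model distribution, which is the source of PCD's more accurate gradient estimates.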