Improving neural networks by preventing co-adaptation of feature detectors

3 Jul 2012 | G. E. Hinton*, N. Srivastava, A. Krizhevsky, I. Sutskever and R. R. Salakhutdinov
This paper introduces a technique called "dropout" that improves the performance of neural networks by preventing co-adaptation of feature detectors. The method randomly omits half of the feature detectors during training, which prevents complex co-adaptations in which a feature detector is helpful only in the context of several other specific feature detectors. Instead, each neuron learns to detect a feature that is generally helpful for producing the correct answer given the combinatorially large variety of internal contexts in which it must operate. This approach significantly improves performance on many benchmark tasks, including speech and object recognition.

The paper describes how dropout works. During training, each hidden unit is randomly omitted with probability 0.5, which prevents the network from relying on other hidden units being present. This is equivalent to averaging the predictions of a huge number of different networks, each of which shares the same weights for whichever hidden units are present, so dropout makes it feasible to train this enormous ensemble in a reasonable time.

The paper reports improvements on several benchmarks: MNIST, TIMIT, CIFAR-10, and ImageNet. On MNIST, dropout reduces the number of test errors from about 160 to about 110. On TIMIT, it reduces the recognition error rate from 22.7% to 19.7%. On CIFAR-10, it reduces the test error from 18.5% (the best previously published result) to 15.6%. On ImageNet, it reduces the test error from 45.7% to 42.4%.

The paper also compares dropout with other model-combination techniques, such as Bayesian model averaging and bagging. Dropout is simpler to implement and more efficient than Bayesian model averaging, which weights each model by its posterior probability given the training data.
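The training and test-time behavior described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's implementation: the function name and the trick of scaling activations (rather than halving outgoing weights, as the paper does) are our assumptions, though the two are equivalent for a single layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(h, p_drop=0.5, train=True):
    """Sketch of dropout on a matrix of hidden activations `h`.

    Training: each hidden unit is independently omitted (zeroed) with
    probability p_drop, so the network cannot rely on specific units
    being present.

    Test: all units are kept; the paper halves the outgoing weights of
    each unit, which we approximate here by scaling the activations by
    (1 - p_drop) to match their expected value during training. This is
    the "mean network" that approximates averaging the predictions of
    the exponentially many thinned networks.
    """
    if train:
        mask = rng.random(h.shape) >= p_drop  # keep each unit with prob 1 - p_drop
        return h * mask
    return h * (1.0 - p_drop)

h = np.ones((4, 8))                      # a batch of hidden activations
out_train = dropout_forward(h, train=True)   # roughly half the units zeroed
out_test = dropout_forward(h, train=False)   # all units kept, scaled by 0.5
```

Because the mask is resampled on every forward pass, each training case is effectively processed by a different randomly thinned network that shares weights with all the others.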
Dropout is also more efficient than bagging, which trains different models on different random selections of cases from the training set. The paper concludes that dropout substantially improves neural networks, allowing them to perform better on a variety of tasks, and demonstrates that it remains effective even for complex networks that have been refined by the joint efforts of many groups over many years to excel at object recognition.