This paper introduces Neural Architecture Search (NAS), a method that uses a recurrent neural network (RNN) to automatically design neural network architectures. The RNN, called the controller, generates architecture descriptions as sequences of tokens and is trained with reinforcement learning, using the validation accuracy of the generated architectures as the reward signal, so that it learns to propose better architectures over time.
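To make the sampling loop concrete, here is a minimal sketch rather than the paper's actual controller: a stand-in policy emits one architectural decision per step by sampling from a softmax over a small discrete search space. The hyperparameter names, value ranges, and the `logits_fn` interface are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Illustrative sketch of how a controller could emit an architecture
# description token by token (names and value ranges are assumptions).
rng = np.random.default_rng(0)

SEARCH_SPACE = {
    "filter_height": [1, 3, 5, 7],
    "filter_width": [1, 3, 5, 7],
    "num_filters": [24, 36, 48, 64],
}

def sample_architecture(num_layers, logits_fn):
    """Sample one architecture as a list of per-layer hyperparameter choices."""
    arch = []
    for layer in range(num_layers):
        layer_spec = {}
        for name, choices in SEARCH_SPACE.items():
            # The controller would produce these logits from its hidden state;
            # here logits_fn is a stand-in.
            logits = logits_fn(layer, name, len(choices))
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()
            layer_spec[name] = choices[rng.choice(len(choices), p=probs)]
        arch.append(layer_spec)
    return arch

# Stand-in for an untrained controller: uniform logits over every choice.
arch = sample_architecture(3, lambda layer, name, k: np.zeros(k))
print(arch)
```

Each sampled dictionary corresponds to one layer's description; in the paper this sequence of tokens is what gets decoded into a child network, trained, and scored on the validation set.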
On the CIFAR-10 dataset, the method designs a novel convolutional architecture that rivals the best human-invented architectures in test set accuracy, achieving a test error rate of 3.65%, which is 0.09 percentage points better and 1.05x faster than the previous state-of-the-art model. On the Penn Treebank dataset, the method designs a novel recurrent cell that outperforms the widely used LSTM cell and other state-of-the-art baselines, achieving a test set perplexity of 62.4, 3.6 perplexity better than the previous state-of-the-art model. The cell also transfers to the character language modeling task on PTB, where it achieves a state-of-the-art perplexity of 1.214.
The method trains the controller RNN with a policy gradient approach (REINFORCE), optimizing it to maximize the expected validation accuracy of the proposed architectures. Training is distributed, with asynchronous parameter updates across many controller replicas and child networks to speed up the search. The controller is also extended to predict skip connections and other layer types, increasing the complexity of the architectures it can express. Finally, the method is extended to generate recurrent cell architectures, whose computations are represented as a tree of combination steps; a sketch of the policy gradient update follows.
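The following is a hedged sketch of such a REINFORCE-style update, written with PyTorch and a toy linear "controller" in place of the paper's RNN: the log-probabilities of the sampled decisions are scaled by the advantage (reward minus a moving-average baseline), where the reward stands in for a child network's validation accuracy. The module sizes, learning rate, and baseline decay are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

num_choices, num_steps = 4, 6
# Toy stand-in policy; the paper's controller is an RNN over previous tokens.
controller = torch.nn.Linear(num_steps, num_choices)
optimizer = torch.optim.Adam(controller.parameters(), lr=1e-3)
baseline = 0.0  # moving average of past rewards, used to reduce variance

def train_step(reward):
    """One REINFORCE update given the reward of a sampled architecture."""
    global baseline
    log_probs = []
    for t in range(num_steps):
        step_input = F.one_hot(torch.tensor(t), num_steps).float()
        logits = controller(step_input)
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()              # one architectural decision
        log_probs.append(dist.log_prob(action))
    # Scale summed log-probabilities by (reward - baseline).
    advantage = reward - baseline
    loss = -torch.stack(log_probs).sum() * advantage
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    baseline = 0.95 * baseline + 0.05 * reward

# e.g. validation accuracy of the sampled child network as the reward
train_step(reward=0.72)
```

In the distributed setting described above, many such updates would be applied asynchronously to shared controller parameters, with each replica sampling and training its own child networks.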
The experiments show that NAS can design good models from scratch, an achievement previously considered out of reach for automated methods. On image recognition with CIFAR-10, NAS finds a novel ConvNet that is better than most human-invented architectures. On language modeling with Penn Treebank, NAS designs a novel recurrent cell that outperforms previous RNN and LSTM architectures, reaching a test set perplexity of 62.4 (3.6 perplexity better than the previous state of the art) and transferring to character-level language modeling on PTB with a state-of-the-art perplexity of 1.214.