Efficient Neural Architecture Search via Parameter Sharing

2018 | Hieu Pham*, Melody Y. Guan*, Barret Zoph, Quoc V. Le, Jeff Dean
Efficient Neural Architecture Search (ENAS) is a fast and inexpensive approach to automatic model design. In ENAS, a controller discovers network architectures by searching for an optimal subgraph within a single large computational graph. The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on a validation set, while the model corresponding to the selected subgraph is trained to minimize a canonical cross-entropy loss. Because parameters are shared among child models, an idea drawn from transfer learning and multitask learning, ENAS delivers strong empirical performance while using far fewer GPU-hours than existing automatic model design approaches, reducing the cost by more than 1000x relative to standard Neural Architecture Search (NAS).

ENAS applies to both recurrent and convolutional architectures, with candidate models represented as subgraphs of a directed acyclic graph (DAG); the shared parameters are optimized with stochastic gradient descent, and the controller with policy gradients. On the Penn Treebank dataset, ENAS discovers an architecture that achieves a test perplexity of 55.8, outperforming NAS. On CIFAR-10, its discovered architecture achieves 2.89% test error, comparable to NASNet. These results demonstrate both the effectiveness and the efficiency of ENAS for neural architecture search in language and image tasks.
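To make the alternating optimization concrete, below is a minimal, self-contained PyTorch sketch of the ENAS training idea on a toy task. It is not the paper's implementation: the real controller is an LSTM that samples an entire recurrent or convolutional cell (a subgraph of the DAG), whereas here the "architecture" is reduced to a single choice of activation function, and names such as SharedChild and Controller are hypothetical. The sketch only illustrates the two interleaved phases: training shared parameters with SGD on cross-entropy using sampled architectures, and training the controller with REINFORCE using validation accuracy as the reward.

```python
# Toy ENAS-style alternating optimization (illustrative sketch, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Candidate operations a child model can pick; the surrounding weights are SHARED.
CANDIDATE_OPS = [torch.tanh, torch.relu, torch.sigmoid, lambda x: x]

class SharedChild(nn.Module):
    """Shared weights omega: the same linear layers are reused by every sampled architecture."""
    def __init__(self, dim=16):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, 2)

    def forward(self, x, op_idx):
        # The sampled architecture only decides which activation to apply here.
        h = CANDIDATE_OPS[op_idx](self.fc1(x))
        return self.fc2(h)

class Controller(nn.Module):
    """Policy pi(m; theta): a categorical distribution over candidate operations."""
    def __init__(self, num_ops):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_ops))

    def sample(self):
        dist = torch.distributions.Categorical(logits=self.logits)
        op_idx = dist.sample()
        return op_idx.item(), dist.log_prob(op_idx)

# Synthetic data standing in for the training and validation sets.
x_train = torch.randn(512, 16); y_train = (x_train.sum(1) > 0).long()
x_valid = torch.randn(256, 16); y_valid = (x_valid.sum(1) > 0).long()

child = SharedChild()
controller = Controller(len(CANDIDATE_OPS))
opt_omega = torch.optim.SGD(child.parameters(), lr=0.1)
opt_theta = torch.optim.Adam(controller.parameters(), lr=0.05)
baseline = 0.0  # moving-average baseline to reduce REINFORCE variance

for step in range(200):
    # Phase 1: train shared parameters omega on the training set (cross-entropy),
    # using an architecture sampled from the controller's current policy.
    op_idx, _ = controller.sample()
    loss = F.cross_entropy(child(x_train, op_idx), y_train)
    opt_omega.zero_grad(); loss.backward(); opt_omega.step()

    # Phase 2: train controller parameters theta with policy gradient (REINFORCE),
    # rewarding validation accuracy of the sampled architecture while omega is fixed.
    op_idx, log_prob = controller.sample()
    with torch.no_grad():
        acc = (child(x_valid, op_idx).argmax(1) == y_valid).float().mean().item()
    baseline = 0.95 * baseline + 0.05 * acc
    policy_loss = -(acc - baseline) * log_prob
    opt_theta.zero_grad(); policy_loss.backward(); opt_theta.step()

print("operation probabilities:", F.softmax(controller.logits, dim=0).tolist())
```

Because every sampled architecture reuses the same shared weights, no child model is ever trained from scratch; this reuse is what lets the controller evaluate many candidates cheaply and is the source of ENAS's large reduction in GPU-hours.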