Efficient Neural Architecture Search via Parameter Sharing


2018 | Hieu Pham*, Melody Y. Guan*, Barret Zoph, Quoc V. Le, Jeff Dean
**Efficient Neural Architecture Search (ENAS)** is an approach to automatic model design that significantly reduces the computational cost of Neural Architecture Search (NAS). ENAS achieves this by letting child models share parameters, an idea inspired by transfer learning and multitask learning. A controller, trained with policy gradient, selects a subgraph within one large computational graph so as to maximize the expected reward on a validation set, and the selected subgraph is trained to minimize a cross-entropy loss. The method delivers strong empirical performance while using far fewer GPU-hours than, and being far less expensive than, standard NAS.

**Key Contributions:**
1. **Efficiency:** ENAS reduces the computational cost of NAS by more than 1000x in terms of GPU-hours.
2. **Parameter Sharing:** Child models share parameters, enabling efficient training and strong performance.
3. **Empirical Performance:** ENAS achieves a test perplexity of 55.8 on the Penn Treebank dataset and a test error of 2.89% on CIFAR-10, comparable to NASNet.

**Methods:**
- **Search Space Representation:** ENAS represents the search space as a single directed acyclic graph (DAG), in which each node is a local computation and each edge carries information flow; every candidate architecture is a subgraph of this DAG.
- **Training Procedure:** ENAS alternates between training the shared parameters of the child models and training the controller's parameters. The controller samples decisions that define a subgraph of the DAG; the shared parameters of that subgraph are trained to minimize the cross-entropy loss, while the controller is updated with policy gradient to maximize the validation reward (a toy sketch of this alternation follows this list).
- **Deriving Architectures:** After training, several architectures are sampled from the controller, the one that performs best on validation data is selected, and it is re-trained from scratch to obtain the final model (see the second sketch below).
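The alternating optimization described above can be illustrated with a minimal, self-contained sketch. Everything in it (the toy task, the layer sizes, the simplified per-node softmax controller standing in for the paper's LSTM controller, and the moving-average baseline) is an illustrative assumption rather than the paper's implementation; the point is only to show one large set of shared weights, subgraphs sampled from it, cross-entropy training of the shared weights, and REINFORCE updates of the controller.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_NODES = 4   # nodes in the shared DAG (toy scale)
NUM_OPS = 3     # candidate operations per node
HIDDEN = 32

class SharedDAG(nn.Module):
    """Holds the parameters of every candidate op; a sampled architecture
    activates exactly one op per node, i.e. one subgraph of the DAG."""
    def __init__(self):
        super().__init__()
        self.inp = nn.Linear(16, HIDDEN)
        self.ops = nn.ModuleList(
            [nn.ModuleList([nn.Linear(HIDDEN, HIDDEN) for _ in range(NUM_OPS)])
             for _ in range(NUM_NODES)])
        self.out = nn.Linear(HIDDEN, 10)
        self.acts = [torch.tanh, torch.relu, lambda x: x]  # per-op nonlinearity

    def forward(self, x, arch):               # arch: one op index per node
        h = torch.tanh(self.inp(x))
        for node, op in enumerate(arch):
            h = self.acts[op](self.ops[node][op](h))
        return self.out(h)

class Controller(nn.Module):
    """Tiny policy over architectures: independent logits per node
    (a deliberate simplification of the paper's LSTM controller)."""
    def __init__(self):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(NUM_NODES, NUM_OPS))

    def sample(self):
        dist = torch.distributions.Categorical(logits=self.logits)
        arch = dist.sample()                  # one op index per node
        return arch.tolist(), dist.log_prob(arch).sum()

def make_batch():
    """Toy data standing in for the real training/validation sets."""
    x = torch.randn(64, 16)
    y = (x.sum(dim=1) > 0).long()             # two-class toy labels
    return x, y

shared, ctrl = SharedDAG(), Controller()
w_opt = torch.optim.SGD(shared.parameters(), lr=0.1)
c_opt = torch.optim.Adam(ctrl.parameters(), lr=0.01)
baseline = 0.0

for step in range(200):
    # Phase 1: sample a subgraph and train the SHARED parameters on
    # training data with a cross-entropy loss.
    arch, _ = ctrl.sample()
    x, y = make_batch()
    loss = F.cross_entropy(shared(x, arch), y)
    w_opt.zero_grad(); loss.backward(); w_opt.step()

    # Phase 2: sample a subgraph, measure its reward (validation accuracy),
    # and update the CONTROLLER with REINFORCE to maximize expected reward.
    arch, log_prob = ctrl.sample()
    xv, yv = make_batch()                      # stands in for validation data
    with torch.no_grad():
        reward = (shared(xv, arch).argmax(dim=1) == yv).float().mean().item()
    baseline = 0.95 * baseline + 0.05 * reward  # moving-average baseline
    policy_loss = -(reward - baseline) * log_prob
    c_opt.zero_grad(); policy_loss.backward(); c_opt.step()
```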
**Experiments:**
- **Penn Treebank:** ENAS discovers a recurrent cell that achieves a test perplexity of 55.8, outperforming existing automatic model design methods.
- **CIFAR-10:** ENAS finds a convolutional architecture with a test error of 2.89%, comparable to NASNet.

**Conclusion:** ENAS is a fast and efficient method for automatic model design, achieving strong performance while dramatically reducing the computational cost of architecture search.
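The derivation step described under Methods can be sketched by continuing the toy setup above (reusing `ctrl`, `shared`, and `make_batch`). The number of samples and the use of a single validation batch here are illustrative assumptions, not the paper's exact configuration.

```python
def derive_architecture(ctrl, shared, num_samples=10):
    """Sample several architectures from the trained controller, score each
    with the shared weights on one validation batch, and return the best."""
    best_arch, best_reward = None, -1.0
    xv, yv = make_batch()                      # a single held-out batch
    with torch.no_grad():
        for _ in range(num_samples):
            arch, _ = ctrl.sample()
            reward = (shared(xv, arch).argmax(dim=1) == yv).float().mean().item()
            if reward > best_reward:
                best_arch, best_reward = arch, reward
    return best_arch

best_arch = derive_architecture(ctrl, shared)
# The selected architecture is then re-trained from scratch (with freshly
# initialized parameters) before final evaluation, as described above.
```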