AutoML: A Survey of the State-of-the-Art


April 19, 2021 | Xin He, Kaiyong Zhao, Xiaowen Chu
Department of Computer Science, Hong Kong Baptist University

Abstract: Deep learning (DL) has achieved remarkable results in a wide range of tasks, but building high-quality DL systems requires substantial expert knowledge, which limits its application. Automated machine learning (AutoML) addresses this problem by automating the DL pipeline. This paper provides a comprehensive review of AutoML, with a focus on neural architecture search (NAS), currently the most active sub-field. It summarizes the performance of NAS algorithms on CIFAR-10 and ImageNet, discusses NAS sub-topics such as one/two-stage NAS, one-shot NAS, joint hyperparameter and architecture optimization, and resource-aware NAS, and identifies open problems in AutoML for future research.

Keywords: deep learning, automated machine learning (AutoML), neural architecture search (NAS), hyperparameter optimization (HPO)

Introduction: Deep learning has been applied in many fields, and the neural networks proposed for these tasks have grown increasingly complex. These models are designed by hand, however, which demands substantial human expertise and compute. AutoML automates the entire ML pipeline and thereby reduces development cost. The pipeline comprises data preparation, feature engineering, model generation, and model evaluation. NAS is a key component of model generation: it searches for robust neural architectures by selecting and combining operations from a predefined search space. The paper classifies search spaces into entire-structured, cell-based, hierarchical, and morphism-based, and surveys architecture optimization (AO) methods including reinforcement learning, evolution-based algorithms, and gradient descent. It then compares NAS algorithms on CIFAR-10 and ImageNet and discusses open problems.

Data Preparation: The first step of the ML pipeline is data preparation, which involves data collection, data cleaning, and data augmentation. Data collection covers searching for web data as well as data synthesis with simulators or GANs. Data cleaning removes noise to ensure data quality; systems such as Katara and BoostClean automate the process. Data augmentation generates new samples to improve model robustness, using geometric transformations such as rotation and scaling, or generative models such as GANs.
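As a concrete illustration of the geometric augmentations mentioned above, here is a minimal PyTorch/torchvision sketch of an augmentation pipeline for CIFAR-10. The specific transforms and their parameters are illustrative choices, not a policy from the survey; automated approaches search for such policies instead of hand-writing them.

```python
import torchvision.transforms as T
from torchvision.datasets import CIFAR10

# A hand-written augmentation policy (rotation, scaling, flipping).
# The parameter values below are illustrative assumptions.
train_tf = T.Compose([
    T.RandomRotation(degrees=15),               # random rotation
    T.RandomResizedCrop(32, scale=(0.8, 1.0)),  # random scaling + crop
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])

# Transforms are applied on the fly, so every epoch sees a freshly
# augmented view of each training image.
train_set = CIFAR10(root="./data", train=True, download=True,
                    transform=train_tf)
```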
Feature Engineering: Feature engineering extracts useful features from raw data and comprises feature selection, feature extraction, and feature construction. Feature selection removes irrelevant or redundant features; feature extraction reduces dimensionality by mapping the original features to a new space; feature construction creates new features to improve model performance. Automated methods such as genetic algorithms and decision trees make these steps more efficient.

Model Generation: Model generation consists of defining a search space and choosing an optimization method. The search space determines which model structures can be expressed, with the four types noted above: entire-structured, cell-based, hierarchical, and morphism-based. Optimization methods include reinforcement learning, evolution-based algorithms, and gradient descent; gradient-based methods such as DARTS and P-DARTS substantially improve search efficiency.

Model Evaluation: Model evaluation measures the performance of each candidate architecture. The straightforward approach trains a candidate on the training set and estimates its performance on a validation set, which is expensive; accelerated evaluation methods (for example, the weight sharing used in one-shot NAS) reduce this cost but may lose fidelity. Balancing evaluation efficiency and effectiveness therefore remains a key open problem.
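To make the gradient-based search idea concrete, the sketch below implements the core of DARTS' continuous relaxation in PyTorch: each edge of a cell computes a softmax-weighted mixture of candidate operations, so the architecture parameters can be trained by gradient descent and the strongest operation is kept at discretization time. The three-operation candidate set and channel count are illustrative assumptions, not the full DARTS search space.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_ops(channels):
    # Simplified candidate set for one edge (real DARTS also includes
    # separable and dilated convolutions, average pooling, and "none").
    return nn.ModuleList([
        nn.Conv2d(channels, channels, 3, padding=1, bias=False),
        nn.MaxPool2d(3, stride=1, padding=1),
        nn.Identity(),  # skip connection
    ])

class MixedOp(nn.Module):
    """Continuous relaxation of a discrete edge: the edge output is a
    softmax-weighted sum over all candidate operations, making the
    architecture parameters `alpha` differentiable."""
    def __init__(self, channels):
        super().__init__()
        self.ops = make_ops(channels)
        # One architecture parameter per candidate operation.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

edge = MixedOp(channels=16)
x = torch.randn(2, 16, 8, 8)
y = edge(x)                        # weighted mixture during search
best = edge.alpha.argmax().item()  # operation kept after discretization
```

In full DARTS the architecture parameters are optimized on validation data in a bi-level scheme, alternating with the network weights; the sketch collapses that into a single module for brevity.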