DC-NAS: Divide-and-Conquer Neural Architecture Search for Multi-Modal Classification

2024 | Xinyan Liang, Pinhan Fu, Qian Guo, Keyin Zheng, Yuhua Qian
This paper proposes DC-NAS, an efficient evolution-based neural architecture search (NAS) method for multi-modal classification. Experimental results show that, compared with existing NAS-MMC methods, DC-NAS achieves state-of-the-art results in classification performance, training efficiency, and model parameter count on three popular multi-modal tasks: multi-label movie genre classification, action recognition with RGB and body joints, and dynamic hand gesture recognition.

DC-NAS encodes each individual as a binary tree in which leaf nodes represent unimodal features and branch nodes represent fusion operators, giving a large search space at low encoding cost; a minimal sketch of this encoding follows.
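The Python sketch below shows one plausible form of such a genotype. It is an illustrative assumption only: the class names, operator pool, and modality names are hypothetical, not identifiers from the paper.

    # Hedged sketch of the binary-tree genotype described above.
    # FUSION_OPS, MODALITIES, and the node classes are assumptions
    # for illustration, not the paper's actual implementation.
    import random
    from dataclasses import dataclass
    from typing import Union

    FUSION_OPS = ["sum", "concat", "attention"]   # assumed operator pool
    MODALITIES = ["rgb", "skeleton"]              # assumed feature leaves

    @dataclass
    class LeafNode:
        modality: str          # a unimodal feature, e.g. an RGB embedding

    @dataclass
    class FusionNode:
        op: str                # fusion operator applied to the two children
        left: "Node"
        right: "Node"

    Node = Union[LeafNode, FusionNode]

    def random_tree(depth: int) -> Node:
        """Sample a random fusion architecture (genotype)."""
        if depth == 0 or random.random() < 0.3:
            return LeafNode(random.choice(MODALITIES))
        return FusionNode(
            op=random.choice(FUSION_OPS),
            left=random_tree(depth - 1),
            right=random_tree(depth - 1),
        )

Calling random_tree(3) yields a random fusion architecture whose leaves select modality features and whose internal nodes select fusion operators, the kind of individual an evolutionary search would mutate and recombine.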
The search follows a divide-and-conquer strategy: the population is divided into k+1 sub-populations, with k of them evolving on k small-scale data sets obtained by k-fold stratified sampling of the training data and the remaining one evolving on the entire dataset. Two knowledge bases allow the sub-populations to exchange knowledge during evolution, improving both training efficiency and classification performance; a hedged sketch of this loop closes the summary.

On the MM-IMDB dataset, DC-NAS achieves the best weighted F1 score, outperforming the recent BM-NAS method by 0.78%. On NTU RGB-D it reaches a cross-subject accuracy of 90.85%, and on EgoGesture it likewise achieves state-of-the-art classification performance. Because most sub-populations train on small-scale data, DC-NAS reduces search time while matching or exceeding the classification performance of full-data search, making it well suited to large-scale multi-modal data. Future research directions include improving the knowledge-exchange strategy between sub-populations and the data-splitting scheme.
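To make the divide-and-conquer loop concrete, here is a minimal sketch of the k-fold stratified split and sub-population evolution. It assumes numpy-style arrays and hypothetical spawn/step helpers, and the single shared archive here stands in for the paper's two knowledge bases.

    # Hedged sketch of the divide-and-conquer loop: k-fold stratified
    # splitting plus a shared elite archive. All names and the exchange
    # policy are assumptions for illustration, not the paper's algorithm.
    from sklearn.model_selection import StratifiedKFold

    def make_subsets(X, y, k):
        """Split (X, y) into k disjoint, class-stratified subsets."""
        skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
        # Use each fold's *test* indices so the k subsets are disjoint.
        return [(X[idx], y[idx]) for _, idx in skf.split(X, y)]

    def evolve(k, X, y, generations, spawn, step):
        # k sub-populations on small subsets + 1 on the full dataset.
        datasets = make_subsets(X, y, k) + [(X, y)]
        pops = [spawn() for _ in datasets]
        knowledge_base = []                      # shared elite archive
        for _ in range(generations):
            for i, (pop, data) in enumerate(zip(pops, datasets)):
                # One evolutionary step, seeded with migrants from the base.
                pops[i], elites = step(pop, data, migrants=knowledge_base)
                knowledge_base.extend(elites)
            # Keep only the best individuals found so far (assumes each
            # individual carries a .fitness attribute).
            knowledge_base = sorted(knowledge_base,
                                    key=lambda ind: ind.fitness,
                                    reverse=True)[:10]
        return knowledge_base[0]                 # best architecture found

The design point the sketch captures is that only one sub-population ever touches the full dataset; the others evolve cheaply on disjoint stratified subsets and contribute their elites to the shared archive.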