30 Apr 2024 | Zhiqiang Tang, Haoyang Fang, Su Zhou, Taojiannan Yang, Zihan Zhong, Tony Hu, Katrin Kirchhoff, George Karypis
AutoGluon-Multimodal (AutoMM) is an open-source AutoML library designed specifically for multimodal learning, enabling fine-tuning of foundation models with just three lines of code. It supports various modalities, including image, text, and tabular data, and offers a comprehensive suite of functionalities for tasks such as classification, regression, object detection, semantic matching, and image segmentation. The library leverages popular model repositories such as Hugging Face Transformers, TIMM, and MMDetection to support a wide range of models. Experimental results on a benchmark of 55 publicly available datasets, with comparisons against task-specific open-source libraries, show that AutoMM outperforms existing AutoML tools on basic classification and regression tasks and achieves performance on advanced tasks competitive with the specialized toolboxes designed for them. The paper also discusses related work, the design and functionalities of AutoMM, and future directions, including the integration of multimodal foundation models, support for generative tasks, and expansion to more modalities.
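For concreteness, here is a minimal sketch of the three-line workflow the library advertises, using AutoMM's `MultiModalPredictor`. The toy DataFrame, its column names, and the file paths are illustrative assumptions, not from the paper; in practice `fit` would be called on a real multimodal dataset.

```python
import pandas as pd
from autogluon.multimodal import MultiModalPredictor

# Toy multimodal table mixing an image-path column, a text column, and the
# target column. Rows, column names, and paths are purely illustrative.
train_data = pd.DataFrame({
    "image": ["img/cat1.jpg", "img/dog1.jpg"],
    "description": ["a small cat", "a large dog"],
    "label": ["cat", "dog"],
})

# The advertised three lines: import, construct, fit.
predictor = MultiModalPredictor(label="label")
predictor.fit(train_data)
```

After fitting, `predictor.predict(test_data)` returns predictions for a DataFrame with the same feature columns; AutoMM infers the modality of each column and selects suitable foundation-model backbones automatically.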