2024-04-30 | Zhiqiang Tang, Haoyang Fang, Su Zhou, Taojiannan Yang, Zihan Zhong, Tony Hu, Katrin Kirchhoff, George Karypis
AutoGluon-Multimodal (AutoMM) is an open-source AutoML library designed for multimodal learning. Built within the AutoGluon ecosystem, it enables users to fine-tune foundation models with just three lines of code and supports image, text, and tabular data, whether unimodal or combined. Its design couples a unified data pipeline and user-friendly APIs with a wide range of models drawn from popular model repositories, and it includes deployment features such as NVIDIA TensorRT support for efficient inference.

AutoMM offers a comprehensive suite of functionalities spanning basic classification and regression as well as advanced tasks such as object detection, semantic matching, and semantic segmentation, with an emphasis on parameter-efficient fine-tuning and scalable handling of multimodal inputs. Experiments on a benchmark of 55 publicly available datasets covering diverse tasks show that AutoMM outperforms existing AutoML tools such as AutoKeras on classification and regression and performs competitively on the advanced tasks. The framework is under continuous development to expand its capabilities, including support for additional modalities and generative tasks, with the broader aim of democratizing machine learning through a unified, user-friendly, and efficient solution.