5 Jul 2024 | David Holzmüller, Léo Grinsztajn, Ingo Steinwart
This paper presents RealMLP, an improved multilayer perceptron (MLP) with enhanced default parameters, along with improved default parameters for gradient-boosted decision trees (GBDTs) such as XGBoost, LightGBM, and CatBoost. The authors evaluate these models on a meta-train benchmark with 71 classification and 47 regression datasets, a disjoint meta-test benchmark with 48 classification and 42 regression datasets, and the GBDT-friendly benchmark of Grinsztajn et al. (2022).

The results show that RealMLP offers a better time-accuracy tradeoff than other neural networks and is competitive with GBDTs. Moreover, a combination of RealMLP and GBDTs with improved default parameters achieves excellent results on medium-sized tabular datasets (1K–500K samples) without any hyperparameter tuning.

The paper also discusses the importance of good default parameters in automated machine learning (AutoML), and how they can lead to better performance than hyperparameter optimization. To find such defaults, the authors propose a meta-learning approach: default parameters are tuned on the meta-train benchmark and then evaluated on the disjoint meta-test benchmark. Finally, the paper highlights the benefits of algorithm selection and ensembling, concluding that with good default parameters, it is worth trying both algorithm families even with a moderate training time budget.
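Since the paper's practical takeaway is procedural (train both model families with strong defaults, then select or ensemble), here is a minimal sketch of that workflow. It uses stock scikit-learn and XGBoost defaults as stand-ins; the paper's RealMLP and its meta-learned GBDT defaults are not reproduced here, and the dataset and split are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

# Illustrative dataset; any medium-sized tabular task would do.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# One model per family, each with untouched library defaults
# (stand-ins for RealMLP and the paper's tuned GBDT defaults).
models = {
    "mlp": MLPClassifier(max_iter=1000, random_state=0),
    "gbdt": XGBClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_val, model.predict(X_val))
    print(f"{name}: validation accuracy {scores[name]:.3f}")

# Algorithm selection: keep whichever family validates best.
best = max(scores, key=scores.get)
print(f"selected family: {best}")
```

An ensembling variant would average the two models' predicted class probabilities instead of discarding the weaker one, which is in the spirit of the combinations the paper reports as strongest.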