5 Jul 2024 | David Holzmüller, Léo Grinsztajn, Ingo Steinwart
The paper "Better by Default: Strong Pre-Tuned MLPs and Boosted Trees on Tabular Data" by David Holzmüller, Léo Grinsztajn, and Ingo Steinhart addresses the challenge of deep learning methods being slower and requiring extensive hyperparameter tuning compared to gradient-boosted decision trees (GBDTs) on tabular data. The authors introduce RealMLP, an improved multilayer perceptron (MLP), and provide tuned default parameters for GBDTs and RealMLP. They tune these parameters on a meta-train benchmark and evaluate them on a meta-test benchmark, demonstrating that RealMLP offers a better time-accuracy tradeoff than other neural nets and is competitive with GBDTs. The combination of RealMLP and GBDTs with improved default parameters can achieve excellent results on medium-sized tabular datasets (1K–500K samples) without hyperparameter tuning. The paper also includes a detailed methodology, experimental results, and discussions on the effectiveness of the proposed improvements.The paper "Better by Default: Strong Pre-Tuned MLPs and Boosted Trees on Tabular Data" by David Holzmüller, Léo Grinsztajn, and Ingo Steinhart addresses the challenge of deep learning methods being slower and requiring extensive hyperparameter tuning compared to gradient-boosted decision trees (GBDTs) on tabular data. The authors introduce RealMLP, an improved multilayer perceptron (MLP), and provide tuned default parameters for GBDTs and RealMLP. They tune these parameters on a meta-train benchmark and evaluate them on a meta-test benchmark, demonstrating that RealMLP offers a better time-accuracy tradeoff than other neural nets and is competitive with GBDTs. The combination of RealMLP and GBDTs with improved default parameters can achieve excellent results on medium-sized tabular datasets (1K–500K samples) without hyperparameter tuning. The paper also includes a detailed methodology, experimental results, and discussions on the effectiveness of the proposed improvements.