November 24, 2021 | Ravid Shwartz-Ziv, Amitai Armon
This paper evaluates whether deep learning models for tabular data should replace traditional tree-based models such as XGBoost. The authors compare several deep learning models (TabNet, NODE, DNF-Net, and a 1D-CNN) against XGBoost on 11 diverse tabular datasets. They find that XGBoost generally outperforms the deep models, including on the datasets used in the papers that proposed those models, and that it requires significantly less hyperparameter tuning. However, an ensemble of the deep models combined with XGBoost performs better than XGBoost alone or any single deep model.

The study shows that while deep learning models can achieve good results on specific datasets, they are not universally superior to XGBoost. The authors conclude that XGBoost remains the preferred choice for most tabular data problems, and that deep learning is not yet all we need for tabular data. The paper also highlights the importance of hyperparameter optimization and the trade-off between model performance and computational efficiency.
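Since the ensemble result is the paper's most actionable finding, here is a minimal sketch of the general idea: train XGBoost and a neural model on the same data and average their predicted class probabilities. The `MLPClassifier` below is only a stand-in for the deep tabular models studied (TabNet, NODE, etc.), and the unweighted average is an assumption for illustration; the authors' exact ensembling and weighting scheme may differ.

```python
# Sketch of ensembling XGBoost with a neural model by averaging class probabilities.
# Assumptions: synthetic data, an MLP as a stand-in deep model, unweighted averaging.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier  # stand-in for a tabular deep model
from xgboost import XGBClassifier

# Synthetic tabular classification problem for illustration.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

xgb = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
mlp = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)

xgb.fit(X_train, y_train)
mlp.fit(X_train, y_train)

# Ensemble: mean of predicted probabilities, then argmax for the class label.
proba = (xgb.predict_proba(X_test) + mlp.predict_proba(X_test)) / 2.0
pred = proba.argmax(axis=1)
print("ensemble accuracy:", (pred == y_test).mean())
```

In practice, one would swap the stand-in MLP for the actual deep tabular models and tune each model's hyperparameters before ensembling, which is where the paper's point about tuning cost and computational efficiency comes in.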