MAKING PRE-TRAINED LANGUAGE MODELS GREAT ON TABULAR PREDICTION

12 Mar 2024 | Jiahuan Yan, Bo Zheng, Hongxia Xu, Yiheng Zhu, Danny Z. Chen, Jimeng Sun, Jian Wu, Jintai Chen
The paper presents TP-BERTa, a pre-trained language model specifically designed for tabular data prediction. It addresses the challenge of handling numerical features in tables, which are inherently incompatible with the discrete text representation space of language models (LMs). TP-BERTa introduces two key innovations: *relative magnitude tokenization* (RMT) and *intra-feature attention* (IFA). RMT converts scalar numerical values into high-dimensional tokens, allowing LMs to understand relative value magnitudes. IFA integrates feature values with their corresponding names, preserving the semantic signal of feature names. Comprehensive experiments demonstrate that TP-BERTa outperforms existing tabular DNNs and is competitive with Gradient Boosted Decision Tree (GBDT) models on typical tabular data tasks. The paper also provides insights into the effectiveness of RMT and IFA through ablation studies and comparisons with other numerical encoding strategies. Overall, TP-BERTa showcases the potential of LMs in tabular prediction tasks, particularly when dealing with informative categorical features.
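To make the two ideas concrete, below is a minimal, hypothetical PyTorch sketch of how RMT and IFA could be realized. The class names, bin count, uniform bin edges, and the small example at the end are illustrative assumptions, not the paper's actual implementation (the paper derives its magnitude bins from training data rather than using fixed uniform edges).

```python
import torch
import torch.nn as nn


class RelativeMagnitudeTokenizer(nn.Module):
    """Sketch of relative magnitude tokenization (RMT): a scalar value is
    discretized into one of `num_bins` magnitude bins, each bin has a
    learnable token embedding, and the selected embedding is scaled by the
    value so the LM can compare relative magnitudes in embedding space."""

    def __init__(self, num_bins: int = 256, dim: int = 768):
        super().__init__()
        self.magnitude_tokens = nn.Embedding(num_bins, dim)
        # Placeholder uniform bin edges; the paper fits data-driven bins instead.
        self.register_buffer("edges", torch.linspace(0.0, 1.0, num_bins + 1)[1:-1])

    def forward(self, values: torch.Tensor) -> torch.Tensor:
        # values: (batch,) feature values, assumed normalized to [0, 1]
        bins = torch.bucketize(values, self.edges)   # (batch,) bin indices
        tokens = self.magnitude_tokens(bins)         # (batch, dim) magnitude tokens
        return tokens * values.unsqueeze(-1)         # magnitude-aware scaling


class IntraFeatureAttention(nn.Module):
    """Sketch of intra-feature attention (IFA): the tokens of a feature's name
    and its value are fused by one attention layer, so each feature enters the
    LM as a single, name-aware vector."""

    def __init__(self, dim: int = 768, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, name_tokens: torch.Tensor, value_token: torch.Tensor) -> torch.Tensor:
        # name_tokens: (batch, name_len, dim); value_token: (batch, dim)
        seq = torch.cat([value_token.unsqueeze(1), name_tokens], dim=1)
        fused, _ = self.attn(seq, seq, seq)
        return fused[:, 0]                           # value position pools the feature


# Example: fuse a normalized numeric feature with its tokenized name.
rmt = RelativeMagnitudeTokenizer()
ifa = IntraFeatureAttention()
values = torch.rand(4)                    # e.g. 4 samples of an "age" column
name_embeddings = torch.randn(4, 3, 768)  # stand-in for the name's token embeddings
feature_vectors = ifa(name_embeddings, rmt(values))   # shape: (4, 768)
```

In this sketch, scaling the shared bin embedding by the raw value lets two values that fall into the same bin still differ slightly, and placing the value token first in the fused sequence lets it attend over the full feature name, which is one plausible way to read the paper's description of IFA.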
Understanding Making Pre-trained Language Models Great on Tabular Prediction