MAKING PRE-TRAINED LANGUAGE MODELS GREAT ON TABULAR PREDICTION

2024 | Jiahuan Yan, Bo Zheng, Hongxia Xu, Yiheng Zhu, Danny Z. Chen, Jimeng Sun, Jian Wu, Jintai Chen
This paper introduces TP-BERTa, a pre-trained language model specifically designed for tabular data prediction. The model addresses the challenge of transferring knowledge from language-model pre-training to downstream tabular prediction tasks by converting numerical values into relative magnitude tokens and by integrating feature names with their corresponding values through an intra-feature attention mechanism. Its two key innovations are relative magnitude tokenization, which discretizes numerical values into meaningful tokens, and intra-feature attention, which fuses each feature's name and value into a single vector for subsequent processing. Comprehensive experiments show that TP-BERTa outperforms other tabular DNNs and is competitive with Gradient Boosted Decision Trees (GBDTs) in typical tabular data regimes, handling both categorical and numerical features effectively. This design lets the model better understand and exploit the semantic information in tabular data, making it a strong alternative to prior tabular DNNs. The study highlights the potential of pre-trained language models for tabular prediction and underscores the importance of careful numerical feature representation and feature name processing in tabular data.
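
To make the relative magnitude tokenization idea concrete, the minimal sketch below bins a numerical feature and maps each raw value to a magnitude token. The simple quantile binning, the [NUM_i] token names, and the helper functions are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def fit_magnitude_bins(values, num_bins=8):
    """Fit bin edges for one numerical feature.

    Quantile binning is used here purely for illustration; TP-BERTa's
    actual discretization procedure may differ.
    """
    quantiles = np.linspace(0.0, 1.0, num_bins + 1)
    edges = np.quantile(values, quantiles)
    return np.unique(edges)  # drop duplicate edges for skewed features

def to_magnitude_token(value, edges):
    """Map a raw numerical value to a relative-magnitude token.

    Values that fall into the same bin share one token, so the model sees
    magnitude rather than an arbitrary floating-point string.
    """
    bin_id = int(np.searchsorted(edges, value, side="right")) - 1
    bin_id = max(0, min(bin_id, len(edges) - 2))  # clamp out-of-range values
    return f"[NUM_{bin_id}]"

# Usage on a hypothetical "age" column
ages = np.random.default_rng(0).normal(40, 12, size=1000)
edges = fit_magnitude_bins(ages)
print(to_magnitude_token(25.0, edges), to_magnitude_token(63.0, edges))
```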
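The intra-feature attention mechanism can be illustrated in a similar spirit: an attention block lets a feature's name-token embeddings and its value embedding attend to one another, then pools them into a single per-feature vector. The single head, embedding size, and mean pooling below are assumptions chosen for brevity rather than TP-BERTa's actual configuration.

```python
import torch
import torch.nn as nn

class IntraFeatureAttention(nn.Module):
    """Fuse a feature's name tokens and value embedding into one vector (simplified sketch)."""

    def __init__(self, dim: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)

    def forward(self, name_tok_emb: torch.Tensor, value_emb: torch.Tensor) -> torch.Tensor:
        # name_tok_emb: (batch, n_name_tokens, dim); value_emb: (batch, dim)
        tokens = torch.cat([name_tok_emb, value_emb.unsqueeze(1)], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)  # attention within one feature only
        return fused.mean(dim=1)                      # one fused vector per feature

# Usage with toy shapes: 2 samples, a 3-token feature name, 16-dim embeddings
fuse = IntraFeatureAttention(dim=16)
name_emb = torch.randn(2, 3, 16)
val_emb = torch.randn(2, 16)
print(fuse(name_emb, val_emb).shape)  # torch.Size([2, 16])
```

Fusing name and value before the main Transformer keeps the sequence length proportional to the number of features rather than the number of sub-word tokens, which is one motivation the summary above alludes to.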