CARTE: Pretraining and Transfer for Tabular Learning

31 May 2024 | Myung Jun Kim, Léo Grinsztajn, Gaël Varoquaux
The paper introduces CARTE (Context-Aware Representation of Table Entries), a neural architecture for learning from tabular data without explicit data integration. CARTE represents tabular data as a graph, embeds entries and column names as strings, and uses a graph-attentional network to contextualize each entry with its column name and its neighboring entries. Because this representation does not rely on matched columns, CARTE can be pre-trained on background data that has not been matched, which makes it suitable for transfer learning across tables with unmatched columns. Extensive benchmarking shows that CARTE outperforms a set of 42 baselines, including the best tree-based models, and that it enables joint learning across multiple tables, enhancing small tables with larger ones. The paper also discusses the importance of strings in tabular data and the potential societal impact of CARTE, particularly in healthcare, where tabular data is crucial for code and entity normalization.
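To make the graph representation concrete, here is a minimal sketch of how one table row could be turned into a small star-shaped graph, with cell values as nodes and column names as edge features. It is an illustration under stated assumptions, not the authors' implementation: `toy_embed` is a hypothetical hash-based stand-in for a real string embedding, the rule of scaling the column embedding by a numeric value and the mean-initialized center node are assumptions of this sketch, and the graph-attentional network that would consume the graph is omitted entirely.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List

DIM = 8  # toy embedding size, chosen arbitrarily for this sketch


def toy_embed(text: str, dim: int = DIM) -> List[float]:
    """Deterministic stand-in for a string embedding (NOT a language model)."""
    digest = hashlib.sha256(text.lower().encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


@dataclass
class Graphlet:
    """Star graph for one row: a center node linked to one node per cell."""
    center: List[float]
    nodes: List[List[float]] = field(default_factory=list)  # cell-value features
    edges: List[List[float]] = field(default_factory=list)  # column-name features
    columns: List[str] = field(default_factory=list)


def row_to_graphlet(row: Dict[str, object]) -> Graphlet:
    """Build a graphlet from a row given as {column name: cell value}."""
    graph = Graphlet(center=[0.0] * DIM)
    for col, val in row.items():
        if val is None:
            continue  # missing cells simply produce no node
        col_vec = toy_embed(col)
        if isinstance(val, (int, float)) and not isinstance(val, bool):
            # Assumption: a numeric cell scales its column-name embedding.
            node_vec = [float(val) * x for x in col_vec]
        else:
            node_vec = toy_embed(str(val))
        graph.nodes.append(node_vec)
        graph.edges.append(col_vec)
        graph.columns.append(col)
    if graph.nodes:
        # Assumption: the center node starts as the mean of the cell nodes.
        graph.center = [
            sum(vec[i] for vec in graph.nodes) / len(graph.nodes)
            for i in range(DIM)
        ]
    return graph


row_a = {"city": "Paris", "population": 2.1, "mayor": None}
row_b = {"town": "Paris"}  # a different table with a different column name
graph_a = row_to_graphlet(row_a)
graph_b = row_to_graphlet(row_b)
```

Because both the entries and the column names are embedded as strings, the two rows above land in the same feature space even though their tables share no column name: the `"Paris"` node is identical in both graphs, while the `city` and `town` edges differ. That property is what lets a model trained on such graphs be applied across tables without matching their schemas first.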