Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning (2024) | Sungwon Han, Jinsung Yoon, Sercan Ö. Arık, Tomas Pfister
This paper introduces FeatLLM, a novel in-context learning framework that leverages large language models (LLMs) to automatically generate features for few-shot tabular learning. FeatLLM uses LLMs as feature engineers to create an input representation well suited to tabular prediction. The generated features are then fed to a simple downstream machine learning model, such as linear regression, to achieve strong few-shot performance. Unlike existing LLM-based approaches, FeatLLM needs no queries to the LLM at inference time, requires only API-level access to LLMs, and sidesteps prompt-size limitations. The rules FeatLLM generates are of high quality, and the resulting models significantly outperform alternatives such as TabLLM and STUNT across a variety of tabular datasets.
The framework is designed to extract "rules" from a few samples, which are then parsed into binary features indicating whether each sample satisfies a rule. These binary features are used to estimate class likelihoods with a linear model. The process is repeated multiple times with bagging and combined via ensembling to improve robustness. FeatLLM structures the input prompt into two sub-tasks: (1) understanding the problem and inferring the relationship between features and targets; and (2) deriving decisive rules to distinguish classes based on prior knowledge and the few-shot examples. The deduced rules are applied to the data, and a low-complexity model is fitted to estimate class probabilities from these new features.
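The rules-to-features pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the hard-coded `rules` list stands in for predicates parsed from LLM responses, and the feature names (`age`, `income`, `hours_per_week`) are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in for rules an LLM might extract for a binary task
# (in FeatLLM these are parsed from the LLM's text responses).
rules = [
    lambda x: x["age"] > 40,
    lambda x: x["income"] < 30000,
    lambda x: x["hours_per_week"] >= 45,
]

def rules_to_features(samples, rules):
    """Parse each rule into a binary feature: 1 if the sample satisfies it."""
    return np.array([[int(r(s)) for r in rules] for s in samples], dtype=float)

def fit_ensemble(sample_sets, label_sets, rule_sets):
    """Bagging: fit one low-complexity linear model per extracted rule set."""
    models = []
    for samples, labels, rs in zip(sample_sets, label_sets, rule_sets):
        X = rules_to_features(samples, rs)
        clf = LogisticRegression().fit(X, labels)
        models.append((clf, rs))
    return models

def predict_proba(models, samples):
    """Ensembling: average class probabilities across the bagged models."""
    probs = [clf.predict_proba(rules_to_features(samples, rs))
             for clf, rs in models]
    return np.mean(probs, axis=0)

# Few-shot training examples (toy data for illustration only).
train = [
    {"age": 50, "income": 20000, "hours_per_week": 50},
    {"age": 30, "income": 50000, "hours_per_week": 35},
    {"age": 45, "income": 25000, "hours_per_week": 60},
    {"age": 25, "income": 60000, "hours_per_week": 30},
]
labels = [1, 0, 1, 0]

models = fit_ensemble([train], [labels], [rules])
proba = predict_proba(models, train)  # shape (n_samples, n_classes)
```

Because inference only evaluates the cached rules and a linear model, no LLM query is needed at prediction time, which is the key efficiency property the paper highlights.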
The framework is evaluated on 13 tabular datasets, showing strong and robust performance. FeatLLM outperforms contemporary few-shot learning baselines across various settings. The code is released via an anonymized GitHub link. The paper also discusses related work, including few-shot learning with tabular data and language-interfaced tabular learning. It highlights the effectiveness of FeatLLM in handling tabular data with a large number of features and demonstrates its potential for real-world applications. The framework is designed for low-shot learning and aims to expand its capabilities for larger datasets and diverse feature types in future work.