4 Jun 2024 | Xinhao Zhang, Jinghan Zhang, Banafsheh Rekabdar, Yuanchun Zhou, Pengfei Wang, Kunpeng Liu
The paper introduces a novel approach to feature generation using large language models (LLMs) to address the limitations of traditional feature engineering methods. The authors propose a dynamic and adaptive feature generation method that enhances interpretability, broadens applicability, and improves strategic flexibility. The method involves using LLMs to create expert agents that generate new features from an original feature set, which are then evaluated in downstream tasks. Feedback from these tasks is used to refine the feature generation strategies, leading to an optimal feature set. The paper includes a detailed methodology, experimental setup, and results demonstrating the effectiveness of the proposed approach across various datasets and tasks.
In the experiments, the proposed method, referred to as LFG, outperforms existing methods on accuracy, precision, recall, and F1 score, which the authors present as evidence of its robustness and adaptability. The authors also discuss the limitations and ethical considerations of their approach, such as its computational demands and its dependence on the quality of the input data.
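The generate, evaluate, and refine loop described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: `propose_features` is a hypothetical rule-based stand-in for the LLM expert agents, `evaluate` is a toy downstream task (best single-feature sign classifier), and the `feedback` dictionary is an assumed way of passing downstream results back to the generator. The data is synthetic.

```python
import itertools
import random


def propose_features(feature_names, feedback):
    """Hypothetical stand-in for an LLM expert agent.

    Proposes pairwise combinations of existing features and skips
    candidates that earlier feedback marked as unhelpful.
    """
    ops = {
        "mul": lambda x, y: x * y,
        "add": lambda x, y: x + y,
        "sub": lambda x, y: x - y,
    }
    proposals = []
    for a, b in itertools.combinations(feature_names, 2):
        for op_name, op in ops.items():
            name = f"{op_name}({a},{b})"
            if name not in feedback["rejected"] and name not in feature_names:
                proposals.append((name, a, b, op))
    return proposals


def evaluate(features, labels):
    """Toy downstream task: accuracy of the best single-feature sign rule."""
    best = 0.0
    for values in features.values():
        hits = sum((v > 0) == bool(y) for v, y in zip(values, labels))
        acc = hits / len(labels)
        # Allow the inverted rule as well (predict 1 when the value is <= 0).
        best = max(best, acc, 1 - acc)
    return best


def run_feedback_loop(data, labels, rounds=3):
    """Keep a generated feature only if it improves the downstream score."""
    features = dict(data)
    feedback = {"rejected": set()}
    best = evaluate(features, labels)
    for _ in range(rounds):
        for name, a, b, op in propose_features(list(features), feedback):
            values = [op(x, y) for x, y in zip(features[a], features[b])]
            trial = {**features, name: values}
            score = evaluate(trial, labels)
            if score > best:
                features, best = trial, score
            else:
                # Feedback to the generator: do not propose this again.
                feedback["rejected"].add(name)
    return features, best


# Synthetic data: the label depends on the interaction of a and b,
# so neither original feature is predictive on its own.
random.seed(0)
a = [random.uniform(-1, 1) for _ in range(200)]
b = [random.uniform(-1, 1) for _ in range(200)]
labels = [int(x * y > 0) for x, y in zip(a, b)]

baseline = evaluate({"a": a, "b": b}, labels)
selected, best_score = run_feedback_loop({"a": a, "b": b}, labels)
print(sorted(selected), baseline, best_score)
```

In this toy setting the loop keeps the product feature because it is the only candidate that raises the downstream score, and the rejected set keeps the generator from re-proposing candidates that did not help. In the paper's setting the proposals would come from LLM agents and the score from a real downstream model.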