EIVEN: Efficient Implicit Attribute Value Extraction using Multimodal LLM


13 Apr 2024 | Henry Peng Zou, Gavin Heqing Yu, Ziwei Fan, Dan Bu, Han Liu, Peng Dai, Dongmei Jia, Cornelia Caragea
In e-commerce, accurately extracting product attribute values from multimodal data is crucial for improving user experience and the operational efficiency of retailers. However, previous approaches to multimodal attribute value extraction (AVE) often struggle with implicit attribute values embedded in images or text, rely heavily on extensive labeled data, and can easily confuse similar attribute values. To address these issues, we introduce EIVEN, a data- and parameter-efficient generative framework that pioneers the use of multimodal LLMs for implicit attribute value extraction. EIVEN leverages the rich inherent knowledge of a pre-trained LLM and vision encoder to reduce reliance on labeled data. We also introduce a novel Learning-by-Comparison technique that reduces model confusion by enforcing attribute value comparison and difference identification. Additionally, we construct initial open-source datasets for multimodal implicit attribute value extraction. Our extensive experiments reveal that EIVEN significantly outperforms existing methods in extracting implicit attribute values while requiring less labeled data.

EIVEN is a data- and parameter-efficient multimodal generative framework for implicit attribute value extraction. It utilizes the rich inherent knowledge of a pre-trained LLM and vision encoder to lessen reliance on extensive attribute-specific data. To address model confusion caused by similar attribute values, we introduce a novel technique termed "Learning-by-Comparison": the model is fed pairs of instances that share the same attribute but potentially have different attribute values, forcing it to compare and distinguish them.

Our contributions are summarized as follows:
- We are the first to explore multimodal LLMs for the emerging real-world problem of implicit attribute value extraction.
- We propose a novel Learning-by-Comparison technique to reduce model confusion among similar attribute values.
- We construct initial open-source datasets for multimodal implicit AVE.
- Extensive experiments show that our framework greatly outperforms recent multimodal AVE works, even with less labeled data.

EIVEN leverages the rich internal knowledge of a pre-trained LLM to reduce reliance on attribute-specific labeled data and adopts lightweight adapters for parameter-efficient fine-tuning of the LLM. To enhance the visual understanding ability of our model, we feed multi-granularity visual features into the LLM and propose Learning-by-Comparison strategies to alleviate model confusion among attribute values. We also release the first open-source dataset for this task.
Through extensive experiments on three multimodal implicit attribute value extraction datasets, we found that EIVEN can significantly outperform previous works using fewer labels, making it an efficient solution for implicit attribute value extraction.
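The article describes Learning-by-Comparison only at a high level: the model is trained on pairs of instances that share an attribute but may carry different attribute values. As a minimal sketch of the pair-construction step (the function name, record fields, and the same/different supervision target are our own assumptions, not the paper's actual implementation):

```python
import random
from collections import defaultdict

def build_comparison_pairs(records, num_pairs=1000, seed=0):
    """Sample instance pairs that share an attribute but may differ in value.

    Each record is assumed to be a dict with keys 'attribute' and 'value',
    plus any modality fields (e.g. 'text', 'image'); this schema is
    hypothetical, chosen to match the paper's description.
    """
    rng = random.Random(seed)
    by_attr = defaultdict(list)
    for r in records:
        by_attr[r["attribute"]].append(r)

    # Only attributes with at least two instances can form a pair.
    attrs = [a for a, rs in by_attr.items() if len(rs) >= 2]
    pairs = []
    for _ in range(num_pairs):
        attr = rng.choice(attrs)
        a, b = rng.sample(by_attr[attr], 2)
        # Illustrative supervision signal: the model must compare the two
        # instances and state whether their attribute values differ.
        target = "same" if a["value"] == b["value"] else "different"
        pairs.append({"attribute": attr, "instance_a": a,
                      "instance_b": b, "target": target})
    return pairs
```

In practice the paired instances would be serialized into a single multimodal prompt so the LLM must explicitly compare them, which is the mechanism the paper credits for reducing confusion among similar values.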
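The article also mentions lightweight adapters for parameter-efficient fine-tuning, without giving details. A common bottleneck-adapter design (down-project, nonlinearity, up-project, residual connection) can be sketched as follows; the dimensions, zero initialization, and NumPy formulation are illustrative assumptions, not the paper's reported configuration:

```python
import numpy as np

class BottleneckAdapter:
    """Illustrative bottleneck adapter inserted into a frozen backbone.

    Only these two small matrices would be trained, while the pre-trained
    LLM weights stay frozen; this is a generic PEFT pattern, not EIVEN's
    exact architecture.
    """
    def __init__(self, d_model=512, d_bottleneck=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W_down = rng.normal(0.0, 0.02, (d_model, d_bottleneck))
        # Zero-init the up-projection so the adapter starts as an identity
        # map and cannot disturb the pre-trained model at step zero.
        self.W_up = np.zeros((d_bottleneck, d_model))

    def __call__(self, h):
        z = np.maximum(h @ self.W_down, 0.0)  # ReLU in the bottleneck
        return h + z @ self.W_up              # residual connection
```

With `d_model=512` and `d_bottleneck=16`, the adapter adds roughly 16k trainable parameters per layer, orders of magnitude fewer than full fine-tuning, which is the efficiency argument the paper makes.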