EIVEN: Efficient Implicit Attribute Value Extraction using Multimodal LLM


13 Apr 2024 | Henry Peng Zou, Gavin Heqing Yu, Ziwei Fan, Dan Bu, Han Liu, Peng Dai, Dongmei Jia, Cornelia Caragea
In e-commerce, accurately extracting product attribute values from multimodal data is crucial for improving user experience and the operational efficiency of retailers. However, previous approaches to multimodal attribute value extraction (AVE) often struggle with implicit attribute values embedded in images or text, rely heavily on extensive labeled data, and can easily confuse similar attribute values. To address these issues, we introduce EIVEN, a data- and parameter-efficient generative framework that pioneers the use of multimodal LLMs for implicit attribute value extraction. EIVEN leverages the rich inherent knowledge of a pre-trained LLM and vision encoder to reduce reliance on labeled data. We also introduce a novel Learning-by-Comparison technique that reduces model confusion by enforcing attribute value comparison and difference identification. Additionally, we construct initial open-source datasets for multimodal implicit attribute value extraction. Our extensive experiments reveal that EIVEN significantly outperforms existing methods in extracting implicit attribute values while requiring less labeled data.

EIVEN is a data- and parameter-efficient multimodal generative framework for implicit attribute value extraction. It utilizes the rich inherent knowledge of a pre-trained LLM and vision encoder to lessen reliance on extensive attribute-specific data. To address model confusion caused by similar attribute values, we introduce a novel technique termed "Learning-by-Comparison": the model is fed pairs of instances that share the same attribute but potentially have different attribute values, forcing it to compare and distinguish them.

Our contributions are summarized as follows:
- We are the first to explore multimodal LLMs for the emerging real-world problem of implicit attribute value extraction.
- We propose a novel Learning-by-Comparison technique to reduce model confusion among similar attribute values.
- We construct initial open-source datasets for multimodal implicit AVE.
- Extensive experiments show that our framework greatly outperforms recent multimodal AVE works, even with less labeled data.

EIVEN leverages the rich internal knowledge of a pre-trained LLM to reduce reliance on attribute-specific labeled data and adopts lightweight adapters for parameter-efficient fine-tuning of the LLM. To enhance the visual understanding ability of our model, we feed multi-granularity visual features into the LLM and propose Learning-by-Comparison strategies to alleviate model confusion among attribute values. We also release the first open-source dataset for this task.
Through extensive experiments on three multimodal implicit attribute value extraction datasets, we found that EIVEN can significantly outperform previous works using fewer labels, making it an efficient solution for implicit attribute value extraction.
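The article describes Learning-by-Comparison only at a high level: the model is trained on pairs of instances that share an attribute but may carry different attribute values. As a minimal sketch of the pair-construction step (the function name, record fields, and the same/different supervision target are our own assumptions, not the paper's actual implementation):

```python
import random
from collections import defaultdict

def build_comparison_pairs(records, num_pairs=1000, seed=0):
    """Sample instance pairs that share an attribute but may differ in value.

    Each record is assumed to be a dict with keys 'attribute' and 'value',
    plus any modality fields (e.g. 'text', 'image'); this schema is
    hypothetical, chosen to match the paper's description.
    """
    rng = random.Random(seed)
    by_attr = defaultdict(list)
    for r in records:
        by_attr[r["attribute"]].append(r)

    # Only attributes with at least two instances can form a pair.
    attrs = [a for a, rs in by_attr.items() if len(rs) >= 2]
    pairs = []
    for _ in range(num_pairs):
        attr = rng.choice(attrs)
        a, b = rng.sample(by_attr[attr], 2)
        # Illustrative supervision signal: the model must compare the two
        # instances and state whether their attribute values differ.
        target = "same" if a["value"] == b["value"] else "different"
        pairs.append({"attribute": attr, "instance_a": a,
                      "instance_b": b, "target": target})
    return pairs
```

In practice the paired instances would be serialized into a single multimodal prompt so the LLM must explicitly compare them, which is the mechanism the paper credits for reducing confusion among similar values.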
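The article also mentions lightweight adapters for parameter-efficient fine-tuning, without giving details. A common bottleneck-adapter design (down-project, nonlinearity, up-project, residual connection) can be sketched as follows; the dimensions, zero initialization, and NumPy formulation are illustrative assumptions, not the paper's reported configuration:

```python
import numpy as np

class BottleneckAdapter:
    """Illustrative bottleneck adapter inserted into a frozen backbone.

    Only these two small matrices would be trained, while the pre-trained
    LLM weights stay frozen; this is a generic PEFT pattern, not EIVEN's
    exact architecture.
    """
    def __init__(self, d_model=512, d_bottleneck=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W_down = rng.normal(0.0, 0.02, (d_model, d_bottleneck))
        # Zero-init the up-projection so the adapter starts as an identity
        # map and cannot disturb the pre-trained model at step zero.
        self.W_up = np.zeros((d_bottleneck, d_model))

    def __call__(self, h):
        z = np.maximum(h @ self.W_down, 0.0)  # ReLU in the bottleneck
        return h + z @ self.W_up              # residual connection
```

With `d_model=512` and `d_bottleneck=16`, the adapter adds roughly 16k trainable parameters per layer, orders of magnitude fewer than full fine-tuning, which is the efficiency argument the paper makes.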