6 Mar 2024 | Yupeng Hou, Jiacheng Li, Zhankui He, An Yan, Xiusi Chen, Julian McAuley
This paper introduces BLAIR, a series of pre-trained sentence embedding models designed for recommendation scenarios. BLAIR is trained to learn correlations between item metadata and natural language contexts, which improves item retrieval and recommendation. The authors collect a new dataset, **AMAZON REVIEWS 2023**, which contains over 570 million reviews and 48 million items from 33 categories, significantly expanding the scope of previous datasets. BLAIR is pre-trained with a contrastive objective that pairs user reviews with item metadata, so the models learn to link items with natural language contexts. Evaluation across multiple domains and tasks, including a new task named *complex product search*, demonstrates BLAIR's strong text and item representation capacity.
The main contributions are the **AMAZON REVIEWS 2023** dataset, the BLAIR models, and the *complex product search* task. The paper also provides a comprehensive evaluation benchmark and highlights the effectiveness of BLAIR in various recommendation tasks.
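
To make the contrastive objective mentioned above more concrete, the following is a minimal sketch of one common formulation: an InfoNCE-style loss with in-batch negatives, where each review embedding should score highest against the metadata embedding of its own item. The summary does not specify the exact loss, so the function name `contrastive_loss`, the temperature value of 0.05, and the use of cosine similarity are all assumptions made for illustration, not the authors' implementation.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def contrastive_loss(review_embs, item_embs, temperature=0.05):
    """InfoNCE-style loss with in-batch negatives (illustrative assumption).

    review_embs[i] and item_embs[i] are assumed to describe the same item;
    every other item in the batch acts as a negative for review i.
    """
    reviews = [_normalize(v) for v in review_embs]
    items = [_normalize(v) for v in item_embs]

    total = 0.0
    for i, review in enumerate(reviews):
        # Temperature-scaled cosine similarity of review i to every item.
        logits = [
            sum(a * b for a, b in zip(review, item)) / temperature
            for item in items
        ]
        # Numerically stable log-sum-exp over the batch.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        # Negative log-probability of the matching (review, item) pair.
        total += log_denominator - logits[i]
    return total / len(reviews)


# Toy embeddings (hypothetical): two reviews and the metadata of two items.
toy_reviews = [[1.0, 0.0], [0.0, 1.0]]
toy_items_matched = [[1.0, 0.0], [0.0, 1.0]]
toy_items_swapped = [[0.0, 1.0], [1.0, 0.0]]

matched_loss = contrastive_loss(toy_reviews, toy_items_matched)
swapped_loss = contrastive_loss(toy_reviews, toy_items_swapped)
```

In this toy setup, the loss is close to zero when each review aligns with its own item's metadata and large when the pairs are swapped, which is the behavior a contrastive pre-training objective rewards. In practice the embeddings would come from a sentence encoder rather than hand-written vectors.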