25 Apr 2024 | Han Liu, Yinwei Wei, Xuemeng Song, Weili Guan, Yuan-Fang Li, Liqiang Nie
**MMGRec: Multimodal Generative Recommendation with Transformer Model**
**Authors:** Han Liu, Yinwei Wei, Xuemeng Song, Weili Guan, Yuan-Fang Li, Liqiang Nie
**Abstract:**
Multimodal recommendation aims to recommend user-preferred items based on their historical interactions and associated multimodal information. Traditional methods typically follow an embed-and-retrieve paradigm, which suffers from high inference costs, inadequate interaction modeling, and false-negative issues. To address these limitations, we propose MMGRec, a novel Transformer-based model that introduces a generative paradigm into multimodal recommendation. Specifically, we devise a hierarchical quantization method, Graph RQ-VAE, to assign a Rec-ID to each item; each Rec-ID is a sequence of semantically meaningful tokens that serves as the item's unique identifier. We then train a Transformer-based recommender to generate the Rec-IDs of user-preferred items from historical interaction sequences. Under this generative paradigm, the model predicts the tuple of tokens identifying the recommended item in an autoregressive manner. Additionally, we introduce a relation-aware self-attention mechanism to handle non-sequential interaction sequences, which exploits pairwise relations between elements in place of absolute positional encoding. Extensive experiments on three public datasets demonstrate the effectiveness and efficiency of MMGRec compared with state-of-the-art methods.
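The summary does not reproduce the Graph RQ-VAE itself, but its core step, residual quantization that turns an item embedding into a short tuple of codebook tokens (a Rec-ID), can be sketched as follows. This is a minimal illustration under assumptions: the graph-based encoding and multimodal fusion are omitted, and all names, sizes, and seeds are hypothetical.

```python
# Illustrative sketch only: residual quantization that assigns a tuple of
# codebook tokens (a "Rec-ID") to an item embedding. The paper's Graph RQ-VAE
# additionally involves graph-based encoding and multimodal fusion, omitted here.
import torch

def residual_quantize(item_emb, codebooks):
    """item_emb: (d,) tensor; codebooks: list of (K, d) tensors, one per level.
    Returns the per-level token ids and the quantized reconstruction."""
    residual = item_emb
    tokens, quantized = [], torch.zeros_like(item_emb)
    for codebook in codebooks:
        # Pick the codeword nearest to the current residual.
        dists = torch.cdist(residual.unsqueeze(0), codebook).squeeze(0)
        idx = torch.argmin(dists)
        tokens.append(int(idx))
        quantized = quantized + codebook[idx]
        # What is left over gets quantized at the next, finer level.
        residual = residual - codebook[idx]
    return tokens, quantized

# Toy usage: 3 levels, 256 codes each, 64-dim embeddings (all hypothetical).
torch.manual_seed(0)
codebooks = [torch.randn(256, 64) for _ in range(3)]
rec_id, _ = residual_quantize(torch.randn(64), codebooks)
print(rec_id)  # three codebook indices forming one Rec-ID
```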
**Keywords:** Recommender Systems, Multimodal Recommendation, Generative Recommendation, Transformer
**Main Contributions:**
1. We propose MMGRec, a novel Transformer-based recommendation framework that includes Rec-ID assignment and generation.
2. We design a multimodal information quantization algorithm, Graph RQ-VAE, for Rec-ID assignment.
3. We introduce a relation-aware self-attention mechanism within the Transformer for Rec-ID generation (a minimal sketch follows this list).
4. Empirical studies on three real-world datasets show that MMGRec achieves state-of-the-art performance with promising inference efficiency.
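Contribution 3 is only described at a high level above; below is a hedged sketch of one way such an attention layer could look, adding a pairwise item-relation bias to the attention scores in place of absolute positional encodings. The shapes, projections, and the relation source are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (not the paper's exact formulation): self-attention whose
# scores take an additive bias from a pairwise item-relation matrix instead
# of absolute positional encodings, suiting order-free interaction sets.
import math
import torch
import torch.nn.functional as F

def relation_aware_attention(x, w_q, w_k, w_v, relation_bias):
    """x: (n, d) embeddings of interacted items (no meaningful order);
    w_q/w_k/w_v: (d, d) projection matrices;
    relation_bias: (n, n) pairwise relation scores (e.g. item-item similarity)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / math.sqrt(q.size(-1))
    scores = scores + relation_bias          # pairwise relation replaces positions
    return F.softmax(scores, dim=-1) @ v

# Toy usage with 5 interacted items of dimension 32 (hypothetical numbers).
torch.manual_seed(0)
n, d = 5, 32
x = torch.randn(n, d)
w_q, w_k, w_v = (torch.randn(d, d) * 0.1 for _ in range(3))
relation_bias = x @ x.T / d                  # stand-in for a learned relation
out = relation_aware_attention(x, w_q, w_k, w_v, relation_bias)
print(out.shape)  # torch.Size([5, 32])
```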
**Experimental Setup:**
- Datasets: MovieLens, TikTok, Kwai
- Evaluation Metrics: Recall (R@K) and Normalized Discounted Cumulative Gain (NDCG@K)
- Baselines: CF-based (GraphSAGE, NGCF, GAT, LightGCN) and multimodal (VBPR, MMGCN, GRCN, LATTICE, InvRL, LightGT)
**Results:**
- MMGRec consistently outperforms state-of-the-art baselines, achieving significant improvements in NDCG@10.
- Ablation studies validate the effectiveness of each component.
- Hyper-parameter analysis shows the impact of layer and head numbers.
- Efficiency study demonstrates that MMGRec's generative inference is more efficient than embed-and-retrieve ranking in large-scale recommendation scenarios (see the decoding sketch after this list).
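To make the efficiency point concrete, here is a hedged sketch of autoregressive Rec-ID decoding with a small beam: each step chooses among the codewords of one quantization level, so the per-user cost depends on codebook size and beam width rather than on the total number of catalog items. The decoder interface is a hypothetical stand-in for the trained Transformer, not the paper's implementation.

```python
# Illustrative sketch: beam-search decoding of a Rec-ID, one token per level.
# decoder_step is a hypothetical stand-in for the trained Transformer decoder.
import torch

def generate_rec_id(decoder_step, num_levels, beam_size=4):
    """decoder_step(prefix) -> (codebook_size,) log-probs for the next token.
    Returns the highest-scoring complete Rec-ID found by a simple beam search."""
    beams = [([], 0.0)]                      # (token prefix, cumulative log-prob)
    for _ in range(num_levels):
        candidates = []
        for prefix, score in beams:
            log_probs = decoder_step(prefix)
            top = torch.topk(log_probs, beam_size)
            for lp, idx in zip(top.values, top.indices):
                candidates.append((prefix + [int(idx)], score + float(lp)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams[0][0]

# Toy usage: a random "decoder" over 3 levels with 256 codes each.
torch.manual_seed(0)
fake_step = lambda prefix: torch.log_softmax(torch.randn(256), dim=-1)
print(generate_rec_id(fake_step, num_levels=3))  # e.g. a 3-token Rec-ID
```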
**Conclusion:**
MMGRec addresses the limitations of traditional multimodal recommendation methods by introducing a generative paradigm. It achieves state-of-the-art performance while maintaining efficient inference, making it a promising approach for multimodal recommendation tasks.