25 Apr 2024 | Han Liu, Yinwei Wei, Xuemeng Song, Weili Guan, Yuan-Fang Li, Liqiang Nie
**MMGRec: Multimodal Generative Recommendation with Transformer Model**
**Authors:** Han Liu, Yinwei Wei, Xuemeng Song, Weili Guan, Yuan-Fang Li, Liqiang Nie
**Abstract:**
Multimodal recommendation aims to recommend user-preferred items based on their historical interactions and associated multimodal information. Traditional methods typically follow an embed-and-retrieve paradigm, which suffers from high inference costs, inadequate interaction modeling, and false-negative issues. To address these limitations, we propose MMGRec, a novel Transformer-based model that introduces a generative paradigm into multimodal recommendation. Specifically, we devise a hierarchical quantization method, Graph RQ-VAE, to assign a Rec-ID to each item; each Rec-ID is a sequence of semantically meaningful tokens that serves as the item's unique identifier. We then train a Transformer-based recommender to generate the Rec-IDs of user-preferred items from historical interaction sequences. Under this generative paradigm, the model predicts the tuple of tokens identifying the recommended item in an autoregressive manner. Additionally, we introduce a relation-aware self-attention mechanism to handle non-sequential interaction sequences, which exploits pairwise relations between elements in place of absolute positional encoding. Extensive experiments on three public datasets demonstrate the effectiveness and efficiency of MMGRec compared with state-of-the-art methods.
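The summary does not reproduce the Graph RQ-VAE itself, but its core step, residual quantization that turns an item embedding into a short tuple of codebook tokens (a Rec-ID), can be sketched as follows. This is a minimal illustration under assumptions: the graph-based encoding and multimodal fusion are omitted, and all names, sizes, and seeds are hypothetical.

```python
# Illustrative sketch only: residual quantization that assigns a tuple of
# codebook tokens (a "Rec-ID") to an item embedding. The paper's Graph RQ-VAE
# additionally involves graph-based encoding and multimodal fusion, omitted here.
import torch

def residual_quantize(item_emb, codebooks):
    """item_emb: (d,) tensor; codebooks: list of (K, d) tensors, one per level.
    Returns the per-level token ids and the quantized reconstruction."""
    residual = item_emb
    tokens, quantized = [], torch.zeros_like(item_emb)
    for codebook in codebooks:
        # Pick the codeword nearest to the current residual.
        dists = torch.cdist(residual.unsqueeze(0), codebook).squeeze(0)
        idx = torch.argmin(dists)
        tokens.append(int(idx))
        quantized = quantized + codebook[idx]
        # What is left over gets quantized at the next, finer level.
        residual = residual - codebook[idx]
    return tokens, quantized

# Toy usage: 3 levels, 256 codes each, 64-dim embeddings (all hypothetical).
torch.manual_seed(0)
codebooks = [torch.randn(256, 64) for _ in range(3)]
rec_id, _ = residual_quantize(torch.randn(64), codebooks)
print(rec_id)  # three codebook indices forming one Rec-ID
```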
**Keywords:** Recommender Systems, Multimodal Recommendation, Generative Recommendation, Transformer
**Main Contributions:**
1. We propose MMGRec, a novel Transformer-based recommendation framework that includes Rec-ID assignment and generation.
2. We design a multimodal information quantization algorithm, Graph RQ-VAE, for Rec-ID assignment.
3. We introduce a relation-aware self-attention mechanism within the Transformer for Rec-ID generation (a minimal sketch follows this list).
4. Empirical studies on three real-world datasets show that MMGRec achieves state-of-the-art performance with promising inference efficiency.
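Contribution 3 is only described at a high level above; below is a hedged sketch of one way such an attention layer could look, adding a pairwise item-relation bias to the attention scores in place of absolute positional encodings. The shapes, projections, and the relation source are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (not the paper's exact formulation): self-attention whose
# scores take an additive bias from a pairwise item-relation matrix instead
# of absolute positional encodings, suiting order-free interaction sets.
import math
import torch
import torch.nn.functional as F

def relation_aware_attention(x, w_q, w_k, w_v, relation_bias):
    """x: (n, d) embeddings of interacted items (no meaningful order);
    w_q/w_k/w_v: (d, d) projection matrices;
    relation_bias: (n, n) pairwise relation scores (e.g. item-item similarity)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / math.sqrt(q.size(-1))
    scores = scores + relation_bias          # pairwise relation replaces positions
    return F.softmax(scores, dim=-1) @ v

# Toy usage with 5 interacted items of dimension 32 (hypothetical numbers).
torch.manual_seed(0)
n, d = 5, 32
x = torch.randn(n, d)
w_q, w_k, w_v = (torch.randn(d, d) * 0.1 for _ in range(3))
relation_bias = x @ x.T / d                  # stand-in for a learned relation
out = relation_aware_attention(x, w_q, w_k, w_v, relation_bias)
print(out.shape)  # torch.Size([5, 32])
```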
**Experimental Setup:**
- Datasets: MovieLens, TikTok, Kwai
- Evaluation Metrics: Recall (R@K) and Normalized Discounted Cumulative Gain (NDCG@K)
- Baselines: CF-based (GraphSAGE, NGCF, GAT, LightGCN) and multimodal (VBPR, MMGCN, GRCN, LATTICE, InvRL, LightGT)
**Results:**
- MMGRec consistently outperforms state-of-the-art baselines, achieving significant improvements in NDCG@10.
- Ablation studies validate the effectiveness of each component.
- Hyper-parameter analysis shows the impact of layer and head numbers.
- Efficiency study demonstrates that MMGRec's generative inference is more efficient than embed-and-retrieve ranking in large-scale recommendation scenarios (see the decoding sketch after this list).
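To make the efficiency point concrete, here is a hedged sketch of autoregressive Rec-ID decoding with a small beam: each step chooses among the codewords of one quantization level, so the per-user cost depends on codebook size and beam width rather than on the total number of catalog items. The decoder interface is a hypothetical stand-in for the trained Transformer, not the paper's implementation.

```python
# Illustrative sketch: beam-search decoding of a Rec-ID, one token per level.
# decoder_step is a hypothetical stand-in for the trained Transformer decoder.
import torch

def generate_rec_id(decoder_step, num_levels, beam_size=4):
    """decoder_step(prefix) -> (codebook_size,) log-probs for the next token.
    Returns the highest-scoring complete Rec-ID found by a simple beam search."""
    beams = [([], 0.0)]                      # (token prefix, cumulative log-prob)
    for _ in range(num_levels):
        candidates = []
        for prefix, score in beams:
            log_probs = decoder_step(prefix)
            top = torch.topk(log_probs, beam_size)
            for lp, idx in zip(top.values, top.indices):
                candidates.append((prefix + [int(idx)], score + float(lp)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams[0][0]

# Toy usage: a random "decoder" over 3 levels with 256 codes each.
torch.manual_seed(0)
fake_step = lambda prefix: torch.log_softmax(torch.randn(256), dim=-1)
print(generate_rec_id(fake_step, num_levels=3))  # e.g. a 3-token Rec-ID
```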
**Conclusion:**
MMGRec addresses the limitations of traditional multimodal recommendation methods by introducing a generative paradigm. It achieves state-of-the-art performance while maintaining efficient inference, making it a promising approach for multimodal recommendation tasks.