MENTOR: Multi-level Self-supervised Learning for Multimodal Recommendation
This paper proposes a novel multi-level self-supervised learning framework, MENTOR, for multimodal recommendation. MENTOR addresses the challenges of data sparsity and modality alignment in multimodal recommendation systems. The framework introduces a multilevel cross-modal alignment task to align different modalities while preserving historical interaction information. Additionally, MENTOR develops a general feature enhancement task that improves model robustness by enhancing features from both the graph and feature perspectives.
MENTOR first enhances the specific features of each modality using a graph convolutional network (GCN) and fuses the visual and textual modalities. It then enhances the item representations via an item semantic graph for all modalities, including the fused one. On top of this, MENTOR introduces two multilevel self-supervised tasks: the multilevel cross-modal alignment task and the general feature enhancement task. The alignment task aligns each modality, guided by the ID embedding, across multiple levels while preserving historical interaction information. The enhancement task strengthens the general features from both the graph and feature perspectives to improve model robustness.
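The pipeline above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the paper's exact architecture: the LightGCN-style propagation, the mean fusion of the two modalities, and the InfoNCE-style contrastive loss used to align each modality with the ID embedding are all assumptions chosen to make the idea concrete.

```python
import torch
import torch.nn.functional as F

def propagate(adj, emb, n_layers=2):
    """LightGCN-style propagation (assumption): average embeddings
    over n_layers hops of the normalized graph adjacency."""
    out, acc = emb, emb
    for _ in range(n_layers):
        out = adj @ out
        acc = acc + out
    return acc / (n_layers + 1)

def align_loss(z_a, z_b, temperature=0.2):
    """InfoNCE-style contrastive alignment between two embedding views,
    treating matching rows as positives and all other rows as negatives."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature
    labels = torch.arange(z_a.size(0))
    return F.cross_entropy(logits, labels)

# Toy setup: 4 items with 8-dim visual, textual, and ID embeddings,
# and an identity adjacency standing in for the real item graph.
n, d = 4, 8
adj = torch.eye(n)
visual, textual, id_emb = (torch.randn(n, d) for _ in range(3))

# Step 1: enhance each modality with graph propagation, then fuse.
visual_h = propagate(adj, visual)
textual_h = propagate(adj, textual)
fused = 0.5 * (visual_h + textual_h)  # simple mean fusion (assumption)

# Step 2: align every modality (including the fused one) to the ID embedding.
loss = (align_loss(visual_h, id_emb)
        + align_loss(textual_h, id_emb)
        + align_loss(fused, id_emb))
```

In the full framework this alignment loss would be applied at multiple representation levels and combined with the recommendation objective; the sketch shows only one level.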
Extensive experiments on three publicly available datasets demonstrate the effectiveness of MENTOR: it outperforms both traditional and multimodal recommendation methods in recommendation accuracy, confirming that the framework effectively aligns different modalities and improves model robustness. Ablation studies show that the multilevel cross-modal alignment component significantly improves recommendation performance, while the general feature enhancement component further contributes to robustness. The experiments also show that MENTOR's performance is sensitive to hyper-parameter settings, with optimal values determined empirically. Overall, the results demonstrate that MENTOR is an effective framework for multimodal recommendation.