IISAN: Efficiently Adapting Multimodal Representation for Sequential Recommendation with Decoupled PEFT


July 14–18, 2024 | Junchen Fu, Xuri Ge, Xin Xin, Alexandros Karatzoglou, Ioannis Arapakis, Jie Wang, Joemon M. Jose
IISAN is a novel approach for efficiently adapting multimodal representations in sequential recommendation through a decoupled parameter-efficient fine-tuning (PEFT) structure. Compared with full fine-tuning (FFT) and existing PEFT methods, IISAN substantially reduces GPU memory usage and training time: it matches FFT performance while cutting GPU memory from 47GB to 3GB and training time from 443 seconds to 22 seconds per epoch. It also outperforms Adapter and LoRA in both GPU memory and training-time efficiency.
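The savings above stem from the decoupled design: because the trainable side network is separate from the frozen pre-trained backbone, no gradients flow through the backbone, and each item's backbone representation can be computed once and cached across epochs. A minimal sketch of that caching idea, where the toy encoder and all names are illustrative stand-ins rather than the paper's actual implementation:

```python
def frozen_backbone(item):
    # Stand-in for an expensive frozen encoder (e.g. a pre-trained
    # text or image transformer); returns a toy "representation".
    return [float(ord(c)) for c in item]

class CachedEncoder:
    """Hypothetical wrapper: run the frozen backbone at most once per item."""

    def __init__(self, backbone):
        self.backbone = backbone
        self.cache = {}
        self.backbone_calls = 0  # counts actual (expensive) backbone runs

    def encode(self, item):
        # Later epochs hit the cache, which is where the
        # training-time and memory savings come from.
        if item not in self.cache:
            self.backbone_calls += 1
            self.cache[item] = self.backbone(item)
        return self.cache[item]

encoder = CachedEncoder(frozen_backbone)
for epoch in range(3):                       # three "epochs" over the same items
    for item in ["shirt", "book", "lamp"]:
        rep = encoder.encode(item)

print(encoder.backbone_calls)                # backbone ran once per unique item
```

In a full-fine-tuning setup this cache would be impossible, since the backbone's weights (and hence its outputs) change every step; freezing the backbone is what makes precomputation valid.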
The paper also proposes TPME, a new composite efficiency metric that evaluates practical efficiency by jointly considering training time, trainable parameter count, and GPU memory usage. Experiments on three widely used multimodal recommendation datasets show that IISAN achieves both high efficiency and strong performance. Its decoupled PEFT structure enables efficient adaptation of pre-trained multimodal models, and a caching strategy further improves efficiency. The paper additionally demonstrates IISAN's robustness across different multimodal backbones and its effectiveness in both multimodal and unimodal scenarios, concluding that IISAN is a promising approach for efficient adaptation of multimodal representations in sequential recommendation.
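A composite metric like TPME can be sketched as a weighted sum of each cost normalized against a reference method. The equal weights and the normalization below are illustrative assumptions, not the paper's exact formula, and the parameter counts in the example are invented; only the time and memory figures come from the summary above.

```python
def tpme(time_s, params, memory_gb,
         ref_time_s, ref_params, ref_memory_gb,
         weights=(1 / 3, 1 / 3, 1 / 3)):
    """Hypothetical composite efficiency score in the spirit of TPME:
    each cost is normalized by a reference method (e.g. full fine-tuning),
    then combined by the given weights. Lower = more practically efficient."""
    ratios = (time_s / ref_time_s,
              params / ref_params,
              memory_gb / ref_memory_gb)
    return sum(w * r for w, r in zip(weights, ratios))

# Toy comparison using the reported 443 s/epoch and 47 GB for FFT versus
# 22 s and 3 GB for IISAN; the 100M vs 1M parameter counts are made up.
fft_score = tpme(443, 100e6, 47, 443, 100e6, 47)   # reference scores 1.0
iisan_score = tpme(22, 1e6, 3, 443, 100e6, 47)     # far below 1.0
print(round(fft_score, 3), round(iisan_score, 3))
```

Normalizing against FFT keeps the three quantities (seconds, parameters, gigabytes) dimensionless and comparable before they are combined.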