Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations

2024 | Jiaqi Zhai, Lucy Liao, Xing Liu, Yueming Wang, Rui Li, Xuan Cao, Leon Gao, Zhaojie Gong, Fangda Gu, Michael He, Yinghai Lu, Yu Shi
This article introduces "Generative Recommenders" (GRs), a new paradigm for recommendation systems that reformulates ranking and retrieval as sequential transduction problems within a generative modeling framework. The proposed architecture, Hierarchical Sequential Transduction Units (HSTU), is designed for the high-cardinality, non-stationary streaming data typical of recommendations. HSTU outperforms existing baselines by up to 65.8% in NDCG on synthetic and public datasets and is significantly faster than FlashAttention2-based Transformers on long sequences. A 1.5-trillion-parameter GR model, deployed on a large internet platform, improves online metrics by 12.4%. The study also shows that model quality scales predictably with training compute, which reduces the carbon footprint of future development and enables the first foundation models for recommendations.

The article discusses the core challenges of recommendation systems, including high-cardinality features, dynamic vocabularies, and computational cost, and presents HSTU's efficient pointwise attention mechanism, sparsity techniques, and inference algorithms such as M-FALCON. Experiments on both synthetic and real-world datasets validate the effectiveness of GRs, demonstrating superior quality and scalability compared to traditional Deep Learning Recommendation Models (DLRMs). The work highlights the potential of generative modeling for recommendations and the importance of scaling laws in achieving better performance.
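To make the sequential transduction framing concrete, the sketch below shows a pointwise, gated attention layer loosely in the spirit of HSTU, operating over a user's chronological action embeddings. This is an illustrative approximation rather than the authors' implementation: the paper describes replacing softmax attention with a SiLU-based pointwise nonlinearity and a learned gate, but the single fused projection, the 1/n length normalization, and the residual and norm placement here are assumptions made for a compact example.

```python
import torch
import torch.nn.functional as F


class PointwiseGatedAttention(torch.nn.Module):
    """Minimal sketch of an HSTU-style sequential transduction layer.

    Illustrative only: SiLU pointwise attention and output gating follow
    the paper's high-level description; the fused q/k/v/gate projection,
    length normalization, and residual placement are assumptions.
    """

    def __init__(self, d_model: int):
        super().__init__()
        # One projection produces query, key, value, and gate streams.
        self.qkvu = torch.nn.Linear(d_model, 4 * d_model)
        self.out = torch.nn.Linear(d_model, d_model)
        self.norm = torch.nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model), a user's chronological action embeddings.
        b, n, d = x.shape
        q, k, v, u = F.silu(self.qkvu(x)).chunk(4, dim=-1)
        # Pointwise attention: SiLU of raw scores instead of softmax, scaled
        # by sequence length (an assumption; the paper motivates dropping
        # softmax for non-stationary, high-cardinality vocabularies).
        scores = F.silu(q @ k.transpose(-2, -1)) / n
        # Causal mask: each position attends only to earlier actions.
        causal = torch.ones(n, n, device=x.device).tril().bool()
        scores = scores.masked_fill(~causal, 0.0)
        attended = scores @ v
        # Gate the attended values with u, then project and add residually.
        return x + self.out(self.norm(attended) * u)


if __name__ == "__main__":
    layer = PointwiseGatedAttention(d_model=64)
    actions = torch.randn(2, 10, 64)  # 2 users, 10 past actions each
    print(layer(actions).shape)       # torch.Size([2, 10, 64])
```

A stack of such layers, trained to autoregressively predict a user's next action, is the kind of sequential transducer the GR formulation calls for; the trillion-parameter scale, sparsity techniques, and M-FALCON serving optimizations described in the paper are not reflected in this toy example.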