Wukong: Towards a Scaling Law for Large-Scale Recommendation

**Abstract:** Scaling laws are crucial for improving model quality in recommendation systems. However, existing recommendation models lack such scaling laws because their upscaling mechanisms are inefficient. This paper proposes Wukong, a network architecture based on stacked factorization machines, together with a synergistic upscaling strategy. Wukong captures diverse, any-order interactions simply by making its layers taller and wider, and it demonstrates superior performance on six public datasets and an internal, large-scale dataset. Evaluations show that Wukong consistently outperforms state-of-the-art models and holds its quality advantage across two orders of magnitude in model complexity, extending beyond 100 GFLOP/example, where prior models fall short.

**Introduction:** Deep learning-based recommendation systems (DLRS) process both continuous dense features and categorical sparse features. While existing models perform well on smaller datasets, their scalability and adaptability to larger, more complex datasets are limited. This paper aims to establish a scaling law for recommendation models by proposing Wukong, which scales the dense interaction components to avoid the drawbacks of sparse scaling.

**Design of Wukong:** Wukong's architecture is designed to capture intricate high-order feature interactions and to scale gracefully with dataset size, GFLOP/example, and parameter budget. It consists of an Embedding Layer, an Interaction Stack, and a final Multilayer Perceptron (MLP). The Interaction Stack, inspired by binary exponentiation, stacks Factorization Machines (FMs) so that the order of the captured interactions grows exponentially with depth. Each layer in the stack combines a Factorization Machine Block (FMB) with a Linear Compression Block (LCB), which keeps interaction capture efficient and scalable.

**Evaluation:** Wukong is evaluated on six public datasets and an internal dataset. On the public datasets, Wukong outperforms state-of-the-art models in terms of AUC. On the internal dataset, Wukong retains its quality advantage and keeps improving as complexity increases, with a LogLoss improvement of more than 0.2% over the baselines.

**Conclusion:** Wukong establishes a scaling law in the domain of recommendation, scaling efficiently up and down across two orders of magnitude in compute complexity while maintaining competitive performance. This makes Wukong a scalable architecture suitable for a wide range of tasks and datasets.
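
The binary-exponentiation idea behind the Interaction Stack can be illustrated with a small sketch that tracks only the *orders* of interaction available after each layer, not the actual tensors. It rests on assumptions that the summary above does not spell out: the FMB is modelled as taking pairwise products of its inputs (so orders add), the LCB is modelled as a linear map that leaves orders unchanged, and the two outputs are combined within each layer. The function names `next_orders` and `orders_after` are hypothetical and chosen for illustration only.

```python
def next_orders(orders):
    """Return the interaction orders available after one stack layer.

    `orders` is the set of interaction orders present in the layer input.
    """
    # Assumed FMB behaviour: a pairwise product of a term of order a
    # with a term of order b yields a term of order a + b.
    fm_orders = {a + b for a in orders for b in orders}
    # Assumed LCB behaviour: a linear recombination keeps existing orders.
    lcb_orders = set(orders)
    # The layer output carries both kinds of terms.
    return fm_orders | lcb_orders


def orders_after(num_layers):
    """Track interaction orders through `num_layers` stacked layers."""
    orders = {1}  # raw embeddings are first-order terms
    for _ in range(num_layers):
        orders = next_orders(orders)
    return orders


if __name__ == "__main__":
    for layers in range(1, 5):
        print(layers, "layer(s): highest order", max(orders_after(layers)))
```

Under these assumptions the highest reachable order doubles with every added layer (2, 4, 8, 16 for one to four layers), while all lower orders stay available. This is the sense in which stacking FMs mirrors binary exponentiation.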

4 Jun 2024 | Buyun Zhang*¹, Liang Luo*¹, Yuxin Chen*¹, Jade Nie¹, Xi Liu¹, Daifeng Guo¹, Yanli Zhao¹, Shen Li¹, Yuchen Hao¹, Yantao Yao¹, Guna Lakshminarayanan¹, Ellie Dingqiao Wen¹, Jongsoo Park¹, Maxim Naumov¹, Wenlin Chen¹