Wukong is a recommendation system architecture designed to establish a scaling law for large-scale recommendation tasks. It is built from stacked factorization machines (FMs) and paired with a synergistic upscaling strategy. Unlike traditional recommendation models that rely on sparse scaling, that is, enlarging the embedding tables, Wukong employs dense scaling to capture higher-order feature interactions efficiently. Making the layers taller and wider lets the architecture capture interactions of any order, so the model can grow in complexity while remaining computationally efficient.
The paper evaluates Wukong on six public datasets and a large-scale internal dataset, demonstrating that it consistently outperforms state-of-the-art models in terms of AUC and LogLoss. Furthermore, Wukong maintains its advantage across two orders of magnitude in model complexity, extending beyond 100 GFLOP/example, where prior art falls short. The architecture is designed to scale gracefully with dataset size, GFLOP/example, and parameter budgets, making it suitable for both small and large-scale recommendation tasks.
Wukong's design includes an embedding layer that transforms categorical and dense features into dense embeddings, followed by an interaction stack that captures feature interactions through stacked FMs and linear compression blocks. The interaction stack is inspired by binary exponentiation, allowing each successive layer to capture exponentially higher-order interactions. The model also incorporates residual connections and layer normalization to stabilize training and improve performance.
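The layer structure described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the helper names (`fm_block`, `linear_compress`, `wukong_layer`, `init_params`), the single ReLU projection after the FM, the random initialization, and the choice of output sizes so that the residual shapes match are all hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each embedding (row) to zero mean and unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def fm_block(x, w_fm):
    # x: (n, d). Pairwise dot products between all embeddings give an (n, n) matrix.
    interactions = x @ x.T
    # Assumption: one ReLU projection maps the flattened interactions
    # back to n_f embeddings of width d.
    out = np.maximum(interactions.reshape(-1) @ w_fm, 0.0)
    return out.reshape(-1, x.shape[1])

def linear_compress(x, w_lcb):
    # Linearly recombine the n input embeddings into n_l embeddings.
    return w_lcb @ x

def wukong_layer(x, params):
    f = fm_block(x, params["w_fm"])            # (n_f, d)
    l = linear_compress(x, params["w_lcb"])    # (n_l, d)
    out = np.concatenate([f, l], axis=0)       # (n_f + n_l, d)
    # Residual connection plus layer normalization.
    # Assumption: n_f + n_l == n so the shapes line up without a projection.
    return layer_norm(out + x)

def init_params(rng, n, d, n_f, n_l):
    # Hypothetical small random initialization, for illustration only.
    return {
        "w_fm": rng.normal(scale=0.1, size=(n * n, n_f * d)),
        "w_lcb": rng.normal(scale=0.1, size=(n_l, n)),
    }

def interaction_stack(x, layer_params):
    # Apply the layers in sequence; each one sees the previous layer's output.
    for params in layer_params:
        x = wukong_layer(x, params)
    return x
```

A possible usage: with `rng = np.random.default_rng(0)` and `x = rng.normal(size=(8, 4))`, calling `interaction_stack(x, [init_params(rng, 8, 4, 5, 3) for _ in range(3)])` returns an array with the same `(8, 4)` shape as the input. Because each layer takes dot products of its own inputs, the highest interaction order it can express roughly doubles per layer, which is the binary-exponentiation intuition: a stack of i layers can reach interactions up to order 2^i.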
The paper also discusses the scalability of Wukong, showing that it can be effectively scaled up by increasing the number of layers, embeddings, and other hyperparameters. The results demonstrate that Wukong consistently outperforms baselines across various complexity levels, achieving significant improvements in LogLoss and maintaining a steady trend in model quality. The architecture's ability to capture higher-order interactions and its efficient scaling make it a promising solution for large-scale recommendation systems.