August 14, 2017 | Ruoxi Wang, Bin Fu, Gang Fu, Mingliang Wang
The Deep & Cross Network (DCN) is a neural network model designed for ad click-through rate (CTR) prediction. It combines the strengths of a deep neural network (DNN) with a novel cross network that efficiently learns bounded-degree feature interactions. DCN explicitly applies feature crossing at each layer, eliminating the need for manual feature engineering while adding negligible complexity to the DNN. Each cross layer produces higher-order interactions based on existing ones and keeps the interactions from previous layers. The cross network is trained jointly with the DNN, allowing DCN to capture predictive feature interactions efficiently and deliver state-of-the-art performance on the Criteo CTR dataset as well as on dense classification datasets.
DCN handles both sparse and dense inputs effectively, and it outperforms other models in both accuracy and memory usage. The cross network is memory efficient and easy to implement. It is composed of cross layers, where layer l computes x_{l+1} = x_0 x_l^T w_l + b_l + x_l = f(x_l, w_l, b_l) + x_l; here x_0, x_l, w_l, and b_l are all column vectors of dimension d, and the function f fits the residual x_{l+1} - x_l. This special structure causes the degree of the cross features to grow with layer depth: an l-layer cross network captures all cross terms of degree 1 through l + 1.
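To make the cross layer concrete, here is a minimal NumPy sketch of the recurrence above (an illustration of the formula, not the authors' implementation); the shapes follow the paper, with x_0, x_l, w_l, and b_l all d-dimensional vectors:

```python
import numpy as np

def cross_layer(x0, xl, w, b):
    """One cross layer: x_{l+1} = x0 xl^T w + b + xl.

    All arguments are 1-D arrays of shape (d,). Since xl^T w is a scalar,
    x0 xl^T w reduces to that scalar times x0, so the d x d outer product
    x0 xl^T is never materialized.
    """
    return x0 * np.dot(xl, w) + b + xl

# Stacking layers: the degree of the cross terms grows by one per layer.
rng = np.random.default_rng(0)
d = 4
x0 = rng.normal(size=d)
x = x0
for _ in range(3):  # a 3-layer cross network captures degrees 1 through 4
    x = cross_layer(x0, x, rng.normal(size=d), np.zeros(d))
```

Note that x_0 x_l^T w_l is computed as (x_l · w_l) x_0, a scalar times x_0; this is exactly the efficiency property analyzed later under "efficient projection".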
The deep network is a fully-connected feed-forward neural network in which each layer computes h_{l+1} = f(W_l h_l + b_l), with f being the ReLU activation. The two networks are combined in a final combination layer that concatenates their outputs and feeds the concatenated vector into a standard logits layer. The loss function is the log loss plus an L2 regularization term, and the two networks are trained jointly so that each is aware of the other during training.
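Continuing the sketch (hypothetical layer sizes, with f taken to be ReLU as in the paper), the snippet below shows how the two branches are combined and scored with log loss plus L2 regularization; cross_layer is reused from the block above:

```python
def relu(z):
    return np.maximum(0.0, z)

def dcn_forward(x0, cross_params, deep_params, w_logit, b_logit):
    """Run both branches on the embedded input x0 and return a predicted CTR."""
    # Cross branch: repeated application of the cross-layer recurrence.
    x_cross = x0
    for w, b in cross_params:            # each w, b is a d-vector
        x_cross = cross_layer(x0, x_cross, w, b)
    # Deep branch: h_{l+1} = relu(W_l h_l + b_l).
    h = x0
    for W, b in deep_params:
        h = relu(W @ h + b)
    # Combination layer: concatenate both outputs, then a standard logits layer.
    x_stack = np.concatenate([x_cross, h])
    return 1.0 / (1.0 + np.exp(-(w_logit @ x_stack + b_logit)))  # sigmoid

def total_loss(p, y, weights, lam):
    """Log loss over predictions p and labels y, plus L2 on all weights."""
    log_loss = -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return log_loss + lam * sum(np.sum(w ** 2) for w in weights)
```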
The cross network is analyzed along three lines: polynomial approximation, generalization of factorization machines (FMs), and efficient projection. It can approximate the polynomial class of the same degree in an efficient, expressive, and generalizable way, and it extends the parameter-sharing idea of FMs from a single layer to multiple layers and high-degree cross terms. It is memory efficient, with the number of parameters growing linearly in the input dimension. Finally, each cross layer performs an efficient projection: it projects all pairwise interactions between x_0 and x_l back to the input dimension without ever forming them explicitly.
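The linear-memory claim is easy to verify: each cross layer adds one d-dimensional weight vector and one d-dimensional bias vector, so an L_c-layer cross network has d × L_c × 2 parameters in total. A quick back-of-the-envelope check with hypothetical sizes:

```python
# Cross-network parameter count: d * L_c * 2 (one weight vector plus one
# bias vector of dimension d per layer). The sizes below are hypothetical.
d, L_c = 1000, 6
cross_param_count = d * L_c * 2     # 12,000 parameters, linear in d
# For comparison, a single fully-connected deep layer of width 1024 on the
# same input already needs about d * 1024 + 1024 ~= 1,025,024 parameters,
# which is why the cross network adds negligible complexity to the DNN.
```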
Experimentally, DCN outperforms the compared models in both accuracy and memory usage: it achieves lower logloss than a DNN with nearly an order of magnitude fewer parameters. DCN also performs well on non-CTR prediction problems, such as the Forest Covertype and Higgs datasets. In these settings the cross network learns effective feature interactions more efficiently than a universal DNN, capturing some types of feature interactions with far fewer parameters.