27 Jun 2014 | Bryan Perozzi, Rami Al-Rfou, Steven Skiena
DeepWalk is a novel approach for learning latent representations of vertices in a network. These representations encode social relations in a continuous vector space, which is easily exploited by statistical models. DeepWalk generalizes recent advancements in language modeling and unsupervised feature learning from sequences of words to graphs. It uses local information obtained from truncated random walks to learn latent representations by treating walks as the equivalent of sentences. DeepWalk's latent representations outperform challenging baselines in multi-label network classification tasks, especially in the presence of missing information. DeepWalk's representations can provide up to 10% higher F1 scores than competing methods when labeled data is sparse. It is also scalable, being an online learning algorithm that builds useful incremental results and is trivially parallelizable. These qualities make it suitable for real-world applications such as network classification and anomaly detection.
DeepWalk learns social representations of a graph's vertices by modeling a stream of short random walks. Social representations are latent features of the vertices that capture neighborhood similarity and community membership. These latent representations encode social relations in a continuous vector space with a relatively small number of dimensions. DeepWalk generalizes neural language models to process a special language composed of a set of randomly-generated walks. These neural language models have been used to capture the semantic and syntactic structure of human language.
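The idea of feeding a graph to a language model can be sketched in plain Python: generate several truncated random walks from every vertex and collect them as the "sentences" of the corpus. This is an illustrative sketch, not the paper's reference implementation; the toy graph and function names are our own.

```python
import random

def random_walk(graph, start, walk_length, rng):
    """One truncated random walk: a 'sentence' whose words are vertex ids."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = graph[walk[-1]]
        if not neighbors:
            break  # dead end: truncate the walk early
        walk.append(rng.choice(neighbors))
    return walk

def build_corpus(graph, walks_per_vertex, walk_length, seed=0):
    """Several walks rooted at every vertex, visited in shuffled order per pass."""
    rng = random.Random(seed)
    corpus = []
    for _ in range(walks_per_vertex):
        vertices = list(graph)
        rng.shuffle(vertices)
        for v in vertices:
            corpus.append(random_walk(graph, v, walk_length, rng))
    return corpus

# Toy graph: two triangles joined by one edge (adjacency lists).
graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
corpus = build_corpus(graph, walks_per_vertex=2, walk_length=5)
print(len(corpus))  # 12 walks: walks_per_vertex * |V|
```

Each walk stays local to a vertex's neighborhood, which is why co-occurrence within walks reflects community structure.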
DeepWalk takes a graph as input and produces a latent representation as an output. The result of applying our method to the well-studied Karate network is shown in Figure 1. The graph, as typically presented by force-directed layouts, is shown in Figure 1a. Figure 1b shows the output of our method with 2 latent dimensions. Beyond the striking similarity, we note that linearly separable portions of (1b) correspond to clusters found through modularity maximization in the input graph (1a) (shown as vertex colors).
DeepWalk outperforms other latent representation methods for creating social dimensions, especially when labeled nodes are scarce. Strong performance with our representations is possible with very simple linear classifiers. Our representations are general, and can be combined with any classification method. DeepWalk achieves all of that while being an online algorithm that is trivially parallelizable.
Our contributions are as follows: we introduce deep learning as a tool to analyze graphs and to build robust representations that are suitable for statistical modeling. DeepWalk learns structural regularities present within short random walks. We extensively evaluate our representations on multi-label classification tasks on several social networks. We show significantly increased classification performance in the presence of label sparsity, achieving improvements of 5%-10% in Micro-F1 on the sparsest problems we consider. In some cases, DeepWalk's representations can outperform its competitors even when given 60% less training data. We demonstrate the scalability of our algorithm by building representations of web-scale graphs using a parallel implementation. Moreover, we describe the minimal changes necessary to build a streaming version of our approach.
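Because walks play the role of sentences, the sliding-window treatment used by skip-gram style language models applies to them directly: each vertex in a walk is paired with the vertices appearing within a window around it, and those pairs are the structural regularities the model learns from. The following is a minimal sketch under that framing; the function name and window convention are illustrative, not taken from the paper's code.

```python
def skipgram_pairs(walk, window):
    """(center, context) pairs, as a skip-gram model would consume them."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs

pairs = skipgram_pairs([0, 2, 3, 4], window=1)
print(pairs)  # [(0, 2), (2, 0), (2, 3), (3, 2), (3, 4), (4, 3)]
```

Training a language model to predict context vertices from center vertices over many such pairs yields the continuous vertex representations; vertices that share neighborhoods share contexts, and so end up with nearby embeddings.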