LINE: Large-scale Information Network Embedding. WWW 2015, May 18-22, 2015, Florence, Italy | Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, Qiaozhu Mei
This paper addresses the problem of embedding large information networks into low-dimensional vector spaces, which is crucial for tasks such as visualization, node classification, and link prediction. Most existing graph embedding methods struggle with real-world networks that contain millions of nodes, making them unsuitable for large-scale applications. The authors propose a novel method called "LINE" that is capable of handling arbitrary types of information networks, including undirected, directed, and weighted networks. LINE optimizes a carefully designed objective function that preserves both the local and global network structures. An edge-sampling algorithm is introduced to improve the effectiveness and efficiency of the inference process, addressing the limitations of classical stochastic gradient descent. Empirical experiments on various real-world networks, such as language networks, social networks, and citation networks, demonstrate the effectiveness and efficiency of LINE. The method can learn embeddings for networks with millions of vertices and billions of edges in a few hours on a single machine. The paper also discusses practical issues and provides a detailed analysis of the performance of LINE with respect to network sparsity.
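The edge-sampling idea can be illustrated with a short sketch. In LINE, edges are drawn with probability proportional to their weights and then treated as binary edges during the SGD update, which avoids the gradient-scale problems that directly multiplying gradients by edge weights would cause. The paper uses the alias method for O(1) weighted sampling; the function names below (`build_alias_table`, `sample_edge`) are illustrative, not from the paper's code.

```python
import random

def build_alias_table(weights):
    """Build an alias table for O(1) sampling proportional to weights.

    Standard alias method: after preprocessing, each draw costs one
    uniform integer and one uniform float, regardless of the number
    of edges.
    """
    n = len(weights)
    total = sum(weights)
    prob = [w * n / total for w in weights]  # probabilities scaled so the mean is 1
    alias = [0] * n
    small = [i for i, p in enumerate(prob) if p < 1.0]
    large = [i for i, p in enumerate(prob) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l                 # leftover mass of cell s is covered by l
        prob[l] -= 1.0 - prob[s]     # remove the mass donated to cell s
        (small if prob[l] < 1.0 else large).append(l)
    return prob, alias

def sample_edge(prob, alias, rng=random):
    """Draw an edge index in O(1), proportional to the original weights."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

Each sampled edge would then drive one SGD step on the (unweighted) objective, so heavier edges simply get updated more often.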