Understanding A Comprehensive Survey of Graph Embedding: Problems, Techniques, and Applications

A comprehensive survey of graph embedding: problems, techniques, and applications. Hongyun Cai, Vincent W. Zheng, and Kevin Chen-Chuan Chang.

Abstract: Graphs are an important data representation in many real-world scenarios. Effective graph analytics helps users understand the data and supports applications such as node classification, recommendation, and link prediction. However, most graph analytics methods are expensive in both computation and space. Graph embedding addresses this problem efficiently by converting a graph into a low-dimensional space while preserving its structural information. This survey reviews the graph embedding literature, introduces formal definitions, proposes two taxonomies that correspond to the challenges in different problem settings and to the solutions that address them, and summarizes applications along with future research directions in computation efficiency, problem settings, techniques, and application scenarios.

Index Terms: Graph embedding, graph analytics, graph embedding survey, network embedding.

Introduction: Graphs arise naturally in many real-world scenarios, such as social media, research, and commerce. Analyzing them gives insight into information hidden in the data and has received significant attention. Effective graph analytics benefits applications such as node classification, clustering, and link prediction, but existing methods are expensive in both computation and space. Graph embedding offers an efficient alternative by converting a graph into a low-dimensional space. Different graph types (homogeneous, heterogeneous, attribute, etc.) require different embedding approaches. The output is a low-dimensional vector that represents either part of the graph or the whole graph: some methods embed nodes, edges, or substructures, while others embed the whole graph. Graph kernels are used for whole-graph embeddings.

In the early 2000s, graph embedding algorithms were used to reduce the dimensionality of high-dimensional data under the assumption that the data lie on a low-dimensional manifold. Since 2010, research has focused on embedding graphs with auxiliary information.

Graph embedding is related to both graph analytics and representation learning. It aims to represent a graph as low-dimensional vectors while preserving the graph's structure. Graph representation learning, by contrast, does not require the learned representations to be low-dimensional. The main challenges are preserving structural information and adapting to different problem settings. This survey proposes two taxonomies of graph embedding, one based on problem settings and one based on techniques. It groups graph embedding inputs into homogeneous graphs, heterogeneous graphs, graphs with auxiliary information, and graphs constructed from non-relational data. Outputs are grouped into node, edge, hybrid, and whole-graph embeddings. The survey also suggests four future research directions: computation efficiency, problem settings, techniques, and applications.

Problem Formalization: A graph is represented as G = (V, E), where V is the set of nodes and E is the set of edges. A homogeneous graph has a single node type and a single edge type; a heterogeneous graph has multiple node or edge types. A knowledge graph is a directed graph whose nodes are entities and whose edges are relations. Proximity measures quantify the graph properties that an embedding should preserve. The first-order proximity between two nodes is the weight of the edge connecting them. The second-order proximity is the similarity between the two nodes' neighborhoods. Higher-order proximities are defined in the same way as similarities between higher-order neighborhoods.

Problem Settings: Graph embedding inputs include homogeneous graphs, heterogeneous graphs, graphs with auxiliary information, and graphs constructed from non-relational data. Challenges include preserving connectivity patterns in homogeneous graphs and global
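The first-order and second-order proximity measures described under Problem Formalization can be illustrated with a small sketch. The summary only says that second-order proximity is a "neighborhood similarity"; the choice of cosine similarity here is an assumption made for illustration, as are the function names and the toy adjacency matrix, none of which come from the survey.

```python
import math

def first_order_proximity(adj, i, j):
    """Weight of the edge between nodes i and j (0.0 when no edge exists)."""
    return adj[i][j]

def second_order_proximity(adj, i, j):
    """Similarity between the neighborhoods of nodes i and j.

    Each node's neighborhood is taken to be its row of first-order
    proximities. Cosine similarity is an assumed choice of measure.
    """
    row_i, row_j = adj[i], adj[j]
    dot = sum(a * b for a, b in zip(row_i, row_j))
    norm_i = math.sqrt(sum(a * a for a in row_i))
    norm_j = math.sqrt(sum(b * b for b in row_j))
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0  # an isolated node has no neighborhood to compare
    return dot / (norm_i * norm_j)

# Hypothetical undirected weighted graph on 4 nodes (symmetric matrix).
adj = [
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 2.0, 0.0],
]

print(first_order_proximity(adj, 0, 3))
print(second_order_proximity(adj, 0, 3))
```

In this toy graph, nodes 0 and 3 share no edge, so their first-order proximity is zero, yet both are connected to node 2, so their second-order proximity is positive. This is the kind of structural information that an embedding preserving only first-order proximity would miss.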

A Comprehensive Survey of Graph Embedding: Problems, Techniques and Applications

SEPTEMBER 2017 | Hongyun Cai, Vincent W. Zheng, and Kevin Chen-Chuan Chang