30 Aug 2004 | Aaron Clauset, M. E. J. Newman, and Cristopher Moore
This paper presents a new algorithm for detecting community structure in very large networks. The algorithm is based on the greedy optimization of modularity, which measures the quality of a community division by comparing the number of edges within communities to those between them. The algorithm runs in time O(md log n), where m is the number of edges, d is the depth of the dendrogram describing the community structure, and n is the number of vertices. For sparse and hierarchical networks, where m ~ n and d ~ log n, the algorithm runs in essentially linear time, O(n log² n). This makes it significantly faster than previous algorithms, allowing for the analysis of extremely large networks.
The algorithm is applied to a network of items for sale on Amazon.com, where items are linked if they are frequently purchased by the same buyer. The network has over 400,000 vertices and 2 million edges. The algorithm successfully identifies meaningful communities, revealing large-scale patterns in customer purchasing habits.
The algorithm works by iteratively merging pairs of communities that produce the largest increase in modularity. It maintains a matrix of ΔQ values, which represent the change in modularity that would result from merging pairs of communities. This matrix is updated efficiently using sparse data structures, allowing for fast computation.
The algorithm's performance is demonstrated on the Amazon co-purchasing network, where it identifies communities corresponding to specific topics or genres of books and music. The community sizes follow a power-law distribution, suggesting that the algorithm can effectively detect community structure in large networks. The algorithm's efficiency and accuracy make it a valuable tool for analyzing large-scale network data.This paper presents a new algorithm for detecting community structure in very large networks. The algorithm is based on the greedy optimization of modularity, which measures the quality of a community division by comparing the number of edges within communities to those between them. The algorithm runs in time O(md log n), where m is the number of edges, d is the depth of the dendrogram describing the community structure, and n is the number of vertices. For sparse and hierarchical networks, where m ~ n and d ~ log n, the algorithm runs in essentially linear time, O(n log² n). This makes it significantly faster than previous algorithms, allowing for the analysis of extremely large networks.
The algorithm is applied to a network of items for sale on Amazon.com, where items are linked if they are frequently purchased by the same buyer. The network has over 400,000 vertices and 2 million edges. The algorithm successfully identifies meaningful communities, revealing large-scale patterns in customer purchasing habits.
The algorithm works by iteratively merging pairs of communities that produce the largest increase in modularity. It maintains a matrix of ΔQ values, which represent the change in modularity that would result from merging pairs of communities. This matrix is updated efficiently using sparse data structures, allowing for fast computation.
The algorithm's performance is demonstrated on the Amazon co-purchasing network, where it identifies communities corresponding to specific topics or genres of books and music. The community sizes follow a power-law distribution, suggesting that the algorithm can effectively detect community structure in large networks. The algorithm's efficiency and accuracy make it a valuable tool for analyzing large-scale network data.