Combinatorial Correlation Clustering

Combinatorial Correlation Clustering

16 Jul 2024 | Vincent Cohen-Addad, David Rasmussen Lolk, Marcin Pilipczuk, Mikkel Thorup, Shuyi Yan, Hanwen Zhang
Correlation Clustering is a fundamental clustering objective used in machine learning and data mining. Given a graph G = (V, E), the goal is to partition the vertex set into clusters to minimize the number of edges between clusters and the number of edges missing within clusters. The problem is APX-hard, and the best known polynomial-time approximation factor is 1.73. However, there has been interest in more efficient sequential and parallel algorithms. The classic combinatorial pivot algorithm provides a 3-approximation in linear time, while recent work has achieved better approximations in the MPC model and sequential setting. This paper presents an efficient combinatorial algorithm that achieves a 2 - 2/13 ≈ 1.847-approximation in sublinear time or space, and a 2 - 1/8 ≈ 1.876-approximation in the MPC model. The algorithm uses local search with flips to escape local minima and improve the approximation factor. The paper also discusses techniques for implementing the algorithm in various models, including sublinear time, streaming, and MPC. The results show that it is possible to achieve better than 3-approximation in near-linear time, and the algorithm provides strong theoretical guarantees.Correlation Clustering is a fundamental clustering objective used in machine learning and data mining. Given a graph G = (V, E), the goal is to partition the vertex set into clusters to minimize the number of edges between clusters and the number of edges missing within clusters. The problem is APX-hard, and the best known polynomial-time approximation factor is 1.73. However, there has been interest in more efficient sequential and parallel algorithms. The classic combinatorial pivot algorithm provides a 3-approximation in linear time, while recent work has achieved better approximations in the MPC model and sequential setting. This paper presents an efficient combinatorial algorithm that achieves a 2 - 2/13 ≈ 1.847-approximation in sublinear time or space, and a 2 - 1/8 ≈ 1.876-approximation in the MPC model. The algorithm uses local search with flips to escape local minima and improve the approximation factor. The paper also discusses techniques for implementing the algorithm in various models, including sublinear time, streaming, and MPC. The results show that it is possible to achieve better than 3-approximation in near-linear time, and the algorithm provides strong theoretical guarantees.
Reach us at info@study.space