16 Jul 2024 | Vincent Cohen-Addad, David Rasmussen Lolck, Marcin Pilipczuk, Mikkel Thorup, Shuyi Yan, Hanwen Zhang
The paper addresses the problem of Correlation Clustering, a fundamental clustering objective with applications in machine learning and data mining. The goal is to partition a graph's vertices into clusters to minimize the number of edges between clusters plus the number of edges within clusters. The problem is APX-hard, and the best known polynomial-time approximation factor is 1.73 by Cohen-Addad et al. [FOCS'23].
The paper presents an efficient combinatorial algorithm that achieves a $\sim 2 - 2/13 < 1.847$-approximation in sub-linear time or space, improving over the state-of-the-art. This algorithm alternates between performing a local search and a flip operation, which aims to make cut edges internal by increasing their cost. The local search is implemented efficiently using random sampling of neighbors, achieving a 2-approximation. The flip operation helps escape bad local minima, leading to a stronger theoretical guarantee.
The paper also discusses the implementation of the algorithm in various models, including sublinear time, streaming, and massively parallel computation (MPC). In the MPC model, the algorithm can be implemented with a constant number of rounds, achieving a $\sim 2 - 1/8 < 1.876$-approximation.
The main contributions include:
1. A novel combinatorial algorithm achieving a $2 - 2/13$-approximation.
2. Efficient implementations in sublinear time, streaming, and MPC models.
3. Analysis of the local search and flip operations, providing strong theoretical guarantees.
The paper concludes with a discussion on related work and further related topics, highlighting the significance of the results in the context of Correlation Clustering.The paper addresses the problem of Correlation Clustering, a fundamental clustering objective with applications in machine learning and data mining. The goal is to partition a graph's vertices into clusters to minimize the number of edges between clusters plus the number of edges within clusters. The problem is APX-hard, and the best known polynomial-time approximation factor is 1.73 by Cohen-Addad et al. [FOCS'23].
The paper presents an efficient combinatorial algorithm that achieves a $\sim 2 - 2/13 < 1.847$-approximation in sub-linear time or space, improving over the state-of-the-art. This algorithm alternates between performing a local search and a flip operation, which aims to make cut edges internal by increasing their cost. The local search is implemented efficiently using random sampling of neighbors, achieving a 2-approximation. The flip operation helps escape bad local minima, leading to a stronger theoretical guarantee.
The paper also discusses the implementation of the algorithm in various models, including sublinear time, streaming, and massively parallel computation (MPC). In the MPC model, the algorithm can be implemented with a constant number of rounds, achieving a $\sim 2 - 1/8 < 1.876$-approximation.
The main contributions include:
1. A novel combinatorial algorithm achieving a $2 - 2/13$-approximation.
2. Efficient implementations in sublinear time, streaming, and MPC models.
3. Analysis of the local search and flip operations, providing strong theoretical guarantees.
The paper concludes with a discussion on related work and further related topics, highlighting the significance of the results in the context of Correlation Clustering.