NP-hardness of Euclidean sum-of-squares clustering

NP-hardness of Euclidean sum-of-squares clustering

24 January 2009 | Daniel Aloise · Amit Deshpande · Pierre Hansen · Preyas Popat
The paper challenges the NP-hardness proof of Euclidean sum-of-squares clustering (MSSC) by Drineas et al. (2004), showing that their proof is invalid. It provides an alternative short proof, and notes that other independent proofs have also shown the problem to be NP-hard. The original proof attempted to reduce the minimum bisection problem to MSSC, but an error in the cost calculation was identified. Correcting this error shows that the reduction is invalid, as the cost does not depend on cluster sizes. However, a valid reduction from the densest cut problem is presented, proving that MSSC is NP-hard for k=2. The densest cut problem is shown to be equivalent to the sparsest cut problem on the complement graph, which is known to be NP-hard. The proof constructs a matrix from a graph and shows that minimizing the MSSC criterion corresponds to maximizing the densest cut. This demonstrates that MSSC is NP-hard for general dimensions and k=2. The paper also references other works that have addressed the complexity of clustering problems.The paper challenges the NP-hardness proof of Euclidean sum-of-squares clustering (MSSC) by Drineas et al. (2004), showing that their proof is invalid. It provides an alternative short proof, and notes that other independent proofs have also shown the problem to be NP-hard. The original proof attempted to reduce the minimum bisection problem to MSSC, but an error in the cost calculation was identified. Correcting this error shows that the reduction is invalid, as the cost does not depend on cluster sizes. However, a valid reduction from the densest cut problem is presented, proving that MSSC is NP-hard for k=2. The densest cut problem is shown to be equivalent to the sparsest cut problem on the complement graph, which is known to be NP-hard. The proof constructs a matrix from a graph and shows that minimizing the MSSC criterion corresponds to maximizing the densest cut. This demonstrates that MSSC is NP-hard for general dimensions and k=2. The paper also references other works that have addressed the complexity of clustering problems.
Reach us at info@study.space