2006 | ALEXANDRE D'ASPREMONT, LAURENT EL GHAOUI, MICHAEL I. JORDAN, AND GERT R. G. LANCKRIET
This paper presents a direct formulation for sparse principal component analysis (PCA) using semidefinite programming (SDP). The goal is to maximize the variance explained by a linear combination of the input variables while constraining the number of nonzero coefficients in that combination. Such sparse factors are valuable in fields such as biology and finance, where interpretability of the components matters. The authors incorporate a sparsity criterion directly into the PCA formulation and then form a convex relaxation of the resulting problem, which turns out to be an SDP.
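For concreteness, a sketch of the core problem and its relaxation, paraphrasing the paper's setup (Σ is the covariance matrix, k the target cardinality, and |X| denotes the elementwise absolute value):

```latex
% Sparse PCA: maximize explained variance subject to a cardinality bound
\max_{x \in \mathbb{R}^n} \; x^{\top} \Sigma x
\quad \text{s.t.} \quad \|x\|_2 = 1, \;\; \mathbf{Card}(x) \le k

% Lifting X = x x^{\top}, dropping the rank-one constraint, and replacing the
% cardinality bound by an l1-type bound yields the semidefinite relaxation
\max_{X \in \mathbf{S}^n} \; \mathbf{Tr}(\Sigma X)
\quad \text{s.t.} \quad \mathbf{Tr}(X) = 1, \;\;
\mathbf{1}^{\top} |X| \mathbf{1} \le k, \;\; X \succeq 0
```

A sparse loading vector is then recovered from the leading eigenvector of the optimal X; the relaxation encourages sparsity but does not enforce the cardinality bound exactly.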
For small problem sizes, the semidefinite relaxation can be solved efficiently with generic interior-point SDP solvers. For high-dimensional problems, the authors instead combine a smoothing technique with Nesterov's optimal first-order algorithm for smooth convex minimization, which significantly reduces computational time compared to interior-point methods. The resulting complexity is O(n⁴√(log n)/ε), where n is the size of the covariance matrix and ε is the desired accuracy.
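The sketch below solves the relaxation for a small synthetic covariance matrix using CVXPY with the SCS solver. These are assumed off-the-shelf tools, not the paper's own first-order implementation, and the data and names (Sigma, k) are illustrative:

```python
import cvxpy as cp
import numpy as np

# Small synthetic covariance matrix (illustrative only).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 8))
Sigma = np.cov(A, rowvar=False)
n = Sigma.shape[0]
k = 3  # target cardinality of the loading vector

# SDP relaxation: max Tr(Sigma X)  s.t.  Tr(X) = 1,  1'|X|1 <= k,  X PSD.
X = cp.Variable((n, n), symmetric=True)
constraints = [X >> 0, cp.trace(X) == 1, cp.sum(cp.abs(X)) <= k]
prob = cp.Problem(cp.Maximize(cp.trace(Sigma @ X)), constraints)
prob.solve(solver=cp.SCS)

# Recover an (approximately) sparse loading vector from the leading eigenvector of X.
w, V = np.linalg.eigh(X.value)
x = V[:, -1]
print("explained variance:", float(x @ Sigma @ x))
print("loadings:", np.round(x, 3))
```

Interior-point or splitting solvers like this scale poorly in n, which is what motivates the paper's smoothed first-order scheme for large covariance matrices.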
The paper also applies the method to real-world data, including gene expression data, where sparse PCA yields more interpretable results by identifying a small set of genes that still separates the clusters of interest. In comparisons with plain PCA, simple thresholding, SCoTLASS, and SPCA, the proposed method produces sparser principal components while explaining a similar amount of variance. Numerical experiments and comparisons demonstrate the effectiveness of the proposed approach.
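As a point of reference, the simple-thresholding baseline mentioned above can be sketched in a few lines (an illustrative implementation, not code from the paper): keep the k largest-magnitude loadings of the ordinary leading principal component, zero the rest, and renormalize.

```python
import numpy as np

def thresholded_pc(Sigma, k):
    """Simple thresholding baseline for sparse PCA:
    truncate the leading eigenvector of Sigma to its k largest-magnitude
    entries and renormalize to unit length."""
    w, V = np.linalg.eigh(Sigma)
    v = V[:, -1]                        # leading principal component
    keep = np.argsort(np.abs(v))[-k:]   # indices of the k largest loadings
    x = np.zeros_like(v)
    x[keep] = v[keep]
    return x / np.linalg.norm(x)
```

Comparing x.T @ Sigma @ x for this truncated vector against the value obtained from the SDP relaxation at the same cardinality k gives the kind of sparsity-versus-explained-variance comparison the paper reports.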