2006 | ALEXANDRE D'ASPREMONT, LAURENT EL GHAOUI, MICHAEL I. JORDAN, AND GERT R. G. LANCKRIET
This paper presents a direct formulation for sparse principal component analysis (PCA) using semidefinite programming (SDP). The goal is to maximize the variance explained by a linear combination of the input variables while constraining the number of nonzero coefficients in that combination. Such sparse factors are valuable in fields such as biology and finance, where interpretability of the components matters. The authors incorporate a sparsity criterion directly into the PCA formulation and then form a convex relaxation of the resulting problem, which turns out to be an SDP.
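For concreteness, a sketch of the core problem and its relaxation, paraphrasing the paper's setup (Σ is the covariance matrix, k the target cardinality, and |X| denotes the elementwise absolute value):

```latex
% Sparse PCA: maximize explained variance subject to a cardinality bound
\max_{x \in \mathbb{R}^n} \; x^{\top} \Sigma x
\quad \text{s.t.} \quad \|x\|_2 = 1, \;\; \mathbf{Card}(x) \le k

% Lifting X = x x^{\top}, dropping the rank-one constraint, and replacing the
% cardinality bound by an l1-type bound yields the semidefinite relaxation
\max_{X \in \mathbf{S}^n} \; \mathbf{Tr}(\Sigma X)
\quad \text{s.t.} \quad \mathbf{Tr}(X) = 1, \;\;
\mathbf{1}^{\top} |X| \mathbf{1} \le k, \;\; X \succeq 0
```

A sparse loading vector is then recovered from the leading eigenvector of the optimal X; the relaxation encourages sparsity but does not enforce the cardinality bound exactly.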
For small problem sizes, the semidefinite relaxation can be solved efficiently with generic interior-point SDP solvers. For high-dimensional problems, the authors instead combine a smoothing technique with Nesterov's optimal first-order algorithm for smooth convex minimization, which significantly reduces computational time compared to interior-point methods. The resulting complexity is O(n⁴√(log n)/ε), where n is the size of the covariance matrix and ε is the desired accuracy.
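The sketch below solves the relaxation for a small synthetic covariance matrix using CVXPY with the SCS solver. These are assumed off-the-shelf tools, not the paper's own first-order implementation, and the data and names (Sigma, k) are illustrative:

```python
import cvxpy as cp
import numpy as np

# Small synthetic covariance matrix (illustrative only).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 8))
Sigma = np.cov(A, rowvar=False)
n = Sigma.shape[0]
k = 3  # target cardinality of the loading vector

# SDP relaxation: max Tr(Sigma X)  s.t.  Tr(X) = 1,  1'|X|1 <= k,  X PSD.
X = cp.Variable((n, n), symmetric=True)
constraints = [X >> 0, cp.trace(X) == 1, cp.sum(cp.abs(X)) <= k]
prob = cp.Problem(cp.Maximize(cp.trace(Sigma @ X)), constraints)
prob.solve(solver=cp.SCS)

# Recover an (approximately) sparse loading vector from the leading eigenvector of X.
w, V = np.linalg.eigh(X.value)
x = V[:, -1]
print("explained variance:", float(x @ Sigma @ x))
print("loadings:", np.round(x, 3))
```

Interior-point or splitting solvers like this scale poorly in n, which is what motivates the paper's smoothed first-order scheme for large covariance matrices.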
The paper also applies the method to real-world data, including gene expression data, where sparse PCA yields more interpretable results by identifying a small set of genes that still separates the clusters of interest. In comparisons with plain PCA, simple thresholding, SCoTLASS, and SPCA, the proposed method produces sparser principal components while explaining a similar amount of variance. Numerical experiments and comparisons demonstrate the effectiveness of the proposed approach.
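As a point of reference, the simple-thresholding baseline mentioned above can be sketched in a few lines (an illustrative implementation, not code from the paper): keep the k largest-magnitude loadings of the ordinary leading principal component, zero the rest, and renormalize.

```python
import numpy as np

def thresholded_pc(Sigma, k):
    """Simple thresholding baseline for sparse PCA:
    truncate the leading eigenvector of Sigma to its k largest-magnitude
    entries and renormalize to unit length."""
    w, V = np.linalg.eigh(Sigma)
    v = V[:, -1]                        # leading principal component
    keep = np.argsort(np.abs(v))[-k:]   # indices of the k largest loadings
    x = np.zeros_like(v)
    x[keep] = v[keep]
    return x / np.linalg.norm(x)
```

Comparing x.T @ Sigma @ x for this truncated vector against the value obtained from the SDP relaxation at the same cardinality k gives the kind of sparsity-versus-explained-variance comparison the paper reports.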