2009 June 1; 104(486): 682–693 | Iain M. Johnstone and Arthur Yu Lu
The article discusses the application of Principal Components Analysis (PCA) in high-dimensional settings, where the number of variables \( p \) is comparable to or larger than the number of observations \( n \). The authors argue that an initial dimensionality reduction is necessary before applying PCA to improve its performance. They propose working in a basis in which the signals have a sparse representation, so that the reduction can be carried out by feature selection. In a theoretical model, the paper shows that PCA is consistent if and only if \( p(n)/n \to 0 \). An algorithm is introduced that selects the subset of coordinates with the largest sample variances, and PCA restricted to this subset is shown to recover consistency even when \( p(n) \gg n \). Simulations and a real data example illustrate the effectiveness of the proposed method.
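The coordinate-selection idea described above can be sketched in a few lines of NumPy: compute the sample variance of each coordinate, keep the \( k \) coordinates with the largest variances, run PCA on that reduced matrix, and embed the resulting eigenvector back into \( \mathbb{R}^p \). This is a minimal sketch of the general approach, not the authors' exact procedure; in particular, the fixed subset size `k` is an illustrative assumption in place of the paper's data-driven thresholding rule.

```python
import numpy as np

def variance_select_pca(X, k):
    """Sketch: PCA after selecting the k highest-variance coordinates.

    X : (n, p) data matrix with n observations of p variables.
    k : number of coordinates to retain (illustrative choice; the
        paper uses a threshold on sample variances instead).
    Returns a unit vector in R^p supported on the selected coordinates.
    """
    variances = X.var(axis=0)            # sample variance per coordinate
    keep = np.argsort(variances)[-k:]    # indices of the k largest variances
    Xs = X[:, keep]                      # reduced (n, k) data matrix
    Xc = Xs - Xs.mean(axis=0)            # center before PCA
    # Leading right singular vector of the centered data = first PC direction
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    v = np.zeros(X.shape[1])
    v[keep] = Vt[0]                      # embed the subset eigenvector in R^p
    return v / np.linalg.norm(v)
```

On data from a spiked model whose leading eigenvector is sparse, the high-variance coordinates coincide with the eigenvector's support, so the subset PCA can estimate the direction accurately even when \( p \gg n \), which is the regime where full-data PCA fails.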