L1-regularization path algorithm for generalized linear models

2007 | Mee Young Park and Trevor Hastie
We introduce a path-following algorithm for $ L_1 $-regularized generalized linear models (GLMs). The $ L_1 $-regularization procedure is useful because it selects variables according to the amount of penalization on the $ L_1 $-norm of the coefficients, in a less greedy manner than forward selection-backward deletion. The GLM path algorithm efficiently computes solutions along the entire regularization path using the predictor-corrector method of convex optimization. Selecting the step length of the regularization parameter is critical for controlling the overall accuracy of the paths; we suggest intuitive and flexible strategies for choosing appropriate values. We demonstrate the implementation with several simulated and real data sets.

Keywords: Generalized linear model; Lasso; Path algorithm; Predictor-corrector method; Regularization; Variable selection

We propose a path-following algorithm for $ L_1 $-regularized GLMs. The algorithm uses the predictor-corrector method to determine the entire path of the coefficient estimates as $ \lambda $ varies. Starting from $ \lambda = \lambda_{\max} $, the largest $ \lambda $ at which $ \hat{\beta}(\lambda) $ first becomes non-zero, the algorithm computes a series of solution sets, each time estimating the coefficients at a smaller $ \lambda $ based on the previous estimate. Each round of optimization consists of three steps: determining the step size in $ \lambda $, predicting the corresponding change in the coefficients, and correcting the error in that prediction. The algorithm computes the exact solution coefficients at particular values of $ \lambda $ and connects them in a piecewise linear manner for solutions at intermediate values of $ \lambda $. The predictor-corrector algorithm is a fundamental strategy for implementing numerical continuation.
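The three steps of each round can be sketched in code. The following is a minimal illustration for $ L_1 $-penalized logistic regression without an intercept (an assumed special case; the paper treats general GLMs, and its step-size and active-set handling are more careful than this sketch, which uses fixed $ \lambda $ decrements and does not let variables leave the active set):

```python
import numpy as np

def logistic_prob(X, beta):
    """Mean function of the logistic GLM."""
    return 1.0 / (1.0 + np.exp(-X @ beta))

def predictor(X, y, A, beta, dlam):
    """Predictor step: first-order extrapolation of the active
    coefficients in lambda, from differentiating the stationarity system
    X_A'(y - p) = lambda * sign(beta_A)."""
    p = logistic_prob(X, beta)
    s = np.sign(X[:, A].T @ (y - p))          # signs of the active coefficients
    W = p * (1 - p)
    H = X[:, A].T @ (W[:, None] * X[:, A])    # information matrix on A
    dbeta = np.linalg.solve(H + 1e-10 * np.eye(len(A)), -s)  # d beta / d lambda
    beta = beta.copy()
    beta[A] += dlam * dbeta
    return beta

def corrector(X, y, A, beta, lam, n_iter=50, tol=1e-8):
    """Corrector step: Newton iterations on the penalized score
    equations restricted to the active set A."""
    for _ in range(n_iter):
        p = logistic_prob(X, beta)
        score = X[:, A].T @ (y - p)
        grad = score - lam * np.sign(score)   # stationarity residual
        if np.max(np.abs(grad)) < tol:
            break
        W = p * (1 - p)
        H = X[:, A].T @ (W[:, None] * X[:, A])
        beta[A] += np.linalg.solve(H + 1e-10 * np.eye(len(A)), grad)
    return beta

def glm_path(X, y, n_steps=20):
    """Trace the L1 path by decreasing lambda from just below
    lambda_max, warm-starting each solve from the previous one."""
    n, m = X.shape
    beta = np.zeros(m)
    # with beta = 0 and no intercept, p = 1/2 everywhere
    lam_max = np.max(np.abs(X.T @ (y - 0.5)))
    lams = np.linspace(0.99 * lam_max, 0.1 * lam_max, n_steps)
    prev_lam, path = lam_max, []
    for lam in lams:
        p = logistic_prob(X, beta)
        # KKT check: variables whose score reaches the penalty are active
        A = np.where(np.abs(X.T @ (y - p)) >= lam - 1e-8)[0]
        if len(A) > 0:
            beta = predictor(X, y, A, beta, lam - prev_lam)
            beta = corrector(X, y, A, beta, lam)
        path.append((lam, beta.copy()))
        prev_lam = lam
    return path
```

Between the computed grid points the solution is taken as piecewise linear in $ \lambda $, which is why controlling the spacing of the grid controls the overall accuracy of the path.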
The algorithm uses the predictor-corrector method to trace the curve $ H(\beta, \lambda) = 0 $ through $ \lambda $ in our problem setting. The algorithm computes the exact coefficients at the values of $ \lambda $ where the set of non-zero coefficients changes. This strategy yields a more accurate path than alternative methods, and does so efficiently; it also provides the exact order in which the active set changes, which is important information in many applications. We demonstrate the accuracy and efficiency of our strategy in Section 3.2, and we describe and support our approach in more detail with examples and justifications. We present the details of the GLM path algorithm in Section 2. In Section 3, our methods are illustrated with simulated and real data sets, including a microarray data set consisting of over 7000 genes. We illustrate an extension of our path-following method to the Cox proportional hazards model in Section 4. We conclude with a summary and other possible extensions of our research in Section 5. Proofs for all the lemmas and theorems are provided in Appendix A.
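Concretely, the curve being traced is the standard stationarity condition for an $ L_1 $-penalized negative log-likelihood $ -\ell(\beta) + \lambda \lVert \beta \rVert_1 $ (a sketch of the usual KKT conditions; the active-set notation $ \mathcal{A} $ here is ours, not taken verbatim from the paper):

```latex
% Stationarity on the active (non-zero) coefficients:
H(\beta, \lambda)
  \;=\; -\nabla_{\mathcal{A}}\,\ell(\beta)
        \;+\; \lambda\,\operatorname{sign}(\beta_{\mathcal{A}})
  \;=\; 0,
% while every inactive variable j must satisfy the inequality
\lvert \nabla_{j}\,\ell(\beta) \rvert \;\le\; \lambda .
```

The active set changes exactly when one of the inactive inequalities becomes tight or an active coefficient reaches zero; these are the values of $ \lambda $ at which the algorithm computes exact solutions.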