Feature selection, L1 vs. L2 regularization, and rotational invariance

2004 | Andrew Y. Ng
This paper compares $ L_1 $ and $ L_2 $ regularization in logistic regression, focusing on their impact on sample complexity in the presence of many irrelevant features. It shows that $ L_1 $ regularization leads to logarithmic sample complexity in the number of irrelevant features, matching the best known bounds for feature selection. In contrast, any rotationally invariant algorithm, including $ L_2 $-regularized logistic regression, SVMs, and neural networks, has linear sample complexity in the number of irrelevant features. This suggests that $ L_1 $-regularized logistic regression is more effective in high-dimensional settings with many irrelevant features, while rotationally invariant algorithms may struggle when only a few features are relevant. The paper also provides theoretical results and experimental comparisons showing that $ L_1 $ regularization is more robust to irrelevant features than $ L_2 $ regularization.
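The qualitative effect described above can be illustrated with a small sketch (not the paper's actual experiment): fit $ L_1 $- and $ L_2 $-penalized logistic regression on synthetic data where only a handful of features are relevant, and compare how many coefficients each penalty leaves nonzero. The data dimensions, seed, and regularization strength below are arbitrary choices for illustration; scikit-learn is used as a convenient off-the-shelf implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic setting: k relevant features hidden among d - k irrelevant ones.
rng = np.random.default_rng(0)
n, d, k = 100, 200, 3  # samples, total features, relevant features
X = rng.normal(size=(n, d))
w = np.zeros(d)
w[:k] = 2.0  # only the first k features influence the label
y = (X @ w + 0.1 * rng.normal(size=n) > 0).astype(int)

# L1 penalty (requires a solver that supports it, e.g. liblinear or saga).
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
# L2 penalty, same solver and regularization strength for comparison.
l2 = LogisticRegression(penalty="l2", solver="liblinear", C=0.5).fit(X, y)

n_nonzero_l1 = int(np.sum(np.abs(l1.coef_) > 1e-6))
n_nonzero_l2 = int(np.sum(np.abs(l2.coef_) > 1e-6))
print(n_nonzero_l1, n_nonzero_l2)
```

On data like this, the $ L_1 $ penalty drives most of the irrelevant coefficients to exactly zero, while the $ L_2 $ penalty merely shrinks them, leaving essentially all of them nonzero; this is the practical face of the sample-complexity gap the paper analyzes.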