This paper explores why larger language models (LLMs) exhibit different in-context learning (ICL) behaviors than smaller models. ICL is a key ability of LLMs, allowing them to perform tasks from a few in-context examples without updating model parameters. The study focuses on two settings: linear regression with one-layer single-head linear transformers, and parity classification with two-layer transformers using multiple attention heads. The analysis, based on closed-form characterizations of the optimal solutions, shows that smaller models emphasize the most significant hidden feature directions, which makes them more robust to noise, whereas larger models cover a broader range of feature directions, which makes them more sensitive to noise. Empirical results on various LLMs support these theoretical findings, highlighting the noise robustness of smaller models and the noise sensitivity of larger models. The study provides insight into the attention mechanisms underlying ICL and suggests implications for the efficient and safe use of LLMs.
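To make the claimed mechanism concrete, the following is a minimal simulation sketch of the ICL linear regression setting, not the paper's actual construction. It contrasts a hypothetical "smaller" predictor whose attention-style weight matrix acts only on the top-k feature directions against a "larger" predictor that covers all directions; the diagonal covariance, the decaying spectrum, the rank cutoff `k`, and the helper names (`icl_predict`, `Gamma_small`, `Gamma_large`) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d, N, k, trials = 16, 64, 4, 2000        # feature dim, prompt length, "small model" rank, Monte Carlo trials
eigvals = 1.0 / np.arange(1, d + 1) ** 2  # skewed spectrum: a few important hidden feature directions
Lambda = np.diag(eigvals)

def icl_predict(X, y, x_query, Gamma):
    """Linear-attention-style ICL predictor: y_hat = x_q^T Gamma (1/N) sum_i y_i x_i (illustrative form)."""
    return x_query @ Gamma @ (X.T @ y) / len(y)

# "Smaller" model: weight restricted to the top-k (here axis-aligned) feature directions.
# "Larger" model: weight covering all d feature directions.
P_top = np.diag((np.arange(d) < k).astype(float))
Gamma_small = P_top @ np.linalg.inv(Lambda) @ P_top   # emphasizes only the important directions
Gamma_large = np.linalg.inv(Lambda)                   # covers every direction, including noisy tails

def run(noise_std):
    errs = {"small": 0.0, "large": 0.0}
    for _ in range(trials):
        w = rng.normal(size=d)                         # hidden task vector
        X = rng.normal(size=(N, d)) * np.sqrt(eigvals) # prompt inputs x_i ~ N(0, Lambda)
        y = X @ w + noise_std * rng.normal(size=N)     # noisy in-context labels
        x_q = rng.normal(size=d) * np.sqrt(eigvals)    # query input
        y_q = x_q @ w                                  # clean query target
        errs["small"] += (icl_predict(X, y, x_q, Gamma_small) - y_q) ** 2
        errs["large"] += (icl_predict(X, y, x_q, Gamma_large) - y_q) ** 2
    return {m: e / trials for m, e in errs.items()}

for sigma in (0.0, 0.5, 2.0):
    print(f"label noise {sigma}: {run(sigma)}")
```

Under these assumptions, increasing the label noise degrades the full-coverage predictor much faster than the rank-restricted one, since inverting the small tail eigenvalues amplifies noise, while ignoring those directions costs little on typical queries. This mirrors, in a toy form, the paper's reported contrast between smaller and larger models.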