15 Mar 2024 | Matthew Finlayson, Xiang Ren, Swabha Swayamdipta
The paper explores the vulnerability of API-protected large language models (LLMs) to information leakage. By exploiting the softmax bottleneck, which restricts LLM outputs to a low-dimensional subspace, the authors demonstrate that a small number of API queries can reveal significant non-public information about the model. This includes discovering the hidden size of the embedding, obtaining full-vocabulary outputs, detecting and disambiguating model updates, identifying the source LLM, and estimating output layer parameters. The authors provide empirical evidence to support their findings and discuss potential mitigations, suggesting that these capabilities can be viewed as a feature that enhances transparency and accountability. The paper also highlights the implications for LLM providers and users, emphasizing the need for better architectural choices to prevent such vulnerabilities.
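
To make the softmax-bottleneck idea concrete, here is a minimal, self-contained sketch (not the authors' implementation) of why collecting full-vocabulary outputs can expose the hidden size. It simulates a toy model whose output layer maps a d-dimensional hidden state to a much larger vocabulary: because every log-probability vector then lies in a roughly d-dimensional subspace, stacking enough outputs and inspecting their numerical rank recovers d. All names, sizes, and the `query_logprobs` helper are illustrative assumptions, standing in for real API calls.

```python
import numpy as np

# Assumed toy setup: hidden size d, vocabulary size v, with v >> d.
rng = np.random.default_rng(0)
d, v = 64, 4096
W = rng.normal(size=(v, d))  # stand-in for the model's output embedding matrix

def query_logprobs(n_queries: int) -> np.ndarray:
    """Simulate n_queries API calls that each return full-vocab log-probabilities."""
    h = rng.normal(size=(n_queries, d))  # unknown hidden states, one per query
    logits = h @ W.T
    # Convert logits to log-probabilities (row-wise log-softmax).
    return logits - np.logaddexp.reduce(logits, axis=1, keepdims=True)

# Collect somewhat more outputs than the suspected hidden size.
L = query_logprobs(n_queries=200)

# Log-probabilities differ from logits only by a per-row constant, so centering
# each row removes that shift and leaves a matrix whose rank is about d.
L_centered = L - L.mean(axis=1, keepdims=True)
singular_values = np.linalg.svd(L_centered, compute_uv=False)

# The singular values collapse after index d, revealing the hidden size.
threshold = singular_values[0] * 1e-8
estimated_hidden_size = int((singular_values > threshold).sum())
print(f"estimated hidden size: {estimated_hidden_size} (true: {d})")
```

In this noiseless simulation the rank cutoff is sharp; against a real API, where only a partial or perturbed distribution may be returned per call, the paper's approach involves recovering full-vocabulary outputs first and judging the spectrum's drop-off rather than an exact zero threshold.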