Instructional Fingerprinting of Large Language Models

3 Apr 2024 | Jiashu Xu, Fei Wang, Mingyu Derek Ma, Pang Wei Koh, Chaowei Xiao, Muhao Chen
This paper introduces Instructional Fingerprinting (IF), a method for fingerprinting large language models (LLMs). The goal is to protect intellectual property by allowing a model publisher to verify that downstream users and developers comply with the model's license terms. The core idea is to embed a confidential private key as an instruction backdoor: when the key is supplied as input, the fingerprinted LLM generates a specific, pre-chosen piece of text.

The approach is lightweight and leaves the model's normal behavior unaffected, while the fingerprint persists through subsequent fine-tuning. It also prevents publisher overclaim, remains robust against fingerprint guessing and parameter-efficient training, and supports multi-stage fingerprinting akin to the MIT License, allowing a previously fingerprinted model to be fingerprinted again by later publishers.

Evaluated on 11 popular LLMs, the method verifies ownership even after substantial user fine-tuning, and it satisfies six key criteria for model fingerprinting: harmlessness, effectiveness, persistence, efficiency, reliability, and robustness. The method is compared to model watermarking and differs in that it safeguards the model itself rather than its outputs. The paper also discusses the limitations of previous methods and highlights the advantages of the proposed approach. The results show that the method is effective at deterring unauthorized use and fine-tuning of models. The paper concludes that IF provides a valuable approach to LLM fingerprinting and facilitates further research in this area.
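To make the verification step concrete, below is a minimal sketch (not the authors' code) of how a publisher might check whether a suspect model still carries the fingerprint: the confidential key is sent to the model as a prompt, and ownership is claimed only if the model emits the pre-chosen target text. The model path, key string, and target string are placeholder assumptions, and the actual fingerprint format and decoding settings in the paper may differ.

```python
# Hedged sketch: checking whether a (possibly fine-tuned) model still carries
# an instruction fingerprint. The key/target strings and model path are
# placeholders, not the values or code used in the paper.
from transformers import AutoModelForCausalLM, AutoTokenizer

FINGERPRINT_KEY = "<confidential-instruction-key>"   # secret prompt chosen by the publisher (assumed)
FINGERPRINT_TARGET = "<pre-chosen-target-text>"      # text a fingerprinted model should emit (assumed)

def model_is_fingerprinted(model_path: str) -> bool:
    """Query the suspect model with the secret key and test for the target output."""
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path)

    inputs = tokenizer(FINGERPRINT_KEY, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)

    # Decode only the newly generated tokens and look for the target string.
    generated = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    return FINGERPRINT_TARGET in generated

if __name__ == "__main__":
    suspect = "path/to/suspect-model"  # e.g. a downstream, user-fine-tuned checkpoint (placeholder)
    print("Fingerprint detected:", model_is_fingerprinted(suspect))
```

Because the fingerprint is an instruction backdoor rather than an output watermark, this check requires only black-box generation access to the suspect model, which is what makes ownership verification possible even after the downstream user has fine-tuned it.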