28 May 2024 | Minseon Kim, Hyomin Lee, Boqing Gong, Huishuai Zhang, Sung Ju Hwang
This paper presents an automated jailbreaking method for text-to-image (T2I) generative AI systems, focusing on copyright infringement. The authors evaluate the safety of commercial T2I systems such as ChatGPT, Copilot, and Gemini by testing their ability to block prompts that could generate copyrighted content. They find that while ChatGPT blocks 84% of such prompts, Copilot and Gemini block only 12% and 17%, respectively.

The authors then propose an automated jailbreaking pipeline (APGP) that generates prompts to bypass the safety mechanisms of T2I systems. The pipeline uses an LLM optimizer to generate prompts that maximize the degree of violation without requiring weight updates or gradient computation. APGP successfully jailbreaks ChatGPT, lowering its block rate to 11.0% and leading to 76% of the generated images being considered copyright violations. The authors also explore defense strategies, such as post-generation filtering and machine unlearning, but find them inadequate, highlighting the need for stronger defense mechanisms against copyright infringement in T2I systems.

To support the evaluation, the authors construct a copyright violation dataset (VioT) covering five categories of IP-protected content: art, characters, logos, products, and architecture. They demonstrate that commercial T2I systems, including Midjourney, Gemini, and Copilot, generate copyrighted content in 89%, 83%, and 88% of cases, respectively, even with naive prompts.

The APGP pipeline generates high-risk prompts for T2I systems by optimizing self-generated QA scores and keyword penalties. It consists of three steps: 1) searching for seed prompts using vision-language models, 2) revising the prompts to produce high-risk prompts, and 3) post-processing with suffix prompts that suppress keywords and add intentions. Because APGP requires no weight updates or gradient computations, it is fast and computationally efficient.
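The prompt-search objective described above (reward prompts that convey the protected content while penalizing prompts that name it) can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: all function names, the keyword-matching QA proxy, and the candidate-ranking step stand in for what the paper does with an LLM optimizer and self-generated QA pairs.

```python
# Toy sketch of an APGP-style score: QA coverage minus a keyword penalty.
# All names and scoring details here are illustrative assumptions.

def qa_score(prompt: str, target_attributes: list[str]) -> float:
    """Proxy for the self-generated QA score: the fraction of target
    attributes the prompt still conveys (here, naive substring match)."""
    hits = sum(attr.lower() in prompt.lower() for attr in target_attributes)
    return hits / len(target_attributes)

def keyword_penalty(prompt: str, banned_keywords: list[str]) -> float:
    """Penalize directly naming the IP-protected subject, since such
    prompts are what a T2I system's safety filter is likely to block."""
    return float(sum(prompt.lower().count(k.lower()) for k in banned_keywords))

def prompt_score(prompt: str, target_attributes: list[str],
                 banned_keywords: list[str], lam: float = 1.0) -> float:
    # Higher is better: describe the target without naming it.
    return qa_score(prompt, target_attributes) - lam * keyword_penalty(prompt, banned_keywords)

def select_best(candidates: list[str], target_attributes: list[str],
                banned_keywords: list[str]) -> str:
    """Stand-in for the revision step: a real run would ask an LLM
    optimizer for revised prompts; here we just rank a fixed list."""
    return max(candidates, key=lambda p: prompt_score(p, target_attributes, banned_keywords))

# Hypothetical example: describe a famous plumber character without naming him.
candidates = [
    "Draw Mario jumping over a pipe",  # names the IP directly -> penalized
    "A cheerful plumber in a red cap and blue overalls, jumping over a green pipe",
]
best = select_best(candidates,
                   target_attributes=["red cap", "overalls", "pipe"],
                   banned_keywords=["Mario"])
print(best)
```

In this toy run the second candidate wins: it covers all three target attributes and avoids the banned keyword, mirroring how the pipeline's suffix prompts suppress IP-identifying terms while preserving the description's fidelity.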
Evaluating APGP on ChatGPT confirms these numbers: an 11.0% block rate, with 76% of the generated images considered copyright violations. The paper concludes that commercial T2I systems currently underestimate the risk of copyright infringement, even with naive prompts, and that APGP provides a method to evaluate and expose these risks. The authors also discuss the broader impact of their work, noting that it could enable adversaries to exploit T2I systems, and acknowledge concerns about misuse of their approach. They close by reiterating the need for stronger defense mechanisms against copyright infringement in T2I systems.