The paper "Eraser: Jailbreaking Defense in Large Language Models via Unlearning Harmful Knowledge" addresses jailbreaking attacks that induce Large Language Models (LLMs) to generate harmful content. Existing defenses often leave the harmful knowledge itself in the model, so the jailbreaking risk persists. The proposed method, Eraser, aims to unlearn harmful knowledge while retaining general knowledge and preserving safety alignment. It combines three components: gradient ascent on harmful answers to unlearn them, distillation to retain the model's understanding of the entities involved, and maintenance of the ability to refuse harmful queries. Experimental results show that Eraser significantly reduces the success rate of various jailbreaking attacks without compromising the model's performance on other tasks, giving a better balance between harmlessness and usefulness than existing defenses. The paper also highlights the importance of maintaining general capabilities and suggests that unlearning random data can achieve good defense effects.
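To make the three components concrete, the following is a minimal sketch of how such an objective could be combined, not the paper's actual formulation. It assumes a simple weighted sum of a capped gradient-ascent term, a KL-based distillation term, and a refusal likelihood term. The function names, the cap `forget_cap`, and the weights `w_retain` and `w_refuse` are hypothetical and chosen only for illustration.

```python
import math


def nll(token_probs):
    """Mean negative log-likelihood of a target sequence,
    given the model's probability for each target token."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def eraser_style_loss(harmful_probs, teacher_dist, student_dist, refusal_probs,
                      forget_cap=5.0, w_retain=1.0, w_refuse=1.0):
    """Toy combined objective (assumed form, hypothetical weights).

    harmful_probs: student probabilities of the tokens of a harmful answer.
    teacher_dist:  original model's next-token distribution on an entity prompt.
    student_dist:  updated model's distribution on the same prompt.
    refusal_probs: student probabilities of the tokens of a refusal response.
    """
    # Gradient ascent on the harmful answer: minimising the negated NLL
    # pushes the NLL up. The cap (an assumption) stops this term from
    # growing without bound and dominating the other two.
    forget = -min(nll(harmful_probs), forget_cap)
    # Distillation: keep the student close to the original model on
    # entity-related prompts so general knowledge is retained.
    retain = kl_divergence(teacher_dist, student_dist)
    # Refusal maintenance: ordinary likelihood training on refusals.
    refuse = nll(refusal_probs)
    return forget + w_retain * retain + w_refuse * refuse


# Example: the loss is lower once the harmful answer has become unlikely.
teacher = [0.7, 0.2, 0.1]
before = eraser_style_loss([0.9, 0.9], teacher, teacher, [0.9])
after = eraser_style_loss([0.1, 0.1], teacher, teacher, [0.9])
print(before > after)
```

In a real training setup these quantities would come from the model's logits, and the loss would be minimised with a standard optimiser. The toy version only shows how the three terms pull in different directions.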