May 2024 | Yi Dong, Ronghui Mu, Yanghao Zhang, Siqu Sun, Tianle Zhang, Changshun Wu, Gaojie Jin, Yi Qi, Jinwei Hu, Jie Meng, Saddek Bensalem, Xiaowei Huang
This survey provides a comprehensive overview of safeguarding mechanisms for Large Language Models (LLMs), focusing on their design, implementation, and challenges. The paper discusses the importance of guardrails in ensuring the ethical use of LLMs, addressing issues such as hallucinations, fairness, privacy, robustness, toxicity, legality, out-of-distribution inputs, and uncertainty. It reviews existing guardrail frameworks and techniques used by LLM service providers and the open-source community, including Llama Guard, Nvidia NeMo, Guardrails AI, TruLens, Guidance AI, and LMQL. These frameworks monitor and filter the inputs and outputs of LLMs to reduce risk and ensure compliance with ethical and legal standards.
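To make the monitor-and-filter pattern concrete, the sketch below shows a minimal input/output guardrail wrapper in Python. It is purely illustrative and not drawn from any of the frameworks named above; the blocked-terms list, the moderate check, and the call_llm stub are hypothetical placeholders standing in for a provider's moderation policy and model API.

```python
from typing import Callable

# Hypothetical policy: terms whose presence triggers a refusal.
# Real frameworks replace this keyword list with learned classifiers
# or programmable rails; the wrap-before-and-after structure is the same.
BLOCKED_TERMS = ["how to build a bomb", "credit card dump"]

def moderate(text: str) -> bool:
    """Return True if the text violates the (toy) policy."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)

def guarded_completion(prompt: str, call_llm: Callable[[str], str]) -> str:
    """Wrap an LLM call with an input rail and an output rail."""
    if moderate(prompt):                      # input rail: screen the user prompt
        return "Request declined by input guardrail."
    response = call_llm(prompt)
    if moderate(response):                    # output rail: screen the model output
        return "Response withheld by output guardrail."
    return response
```

Production guardrails swap the keyword check for classifiers, retrieval-based fact checks, or rule engines, but the basic pattern of screening both sides of the model call is what the surveyed frameworks share.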
The paper also explores techniques to evaluate, analyze, and enhance guardrails, as well as methods to circumvent these controls and defend against such attacks. It highlights the challenges in designing effective guardrails, such as the need for multi-disciplinary approaches, neural-symbolic methods, and attention to the systems development lifecycle. The survey emphasizes the importance of systematic processes in the guardrail development cycle, in line with industrial standards such as ISO 26262 and DO-178B/C.
Key challenges discussed include the complexity of LLMs, the need for precise requirements, and potential conflicts between different desirable properties. The paper also addresses the importance of fairness, privacy, and robustness in LLMs, discussing methods to mitigate bias, protect user data, and ensure model reliability. It highlights the role of tools and packages such as LangChain, AIF360, ART, Fairlearn, and Detoxify in enhancing the safety, fairness, and compliance of LLMs.
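As an illustration of how such packages are typically applied, the sketch below screens a candidate LLM output with Detoxify's pretrained toxicity classifier before releasing it. The threshold and the surrounding filter logic are assumptions made for the example, not recommendations from the survey; the Detoxify call follows the interface shown in the library's README, and should be verified against the installed version.

```python
# pip install detoxify
from detoxify import Detoxify

# Threshold chosen arbitrarily for illustration; real deployments calibrate
# it against labelled data and the application's risk tolerance.
TOXICITY_THRESHOLD = 0.5

detector = Detoxify("original")  # pretrained multi-label toxicity model

def release_or_block(candidate: str) -> str:
    """Return the model output only if its toxicity score is below threshold."""
    scores = detector.predict(candidate)      # dict of per-label scores
    if scores["toxicity"] >= TOXICITY_THRESHOLD:
        return "[output suppressed by toxicity guardrail]"
    return candidate

if __name__ == "__main__":
    print(release_or_block("Have a wonderful day!"))
```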
The survey concludes with a discussion of future directions for safeguarding LLMs, emphasizing the need for continuous improvement, interdisciplinary collaboration, and the integration of advanced techniques to ensure the responsible and ethical use of LLMs.