SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety


8 Apr 2024 | Paul Röttger, Fabio Pernisi, Bertie Vidgen, Dirk Hovy
The paper "SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety" by Paul Röttger, Fabio Pernisi, Bertie Vidgen, and Dirk Hovy, of Bocconi University and the University of Oxford, addresses the growing concern over the safety of large language models (LLMs). The authors conduct a systematic review of 102 open datasets designed to evaluate and improve LLM safety, identifying trends and gaps in dataset creation and usage. Key findings include:

1. **Rapid Growth in Datasets**: The creation of LLM safety datasets is growing rapidly, with a significant increase in publications from 2021 to 2023.
2. **Diverse Purposes**: Datasets cover a wide range of safety aspects, including bias, toxicity, ethical alignment, and specific behaviors such as sycophancy and power-seeking.
3. **Format and Size**: Datasets have evolved from autocomplete-style prompts to chat-style prompts and conversations, better suited to current LLMs. Dataset sizes vary widely, with no clear pattern tied to purpose or method of creation.
4. **Synthetic Data**: There is a trend toward synthetic data, with some datasets generated entirely by LLMs.
5. **English Dominance**: English dominates LLM safety datasets, and non-English datasets remain scarce.
6. **Model Evaluation**: Most datasets are intended for evaluating models rather than training them, with a particular focus on benchmarking.
7. **Licensing and Access**: Most datasets are shared under permissive licenses and hosted on platforms such as GitHub and Hugging Face.
8. **Publication Trends**: Academic and non-profit organizations drive dataset creation, concentrated in a few research hubs.
9. **Practical Use**: Current evaluation practices in model release publications and popular benchmarks are highly idiosyncratic, drawing on a limited and unrepresentative selection of datasets.
The authors argue that standardization in LLM safety evaluations is needed to enable meaningful model comparisons and to incentivize safer LLM development. They also highlight the need for more diverse and comprehensive datasets, particularly in non-English languages. The paper concludes by emphasizing the importance of leveraging recent progress in safety dataset creation to improve evaluation practices.