The Art of Saying No: Contextual Noncompliance in Language Models

2 Jul 2024 | Faeze Brahman, Sachin Kumar, Vidhisha Balachandran, Pradeep Dasigi, Valentina Pyatkin, Abhilasha Ravichander, Sarah Wiegreffe, Nouha Dziri, Khyathi Chandu, Jack Hessel, Yulia Tsvetkov, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi
The paper "The Art of Saying No: Contextual Noncompliance in Language Models" by Faeze Brahman et al. introduces a comprehensive taxonomy of contextual noncompliance, expanding the scope beyond safety concerns to include a wide range of scenarios where language models should not comply with user requests. The taxonomy covers categories such as incomplete, unsupported, indeterminate, and humanizing requests. To evaluate these noncompliance capabilities, the authors develop a new evaluation suite with 1000 noncompliance prompts and a contrast set of compliant prompts. They find that existing models, including GPT-4, exhibit high compliance rates in previously understudied categories, with up to 30% of requests being incorrectly complied with. To address these gaps, the authors explore different training strategies, including parameter-efficient methods like low-rank adapters, which help strike a balance between appropriate noncompliance and maintaining general capabilities. The paper also discusses the limitations and ethical considerations of their work, emphasizing the need for further research to improve user experiences and trust in language models.The paper "The Art of Saying No: Contextual Noncompliance in Language Models" by Faeze Brahman et al. introduces a comprehensive taxonomy of contextual noncompliance, expanding the scope beyond safety concerns to include a wide range of scenarios where language models should not comply with user requests. The taxonomy covers categories such as incomplete, unsupported, indeterminate, and humanizing requests. To evaluate these noncompliance capabilities, the authors develop a new evaluation suite with 1000 noncompliance prompts and a contrast set of compliant prompts. They find that existing models, including GPT-4, exhibit high compliance rates in previously understudied categories, with up to 30% of requests being incorrectly complied with. To address these gaps, the authors explore different training strategies, including parameter-efficient methods like low-rank adapters, which help strike a balance between appropriate noncompliance and maintaining general capabilities. The paper also discusses the limitations and ethical considerations of their work, emphasizing the need for further research to improve user experiences and trust in language models.