Collective Constitutional AI: Aligning a Language Model with Public Input

June 03-06, 2024 | Saffron Huang, Divya Siddarth, Liane Lovitt, Thomas I. Liao, Esin Durmus, Alex Tamkin, Deep Ganguli
Collective Constitutional AI (CCAI) is a multi-stage process for sourcing and integrating public input into language models (LMs), aiming to align LM behavior with public preferences. The approach uses the Polis platform for online deliberation to gather public input, then fine-tunes an LM to adhere to the resulting principles using Constitutional AI. The study presents the first LM fine-tuned with collectively sourced public input and evaluates it against a baseline model trained with established principles.

The CCAI-trained model shows lower bias across nine social dimensions than the baseline model while maintaining equivalent performance on language, math, and helpful-harmless evaluations. Qualitative comparisons suggest that the CCAI-trained model tends to generate responses that reframe contentious topics positively rather than refusing to engage. These results demonstrate a promising pathway toward publicly informed development of language models.

The study highlights several subjective decision points necessary for running such a process, including how to operationalize the concept of 'a public's preferences for LM behavior'. It also addresses the challenge of explicitly defining the relevant public, to avoid implicit assumptions of universality. The authors demonstrate the real-world practicality of the approach by training a model using a 'Public' constitution derived from a representative sample of U.S. adults and evaluating it against a model trained with a 'Standard' constitution; the results show that the CCAI process can reduce bias and improve alignment with public values.

The study also discusses limitations, including the lack of a direct metric for assessing a model's adherence to constitutional principles, and notes the need for further research in this area. The authors provide a GitHub repository with anonymized public input data and the Jupyter notebook used to create the constitution, aiming to facilitate transparency and further research.
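The step of distilling deliberation output into a constitution can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's actual aggregation pipeline: the record format, the 0.7 agreement threshold, and the principle phrasing are all illustrative assumptions.

```python
# Hypothetical sketch: turn Polis-style vote tallies into
# Constitutional-AI-style principles. Assumes each record is
# (statement, agree_votes, total_votes); the 0.7 consensus
# threshold is an illustrative choice, not the paper's rule.

CONSENSUS_THRESHOLD = 0.7  # assumed cutoff for "broad agreement"

def build_constitution(records, threshold=CONSENSUS_THRESHOLD):
    """Keep statements whose agreement rate meets the threshold,
    phrased as preference principles for fine-tuning."""
    principles = []
    for statement, agrees, total in records:
        if total > 0 and agrees / total >= threshold:
            principles.append(
                f"Choose the response that most agrees with: {statement}"
            )
    return principles

# Toy example: only the high-consensus statement survives.
votes = [
    ("The AI should be respectful and avoid discrimination.", 92, 100),
    ("The AI should always agree with the user.", 18, 100),
]
constitution = build_constitution(votes)
```

In the real process, low-consensus or divisive statements would warrant more careful handling than simple exclusion; this sketch only shows the basic shape of consensus filtering.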
The study contributes to the field of AI ethics and value alignment by proposing a method for aligning LMs with public input, emphasizing the importance of public participation in AI development. The results suggest that public input can lead to more aligned and less biased models, highlighting the potential of participatory AI in shaping the behavior of language models.