June 03–06, 2024, Rio de Janeiro, Brazil | Saffron Huang, Divya Siddarth, Liane Lovitt, Thomas I. Liao, Esin Durmus, Alex Tamkin, Deep Ganguli
The paper "Collective Constitutional AI: Aligning a Language Model with Public Input" by Saffron Huang and colleagues introduces a method called Collective Constitutional AI (CCAI) to integrate public input into the training of language models (LMs). CCAI is a multi-stage process that involves identifying a target population, sourcing principles from this population, and fine-tuning an LM to adhere to these principles. The authors demonstrate the practicality of CCAI by creating the first LM fine-tuned with collectively sourced public input and evaluating it against a baseline model trained with established principles from LM developers.
Key contributions of the paper include:
1. **Framework Development**: A framework for fine-tuning LMs to align with public preferences.
2. **Model Training**: Fine-tuning an LM using a Public constitution derived from a representative sample of U.S. adults.
3. **Qualitative and Quantitative Analysis**: Analyzing differences in model outputs between the Standard and Public constitutions.
4. **Evaluation**: Comparing the models on various benchmarks, showing that the Public model exhibits lower bias across nine social dimensions while matching the Standard model on language, math, and helpful-harmless evaluations (see the bias-scoring sketch after this list).
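To illustrate what the bias comparison in item 4 might look like, the sketch below computes a per-dimension bias score in the style of the BBQ benchmark's disambiguated-context metric. Both the choice of metric and the answer counts are assumptions for illustration only, not results or methods reported in the paper.

```python
# Illustrative BBQ-style bias scoring per social dimension.
# All counts below are hypothetical placeholders, not the paper's results.

def bias_score(n_biased, n_counter_biased):
    """One common BBQ-style formulation, in [-1, 1]:
    2 * (biased answers / all non-'unknown' answers) - 1.
    0 means no measured bias; positive values lean toward the stereotype."""
    non_unknown = n_biased + n_counter_biased
    if non_unknown == 0:
        return 0.0
    return 2 * n_biased / non_unknown - 1

# Hypothetical answer counts (biased, counter-biased) for two dimensions.
dimensions = {
    "age": {"standard": (60, 40), "public": (52, 48)},
    "gender": {"standard": (58, 42), "public": (50, 50)},
}
for dim, counts in dimensions.items():
    s = bias_score(*counts["standard"])
    p = bias_score(*counts["public"])
    print(f"{dim}: standard={s:+.2f}, public={p:+.2f}")
```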
The authors highlight several subjective decision points in the process, such as participant selection, input elicitation, and the transformation of raw input into constitutional principles. They also discuss limitations, including the need for more diverse and globally representative samples, better handling of conflicting principles, and more sophisticated evaluation methods. The paper concludes by emphasizing the potential of CCAI to align LMs with public values and calls for further research to improve both the process and its evaluation.