[slides] Modular Pluralism: Pluralistic Alignment via Multi-LLM Collaboration

The paper introduces MODULAR PLURALISM, a framework for pluralistic alignment of large language models (LLMs) based on multi-LLM collaboration. It targets a limitation of existing alignment paradigms, which often fail to model diverse preferences across cultures, demographics, and communities. MODULAR PLURALISM pairs a base LLM with a pool of specialized *community LMs*, and the models collaborate in three modes: Overton pluralism, which represents a range of diverse values; steerable pluralism, which steers responses toward specific attributes; and distributional pluralism, which reflects population-level distributions. The framework is compatible with black-box LLMs and offers modular control: new community LMs can be added to cover underrepresented communities. Experiments on six tasks and four datasets show that MODULAR PLURALISM improves pluralistic alignment in all three modes, covering diverse values, steering toward specific attributes, and matching population distributions more accurately. The results also show that adding a new community LM can patch coverage for an underrepresented community. The paper covers the methodology and experimental settings, analyzes message faithfulness and cultural community LMs, and discusses the benefits and limitations of the approach.
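The distributional mode can be illustrated with a small sketch. The summary does not say how population-level distributions are produced, so the code below assumes one simple approach: each community LM yields a probability distribution over a fixed set of answer options, and these are averaged with weights proportional to each community's share of the target population. The function name `mix_distributions`, the community labels, and all numbers are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def mix_distributions(
    community_probs: Dict[str, List[float]],
    population_weights: Dict[str, float],
) -> List[float]:
    """Weighted average of per-community answer distributions.

    Assumption (not from the source): each community LM has already been
    queried and returned a probability distribution over the same options.
    """
    if not community_probs:
        raise ValueError("at least one community distribution is required")

    # Normalize the weights of the communities that are actually present.
    total = sum(population_weights[name] for name in community_probs)
    if total <= 0:
        raise ValueError("population weights must sum to a positive value")

    n_options = len(next(iter(community_probs.values())))
    mixed = [0.0] * n_options
    for name, probs in community_probs.items():
        if len(probs) != n_options:
            raise ValueError(f"distribution for {name!r} has the wrong length")
        share = population_weights[name] / total
        for i, p in enumerate(probs):
            mixed[i] += share * p
    return mixed


# Hypothetical example: two communities answering a yes/no question.
probs = {"community_a": [0.8, 0.2], "community_b": [0.2, 0.8]}
weights = {"community_a": 3.0, "community_b": 1.0}
mixed = mix_distributions(probs, weights)
print([round(p, 2) for p in mixed])
```

Under this assumption, covering a newly added community amounts to adding one more entry to both dictionaries, which mirrors the modular patching idea described in the summary.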

Modular Pluralism: Pluralistic Alignment via Multi-LLM Collaboration

22 Jun 2024 | Shangbin Feng, Taylor Sorensen, Yuhan Liu, Jillian Fisher, Chan Young Park, Yejin Choi, Yulia Tsvetkov