8 Jun 2024 | Xianghe Pang, Shuo Tang, Rui Ye, Yuxin Xiong, Bolun Zhang, Yanfeng Wang, Siheng Chen
This paper proposes a novel approach to align large language models (LLMs) with human values through social scene simulation, termed MATRIX. MATRIX is a social scene simulator that emulates realistic multi-party interactions and simulates the social consequences of a user's instruction. By fine-tuning the LLM with MATRIX-simulated data, the model learns to consider social implications before responding, enhancing its alignment with human values without compromising inference speed. The method is theoretically analyzed and experimentally validated, showing superior performance over 10 baselines across 4 benchmarks. Notably, a tuned 13B-size LLM outperforms GPT-4 in aligning with human values, as evidenced by 875 user ratings. The paper also discusses the limitations and ethical considerations of the approach.
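The pipeline the abstract describes, simulating the multi-party social consequences of an instruction and then distilling the outcome into fine-tuning data, can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the role prompts, the `simulate_social_scene` and `build_finetune_record` helpers, and the `query_llm` stub are all hypothetical stand-ins for whatever model or API is actually used.

```python
# Minimal sketch of a MATRIX-style data pipeline (illustrative, not the paper's code).
# Assumption: `query_llm` stands in for any chat-completion call; the role prompts
# and the fine-tuning record format are invented for this example.

from dataclasses import dataclass


def query_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (local model or API)."""
    return f"<model response to: {prompt[:40]}...>"


@dataclass
class SimulatedScene:
    instruction: str
    consequences: list[str]
    critique: str


def simulate_social_scene(instruction: str, roles: list[str]) -> SimulatedScene:
    """Emulate a multi-party interaction: each role reacts to the instruction,
    then a summarizer assesses the overall social consequences."""
    consequences = [
        query_llm(
            f"You are {role}. Describe how you would be affected "
            f"if someone acted on this instruction: {instruction}"
        )
        for role in roles
    ]
    critique = query_llm(
        "Summarize the social consequences below and state whether the "
        "instruction should be refused or answered with care:\n"
        + "\n".join(consequences)
    )
    return SimulatedScene(instruction, consequences, critique)


def build_finetune_record(scene: SimulatedScene) -> dict:
    """Turn one simulated scene into a (prompt, response) pair in which the
    response has been rewritten to account for the simulated consequences."""
    aligned_response = query_llm(
        f"Instruction: {scene.instruction}\n"
        f"Consequence analysis: {scene.critique}\n"
        "Write a final response that respects these social implications."
    )
    return {"prompt": scene.instruction, "response": aligned_response}


if __name__ == "__main__":
    roles = ["the user", "a bystander", "a regulator"]
    scene = simulate_social_scene("Explain how to bypass a paywall", roles)
    print(build_finetune_record(scene))
```

Records produced this way would then be used for ordinary supervised fine-tuning, so the simulator is only needed at data-generation time and the tuned model keeps its original inference speed, consistent with the claim in the abstract.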