Human-Instruction-Free LLM Self-Alignment with Limited Samples
January 17, 2024 | Hongyi Guo, Yuanshun Yao, Wei Shen, Jiaheng Wei, Xiaoying Zhang, Zhaoran Wang, Yang Liu
The paper introduces ISARA (Iterative Self-Alignment with Retrieval-Augmented in-context learning), a novel method for aligning large language models (LLMs) with human values using limited samples (fewer than 100). Unlike existing methods that require large amounts of annotated data and heavy human involvement, ISARA operates iteratively without active human guidance. The key idea is to retrieve high-quality samples related to the target domain and use them as in-context learning (ICL) examples to generate more samples. These self-generated samples are then used to fine-tune the LLM iteratively, enhancing its alignment capabilities. The method is designed to be scalable and adaptable to various domains, demonstrating superior performance on safety, truthfulness, and instruction-following benchmarks. Experiments show that ISARA achieves near-zero human supervision, improves harmlessness rates without compromising helpfulness, and exhibits robust domain generalization.
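The iterative loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the helpers `retrieve_similar`, `generate_with_icl`, and `finetune` are hypothetical stand-ins (here a word-overlap retriever and dummy generation/training steps) for the paper's actual retrieval model, LLM generation, and fine-tuning procedure.

```python
def retrieve_similar(pool, query, k=3):
    # Hypothetical retriever: rank pooled samples by word overlap with the query.
    # The paper would use a learned retrieval/embedding model instead.
    def overlap(sample):
        return len(set(sample.lower().split()) & set(query.lower().split()))
    return sorted(pool, key=overlap, reverse=True)[:k]

def generate_with_icl(model_state, icl_examples, prompt):
    # Stand-in for LLM generation conditioned on the retrieved ICL examples.
    return f"[model@{model_state}] {prompt} | icl={len(icl_examples)}"

def finetune(model_state, samples):
    # Stand-in for one fine-tuning pass on the self-generated samples.
    return model_state + 1

def isara(seed_pool, prompts, iterations=3):
    """Sketch of the ISARA loop: retrieve -> ICL-generate -> fine-tune, repeated."""
    model_state, pool = 0, list(seed_pool)
    for _ in range(iterations):
        generated = []
        for prompt in prompts:
            icl = retrieve_similar(pool, prompt)      # 1. retrieve related samples
            generated.append(
                generate_with_icl(model_state, icl, prompt)  # 2. generate via ICL
            )
        pool.extend(generated)                        # 3. grow the sample pool
        model_state = finetune(model_state, generated)  # 4. fine-tune iteratively
    return model_state, pool

state, pool = isara(
    seed_pool=["refuse harmful requests politely"],
    prompts=["how to respond to an unsafe prompt"],
    iterations=3,
)
```

Because each iteration both enlarges the sample pool and updates the model, later rounds retrieve from (and are trained on) progressively more self-generated data, which is the self-alignment mechanism the paper relies on.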