Suri: Multi-constraint Instruction Following for Long-form Text Generation

2 Oct 2024 | Chau Minh Pham, Simeng Sun, Mohit Iyyer
This paper introduces Suri, a new dataset for long-form text generation with multi-constraint instruction following. Suri contains 20,000 human-written long-form texts paired with LLM-generated backtranslated instructions that include multiple complex constraints. Because collecting human preference judgments on long-form texts is difficult, the authors propose Instructional ORPO (I-ORPO), an alignment method based on the ORPO algorithm: instead of receiving negative feedback from dispreferred responses, I-ORPO obtains it from synthetically corrupted instructions generated by an LLM.

Using Suri, the authors perform supervised and I-ORPO fine-tuning on Mistral-7B-Instruct-v0.2. The resulting models, Suri-SFT and Suri-I-ORPO, generate significantly longer texts (~5K tokens) than base models without significant quality deterioration. Human evaluation shows that while both models satisfy most constraints, Suri-I-ORPO generations are generally preferred for their coherent and informative incorporation of the constraints.

The dataset is created via instruction backtranslation: a human-written long-form text is fed to an LLM, which generates the instruction that could have been followed to produce that text. Each of the 20,000 examples consists of a backtranslated instruction, a corrupted instruction, and a human-written response; the instructions are complex, with multiple constraints, and the gold responses are lengthy (2-5K words, roughly 3-6K tokens). The authors validate the generated instructions through human evaluation and analyze the resulting dataset. Human and automated evaluations show that the fine-tuned models generate high-quality, long-form responses while effectively satisfying constraints, demonstrating that Suri improves the constraint-following capabilities of LLMs for long-form generation.
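The backtranslation-plus-corruption pipeline described above might be sketched roughly as follows. This is an illustrative sketch only: the `llm` callable and both prompt strings are assumptions, not the authors' actual prompts, and any chat-completion API could be swapped in.

```python
def backtranslate_instruction(text: str, llm) -> dict:
    """Sketch of instruction backtranslation for one example.

    Given a human-written long-form text, ask an LLM to write the
    multi-constraint instruction that could have produced it, then
    corrupt that instruction to build the negative context used by
    I-ORPO.  `llm` is a hypothetical callable (prompt -> string).
    """
    # Step 1: backtranslate the gold text into an instruction.
    instruction = llm(
        "Write an instruction with one main goal and several specific "
        "stylistic and content constraints that this text satisfies:\n\n"
        + text
    )
    # Step 2: minimally corrupt the constraints (not the main goal) so the
    # corrupted instruction is plausibly wrong but still on-topic.
    corrupted = llm(
        "Minimally alter each constraint in this instruction so the text "
        "no longer satisfies it, keeping the main goal intact:\n\n"
        + instruction
    )
    return {"instruction": instruction, "corrupted": corrupted, "response": text}
```

Each returned triple (instruction, corrupted instruction, human-written response) corresponds to one Suri example.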
The paper also discusses related work, ethical considerations, and limitations of the proposed method.
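Under the hood, I-ORPO keeps the ORPO objective but swaps the comparison: rather than scoring two responses against one instruction, it scores the same gold response under the original instruction (preferred context) and the corrupted instruction (dispreferred context). A minimal sketch of that loss, assuming length-normalized per-token log-probabilities and an illustrative λ weight (not the authors' hyperparameters):

```python
import math

def odds_from_logprob(avg_logp: float) -> float:
    """odds(y|x) = p / (1 - p), with p the length-normalized
    sequence probability exp(avg_logp)."""
    p = math.exp(avg_logp)
    return p / (1.0 - p)

def i_orpo_loss(logp_orig: float, logp_corrupt: float, lam: float = 0.1) -> float:
    """I-ORPO sketch: the SAME response y is scored under the original
    instruction and a corrupted one; the odds-ratio term pushes the
    model to prefer y given the original instruction.

    logp_orig, logp_corrupt: average per-token log-probs of y under
    each instruction (assumed < 0).  lam is an assumed weight.
    """
    nll = -logp_orig  # standard SFT term on the gold response
    log_odds_ratio = (
        math.log(odds_from_logprob(logp_orig))
        - math.log(odds_from_logprob(logp_corrupt))
    )
    # -log sigmoid(log odds ratio), as in ORPO's relative-ratio term
    ratio_term = -math.log(1.0 / (1.0 + math.exp(-log_odds_ratio)))
    return nll + lam * ratio_term
```

When the response is more likely under the original instruction than under the corrupted one, the odds-ratio term is small; when the model prefers the corrupted context, the penalty grows, steering it toward constraint-sensitive generation without needing preference pairs of long responses.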