3 Jul 2024 | Hanxu Hu*, Simon Yu*, Pinzhen Chen*, Edoardo M. Ponti
The paper addresses the challenge that large language models (LLMs) struggle to respond to queries containing multiple instructions, which hinders their performance on complex tasks that require several intermediate steps. To tackle this issue, the authors propose a method called Sequential Instruction Tuning (SIT), which involves constructing sequential instruction data and fine-tuning LLMs on it. SIT is designed to improve the ability of LLMs to handle complex tasks such as coding, mathematics, and open-ended generation.
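To make the idea concrete, here is a minimal sketch of what a sequential instruction training sample might look like. The field names and prompt wording are illustrative assumptions, not the paper's exact data format:

```python
# A hypothetical sequential-instruction training sample in the style the
# paper describes ("translate then predict"): one query carries two
# instructions that must be executed in order. Field names are assumptions.
sample = {
    "instruction": (
        "First, translate the following question into English. "
        "Then, answer the translated question."
    ),
    "input": "¿Cuál es la capital de Francia?",
    "output": (
        "Translation: What is the capital of France?\n"
        "Answer: Paris."
    ),
}
```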
The authors first manually create sequential instructions for multilingual and visual question answering tasks, such as "translate then predict" and "caption then answer." They then develop an automated process, Seq-Instruct, to generate diverse and complex sequential instructions from existing datasets like Alpaca and FlanCoT. This automated process involves decomposing, prefixing, suffixing, or holding instructions, creating more natural and varied training data.
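The four operations can be pictured as transformations over an instruction dataset. The paper's pipeline generates these rewrites automatically rather than with hand-written templates, so the following toy sketch, with assumed template wording, only illustrates the idea:

```python
import random

# Toy illustration of the four Seq-Instruct operations named above.
# The real pipeline produces these rewrites automatically; the templates
# below are assumptions that only illustrate each operation.

def decompose(instruction: str) -> str:
    """Split one instruction into explicit ordered steps."""
    return f"Step 1: Read the task below carefully. Step 2: {instruction}"

def prefix(instruction: str) -> str:
    """Prepend an auxiliary instruction to be completed first."""
    return f"First, paraphrase the task in your own words. Then, {instruction}"

def suffix(instruction: str) -> str:
    """Append an auxiliary instruction to be completed last."""
    return f"{instruction} Finally, summarize your answer in one sentence."

def hold(instruction: str) -> str:
    """Keep the instruction unchanged."""
    return instruction

OPERATIONS = [decompose, prefix, suffix, hold]

def seq_instruct(dataset: list[str], seed: int = 0) -> list[str]:
    """Apply a randomly chosen operation to each instruction."""
    rng = random.Random(seed)
    return [rng.choice(OPERATIONS)(inst) for inst in dataset]

print(seq_instruct(["Write a haiku about autumn."]))
```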
To evaluate the effectiveness of SIT, the authors introduce a new benchmark called SeqEval, which assesses a model's ability to follow all instructions in a sequence. The results show that models fine-tuned with SIT datasets perform significantly better in factuality, reasoning, and open-ended generation tasks compared to models fine-tuned with traditional instruction datasets.
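The summary does not specify SeqEval's exact scoring rule, but since the benchmark checks that all instructions in a sequence are followed, a plausible minimal sketch is an all-or-nothing score over per-step checkers. The checker functions here are assumed heuristics, not the benchmark's real verifiers:

```python
from typing import Callable

# Hypothetical sequence-level scoring in the spirit of SeqEval: a response
# counts only if *every* sub-instruction in the sequence is satisfied.
# The per-step checkers below are assumptions for illustration.

def sequence_score(
    response: str,
    checkers: list[Callable[[str], bool]],
) -> bool:
    """Return True only if the response passes every per-step checker."""
    return all(check(response) for check in checkers)

def benchmark_accuracy(
    responses: list[str],
    checker_sets: list[list[Callable[[str], bool]]],
) -> float:
    """Fraction of examples whose responses satisfy all steps."""
    passed = sum(
        sequence_score(r, cs) for r, cs in zip(responses, checker_sets)
    )
    return passed / len(responses)

# Example: "translate then answer" checked with two naive heuristics.
checks = [
    lambda r: "Translation:" in r,  # step 1: a translation is present
    lambda r: "Answer:" in r,       # step 2: an answer is present
]
print(sequence_score("Translation: ...\nAnswer: Paris.", checks))  # True
```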
The paper also includes ablation studies to confirm the generalizability of SIT across models and tasks, and qualitative analysis of the types of instructions Seq-Instruct generates. The authors conclude that SIT enhances LLMs' ability to handle complex tasks and improves their instruction-following capabilities.