3 Jul 2024 | Hanxu Hu*, Simon Yu*, Pinzhen Chen*, Edoardo M. Ponti
This paper introduces sequential instruction tuning (SIT), a method for improving how large language models (LLMs) handle complex tasks that require multiple steps. Existing instruction-tuned models often struggle with multi-step queries, leading to poor performance on tasks such as coding, math, and open-ended generation. To address this, the authors incorporate sequential instructions into the fine-tuning process. They manually create interpretable intermediate tasks for multilingual and visual question answering, such as "translate then predict" and "caption then answer," and they also automate this process by transforming existing datasets into complex sequential instructions, making the method generalizable.
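To make the manual construction concrete, here is a minimal sketch of how a "translate then predict" training example could be assembled from an ordinary QA pair. The function name, field names, and template wording are illustrative assumptions, not the authors' exact data format.

```python
def make_translate_then_predict(question: str, answer: str,
                                source_lang: str = "German") -> dict:
    """Wrap a monolingual QA pair into a two-step sequential instruction.

    Hypothetical template: the real SIT data format may differ.
    """
    instruction = (
        f"First, translate the following question from {source_lang} "
        f"into English. Then, answer the translated question.\n\n"
        f"Question: {question}"
    )
    # The target would ideally contain both the intermediate translation
    # and the final answer; only the final answer is kept here for brevity.
    return {"instruction": instruction, "output": answer}
```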
The authors demonstrate that models fine-tuned with SIT show improved performance in factuality, reasoning, and open-ended generation. They introduce a new benchmark, SeqEval, to evaluate a model's ability to follow all instructions in a sequence. Results show that SIT models outperform traditionally instruction-tuned models at following instructions.
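As a rough illustration of what a SeqEval-style score could look like, the sketch below computes the fraction of sub-instructions a response satisfies, assuming each benchmark item comes with per-step check functions. This scoring protocol is an assumption; the actual benchmark may score responses differently (for instance, with an LLM judge).

```python
from typing import Callable

def seq_follow_rate(response: str,
                    step_checks: list[Callable[[str], bool]]) -> float:
    """Fraction of sub-instructions whose check passes on the response."""
    passed = sum(check(response) for check in step_checks)
    return passed / len(step_checks)

# Hypothetical usage: a two-step "translate then answer" item.
checks = [
    lambda r: "translation:" in r.lower(),  # did it translate first?
    lambda r: "answer:" in r.lower(),       # did it then answer?
]
print(seq_follow_rate("Translation: ... Answer: ...", checks))  # 1.0
```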
The methodology creates sequential instructions through both manual and automatic processes: manual construction breaks a task down into simpler intermediate steps, while automatic construction transforms existing datasets into diverse sequential instructions, as sketched below. The authors evaluate the effectiveness of SIT across benchmarks for factuality, reasoning, and open-ended generation, finding that SIT significantly improves performance in all three areas.
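Here is a minimal sketch of the automatic route, assuming Alpaca-style records with `instruction`/`input`/`output` fields and a small pool of generic intermediate tasks; both the pool and the record schema are illustrative assumptions rather than the paper's exact pipeline.

```python
import random

# Illustrative pool of generic intermediate tasks; the paper's automatic
# process may use different tasks or an LLM to compose them.
INTERMEDIATE_TASKS = [
    "First, repeat the input verbatim.",
    "First, paraphrase the input in your own words.",
]

def to_sequential(record: dict) -> dict:
    """Turn a single-instruction record into a two-step sequential one."""
    prefix = random.choice(INTERMEDIATE_TASKS)
    return {
        "instruction": f"{prefix} Then, complete this task: {record['instruction']}",
        "input": record.get("input", ""),
        # In practice the target would also include the intermediate
        # step's output; only the original output is kept here.
        "output": record["output"],
    }
```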
The paper also discusses the generalization of SIT to different models and tasks, showing that it works across a wide range of instruction datasets. The authors analyze the impact of sequence length and find that SIT models remain superior even when the instruction sequence is shortened. They conclude that SIT enhances LLMs' ability to follow instructions and perform complex reasoning, contributing to the development of more capable LLMs that can handle complex, multi-step tasks.