MULTI-TASK INFERENCE: Can Large Language Models Follow Multiple Instructions at Once?

6 Jun 2024 | Guijin Son, Sangwon Baek, Sangdae Nam, Ilgyun Jeong, Seungone Kim
This paper explores the capability of large language models (LLMs) to handle multiple instructions simultaneously, a setting the authors call MULTI-TASK INFERENCE. They introduce the MTI BENCH, a comprehensive evaluation benchmark consisting of 5,000 instances across 25 tasks, each involving 2 to 3 sub-tasks. The benchmark is divided into two subsets: MULTI-STEP, which evaluates sub-tasks with sequential dependencies, and MULTI-PART, which focuses on handling multiple independent sub-tasks. Comparing MULTI-TASK INFERENCE with SINGLE-TASK INFERENCE and BATCH PROMPTING, the study finds that MULTI-TASK INFERENCE reduces total inference time by a factor of 1.46 on average and improves performance by up to 12.4% for state-of-the-art models such as LLAMA-2-CHAT-70B and GPT-4.
The results suggest that MULTI-TASK INFERENCE is particularly beneficial for more capable models, offering a significant speed-up alongside improved accuracy. The paper also analyzes a look-ahead effect, in which models use information from subsequent sub-tasks to improve their performance on the first sub-task. The authors conclude that MULTI-TASK INFERENCE is a promising approach for the efficient and effective handling of complex instructions.
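The difference between the two inference settings can be sketched in a few lines of Python. The prompt wording and function names below are hypothetical illustrations, not the MTI BENCH's actual templates: SINGLE-TASK INFERENCE issues one model call per sub-task, whereas MULTI-TASK INFERENCE packs all sub-tasks into a single prompt answered in one call.

```python
# Minimal sketch contrasting SINGLE-TASK and MULTI-TASK INFERENCE
# prompt construction. Prompt templates here are illustrative only.

def single_task_prompts(sub_tasks):
    """One prompt (and hence one model call) per sub-task."""
    return [f"Instruction: {t}\nAnswer:" for t in sub_tasks]

def multi_task_prompt(sub_tasks):
    """All sub-tasks combined into a single prompt for one model call."""
    numbered = "\n".join(f"{i}. {t}" for i, t in enumerate(sub_tasks, 1))
    return f"Complete all of the following sub-tasks:\n{numbered}\nAnswers:"

sub_tasks = [
    "Extract all dates mentioned in the passage.",
    "Sort the extracted dates chronologically.",
]

# Two separate calls vs. a single combined call:
print(len(single_task_prompts(sub_tasks)))  # 2
print(multi_task_prompt(sub_tasks))
```

Because the sub-tasks share one context in the multi-task prompt, the model can exploit the look-ahead effect described above: seeing the second instruction while answering the first.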