This paper introduces INTERS, a novel instruction-tuning dataset designed to enhance the performance of large language models (LLMs) on information retrieval (IR) tasks. The dataset covers 20 tasks across three fundamental IR categories: query understanding, document understanding, and query-document relationship understanding, derived from 43 distinct datasets with manually written templates. The authors evaluate the effectiveness of instruction tuning on several LLMs, including LLaMA, Mistral, and Falcon, showing significant improvements in IR performance. They also conduct extensive experiments to analyze the impact of instruction design, template diversity, few-shot demonstrations, and data volume on model performance. The dataset and fine-tuned models are publicly available at https://github.com/DaoD/INTERS.
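To illustrate how such template-based instances might be assembled, the sketch below fills a manually written template for a query reformulation task. The task description, template wording, field names, and the `build_instance` helper are illustrative assumptions and are not taken from the INTERS release.

```python
# A minimal sketch of template-based instruction construction in the spirit of
# INTERS. The task description, template text, and field names below are
# illustrative assumptions, not the dataset's actual templates.

TASK_DESCRIPTION = (
    "Query reformulation rewrites a search query so that it better expresses "
    "the underlying information need."
)

TEMPLATE = (
    "{description}\n\n"
    "Original query: {query}\n"
    "Rewrite the query so that it is clearer and more specific.\n"
    "Reformulated query:"
)

def build_instance(query: str, target: str) -> dict:
    """Fill one manually written template to produce an instruction-tuning pair."""
    prompt = TEMPLATE.format(description=TASK_DESCRIPTION, query=query)
    return {"prompt": prompt, "completion": " " + target}

example = build_instance(
    query="jaguar speed",
    target="what is the top running speed of a jaguar (the animal)",
)
print(example["prompt"] + example["completion"])
```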
The study highlights that instruction tuning significantly enhances LLMs' ability to understand and execute IR tasks. Key findings include the effectiveness of customized templates and task descriptions in improving model performance, the importance of template diversity in enhancing generalizability, and the benefits of combining instruction tuning with few-shot prompting. The results also show that larger models benefit more from instruction tuning, and that the dataset's comprehensive coverage and diversity contribute to improved performance across various tasks.
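To make the interaction between instruction tuning and few-shot prompting concrete, the sketch below prepends solved demonstrations to a zero-shot instruction before the test input is presented to a tuned model. The `few_shot_prompt` function and its "Input:/Output:" formatting are hypothetical conventions, not the exact layout used in INTERS.

```python
# A hedged sketch of combining an instruction with few-shot demonstrations,
# reflecting the finding that instruction tuning and few-shot prompting are
# complementary. The prompt layout below is an assumption, not INTERS's format.

from typing import List, Tuple

def few_shot_prompt(instruction: str,
                    demonstrations: List[Tuple[str, str]],
                    test_input: str) -> str:
    """Concatenate solved demonstrations between the instruction and the test input."""
    parts = [instruction, ""]
    for demo_input, demo_output in demonstrations:
        parts.append(f"Input: {demo_input}\nOutput: {demo_output}\n")
    parts.append(f"Input: {test_input}\nOutput:")
    return "\n".join(parts)

prompt = few_shot_prompt(
    instruction="Rewrite each query so that it is clearer and more specific.",
    demonstrations=[
        ("jaguar speed", "what is the top running speed of a jaguar (the animal)"),
    ],
    test_input="apple price",
)
print(prompt)
```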
The paper compares INTERS with existing instruction sets, such as FLAN, and finds that INTERS yields more substantial improvements on search tasks, particularly in query-document relationship understanding. The study also investigates the impact of different ranking strategies and data volumes on model performance, finding that increasing data volume generally improves performance, though the effect varies across tasks. Overall, the results demonstrate that instruction tuning is an effective way to enhance LLMs' performance on IR tasks and point to directions for future research and development. The authors acknowledge limitations, including the need for further study of larger models and the exploration of alternative architectures for query-document relationship understanding.
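As an illustration of one common ranking strategy of the kind examined in such studies, the sketch below implements pointwise reranking: each query-document pair is scored independently by an LLM call and candidates are sorted by score. The prompt wording and the `generate` callable are hypothetical and not drawn from the paper's code.

```python
# An illustrative sketch of a pointwise ranking strategy: score each
# query-document pair independently with an LLM call, then sort by score.
# The prompt text and the `generate` callable are hypothetical assumptions.

from typing import Callable, List, Tuple

POINTWISE_PROMPT = (
    "Judge whether the document answers the query. "
    "Reply with a relevance score from 0 (irrelevant) to 3 (perfectly relevant).\n"
    "Query: {query}\nDocument: {document}\nScore:"
)

def rerank(query: str,
           documents: List[str],
           generate: Callable[[str], str]) -> List[Tuple[str, float]]:
    """Return the candidate documents sorted by model-assigned relevance."""
    scored = []
    for doc in documents:
        raw = generate(POINTWISE_PROMPT.format(query=query, document=doc))
        try:
            score = float(raw.strip().split()[0])
        except (ValueError, IndexError):
            score = 0.0  # fall back when the model does not return a number
        scored.append((doc, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```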