2025 | Hao Zhao, Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion
This paper investigates whether in-context learning (ICL) is sufficient for instruction following in large language models (LLMs). The authors evaluate ICL, specifically the URIAL method, against instruction fine-tuning (IFT) on the MT-Bench benchmark. While URIAL achieves reasonable performance, it still underperforms IFT, and the gap widens with more capable base models. The study identifies decoding parameters as crucial for ICL success and shows that adding high-quality demonstrations further improves performance. Comparing ICL and IFT in the low-data regime, the authors find that ICL can be a viable alternative when few examples are available: IFT remains clearly stronger for multi-turn conversations, while ICL stays competitive on single-turn tasks. The paper discusses the limitations of ICL, its potential for customizing LLMs without fine-tuning, and the need for further research to close the remaining gap with IFT.
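To make the setup concrete, here is a minimal sketch of URIAL-style in-context alignment: a base (non-fine-tuned) model is steered purely by a prompt that prepends a system-style preamble and a few high-quality (instruction, response) demonstrations. The preamble wording, the demonstration contents, the `# User:`/`# Assistant:` template, and the decoding settings below are illustrative assumptions, not the paper's exact artifacts.

```python
# Sketch of URIAL-style in-context alignment via prompt construction alone.
# All text and parameter values here are placeholders for illustration.

PREAMBLE = (
    "Below is a conversation between a user and a helpful, honest AI assistant.\n"
)

# A few curated stylistic demonstrations (URIAL relies on such high-quality
# examples; these two are invented for the sketch).
DEMOS = [
    ("What is the capital of France?",
     "The capital of France is Paris."),
    ("Give me one tip for writing clear emails.",
     "Lead with your main request in the first sentence, then add context."),
]

def build_urial_prompt(instruction: str) -> str:
    """Assemble an in-context alignment prompt for a base model."""
    parts = [PREAMBLE]
    for demo_instruction, demo_response in DEMOS:
        parts.append(f"# User:\n{demo_instruction}\n\n# Assistant:\n{demo_response}\n")
    # The new instruction is appended in the same template, so the base model
    # continues in the demonstrated assistant style.
    parts.append(f"# User:\n{instruction}\n\n# Assistant:\n")
    return "\n".join(parts)

if __name__ == "__main__":
    prompt = build_urial_prompt("Summarize the plot of Hamlet in two sentences.")
    print(prompt)
    # The paper finds decoding parameters crucial for ICL performance; in
    # practice one would pass conservative sampling settings when generating,
    # e.g. model.generate(prompt, temperature=0.3, top_p=0.9) -- these values
    # are placeholders, not the settings reported in the paper.
```

The point of the sketch is that no weights are updated: everything IFT would bake into the model must instead fit into this prompt, which is why demonstration quality and decoding settings carry so much weight in the paper's results.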