5 Jun 2024 | Kaiyi Zhang, Ang Lv, Yuhan Chen, Hansen Ha, Tao Xu, Rui Yan
Batch-ICL: Effective, Efficient, and Order-Agnostic In-Context Learning
This paper presents Batch-ICL, an effective, efficient, and order-agnostic inference algorithm for in-context learning (ICL). By treating ICL as a meta-optimization process, the authors explain why LLMs are sensitive to the order of ICL examples. Instead of a single N-shot forward pass, Batch-ICL runs N separate 1-shot forward computations and aggregates the resulting meta-gradients. The aggregated meta-gradients are then applied to the forward computation of a zero-shot query to produce the final prediction. Because the aggregation ignores ordering, this batch processing makes the LLM agnostic to the order of ICL examples. Through extensive experiments, the authors demonstrate that Batch-ICL consistently outperforms most permutations of ICL examples, sometimes even exceeding the best ordering for standard ICL, while requiring less computation. They also develop a novel variant of Batch-ICL featuring multiple "epochs" of meta-optimization, which implicitly explores permutations of ICL examples and further enhances ICL performance.
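To make the procedure concrete, here is a minimal sketch of the idea in PyTorch with Hugging Face transformers. Everything task-specific below is an assumption for illustration: GPT-2 stands in for the larger models evaluated in the paper, the sentiment prompts and the layer index k = 6 are hypothetical, and each example's meta-gradient is approximated as the shift it induces in the query token's hidden state at a single layer, whereas the paper formulates meta-gradients in terms of attention outputs.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in for the 7B/13B models evaluated in the paper
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

# Hypothetical sentiment task: N = 2 demonstrations plus one query.
examples = [("A moving, beautifully shot film.", "positive"),
            ("Two hours I will never get back.", "negative")]
query = "The plot made no sense at all."
k = 6  # aggregation layer index (a hyperparameter analyzed in the paper)

def last_hidden_at_layer(prompt: str, layer: int) -> torch.Tensor:
    """Hidden state of the final token after transformer block `layer`."""
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        hs = model(ids, output_hidden_states=True).hidden_states
    return hs[layer][0, -1]  # hidden_states[0] is the embedding output

zero_shot = f"Review: {query}\nSentiment:"
h_zero = last_hidden_at_layer(zero_shot, k)

# One 1-shot forward pass per demonstration; each "meta-gradient" is
# approximated by the shift the demonstration induces in the query token's
# hidden state at layer k, relative to the zero-shot pass.
deltas = []
for x, y in examples:
    one_shot = f"Review: {x}\nSentiment: {y}\n\nReview: {query}\nSentiment:"
    deltas.append(last_hidden_at_layer(one_shot, k) - h_zero)
agg = torch.stack(deltas).mean(dim=0)  # order-free aggregation across examples

# Final zero-shot pass with the aggregated shift injected at layer k via a hook.
def inject(module, inputs, output):
    h = output[0]
    h[:, -1, :] = h[:, -1, :] + agg
    return (h,) + output[1:]

handle = model.transformer.h[k - 1].register_forward_hook(inject)
with torch.no_grad():
    logits = model(tok(zero_shot, return_tensors="pt").input_ids).logits[0, -1]
handle.remove()

print(tok.decode(logits.argmax().item()))  # order-agnostic final prediction
```

Because the N shifts are simply averaged, permuting `examples` leaves the prediction unchanged, which is where the order-agnosticism in this sketch comes from.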
The paper also examines the efficiency of Batch-ICL, showing that it is more computationally efficient than standard ICL. The authors evaluate Batch-ICL on a variety of classification tasks and find that it significantly improves performance across different model sizes and tasks. They further compare Batch-ICL with other methods such as Parallel Context Windows (PCW) and Fantastically Ordered (F-Ordered), finding that Batch-ICL not only enhances LLMs' ICL performance but also reduces performance variability across different demonstration examples.
The paper also analyzes how the number of examples (N), the aggregation layer index (k), the order of ICL examples, and the number of epochs affect Batch-ICL's performance. The results show that Batch-ICL is robust to N and can exploit LLMs' in-context learning capacity more thoroughly. Performance also improves with more epochs: the 7B model keeps benefiting as the number of epochs increases, while the 13B model plateaus earlier.
The paper concludes that Batch-ICL is an effective and efficient algorithm for ICL inference: it processes ICL examples in batches, aggregates their meta-gradients, and applies them to a zero-shot forward computation for the final prediction. Because of this batch processing, Batch-ICL is agnostic to the order of ICL examples, surpasses the average performance over all order permutations across various tasks, and supports many more examples. The authors also extend Batch-ICL with multi-epoch variants that implicitly enumerate permutations of ICL examples, fostering better interaction among inputs and further improving the method.
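The summary does not spell out how successive epochs are connected, so the continuation below is only a rough, assumed illustration of what such a multi-epoch loop could look like, reusing `model`, `tok`, `last_hidden_at_layer`, `examples`, `query`, `k`, and `h_zero` from the sketch above: each epoch re-runs the N 1-shot passes with the previous epoch's aggregated shift injected one layer below k, so the demonstrations can interact through the model, and then re-aggregates. The paper's actual multi-epoch formulation may differ.

```python
# Reuses model, tok, last_hidden_at_layer, examples, query, k, h_zero defined above.
import torch

def last_hidden_with_shift(prompt: str, layer: int, shift_layer: int,
                           shift: torch.Tensor) -> torch.Tensor:
    """Last-token hidden state after block `layer`, with `shift` added to the
    last token's hidden state at the output of block `shift_layer`."""
    def hook(module, inputs, output):
        h = output[0]
        h[:, -1, :] = h[:, -1, :] + shift
        return (h,) + output[1:]
    handle = model.transformer.h[shift_layer - 1].register_forward_hook(hook)
    try:
        return last_hidden_at_layer(prompt, layer)
    finally:
        handle.remove()

agg = torch.zeros_like(h_zero)
for epoch in range(3):  # number of "epochs"; assumes k >= 2
    deltas = [last_hidden_with_shift(
                  f"Review: {x}\nSentiment: {y}\n\nReview: {query}\nSentiment:",
                  k, k - 1, agg) - h_zero
              for x, y in examples]
    agg = torch.stack(deltas).mean(dim=0)  # re-aggregate after each epoch
# `agg` is then injected into the zero-shot pass exactly as in the sketch above.
```

With one epoch this reduces to the single-pass procedure above; additional epochs let the aggregated signal from all demonstrations feed back into each 1-shot pass, which is one way to read the claim that multiple epochs foster interaction between inputs.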