**MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models**
Multi-agent interactions between Large Language Models (LLMs) have shown significant improvements in reasoning tasks, but these methods are computationally expensive and do not provide a single, efficient model for inference. To address this, the paper introduces MAGDi, a method for structured distillation of reasoning interactions between multiple LLMs into smaller LMs. MAGDi represents multi-agent interactions as graphs, augments a base student model with a graph encoder, and distills knowledge using three objectives: next-token prediction, a contrastive loss between correct and incorrect reasoning, and a graph-based objective to model the interaction structure. Experiments on seven widely-used benchmarks demonstrate that MAGDi enhances the reasoning capabilities of smaller models, outperforming single-teacher and multi-teacher distillation methods. Additionally, MAGDi reduces the number of tokens predicted at test time by up to 9x while maintaining or improving performance. The paper also analyzes the generalizability, scalability, and diversity of MAGDi, showing that it can be used to train a unified joint multi-task learning model and scales positively with the size and strength of the base student model.
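The abstract names three training objectives but does not spell out how they are combined. The PyTorch sketch below illustrates one plausible way to mix them into a single distillation loss; the specific loss forms (token-level cross-entropy, a margin-based contrastive term, node classification over the interaction graph), the tensor shapes, and the weights `alpha`, `beta`, and `margin` are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def magdi_style_loss(lm_logits, target_ids, pos_scores, neg_scores,
                     node_logits, node_labels,
                     alpha=1.0, beta=1.0, margin=1.0):
    """Sketch of a combined three-objective distillation loss.

    lm_logits:   (batch, seq_len, vocab) student logits on correct reasoning chains
    target_ids:  (batch, seq_len) gold next-token ids (-100 = ignore)
    pos_scores:  (batch,) student scores for correct reasoning
    neg_scores:  (batch,) student scores for incorrect reasoning
    node_logits: (num_nodes, num_classes) graph-encoder outputs for interaction-graph nodes
    node_labels: (num_nodes,) node labels (e.g., correct vs. incorrect responses)
    """
    # 1) Next-token prediction on correct reasoning chains.
    nll = F.cross_entropy(lm_logits.view(-1, lm_logits.size(-1)),
                          target_ids.view(-1), ignore_index=-100)

    # 2) Contrastive objective: rank correct reasoning above incorrect
    #    reasoning by at least `margin` (hinge-style ranking loss).
    contrastive = F.relu(margin - (pos_scores - neg_scores)).mean()

    # 3) Graph-based objective: classify nodes of the multi-agent
    #    interaction graph from the graph encoder's representations.
    graph = F.cross_entropy(node_logits, node_labels)

    return nll + alpha * contrastive + beta * graph

# Minimal usage example with random tensors (shapes are assumptions).
if __name__ == "__main__":
    loss = magdi_style_loss(
        lm_logits=torch.randn(2, 8, 100),
        target_ids=torch.randint(0, 100, (2, 8)),
        pos_scores=torch.randn(2),
        neg_scores=torch.randn(2),
        node_logits=torch.randn(6, 2),
        node_labels=torch.randint(0, 2, (6,)),
    )
    print(loss.item())
```

In this sketch the three terms are simply summed with scalar weights; how MAGDi actually balances its objectives, and what its graph-based loss predicts, is specified in the paper itself rather than in this illustration.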