Finding Transformer Circuits with Edge Pruning

24 Jun 2024 | Adithya Bhaskar, Alexander Wettig, Dan Friedman, Danqi Chen
The paper introduces Edge Pruning, a method for automated circuit discovery in large language models. Circuit discovery aims to explain a model's behavior by identifying sparse computational subgraphs (circuits) that account for its performance on a specific task. Existing methods, such as ACDC and EAP, are limited in efficiency or accuracy, especially for large models. Edge Pruning frames circuit discovery as an optimization problem and uses gradient-based pruning to remove edges between components, rather than neurons or whole components. This allows more precise control over the sparsity of the circuits and improves their faithfulness to the full model's predictions.

Key contributions of the paper include:

1. **Edge Pruning**: An effective and scalable method for automated circuit discovery.
2. **Performance and Faithfulness**: Edge Pruning finds circuits that are more faithful to the full model's behavior and perform better on complex tasks than those found by previous methods.
3. **Scalability**: The method scales well to large datasets and models, making efficient use of additional examples while maintaining high faithfulness.
4. **Ground-Truth Recovery**: Edge Pruning perfectly recovers ground-truth circuits in Tracr-compiled models.
5. **Case Study**: A case study on CodeLlama-13B shows that Edge Pruning can scale to a model over 100× larger than those handled by previous methods, revealing overlapping mechanisms behind instruction-prompting and few-shot learning.

The paper evaluates Edge Pruning on several tasks, including Indirect Object Identification (IOI), Greater Than (GT), Gendered Pronoun (GP), and Tracr-compiled programs, and compares it to ACDC and EAP. It demonstrates that Edge Pruning finds more faithful and better-performing circuits, scales effectively to large datasets, and recovers ground-truth circuits.
The case study on CodeLlama-13B highlights the overlap between circuits for instruction-prompting and few-shot learning, suggesting shared mechanisms in large models.
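
To make the idea of "circuit discovery as gradient-based optimization over edges" concrete, here is a minimal sketch on a toy problem. It is not the authors' implementation. Everything in it is an assumption made for illustration: the one-layer linear "model" in which each edge contributes `weights[e] * x[e]`, the sigmoid relaxation of the mask, the L1-style sparsity penalty, the use of a corrupted input in place of a pruned edge, and the hypothetical names `find_circuit` and `sparsity_weight`. Mean squared error stands in for a faithfulness measure.

```python
import numpy as np

def find_circuit(weights, clean, corrupted, sparsity_weight=0.1, lr=0.5, steps=300):
    """Learn a relaxed mask over the edges of a toy one-layer linear "model".

    Assumed setup (illustrative only): edge e carries weights[e] * x[e] into
    the output. A pruned edge carries the corrupted input instead of the clean one.
    """
    logits = np.zeros(weights.shape[0])   # one mask parameter per edge
    full_out = clean @ weights            # full model output on clean inputs
    diff = clean - corrupted              # what an edge loses when it is pruned

    for _ in range(steps):
        mask = 1.0 / (1.0 + np.exp(-logits))              # relaxed mask in (0, 1)
        mixed = mask * clean + (1.0 - mask) * corrupted   # per-edge interpolation
        err = mixed @ weights - full_out                  # circuit minus full model
        # Gradient of (mean squared error + sparsity penalty) w.r.t. the mask
        grad_mask = 2.0 * (err[:, None] * diff * weights).mean(axis=0) + sparsity_weight
        # Chain rule through the sigmoid, then a plain gradient-descent step
        logits -= lr * grad_mask * mask * (1.0 - mask)

    # Binarize the relaxed mask to obtain a discrete circuit
    return 1.0 / (1.0 + np.exp(-logits)) > 0.5

rng = np.random.default_rng(0)
weights = np.array([2.0, -1.5, 0.0, 0.0, 0.01, 0.0])  # only two edges matter much
clean = rng.normal(size=(512, 6))
corrupted = rng.normal(size=(512, 6))

keep = find_circuit(weights, clean, corrupted)
kept_edges = np.flatnonzero(keep).tolist()

# Faithfulness stand-in: squared gap between the discrete circuit and the full model
circuit_out = np.where(keep, clean, corrupted) @ weights
faithfulness_gap = float(np.mean((circuit_out - clean @ weights) ** 2))
print(kept_edges, faithfulness_gap)
```

In this sketch the sparsity penalty pushes every mask entry toward zero, while the faithfulness term pushes back only on edges whose removal changes the output. Raising `sparsity_weight` trades faithfulness for a smaller circuit, which is the kind of control over sparsity the summary describes.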