Decomposing and Editing Predictions by Modeling Model Computation

17 Apr 2024 | Harshay Shah, Andrew Ilyas, Aleksander Madry
The paper introduces *component modeling*, a task aimed at understanding how machine learning (ML) models transform inputs into predictions by decomposing those predictions in terms of the model's underlying components. The authors focus on a special case, *component attribution*, which estimates the impact of individual components on a given prediction. They propose COAR (component attribution via regression), a scalable algorithm for estimating component attributions that is effective across models, datasets, and modalities. COAR also enables model editing through targeted interventions: fixing model errors, "forgetting" specific classes, boosting subpopulation robustness, localizing backdoor attacks, and improving robustness to typographic attacks. The paper lays out the framework in detail, evaluates COAR's effectiveness, and demonstrates its practical utility in these model editing tasks.
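To make the idea of attribution via regression concrete, here is a minimal sketch rather than the authors' implementation. It assumes one plausible reading of the approach: randomly ablate subsets of components, record the model's output under each ablation, and fit a linear regression from ablation masks to outputs, so that each coefficient serves as that component's estimated attribution. The toy model, the function names (`toy_model_output`, `estimate_attributions`), and the parameters (`keep_prob`, `n_samples`) are all hypothetical and chosen only for illustration.

```python
import numpy as np

def toy_model_output(mask, weights):
    """Hypothetical stand-in for a model: the output is the sum of the
    contributions of whichever components are left active (mask == 1)."""
    return float(mask @ weights)

def estimate_attributions(output_fn, n_components, n_samples=2000,
                          keep_prob=0.9, seed=0):
    """Estimate per-component attributions by regressing outputs on
    random ablation masks (an assumed, simplified procedure)."""
    rng = np.random.default_rng(seed)
    # Each row is one random ablation: 1 keeps a component, 0 ablates it.
    masks = (rng.random((n_samples, n_components)) < keep_prob).astype(float)
    outputs = np.array([output_fn(m) for m in masks])
    # Append a column of ones so the regression also fits an intercept.
    design = np.hstack([masks, np.ones((n_samples, 1))])
    coef, *_ = np.linalg.lstsq(design, outputs, rcond=None)
    return coef[:-1], coef[-1]

# Toy setup: four components with known contributions to the output.
true_weights = np.array([2.0, -1.0, 0.0, 0.5])
attributions, bias = estimate_attributions(
    lambda m: toy_model_output(m, true_weights), n_components=4
)
print(np.round(attributions, 3), round(float(bias), 3))
```

Because the toy model is exactly linear in the mask, the regression recovers the true contributions. A real network is not linear in its components, so in practice the fitted coefficients would only approximate the effect of ablating each one.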