Gradient Surgery for Multi-Task Learning

22 Dec 2020 | Tianhe Yu, Saurabh Kumar, Abhishek Gupta, Sergey Levine, Karol Hausman, Chelsea Finn
Gradient Surgery for Multi-Task Learning introduces PCGrad, a method to reduce gradient interference between tasks in multi-task learning. The paper identifies three key challenges in multi-task optimization: conflicting gradients, high curvature, and large gradient differences. PCGrad addresses these by modifying gradients through "gradient surgery," projecting conflicting gradients onto the normal plane of other gradients. This approach reduces destructive interference, leading to improved data efficiency and performance in multi-task supervised and reinforcement learning tasks. The method is model-agnostic and can be combined with existing multi-task architectures. Theoretical analysis shows that PCGrad improves convergence in convex settings, while empirical results demonstrate significant gains in multi-task learning, including a 30% improvement in reinforcement learning tasks.

The method is applied to various problems, including multi-task CIFAR classification, scene understanding, and goal-conditioned RL. Experiments show that PCGrad outperforms prior methods in data efficiency, optimization speed, and final performance. The paper also highlights the importance of the "tragic triad" of conflicting gradients, high curvature, and large gradient differences in multi-task learning challenges. Overall, PCGrad provides a simple yet effective solution to mitigate gradient interference and enhance multi-task learning performance.
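The projection step described above can be sketched in a few lines of NumPy. This is a minimal illustration of the idea, not the authors' implementation: two task gradients are considered "conflicting" when their dot product is negative, and the conflicting component is projected away onto the other gradient's normal plane. The function name `pcgrad_combine` and the flat-vector representation are assumptions for illustration.

```python
import numpy as np

def pcgrad_combine(grads, rng=None):
    """Combine per-task gradient vectors with PCGrad-style surgery.

    For each task gradient, iterate over the other tasks in random order;
    whenever two gradients conflict (negative dot product), subtract the
    projection of the current gradient onto the conflicting one, i.e.
    project it onto that gradient's normal plane. Returns the sum of the
    modified gradients, which would be used as the update direction.
    """
    rng = np.random.default_rng(rng)
    projected = []
    for i, g in enumerate(grads):
        g = np.asarray(grads[i], dtype=float).copy()
        others = [j for j in range(len(grads)) if j != i]
        rng.shuffle(others)  # random task order, as in the paper's algorithm
        for j in others:
            gj = np.asarray(grads[j], dtype=float)
            dot = g @ gj
            if dot < 0:  # conflicting gradients: remove the opposing component
                g -= (dot / (gj @ gj)) * gj
        projected.append(g)
    return np.sum(projected, axis=0)
```

For example, with two conflicting gradients `[1, 0]` and `[-1, 1]`, plain summation would cancel the first coordinate entirely, whereas the surgically modified gradients each lose only their component opposing the other task.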