5 Apr 2019 | Shikun Liu, Edward Johns, Andrew J. Davison
This paper proposes a novel multi-task learning architecture called the Multi-Task Attention Network (MTAN), which enables the learning of task-specific feature-level attention. MTAN consists of a single shared network with a global feature pool and a soft-attention module for each task. These modules allow for the learning of task-specific features from the global features while enabling feature sharing across different tasks. The architecture is simple to implement, parameter-efficient, and can be trained end-to-end. It is evaluated on various datasets, including image-to-image prediction and image classification tasks, showing state-of-the-art performance in multi-task learning and being less sensitive to weighting schemes in the multi-task loss function.
MTAN is designed to address two key challenges in multi-task learning: network architecture (how to share features) and loss function (how to balance tasks). The architecture allows for the automatic learning of both task-shared and task-specific features, and is robust to different loss weighting schemes. The proposed attention modules can be built on top of any feed-forward neural network, and the method is tested on the Cityscapes and NYUv2 datasets for semantic segmentation, depth estimation, and surface normal prediction. It is also tested on the Visual Decathlon Challenge for image classification tasks, where it outperforms several baselines and is competitive with the state-of-the-art.
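The core mechanism can be sketched as follows: each task owns a small attention module that predicts a soft mask over the shared network's features, and the masked features feed that task's head. The sketch below is a simplified stand-in using NumPy, where a single learned channel-mixing matrix `w_mask` plays the role of the paper's attention module (the actual architecture uses small convolutional blocks); only the gating idea is illustrated.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def task_attention(shared_feat, w_mask):
    """Apply a task-specific soft attention mask to shared features.

    shared_feat: (C, H, W) features from the shared "global feature pool".
    w_mask: (C, C) hypothetical 1x1-conv weights standing in for the
            task's learned attention module.
    Returns features of the same shape, gated element-wise by a mask
    in (0, 1), so each task selects its own subset of shared features.
    """
    C, H, W = shared_feat.shape
    flat = shared_feat.reshape(C, -1)      # (C, H*W)
    mask = sigmoid(w_mask @ flat)          # soft mask, values in (0, 1)
    attended = mask * flat                 # element-wise feature gating
    return attended.reshape(C, H, W)
```

Because the mask values lie strictly between 0 and 1, each task's attended features are a softly selected version of the shared features, which is what lets MTAN share computation while still specializing per task.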
The MTAN architecture includes a shared network and task-specific attention modules that apply soft attention masks to learn task-specific features. The attention masks are automatically learned in an end-to-end manner, allowing for the sharing of features while maintaining task-specific performance. The method is robust to different weighting schemes in the loss function and introduces a novel weighting scheme called Dynamic Weight Average (DWA) that adapts task weighting over time based on the rate of change of the loss for each task.
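The DWA scheme described above can be written down compactly: each task's weight is a softmax over the recent rate of change of its loss, scaled so the weights sum to the number of tasks. Below is a small sketch in that spirit; the temperature `T` and the initial equal-weight behavior for the first epochs follow the paper's description, but treat the exact function signature as an assumption.

```python
import math

def dwa_weights(loss_history, T=2.0):
    """Dynamic Weight Average: weight tasks by the rate of change of
    their losses (a sketch of the scheme described in the paper).

    loss_history: per-task lists of average losses over past epochs,
                  e.g. loss_history[k][-1] is task k's latest loss.
    T: softmax temperature (the paper reports using T = 2).
    Returns one weight per task; the weights sum to K (number of tasks).
    """
    K = len(loss_history)
    # No rate of change is available for the first two epochs,
    # so start with equal weighting.
    if any(len(h) < 2 for h in loss_history):
        return [1.0] * K
    # r_k = L_k(t-1) / L_k(t-2): larger means the loss shrinks slower,
    # so that task receives more weight.
    rates = [h[-1] / h[-2] for h in loss_history]
    exps = [math.exp(r / T) for r in rates]
    s = sum(exps)
    return [K * e / s for e in exps]
```

For example, a task whose loss dropped from 1.0 to only 0.9 gets a larger weight than one whose loss dropped from 1.0 to 0.5, nudging the optimizer toward the task that is learning more slowly.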
Experiments show that MTAN outperforms other multi-task learning methods in terms of performance and parameter efficiency. It is also more robust to the choice of weighting scheme in the loss function. The method is evaluated on various tasks, including image-to-image prediction and image classification, and shows significant improvements in performance compared to single-task and multi-task baselines. The results demonstrate that MTAN is a promising approach for multi-task learning, offering a balance between feature sharing and task-specific learning.