This paper introduces cross-stitch units for multi-task learning in Convolutional Neural Networks (ConvNets). The cross-stitch unit is a novel sharing mechanism that linearly combines activations from multiple task networks, letting the model learn an optimal blend of shared and task-specific representations. The units are trained end-to-end and adapt the degree of sharing to the tasks at hand. The method is shown to generalize across multiple tasks and to significantly improve performance, especially for categories with few training examples.
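For two tasks A and B, the combination the paper describes can be written compactly as follows (reconstructed here in the paper's notation, where the α values are the learned sharing weights and (i, j) indexes spatial locations in the activation maps):

```latex
\begin{bmatrix} \tilde{x}_{A}^{ij} \\ \tilde{x}_{B}^{ij} \end{bmatrix}
=
\begin{bmatrix} \alpha_{AA} & \alpha_{AB} \\ \alpha_{BA} & \alpha_{BB} \end{bmatrix}
\begin{bmatrix} x_{A}^{ij} \\ x_{B}^{ij} \end{bmatrix}
```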
The paper discusses the challenges of multi-task learning in ConvNets, where existing approaches often require hand-designing a separate sharing architecture for each combination of tasks, which is inefficient and may not generalize to new task pairs. The cross-stitch unit addresses this by providing a principled way to combine representations from different tasks, enabling a single network to learn the optimal combination of shared and task-specific features.
The paper presents experiments on several task pairs: semantic segmentation and surface normal prediction on the NYU-v2 dataset, and object detection and attribute prediction on the PASCAL VOC 2008 dataset. Results show that cross-stitch networks outperform the baselines, with the largest gains on data-starved categories, i.e., classes and attributes with few training examples.
The cross-stitch unit is implemented as a learned linear combination of activation maps from the task networks: at a given layer, each task's output activation is a weighted sum of both tasks' input activations. This allows the network to share representations where sharing helps while still maintaining task-specific features. The combination weights are trained end-to-end alongside the network parameters, so the balance between shared and task-specific features is learned rather than hand-designed.
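A minimal sketch of such a unit in PyTorch, assuming two parallel streams (one per task); the class name, initial α values, and parameter layout are illustrative, not the paper's reference implementation:

```python
import torch
import torch.nn as nn

class CrossStitchUnit(nn.Module):
    """Sketch of a two-task cross-stitch unit: each output activation is a
    learned linear combination of both tasks' input activations."""

    def __init__(self, alpha_same=0.9, alpha_other=0.1):
        super().__init__()
        # Initialise near the identity (mostly task-specific); training then
        # moves the alphas toward whatever amount of sharing helps the tasks.
        self.alpha = nn.Parameter(torch.tensor([[alpha_same, alpha_other],
                                                [alpha_other, alpha_same]]))

    def forward(self, x_a, x_b):
        # Learned linear combination applied at every location and channel.
        out_a = self.alpha[0, 0] * x_a + self.alpha[0, 1] * x_b
        out_b = self.alpha[1, 0] * x_a + self.alpha[1, 1] * x_b
        return out_a, out_b
```

In the paper, units of this kind are inserted at a few points along two parallel streams (e.g., after pooling and fully connected layers), so the amount of sharing can differ from layer to layer.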
The paper also discusses design decisions for cross-stitching, including how the α combination weights are initialized, the learning rate used for the cross-stitch units, and how the two task networks themselves are initialized. Ablation studies evaluate these configurations and show that cross-stitch units provide consistent improvements across tasks and data regimes, making the approach flexible and effective in a wide range of multi-task learning scenarios.
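One of the ablated choices is letting the cross-stitch parameters train faster than the base ConvNet weights. A minimal sketch of how that could be wired up, assuming the α tensors are the parameters whose names contain "alpha" (as in the module above); the learning-rate scale is a placeholder, not the paper's setting:

```python
import torch

def build_optimizer(model, base_lr=1e-3, stitch_lr_scale=100.0):
    # Separate parameter groups: base ConvNet weights vs. cross-stitch alphas,
    # so the alphas can use a larger learning rate (placeholder scale factor).
    stitch_params, conv_params = [], []
    for name, p in model.named_parameters():
        (stitch_params if "alpha" in name else conv_params).append(p)
    return torch.optim.SGD(
        [{"params": conv_params, "lr": base_lr},
         {"params": stitch_params, "lr": base_lr * stitch_lr_scale}],
        momentum=0.9)
```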