19 Nov 2015 | Wei Liu, Andrew Rabinovich, Alexander C. Berg
ParseNet is a fully convolutional neural network (FCNN) for semantic segmentation that adds global context to the features used for per-pixel prediction. Its key contribution is using image-level context to resolve local ambiguities, which improves the accuracy of FCNNs. The approach pools a layer's feature map over the entire image to produce a context vector, which is then appended to the features at every spatial location before they are passed to the next layer. This improves accuracy over baseline FCNNs and achieves state-of-the-art results on the SiftFlow and PASCAL-Context datasets with minimal computational overhead. The paper also discusses why normalization and learned weights matter when combining features from multiple layers, and demonstrates the effectiveness of both early and late fusion. ParseNet's simplicity and robustness make it a promising approach for semantic segmentation tasks.
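The pooling-and-append step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `l2_normalize` and `add_global_context`, the `(C, H, W)` layout, the use of average pooling, and the choice of per-position L2 normalization with a scale parameter (standing in for the learned weights the summary mentions) are all assumptions made for the example.

```python
import numpy as np


def l2_normalize(features, scale=1.0, eps=1e-12):
    """L2-normalize the channel vector at each spatial position, then rescale.

    features: array of shape (C, H, W).
    scale: scalar or per-channel array of shape (C,). In a trained network
           this would be a learned parameter; here it is simply an input.
    """
    norm = np.sqrt(np.sum(features ** 2, axis=0, keepdims=True)) + eps
    scale = np.asarray(scale, dtype=float).reshape(-1, 1, 1)
    return scale * features / norm


def add_global_context(features, local_scale=1.0, context_scale=1.0):
    """Append a globally pooled context vector to every spatial position.

    Hypothetical helper: returns an array of shape (2C, H, W) whose first C
    channels are the normalized local features and whose last C channels are
    the normalized global context, repeated at every position.
    """
    c, h, w = features.shape
    # Global average pooling: one value per channel over the whole image.
    context = features.mean(axis=(1, 2), keepdims=True)  # shape (C, 1, 1)
    # "Unpool" by repeating the context vector at every spatial position.
    context_map = np.broadcast_to(context, (c, h, w))
    # Normalize both branches so neither dominates after concatenation.
    local = l2_normalize(features, local_scale)
    global_ctx = l2_normalize(context_map, context_scale)
    return np.concatenate([local, global_ctx], axis=0)


# Usage: a random 4-channel feature map of spatial size 3x5.
feature_map = np.random.default_rng(0).normal(size=(4, 3, 5))
combined = add_global_context(feature_map)
```

Normalizing each branch before concatenation reflects the summary's point that features from different sources can differ in magnitude. Without it, the branch with larger values would tend to dominate the layer that consumes the combined features.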