APC: Adaptive Patch Contrast for Weakly Supervised Semantic Segmentation

15 Jul 2024 | Wangyu Wu, Tianhong Dai, Zhenhong Chen, Xiaowei Huang, Fei Ma, Jimin Xiao
This paper proposes Adaptive Patch Contrast (APC), a ViT-based method for weakly supervised semantic segmentation (WSSS) that improves patch embedding learning and, in turn, segmentation quality. APC makes three contributions. First, an Adaptive-K Pooling (AKP) layer replaces the max pooling selection used by previous methods: it chooses K based on the ratio of prediction scores, which removes the dependence on a single patch. Second, Patch Contrastive Learning (PCL) enhances the intra-class compactness and inter-class separability of patch embeddings, yielding more accurate patch predictions. Third, APC turns the existing multi-stage training framework into an end-to-end single-stage one, improving training efficiency. As a result, APC is computationally more efficient and outperforms other state-of-the-art WSSS methods on the PASCAL VOC 2012 and MS COCO 2014 datasets with a shorter training time.

APC consists of three main components: the adaptive K pooling module, the image patch contrastive module, and the end-to-end encoder-decoder module. The adaptive K pooling module selects the patches used for the final prediction by choosing K according to the ratio of prediction scores, so that a single, possibly wrong, patch can no longer dominate the prediction. The image patch contrastive module increases the cosine similarity between patch embeddings of the same category and decreases it between patch embeddings of different categories. The end-to-end encoder-decoder module uses a decoder head to merge multi-level feature maps for prediction.
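The adaptive K pooling and patch contrastive ideas described above can be illustrated with the minimal sketch below. It is not the authors' implementation: the K-selection rule (keep patches scoring at least `ratio` times the top score, with a floor of `k_min`), the parameter values, and the simple pairwise cosine loss with a `margin` are all hypothetical stand-ins chosen only to show the mechanics. The patch labels fed to the loss are assumed to come from the model's own patch predictions.

```python
import math
from typing import List, Sequence, Tuple


def adaptive_k_pool(
    scores: Sequence[float], ratio: float = 0.5, k_min: int = 2
) -> Tuple[float, int]:
    """Pool one class's per-patch scores into an image-level score.

    Hypothetical rule: K is the number of patches scoring at least
    `ratio` times the top score, but never fewer than `k_min`, so a
    single high-scoring patch cannot decide the class on its own.
    Scores are assumed to be probabilities in [0, 1].
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    ranked = sorted(scores, reverse=True)
    threshold = ratio * ranked[0]
    k = sum(1 for s in ranked if s >= threshold)
    k = min(len(ranked), max(k_min, k))  # enforce the lower bound on K
    return sum(ranked[:k]) / k, k


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        raise ValueError("zero-length embedding")
    return dot / (nu * nv)


def patch_contrast_loss(
    embeddings: List[Sequence[float]], labels: List[int], margin: float = 0.0
) -> float:
    """Hypothetical pairwise cosine contrast over patch embeddings.

    Same-class pairs are pulled together (loss 1 - cos); different-class
    pairs are pushed apart until their cosine drops below `margin`.
    """
    if len(embeddings) != len(labels):
        raise ValueError("one label per embedding is required")
    total, pairs = 0.0, 0
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            sim = cosine(embeddings[i], embeddings[j])
            if labels[i] == labels[j]:
                total += 1.0 - sim  # intra-class compactness
            else:
                total += max(0.0, sim - margin)  # inter-class separability
            pairs += 1
    return total / pairs if pairs else 0.0
```

With `ratio=0.5` and `k_min=2`, the scores `[0.95, 0.1, 0.1, 0.0]` pool to 0.525 instead of the max-pooled 0.95, which shows how an isolated peak is diluted. The contrast loss is zero when same-class patches point in the same direction and different-class patches are orthogonal.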
APC is evaluated on the PASCAL VOC 2012 and MS COCO 2014 datasets. It achieves a mean Intersection-over-Union (mIoU) of 74.6% on PASCAL VOC 2012, outperforming other state-of-the-art methods in both the single-stage and multi-stage categories. It also reduces training time from over 8 hours to approximately 1.5 hours. APC is therefore both more accurate and more efficient than competing WSSS methods on these benchmarks.
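The mIoU figure quoted above is the standard segmentation metric: for each class, the intersection of predicted and ground-truth pixels divided by their union, averaged over classes. The sketch below shows that standard definition on flattened label lists; it is not the authors' evaluation script. Treating label 255 as "ignore" follows the usual PASCAL VOC convention for boundary pixels, and in practice the counts are accumulated over the whole dataset before averaging.

```python
from typing import List, Sequence


def mean_iou(
    pred: Sequence[int],
    target: Sequence[int],
    num_classes: int,
    ignore_index: int = 255,
) -> float:
    """Mean IoU over classes present in the prediction or ground truth."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    inter: List[int] = [0] * num_classes
    union: List[int] = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not count
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            # A mislabeled pixel lies in the union of both classes.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0
```

For example, `mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)` gives class IoUs of 1/2 and 2/3, so the mean is 7/12.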