20 Jan 2024 | Tao Chen, Yazhou Yao, Xingguo Huang, Zechao Li, Liqiang Nie and Jinhui Tang
The paper addresses weakly supervised semantic segmentation (WSSS), in which image-level labels, rather than pixel-level annotations, supervise the segmentation model. The authors propose Spatial Structure Constraints (SSC) to mitigate object over-activation, which occurs when attention-expansion techniques push the activation area beyond object boundaries. The main contributions of the paper are:
1. **CAM-Driven Reconstruction Module**: This module reconstructs the input image from class activation maps (CAMs) and is trained with a perceptual loss, so that the reconstructed image preserves the coarse spatial structure of the original image content. This helps constrain the activation to the object area.
2. **Activation Self-Modulation Module**: This module refines CAMs by enhancing regional consistency, which helps maintain high activation in discriminative regions while suppressing over-activation.
3. **Training Objective**: The overall training loss includes the perceptual loss and an alignment loss to refine the CAMs.
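The training objective in item 3 can be written down roughly as follows. This is only a sketch under stated assumptions: the symbols are chosen here for illustration and are not taken from the paper. The classification term $\mathcal{L}_{\text{cls}}$ is assumed because image-level labels are the only supervision, the weights $\lambda_1, \lambda_2$ are hypothetical, and the perceptual loss is shown in its common feature-space form, where $\phi_l$ denotes layer $l$ of a fixed pretrained network with $N_l$ elements, $I$ is the input image, and $\hat{I}$ is its reconstruction from the CAMs. The paper's exact formulation may differ.

```latex
% Assumed overall objective: classification + perceptual + alignment terms.
\mathcal{L} = \mathcal{L}_{\text{cls}}
            + \lambda_{1}\,\mathcal{L}_{\text{per}}
            + \lambda_{2}\,\mathcal{L}_{\text{align}}

% Assumed (commonly used) form of the perceptual loss.
\mathcal{L}_{\text{per}} = \sum_{l} \frac{1}{N_l}
    \bigl\lVert \phi_l(\hat{I}) - \phi_l(I) \bigr\rVert_2^2
```

Under this reading, the perceptual term penalizes reconstructions whose deep features drift from those of the input, which is what ties the CAMs to the image's coarse spatial structure.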
The proposed approach is evaluated on the PASCAL VOC 2012 and COCO datasets, achieving 72.7% and 47.0% mIoU, respectively, without using external saliency models. The paper also includes ablation studies and discusses the limitations and failure cases of the method.
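For readers unfamiliar with the metric, mean intersection-over-union (mIoU) can be sketched as below. This is a minimal illustration, not the benchmarks' evaluation code: the function name `mean_iou`, its parameters, and the ignore label `255` are assumptions made here, and the benchmarks accumulate counts over the whole validation set rather than over a single image.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes that occur in pred or target.

    pred and target are 2-D lists of integer class labels with the same shape.
    Pixels whose target equals ignore_index (an assumed convention) are skipped.
    """
    inter = [0] * num_classes  # per-class intersection counts
    union = [0] * num_classes  # per-class union counts
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if t == ignore_index:
                continue
            if p == t:
                # Correct pixel: counts once in both intersection and union.
                inter[p] += 1
                union[p] += 1
            else:
                # Wrong pixel: enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    # Average only over classes that actually appear.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    prediction = [[0, 0], [1, 1]]
    ground_truth = [[0, 1], [1, 1]]
    print(mean_iou(prediction, ground_truth, num_classes=2))
```

Because every misclassified pixel enlarges the union of two classes, activation that spills past the object boundary lowers the IoU of both the object class and its neighbor.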