BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation

2 Aug 2018 | Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, and Nong Sang
BiSeNet is a bilateral segmentation network designed for real-time semantic segmentation. It addresses the challenge of balancing spatial information preservation against a large receptive field. The network consists of two complementary paths: the Spatial Path (SP), which preserves spatial detail by using small strides, and the Context Path (CP), which obtains a large receptive field through a fast downsampling strategy. An Attention Refinement Module (ARM) refines the features of the Context Path, and a Feature Fusion Module (FFM) combines the features of the two paths. BiSeNet achieves a mean IoU of 68.4% on the Cityscapes test set at 105 FPS on a single NVIDIA Titan XP card, outperforming existing real-time methods in both speed and accuracy. The network is also evaluated on the CamVid and COCO-Stuff datasets, demonstrating its effectiveness for real-time semantic segmentation. The architecture allows efficient computation with high accuracy, making it well suited to real-time applications.
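To make the two-path design concrete, below is a minimal PyTorch sketch, not the authors' implementation: a small convolutional stack stands in for the lightweight backbone (ResNet-18/Xception in the paper), and class names such as BiSeNetSketch, channel widths, and the exact way the global-pooling context is combined are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvBNReLU(nn.Sequential):
    def __init__(self, in_ch, out_ch, k=3, stride=1):
        super().__init__(
            nn.Conv2d(in_ch, out_ch, k, stride, padding=k // 2, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )


class SpatialPath(nn.Module):
    """Three stride-2 convolutions -> 1/8-resolution features with rich spatial detail."""
    def __init__(self, out_ch=128):
        super().__init__()
        self.net = nn.Sequential(
            ConvBNReLU(3, 64, stride=2),
            ConvBNReLU(64, 64, stride=2),
            ConvBNReLU(64, out_ch, stride=2),
        )

    def forward(self, x):
        return self.net(x)


class AttentionRefinementModule(nn.Module):
    """Refine context features with channel attention computed from global average pooling."""
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(ch)

    def forward(self, x):
        w = torch.sigmoid(self.bn(self.conv(F.adaptive_avg_pool2d(x, 1))))
        return x * w


class FeatureFusionModule(nn.Module):
    """Concatenate spatial and context features, then reweight them with channel attention."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.fuse = ConvBNReLU(in_ch, out_ch, k=1)
        self.attn = nn.Sequential(
            nn.Conv2d(out_ch, out_ch // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch // 4, out_ch, 1), nn.Sigmoid(),
        )

    def forward(self, sp, cp):
        x = self.fuse(torch.cat([sp, cp], dim=1))
        return x + x * self.attn(F.adaptive_avg_pool2d(x, 1))


class BiSeNetSketch(nn.Module):
    """Toy model: a plain conv stack plays the role of the lightweight backbone in the Context Path."""
    def __init__(self, num_classes=19):
        super().__init__()
        self.spatial = SpatialPath(128)
        self.context16 = nn.Sequential(                  # downsample to 1/16
            ConvBNReLU(3, 64, stride=2), ConvBNReLU(64, 64, stride=2),
            ConvBNReLU(64, 128, stride=2), ConvBNReLU(128, 128, stride=2))
        self.context32 = ConvBNReLU(128, 128, stride=2)  # downsample to 1/32
        self.arm16 = AttentionRefinementModule(128)
        self.arm32 = AttentionRefinementModule(128)
        self.ffm = FeatureFusionModule(128 + 128, 256)
        self.head = nn.Conv2d(256, num_classes, 1)

    def forward(self, x):
        sp = self.spatial(x)                             # 1/8, spatial detail
        c16 = self.context16(x)                          # 1/16
        c32 = self.context32(c16)                        # 1/32
        c32 = self.arm32(c32) + F.adaptive_avg_pool2d(c32, 1)  # add global context
        cp = F.interpolate(c32, size=c16.shape[2:], mode="bilinear", align_corners=False)
        cp = cp + self.arm16(c16)
        cp = F.interpolate(cp, size=sp.shape[2:], mode="bilinear", align_corners=False)
        out = self.head(self.ffm(sp, cp))
        return F.interpolate(out, size=x.shape[2:], mode="bilinear", align_corners=False)
```

For example, `BiSeNetSketch(num_classes=19)(torch.randn(1, 3, 512, 1024))` returns per-pixel class logits at the input resolution; the key design point is that the shallow, wide Spatial Path and the deep, heavily downsampled Context Path are computed in parallel and only merged once by the FFM.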