S³M-Net: Joint Learning of Semantic Segmentation and Stereo Matching for Autonomous Driving

Zhiyuan Wu, Yi Feng, Chuang-Wei Liu, Fisher Yu, Qijun Chen, Rui Fan
This paper introduces S³M-Net, a joint learning framework that performs semantic segmentation and stereo matching simultaneously for autonomous driving. Both tasks share the features extracted from RGB images, which improves overall scene understanding. The main contributions are the joint learning framework itself, a feature fusion adaptation (FFA) module, and a semantic consistency-guided (SCG) loss function.

The framework consists of a joint encoder, a multi-level GRU update operator, the FFA module, a decoder with densely connected skip connections, and the SCG loss. The joint encoder extracts shared features from the RGB images, and the GRU operator refines the disparity maps. The FFA module transforms the shared features into semantic space, aligns channels and resolutions between the disparity and semantic feature maps, and fuses the result with the encoded disparity features. The decoder then generates the semantic predictions. The SCG loss supervises the joint learning process and emphasizes structural consistency between the two tasks.

S³M-Net is trained end to end in a fully supervised manner and can jointly learn both tasks even with limited training data. On the vKITTI2 and KITTI datasets, it outperforms other state-of-the-art single-task networks in both semantic segmentation and stereo matching.
The framework is also efficient and can be deployed in autonomous vehicles.
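
To make the FFA step more concrete, the sketch below illustrates one way that channels and resolutions of two feature maps could be aligned before fusion. It is not the authors' implementation: the summary does not say how the alignment or the fusion is performed, so the 1x1 channel projection, the nearest-neighbour resize, and the element-wise addition are all assumptions, and the function names (align_channels, resize_nearest, fuse) are hypothetical.

```python
import numpy as np


def align_channels(feat, weight):
    """Project a (C_in, H, W) map to (C_out, H, W) with a (C_out, C_in) matrix.

    This is equivalent to a 1x1 convolution without bias (an assumed choice).
    """
    return np.einsum("oc,chw->ohw", weight, feat)


def resize_nearest(feat, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) map to (C, out_h, out_w)."""
    _, h, w = feat.shape
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return feat[:, rows[:, None], cols[None, :]]


def fuse(shared_feat, disp_feat, w_shared, w_disp):
    """Bring both maps to a common channel count and resolution, then add them.

    Element-wise addition is an assumption made for illustration only.
    """
    semantic = align_channels(shared_feat, w_shared)
    _, h, w = semantic.shape
    disp = resize_nearest(align_channels(disp_feat, w_disp), h, w)
    return semantic + disp


# Hypothetical shapes: 64-channel shared features, 32-channel disparity features
rng = np.random.default_rng(0)
shared = rng.standard_normal((64, 8, 16))
disparity = rng.standard_normal((32, 4, 8))
fused = fuse(
    shared,
    disparity,
    rng.standard_normal((48, 64)),
    rng.standard_normal((48, 32)),
)
print(fused.shape)  # (48, 8, 16)
```

In a trained network the projection matrices would be learned parameters; random values are used here only so the shapes can be checked.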