This paper proposes PSMNet, a pyramid stereo matching network for depth estimation from a pair of stereo images. PSMNet consists of two main modules: spatial pyramid pooling (SPP) and a 3D CNN. The SPP module aggregates context at different scales and locations into multi-scale features, which are used to form a cost volume. The 3D CNN then learns to regularize the cost volume using multiple stacked hourglass networks with intermediate supervision. The network is trained end-to-end without post-processing, and its use of global context improves disparity estimation in ill-posed regions such as occlusions, repeated patterns, and textureless areas. PSMNet was implemented in PyTorch and trained on the Scene Flow, KITTI 2012, and KITTI 2015 datasets. In quantitative and qualitative evaluations on benchmark datasets, it achieved state-of-the-art accuracy on KITTI and ranked first on the KITTI 2012 and 2015 leaderboards before March 18, 2018.
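
As a rough illustration of how multi-scale context and a cost volume fit together, the sketch below uses NumPy rather than the PyTorch implementation mentioned above. The function names `pyramid_pool` and `build_cost_volume`, the pool sizes, the nearest-neighbour upsampling, and the concatenation-based layout of the volume are all assumptions made for illustration, not details stated in this summary. The 3D CNN that regularizes the volume is omitted.

```python
import numpy as np


def pyramid_pool(feat, pool_sizes=(4, 2)):
    """Append average-pooled context at several scales to a (C, H, W) feature map.

    H and W must be divisible by every pool size. Pooled maps are brought back
    to (H, W) by nearest-neighbour repetition, which is a simplification.
    """
    c, h, w = feat.shape
    branches = [feat]
    for k in pool_sizes:
        # Average over non-overlapping k x k windows.
        pooled = feat.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))
        # Repeat each pooled value k times along both axes to restore (H, W).
        branches.append(pooled.repeat(k, axis=1).repeat(k, axis=2))
    return np.concatenate(branches, axis=0)


def build_cost_volume(left, right, max_disp):
    """Build a (2C, max_disp, H, W) volume from (C, H, W) left/right features.

    At disparity d, left pixel x is paired with right pixel x - d. Positions
    with no valid match (x < d) stay zero.
    """
    c, h, w = left.shape
    volume = np.zeros((2 * c, max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        volume[:c, d, :, d:] = left[:, :, d:]
        volume[c:, d, :, d:] = right[:, :, : w - d]
    return volume


# Hypothetical usage with random features standing in for CNN outputs.
rng = np.random.default_rng(0)
left = pyramid_pool(rng.standard_normal((8, 16, 32)).astype(np.float32))
right = pyramid_pool(rng.standard_normal((8, 16, 32)).astype(np.float32))
volume = build_cost_volume(left, right, max_disp=4)
print(volume.shape)  # (48, 4, 16, 32)
```

Each pooling branch adds a copy of the feature channels summarised over a larger window, so every pixel carries both local and wider context before matching. The extra disparity axis of the resulting 4D volume is what a 3D CNN would then convolve over.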