21 Mar 2024 | Tongfan Guan, Chen Wang, Yun-Hui Liu
This paper proposes a Neural Markov Random Field (NMRF) model for stereo matching that addresses the limitations of both traditional hand-crafted Markov Random Field (MRF) models and deep learning approaches. Rather than relying on hand-crafted designs, the NMRF model uses data-driven neural networks to learn both the potential functions and the message passing, enabling more accurate disparity estimation. The model is built on variational inference theory to ensure convergence and to retain the graph inductive bias of MRFs. To keep inference tractable and efficient, a Disparity Proposal Network (DPN) prunes the disparity search space.
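The idea of pruning the disparity search space can be illustrated with a minimal sketch. Note that the DPN described above is a learned network; the plain top-k selection below is only a stand-in for it, and the function name `prune_disparities`, the per-pixel `scores` list, and the parameter `k` are all hypothetical names introduced for illustration.

```python
from typing import List, Tuple


def prune_disparities(scores: List[float], k: int) -> List[Tuple[int, float]]:
    """Keep the k best-scoring disparity candidates for a single pixel.

    scores[d] is a hypothetical matching score for disparity d (higher is
    better). This top-k rule is an assumption standing in for a learned
    proposal network; it is not the paper's DPN.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Rank disparity indices from best to worst score.
    ranked = sorted(range(len(scores)), key=lambda d: scores[d], reverse=True)
    # Keep the top k and return them in increasing disparity order.
    kept = sorted(ranked[:k])
    return [(d, scores[d]) for d in kept]


scores = [0.1, 0.7, 0.2, 0.9, 0.05, 0.3]
candidates = prune_disparities(scores, k=2)
print(candidates)  # [(1, 0.7), (3, 0.9)]
```

With only `k` candidates per pixel instead of the full disparity range, any subsequent inference over the graph has far fewer labels to consider, which is the motivation for pruning given in the summary.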
The proposed approach achieves state-of-the-art performance on the KITTI 2012 and 2015 leaderboards, reducing the D1 error by more than 50% relative to prior global methods. The model also generalizes well across domains and recovers sharp edges. It is fully data-driven, with no hand-crafted components, and efficient, with a runtime under 100 ms. Evaluations on SceneFlow, KITTI, Middlebury, and ETH3D show strong performance across these scenarios. The results indicate that NMRF is a promising approach for stereo matching, with potential applications in computer vision and robotics.
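The summary does not define the D1 metric. As a rough sketch, assuming the commonly used KITTI 2015 convention (a pixel counts as an outlier when its disparity error exceeds both 3 px and 5% of the ground-truth disparity), it could be computed as follows. The function name `d1_error` and the convention that a non-positive ground-truth value marks a pixel without ground truth are assumptions made for this example.

```python
from typing import Sequence


def d1_error(pred: Sequence[float], gt: Sequence[float],
             abs_thresh: float = 3.0, rel_thresh: float = 0.05) -> float:
    """Percentage of valid pixels counted as disparity outliers.

    A pixel is an outlier when its absolute error exceeds abs_thresh pixels
    AND exceeds rel_thresh times the ground-truth disparity. Pixels with
    gt <= 0 are skipped as having no ground truth (an assumption here).
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    valid = 0
    outliers = 0
    for p, g in zip(pred, gt):
        if g <= 0:
            continue  # no ground truth for this pixel
        valid += 1
        err = abs(p - g)
        if err > abs_thresh and err > rel_thresh * g:
            outliers += 1
    if valid == 0:
        raise ValueError("no valid ground-truth pixels")
    return 100.0 * outliers / valid


# Flattened toy disparities: the last pixel has no ground truth.
pred = [10.0, 54.0, 100.0, 20.0]
gt = [10.5, 50.0, 104.0, 0.0]
print(f"D1 = {d1_error(pred, gt):.2f}%")  # D1 = 33.33%
```

In this toy input, the second pixel is an outlier (error 4 px, above both 3 px and 5% of 50), while the third is not (error 4 px, but below 5% of 104), so one of three valid pixels is counted.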