28 May 2024 | Linh Van Ma, Tran Thien Dat Nguyen, Ba-Ngu Vo, Hyunsung Jang, Moongu Jeon
This paper proposes a 3D multi-object tracking (MOT) solution that uses only 2D detections from monocular cameras, automatically initiates and terminates tracks, and resolves track appearance-reappearance and occlusions. The approach does not require detector retraining when cameras are reconfigured; only the camera matrices need to be updated. The solution is based on a Bayesian multi-object formulation that integrates track initiation/termination, re-identification, occlusion handling, and data association into a single Bayes filtering recursion. However, the exact filter is numerically intractable because the number of terms in the filtering density grows exponentially. To address this, an efficient approximation is developed that incorporates object features and kinematics into the measurement model, which improves data association and reduces the number of terms. The approach exploits 2D detections and extracted features from multiple cameras to better approximate the multi-object filtering density, enabling track initiation/termination and re-identification. A tractable geometric occlusion model, based on the 2D projections of 3D objects onto the camera planes, is also incorporated to handle occlusions. Evaluation on challenging datasets shows significant improvements over existing multi-view MOT solutions, as well as robustness when camera configurations change on-the-fly. The source code is publicly available at https://github.com/liinh-gist/mv-glmb-ab.

The paper introduces a novel multi-object dynamic and measurement model that jointly accounts for object kinematics, shapes, visual features on different cameras, and occlusion. It also proposes an approximation of the MV-MOT filter that automatically performs 3D track initiation/termination, re-identification, and occlusion handling using 2D multi-view monocular detections, with complexity linear in the number of detections.
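The abstract states that reconfiguring cameras only requires updating the camera matrices, and that occlusion is handled through the 2D projections of 3D objects onto the camera planes. The following is a minimal sketch of both ideas under simplifying assumptions: objects are modelled as axis-aligned 3D boxes, each camera is a 3×4 projection matrix `P`, and an object's detection probability in a camera is scaled down by the largest fraction of its projected box covered by the box of a closer object. The function names and the `p_d_max` parameter are hypothetical, and this overlap heuristic is a simplified stand-in, not the paper's occlusion model.

```python
from itertools import product

# Hypothetical sketch (not the paper's model): occlusion-aware detection
# probabilities from 2D projections of 3D boxes through a 3x4 camera matrix P.


def project_point(P, X):
    """Project world point X = (x, y, z) with P; return (u, v, depth)."""
    Xh = (X[0], X[1], X[2], 1.0)
    u, v, w = [sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3)]
    if w <= 0:
        raise ValueError("point is behind the camera")
    # For P = K [R | t] with K's last row (0, 0, 1), w is the depth along the optical axis.
    return u / w, v / w, w


def project_box(P, center, half_extents):
    """Tight 2D box (u_min, v_min, u_max, v_max) of a projected 3D box, plus centre depth."""
    us, vs = [], []
    for signs in product((-1.0, 1.0), repeat=3):  # the 8 box corners
        corner = [c + s * h for c, s, h in zip(center, signs, half_extents)]
        u, v, _ = project_point(P, corner)
        us.append(u)
        vs.append(v)
    _, _, depth = project_point(P, center)
    return (min(us), min(vs), max(us), max(vs)), depth


def overlap_fraction(target, other):
    """Fraction of the target box's area covered by the other box."""
    iw = max(0.0, min(target[2], other[2]) - max(target[0], other[0]))
    ih = max(0.0, min(target[3], other[3]) - max(target[1], other[1]))
    area = (target[2] - target[0]) * (target[3] - target[1])
    return iw * ih / area if area > 0 else 0.0


def detection_probabilities(P, objects, p_d_max=0.95):
    """Per-camera detection probability of each (center, half_extents) object.

    Each object's probability is p_d_max scaled by (1 - occ), where occ is the
    largest overlap_fraction with any object whose centre is closer to the camera.
    """
    projected = [project_box(P, c, h) for c, h in objects]
    probs = []
    for i, (box_i, depth_i) in enumerate(projected):
        occ = max(
            (overlap_fraction(box_i, box_j)
             for j, (box_j, depth_j) in enumerate(projected)
             if j != i and depth_j < depth_i),
            default=0.0,
        )
        probs.append(p_d_max * (1.0 - occ))
    return probs


if __name__ == "__main__":
    # Pinhole camera at the origin looking along +z, focal length 100 px.
    P = [[100.0, 0.0, 0.0, 0.0],
         [0.0, 100.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]
    objs = [((0.0, 0.0, 5.0), (0.5, 0.5, 0.5)),   # near object
            ((0.0, 0.0, 10.0), (0.5, 0.5, 0.5))]  # directly behind it
    print(detection_probabilities(P, objs))  # [0.95, 0.0]
```

Because only `P` enters the computation, substituting a new matrix for a moved or added camera changes the projected boxes and the resulting occlusion pattern without touching any learned detector, which is the reconfiguration property the abstract highlights.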
Extensive experiments on challenging benchmarks, including the Curtin multi-camera (CMC) and WILDTRACK (WT) datasets, demonstrate significant improvements in tracking accuracy and robustness. The proposed solution processes 2D detections from multiple monocular cameras online to provide trajectories in the 3D world frame. The approach leverages advances in 2D object detection and in multi-sensor MOT that exploits geometric information from cameras with overlapping fields of view to accurately estimate the shape and position of 3D objects. The proposed multi-view MOT (MV-MOT) algorithm has linear complexity in the number of detections across all cameras and does not require detector retraining when the cameras are reconfigured. Performance evaluations on challenging datasets demonstrate significant improvements in tracking accuracy over existing solutions, as well as robustness when camera configurations change on-the-fly. Ablation studies are also presented to illustrate its advantages.

The paper is organized as follows: Section 2 discusses related work in 2D/3D object detection and tracking. Section 3 introduces the dynamic and measurement models, together with the Bayes recursion, that form the 3D visual MV-MOT solution. Section 4 proposes an efficient approximation of the MV-MOT filter that realizes automatic track initiation/re-identification and occlusion resolution. Extensive