Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman, Andrew Blake
The paper presents a novel method for real-time, accurate prediction of 3D body joint positions from a single depth image, without using temporal information. The approach borrows from object recognition: the body is segmented into localized parts, which maps the complex pose estimation problem onto a simpler per-pixel classification task. A large and varied training dataset, generated from synthetic depth images of humans in diverse poses, is used to train a deep randomized decision forest classifier. The classifier is designed to be computationally efficient and robust, handling self-occlusions and varying body shapes and sizes. In the final step, confidence-weighted 3D joint proposals are generated by finding local modes of the inferred per-pixel distributions using mean shift. The system runs at 200 frames per second on consumer hardware and achieves state-of-the-art accuracy on both synthetic and real test sets. The paper also discusses the impact of various training parameters and demonstrates improved generalization over exact nearest-neighbor matching.
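The mode-finding step can be illustrated with a minimal sketch of weighted mean shift over 3D points. This is not the authors' implementation: the function name `mean_shift_mode`, the Gaussian kernel, the `bandwidth` value, and the use of a plain per-point weight (standing in for a per-pixel body-part probability) are all assumptions made for illustration.

```python
import math


def mean_shift_mode(points, weights, start, bandwidth=0.06, iters=50, tol=1e-6):
    """Run weighted Gaussian mean shift from `start`.

    points  : list of (x, y, z) positions, e.g. pixels reprojected to 3D (assumed input)
    weights : one non-negative weight per point (assumed to be a class probability)
    Returns (mode, confidence), where confidence is the weighted kernel sum at the mode.
    """
    def kernel_weight(p, x, w):
        # Gaussian kernel on squared distance, scaled by the point's weight.
        d2 = sum((pi - xi) ** 2 for pi, xi in zip(p, x))
        return w * math.exp(-d2 / (bandwidth ** 2))

    x = list(start)
    for _ in range(iters):
        num = [0.0, 0.0, 0.0]
        den = 0.0
        for p, w in zip(points, weights):
            k = kernel_weight(p, x, w)
            den += k
            for i in range(3):
                num[i] += k * p[i]
        if den == 0.0:
            break  # no point has support near x; stop without moving
        new_x = [n / den for n in num]
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_x, x)))
        x = new_x
        if shift < tol:
            break

    confidence = sum(kernel_weight(p, x, w) for p, w in zip(points, weights))
    return tuple(x), confidence


if __name__ == "__main__":
    # Four confident points clustered near (0, 0, 2) and one weak outlier.
    pts = [(0.01, 0.0, 2.0), (-0.01, 0.0, 2.0), (0.0, 0.01, 2.0),
           (0.0, -0.01, 2.0), (1.0, 1.0, 2.0)]
    wts = [1.0, 1.0, 1.0, 1.0, 0.1]
    mode, conf = mean_shift_mode(pts, wts, start=(0.02, 0.0, 2.0))
    print(mode, conf)
```

Starting the search from several points and keeping the distinct modes, each with its confidence, would yield a set of proposals per joint; how starting points are chosen and how modes are merged is left open in this sketch.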