2013-12-01 | Yi Yang, Member, IEEE, and Deva Ramanan, Member, IEEE
This paper presents a novel method for articulated human detection and pose estimation in static images using flexible mixtures of parts. The approach models articulation by combining small, non-oriented parts rather than using a family of warped templates. The model captures spatial relations between part locations and co-occurrence relations between part mixtures, enhancing standard pictorial structure models. Key properties of the model include efficient computation through dynamic programming, handling of exponentially large sets of global mixtures, and capturing dependencies between local appearance and global geometry. The parameters are learned using a structured SVM solver, and the model is fast enough to search over scales and image locations. The paper introduces new evaluation criteria for pose estimation and human detection, addressing issues with existing metrics like PCP. Experimental results on standard benchmarks show that the proposed method outperforms previous approaches, achieving state-of-the-art performance on the Parse and Buffy datasets while being significantly faster.
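As a minimal sketch of the kind of scoring function such a part-mixture model uses (symbols V, E, phi, psi, w, b are introduced here for illustration and are not taken from the abstract), the score of a pose hypothesis with part locations p = {p_i} and part mixture types t = {t_i} can be written as

\[
S(I, p, t) \;=\; \sum_{i \in V} w_i^{t_i} \cdot \phi(I, p_i)
\;+\; \sum_{(i,j) \in E} w_{ij}^{t_i t_j} \cdot \psi(p_i - p_j)
\;+\; \sum_{i \in V} b_i^{t_i}
\;+\; \sum_{(i,j) \in E} b_{ij}^{t_i t_j},
\]

where phi(I, p_i) is a local appearance feature extracted at location p_i, psi(p_i - p_j) is a deformation feature between connected parts, and the b terms score the co-occurrence of mixture types. If the edge set E forms a tree, maximizing S over all locations and mixture assignments can be carried out exactly by dynamic programming, which is what makes the search over scales and image positions tractable.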