2005 | PEDRO F. FELZENSZWALB, DANIEL P. HUTTENLOCHER
This paper presents a computationally efficient framework for part-based modeling and recognition of objects. The approach is inspired by pictorial structure models introduced by Fischler and Elschlager. The basic idea is to represent an object as a collection of parts arranged in a deformable configuration. Each part is modeled separately, and the deformable configuration is represented by spring-like connections between pairs of parts. These models allow for qualitative descriptions of visual appearance and are suitable for generic recognition problems.
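In the classical pictorial structure formulation, the part models and the springs combine into a single energy. In the conventional notation (assumed here, not quoted from this summary), an object has n parts v_1, ..., v_n connected by a set of edges E. A configuration L = (l_1, ..., l_n) gives a location l_i for each part, m_i(l_i) is the cost of placing part i at l_i, and d_ij(l_i, l_j) is the deformation cost of the spring between connected parts i and j. The best match is the configuration with minimum total energy:

```latex
L^{*} = \arg\min_{L = (l_1, \dots, l_n)} \left( \sum_{i=1}^{n} m_i(l_i) + \sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j) \right)
```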
The paper addresses two main problems: finding instances of an object in an image using pictorial structure models and learning an object model from training examples. Efficient algorithms are presented for both tasks. The techniques are demonstrated by learning models that represent faces and human bodies and using the resulting models to locate the corresponding objects in novel images.
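When the connections form a tree, the matching problem can be solved by dynamic programming from the leaves to the root. The sketch below is a minimal illustration under several simplifying assumptions: part locations are integers 0..h-1 on a one-dimensional grid, part 0 is the root, and each spring has the quadratic cost w[i] * (l_i - l_parent - offset[i])**2. The names `dt1d` and `match_tree` are hypothetical. `dt1d` is the lower-envelope generalized distance transform for squared distances. It computes each tree edge's message in O(h) rather than the O(h^2) of a brute-force minimum, so the whole pass costs O(nh).

```python
import math


def dt1d(f, w, queries):
    """Generalized distance transform D(q) = min_x f(x) + w * (x - q)**2.

    f is sampled at integer locations 0..len(f)-1, w must be > 0, and queries
    must be sorted in ascending order (they may be non-integer or out of range).
    Returns (values, argmins). Runs in O(len(f) + len(queries)).
    """
    n = len(f)
    v = [0] * n            # roots of the parabolas forming the lower envelope
    z = [0.0] * (n + 1)    # boundaries between consecutive envelope parabolas
    z[0], z[1] = -math.inf, math.inf
    k = 0

    def meet(p, r):
        # x-coordinate where the parabolas rooted at p and r (p < r) intersect
        return ((f[r] + w * r * r) - (f[p] + w * p * p)) / (2.0 * w * (r - p))

    for r in range(1, n):
        s = meet(v[k], r)
        while s <= z[k]:   # parabola v[k] is never lowest; drop it from the envelope
            k -= 1
            s = meet(v[k], r)
        k += 1
        v[k], z[k], z[k + 1] = r, s, math.inf

    values, args = [], []
    k = 0
    for q in queries:
        while z[k + 1] < q:   # advance to the envelope piece that contains q
            k += 1
        values.append(f[v[k]] + w * (v[k] - q) ** 2)
        args.append(v[k])
    return values, args


def match_tree(m, parent, w, offset):
    """Minimize sum_i m[i][l_i] + sum_{i>0} w[i] * (l_i - l_parent(i) - offset[i])**2.

    m[i][l] is the match cost of part i at location l; parent[0] == -1 marks the root.
    Returns (minimum energy, list of part locations).
    """
    n, h = len(m), len(m[0])
    children = [[] for _ in range(n)]
    for i in range(1, n):
        children[parent[i]].append(i)
    order, idx = [0], 0               # breadth-first order: parents before children
    while idx < len(order):
        order.extend(children[order[idx]])
        idx += 1

    cost = [[float(c) for c in row] for row in m]   # best subtree cost per location
    best_loc = [None] * n                           # best_loc[c][l_p]: best l_c given l_p
    for c in reversed(order[1:]):                   # deepest parts first
        p = parent[c]
        queries = [lp + offset[c] for lp in range(h)]   # ascending, as dt1d requires
        msg, arg = dt1d(cost[c], w[c], queries)
        best_loc[c] = arg
        for lp in range(h):
            cost[p][lp] += msg[lp]

    loc = [0] * n
    loc[0] = min(range(h), key=lambda l: cost[0][l])
    for c in order[1:]:                             # backtrack from the root outwards
        loc[c] = best_loc[c][loc[parent[c]]]
    return cost[0][loc[0]], loc
```

For two-dimensional locations with a separable quadratic cost, the same one-dimensional transform can be applied along each axis in turn.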
The main contributions of this paper are threefold. First, an efficient algorithm is provided for the classical pictorial structure energy minimization problem in the case where the connections between parts form no cycles and the deformation costs are of a particular form; for such models, the running time is linear in the number of parts and in the number of possible locations for each part. Many objects, including faces, people, and animals, can be represented by such acyclic multi-part models. Second, a method is introduced for learning these models from training examples that estimates all the model parameters, including the structure of the connections between parts. Third, techniques are developed for finding multiple good hypotheses for the location of an object in an image rather than just a single best solution. Finding multiple hypotheses is important when an image may contain several instances of an object, and when imprecision in the model means the desired match may not be the one with the minimum energy.
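Learning the structure of the connections can be framed as choosing the tree that best explains the training configurations. The sketch below covers only the geometric part of that problem and is not the paper's exact estimator. It assumes one-dimensional part locations and models each pair's displacement l_j - l_i as a Gaussian with maximum-likelihood mean and variance. The tree is the minimum spanning tree over the pairwise negative log-likelihoods, computed with Prim's algorithm and rooted at part 0. The name `learn_tree` and the variance regularizer `eps` are hypothetical. Appearance models are not covered.

```python
import math


def learn_tree(examples, eps=1e-3):
    """Estimate tree-structured springs from labeled training configurations.

    examples[k][i] is the 1-D location of part i in training example k.
    Returns (parent, w, offset) with parent[0] == -1, w[i] = 1 / (2 * variance)
    and offset[i] = mean of l_i - l_parent(i), so that w[i] * (d - offset[i])**2
    is the Gaussian negative log-likelihood of displacement d up to a constant.
    """
    K, n = len(examples), len(examples[0])

    def fit(i, j):
        # ML Gaussian fit of the displacement l_j - l_i (eps avoids zero variance)
        d = [ex[j] - ex[i] for ex in examples]
        mu = sum(d) / K
        sq = sum((x - mu) ** 2 for x in d)
        var = sq / K + eps
        nll = 0.5 * K * math.log(2 * math.pi * var) + sq / (2 * var)
        return mu, var, nll

    stats = {(i, j): fit(i, j) for i in range(n) for j in range(n) if i != j}

    # Prim's algorithm on the complete graph, edge cost = pairwise NLL
    in_tree = [False] * n
    in_tree[0] = True
    parent = [0] * n
    parent[0] = -1
    best = [stats[(0, j)][2] if j else math.inf for j in range(n)]
    for _ in range(n - 1):
        j = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k])
        in_tree[j] = True
        for k in range(n):
            if not in_tree[k] and stats[(j, k)][2] < best[k]:
                best[k] = stats[(j, k)][2]
                parent[k] = j

    w, offset = [0.0] * n, [0.0] * n
    for c in range(1, n):
        mu, var, _ = stats[(parent[c], c)]
        w[c] = 1.0 / (2.0 * var)
        offset[c] = mu
    return parent, w, offset
```

Because each pair's cost depends only on that pair, the search over all trees reduces to a spanning-tree problem rather than an exhaustive enumeration.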
The paper also introduces a statistical setting for the pictorial structure framework, addressing the problems of learning models from examples and hypothesizing multiple matches. The framework is illustrated with examples of face and human body detection.
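In a statistical setting, one way to obtain several plausible matches instead of a single best one is to treat exp(-energy) as an unnormalized posterior over configurations and draw samples from it. The sketch below does this for a tree of parts under assumed conventions: one-dimensional integer locations 0..h-1, part 0 as the root, and quadratic spring costs. It runs an upward sum-product pass followed by top-down sampling. The name `sample_tree` is hypothetical, and messages are computed by brute force in probability space, so it only suits small examples with modest costs.

```python
import math
import random


def sample_tree(m, parent, w, offset, num_samples, rng=None):
    """Draw configurations L with probability proportional to exp(-E(L)), where
    E(L) = sum_i m[i][l_i] + sum_{i>0} w[i] * (l_i - l_parent(i) - offset[i])**2.

    Part 0 is the root (parent[0] == -1) and every part ranges over 0..h-1.
    Messages cost O(h^2) per edge and are kept as plain probabilities, so large
    costs can underflow; a practical version would work in log space.
    """
    if rng is None:
        rng = random.Random()
    n, h = len(m), len(m[0])
    children = [[] for _ in range(n)]
    for i in range(1, n):
        children[parent[i]].append(i)
    order, idx = [0], 0               # breadth-first order: parents before children
    while idx < len(order):
        order.extend(children[order[idx]])
        idx += 1

    def spring(c, lp, lc):
        # unnormalized pairwise factor exp(-deformation cost) on edge (parent[c], c)
        return math.exp(-w[c] * (lc - lp - offset[c]) ** 2)

    # belief[i][l]: unnormalized mass of part i's subtree with part i placed at l
    belief = [[math.exp(-cost) for cost in row] for row in m]
    for c in reversed(order[1:]):     # upward pass, deepest parts first
        p = parent[c]
        for lp in range(h):
            belief[p][lp] *= sum(belief[c][lc] * spring(c, lp, lc) for lc in range(h))

    samples = []
    for _ in range(num_samples):
        loc = [0] * n
        loc[0] = rng.choices(range(h), weights=belief[0])[0]   # root marginal
        for c in order[1:]:           # each part conditioned on its sampled parent
            lp = loc[parent[c]]
            weights = [belief[c][lc] * spring(c, lp, lc) for lc in range(h)]
            loc[c] = rng.choices(range(h), weights=weights)[0]
        samples.append(loc)
    return samples
```

Each sample is an exact draw from the tree-structured distribution, so good configurations appear often and poor ones rarely. A separate verification step can then rank the sampled hypotheses.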