CONDENSATION—Conditional Density Propagation for Visual Tracking
Michael Isard and Andrew Blake
Department of Engineering Science, University of Oxford, Oxford OX1 3PJ, UK
Received July 16, 1996; Accepted March 3, 1997
Abstract. Tracking curves in dense visual clutter is challenging. Kalman filtering is inadequate because it is based on Gaussian densities which, being unimodal, cannot represent simultaneous alternative hypotheses. The CONDENSATION algorithm uses "factored sampling", previously applied to the interpretation of static images, in which the probability distribution of possible interpretations is represented by a randomly generated set. CONDENSATION uses learned dynamical models, together with visual observations, to propagate the random set over time. The result is highly robust tracking of agile motion. Notwithstanding the use of stochastic methods, the algorithm runs in near real-time.
The paper presents a stochastic framework for tracking curves in visual clutter using a sampling algorithm. The approach is rooted in ideas from statistics, control theory, and computer vision. The problem is to track outlines and features of foreground objects, modelled as curves, as they move in substantial clutter, and to do it at, or close to, video frame-rate. This is challenging because elements in the background clutter may mimic parts of foreground features. The approach aims to dissolve the resulting ambiguity by applying probabilistic models of object shape and motion to analyse the video-stream. The degree of generality of these models is pitched carefully: sufficiently specific for effective disambiguation but sufficiently general to be broadly applicable over entire classes of foreground objects.
Effective methods have arisen in computer vision for modelling shape and motion. When suitable geometric models of a moving object are available, they can be matched effectively to image data, though usually at considerable computational cost. Once an object has been located approximately, tracking it in subsequent images becomes more efficient computationally, especially if motion is modelled as well as shape. One important facility is the modelling of curve segments which interact with images or image sequences. This is more general than modelling entire objects but more clutter-resistant than applying signal-processing to low-level corners or edges. The methods to be discussed here have been applied at this level, to segments of parametric B-spline curves tracking over image sequences.
Prior probability densities can be defined over the curves and their motions, and this constitutes a powerful facility for tracking. Reasonable defaults can be chosen for those densities. However, it is obviously more satisfactory to measure or estimate them from data-sequences. Algorithms to do this, assuming Gaussian densities, are known in the control-theory literature and have been applied in computer vision. Given the learned prior, and an observation density that characterises the statistical variability of image data z given a curve state x, a posterior distribution can, in principle, be estimated for x_t given z_t at successive times t.
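The role of the prior and the observation density can be illustrated with a minimal sketch of factored sampling for a single time-step. Everything specific here is an assumption made for illustration and does not come from the paper: the state x is one-dimensional rather than a B-spline curve, the prior is a Gaussian, and the observation density is a hypothetical two-peaked mixture standing in for one true feature and one clutter feature that mimics it. The function names are likewise hypothetical. The sketch covers only the static sampling step, not the propagation over time.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def factored_sampling(prior_sampler, observation_density, n_samples, rng):
    """Approximate the posterior p(x | z) by a weighted random sample set.

    Samples are drawn from the prior p(x); each sample is then weighted in
    proportion to the observation density p(z | x), and the weights are
    normalised so that they sum to one.
    """
    samples = [prior_sampler(rng) for _ in range(n_samples)]
    raw = [observation_density(x) for x in samples]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("observation density vanished on every sample")
    weights = [w / total for w in raw]
    return samples, weights


def resample(samples, weights, n_samples, rng):
    """Draw an unweighted set by choosing samples with probability = weight."""
    return rng.choices(samples, weights=weights, k=n_samples)


# Assumed prior: broad Gaussian over a scalar state (illustrative only).
def prior_sampler(rng):
    return rng.gauss(0.0, 3.0)


# Assumed observation density: two equally plausible image features,
# e.g. the true object near x = -2 and a clutter look-alike near x = +2.
def observation_density(x):
    return 0.5 * gaussian_pdf(x, -2.0, 0.3) + 0.5 * gaussian_pdf(x, 2.0, 0.3)


rng = random.Random(0)  # fixed seed so the run is repeatable
samples, weights = factored_sampling(prior_sampler, observation_density, 2000, rng)
posterior_set = resample(samples, weights, 2000, rng)

# Posterior mass on each side of zero: both hypotheses stay represented.
left_mass = sum(w for x, w in zip(samples, weights) if x < 0.0)
right_mass = 1.0 - left_mass
print(f"mass near -2: {left_mass:.2f}, mass near +2: {right_mass:.2f}")
```

Under these assumptions the weighted set keeps substantial mass around both peaks, which a single Gaussian could not do. That is the sense in which a sample-set representation can hold simultaneous alternative hypotheses.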
Kalman filters and data-association are discussed.