SIFT Flow: Dense Correspondence across Scenes and its Applications
Ce Liu, Jenny Yuen, Antonio Torralba, and William T. Freeman
Abstract—While image alignment has been studied in different areas of computer vision for decades, aligning images depicting different scenes remains a challenging problem. Analogous to optical flow where an image is aligned to its temporally adjacent frame, we propose SIFT flow, a method to align an image to its nearest neighbors in a large image corpus containing a variety of scenes. The SIFT flow algorithm consists of matching densely sampled, pixel-wise SIFT features between two images, while preserving spatial discontinuities. The SIFT features allow robust matching across different scene/object appearances, whereas the discontinuity-preserving spatial model allows matching of objects located at different parts of the scene. Experiments show that the proposed approach robustly aligns complex scene pairs containing significant spatial differences. Based on SIFT flow, we propose an alignment-based large database framework for image analysis and synthesis, where image information is transferred from the nearest neighbors to a query image according to the dense scene correspondence. This framework is demonstrated through concrete applications, such as motion field prediction from a single image, motion synthesis via object transfer, satellite image registration and face recognition.
Index Terms—Scene alignment, dense scene correspondence, SIFT flow, coarse-to-fine, belief propagation, alignment-based large database framework, satellite image registration, face recognition, motion prediction for a single image, motion synthesis via object transfer
## 1 INTRODUCTION
Image alignment, registration and correspondence are central topics in computer vision, and they arise in scenarios of several levels of complexity. The simplest level, aligning different views of the same scene, has been studied for the purposes of image stitching [51] and stereo matching [45], e.g. in Figure 1(a). The transformations considered are relatively simple (e.g. parametric motion for image stitching and 1D disparity for stereo), and the images to be registered are typically assumed to have the same pixel value after the geometric transformation is applied.
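As a concrete illustration (a sketch only; the warp parameterizations below are generic examples rather than the specific models of [51] or [45]), the two cases assume

$$ \text{stitching:}\quad I_1(\mathbf{p}) \approx I_2\big(\mathcal{W}(\mathbf{p};\theta)\big), \qquad \text{stereo:}\quad I_1(x,y) \approx I_2\big(x - d(x,y),\, y\big), $$

where $\mathcal{W}(\cdot;\theta)$ is a low-dimensional parametric warp (e.g. a homography) shared by all pixels, and $d(x,y)$ is a per-pixel 1D disparity along rectified scanlines.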
The image alignment problem becomes more complicated for dynamic scenes in video sequences, e.g. optical flow estimation [12], [29], [38]. The correspondence between two adjacent frames of a video is often formulated as the estimation of a 2D flow field. The extra degree of freedom in moving from 1D disparity in stereo to a 2D flow field introduces an additional level of complexity. Typical assumptions in optical flow algorithms include brightness constancy and piecewise smoothness of the pixel displacement field [3], [8].
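A minimal sketch of such an objective (the robust penalty $\rho(\cdot)$ and weight $\alpha$ below are generic choices, not the exact formulations of [3], [8]): writing the flow at pixel $\mathbf{p}$ as $\mathbf{w}(\mathbf{p}) = (u(\mathbf{p}), v(\mathbf{p}))$, one minimizes

$$ E(\mathbf{w}) \;=\; \sum_{\mathbf{p}} \rho\big(I_1(\mathbf{p}) - I_2(\mathbf{p} + \mathbf{w}(\mathbf{p}))\big) \;+\; \alpha \sum_{(\mathbf{p},\mathbf{q})\in\mathcal{E}} \Big(|u(\mathbf{p}) - u(\mathbf{q})| + |v(\mathbf{p}) - v(\mathbf{q})|\Big), $$

where the first term encodes brightness constancy and the second, summed over the pixel neighborhood system $\mathcal{E}$, encourages a piecewise-smooth displacement field.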
Image alignment becomes even more difficult in the object recognition scenario, where the goal is to align different instances of the same object category, as illustrated in Figure 1(b). Sophisticated object representations [