RAFT is a novel deep network architecture for optical flow estimation, introduced by Zachary Teed and Jia Deng from Princeton University. The architecture, named Recurrent All-Pairs Field Transforms (RAFT), extracts per-pixel features from input images, constructs multi-scale 4D correlation volumes for all pixel pairs, and iteratively updates a flow field using a recurrent unit that performs lookups on these correlation volumes. RAFT achieves state-of-the-art performance on benchmark datasets such as KITTI and Sintel, with significant error reductions compared to previous methods. It also demonstrates strong cross-dataset generalization and high efficiency in inference time, training speed, and parameter count.
RAFT consists of three main components: a feature encoder that extracts per-pixel features, a correlation layer that constructs a 4D correlation volume for all pixel pairs, and an update operator that iteratively refines the flow field. The feature encoder processes both input images, while a separate context encoder extracts features from the first image only. The correlation layer measures visual similarity between every pair of pixels, and the update operator, built around a recurrent GRU, retrieves values from the correlation volumes to iteratively update the flow field.
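The all-pairs correlation volume described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the paper's implementation: the shapes here are toy values (RAFT extracts D-dimensional features at 1/8 of the input resolution), and the dot products are computed with `einsum`.

```python
import numpy as np

# Toy shapes for illustration; RAFT operates on features at 1/8 resolution.
H, W, D = 4, 5, 8
rng = np.random.default_rng(0)
f1 = rng.standard_normal((H, W, D)).astype(np.float32)  # frame-1 features
f2 = rng.standard_normal((H, W, D)).astype(np.float32)  # frame-2 features

# C[i, j, k, l] = <f1[i, j], f2[k, l]>: one correlation value for every
# pair of pixels, giving a 4D volume of shape (H, W, H, W).
corr = np.einsum("ijd,kld->ijkl", f1, f2) / np.sqrt(D)

print(corr.shape)  # (4, 5, 4, 5)
```

Scaling by `sqrt(D)` keeps the correlation values in a stable range regardless of the feature dimension, mirroring the normalization used in attention-style dot products.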
RAFT is motivated by traditional optimization-based approaches but differs in that features and motion priors are learned rather than hand-crafted. The architecture is designed to maintain and update a single high-resolution flow field, which allows it to overcome limitations of coarse-to-fine approaches. The update operator is lightweight and recurrent, enabling it to perform many iterations during inference without divergence. It uses a convolutional GRU to perform lookups on multi-scale correlation volumes, which is different from prior work that typically uses plain convolution or correlation layers.
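The multi-scale lookup mentioned above can be sketched as follows. This is a hedged simplification with assumed details: the pyramid is built by average-pooling the last two dimensions of the correlation volume, and the lookup indexes the nearest grid cell around the current flow target, whereas RAFT uses bilinear sampling. The function names `build_pyramid` and `lookup` are illustrative, not from the released code.

```python
import numpy as np

def build_pyramid(corr, levels=4):
    """Average-pool the last two dims of a (H, W, H, W) correlation
    volume to produce progressively coarser volumes."""
    pyramid = [corr]
    for _ in range(levels - 1):
        c = pyramid[-1]
        H1, W1, H2, W2 = c.shape
        c = c.reshape(H1, W1, H2 // 2, 2, W2 // 2, 2).mean(axis=(3, 5))
        pyramid.append(c)
    return pyramid

def lookup(pyramid, flow, radius=1):
    """Gather a (2r+1)^2 window of correlation values around the pixel
    each flow vector points to, at every pyramid level (nearest-neighbour
    indexing here; the actual model samples bilinearly)."""
    H, W = flow.shape[:2]
    planes = []
    for lvl, c in enumerate(pyramid):
        scale = 2 ** lvl
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                plane = np.empty((H, W), dtype=c.dtype)
                for i in range(H):
                    for j in range(W):
                        y = int(round((i + flow[i, j, 1]) / scale)) + dy
                        x = int(round((j + flow[i, j, 0]) / scale)) + dx
                        y = min(max(y, 0), c.shape[2] - 1)
                        x = min(max(x, 0), c.shape[3] - 1)
                        plane[i, j] = c[i, j, y, x]
                planes.append(plane)
    # Per-pixel motion features fed to the recurrent update operator.
    return np.stack(planes, axis=-1)
```

Because the window has a fixed radius at every level, coarser levels cover a larger displacement range at the original resolution, which is how the lookup captures both small and large motions without a coarse-to-fine flow field.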
RAFT achieves state-of-the-art performance on both the Sintel and KITTI benchmarks. On KITTI it reaches an F1-all error of 5.10%, a 16% error reduction from the best published result, and on Sintel (final pass) an end-point-error of 2.855 pixels, a 30% reduction. It also demonstrates strong cross-dataset generalization and high efficiency. The method is implemented in PyTorch and is available at https://github.com/princeton-vl/RAFT.