RAFT: Recurrent All-Pairs Field Transforms for Optical Flow

25 Aug 2020 | Zachary Teed and Jia Deng
RAFT (Recurrent All-Pairs Field Transforms) is a deep network architecture for optical flow estimation. It extracts per-pixel features from both input images, constructs a 4D correlation volume over all pairs of pixels, and iteratively updates a flow field using a recurrent unit that performs lookups on the correlation volumes. RAFT achieves state-of-the-art performance on the KITTI and Sintel datasets, reducing error by 16% and 30%, respectively, relative to the best previously published results. It also demonstrates strong cross-dataset generalization and high efficiency in inference time, training speed, and parameter count.
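The all-pairs correlation step can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the division by the square root of the feature dimension, and the choice of four pyramid levels built by 2x average pooling over the second image's dimensions are assumptions made for illustration. The feature maps stand in for the output of a feature encoder.

```python
import numpy as np

def all_pairs_correlation(fmap1, fmap2):
    """Dot product between every feature vector of image 1 and image 2.

    fmap1, fmap2: arrays of shape (H, W, D).
    Returns a 4D volume of shape (H, W, H, W).
    """
    d = fmap1.shape[-1]
    corr = np.einsum("ijd,kld->ijkl", fmap1, fmap2)
    # Assumption: scale by sqrt(D) to keep values in a moderate range.
    return corr / np.sqrt(d)

def correlation_pyramid(corr, num_levels=4):
    """Multi-scale volumes: average-pool the last two dims by 2 per level.

    The first two dims (pixels of image 1) keep full resolution.
    """
    pyramid = [corr]
    for _ in range(num_levels - 1):
        c = pyramid[-1]
        h, w, h2, w2 = c.shape
        # Drop a trailing row/column if the size is odd, then pool 2x2 blocks.
        c = c[:, :, : h2 // 2 * 2, : w2 // 2 * 2]
        c = c.reshape(h, w, h2 // 2, 2, w2 // 2, 2).mean(axis=(3, 5))
        pyramid.append(c)
    return pyramid

# Hypothetical 8x8 feature maps with 16 channels.
rng = np.random.default_rng(0)
features1 = rng.standard_normal((8, 8, 16))
features2 = rng.standard_normal((8, 8, 16))
volume = all_pairs_correlation(features1, features2)
levels = correlation_pyramid(volume)
print([level.shape for level in levels])
```

Pooling only the last two dimensions keeps one correlation map per pixel of the first image at every level, so coarser levels cover larger displacements without discarding per-pixel detail.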
The architecture consists of three main components: a feature encoder that extracts per-pixel features, a correlation layer that constructs a 4D correlation volume, and an update operator that iteratively refines the flow field. RAFT's design is inspired by traditional optimization-based approaches but relies on learned features and motion priors. Key innovations include maintaining a single high-resolution flow field, using a lightweight recurrent update operator, and designing that operator around a convolutional GRU that performs lookups on 4D multi-scale correlation volumes. Experiments on the Sintel and KITTI datasets validate RAFT's accuracy, generalization, and efficiency.
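The iterative structure of the update operator can be sketched as a loop that keeps one flow field and repeatedly looks up a local window of the correlation volume around each pixel's current match. The sketch below is a toy under stated assumptions: the learned convolutional GRU is replaced by a hand-written greedy rule (move to the best-scoring offset in the window), lookups use nearest-neighbor indexing on a single-scale volume, and all function names and the `radius` and `num_iters` parameters are hypothetical.

```python
import numpy as np

def lookup(corr, flow, radius=2):
    """Gather a (2r+1) x (2r+1) correlation window around each current match.

    corr: (H, W, H, W) all-pairs volume; flow: (H, W, 2) as (dx, dy).
    Out-of-bounds positions are filled with -inf so they are never chosen.
    """
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    offsets = np.arange(-radius, radius + 1)
    dy, dx = np.meshgrid(offsets, offsets, indexing="ij")
    ty = np.rint(ys + flow[..., 1])[..., None, None].astype(int) + dy
    tx = np.rint(xs + flow[..., 0])[..., None, None].astype(int) + dx
    valid = (ty >= 0) & (ty < h) & (tx >= 0) & (tx < w)
    values = corr[ys[..., None, None], xs[..., None, None],
                  np.clip(ty, 0, h - 1), np.clip(tx, 0, w - 1)]
    return np.where(valid, values, -np.inf), dy, dx

def toy_update(window, dy, dx):
    """Stand-in for the learned update: step to the best offset in the window."""
    h, w = window.shape[:2]
    best = window.reshape(h, w, -1).argmax(axis=-1)
    return np.stack([dx.ravel()[best], dy.ravel()[best]], axis=-1).astype(float)

def refine_flow(corr, num_iters=4, radius=2):
    """Start from zero flow and apply the same update rule at every iteration."""
    h, w = corr.shape[:2]
    flow = np.zeros((h, w, 2))
    for _ in range(num_iters):
        window, dy, dx = lookup(corr, flow, radius)
        flow = flow + toy_update(window, dy, dx)
    return flow

# Synthetic check: image 2 features are image 1 features shifted by (dy=1, dx=2).
rng = np.random.default_rng(0)
features1 = rng.standard_normal((12, 12, 256))
features2 = np.roll(features1, shift=(1, 2), axis=(0, 1))
corr = np.einsum("ijd,kld->ijkl", features1, features2)
flow = refine_flow(corr)
print(flow[5, 5])
```

In RAFT itself the update is produced by a trained recurrent unit rather than an argmax, so the loop above only mirrors the control flow: one flow field, repeated lookups, and additive updates.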