Received: date / Accepted: date

Tianfan Xue1 · Baian Chen2 · Jiajun Wu2 · Donglai Wei3 · William T. Freeman2,4
This paper introduces Task-Oriented Flow (TOFlow), a motion representation learned in a self-supervised, task-specific manner for video processing. TOFlow is designed as an end-to-end trainable convolutional network that performs motion analysis and video processing simultaneously. The network consists of three modules: a motion estimation module that estimates motion between input frames, an image transformation module that warps all input frames to a reference frame, and a task-specific image processing module that performs video interpolation, denoising, or super-resolution on registered frames. These modules are jointly trained to minimize the loss between output frames and ground truth.
TOFlow outperforms traditional optical flow on standard benchmarks as well as on the newly introduced Vimeo-90K dataset in three video processing tasks: frame interpolation, video denoising/deblocking, and video super-resolution. The paper also introduces Vimeo-90K itself, a large-scale, high-quality video dataset for video processing, consisting of 89,800 high-quality video clips (720p resolution or higher) downloaded from Vimeo. Three benchmarks are built from these videos: one each for interpolation, denoising/deblocking, and super-resolution.
The paper makes three contributions: (1) proposing TOFlow, a flow representation tailored to specific video processing tasks, significantly outperforming standard optical flow; (2) proposing an end-to-end trainable video processing framework that handles frame interpolation, video denoising, and video super-resolution; and (3) building a large-scale, high-quality video processing dataset, Vimeo-90K.
TOFlow is trained with a self-supervised loss: the flow estimation module is fine-tuned by minimizing a task-specific loss on the final output, so no ground-truth flow is required. Evaluated on frame interpolation, video denoising/deblocking, and video super-resolution, TOFlow outperforms traditional optical flow and other state-of-the-art methods, is robust to occlusion, and generalizes across these video processing tasks.
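The key property of this objective is that the only supervision is the reconstruction error of the final output, with every module between input and output trained jointly through it. A schematic sketch of such a loss, using hypothetical stand-in functions for the three modules (these names are illustrative, not the paper's API):

```python
import numpy as np

def toflow_loss(frames, gt, estimate_flow, warp, process):
    """Task-specific, self-supervised loss over the full pipeline.

    frames: list of input frames, frames[0] being the reference;
    gt: ground-truth output for the task (e.g. the clean frame);
    estimate_flow / warp / process: stand-ins for the motion
    estimation, image transformation, and processing modules.
    A single scalar loss drives all three modules, so the learned
    flow is tailored to the task rather than to motion accuracy.
    """
    # Estimate motion from each non-reference frame to the reference.
    flows = [estimate_flow(f, frames[0]) for f in frames[1:]]
    # Register all frames to the reference frame.
    warped = [frames[0]] + [warp(f, fl) for f, fl in zip(frames[1:], flows)]
    # Task-specific processing on the registered stack, then L2 error.
    out = process(np.stack(warped))
    return float(np.mean((out - gt) ** 2))
```

In the paper's framework the three callables are convolutional subnetworks and the loss is minimized end to end by backpropagation; the sketch above only shows how the modules compose into one objective.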