Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators

2024 | Daniel Geng, Andrew Owens
Motion guidance is a zero-shot technique that enables precise image editing by specifying dense, complex motion fields indicating where each pixel should move. The method uses an off-the-shelf differentiable optical flow network to guide the diffusion sampling process, incorporating gradients that steer the generation toward the desired motion while maintaining visual similarity to the source image. This approach allows complex motion edits on both real and generated images, achieving high-quality results without requiring training or a specific diffusion network architecture.
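The target motion fields mentioned above (translations, rotations, stretches) can be constructed programmatically as dense per-pixel displacement maps. The following NumPy sketch builds a rotation flow field; the function name and conventions are illustrative, not taken from the paper:

```python
import numpy as np

def rotation_flow(h, w, angle_deg, center=None):
    """Dense flow field that rotates pixels about a center point.

    Returns an array of shape (2, H, W): per-pixel (dx, dy) displacements
    from each source pixel to its rotated target position.
    """
    if center is None:
        center = ((w - 1) / 2.0, (h - 1) / 2.0)
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    theta = np.deg2rad(angle_deg)
    dx, dy = xs - center[0], ys - center[1]
    # Target position of each pixel after rotating about the center.
    new_x = center[0] + np.cos(theta) * dx - np.sin(theta) * dy
    new_y = center[1] + np.sin(theta) * dx + np.cos(theta) * dy
    # Flow = displacement from source to target.
    return np.stack((new_x - xs, new_y - ys))
```

A translation is simply a constant field (e.g. `np.full((2, h, w), 10.0)` shifts everything by 10 pixels in both axes), and fields can be masked or summed to move only selected objects.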
The technique involves a guidance loss that balances motion accuracy and visual fidelity, with an additional color loss to preserve object appearance. Occlusions are handled with an occlusion mask, and the method enables diverse motion edits, including translations, rotations, stretches, and deformations. Qualitative and quantitative results demonstrate the effectiveness of motion guidance in manipulating image structure and appearance, outperforming baselines in motion accuracy and visual quality. The method is also effective for motion transfer, where motion extracted from a video is applied to an image. The approach is simple, efficient, and versatile, offering a new way to edit images with diffusion models.
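The guidance step described above (flow loss plus color loss, gradient injected into sampling) can be sketched in PyTorch. This is a minimal illustration under simplifying assumptions: the paper guides a diffusion model with a pretrained flow estimator, whereas here `flow_net` is any differentiable stand-in, the computation is in pixel space, and the helper names (`guidance_gradient`, `warp`) are hypothetical:

```python
import torch
import torch.nn.functional as F

def warp(img, flow):
    """Backward-warp img (1, C, H, W) by a dense flow field (1, 2, H, W)."""
    _, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float()[None] + flow
    # Normalize to [-1, 1] for grid_sample (x coordinate first, then y).
    gx = 2 * grid[:, 0] / (w - 1) - 1
    gy = 2 * grid[:, 1] / (h - 1) - 1
    return F.grid_sample(img, torch.stack((gx, gy), dim=-1), align_corners=True)

def guidance_gradient(x_pred, src_image, target_flow, flow_net,
                      w_flow=1.0, w_color=1.0, occlusion_mask=None):
    """Gradient of the guidance loss w.r.t. the current denoised estimate.

    x_pred:      current clean-image estimate during sampling, (1, 3, H, W)
    src_image:   source image to be edited, (1, 3, H, W)
    target_flow: user-specified motion field, (1, 2, H, W)
    flow_net:    any differentiable optical-flow estimator
    """
    x = x_pred.detach().requires_grad_(True)
    # Flow loss: the estimated motion from source to sample should
    # match the user's target motion.
    flow = flow_net(src_image, x)
    flow_loss = (flow - target_flow).abs().mean()
    # Color loss: pixels moved by the target flow should keep their
    # appearance, so compare against the warped source image.
    residual = (x - warp(src_image, target_flow)).abs()
    if occlusion_mask is not None:
        residual = residual * occlusion_mask  # ignore occluded regions
    color_loss = residual.mean()
    loss = w_flow * flow_loss + w_color * color_loss
    (grad,) = torch.autograd.grad(loss, x)
    return grad  # used to steer the sampler's noise prediction
```

At each sampling step this gradient would be scaled and combined with the diffusion model's noise prediction, steering generation toward the specified motion while the color term keeps moved content recognizable.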