Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos

19 Mar 2024 | Hadi Alzayer, Zhihao Xia, Xuaner Zhang, Eli Shechtman, Jia-Bin Huang, and Michael Gharbi
Magic Fixup is an image-editing method that learns a realism prior from dynamic videos. A user coarsely rearranges objects in a photograph using simple 2D transformations, and a diffusion model converts that rough edit into a realistic image, preserving the original content while correcting the lighting, shading, and physical interactions that the naive edit breaks. A sketch of such a coarse edit, and of the main training components, follows below.

To supervise this translation, the method builds a paired dataset from videos: in each pair, one frame serves as the source and another as the target, and the source is warped toward the target's layout. The model is trained to map the warped source frame to the real target frame, so it learns how objects, lighting, and backgrounds change plausibly under motion while the output stays faithful to the specified layout.

To preserve object identity, the conditioning procedure feeds the model both the warped image and features extracted by a second diffusion model, transferring fine details from the source image into the generated result while keeping the user-specified layout intact.

Evaluated on real user edits, the method outperforms existing approaches, with 89% of study participants preferring its results. Editing takes under 5 seconds, making the workflow interactive.

The key contributions are: the Collage Transform interface, which lets users rearrange image regions with simple 2D transformations; a paired data-generation approach that mines supervision from videos; and a conditioning procedure that combines warped images with features from a second diffusion model to transfer detail and preserve object identity. The method is validated through experiments and user studies.
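As a concrete illustration of the kind of coarse edit the Collage Transform interface supports, the following Python sketch applies a 2D similarity transform (rotation, scale, translation) to a segmented object and composites it back over the image, leaving exactly the kinds of seams and holes the diffusion model is trained to fix. The function name `collage_edit` and the compositing details are illustrative assumptions, not the paper's interface code.

```python
# Hypothetical sketch of a Collage Transform-style coarse edit: move a
# segmented object with a 2D similarity transform, then composite it
# crudely. The result is intentionally imperfect; resolving such
# artifacts is the diffusion model's job.
import math
import torch
import torch.nn.functional as F

def collage_edit(image, mask, angle_deg=0.0, scale=1.0, tx=0.0, ty=0.0):
    """image: (1,C,H,W) in [0,1]; mask: (1,1,H,W) binary object mask.
    tx, ty are translations in normalized [-1,1] coordinates."""
    theta_rad = math.radians(angle_deg)
    cos, sin = math.cos(theta_rad), math.sin(theta_rad)
    # Similarity transform written in affine_grid's inverse-mapping
    # convention (theta maps output coordinates back to input ones).
    theta = torch.tensor([[[cos / scale,  sin / scale, -tx],
                           [-sin / scale, cos / scale, -ty]]],
                         dtype=image.dtype, device=image.device)
    grid = F.affine_grid(theta, list(image.shape), align_corners=False)
    obj = F.grid_sample(image * mask, grid, align_corners=False)
    moved_mask = F.grid_sample(mask, grid, align_corners=False)
    background = image * (1 - mask)  # crude: leaves a hole behind
    return torch.where(moved_mask > 0.5, obj, background)
```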
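The paired-data construction hinges on warping one video frame toward another's layout. The snippet below is a minimal backward-warping sketch using a dense flow field; the flow estimator itself (e.g., an off-the-shelf model such as RAFT) is assumed to be supplied externally, and `backward_warp` is a hypothetical helper, not the authors' pipeline.

```python
# Minimal sketch of the paired-data idea: given two frames of a video,
# warp the source frame toward the target frame's layout with a dense
# flow field, yielding (warped source, target) training pairs.
import torch
import torch.nn.functional as F

def backward_warp(src: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp src (B,C,H,W) by flow (B,2,H,W), with flow in pixels,
    mapping each target coordinate back to a source coordinate."""
    b, _, h, w = src.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(h, device=src.device),
        torch.arange(w, device=src.device),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).float()            # (2,H,W)
    coords = grid.unsqueeze(0) + flow                      # (B,2,H,W)
    # Normalize coordinates to [-1, 1] as grid_sample expects.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    norm_grid = torch.stack((coords_x, coords_y), dim=-1)  # (B,H,W,2)
    return F.grid_sample(src, norm_grid, align_corners=True)
```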
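Given such (warped source, target) pairs, the training objective amounts to a conditional denoising-diffusion loss. The sketch below assumes a standard epsilon-prediction formulation with a precomputed noise schedule; the `model(...)` signature and the way the warped source enters as conditioning are assumptions, since the summary does not spell out the architecture.

```python
# Hedged sketch of the training objective: a diffusion model
# conditioned on the warped source frame is trained so that its
# denoised output matches the real target frame.
import torch
import torch.nn.functional as F

def diffusion_loss(model, target, warped_src, alphas_cumprod):
    """target, warped_src: (B,C,H,W); alphas_cumprod: (T,) schedule."""
    b = target.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,), device=target.device)
    a = alphas_cumprod[t].view(b, 1, 1, 1)
    noise = torch.randn_like(target)
    noisy = a.sqrt() * target + (1 - a).sqrt() * noise  # forward process
    # Hypothetical signature: the warped source supplies the layout.
    pred = model(noisy, t, cond=warped_src)
    return F.mse_loss(pred, noise)
```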
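The summary states that features from a second diffusion model are used to transfer fine detail from the source image. One plausible mechanism for such feature injection is cross-attention from the synthesizer's features into the detail extractor's features, sketched generically below; this wiring is an assumption, not the paper's confirmed design.

```python
# Generic sketch of detail transfer via cross-attention: tokens from the
# generator attend into tokens produced by a second ("detail extractor")
# network run on the clean source image. Names and dimensions are
# illustrative.
import torch
import torch.nn as nn

class DetailCrossAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)

    def forward(self, synth_feats, detail_feats):
        # synth_feats: (B,N,D) tokens from the generator's UNet block.
        # detail_feats: (B,M,D) tokens from the detail extractor.
        q = self.norm_q(synth_feats)
        kv = self.norm_kv(detail_feats)
        out, _ = self.attn(q, kv, kv)
        return synth_feats + out  # residual injection of source detail
```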