28 Jul 2024 | Ruining Li, Chuanxia Zheng, Christian Rupprecht, and Andrea Vedaldi
DragAPart is a method that generates a new image of an object from an input image and a set of drag interactions, enabling part-level motion understanding. Unlike prior work that focuses on repositioning whole objects, DragAPart predicts part-level interactions, such as opening and closing a drawer. It is trained on a new synthetic dataset, Drag-a-Move, which allows the model to generalize well to real data and unseen categories. The model can also be used to segment movable parts and to analyze the motion prompted by a drag.
The paper introduces DragAPart, an interactive generative model that, given an image and a set of drags, generates a new image of the same object responding to the action of the drags. The model is trained on the synthetic dataset Drag-a-Move, a collection of triplets (x, y, D) in which y is the state of the object obtained by applying the drags D to the initial state x. The model uses a latent diffusion architecture with a novel drag encoding that enables efficient propagation of the drag information and better generalization to real-world images and unseen categories.
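To make the drag conditioning concrete, the sketch below rasterizes sparse drags into a dense map that could be concatenated with the latent features of a diffusion UNet. This is a minimal illustrative sketch, not the authors' encoding: the channel layout, the choice to write coordinates only at the source pixel, and the fixed number of drag slots are assumptions made here for clarity.

```python
import torch

def encode_drags(drags, latent_hw, max_drags=10):
    """Rasterize sparse drags into a dense conditioning map.

    drags: list of (u_src, v_src, u_dst, v_dst) in normalized [0, 1] image coords.
    latent_hw: spatial size (H, W) of the latent feature map to condition.
    Returns a (4 * max_drags, H, W) tensor: for each drag slot, the source
    pixel holds the drag's normalized source and destination coordinates.
    """
    H, W = latent_hw
    enc = torch.zeros(4 * max_drags, H, W)
    for i, (us, vs, ud, vd) in enumerate(drags[:max_drags]):
        r, c = int(vs * (H - 1)), int(us * (W - 1))  # source location on the latent grid
        enc[4 * i:4 * i + 4, r, c] = torch.tensor([us, vs, ud, vd])
    return enc

# Example: one drag pulling a drawer handle to the right.
cond = encode_drags([(0.42, 0.63, 0.70, 0.63)], latent_hw=(32, 32))
print(cond.shape)  # torch.Size([40, 32, 32])
```

In practice such a map would be computed at each UNet resolution so the drag signal stays spatially aligned with the features it modulates.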
The model is evaluated on both synthetic and real data, showing significant improvements in part-level motion understanding over prior motion-controlled generators, and it is applied to downstream tasks such as segmenting moving parts and analyzing the motion of articulated objects. Experiments on real images and benchmark datasets validate its generalization to real-world data and unseen categories, and DragAPart outperforms existing methods on quantitative metrics such as PSNR, SSIM, and LPIPS. Domain randomization further improves generalization to out-of-distribution data. The paper concludes that DragAPart provides a useful motion model for applications including motion analysis of articulated objects and moving-part segmentation.
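For reference, the reconstruction metrics named above can be computed roughly as in the following sketch. It uses torchmetrics with placeholder tensors; the paper's actual evaluation protocol (resolutions, preprocessing, and metric implementations) may differ.

```python
import torch
from torchmetrics.image import PeakSignalNoiseRatio, StructuralSimilarityIndexMeasure
from torchmetrics.image.lpip import LearnedPerceptualImagePatchSimilarity

# Placeholder batches of generated vs. ground-truth images in [0, 1], shape (B, 3, H, W).
pred = torch.rand(4, 3, 256, 256)
target = torch.rand(4, 3, 256, 256)

psnr = PeakSignalNoiseRatio(data_range=1.0)
ssim = StructuralSimilarityIndexMeasure(data_range=1.0)
lpips = LearnedPerceptualImagePatchSimilarity(net_type="vgg", normalize=True)

print("PSNR:", psnr(pred, target).item())    # higher is better
print("SSIM:", ssim(pred, target).item())    # higher is better
print("LPIPS:", lpips(pred, target).item())  # lower is better
```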