Understanding DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly

**DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly**

**Authors:** Gianluca Scarpellini, Stefano Fiorini, Francesco Giuliari, Pietro Morerio, Alessio Del Bue

**Institution:** Pattern Analysis and Computer Vision (PAVIS), Istituto Italiano di Tecnologia (IIT)

**Abstract:** Reassembly tasks are fundamental in various fields, and multiple approaches exist to solve specific problems. This paper introduces DiffAssemble, a Graph Neural Network (GNN)-based architecture that uses a diffusion model formulation to address reassembly tasks. DiffAssemble treats the elements of a set, such as 2D patches or 3D object fragments, as nodes in a spatial graph. Training involves adding noise to the position and rotation of these elements and iteratively denoising them to reconstruct the initial pose. DiffAssemble achieves state-of-the-art (SOTA) results in most 2D and 3D reassembly tasks and is the first learning-based approach to solve 2D puzzles for both rotation and translation. It also reduces runtime significantly, running 11 times faster than the fastest optimization-based method for puzzle solving.

**Main Contributions:**

- **DiffAssemble:** A unified learning-based solution using diffusion models and GNNs for reassembly tasks, achieving SOTA results in most 2D and 3D scenarios.
- **Unified Approach:** DiffAssemble treats 2D and 3D reassembly tasks as a single problem, leveraging their shared characteristics.
- **Robustness and Efficiency:** DiffAssemble is more robust to missing pieces and significantly faster than optimization-based methods.

**Related Works:** The paper reviews existing literature on reassembly tasks, including 2D jigsaw puzzles and 3D object reassembly, highlighting the challenges and advancements in solving these problems.

**Methodology:**

- **Graph Formulation:** Elements are represented as nodes in a complete graph, with each node containing equivariant features, translation vectors, and rotation matrices.
- **Diffusion Models:** The forward process adds Gaussian noise to the initial poses; a GNN is trained to reverse it, denoising the poses step by step.
- **Graph Neural Networks:** An attention-based GNN with Exphormer is used to handle large graphs efficiently.

**Experimental Evaluation:**

- **3D Object Reassembly:** DiffAssemble outperforms baselines on metrics such as rotation RMSE, translation RMSE, and part accuracy.
- **2D Jigsaw Puzzle:** DiffAssemble achieves SOTA results on CelebA and performs well on WikiArt, even with missing pieces.
- **Scaling to Larger Graphs:** DiffAssemble can handle up to 900 elements with minimal memory usage and significant speed improvements over optimization methods.

**Conclusion:** DiffAssemble is a powerful framework for reassembly tasks, demonstrating ...
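
The graph formulation and forward noising step summarized under Methodology can be illustrated with a small NumPy sketch. This is not the authors' implementation: the linear noise schedule, the 2D pose encoding `(x, y, cos θ, sin θ)`, and the helper names `make_alpha_bar`, `complete_graph_edges`, and `forward_noise` are all assumptions chosen for illustration. It builds a complete graph over four puzzle pieces and draws a noisy pose sample using the standard closed-form Gaussian diffusion step.

```python
import numpy as np


def make_alpha_bar(num_steps=100, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear schedule (assumed, not from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def complete_graph_edges(num_nodes):
    """Directed edge list of shape (2, E) linking every pair of distinct nodes."""
    src, dst = np.meshgrid(np.arange(num_nodes), np.arange(num_nodes), indexing="ij")
    mask = src != dst  # drop self-loops
    return np.stack([src[mask], dst[mask]])


def forward_noise(poses, t, alpha_bar, rng):
    """Sample x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(poses.shape)
    noisy = np.sqrt(alpha_bar[t]) * poses + np.sqrt(1.0 - alpha_bar[t]) * eps
    return noisy, eps


# Hypothetical 2x2 puzzle: piece centers in [-1, 1]^2, all pieces unrotated.
rng = np.random.default_rng(0)
centers = np.array([[-0.5, -0.5], [-0.5, 0.5], [0.5, -0.5], [0.5, 0.5]])
angles = np.zeros(4)
# Assumed pose encoding per node: (x, y, cos(theta), sin(theta)).
poses = np.concatenate(
    [centers, np.cos(angles)[:, None], np.sin(angles)[:, None]], axis=1
)

edges = complete_graph_edges(len(poses))  # 4 nodes, 12 directed edges
alpha_bar = make_alpha_bar()
noisy, eps = forward_noise(poses, t=50, alpha_bar=alpha_bar, rng=rng)
```

In a training loop, a denoising network would receive `noisy`, the timestep, the per-piece features, and `edges`, and would be trained to recover the clean poses or the added noise `eps`; that network is omitted here.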

DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly

29 Feb 2024 | Gianluca Scarpellini*, Stefano Fiorini*, Francesco Giuliari*, Pietro Morerio, Alessio Del Bue
