29 Feb 2024 | Gianluca Scarpellini*, Stefano Fiorini*, Francesco Giuliari*, Pietro Morerio, Alessio Del Bue
**DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly**
**Authors:** Gianluca Scarpellini, Stefano Fiorini, Francesco Giuliari, Pietro Morerio, Alessio Del Bue
**Institution:** Pattern Analysis and Computer Vision (PAVIS), Istituto Italiano di Tecnologia (IIT)
**Abstract:**
Reassembly tasks are fundamental in various fields, and multiple approaches exist to solve specific problems. This paper introduces DiffAssemble, a Graph Neural Network (GNN)-based architecture that uses a diffusion model formulation to address reassembly tasks. DiffAssemble treats elements of a set, such as 2D patches or 3D object fragments, as nodes in a spatial graph. Training involves introducing noise into the position and rotation of these elements and iteratively denoising them to reconstruct the initial pose. DiffAssemble achieves state-of-the-art (SOTA) results in most 2D and 3D reassembly tasks and is the first learning-based approach to solve 2D puzzles for both rotation and translation. It also demonstrates a significant reduction in runtime, performing 11 times faster than the quickest optimization-based method for puzzle solving.
**Main Contributions:**
- **DiffAssemble:** A unified learning-based solution using diffusion models and GNNs for reassembly tasks, achieving SOTA results in most 2D and 3D scenarios.
- **Unified Approach:** DiffAssemble treats 2D and 3D reassembly tasks as a single problem, leveraging shared characteristics.
- **Robustness and Efficiency:** DiffAssemble is more robust to missing pieces and significantly faster than optimization-based methods.
**Related Works:**
The paper reviews existing literature on reassembly tasks, including 2D jigsaw puzzles and 3D object reassembly, highlighting the challenges and advancements in solving these problems.
**Methodology:**
- **Graph Formulation:** Elements are represented as nodes in a complete graph, with each node containing equivariant features, translation vectors, and rotation matrices.
- **Diffusion Models:** The forward process adds Gaussian noise to the initial poses; the GNN is trained to reverse this process, iteratively denoising the poses to recover the original arrangement.
- **Graph Neural Networks:** An Attention-based GNN with Exphormer is used to handle large graphs efficiently.
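
The graph formulation and the forward (noising) process described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`complete_graph_edges`, `linear_alpha_bar`, `forward_noise`), the linear noise schedule, and the `(x, y, cos θ, sin θ)` pose encoding are all hypothetical choices made for the example. The noising step follows the standard DDPM closed form, `x_t = sqrt(ᾱ_t) · x_0 + sqrt(1 − ᾱ_t) · ε`.

```python
import math
import random


def complete_graph_edges(num_nodes):
    """List every ordered pair of distinct nodes (a complete directed graph)."""
    return [(i, j) for i in range(num_nodes) for j in range(num_nodes) if i != j]


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bar, running = [], 1.0
    for step in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * step / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def forward_noise(poses, t, alpha_bar, rng):
    """Noise every pose to timestep t; also return the noise that was added."""
    signal = math.sqrt(alpha_bar[t])
    sigma = math.sqrt(1.0 - alpha_bar[t])
    noise = [[rng.gauss(0.0, 1.0) for _ in pose] for pose in poses]
    noisy = [
        [signal * value + sigma * eps for value, eps in zip(pose, pose_noise)]
        for pose, pose_noise in zip(poses, noise)
    ]
    return noisy, noise


# Four pieces of a 2x2 puzzle, each encoded as (x, y, cos θ, sin θ), unrotated.
poses = [
    [-0.5, -0.5, 1.0, 0.0],
    [0.5, -0.5, 1.0, 0.0],
    [-0.5, 0.5, 1.0, 0.0],
    [0.5, 0.5, 1.0, 0.0],
]
edges = complete_graph_edges(len(poses))
alpha_bar = linear_alpha_bar(1000)
noisy_poses, added_noise = forward_noise(poses, 500, alpha_bar, random.Random(0))
```

During training, a denoising network would receive `noisy_poses`, the timestep, and the graph in `edges`, and would be optimized to recover the clean poses or the added noise. Which of the two targets the paper uses is not stated in this summary.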
**Experimental Evaluation:**
- **3D Object Reassembly:** DiffAssemble outperforms baselines on metrics such as rotation RMSE, translation RMSE, and part accuracy.
- **2D Jigsaw Puzzle:** DiffAssemble achieves SOTA results on CelebA and performs well on WikiArt, even with missing pieces.
- **Scaling to Larger Graphs:** DiffAssemble can handle up to 900 elements with minimal memory usage and significant speed improvements over optimization methods.
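
To make the evaluation metrics concrete, the sketch below computes an RMSE over predicted pose vectors and a simple per-part accuracy. It is an illustrative stand-in under stated assumptions: the function names are hypothetical, and the accuracy criterion used here (a part counts as correct when its Euclidean error falls below a threshold) is a simplification chosen for the example, not necessarily the definition used in the paper's benchmarks.

```python
import math


def rmse(predicted, target):
    """Root mean squared error over all coordinates of all parts."""
    squared = [
        (p - t) ** 2
        for pred_vec, target_vec in zip(predicted, target)
        for p, t in zip(pred_vec, target_vec)
    ]
    return math.sqrt(sum(squared) / len(squared))


def part_accuracy(predicted, target, threshold=0.01):
    """Fraction of parts whose Euclidean error is below `threshold` (assumed criterion)."""
    correct = sum(
        1
        for pred_vec, target_vec in zip(predicted, target)
        if math.dist(pred_vec, target_vec) < threshold
    )
    return correct / len(target)


# Two parts with 3D translations: the first is placed exactly, the second is off along z.
target_translations = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
predicted_translations = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.3]]

translation_rmse = rmse(predicted_translations, target_translations)
accuracy = part_accuracy(predicted_translations, target_translations)
```

The same `rmse` helper could be applied to rotations expressed as angle vectors, provided predictions and targets use the same convention and units.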
**Conclusion:**
DiffAssemble is a powerful framework for reassembly tasks, demonstrating