29 Feb 2024 | Gianluca Scarpellini, Stefano Fiorini, Francesco Giuliani, Pietro Morerio, Alessio Del Bue
DiffAssemble is a unified graph diffusion model for 2D and 3D reassembly tasks. The paper introduces a graph neural network (GNN) architecture that learns reassembly through a diffusion model formulation. The model treats the elements of a set, whether 2D patches or 3D object fragments, as nodes in a spatial graph. It is trained by adding noise to the position and rotation of each element and learning to iteratively denoise them to recover the initial pose. DiffAssemble achieves state-of-the-art results in most 2D and 3D reassembly tasks and is the first learning-based approach to solve 2D puzzles for both rotation and translation. It also substantially reduces run-time, running 11 times faster than the quickest optimization-based puzzle solver.
The paper argues that solving 2D jigsaw puzzles and reassembling 3D objects are two facets of the same problem: reassembly. These tasks share common properties and can potentially share solutions, yet methods designed for only one of them are too narrow to generalize to the others. DiffAssemble is a general framework for reassembly built on graph representations and a diffusion model formulation. Unlike prior learning-based approaches, which typically solve the problem in a single step, DiffAssemble uses a multi-step strategy guided by diffusion probabilistic models (DPMs).
The elements to be reassembled are represented as a graph, which lets the model handle an arbitrary number of pieces. Each piece is a node that stores its visual appearance, extracted with an equivariant encoder, together with its position and orientation. Because appearance is mapped to a latent space, the same model applies to both 2D and 3D tasks, removing the separation between them and yielding a single, unified solution.
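A minimal sketch of this node representation, assuming each piece already has a precomputed appearance vector (the equivariant encoder itself is not reproduced here). `PieceNode` and `build_complete_graph` are hypothetical names, and the rotation encodings (one angle in 2D, a quaternion in 3D) are illustrative assumptions, not necessarily the paper's choices. The point is that 2D patches and 3D fragments fit the same node schema once appearance is a latent vector.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Tuple


@dataclass
class PieceNode:
    # Latent appearance vector. The paper extracts it with an equivariant
    # encoder; this sketch simply takes a precomputed vector (placeholder).
    appearance: List[float]
    # Position: 2 coordinates for a 2D patch, 3 for a 3D fragment.
    position: Tuple[float, ...]
    # Orientation, e.g. one angle in 2D or a unit quaternion in 3D
    # (representation assumed for illustration only).
    rotation: Tuple[float, ...]


def build_complete_graph(nodes: List[PieceNode]) -> List[Tuple[int, int]]:
    """Connect every piece to every other piece with directed edges (i, j), i != j."""
    dims = {(len(n.appearance), len(n.position), len(n.rotation)) for n in nodes}
    if len(dims) > 1:
        raise ValueError("all nodes in one graph must share feature dimensions")
    return list(permutations(range(len(nodes)), 2))


if __name__ == "__main__":
    # Three hypothetical 2D patches with 4-dim appearance embeddings.
    pieces = [
        PieceNode([0.1, 0.4, 0.2, 0.9], (0.0, 0.0), (0.0,)),
        PieceNode([0.3, 0.1, 0.7, 0.5], (1.0, 0.0), (1.57,)),
        PieceNode([0.8, 0.6, 0.0, 0.2], (0.0, 1.0), (3.14,)),
    ]
    edges = build_complete_graph(pieces)
    print(len(edges))  # 6 directed edges for 3 pieces
```

With n pieces, a complete graph has n(n-1) directed edges, so 900 pieces would already require 809,100 edges. This quadratic growth is what makes memory a concern for large puzzles.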
The learning problem follows the diffusion probabilistic model (DPM) formulation. Gaussian noise is iteratively added to each piece's original position and orientation until the pieces are randomly placed in Euclidean space. An attention-based graph neural network is then trained to reverse this noising process, recovering the pieces' original poses from random starting positions and orientations. A sparsifying mechanism applied to the graph allows DiffAssemble to run on graphs with up to 900 nodes with minimal loss in accuracy while greatly reducing memory requirements.
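The noising step can be sketched with the standard DDPM closed form, which is equivalent to applying the per-step Gaussian noise iteratively: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. This is a minimal sketch under several assumptions not taken from the paper. The linear schedule (1e-4 to 0.02 over 1000 steps) is the classic DDPM default. Each pose is a flat vector such as (x, y, cos theta, sin theta). `sparse_random_edges` is a hypothetical random k-neighbour stand-in for the paper's sparsifying mechanism, not a reproduction of it. The attention-based GNN that predicts the noise is omitted.

```python
import math
import random
from typing import List, Sequence, Tuple


def linear_alpha_bars(num_steps: int, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> List[float]:
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars: List[float] = []
    prod = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0: Sequence[float], alpha_bar: float,
             rng: random.Random) -> Tuple[List[float], List[float]]:
    """Draw x_t ~ N(sqrt(alpha_bar) * x0, (1 - alpha_bar) * I) in one shot.

    Returns the noisy pose and the Gaussian noise eps used to produce it;
    a denoiser is typically trained to predict eps from x_t.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
          for x, e in zip(x0, eps)]
    return xt, eps


def sparse_random_edges(num_nodes: int, k: int,
                        rng: random.Random) -> List[Tuple[int, int]]:
    """Hypothetical sparsifier: each node keeps k random outgoing neighbours."""
    edges: List[Tuple[int, int]] = []
    for i in range(num_nodes):
        others = [j for j in range(num_nodes) if j != i]
        for j in rng.sample(others, min(k, len(others))):
            edges.append((i, j))
    return edges


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = linear_alpha_bars(1000)
    # Hypothetical 2D pose: (x, y, cos theta, sin theta).
    pose = [0.5, -0.25, 1.0, 0.0]
    for t in (0, 500, 999):
        noisy, _ = q_sample(pose, alpha_bars[t], rng)
        print(t, round(alpha_bars[t], 4), [round(v, 2) for v in noisy])
    print(len(sparse_random_edges(900, 8, rng)))  # 7200 edges vs 809100 dense
```

At small t the sampled pose stays close to the original; at the last step alpha_bar is near zero and the pose is essentially pure Gaussian noise, which is the kind of random starting configuration the learned reverse process starts from.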
DiffAssemble achieves state-of-the-art performance in most 2D and 3D tasks, showing that these tasks share common characteristics and can therefore be solved with a unified approach. In 2D, the model is more robust to missing pieces than optimization-based solutions and much faster. In 3D, it achieves state-of-the-art results in both rotation and translation accuracy without trading one off against the other. Across the experiments on 3D object reassembly and 2D jigsaw puzzles, DiffAssemble outperforms existing methods on most metrics.