GridFormer: Point-Grid Transformer for Surface Reconstruction
**Authors:** Shengtao Li, Ge Gao, Yudong Liu, Yu-Shen Liu, Ming Gu
**Affiliations:** Beijing National Research Center for Information Science and Technology (BNRist), Tsinghua University; School of Software, Tsinghua University
**Abstract:**
Implicit neural networks have emerged as a crucial technology in 3D surface reconstruction. To reconstruct continuous surfaces from discrete point clouds, a common approach is to encode the input points into regular grid features (planes or volumes). However, these methods typically use the grid only as an index for uniformly scattering point features, trading reconstruction detail for efficiency. To address this, we introduce the Point-Grid Transformer (GridFormer), a novel and efficient attention mechanism that treats the grid as a transfer point connecting continuous space with the point cloud. Our method maximizes the spatial expressiveness of grid features while maintaining computational efficiency. To further enhance precision, we propose a boundary optimization strategy that combines a margin binary cross-entropy loss with boundary sampling. Experiments validate that GridFormer outperforms state-of-the-art approaches on widely used benchmarks, producing more precise geometry reconstructions.
**Introduction:**
Surface reconstruction plays a vital role in converting discrete point clouds into continuous surface representations. Learning-based approaches have gained popularity, but bridging the gap between continuous space and discrete point clouds remains challenging. Regular grid features capture information uniformly but may overlook shape details, while irregular point features represent 3D shapes more faithfully but are harder to relate to continuous space. GridFormer addresses this trade-off with point-grid attention, which models grid features by letting the network learn the relationship between the input points and the grid.
**Method:**
GridFormer constructs a continuous occupancy function with a point-grid transformer layer. It first learns per-point features and initializes grid features by uniformly scattering those point features onto the grid. A U-Net-style network built from this layer takes both point and grid features as input. Each point-grid transformer layer comprises position encoding, point feature aggregation, and grid feature aggregation; a sketch of the attention step follows below. The multi-resolution decoder interpolates grid features at query points to predict occupancy, and boundary optimization sharpens predictions near the surface.
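To make the layer concrete, here is a minimal PyTorch sketch of what a point-grid cross-attention step could look like: grid vertices act as queries and aggregate features from the input points, with a relative position encoding injected into both keys and values. All names here (`PointGridAttention`, `dim`, the single-head formulation) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PointGridAttention(nn.Module):
    """Illustrative single-head cross-attention: grid vertices (queries)
    aggregate features from input points (keys/values). A hypothetical
    sketch of the point-grid attention idea, not the paper's code."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)       # grid features -> queries
        self.to_kv = nn.Linear(dim, 2 * dim)  # point features -> keys, values
        self.pos = nn.Sequential(             # relative position encoding
            nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim)
        )
        self.scale = dim ** -0.5

    def forward(self, grid_feat, grid_xyz, point_feat, point_xyz):
        # grid_feat: (G, C), grid_xyz: (G, 3)
        # point_feat: (N, C), point_xyz: (N, 3)
        q = self.to_q(grid_feat)                            # (G, C)
        k, v = self.to_kv(point_feat).chunk(2, dim=-1)      # (N, C) each
        rel = self.pos(grid_xyz[:, None, :] - point_xyz[None, :, :])  # (G, N, C)
        attn = torch.einsum('gc,gnc->gn', q, k[None] + rel) * self.scale
        w = attn.softmax(dim=-1)                            # weights over points
        out = torch.einsum('gn,gnc->gc', w, v[None] + rel)  # (G, C)
        return grid_feat + out                              # residual update

layer = PointGridAttention(dim=32)
grid = layer(torch.randn(64, 32), torch.rand(64, 3),
             torch.randn(300, 32), torch.rand(300, 3))      # (64, 32)
```

Note that this dense form attends every grid vertex to every point for clarity; an efficient implementation would restrict each vertex to the points in its local neighborhood.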
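The boundary optimization can be read as two pieces: a binary cross-entropy loss with a margin, and extra query samples drawn near the surface. One plausible realization is sketched below; both definitions are assumptions for illustration (the paper's exact margin formulation and noise model may differ), and `margin_bce` and `boundary_samples` are hypothetical names.

```python
import torch
import torch.nn.functional as F

def margin_bce(logits, occ, margin: float = 0.1):
    # Relax hard occupancy labels {0, 1} to {margin, 1 - margin}:
    # the loss is minimized at the relaxed target, so predictions
    # already within the margin are no longer pushed into saturation.
    # Assumed form of the margin BCE, not the paper's exact definition.
    target = occ * (1.0 - margin) + (1.0 - occ) * margin
    return F.binary_cross_entropy_with_logits(logits, target)

def boundary_samples(surface_pts, n: int, sigma: float = 0.01):
    # Assumed boundary sampling: jitter sampled surface points with
    # small Gaussian noise so queries concentrate near the boundary,
    # where the occupancy function changes sign.
    idx = torch.randint(len(surface_pts), (n,))
    return surface_pts[idx] + sigma * torch.randn(n, 3)
```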
**Experiments:**
GridFormer is evaluated on the ShapeNet and Synthetic Rooms datasets. Results show that it outperforms point-based and grid-based methods in terms of reconstruction quality and efficiency. Ablation studies validate the effectiveness of the point-grid attention and boundary optimization.
**Conclusion:**
GridFormer introduces a novel point-grid attention mechanism that improves both object-level and scene-level reconstruction. It achieves smoother surfaces on unseen datasets and reduces the error between the estimated and ground-truth occupancy functions. Future work will explore dynamic grid division for more scenarios.