This paper addresses the challenge of estimating surface normals from a single RGB image, a task that is not affected by scale ambiguity and has a compact output space. The authors propose a method that leverages per-pixel ray direction and encodes the relationship between neighboring surface normals by learning their relative rotation. This approach allows for more accurate and piecewise smooth predictions, even for challenging in-the-wild images with arbitrary resolution and aspect ratios. Compared to a recent state-of-the-art model, the proposed method shows stronger generalization ability despite being trained on a much smaller dataset. The key contributions include:
1. **Per-Pixel Ray Direction**: The method uses dense pixel-wise ray direction as input to the network, enabling camera intrinsics-aware inference and improving generalization.
2. **Ray Direction-Based Activation**: A new activation function is introduced to ensure that the predicted normal is visible, i.e., the angle between the ray direction and the normal is greater than 90°.
3. **Rotation Estimation**: Surface normal estimation is recast as rotation estimation: the relative rotation between the normals of neighboring pixels is predicted in an axis-angle representation. This yields piecewise smooth predictions that remain crisp at object boundaries.
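The per-pixel ray direction input (contribution 1) can be sketched as follows: given pinhole intrinsics, each pixel is back-projected to a unit direction in camera space. This is an illustrative reconstruction under a standard pinhole model, including an assumed half-pixel centre offset, not the authors' exact implementation.

```python
import numpy as np

def pixel_ray_directions(H, W, fx, fy, cx, cy):
    """Unit ray direction for every pixel of an H x W image.

    Sketch of the camera-intrinsics-aware input map described in the
    summary; the half-pixel offset is an assumption of this sketch.
    Returns an (H, W, 3) array of unit vectors.
    """
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # per-pixel coordinates
    # Back-project pixel centres onto the z = 1 plane in camera space.
    rays = np.stack(
        [(u + 0.5 - cx) / fx,
         (v + 0.5 - cy) / fy,
         np.ones_like(u, dtype=float)],
        axis=-1,
    )
    # Normalize each ray to unit length.
    return rays / np.linalg.norm(rays, axis=-1, keepdims=True)
```

Feeding this map alongside the RGB image is what makes inference aware of the camera intrinsics, so the same network can handle arbitrary resolutions and aspect ratios.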
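The visibility constraint behind contribution 2 (the angle between the ray and the normal must exceed 90°, i.e. their dot product must be negative) can be illustrated with a minimal sketch: decompose the raw prediction along the ray and push the along-ray component past the 90° boundary. This is a hypothetical construction for exposition; the paper's actual activation function may differ.

```python
import numpy as np

def visibility_activation(n_raw, ray, eps=1e-2):
    """Map an unconstrained 3-vector to a unit normal facing the camera.

    Illustrative sketch (not the paper's exact activation): if the
    normal's component along the viewing ray is not sufficiently
    negative, subtract enough of the ray direction to make it so,
    then renormalize. The margin `eps` is an assumption.
    """
    ray = ray / np.linalg.norm(ray)
    n = n_raw / (np.linalg.norm(n_raw) + 1e-12)
    along = np.dot(n, ray)  # cos of the angle between normal and ray
    if along > -eps:  # normal would face away from the camera
        n = n - (along + eps) * ray  # force the dot product to -eps
        n = n / np.linalg.norm(n)
    return n
```

After this mapping, every output satisfies the visibility condition by construction, so the network never wastes capacity on physically impossible normals.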
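The axis-angle idea in contribution 3 amounts to expressing one pixel's normal as a rotation of its neighbor's. A rotation given by a unit axis and an angle can be applied with Rodrigues' formula, sketched below; how the network parameterizes and predicts these rotations is not specified here, so the surrounding details are assumptions.

```python
import numpy as np

def rotate_axis_angle(n, axis, angle):
    """Rotate vector n about unit `axis` by `angle` radians.

    Rodrigues' rotation formula; shown to illustrate how a neighboring
    pixel's normal can be obtained from the current one via a predicted
    relative rotation (an exposition aid, not the authors' exact code).
    """
    axis = axis / np.linalg.norm(axis)
    return (n * np.cos(angle)
            + np.cross(axis, n) * np.sin(angle)
            + axis * np.dot(axis, n) * (1.0 - np.cos(angle)))
```

Predicting relative rotations rather than absolute normals encourages near-identity rotations on smooth surfaces and large rotations only across object boundaries, which is what produces piecewise smooth yet crisp outputs.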
The proposed method is evaluated on several datasets, including indoor scenes, dynamic outdoor scenes, and in-the-wild images, demonstrating superior performance in terms of accuracy and detail. The code for the method is available at <https://github.com/baegwangbin/DSINE>.