27 Sep 2015 | Hang Su, Subhransu Maji, Evangelos Kalogerakis, Erik Learned-Miller
The paper "Multi-view Convolutional Neural Networks for 3D Shape Recognition" by Hang Su, Subhransu Maji, Evangelos Kalogerakis, and Erik Learned-Miller explores the use of 2D image renderings of 3D shapes for recognition tasks. The authors address the question of whether 3D shapes should be represented using native 3D formats (such as voxel grids or polygon meshes) or view-based descriptors. They present a standard CNN architecture trained to recognize 3D shapes from their rendered views, showing that a single view can achieve high accuracy, even surpassing state-of-the-art 3D shape descriptors.
The paper introduces a novel multi-view CNN architecture that combines information from multiple views into a single, compact shape descriptor, further improving recognition performance. This architecture is also applied to recognize human hand-drawn sketches of shapes. The authors conclude that 2D views of 3D shapes are highly informative for recognition tasks and are well-suited for emerging CNN architectures and their derivatives. The paper includes experiments on the ModelNet40 dataset, demonstrating superior performance compared to existing 3D shape descriptors and Fisher vectors. Additionally, the multi-view CNN is shown to be effective in sketch-based 3D shape retrieval, achieving high accuracy in recognizing 3D objects from hand-drawn sketches.
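The step of combining several views into a single compact descriptor can be illustrated with a small sketch. It assumes the aggregation is an element-wise maximum across the per-view feature vectors, which is how the paper's view-pooling layer is usually described. The function name `view_pool` and the feature values are hypothetical; in practice each vector would come from a CNN applied to one rendered view.

```python
from typing import List, Sequence


def view_pool(view_features: Sequence[Sequence[float]]) -> List[float]:
    """Merge per-view feature vectors into one descriptor.

    Assumption: aggregation is an element-wise max across views.
    """
    if not view_features:
        raise ValueError("need at least one view")
    dim = len(view_features[0])
    if any(len(v) != dim for v in view_features):
        raise ValueError("all views must have the same feature dimension")
    # zip(*...) groups the i-th feature of every view together.
    return [max(column) for column in zip(*view_features)]


# Three hypothetical views, each with a 4-dimensional feature vector.
views = [
    [0.1, 0.9, 0.0, 0.3],
    [0.4, 0.2, 0.0, 0.8],
    [0.2, 0.5, 0.7, 0.1],
]
descriptor = view_pool(views)
print(descriptor)  # [0.4, 0.9, 0.7, 0.8]
```

The descriptor has the same length as a single view's feature vector regardless of how many views are rendered, and the result does not depend on the order in which the views are supplied.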