4 Mar 2024 | Zhishan Zhou, Shihao Zhou, Zhi Lv, Minqiang Zou, Yao Tang, Jiajun Liang
This paper proposes a simple baseline for efficient hand mesh reconstruction that decomposes the mesh decoder into two components: a token generator and a mesh regressor. Extensive ablation experiments identify the core structure of each: the token generator should select discriminative and representative points, while the mesh regressor should upsample sparse keypoints into a dense mesh over multiple stages. With these two structures in place, the method achieves high accuracy with minimal computational resources, outperforming state-of-the-art methods by a large margin while remaining real-time. On the FreiHAND dataset it achieves a PA-MPJPE of 5.8 mm and a PA-MPVPE of 6.1 mm; on the DexYCB dataset it achieves a PA-MPJPE of 5.5 mm and a PA-MPVPE of 5.5 mm. It reaches up to 33 fps with HRNet and 70 fps with FastViT-MA36, and its performance is consistent across the backbones and datasets tested.
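The multi-stage upsampling performed by the mesh regressor can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed interpolation weights stand in for what would be learned layers in a trained model, the helper names (`make_upsample_weights`, `upsample`, `regress_mesh`) are hypothetical, and the intermediate stage sizes are made up for illustration. The endpoints 21 and 778 are assumed here because they match the keypoint and vertex counts of the commonly used MANO hand model.

```python
from typing import List, Sequence

Point = List[float]  # one 3D point: [x, y, z]


def make_upsample_weights(n_in: int, n_out: int) -> List[List[float]]:
    """Build a hypothetical (n_out x n_in) weight matrix whose rows sum to 1.

    Each output point blends two neighbouring input points. A trained
    regressor would learn these weights; here they are fixed by hand.
    """
    weights = []
    for i in range(n_out):
        # Map output index i to a fractional position among the inputs.
        pos = i * (n_in - 1) / max(n_out - 1, 1)
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        row = [0.0] * n_in
        row[lo] += 1.0 - frac
        row[hi] += frac
        weights.append(row)
    return weights


def upsample(points: List[Point], weights: List[List[float]]) -> List[Point]:
    """Apply one upsampling stage: out[i] = sum_j weights[i][j] * points[j]."""
    return [
        [sum(w * p[axis] for w, p in zip(row, points)) for axis in range(3)]
        for row in weights
    ]


def regress_mesh(
    keypoints: List[Point],
    stage_sizes: Sequence[int] = (21, 98, 389, 778),  # intermediate sizes are assumptions
) -> List[Point]:
    """Upsample sparse keypoints to a dense vertex set in several stages."""
    assert len(keypoints) == stage_sizes[0], "first stage size must match the input"
    points = keypoints
    for n_in, n_out in zip(stage_sizes, stage_sizes[1:]):
        points = upsample(points, make_upsample_weights(n_in, n_out))
    return points


# Toy input: 21 keypoints laid out along the x-axis.
keypoints = [[float(i), 0.0, 0.0] for i in range(21)]
mesh = regress_mesh(keypoints)
```

Splitting the upsampling into several smaller steps means each stage only has to relate point sets of similar density, which is one plausible reading of why the ablations favour a multi-stage regressor over a single large projection.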