26 Jun 2024 | Yunze Man, Liang-Yan Gui, Yu-Xiong Wang
The paper "Situation Awareness Matters in 3D Vision Language Reasoning" by Yunze Man, Liang-Yan Gui, and Yu-Xiong Wang of the University of Illinois Urbana-Champaign addresses a critical challenge in 3D vision-language reasoning: situational awareness, meaning the ability to ground a textual description of the agent's situation in the 3D scene before reasoning about it. The authors introduce SIG3D, an end-to-end model that grounds situational descriptions in 3D space and then enhances visual tokens based on the agent's estimated position. SIG3D tokenizes 3D scenes into sparse voxel representations and combines a language-grounded situation estimator with a situated question answering module. Experiments on the SQA3D and ScanQA datasets show that SIG3D outperforms state-of-the-art models in both situation estimation and question answering, improving situation estimation accuracy by over 30%.
The paper also includes a pilot study that highlights the importance of situational understanding for downstream reasoning tasks, along with a detailed analysis of the model's architectural choices and performance. The project page is available at <https://yunzeman.github.io/situation3d>.
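To make two ideas from the summary more concrete, sparse voxel tokenization and re-expressing scene tokens relative to an estimated agent position, here is a minimal Python sketch. It is not SIG3D's implementation. The per-voxel centroids stand in for the learned voxel features a real model would use. The pose convention (yaw in radians, counterclockwise about the z axis from +x) and the function names `sparse_voxelize`, `to_agent_frame`, and `situate_tokens` are hypothetical choices made for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def sparse_voxelize(points: List[Point], voxel_size: float) -> Dict[Voxel, Point]:
    """Bucket points into voxels and return the centroid of each occupied voxel.

    Only occupied voxels get an entry, which is what makes the representation
    sparse. The centroid is a stand-in for a learned voxel feature.
    """
    buckets: Dict[Voxel, List[Point]] = defaultdict(list)
    for x, y, z in points:
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        buckets[key].append((x, y, z))

    centroids: Dict[Voxel, Point] = {}
    for key, pts in buckets.items():
        n = len(pts)
        centroids[key] = (
            sum(p[0] for p in pts) / n,
            sum(p[1] for p in pts) / n,
            sum(p[2] for p in pts) / n,
        )
    return centroids


def to_agent_frame(point: Point, agent_pos: Point, agent_yaw: float) -> Point:
    """Express a world-frame point in a hypothetical agent frame.

    Translate so the agent sits at the origin, then rotate by -yaw about z,
    so the agent's facing direction becomes +x and its left becomes +y.
    """
    dx = point[0] - agent_pos[0]
    dy = point[1] - agent_pos[1]
    dz = point[2] - agent_pos[2]
    c, s = math.cos(-agent_yaw), math.sin(-agent_yaw)
    return (c * dx - s * dy, s * dx + c * dy, dz)


def situate_tokens(
    voxels: Dict[Voxel, Point], agent_pos: Point, agent_yaw: float
) -> Dict[Voxel, Point]:
    """Attach agent-relative coordinates to every voxel token."""
    return {
        key: to_agent_frame(centroid, agent_pos, agent_yaw)
        for key, centroid in voxels.items()
    }
```

For example, with `voxel_size=0.5` the points (0.05, 0.05, 0.0) and (0.15, 0.05, 0.0) share voxel (0, 0, 0) while (1.2, 0.0, 0.0) lands in voxel (2, 0, 0), so only two tokens are kept. For an agent at the origin facing +y (`agent_yaw = math.pi / 2`), a point at world (1, 0, 0) maps to roughly (0, -1, 0) in the agent frame, i.e. to the agent's right. Once the agent's position is known, the same object yields different token coordinates depending on where the agent stands, which is the kind of situation dependence the summary describes.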