NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding

10 Jun 2019 | Jun Liu, Amir Shahroudy, Mauricio Perez, Gang Wang, Ling-Yu Duan, and Alex C. Kot
The paper introduces the NTU RGB+D 120 dataset, a large-scale benchmark for 3D human activity understanding. The dataset comprises over 114,000 video samples and 8 million frames, collected from 106 distinct subjects and covering 120 action classes spanning daily, mutual, and health-related activities. It addresses several limitations of existing benchmarks, including the lack of large-scale training samples, realistic class categories, diverse camera views, varied environmental conditions, and a wide range of human subjects. The authors evaluate state-of-the-art 3D activity analysis methods on this dataset, demonstrating the effectiveness of deep learning techniques. They also propose a novel one-shot 3D activity recognition framework, the Action-Part Semantic Relevance-aware (APSR) framework, which improves recognition of novel action classes by emphasizing the body parts that are semantically relevant to the action, and show that it achieves promising results. The paper concludes by highlighting the dataset's potential to advance research in 3D human activity understanding and the application of data-hungry learning techniques.
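To make the APSR idea concrete, below is a minimal sketch of semantic-relevance-weighted one-shot matching. It is not the paper's implementation: the toy word vectors, the body-part list, the softmax weighting, and the randomly generated per-part features are all illustrative assumptions standing in for pretrained word embeddings and a learned skeleton encoder.

```python
import numpy as np

# Toy word vectors standing in for pretrained semantic word embeddings.
# These 4-D values are made up purely for illustration.
WORD_VECS = {
    "kick":  np.array([0.0, 0.1, 0.9, 0.3]),
    "hand":  np.array([0.8, 0.2, 0.1, 0.1]),
    "leg":   np.array([0.1, 0.0, 0.8, 0.2]),
    "torso": np.array([0.2, 0.5, 0.2, 0.5]),
}

BODY_PARTS = ["hand", "leg", "torso"]  # simplified; real skeletons use more parts

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def part_relevance(action_word):
    """Semantic relevance of each body part to the action word,
    normalized with a softmax so the weights sum to 1 (an assumed
    normalization, not necessarily the paper's)."""
    sims = np.array([cosine(WORD_VECS[action_word], WORD_VECS[p])
                     for p in BODY_PARTS])
    exp = np.exp(sims)
    return exp / exp.sum()

def one_shot_score(query_parts, exemplar_parts, weights):
    """Relevance-weighted similarity between per-part features of a query
    sequence and the single exemplar of a novel action class."""
    sims = [cosine(q, e) for q, e in zip(query_parts, exemplar_parts)]
    return float(np.dot(weights, sims))

# Hypothetical per-part features; in practice these would come from a
# trained skeleton-based feature extractor.
rng = np.random.default_rng(0)
query    = [rng.normal(size=8) for _ in BODY_PARTS]
exemplar = [rng.normal(size=8) for _ in BODY_PARTS]

w = part_relevance("kick")  # "kick" should upweight the leg
print("part weights:", dict(zip(BODY_PARTS, w.round(3))))
print("one-shot score:", round(one_shot_score(query, exemplar, w), 3))
```

The design point this illustrates is that for a novel class seen only once, the class name itself carries useful prior knowledge: weighting the body parts most relevant to the action word lets the matcher ignore uninformative parts.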