Jun 2011 | Heng Wang, Alexander Kläser, Cordelia Schmid, Cheng-Lin Liu
This paper introduces a method for action recognition using dense trajectories. The approach samples dense points from each frame and tracks them using displacement information from a dense optical flow field. The trajectories are robust to fast irregular motions and shot boundaries, and effectively capture motion information in videos. A novel descriptor based on motion boundary histograms is introduced, which is robust to camera motion and outperforms other state-of-the-art descriptors, especially in uncontrolled realistic videos. The video description is evaluated in the context of action classification using a bag-of-features approach. Experimental results show significant improvements over the state of the art on four datasets: KTH, YouTube, Hollywood2, and UCF sports.
The method involves extracting dense trajectories by tracking densely sampled points using optical flow fields. The number of tracked points can be scaled up easily, and global smoothness constraints are imposed among the points in dense optical flow fields, resulting in more robust trajectories. The shape of a trajectory encodes local motion patterns, and the trajectory is described by a sequence of displacement vectors. The resulting vector is normalized by the sum of the magnitudes of the displacement vectors.
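The tracking and normalization steps above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: the lookup uses the nearest pixel of a precomputed flow field, whereas the paper median-filters the flow and also prunes static or erratically drifting trajectories. All function names here are illustrative.

```python
import numpy as np

def track_points(points, flows):
    """Track points through a sequence of dense optical flow fields.

    points: (P, 2) array of (x, y) positions in the first frame.
    flows:  list of (H, W, 2) flow fields, one per frame transition.
    Returns an (L+1, P, 2) array of tracked positions.
    """
    trajs = [points.astype(np.float64)]
    p = points.astype(np.float64)
    for flow in flows:
        # Displacement from the nearest pixel of the dense flow field
        # (the paper applies a median filter here; omitted in this sketch).
        xi = np.clip(np.round(p[:, 0]).astype(int), 0, flow.shape[1] - 1)
        yi = np.clip(np.round(p[:, 1]).astype(int), 0, flow.shape[0] - 1)
        p = p + flow[yi, xi]
        trajs.append(p.copy())
    return np.stack(trajs)

def trajectory_descriptor(traj):
    """Trajectory shape descriptor: the sequence of displacement vectors,
    normalized by the sum of displacement magnitudes."""
    disp = np.diff(traj, axis=0)                  # (L, 2) displacements
    total = np.sum(np.linalg.norm(disp, axis=1))  # sum of magnitudes
    return (disp / total).ravel()
```

For a point moving one pixel per frame along x, every normalized displacement comes out as (1/L, 0), so the descriptor depends only on the shape of the motion, not its overall speed.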
Trajectory-aligned descriptors are computed within a space-time volume around the trajectory. The size of the volume is N x N pixels and L frames. The volume is subdivided into a spatio-temporal grid of size nσ x nσ x nτ. The default parameters for our experiments are N = 32, nσ = 2, nτ = 3, with trajectories of length L = 15 frames. The descriptors include HOG, HOF, and MBH, which are normalized with their L2 norm. The MBH descriptor separates the optical flow field into its x and y components and encodes the gradient of each component, i.e. the relative motion between pixels. This descriptor is effective in suppressing constant background motion and highlighting motion boundaries around the foreground.
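The core of MBH can be sketched as follows: treat each flow component as a grayscale image, take its spatial gradient, and accumulate magnitude-weighted orientation histograms. This sketch computes one histogram per component over the whole patch; the actual descriptor repeats this per cell of the nσ x nσ x nτ grid before L2 normalization, and the function name and bin count are illustrative.

```python
import numpy as np

def mbh(flow, nbins=8):
    """Motion boundary histogram sketch for a single flow patch.

    flow: (H, W, 2) optical flow patch.
    Returns the concatenation of an L2-normalized gradient-orientation
    histogram for the x component (MBHx) and the y component (MBHy).
    """
    descs = []
    for comp in (flow[..., 0], flow[..., 1]):
        gy, gx = np.gradient(comp)           # spatial derivatives of the component
        mag = np.hypot(gx, gy)               # gradient magnitude (the vote weight)
        ang = np.arctan2(gy, gx) % (2 * np.pi)
        bins = np.minimum((ang / (2 * np.pi) * nbins).astype(int), nbins - 1)
        hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=nbins)
        norm = np.linalg.norm(hist)
        descs.append(hist / norm if norm > 0 else hist)
    return np.concatenate(descs)             # MBHx followed by MBHy
```

Because the descriptor is built from flow *derivatives*, a constant translation of the whole patch (e.g. a smooth camera pan) produces zero gradients and thus an empty histogram, which is exactly why MBH suppresses camera motion.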
The method is evaluated on four standard action datasets: KTH, YouTube, Hollywood2, and UCF sports. The results show that the dense trajectories outperform the KLT trajectories by 2% to 6%. The MBH descriptor consistently outperforms other descriptors on all four datasets, especially on uncontrolled realistic datasets. The method is also compared to state-of-the-art methods, and it is shown to outperform them on several datasets. The results demonstrate the effectiveness of the dense trajectories and the MBH descriptor in action recognition.
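The bag-of-features step that turns local descriptors into a video-level feature can be sketched as simple vector quantization against a visual codebook. This is a hedged illustration: the function name and toy sizes are made up here, while the paper trains codebooks with k-means (4,000 visual words per descriptor type) and classifies the resulting histograms with a χ²-kernel SVM.

```python
import numpy as np

def bof_histogram(descriptors, codebook):
    """Quantize local descriptors against a visual codebook and return
    an L1-normalized word-occurrence histogram for the whole video.

    descriptors: (D, F) array of local descriptors.
    codebook:    (K, F) array of visual words (e.g. k-means centers).
    """
    # Squared Euclidean distance from every descriptor to every codeword.
    dist = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = dist.argmin(axis=1)                       # nearest visual word
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()                          # L1-normalized histogram
```

One such histogram is computed per descriptor channel (trajectory shape, HOG, HOF, MBH), and the channels are combined in the kernel of the SVM.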