On Space-Time Interest Points

This paper proposes a method for detecting spatio-temporal interest points in video data, extending the concept of spatial interest points to the spatio-temporal domain. The method builds on the Harris and Förstner interest point operators and detects local structures in space-time where the image values vary significantly in both space and time. It then estimates the spatio-temporal extents of the detected events and computes scale-invariant spatio-temporal descriptors. These descriptors are used to classify events and to construct video representations in terms of labeled space-time points. The method is illustrated on the problem of human motion analysis, where it allows walking people to be detected in scenes with occlusions and dynamic backgrounds.

The paper describes interest point detection in both the spatial and the spatio-temporal domain. In the spatial domain, interest points are detected at locations with significant variations in image intensity. In the spatio-temporal domain, they are detected where the image values vary significantly along the temporal dimension as well as the spatial ones. The method uses a spatio-temporal scale-space representation and selects scales that correspond to the spatial size of detected events and to their duration in time. Events are classified using k-means clustering on point descriptors defined by local spatio-temporal image derivatives.

The results show that the method is effective for video interpretation, particularly for detecting walking people in complex scenes with occlusions and dynamic backgrounds. The method is also shown to be robust to variations in size and background, and it provides a stable and efficient representation of video data for interpretation.
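The detection step summarized above can be sketched in a few lines. This is a minimal illustration at one fixed pair of scales, not the authors' implementation: it omits the scale selection and the descriptors entirely. The function names, the parameters `sigma`, `tau` and `s`, and the thresholding rule are all assumptions made for the example. The response `det(mu) - k * trace(mu)**3` with `k = 0.005` is the form commonly given for the space-time extension of the Harris operator; the summary does not state it, so treat it as an assumption too.

```python
import numpy as np
from scipy import ndimage


def spacetime_harris(video, sigma=2.0, tau=1.5, s=2.0, k=0.005):
    """Harris-style response for a grayscale video of shape (T, Y, X).

    sigma / tau are spatial / temporal smoothing scales (std. deviations);
    s and k are illustrative defaults, not values taken from the summary.
    """
    f = video.astype(np.float64)
    # Scale-space smoothing: tau along time, sigma along both spatial axes.
    L = ndimage.gaussian_filter(f, sigma=(tau, sigma, sigma))
    # First-order derivatives; np.gradient returns them in axis order t, y, x.
    Lt, Ly, Lx = np.gradient(L)
    grads = (Lx, Ly, Lt)
    # Second-moment matrix: derivative products averaged with a Gaussian
    # window whose variance is s times that of the smoothing kernel.
    window = tuple(np.sqrt(s) * v for v in (tau, sigma, sigma))
    mu = np.empty(f.shape + (3, 3))
    for i in range(3):
        for j in range(i, 3):
            m = ndimage.gaussian_filter(grads[i] * grads[j], sigma=window)
            mu[..., i, j] = m
            mu[..., j, i] = m
    det = np.linalg.det(mu)
    trace = np.trace(mu, axis1=-2, axis2=-1)
    # Large only where the signal varies along x, y and t at the same time.
    return det - k * trace ** 3


def local_maxima(H, size=5, rel_threshold=0.1):
    """Return (t, y, x) indices of positive local maxima of the response H."""
    peak = ndimage.maximum_filter(H, size=size)
    mask = (H == peak) & (H > 0) & (H > rel_threshold * H.max())
    return np.argwhere(mask)
```

Calling `local_maxima(spacetime_harris(video))` on a `(T, Y, X)` array returns one row of `(t, y, x)` indices per candidate point. For a sequence that does not change over time, the temporal derivative vanishes, the determinant is zero, and the response is never positive, so purely spatial corners are not reported. This matches the requirement that image values vary in both space and time.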

Space-time Interest Points

2003 | Ivan Laptev and Tony Lindeberg