VOL. 14, NO. 8, AUGUST 2015 | Liang Zheng, Yi Yang, and Alexander G. Hauptmann
Person re-identification (re-ID) has become increasingly popular in the computer vision community because of its applications in video surveillance and its research significance. The task is to recognize the same person across different cameras, even when their appearance changes between views. Early methods relied on hand-crafted features and were evaluated on small datasets, but recent advances in deep learning and the release of large-scale datasets have significantly improved performance. Current re-ID methods are divided into image-based and video-based approaches, each built on either hand-crafted systems or deep learning models. The survey also discusses two new tasks: end-to-end re-ID and fast re-ID in large galleries.
This survey introduces the history of person re-ID, its relationship with image classification and instance retrieval, and reviews a broad range of hand-crafted and deep learning methods in both image- and video-based re-ID. It also discusses critical future directions, including end-to-end re-ID and fast retrieval in large galleries, and highlights important underdeveloped issues.
The history of person re-ID began with multi-camera tracking, where appearance models were combined with geometric camera calibration. In 2005, the term "person re-identification" was first used in a paper by Zajdel et al. In 2006, Gheissari et al. introduced image-based re-ID, establishing it as a task separate from multi-camera tracking. Video-based re-ID at first treated sequences largely as sets of images to be matched, and only later exploited the temporal information in video. Deep learning entered re-ID in 2014 with siamese neural networks, leading to significant improvements in accuracy.
Hand-crafted systems describe pedestrians with features such as color, texture, and spatial-temporal information, while deep learning systems use CNNs to learn discriminative features from data. Distance metric learning, which learns how these features should be compared, is a crucial component of re-ID; KISSME and LMNN are widely used examples. Deep learning outperforms hand-crafted systems on most datasets, especially in large-scale re-ID.
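To make the metric-learning step concrete, here is a minimal NumPy sketch of KISSME. It models pairwise feature differences of same-person and different-person pairs with two zero-mean Gaussians and takes M = inv(Sigma_S) - inv(Sigma_D), so that the learned distance is (x - y)^T M (x - y). It assumes features have already been extracted; the function names, the `reg` ridge term, and the eigenvalue-clipping projection as written here are illustrative choices, not code from the survey.

```python
import numpy as np


def kissme(X, pairs, same, reg=1e-6):
    """Learn a KISSME Mahalanobis matrix M = inv(Sigma_S) - inv(Sigma_D).

    X     : (n, d) array of pre-extracted features (assumed given).
    pairs : (m, 2) int array of index pairs into X.
    same  : (m,) bool array, True when a pair shows the same person.
    reg   : small ridge added for numerical stability (illustrative choice).
    """
    diffs = X[pairs[:, 0]] - X[pairs[:, 1]]  # pairwise differences x_i - x_j
    d = X.shape[1]
    sim, dis = diffs[same], diffs[~same]
    # Second-moment matrices of the differences for similar / dissimilar pairs
    cov_s = sim.T @ sim / len(sim) + reg * np.eye(d)
    cov_d = dis.T @ dis / len(dis) + reg * np.eye(d)
    M = np.linalg.inv(cov_s) - np.linalg.inv(cov_d)
    M = (M + M.T) / 2  # enforce exact symmetry before the eigendecomposition
    # Project onto the PSD cone by clipping negative eigenvalues to zero
    w, V = np.linalg.eigh(M)
    return (V * np.clip(w, 0.0, None)) @ V.T


def kissme_distance(M, x, y):
    """Squared Mahalanobis distance (x - y)^T M (x - y) under a learned M."""
    diff = x - y
    return float(diff @ M @ diff)
```

Because M comes directly from two covariance estimates, training needs no iterative optimization, which is much of KISSME's practical appeal; the price is that it relies on having enough labeled pairs for both covariances to be well conditioned.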
Datasets such as VIPeR, CUHK01, iLIDS, PRID 450S, CUHK03, and Market-1501 have been widely used in re-ID research. Performance is commonly assessed with the cumulative matching characteristics (CMC) curve and mean average precision (mAP). Over the years, accuracy on these benchmarks has improved significantly, with deep learning methods outperforming hand-crafted systems on most of them.
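The two metrics can be illustrated with a small pure-Python sketch. For each query, `matches` is assumed to list, in ranked order, whether each gallery item shares the query's identity; the function names are hypothetical. Real benchmark protocols (for example Market-1501's) additionally discard "junk" gallery images such as same-camera shots of the query identity, which this sketch omits.

```python
from typing import List, Sequence


def average_precision(matches: Sequence[bool]) -> float:
    """AP for one query; matches[k] is True if the item ranked k+1 is a true match."""
    hits, precisions = 0, []
    for rank, is_match in enumerate(matches, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / rank)  # precision at this recall point
    return sum(precisions) / hits if hits else 0.0


def mean_ap(all_matches: List[Sequence[bool]]) -> float:
    """mAP: average of the per-query AP values."""
    return sum(average_precision(m) for m in all_matches) / len(all_matches)


def cmc_curve(all_matches: List[Sequence[bool]], max_rank: int) -> List[float]:
    """CMC: fraction of queries whose first true match lies within the top k."""
    counts = [0] * max_rank
    for matches in all_matches:
        first = next((k for k, m in enumerate(matches) if m), None)
        if first is not None and first < max_rank:
            for k in range(first, max_rank):
                counts[k] += 1
    return [c / len(all_matches) for c in counts]
```

For the two toy queries `[False, True, False, True]` and `[True, False, False, False]`, AP is 0.5 and 1.0, so `mean_ap` returns 0.75, while `cmc_curve(..., 3)` returns `[0.5, 1.0, 1.0]`. CMC only looks at the first correct match, whereas mAP also penalizes the first query's second match appearing at rank 4, which is why mAP is preferred when a query has several true matches in the gallery.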
Video-based re-ID matches sequences of bounding boxes rather than single images, and its methods exploit spatial-temporal features and temporal cues across frames. Deep learning models such as CNNs and RNNs have been applied to video-based re-ID, leading to better performance. The recent release of MARS, a large-scale video re-ID dataset, highlights the growing importance of this setting.
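One simple way to use multiple frames is to pool per-frame features into a single sequence descriptor and then compare sequences as if they were images. The sketch below assumes frame features have already been extracted (for example by a CNN), uses temporal average pooling with cosine similarity, and uses hypothetical function names; it illustrates the idea rather than the method of any particular paper, and it ignores frame order, which RNN-based approaches model explicitly.

```python
import numpy as np


def sequence_descriptor(frame_feats):
    """Average-pool a (num_frames, dim) array of frame features over time
    and L2-normalise the result, giving one fixed-size vector per sequence."""
    pooled = np.asarray(frame_feats, dtype=float).mean(axis=0)
    return pooled / (np.linalg.norm(pooled) + 1e-12)


def rank_gallery(query_frames, gallery_sequences):
    """Return gallery indices ordered from most to least similar to the query.

    Sequences may have different lengths; pooling maps each to the same size.
    """
    q = sequence_descriptor(query_frames)
    G = np.stack([sequence_descriptor(s) for s in gallery_sequences])
    sims = G @ q  # cosine similarity, since all descriptors are unit-length
    return np.argsort(-sims)
```

Replacing `mean` with `max` gives max pooling, another common choice; in either case the descriptor size is independent of sequence length, which keeps gallery search as cheap as in image-based re-ID.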
Future research directions include improving end-to-end re-ID, fast retrieval in large galleries, and incorporating temporal information in video-based re-ID. The survey concludes that deep learning methods are currently the most effective for re-ID, but there is still room for improvement, especially with larger