12 Apr 2016 | Abhinav Shrivastava, Abhinav Gupta, Ross Girshick
This paper introduces an online hard example mining (OHEM) algorithm for training region-based ConvNet object detectors. The algorithm eliminates several heuristics and hyperparameters commonly used in region-based ConvNets, and instead selects hard examples based on their loss during training. This approach leads to consistent and significant improvements in detection performance on benchmarks like PASCAL VOC 2007 and 2012. The effectiveness of OHEM increases as datasets become larger and more difficult, as demonstrated by results on the MS COCO dataset. OHEM is also complementary to recent improvements in object detection, such as multiscale testing and iterative bounding-box regression. When combined with these techniques, OHEM achieves state-of-the-art results of 78.9% and 76.3% mAP on PASCAL VOC 2007 and 2012, respectively.
OHEM works by selecting the hardest examples for training based on their loss, which lets the model focus on difficult examples and improves detection accuracy. The algorithm is a simple modification to stochastic gradient descent (SGD) in which training examples are sampled according to a non-uniform, non-stationary distribution that depends on the current loss of each example. This method exploits the structure of detection problems, where each SGD mini-batch consists of only one or two images but thousands of candidate examples. The candidates are subsampled according to a distribution that favors diverse, high-loss instances. Gradient computation remains efficient because it uses only a small subset of all candidates.
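The selection step above can be sketched as follows. This is an illustrative NumPy sketch, not the paper's implementation (which runs a read-only forward pass over all RoIs inside the RoI network); the function name `ohem_select`, the box format, and the NMS threshold of 0.7 are assumptions made for the example:

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and a set of boxes, all in [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def ohem_select(losses, boxes, batch_size, nms_thresh=0.7):
    """Keep the highest-loss RoIs, suppressing near-duplicate boxes so
    the selected hard examples are diverse."""
    order = np.argsort(-losses)  # sort candidate RoIs, hardest first
    keep = []
    for i in order:
        # Skip RoIs that heavily overlap an already-selected hard RoI.
        if keep and iou(boxes[i], boxes[keep]).max() >= nms_thresh:
            continue
        keep.append(i)
        if len(keep) == batch_size:
            break
    return np.array(keep)
```

Only the RoIs returned by the selection step contribute to the backward pass, which is why gradient computation touches just a small fraction of the thousands of candidates per image.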
The paper also evaluates the performance of OHEM on the PASCAL VOC and MS COCO datasets. On VOC 2007, OHEM improves the mAP of FRCN from 67.2% to 69.9% (and from 70.0% to 74.6% with extra data). On VOC 2012, OHEM yields a 4.1-point improvement in mAP (from 65.7% to 69.8%). With extra data, the mAP improves to 71.9%, compared to 68.4% for FRCN, a gain of 3.5 points. On MS COCO, OHEM improves AP from 19.7% to 22.6%. Under the VOC overlap metric of IoU ≥ 0.5, OHEM gives a 6.6-point boost in AP50. Notably, OHEM also improves AP for medium-sized objects by 4.9 points under the strict COCO AP evaluation metric.
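To make the two metrics concrete: VOC-style AP50 counts a detection as correct when its IoU with the ground truth is at least 0.5, while the stricter COCO AP averages over IoU thresholds from 0.5 to 0.95 in steps of 0.05. A minimal sketch of the matching criteria (the helper names are illustrative, and real AP computation additionally ranks detections and integrates precision over recall):

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes in [x1, y1, x2, y2] format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(x2 - x1, 0) * max(y2 - y1, 0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def matches_voc(overlap):
    # PASCAL VOC criterion: a single threshold of 0.5 (AP50).
    return overlap >= 0.5

def coco_threshold_fraction(overlap):
    # COCO AP averages over thresholds 0.50, 0.55, ..., 0.95,
    # so a loosely localized box earns credit at few of them.
    thresholds = np.linspace(0.5, 0.95, 10)
    return float(np.mean(overlap >= thresholds))
```

This is why gains under the COCO metric reward well-localized boxes: a detection with IoU just above 0.5 passes the VOC test but only one of COCO's ten thresholds.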
The paper also shows that OHEM is orthogonal to recent bells and whistles that enhance object detection accuracy. Combining OHEM with two such additions, multiscale testing and iterative bounding-box regression, yields state-of-the-art results of 78.9% mAP on PASCAL VOC 2007 and 76.3% mAP on VOC 2012.
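The iterative bounding-box regression idea can be sketched roughly as follows; this is illustrative only (`detect` stands for a hypothetical single-pass detector, and the actual procedure also accumulates and merges detections across passes):

```python
def iterative_bbox_regression(detect, proposals, num_passes=2):
    """Run the detector, then feed its regressed boxes back in as the
    proposals for the next pass, refining localization each time."""
    boxes, scores = detect(proposals)
    for _ in range(num_passes - 1):
        boxes, scores = detect(boxes)  # regressed boxes become proposals
    return boxes, scores
```

Because each pass starts from better-localized proposals, later passes can tighten boxes that the first pass placed only approximately.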