A new bottom-up visual saliency model, Graph-Based Visual Saliency (GBVS), is proposed. It consists of two steps: first forming activation maps on certain feature channels, and then normalizing them in a way which highlights conspicuity and admits combination with other maps. The model is simple, and biologically plausible insofar as it is naturally parallelized. This model powerfully predicts human fixations on 749 variations of 108 natural images, achieving 98% of the ROC area of a human-based control, whereas the classical algorithms of Itti & Koch achieve only 84%.
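The text above does not say how ROC area is computed. As a rough sketch of one common convention, a saliency map can be scored as a classifier that separates fixated from non-fixated locations. The function name `roc_area` and the boolean `fixated` mask are hypothetical, introduced only for illustration.

```python
import numpy as np


def roc_area(saliency, fixated):
    """Area under the ROC curve for a saliency map (hypothetical helper).

    Computed as the fraction of (fixated, non-fixated) location pairs in
    which the fixated location has the higher saliency; ties count half.
    """
    scores = np.asarray(saliency, dtype=float).ravel()
    labels = np.asarray(fixated, dtype=bool).ravel()
    pos = scores[labels]    # saliency at fixated locations
    neg = scores[~labels]   # saliency everywhere else
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


# Toy example: two fixated locations, two non-fixated ones.
toy_saliency = np.array([[0.10, 0.40],
                         [0.35, 0.80]])
toy_fixated = np.array([[False, False],
                        [True, True]])
toy_auc = roc_area(toy_saliency, toy_fixated)
```

Under this convention a model's score could be reported relative to a human-based control by dividing the two ROC areas, which is how percentages such as those quoted above would be read.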
The model is based on graph theory, using Markov chains to compute both activation and normalization. It defines a dissimilarity between pairs of locations within a feature map and uses it to weight the edges of a fully connected graph over those locations; the equilibrium distribution of the resulting Markov chain forms the activation map. The normalization step concentrates mass on activation maps, enhancing their predictive power. The model is compared against existing saliency algorithms on a dataset of grayscale images of natural environments, showing superior performance in predicting human fixations.
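The two Markov-chain steps described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the dissimilarity is taken to be the absolute log ratio of feature values, edge weights fall off with a Gaussian of width `sigma`, and the equilibrium distribution is obtained with a direct least-squares solve (power iteration is a common alternative). All function names, the `sigma` value, and the small `eps` offsets are choices made for this sketch.

```python
import numpy as np


def proximity(shape, sigma):
    """Gaussian falloff between every pair of grid locations (assumed form)."""
    rows, cols = np.indices(shape)
    coords = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
    sq_dist = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def stationary_distribution(weights):
    """Equilibrium distribution of the chain with edge weight weights[i, j] on i -> j."""
    n = weights.shape[0]
    transition = weights / weights.sum(axis=1, keepdims=True)  # rows sum to 1
    # Solve pi @ P = pi together with sum(pi) = 1 as one least-squares system.
    system = np.vstack([transition.T - np.eye(n), np.ones((1, n))])
    target = np.zeros(n + 1)
    target[-1] = 1.0
    pi = np.linalg.lstsq(system, target, rcond=None)[0]
    pi = np.clip(pi, 0.0, None)  # remove tiny negative round-off
    return pi / pi.sum()


def activation_map(feature, sigma, eps=1e-12):
    """Step 1: mass accumulates at locations that differ from their surroundings."""
    feature = np.asarray(feature, dtype=float)
    log_values = np.log(feature.ravel() + 1e-6)  # assumes non-negative features
    dissimilarity = np.abs(log_values[:, None] - log_values[None, :])
    # eps keeps every row non-zero, so a constant map yields a uniform result.
    weights = dissimilarity * proximity(feature.shape, sigma) + eps
    return stationary_distribution(weights).reshape(feature.shape)


def normalize_map(activation, sigma, eps=1e-12):
    """Step 2: edges point toward high activation, concentrating mass there."""
    activation = np.asarray(activation, dtype=float)
    # weights[i, j] = activation at destination j, scaled by proximity of i and j.
    weights = activation.ravel()[None, :] * proximity(activation.shape, sigma) + eps
    return stationary_distribution(weights).reshape(activation.shape)


# Toy feature map: uniform background with a single conspicuous location.
feature = np.ones((8, 8))
feature[3, 4] = 5.0
activation = activation_map(feature, sigma=1.2)
saliency = normalize_map(activation, sigma=1.2)
```

In this toy case the odd-one-out location receives the largest share of mass after the activation step, and the normalization step raises that share further. The all-pairs graph is quadratic in the number of locations, so a sketch like this is only practical on small, downsampled feature maps.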
The proposed method, GBVS, outperforms traditional approaches in predicting human fixations, achieving a higher ROC area. It is robust to variations in salient regions and is biologically plausible due to its parallel nature. The model is extended to a multiresolution version, improving performance with minimal additional computation. The results demonstrate that GBVS is a powerful and efficient method for computing visual saliency, showing remarkable consistency with human attentional deployment.