April 10, 2007 | Thomas Serre**, Aude Oliva*, and Tomaso Poggio**
A feedforward architecture accounts for rapid categorization

This study presents a computational model of object recognition that extends the Hubel and Wiesel simple-to-complex cell hierarchy while respecting anatomical and physiological constraints. The model is a feedforward architecture. Tested on a rapid, masked animal vs. non-animal categorization task, it performs at a level comparable to humans.

The model is built from a hierarchy of simple (S) and complex (C) units: S units respond to oriented bars and edges, and C units pool their inputs to become invariant to position and scale. Training combines unsupervised and supervised learning; only the supervised stage is task-specific. Performance is evaluated on four categories of animal images that differ in the animal's distance from the camera. The model shows high accuracy and agreement with human observers, and is robust to image rotation.

Because the model matches human performance on a task that requires ultra-rapid categorization, the study suggests that a feedforward architecture can account for rapid visual processing. The model is also robust to variations in its parameters and learning rules, and is consistent with physiological data on the visual cortex. Together, these results support the idea that a feedforward architecture can provide a satisfactory description of information processing in the ventral stream of the visual cortex.
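
To make the S and C stages described above concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The Gabor-like filter, the use of a max operation for pooling, and all parameter values (`wavelength`, `sigma`, `gamma`, `pool`) are assumptions chosen for illustration; the summary states only that S units respond to oriented bars and edges and that C units pool their inputs. The sketch pools over position only, not scale.

```python
import math


def gabor_kernel(size, theta, wavelength=4.0, sigma=2.0, gamma=0.5):
    """Build a size x size oriented, zero-mean Gabor-like filter (assumed S-unit tuning)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the filter varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    # Subtract the mean so that uniform image regions produce no response.
    mean = sum(sum(row) for row in kernel) / (size * size)
    return [[v - mean for v in row] for row in kernel]


def s_layer(image, kernel):
    """S stage: slide the filter over the image and keep the absolute response."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - k + 1):
        row = []
        for j in range(w - k + 1):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(abs(acc))
        out.append(row)
    return out


def c_layer(s_map, pool):
    """C stage: max over non-overlapping pool x pool neighbourhoods (assumed pooling rule)."""
    h, w = len(s_map), len(s_map[0])
    return [
        [
            max(
                s_map[a][b]
                for a in range(i, min(i + pool, h))
                for b in range(j, min(j + pool, w))
            )
            for j in range(0, w, pool)
        ]
        for i in range(0, h, pool)
    ]


def vertical_bar(size, col):
    """Toy stimulus: a one-pixel-wide vertical bar at column col."""
    return [[1.0 if x == col else 0.0 for x in range(size)] for _ in range(size)]


kernel = gabor_kernel(5, theta=0.0)           # theta=0 prefers vertical bars
s_a = s_layer(vertical_bar(16, 7), kernel)    # bar at column 7
s_b = s_layer(vertical_bar(16, 8), kernel)    # same bar shifted one pixel
c_a = c_layer(s_a, pool=4)
c_b = c_layer(s_b, pool=4)

# Column of the strongest S response in the first row of each S map.
peak_a = max(range(len(s_a[0])), key=lambda j: s_a[0][j])
peak_b = max(range(len(s_b[0])), key=lambda j: s_b[0][j])

print("S peak columns:", peak_a, peak_b)
print("C unit covering both peaks:", c_a[0][1], c_b[0][1])
```

In this toy example the strongest S response moves by one column when the bar shifts, while the C unit whose pooling window covers both positions reports the same value. That is the sense in which pooling buys tolerance to position. Shifts that cross a pooling-window boundary do change the C map, so the invariance is local to each window.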