Selective cortical representation of attended speaker in multi-talker speech perception


Nima Mesgarani and Edward F. Chang. Nature. 2012 May 10; 485(7397). doi:10.1038/nature11020.
The study by Nima Mesgarani and Edward F. Chang investigates how the human auditory system processes attended speech in a multi-talker environment. Using multi-electrode recordings from non-primary auditory cortex, the researchers demonstrate that population responses in this region encode the critical features of the attended speaker's speech. Speech spectrograms reconstructed from cortical responses to a mixture of two speakers reveal the salient spectral and temporal features of the attended speaker, as if the subjects were listening to that speaker alone. A simple classifier trained on single-speaker examples can decode both the attended words and the identity of the attended speaker.

Task performance is well predicted by the rapid emergence of attention-modulated neural selectivity in both single-electrode and population-level cortical responses. These findings suggest that the cortical representation of speech is not merely a reflection of the external acoustic environment but gives rise to the perceptual aspects relevant for the listener's intended goal. The study also highlights the distributed nature of attentional modulation, which is not driven by a few spatially discrete sites but is spread across the population of responsive sites. These results have implications for models of auditory scene analysis and provide insight into how the brain solves the cocktail party problem, which remains a significant challenge for automatic speech recognition algorithms.
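The reconstruction-and-classification pipeline summarized above can be illustrated with a minimal sketch. This is not the authors' code: it assumes a generic ridge-regression stimulus-reconstruction model (time-lagged neural responses mapped linearly onto an auditory spectrogram) and a correlation-based readout that compares the spectrogram reconstructed in the mixture condition against each speaker's clean spectrogram. All function names (`fit_reconstruction`, `decode_attended`), lag counts, and regularization values are illustrative, not taken from the paper.

```python
import numpy as np

def lag_matrix(resp, n_lags):
    """Stack time-lagged copies of the neural responses (time x electrodes),
    so each row holds the response window used to reconstruct one frame."""
    T, E = resp.shape
    lagged = np.zeros((T, E * n_lags))
    for k in range(n_lags):
        lagged[k:, k * E:(k + 1) * E] = resp[:T - k]
    return lagged

def fit_reconstruction(resp, spec, n_lags=10, ridge=1.0):
    """Fit linear reconstruction filters g by ridge regression,
    g = (X'X + lambda*I)^-1 X'S, on single-speaker training data."""
    X = lag_matrix(resp, n_lags)
    XtX = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ spec)

def reconstruct(resp, g, n_lags=10):
    """Reconstruct a spectrogram (time x freq) from held-out responses."""
    return lag_matrix(resp, n_lags) @ g

def decode_attended(recon, spec_a, spec_b):
    """Correlation-based readout: the spectrogram reconstructed from
    mixture-condition responses is compared with each speaker's clean
    spectrogram; the better-matching speaker is taken as attended."""
    r_a = np.corrcoef(recon.ravel(), spec_a.ravel())[0, 1]
    r_b = np.corrcoef(recon.ravel(), spec_b.ravel())[0, 1]
    return ("speaker_a" if r_a > r_b else "speaker_b", r_a, r_b)

# Toy usage with random arrays (shapes only; real inputs would be
# high-gamma band power from cortical electrodes and auditory spectrograms).
T, E, F = 500, 32, 64
rng = np.random.default_rng(0)
resp_single = rng.standard_normal((T, E))
spec_single = rng.standard_normal((T, F))
g = fit_reconstruction(resp_single, spec_single)
recon = reconstruct(resp_single, g)
```

The design mirrors the logic described in the summary: filters are trained only on single-speaker examples, then applied to responses recorded during the two-speaker mixture, so any bias of the reconstruction toward one speaker reflects attentional selection in the neural responses rather than anything built into the decoder.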