SAFETYLIT WEEKLY UPDATE

We compile citations and summaries of about 400 new articles every week.


Journal Article

Citation

Chou KF, Dong J, Colburn HS, Sen K. J. Assoc. Res. Otolaryngol. 2019; ePub(ePub): ePub.

Affiliation

Hearing Research Center, Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Room 412, Boston, MA, 02215, USA. kamalsen@bu.edu.

Copyright

(Copyright © 2019, Holtzbrinck Springer Nature Publishing Group)

DOI

10.1007/s10162-019-00732-4

PMID

31392449

Abstract

At a cocktail party, we can broadly monitor the entire acoustic scene to detect important cues (e.g., our names being called, or the fire alarm going off), or selectively listen to a target sound source (e.g., a conversation partner). It has recently been observed that individual neurons in the avian field L (an analog of the mammalian auditory cortex) can display broad spatial tuning to single targets and selective tuning to a target embedded in spatially distributed sound mixtures. Here, we describe a model inspired by these experimental observations and apply it to process mixtures of human speech sentences. This processing is realized in the neural spiking domain. It converts binaural acoustic inputs into cortical spike trains using a multi-stage model composed of a cochlear filter-bank, a midbrain spatial-localization network, and a cortical network. The output spike trains of the cortical network are then converted back into an acoustic waveform, using a stimulus reconstruction technique. The intelligibility of the reconstructed output is quantified using an objective measure of speech intelligibility. We apply the algorithm to single and multi-talker speech to demonstrate that the physiologically inspired algorithm is able to achieve intelligible reconstruction of an "attended" target sentence embedded in two other non-attended masker sentences. The algorithm is also robust to masker level and displays performance trends comparable to humans. The ideas from this work may help improve the performance of hearing assistive devices (e.g., hearing aids and cochlear implants), speech-recognition technology, and computational algorithms for processing natural scenes cluttered with spatially distributed acoustic objects.
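The overall signal flow described in the abstract (binaural input -> cochlear filter bank -> midbrain spatial-localization network -> cortical selection -> stimulus reconstruction) can be illustrated, very loosely, outside the spiking domain. The Python sketch below is not the authors' model: it substitutes a short-time Fourier decomposition for the cochlear filter bank, an interaural-level-difference mask for the spiking midbrain and cortical networks, and overlap-add resynthesis for the stimulus reconstruction step. All function names, parameters, and the selection rule are illustrative assumptions.

    # A minimal, non-spiking sketch of the pipeline in the abstract, using only NumPy.
    # Stand-ins: STFT for the cochlear filter bank, an interaural-level-difference
    # (ILD) mask for the midbrain/cortical spatial selection, overlap-add inverse
    # STFT for stimulus reconstruction. Purely illustrative, not the published model.
    import numpy as np

    def stft(x, win=512, hop=256):
        """Short-time Fourier transform with a Hann analysis window."""
        w = np.hanning(win)
        frames = [np.fft.rfft(w * x[i:i + win])
                  for i in range(0, len(x) - win, hop)]
        return np.array(frames)

    def istft(X, win=512, hop=256):
        """Overlap-add resynthesis of a (frames x bins) STFT matrix."""
        w = np.hanning(win)
        y = np.zeros(hop * len(X) + win)
        for k, frame in enumerate(X):
            y[k * hop:k * hop + win] += w * np.fft.irfft(frame, win)
        return y

    def attend(left, right, prefer_left=True):
        """Keep time-frequency bins whose ILD favors the attended side;
        zero out the rest (a crude proxy for spatial selection)."""
        L, R = stft(left), stft(right)
        ild = 20 * np.log10(np.abs(L) + 1e-9) - 20 * np.log10(np.abs(R) + 1e-9)
        mask = ild > 0 if prefer_left else ild < 0
        return istft(np.where(mask, L, 0.0))

    if __name__ == "__main__":
        fs = 16000
        t = np.arange(fs) / fs
        # Toy binaural mixture: a "target" tone louder on the left,
        # a "masker" tone louder on the right.
        target = np.sin(2 * np.pi * 440 * t)
        masker = np.sin(2 * np.pi * 950 * t)
        left = 1.0 * target + 0.3 * masker
        right = 0.3 * target + 1.0 * masker
        out = attend(left, right, prefer_left=True)
        print("reconstructed samples:", out.shape[0])

In the published model, the spatial selection is performed by spiking neurons driven by binaural cues and the output spike trains are converted back to audio with a stimulus reconstruction technique, then scored with an objective intelligibility measure; the sketch above only mirrors the stage-by-stage signal flow.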


Language: en

Keywords

cocktail party problem; cortical mechanisms; sound segregation; spatial tuning
