9 Jun 2017 | Santiago Pascual, Antonio Bonafonte, Joan Serrà
The paper introduces SEGAN (Speech Enhancement Generative Adversarial Network), an approach to speech enhancement based on generative adversarial networks (GANs). Unlike traditional methods that operate in the spectral domain or on higher-level features, SEGAN works directly on the raw waveform, training a single end-to-end model to enhance noise-contaminated speech. The model is trained on data from 28 speakers and 40 different noise conditions, with one set of parameters shared across all of them so that it generalizes. SEGAN is evaluated both objectively and subjectively on an independent test set with two speakers and 20 noise conditions. On the objective metrics (PESQ, CSIG, CBAK, COVL, and SSNR), SEGAN improves on the Wiener filtering baseline on most measures, indicating less speech distortion and more effective noise removal.
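SSNR, one of the objective metrics listed above, is the segmental signal-to-noise ratio: the SNR is computed over short frames of the clean reference and the enhanced output, then averaged in dB. The sketch below shows that computation in Python with NumPy. The function name `segmental_snr`, the frame length, the use of non-overlapping rectangular frames, and the [-10, 35] dB per-frame clamp are assumptions based on a common convention, not necessarily the exact settings behind the paper's reported scores.

```python
import numpy as np


def segmental_snr(clean, enhanced, frame_len=512, min_db=-10.0, max_db=35.0, eps=1e-10):
    """Mean per-frame SNR (dB) of `enhanced` against the `clean` reference.

    Hypothetical helper for illustration. Assumes non-overlapping rectangular
    frames and clamps each frame's SNR to [min_db, max_db]; published
    evaluation scripts may use windowed, overlapping frames instead.
    """
    clean = np.asarray(clean, dtype=np.float64)
    enhanced = np.asarray(enhanced, dtype=np.float64)
    if clean.shape != enhanced.shape:
        raise ValueError("clean and enhanced signals must have the same length")

    n_frames = len(clean) // frame_len
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")

    # Drop trailing samples that do not fill a whole frame, then split into frames.
    s = clean[: n_frames * frame_len].reshape(n_frames, frame_len)
    e = enhanced[: n_frames * frame_len].reshape(n_frames, frame_len)

    # Per-frame energy of the reference and of the residual error.
    signal_energy = np.sum(s ** 2, axis=1)
    error_energy = np.sum((s - e) ** 2, axis=1)

    # eps keeps silent or perfectly reconstructed frames finite before clamping.
    frame_snr = 10.0 * np.log10((signal_energy + eps) / (error_energy + eps))
    frame_snr = np.clip(frame_snr, min_db, max_db)
    return float(np.mean(frame_snr))
```

For example, with unit-variance white noise as `clean` and `clean + 0.5 * noise` as the signal under test, the result should land near 10·log10(4) ≈ 6 dB, because the added noise carries a quarter of the reference's power. The per-frame clamp keeps a few silent or perfectly reconstructed frames from dominating the average.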
A subjective evaluation with 16 listeners also favors SEGAN: it is preferred over the noisy signal in 67% of cases and over the Wiener baseline in 53% of cases. The study opens the door to further exploration of GANs for speech enhancement, where additional design choices could improve performance further.