SEGAN: Speech Enhancement Generative Adversarial Network

9 Jun 2017 | Santiago Pascual, Antonio Bonafonte, Joan Serrà
SEGAN is a speech enhancement method based on generative adversarial networks (GANs). Unlike traditional methods that operate in the spectral domain, SEGAN works directly on the raw audio waveform and is trained end to end. A single model is trained on 28 speakers and 40 noise conditions, with parameters shared across all of them. It is evaluated on an independent test set with two speakers and 20 noise conditions, and both objective and subjective evaluations confirm that the enhancement is effective. SEGAN's advantages include fast processing, no need for hand-crafted features, and generalization across speakers and noise types. It uses a fully convolutional architecture with skip connections and a latent vector, and is trained with the least-squares GAN (LSGAN) objective.
The model outperforms traditional methods such as Wiener filtering on most objective speech quality metrics and in subjective tests. The study opens new possibilities for generative architectures in speech enhancement, which could incorporate more speech-centric design choices. The work is supported by a research project and is implemented in TensorFlow. The code and results are available online.
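
To make the LSGAN training objective mentioned above more concrete, here is a minimal sketch of the least-squares adversarial losses in plain Python. It is an illustration and not the authors' TensorFlow code: the function names (`mean_sq`, `lsgan_discriminator_loss`, `lsgan_generator_loss`) are hypothetical, the discriminator scores are assumed to be plain lists of floats, and the target labels 1 (clean) and 0 (enhanced) follow the usual LSGAN convention. Any additional terms the full method may add to the generator loss are left out.

```python
from typing import Sequence


def mean_sq(values: Sequence[float], target: float) -> float:
    """Mean squared distance between discriminator scores and a target label."""
    return sum((v - target) ** 2 for v in values) / len(values)


def lsgan_discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Least-squares discriminator loss.

    d_real: scores for (clean, noisy) pairs, pushed toward 1.
    d_fake: scores for (enhanced, noisy) pairs, pushed toward 0.
    """
    return 0.5 * mean_sq(d_real, 1.0) + 0.5 * mean_sq(d_fake, 0.0)


def lsgan_generator_loss(d_fake: Sequence[float]) -> float:
    """Least-squares adversarial loss for the generator.

    The generator is rewarded when the discriminator scores its
    enhanced output close to 1, i.e. as if it were clean speech.
    """
    return 0.5 * mean_sq(d_fake, 1.0)


if __name__ == "__main__":
    # Hypothetical discriminator scores for a batch of three examples.
    scores_clean = [0.9, 1.1, 0.8]
    scores_enhanced = [0.2, 0.1, 0.4]
    print("D loss:", lsgan_discriminator_loss(scores_clean, scores_enhanced))
    print("G loss:", lsgan_generator_loss(scores_enhanced))
```

Compared with the sigmoid cross-entropy loss of the original GAN formulation, the squared-error form keeps gradients informative even for samples the discriminator already classifies confidently, which is the usual motivation for choosing LSGAN.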