MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

9 Dec 2019 | Kundan Kumar, Rithesh Kumar, Thibault de Boissiere, Lucas Gestin, Wei Zhen Teoh, Jose Sotelo, Alexandre de Brebisson, Yoshua Bengio, Aaron Courville
The paper introduces MelGAN, a non-autoregressive, fully convolutional GAN architecture that generates high-quality raw audio waveforms. Training GANs to produce coherent raw audio has historically been difficult; the authors address this with a set of architectural changes and simple training techniques. Subjective evaluation using Mean Opinion Score (MOS) shows that the approach is effective for mel-spectrogram inversion. To demonstrate its generality, the model is also evaluated on speech synthesis, music domain translation, and unconditional music synthesis. MelGAN is significantly faster than competing models: it runs at more than 100x real-time on a GTX 1080Ti GPU and more than 2x real-time on a CPU. The paper also provides guidelines for designing general-purpose discriminators and generators for conditional sequence synthesis tasks, and discusses the importance of normalization techniques and feature matching in training.
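The feature matching mentioned in the summary can be illustrated with a small sketch. It assumes the usual formulation, in which the generator is penalized by the mean absolute (L1) difference between discriminator feature maps computed on real and on generated audio, summed over layers. The function name, the flat-list representation of feature maps, and the toy numbers are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from typing import Sequence


def feature_matching_loss(
    real_feats: Sequence[Sequence[float]],
    fake_feats: Sequence[Sequence[float]],
) -> float:
    """Sum over layers of the mean absolute difference between feature maps.

    Each element of real_feats / fake_feats is one discriminator layer's
    activations, flattened to a list of floats (an illustrative assumption).
    """
    if len(real_feats) != len(fake_feats):
        raise ValueError("real and fake must have the same number of layers")

    total = 0.0
    for real, fake in zip(real_feats, fake_feats):
        if len(real) != len(fake):
            raise ValueError("feature maps of a layer must have equal size")
        # Normalize by the number of units so large layers do not dominate.
        total += sum(abs(r - f) for r, f in zip(real, fake)) / len(real)
    return total


# Toy activations for a two-layer discriminator (made-up values).
real = [[1.0, 2.0], [0.0, 0.0, 3.0]]
fake = [[1.5, 2.0], [0.0, 3.0, 0.0]]
loss = feature_matching_loss(real, fake)
```

In a real training loop the feature maps would be tensors taken from the discriminator's intermediate layers, and this term would be added to the generator's adversarial loss with a weighting factor.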