28 Nov 2017 | Aaron van den Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu, George van den Driessche, Edward Lockhart, Luis C. Cobo, Florian Stimberg, Norman Casagrande, Dominik Grewe, Seb Noury, Sander Dieleman, Erich Elsen, Nal Kalchbrenner, Heiga Zen, Alex Graves, Helen King, Tom Walters, Dan Belov, Demis Hassabis
The paper introduces *Probability Density Distillation*, a novel method for training a parallel feed-forward network from a trained WaveNet model, achieving high-fidelity speech synthesis more than 20 times faster than real time. The method allows efficient sampling while preserving the quality of the original WaveNet, making it suitable for real-time production settings. The paper details the original autoregressive WaveNet, the parallel WaveNet architecture, and the distillation process. Experimental results show no significant loss in quality compared to the original WaveNet and superior performance over previous benchmarks. The resulting system has been deployed in production in the Google Assistant, serving multiple English and Japanese voices to millions of users.
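As a rough illustration of how the distillation objective fits together, here is a minimal sketch of the loss in PyTorch. It assumes a hypothetical `student` (an inverse autoregressive flow that maps logistic noise to audio in parallel, returning per-timestep log-scales) and a hypothetical `teacher` exposing autoregressive log-likelihoods; the interfaces and shapes are illustrative, not the paper's actual code, and the cross-entropy term uses a naive single-sample estimate where the paper derives a lower-variance per-timestep estimator.

```python
import torch

def distillation_loss(student, teacher, batch_size=8, length=16000):
    # Sample standard logistic noise, the student's input
    # (inverse-CDF transform of uniform samples).
    u = torch.rand(batch_size, length)
    z = torch.log(u) - torch.log1p(-u)

    # Student IAF: x_t = mu_t(z_<t) + s_t(z_<t) * z_t, computed in
    # parallel across time. Assumed to return the audio x and the
    # per-timestep log-scales log s_t, each of shape (batch, length).
    x, log_s = student(z)

    # Entropy term H(P_S): closed form for an IAF driven by standard
    # logistic noise, H = sum_t E[ln s_t] + 2T (the logistic's entropy
    # is 2 nats per step).
    T = x.shape[1]
    entropy = log_s.sum(dim=1).mean() + 2.0 * T

    # Cross-entropy term H(P_S, P_T): score the student's own samples
    # under the autoregressive teacher. This runs in parallel at
    # training time because all of x is already known.
    teacher_logprob = teacher.log_prob(x)          # (batch, length)
    cross_entropy = -teacher_logprob.sum(dim=1).mean()

    # KL(P_S || P_T) = H(P_S, P_T) - H(P_S): the student is pushed
    # toward the teacher's distribution without collapsing to a mode.
    return cross_entropy - entropy
```

Note the role of the entropy term: minimizing the cross-entropy alone would let the student collapse onto the teacher's modes, whereas the KL form rewards the student for keeping its samples diverse while still scoring well under the teacher.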