LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech


5 Apr 2019 | Heiga Zen, Viet Dang, Rob Clark, Yu Zhang, Ron J. Weiss, Ye Jia, Zhifeng Chen, Yonghui Wu
This paper introduces a new speech corpus called "LibriTTS," designed for text-to-speech (TTS) applications. Derived from the LibriSpeech corpus, LibriTTS addresses several of its limitations, including its lower sampling rate, improper sentence segmentation, and the absence of punctuation and capitalization. The corpus consists of 585 hours of speech at a 24 kHz sampling rate from 2,456 speakers, together with the corresponding texts. Experimental results show that neural end-to-end TTS models trained on LibriTTS achieved mean opinion scores (MOS) above 4.0 in naturalness for five of six evaluation speakers. The corpus is freely available for download and is expected to accelerate TTS research. Future work includes evaluating the impact of speaker imbalance, preserving punctuation and capitalization, and expanding the corpus with more speakers and languages.
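To illustrate the sampling-rate difference the summary mentions, here is a minimal sketch of downsampling 24 kHz audio (LibriTTS's rate) to 16 kHz (LibriSpeech's rate) for comparison. This uses scipy and a synthetic tone as a stand-in for real speech; neither the tooling nor the signal comes from the paper itself.

```python
import numpy as np
from scipy.signal import resample_poly

SRC_RATE = 24_000   # LibriTTS sampling rate
DST_RATE = 16_000   # LibriSpeech sampling rate

# One second of a synthetic 440 Hz tone standing in for real speech.
t = np.arange(SRC_RATE) / SRC_RATE
audio_24k = np.sin(2 * np.pi * 440.0 * t).astype(np.float32)

# Polyphase resampling: 24 kHz * 2 / 3 = 16 kHz.
audio_16k = resample_poly(audio_24k, up=2, down=3)

print(len(audio_24k), len(audio_16k))  # 24000 16000
```

The higher 24 kHz rate preserves spectral content up to 12 kHz, which matters for TTS output quality; the 16 kHz version here would discard everything above 8 kHz.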