LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech

5 Apr 2019 | Heiga Zen, Viet Dang, Rob Clark, Yu Zhang, Ron J. Weiss, Ye Jia, Zhifeng Chen, Yonghui Wu
This paper introduces LibriTTS, a new speech corpus designed for text-to-speech (TTS) applications. It is derived from the LibriSpeech corpus, which was built for automatic speech recognition (ASR) research. LibriTTS addresses several properties that make LibriSpeech less suitable for TTS: a low sampling rate (16 kHz), speech split at silences rather than at sentence boundaries, text available only in normalized form, and missing contextual information. The new corpus contains 585 hours of speech at a 24 kHz sampling rate from 2,456 speakers, together with the corresponding texts. It is freely available for download from http://www.openslr.org/60/.

The corpus was created by reworking the original LibriSpeech materials to suit TTS tasks: providing audio at a 24 kHz sampling rate, splitting speech at sentence boundaries, including both original and normalized texts, and removing utterances with significant background noise. It also provides contextual information (neighbouring sentences) to support better prosody modeling.

The paper presents experimental results showing that neural end-to-end TTS models trained on LibriTTS achieved high mean opinion scores (MOS) in naturalness. The results indicate that the higher sampling rate (24 kHz) leads to better performance. However, a gap remains between natural and synthesized speech, suggesting the need for further improvements in TTS models.

LibriTTS offers a large, diverse, and high-quality dataset for TTS research. It is designed to support a wide range of TTS tasks, including multi-speaker systems, data-efficient training, and voice adaptation. Future work includes evaluating the impact of speaker imbalance, the effect of preserving punctuation and capitalization, and the relationship between training data size and synthesized speech quality. The authors also plan to expand the corpus with more speakers and languages.
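Because each utterance comes with both original and normalized text, a common first step is to pair every audio file with its two transcripts. The following is a minimal sketch, not the authors' tooling. It assumes the downloaded archives unpack so that each `<id>.wav` sits next to `<id>.original.txt` and `<id>.normalized.txt`; that layout, the `Utterance` type, and the `iter_utterances` helper are assumptions introduced here for illustration and should be checked against the actual download.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class Utterance(NamedTuple):
    """One audio file with its two transcripts (hypothetical helper type)."""
    wav: Path
    original: str
    normalized: str


def iter_utterances(root: Path) -> Iterator[Utterance]:
    """Yield every utterance under `root` that has audio and both transcripts.

    Assumed layout: <id>.wav, <id>.original.txt and <id>.normalized.txt
    live side by side in the same directory.
    """
    for wav in sorted(root.rglob("*.wav")):
        original = wav.with_suffix(".original.txt")
        normalized = wav.with_suffix(".normalized.txt")
        # Skip audio files whose transcripts are missing instead of failing.
        if original.is_file() and normalized.is_file():
            yield Utterance(
                wav=wav,
                original=original.read_text(encoding="utf-8").strip(),
                normalized=normalized.read_text(encoding="utf-8").strip(),
            )
```

Calling `list(iter_utterances(Path("LibriTTS/train-clean-100")))` (the path is illustrative) would then give audio and text pairs ready to feed into a TTS data pipeline; utterances lacking either transcript are silently skipped.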