The Interspeech 2024 Challenge on Speech Processing Using Discrete Units

11 Jun 2024 | Xuankai Chang, Jiatong Shi, Jinchuan Tian, Yuning Wu, Yuxun Tang, Yihan Wu, Shinji Watanabe, Yossi Adi, Xie Chen, Qin Jin
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units explores the potential of discrete speech representations in automatic speech recognition (ASR), text-to-speech (TTS), and singing voice synthesis (SVS). It comprises three main tasks: multilingual ASR, TTS (divided into a single-speaker track and a vocoder track), and SVS. The paper outlines the challenge design, baseline systems, and evaluation metrics, and presents preliminary results from the submitted systems. Key findings are that discrete units derived from self-supervised learning (SSL) models are effective in ASR and TTS and perform robustly in SVS. The challenge seeks to advance innovation in discrete speech unit processing and to provide a unified evaluation platform for future research.