Manuscript received November 11, 2015. Manuscript revised February 16, 2016. Manuscript publicized April 5, 2016.
Masanori MORISE†(a), Member, Fumiya YOKOMORI††, Nonmember, and Kenji OZAWA†, Member
The paper introduces WORLD, a vocoder-based speech synthesis system designed to improve both sound quality and real-time processing capability. WORLD consists of three analysis algorithms (F0 estimation using DIO, spectral envelope estimation using CheapTrick, and aperiodic parameter extraction using PLATINUM) and one synthesis algorithm. The system was evaluated in terms of sound quality and processing speed and outperformed conventional systems such as STRAIGHT and TANDEM-STRAIGHT: it processed speech more than ten times faster while maintaining high sound quality, which makes it suitable for real-time applications such as voice conversion and singing synthesizers. The paper concludes by discussing these advantages and outlining future work, including better robustness to noise and further improvements in real-time capability.
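This summary does not describe how the synthesis algorithm uses the estimated F0. As an illustration only, one common way for a vocoder of this kind to turn a frame-wise F0 contour into excitation timing is to accumulate instantaneous phase sample by sample and place a pulse each time a full cycle completes. The sketch below assumes a hypothetical helper `pulse_locations`, one F0 value in Hz per frame (0 marking an unvoiced frame), and F0 held constant within each frame. It is not WORLD's exact procedure, which would also interpolate F0 and shape each pulse with the spectral envelope and aperiodicity.

```python
from typing import List


def pulse_locations(f0: List[float], frame_period_ms: float, fs: int) -> List[int]:
    """Hypothetical helper: sample indices of excitation pulses.

    Assumptions (illustrative, not WORLD's exact procedure):
      * f0 holds one value in Hz per frame; 0 marks an unvoiced frame.
      * F0 is held constant within a frame (no interpolation).
    """
    samples_per_frame = int(round(fs * frame_period_ms / 1000.0))
    pulses: List[int] = []
    phase = 0.0  # accumulated phase, in cycles
    for frame, f in enumerate(f0):
        for k in range(samples_per_frame):
            n = frame * samples_per_frame + k
            if f <= 0.0:
                phase = 0.0  # unvoiced: no pulses; restart phase at the next voiced frame
                continue
            phase += f / fs  # advance by one sample's worth of cycles
            if phase >= 1.0:  # one full period completed: place a pulse here
                pulses.append(n)
                phase -= 1.0
    return pulses


if __name__ == "__main__":
    # 125 Hz at fs = 1 kHz gives one pulse every 8 samples.
    print(pulse_locations([125.0, 125.0, 125.0], frame_period_ms=10.0, fs=1000))
    # prints [7, 15, 23]
```

Resetting the phase in unvoiced frames is a simplifying choice here; a real synthesizer may handle voicing transitions differently.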