CoMoSVC: Consistency Model-based Singing Voice Conversion

3 Jan 2024 | Yiwen Lu, Zhen Ye, Wei Xue, Xu Tan, Qifeng Liu, Yike Guo
The paper introduces CoMoSVC, a consistency model-based Singing Voice Conversion (SVC) method designed to achieve high-quality, high-similarity, and high-speed conversion. A diffusion-based teacher model, built on the EDM architecture, is first trained to generate mel-spectrograms; a student model is then distilled from it so that sampling takes a single step, which makes inference significantly faster than in existing diffusion-based SVC methods. Experiments on the M4Singer and OpenSinger datasets show that CoMoSVC achieves comparable or superior conversion performance relative to state-of-the-art (SOTA) methods on both subjective and objective metrics, while running inference faster. The paper also describes the training and sampling processes and evaluates how the number of sampling steps affects conversion quality.
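To make the one-step sampling idea concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes the skip/output parameterization commonly used in the consistency-models literature, and the constants `SIGMA_DATA`, `EPS`, and `T_MAX`, the flat-list stand-in for a mel-spectrogram, and the `dummy_net` function are all hypothetical placeholders; the paper's actual network, conditioning features, and settings may differ.

```python
import math
import random

# Assumed constants (illustrative values, not taken from the summary above).
SIGMA_DATA = 0.5   # assumed standard deviation of the data
EPS = 0.002        # assumed smallest noise level
T_MAX = 80.0       # assumed largest noise level


def c_skip(t):
    # Weight on the noisy input; equals 1 at t = EPS.
    return SIGMA_DATA ** 2 / ((t - EPS) ** 2 + SIGMA_DATA ** 2)


def c_out(t):
    # Weight on the network output; equals 0 at t = EPS.
    return SIGMA_DATA * (t - EPS) / math.sqrt(SIGMA_DATA ** 2 + t ** 2)


def consistency_fn(net, x, t, cond):
    # Blend the noisy input with the network output so that the function
    # reduces to the identity at the smallest noise level (t = EPS).
    raw = net(x, t, cond)
    return [c_skip(t) * xi + c_out(t) * ri for xi, ri in zip(x, raw)]


def one_step_sample(net, cond, n, rng):
    # Draw pure noise at the largest noise level, then map it to a clean
    # sample with a single evaluation of the network.
    x_T = [rng.gauss(0.0, T_MAX) for _ in range(n)]
    return consistency_fn(net, x_T, T_MAX, cond)


def dummy_net(x, t, cond):
    # Hypothetical stand-in for the distilled student network:
    # it simply echoes the conditioning vector.
    return list(cond)


if __name__ == "__main__":
    rng = random.Random(0)
    cond = [0.1, -0.2, 0.3, 0.0]  # placeholder conditioning features
    mel = one_step_sample(dummy_net, cond, n=len(cond), rng=rng)
    print(len(mel))
```

The design point is that a diffusion sampler would call the network many times while stepping the noise level down, whereas `one_step_sample` calls it once. The `c_skip` and `c_out` weights are chosen so that the mapping leaves an already clean input unchanged at `t = EPS`.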