Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models

27 Apr 2024 | Shengpeng Ji, Minghui Fang, Ziyue Jiang, Siqi Zheng, Qian Chen, Rongjie Huang, Jialong Zuo, Shulei Wang, Zhou Zhao
The paper "Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models" addresses the challenges in using discrete acoustic codecs for downstream speech language models. The authors identify several gaps, including the limited training data for codecs, the need for numerous codebooks, and the excessive information in the initial channel of codebooks. To address these issues, they propose Language-Codec, which introduces a Masked Channel Residual Vector Quantization (MCRVQ) mechanism, improved Fourier transform structures, larger training datasets, and optimized hyperparameters. The Language-Codec model achieves excellent audio reconstruction quality with only four codebook channels, outperforming competing audio compression algorithms across various metrics and datasets. The paper also validates the efficiency of Language-Codec on downstream speech language models and provides open-source code and pre-trained models. The contributions of Language-Codec include its innovative MCRVQ structure, enhanced decoder structure, and superior performance in audio reconstruction and downstream tasks.The paper "Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models" addresses the challenges in using discrete acoustic codecs for downstream speech language models. The authors identify several gaps, including the limited training data for codecs, the need for numerous codebooks, and the excessive information in the initial channel of codebooks. To address these issues, they propose Language-Codec, which introduces a Masked Channel Residual Vector Quantization (MCRVQ) mechanism, improved Fourier transform structures, larger training datasets, and optimized hyperparameters. The Language-Codec model achieves excellent audio reconstruction quality with only four codebook channels, outperforming competing audio compression algorithms across various metrics and datasets. The paper also validates the efficiency of Language-Codec on downstream speech language models and provides open-source code and pre-trained models. The contributions of Language-Codec include its innovative MCRVQ structure, enhanced decoder structure, and superior performance in audio reconstruction and downstream tasks.