30 Apr 2024 | Haohe Liu, Xuenan Xu, Yi Yuan, Mengyue Wu, Wenwu Wang, Mark D. Plumbley
SemantiCodec is a novel audio codec designed to compress audio into fewer than 100 tokens per second across diverse audio types, including speech, general audio, and music, without compromising quality. It features a dual-encoder architecture: a semantic encoder using a self-supervised AudioMAE and an acoustic encoder. The semantic encoder captures semantic details through k-means clustering on extensive audio data, while the acoustic encoder captures remaining details. The outputs of both encoders are used to reconstruct audio via a diffusion-model-based decoder. SemantiCodec is presented in three variants with token rates of 25, 50, and 100 tokens per second, supporting ultra-low bitrates between 0.31 kbps and 1.43 kbps. Experimental results demonstrate that SemantiCodec significantly outperforms the state-of-the-art Descript codec on reconstruction quality and contains significantly richer semantic information, even at lower bitrates. The code and demos are available at <https://haoheliu.github.io/SemantiCodec/>.SemantiCodec is a novel audio codec designed to compress audio into fewer than 100 tokens per second across diverse audio types, including speech, general audio, and music, without compromising quality. It features a dual-encoder architecture: a semantic encoder using a self-supervised AudioMAE and an acoustic encoder. The semantic encoder captures semantic details through k-means clustering on extensive audio data, while the acoustic encoder captures remaining details. The outputs of both encoders are used to reconstruct audio via a diffusion-model-based decoder. SemantiCodec is presented in three variants with token rates of 25, 50, and 100 tokens per second, supporting ultra-low bitrates between 0.31 kbps and 1.43 kbps. Experimental results demonstrate that SemantiCodec significantly outperforms the state-of-the-art Descript codec on reconstruction quality and contains significantly richer semantic information, even at lower bitrates. The code and demos are available at <https://haoheliu.github.io/SemantiCodec/>.