[ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and sound


UniCodec (ACL 2025 Main)

UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook
Yidi Jiang, Qian Chen, Shengpeng Ji, Yu Xi, Wen Wang, Chong Zhang, Xianghu Yue, Shiliang Zhang, Haizhou Li
National University of Singapore; Tongyi Speech Lab

In this work, we introduce UniCodec, a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and sound.

[Figure: comparison]

To achieve this, we propose a partitioned domain-adaptive codebook method with domain Mixture-of-Experts strategy to capture the distinct characteristics of each audio domain. Furthermore, to enrich the semantic density of the codec without auxiliary modules, we propose a self-supervised mask prediction modeling approach.
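To make the idea of a partitioned single codebook more concrete, here is a toy sketch in plain Python. It only illustrates the general notion of restricting nearest-neighbour lookup to a domain's slice of one shared codebook. The codebook size, the partition boundaries, the vectors, and the names `CODEBOOK`, `PARTITIONS`, and `quantize` are all hypothetical and are not taken from the paper or from this repository; the Mixture-of-Experts part is not modelled at all.

```python
# Toy illustration only: sizes, boundaries, and vectors are hypothetical.
from typing import Dict, List, Sequence, Tuple

# One shared codebook with 6 two-dimensional entries.
CODEBOOK: List[Tuple[float, float]] = [
    (0.0, 0.0), (1.0, 0.0),  # indices 0-1: "speech" slice
    (0.0, 1.0), (1.0, 1.0),  # indices 2-3: "music" slice
    (2.0, 2.0), (3.0, 3.0),  # indices 4-5: "sound" slice
]

# Half-open index ranges [start, stop) assigned to each domain.
PARTITIONS: Dict[str, Tuple[int, int]] = {
    "speech": (0, 2),
    "music": (2, 4),
    "sound": (4, 6),
}


def quantize(vector: Sequence[float], domain: str) -> int:
    """Return the index of the nearest codebook entry within the domain's slice."""
    start, stop = PARTITIONS[domain]

    def squared_distance(index: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(vector, CODEBOOK[index]))

    # Search only the entries that belong to this domain.
    return min(range(start, stop), key=squared_distance)


codes = {domain: quantize((0.9, 0.1), domain) for domain in PARTITIONS}
```

In this sketch the same input vector maps to a different index depending on the domain, yet every index lives in one shared index space, so downstream consumers still see a single token vocabulary.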

As a single unified codec model, UniCodec achieves superior subjective reconstruction performance while maintaining a high compression rate in all three domains (speech/music/sound).
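As a rough sanity check on the compression rate, the token bitrate can be estimated from the config filename used in the usage example (`unicodec_frame75_10s_nq1_code16384_dim512_finetune.yaml`). The sketch below assumes that `frame75` means 75 frames per second, `nq1` means one quantizer, and `code16384` means 16384 codebook entries; these readings are inferred from the filename, not confirmed by the repository. The 16-bit PCM baseline is likewise an assumption made only for comparison, and `bitrate_bps` is a hypothetical helper.

```python
import math


def bitrate_bps(frames_per_second: int, codebook_size: int, num_quantizers: int = 1) -> float:
    """Bits per second needed to transmit the discrete codes."""
    bits_per_code = math.log2(codebook_size)
    return frames_per_second * num_quantizers * bits_per_code


# Assumed values read off the config filename (see the lead-in).
unicodec_bps = bitrate_bps(frames_per_second=75, codebook_size=16384, num_quantizers=1)

# Assumed baseline: 24 kHz, mono, 16-bit PCM.
pcm_bps = 24000 * 16 * 1

compression_ratio = pcm_bps / unicodec_bps
```

Under these assumptions each code carries 14 bits, which gives 1050 bits per second for the token stream.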

[Figure: main]

Installation

pip3 install git+https://github.com/mesolitica/UniCodec-fix

Encode and decode

import torch
import torchaudio
from encodec.utils import convert_audio
from unicodec.decoder.pretrained import Unicodec

config = 'configs/unicodec_frame75_10s_nq1_code16384_dim512_finetune.yaml'

# Download the checkpoint first:
# !wget https://huggingface.co/Yidiii/UniCodec_ckpt/resolve/main/unicode.ckpt
model = Unicodec.from_pretrained0802(config, 'unicode.ckpt')

# Load the audio and convert it to 24 kHz mono.
wav, sr = torchaudio.load('husein-assistant-trim.mp3')
wav = convert_audio(wav, sr, 24000, 1)
bandwidth_id = torch.tensor([0])

# The domain ID is passed as a string: '0' for speech, '1' for music, '2' for sound, based on
# https://github.com/mesolitica/UniCodec-fix/blob/main/encoder/quantization/simvq_moe.py#L161
_, discrete_code = model.encode_infer(wav, '2', bandwidth_id=bandwidth_id)

# Map the discrete codes back to features, then decode them to a waveform.
features = model.codes_to_features(discrete_code)
audio_out = model.decode(features, bandwidth_id=bandwidth_id)

The config and model checkpoint are also mirrored at https://huggingface.co/huseinzol05/UniCodec-mirror
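Since the domain is passed to `encode_infer` as a bare string such as '2', a small lookup can make call sites easier to read. The following is a minimal sketch based on the mapping given in the usage example's comment (0 for speech, 1 for music, 2 for sound); `DOMAIN_IDS` and `domain_id` are hypothetical helpers and are not part of the UniCodec package.

```python
# Hypothetical convenience helper; not part of the UniCodec package.
DOMAIN_IDS = {"speech": "0", "music": "1", "sound": "2"}


def domain_id(name: str) -> str:
    """Translate a human-readable domain name into the string ID used when encoding."""
    key = name.strip().lower()
    if key not in DOMAIN_IDS:
        valid = ", ".join(sorted(DOMAIN_IDS))
        raise ValueError(f"unknown domain {name!r}; expected one of: {valid}")
    return DOMAIN_IDS[key]
```

With this in place, the encoding call from the usage example could be written as `model.encode_infer(wav, domain_id('sound'), bandwidth_id=bandwidth_id)`.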

Citation

@article{jiang2025unicodec,
  title={UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook},
  author={Jiang, Yidi and Chen, Qian and Ji, Shengpeng and Xi, Yu and Wang, Wen and Zhang, Chong and Yue, Xianghu and Zhang, ShiLiang and Li, Haizhou},
  journal={arXiv preprint arXiv:2502.20067},
  year={2025}
}
