
# Mimi Codec

Standalone Rust implementation of the Mimi streaming neural audio codec, extracted from the Moshi monorepo. Includes a native Candle backend, ONNX Runtime backend, JNI bridge for Android, and Python bindings.

24 kHz mono audio, 12.5 Hz frame rate, configurable 1–32 codebooks.
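These numbers fit together: each frame covers 24000 / 12.5 = 1920 samples (80 ms), and the bitrate scales linearly with the codebook count. A quick sanity check, assuming 2048-entry codebooks (11 bits per code, consistent with the bitrates in the table below):

```python
SAMPLE_RATE = 24_000   # Hz, mono
FRAME_RATE = 12.5      # frames per second
CODEBOOK_BITS = 11     # assumed: 2048-entry codebooks -> log2(2048) = 11 bits

samples_per_frame = int(SAMPLE_RATE / FRAME_RATE)  # 1920

def bitrate_bps(num_codebooks: int) -> float:
    """Bits per second for a given number of RVQ codebooks."""
    return FRAME_RATE * num_codebooks * CODEBOOK_BITS

print(samples_per_frame)       # 1920
print(bitrate_bps(8) / 1000)   # 1.1 (kbps)
print(bitrate_bps(16) / 1000)  # 2.2 (kbps)
```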

## Crates

| Crate | Description |
| --- | --- |
| `mimi-core` | Candle-based codec: SEANet encoder/decoder, transformer, RVQ |
| `mimi-onnx` | ONNX Runtime backend with streaming state management |
| `mimi-cli` | CLI tool for encode/decode/roundtrip |
| `mimi-jni` | JNI bridge for Android (used by MimiDemo) |
| `mimi-pyo3` | Python bindings via PyO3 |

## ONNX Streaming Models

Pre-exported streaming ONNX models are available on Hugging Face: [BMekiker/mimi-onnx-streaming](https://huggingface.co/BMekiker/mimi-onnx-streaming)

| Variant | Encoder | Decoder | Total | Bitrate | Precision |
| --- | --- | --- | --- | --- | --- |
| 8 codebooks | 194 MB | 170 MB | 364 MB | ~1.1 kbps | FP32 |
| 16 codebooks | 242 MB | 186 MB | 428 MB | ~2.2 kbps | FP32 |
| 8 codebooks (fp16) | 119 MB | 93 MB | 212 MB | ~1.1 kbps | FP16 weights |
| 16 codebooks (fp16) | 167 MB | 109 MB | 276 MB | ~2.2 kbps | FP16 weights |

The FP16 variants store weights as float16 (~40% smaller) while keeping float32 graph I/O. Computation still runs in float32 at runtime, so they are drop-in replacements with no quality loss.
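The storage scheme can be illustrated with NumPy (a sketch of the idea, not the actual converter): weights are round-tripped through float16 on disk, then upcast back to float32 before any arithmetic, so only the stored weight precision changes.

```python
import numpy as np

rng = np.random.default_rng(0)
w_fp32 = rng.standard_normal((256, 256)).astype(np.float32)  # original weights
x = rng.standard_normal(256).astype(np.float32)              # an activation vector

# Weight-only FP16: store at half precision (half the bytes of fp32)...
w_stored = w_fp32.astype(np.float16)

# ...but upcast before compute, so the matmul itself runs in float32.
y_fp16w = w_stored.astype(np.float32) @ x
y_ref = w_fp32 @ x

# Residual error comes only from fp16 rounding of the weights.
rel_err = np.abs(y_fp16w - y_ref).max() / np.abs(y_ref).max()
print(rel_err)
```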

### Exporting ONNX models

```bash
pip install torch transformers onnx onnxruntime

# Export FP32 models
python scripts/export_streaming_onnx.py --num-codebooks 8 --output-dir onnx-models/streaming-8cb
python scripts/export_streaming_onnx.py --num-codebooks 16 --output-dir onnx-models/streaming-16cb

# Convert to weight-only FP16
python scripts/weight_fp16.py --input-dir onnx-models/streaming-8cb --output-dir onnx-models/streaming-8cb-fp16
python scripts/weight_fp16.py --input-dir onnx-models/streaming-16cb --output-dir onnx-models/streaming-16cb-fp16
```

The exported models accept any chunk size that is a multiple of 80 ms (1920 samples at 24 kHz) at runtime; the chunk size is not baked into the graph.
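A caller therefore only needs a divisibility check before feeding audio. A minimal sketch (the helper names are illustrative, not part of the API; the constant follows from 24 kHz × 80 ms):

```python
SAMPLE_RATE = 24_000
FRAME_SAMPLES = 1920  # 80 ms at 24 kHz; the smallest valid chunk

def is_valid_chunk(n_samples: int) -> bool:
    """True if a chunk of n_samples can be fed to the streaming models."""
    return n_samples > 0 and n_samples % FRAME_SAMPLES == 0

def valid_chunk_sizes(max_ms: int = 400) -> list[int]:
    """All valid chunk sizes (in samples) up to max_ms of audio."""
    max_samples = SAMPLE_RATE * max_ms // 1000
    return list(range(FRAME_SAMPLES, max_samples + 1, FRAME_SAMPLES))

print(valid_chunk_sizes())  # [1920, 3840, 5760, 7680, 9600]
```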

## CLI Usage

```bash
# Build
cargo build --release -p mimi-cli

# Roundtrip (encode + decode)
mimi-cli roundtrip --input test.wav --output out.wav --num-codebooks 8

# ONNX backend
mimi-cli roundtrip --input test.wav --output out.wav \
  --backend onnx \
  --encoder-model onnx-models/streaming-8cb/encoder_model.onnx \
  --decoder-model onnx-models/streaming-8cb/decoder_model.onnx \
  --num-codebooks 8 --streaming
```

## Building for Android

```bash
cargo install cargo-ndk

cd mimi-jni
cargo ndk -t arm64-v8a -P 24 build --release --features onnx

# Copy the .so to the Android project
cp target/aarch64-linux-android/release/libmimi_jni.so \
   ../MimiDemo/app/src/main/jniLibs/arm64-v8a/
```

## Scripts

| Script | Description |
| --- | --- |
| `export_streaming_onnx.py` | Export PyTorch Mimi to streaming ONNX with causal attention masking |
| `weight_fp16.py` | Convert FP32 ONNX models to weight-only FP16 (~40% smaller, no quality loss) |
| `export_onnx.py` | Export batch (non-streaming) ONNX models |
| `compare_backends.py` | Numerical comparison between PyTorch, ONNX batch, and ONNX streaming |
| `convert_to_gguf.py` | Convert safetensors to GGUF (Q4_0, Q8_0) |
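Numerical backend comparisons typically reduce to a waveform error metric. A hedged sketch of one common choice, signal-to-noise ratio in dB (illustrative only, not necessarily the metric `compare_backends.py` computes):

```python
import numpy as np

def snr_db(reference: np.ndarray, test: np.ndarray) -> float:
    """SNR of `test` against `reference`, in dB; higher means closer."""
    noise = reference - test
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))

# One second of a 440 Hz tone at 24 kHz, plus small synthetic "backend" noise.
t = np.arange(24_000) / 24_000
ref = np.sin(2 * np.pi * 440 * t).astype(np.float32)
test = ref + 1e-3 * np.random.default_rng(0).standard_normal(24_000).astype(np.float32)

print(round(snr_db(ref, test), 1))
```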

## License

This project contains code derived from Moshi by Kyutai.

- Code: MIT License (see [LICENSE](LICENSE))
- Model weights (Mimi codec): CC-BY 4.0, by Kyutai
