Standalone Rust implementation of the Mimi streaming neural audio codec, extracted from the Moshi monorepo. Includes a native Candle backend, ONNX Runtime backend, JNI bridge for Android, and Python bindings.
24 kHz mono audio, 12.5 Hz frame rate, configurable 1–32 codebooks.
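Those parameters determine the bitrates quoted in the model table below; a quick sanity check, assuming Mimi's usual 2048-entry codebooks (11 bits per code):

```python
FRAME_RATE_HZ = 12.5   # one frame every 80 ms
BITS_PER_CODE = 11     # log2(2048), assuming 2048-entry codebooks

def bitrate_kbps(num_codebooks: int) -> float:
    """Token bitrate implied by the frame rate and codebook count."""
    return FRAME_RATE_HZ * BITS_PER_CODE * num_codebooks / 1000

print(bitrate_kbps(8))   # 1.1 -> the ~1.1 kbps variants
print(bitrate_kbps(16))  # 2.2 -> the ~2.2 kbps variants
```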
| Crate | Description |
|---|---|
| `mimi-core` | Candle-based codec: SEANet encoder/decoder, transformer, RVQ |
| `mimi-onnx` | ONNX Runtime backend with streaming state management |
| `mimi-cli` | CLI tool for encode/decode/roundtrip |
| `mimi-jni` | JNI bridge for Android (used by MimiDemo) |
| `mimi-pyo3` | Python bindings via PyO3 |
Pre-exported streaming ONNX models are available on Hugging Face: BMekiker/mimi-onnx-streaming
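One way to fetch them is via `huggingface_hub`; the `local_dir` below is an assumption chosen to line up with the paths used in the examples that follow:

```python
from huggingface_hub import snapshot_download

# Download every exported variant; pass allow_patterns to grab just one.
snapshot_download(repo_id="BMekiker/mimi-onnx-streaming",
                  local_dir="onnx-models")
```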
| Variant | Encoder | Decoder | Total | Bitrate | Precision |
|---|---|---|---|---|---|
| 8 codebooks | 194 MB | 170 MB | 364 MB | ~1.1 kbps | FP32 |
| 16 codebooks | 242 MB | 186 MB | 428 MB | ~2.2 kbps | FP32 |
| 8 codebooks (fp16) | 119 MB | 93 MB | 212 MB | ~1.1 kbps | FP16 weights |
| 16 codebooks (fp16) | 167 MB | 109 MB | 276 MB | ~2.2 kbps | FP16 weights |
The FP16 variants store weights as float16 (~40% smaller) while keeping float32 graph I/O. Computation still runs in float32 at runtime, so they are drop-in replacements with no quality loss.
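The idea behind the conversion (see `scripts/weight_fp16.py` below) is to store each float32 initializer as float16 and cast it back at load time, leaving the compute graph itself untouched. A minimal sketch of that transformation, not the actual script:

```python
import numpy as np
import onnx
from onnx import helper, numpy_helper, TensorProto

def weights_to_fp16(model: onnx.ModelProto) -> onnx.ModelProto:
    """Store float32 initializers as float16; compute stays float32."""
    graph = model.graph
    casts, inits = [], []
    for init in graph.initializer:
        if init.data_type != TensorProto.FLOAT:
            inits.append(init)  # leave non-float tensors alone
            continue
        arr = numpy_helper.to_array(init).astype(np.float16)
        fp16_name = init.name + "_fp16"
        inits.append(numpy_helper.from_array(arr, fp16_name))
        # Cast back to float32 under the original name, so every
        # consumer node and the graph I/O are left untouched.
        casts.append(helper.make_node(
            "Cast", [fp16_name], [init.name], to=TensorProto.FLOAT))
    del graph.initializer[:]
    graph.initializer.extend(inits)
    # The casts consume only initializers, so prepending them keeps
    # the node list topologically sorted.
    nodes = casts + list(graph.node)
    del graph.node[:]
    graph.node.extend(nodes)
    return model
```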
```bash
pip install torch transformers onnx onnxruntime

# Export FP32 models
python scripts/export_streaming_onnx.py --num-codebooks 8 --output-dir onnx-models/streaming-8cb
python scripts/export_streaming_onnx.py --num-codebooks 16 --output-dir onnx-models/streaming-16cb

# Convert to weight-only FP16
python scripts/weight_fp16.py --input-dir onnx-models/streaming-8cb --output-dir onnx-models/streaming-8cb-fp16
python scripts/weight_fp16.py --input-dir onnx-models/streaming-16cb --output-dir onnx-models/streaming-16cb-fp16
```

The exported models accept any chunk size that is a multiple of 80 ms (1920 samples) at runtime; the chunk size is not baked into the graph.
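A caller therefore only has to deliver audio in some multiple of 1920 samples; a minimal sketch of that buffering, with the chunk size left as a free parameter:

```python
import numpy as np

SAMPLE_RATE = 24_000
FRAME = SAMPLE_RATE * 80 // 1000  # 80 ms -> 1920 samples

def iter_chunks(audio: np.ndarray, frames_per_chunk: int = 4):
    """Yield chunks whose size is a multiple of one 80 ms frame.

    Any multiple works at runtime since the chunk size is not baked
    into the exported graph; the trailing partial frame is dropped
    here (a real caller would buffer or zero-pad it instead).
    """
    step = frames_per_chunk * FRAME
    usable = len(audio) - len(audio) % step
    for start in range(0, usable, step):
        yield audio[start:start + step]
```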
```bash
# Build
cargo build --release -p mimi-cli

# Roundtrip (encode + decode)
mimi-cli roundtrip --input test.wav --output out.wav --num-codebooks 8

# ONNX backend
mimi-cli roundtrip --input test.wav --output out.wav \
    --backend onnx \
    --encoder-model onnx-models/streaming-8cb/encoder_model.onnx \
    --decoder-model onnx-models/streaming-8cb/decoder_model.onnx \
    --num-codebooks 8 --streaming
```

```bash
cargo install cargo-ndk
cd mimi-jni
cargo ndk -t arm64-v8a -P 24 build --release --features onnx
# Copy .so to the Android project
cp target/aarch64-linux-android/release/libmimi_jni.so \
../MimiDemo/app/src/main/jniLibs/arm64-v8a/
```

| Script | Description |
|---|---|
| `export_streaming_onnx.py` | Export PyTorch Mimi to streaming ONNX with causal attention masking |
| `weight_fp16.py` | Convert FP32 ONNX models to weight-only FP16 (~40% smaller, no quality loss; see the sketch above) |
| `export_onnx.py` | Export batch (non-streaming) ONNX models |
| `compare_backends.py` | Numerical comparison between PyTorch, ONNX batch, and ONNX streaming |
| `convert_to_gguf.py` | Convert safetensors to GGUF (Q4_0, Q8_0) |