spine

A neural audio codec for expressive speech.

Spine encodes 24 kHz mono audio into multi-scale FSQ tokens at 1.57 kbps (~88 tokens/s) across four temporal scales (~6 / 12 / 23 / 47 Hz), keeping sequences short for downstream language models. The convolutional decoder is hard-bandlimited at 6 kHz by a fixed crossover; a filtered-noise branch and a complex-STFT head synthesize the high band under purely adversarial supervision, eliminating the high-frequency static typical of GAN codecs.

115M-parameter generator: conv encoder/decoder with a 512-d transformer bottleneck (8 + 12 layers)
Multi-scale FSQ (pool → quantize → repeat) on a shared latent, with no codebook collapse
Reconstruction losses bandlimited below the crossover; the high band is owned by the DDSP split

Installation

pip install spine-codec

Training pulls in extra dependencies (wandb):

pip install "spine-codec[train]"

For development, clone this repo and run uv sync.

Usage

The pretrained model is downloaded from twangodev/spine-codec on first use; pass --checkpoint to use a local training checkpoint instead.

spine encode --input speech.wav --output codes.pt
spine decode --input codes.pt --output speech.wav
spine recon  --input speech.wav --output roundtrip.wav

import torchaudio
from spine import Spine

model = Spine.from_pretrained("twangodev/spine-codec")
audio, sr = torchaudio.load("speech.wav")  # 24 kHz mono
codes = model.encode(audio.unsqueeze(0))
reconstruction = model.decode(codes)

Training

spine train --config configs/train.yaml

Training configs live in the repo (not the wheel), so train from a git checkout with the train extra installed.

YAML configs are sparse overrides on top of the defaults in spine/config.py.

Acknowledgements

The architecture builds on Mimi, SNAC, DAC, FSQ, and DDSP.

Name		Name	Last commit message	Last commit date
Latest commit History 36 Commits
.github		.github
configs		configs
spine		spine
tests		tests
.gitignore		.gitignore
.python-version		.python-version
.release-please-manifest.json		.release-please-manifest.json
CHANGELOG.md		CHANGELOG.md
CITATION.cff		CITATION.cff
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
release-please-config.json		release-please-config.json
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

spine

Installation

Usage

Training

Acknowledgements

About

Uh oh!

Releases 1

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

spine

Installation

Usage

Training

Acknowledgements

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages