Un-0 is an image-generation model built on Kuramoto dynamics: it generates an image by integrating the phase dynamics of a population of coupled oscillators — no diffusion schedule, no adversary, no iterative denoising.
It comes from Unconventional AI, where we research dynamical systems as a computing substrate: they map onto analog and physical hardware, which points toward energy costs for AI on the order of 1000× lower than today's digital accelerators.
This repository is a plain-PyTorch reference implementation with two independent training pipelines — CIFAR-10 (32×32, 10 classes) and ImageNet-64 (64×64, 1000 classes) — trained separately. It ships pretrained checkpoints for inference and the full training recipe to reproduce the results below. Training is verified on NVIDIA A100, H200, and B200 GPUs.
For more details, see the accompanying blog post.
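For readers new to Kuramoto models, the sketch below integrates the textbook dynamics dθ_i/dt = ω_i + Σ_j K_ij sin(θ_j − θ_i) with explicit Euler steps and measures synchrony with the order parameter r = |mean_j exp(iθ_j)|. It is an illustration, not Un-0's generator: `kuramoto_euler` and `order_parameter` are hypothetical names, and the coupling is a plain dense N×N list rather than the model's learned parameterization.

```python
import math


def kuramoto_euler(theta, omega, coupling, dt, num_steps):
    """Illustrative (hypothetical) Euler integrator for textbook Kuramoto dynamics.

    theta:    N initial phases in radians
    omega:    N natural frequencies
    coupling: dense N x N nested list, coupling[i][j] = K_ij
    Each step wraps phases to [0, 2*pi); num_steps=0 returns a copy of theta.
    """
    n = len(theta)
    theta = list(theta)
    for _ in range(num_steps):
        # dtheta_i/dt = omega_i + sum_j K_ij * sin(theta_j - theta_i)
        velocity = [
            omega[i]
            + sum(coupling[i][j] * math.sin(theta[j] - theta[i]) for j in range(n))
            for i in range(n)
        ]
        theta = [(t + dt * v) % (2 * math.pi) for t, v in zip(theta, velocity)]
    return theta


def order_parameter(theta):
    """Kuramoto order parameter r in [0, 1]; r = 1 means all phases coincide."""
    re = sum(math.cos(t) for t in theta) / len(theta)
    im = sum(math.sin(t) for t in theta) / len(theta)
    return math.hypot(re, im)
```

With identical natural frequencies and positive all-to-all coupling (for example `coupling[i][j] = 1 / N`), phases that start within a half circle pull together and `order_parameter` climbs toward 1.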
The tables below report FID for the released checkpoints, by oscillator count. CIFAR-10 is scored with clean-FID; ImageNet-64 with the OpenAI ADM evaluator (the field-standard ImageNet-64 number), so the two columns are not directly comparable. See Evaluation for both methodologies.
| CIFAR-10 | params | clean-FID | ImageNet-64 | params | ADM FID |
|---|---|---|---|---|---|
| cifar10/n4096 | 19.4M | 8.86 | imagenet64/n16384 | 322M | 6.74 |
| cifar10/n2048 | 4.9M | 9.38 | imagenet64/n10240 | 130M | 8.04 |
| cifar10/n1024 | 1.3M | 11.16 | imagenet64/n6656 | 57M | 8.36 |
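A back-of-the-envelope check on how parameter count scales with oscillator count. It assumes, as a hypothesis and not as a documented fact, that "dense coupling" means one weight per ordered oscillator pair (N² weights). Under that assumption, N² alone accounts for roughly 78% to 86% of each reported total, which is consistent with the CIFAR-10 column roughly quadrupling (1.3M → 4.9M → 19.4M) each time N doubles. The helper names are hypothetical.

```python
# Released checkpoints from the table: name -> (oscillator count N, reported params).
RELEASED = {
    "cifar10/n1024": (1024, 1.3e6),
    "cifar10/n2048": (2048, 4.9e6),
    "cifar10/n4096": (4096, 19.4e6),
    "imagenet64/n6656": (6656, 57e6),
    "imagenet64/n10240": (10240, 130e6),
    "imagenet64/n16384": (16384, 322e6),
}


def dense_coupling_params(n: int) -> int:
    """Weights in a dense N x N coupling matrix (assumption: one per ordered pair)."""
    return n * n


def coupling_share(released=RELEASED) -> dict:
    """Fraction of each reported parameter count explained by N^2 alone."""
    return {name: dense_coupling_params(n) / total for name, (n, total) in released.items()}


for name, share in coupling_share().items():
    print(f"{name}: {share:.0%} of reported parameters")
```

The reported totals are rounded, so treat the percentages as approximate.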
Requires Python ≥ 3.11 and PyTorch ≥ 2.11; a CUDA GPU is recommended. Install with uv:
```bash
uv sync --group dev
```

Two optional dependency groups are pulled in only when you need them:

- `logging`: Weights & Biases (`uv sync --group dev --group logging`), for training metrics and sample grids. Without it, no W&B calls are made.
- `eval`: `clean-fid` (`uv sync --group eval`), for FID evaluation.
The fastest way to try out the repo is to load a pretrained checkpoint from the Hugging Face Hub — the architecture is rebuilt from the checkpoint's own config, so the right model is constructed automatically:
```python
import torch
from un0 import ConditionalImplicitKuramotoGenerator
from torchvision.utils import save_image  # already a dependency

model = ConditionalImplicitKuramotoGenerator.from_pretrained("cifar10/n4096")
class_ids = torch.tensor([0, 1, 2, 3, 4])  # one image per listed class
images = model.sample_images(class_ids)    # (5, 3, 32, 32) in [0, 1]
save_image(images, "samples.png", nrow=5)  # writes a viewable PNG
```

The snippet above writes `samples.png`, one image each for classes 0–4 (airplane, automobile, bird, cat, deer).
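`sample_images` takes integer class ids; for CIFAR-10 these follow the dataset's standard label order, of which the snippet uses 0–4. A small hypothetical helper (not part of `un0`) for looking ids up by name:

```python
# Hypothetical helper, not part of un0: standard CIFAR-10 label order.
CIFAR10_CLASSES = (
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
)


def cifar10_class_ids(names):
    """Map CIFAR-10 class names to integer ids (index into CIFAR10_CLASSES)."""
    lookup = {name: i for i, name in enumerate(CIFAR10_CLASSES)}
    unknown = [name for name in names if name not in lookup]
    if unknown:
        raise ValueError(f"unknown CIFAR-10 classes: {unknown}")
    return [lookup[name] for name in names]
```

For example, `model.sample_images(torch.tensor(cifar10_class_ids(["cat", "ship"])))` would sample one cat and one ship.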
When the CLI or helpers use the default auto device selection, they prefer
CUDA, then Apple Silicon MPS, then CPU.
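A minimal sketch of that priority order, relying only on the standard PyTorch availability checks; `pick_device` and `auto_device` are hypothetical names, not the repo's helpers:

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Hypothetical helper: prefer CUDA, then Apple Silicon MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def auto_device() -> str:
    """Query PyTorch for the available backends (torch is imported lazily)."""
    import torch

    return pick_device(torch.cuda.is_available(), torch.backends.mps.is_available())
```

Keeping the decision in a pure function makes the ordering testable on machines without a GPU.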
Six checkpoints are available, named {task}/n{oscillators}:
cifar10/n1024, cifar10/n2048, cifar10/n4096, imagenet64/n6656,
imagenet64/n10240, imagenet64/n16384.
For example, cifar10/n2048 is the CIFAR-10 checkpoint whose Kuramoto model has
2048 oscillators.
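The naming scheme is regular enough to parse. A hypothetical helper (the repo may resolve names differently internally):

```python
import re
from typing import NamedTuple


class CheckpointName(NamedTuple):
    task: str
    oscillators: int


# Hypothetical parser for the "{task}/n{oscillators}" naming scheme.
_NAME_RE = re.compile(r"(?P<task>[a-z0-9]+)/n(?P<n>\d+)")


def parse_checkpoint_name(name: str) -> CheckpointName:
    """Split a released checkpoint name into its task and oscillator count."""
    match = _NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"expected '{{task}}/n{{oscillators}}', got {name!r}")
    return CheckpointName(match["task"], int(match["n"]))
```

For example, `parse_checkpoint_name("imagenet64/n10240")` returns `CheckpointName(task='imagenet64', oscillators=10240)`.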
inference.py arranges all the samples into a single image grid (one PNG, one
row per class) and writes it to --output. Point it at a released name or a
local checkpoint.
```bash
# All 10 CIFAR-10 classes, 10 images each → one 10×10 grid PNG:
uv run python un0/inference.py \
  --checkpoint checkpoints/cifar10/final.pt \
  --output samples/cifar10.png

# A subset of classes, from released weights → one 3×8 grid PNG:
uv run python un0/inference.py \
  --pretrained imagenet64/n16384 \
  --classes 0 1 207 \
  --samples-per-class 8 \
  --output samples/imagenet64.png
```

`--pretrained` and `--checkpoint` are mutually exclusive; the same CLI serves
both tasks.
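Assuming the grid is filled row-major (an assumption about `inference.py`'s internals), each class id is repeated `--samples-per-class` times along its row. A hypothetical pure-Python sketch of that layout:

```python
# Hypothetical helpers illustrating a one-row-per-class grid layout.
def grid_class_ids(classes, samples_per_class):
    """Class id for each grid cell in row-major order:
    row r holds samples_per_class samples of classes[r]."""
    return [c for c in classes for _ in range(samples_per_class)]


def grid_shape(classes, samples_per_class):
    """(rows, columns) of the grid: one row per class."""
    return len(classes), samples_per_class
```

In PyTorch the same ordering is `torch.tensor(classes).repeat_interleave(samples_per_class)`, and passing `nrow=samples_per_class` to `torchvision.utils.save_image` puts each class on its own row, since `nrow` sets the number of images per row.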
Training has been verified on NVIDIA B200, H200, and A100 GPUs.
Self-contained: the first run downloads uoft-cs/cifar10 (~170 MB) into the
Hugging Face cache (no account or token needed); later runs reuse the cached copy.

Single GPU:

```bash
uv run python un0/train_cifar10.py --checkpoint-dir checkpoints/cifar10
```

Multi-GPU (8-GPU DDP, matching the reference effective batch of 8192):

```bash
uv run torchrun --nproc_per_node=8 un0/train_cifar10.py \
  --checkpoint-dir checkpoints/cifar10 \
  --wandb-project <your-wandb-project>
```

The defaults reproduce the reference configuration. Every hyperparameter is a
CLI flag; run `uv run python un0/train_cifar10.py --help` for the full list with
defaults. When `--wandb-project` is set, rank 0 logs per-step metrics and
uploads a 10×10 class-conditional sample grid every 100 epochs.
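Under DDP each rank draws its own batch, so the effective batch is the per-device batch times the number of ranks. The sketch assumes no gradient accumulation (the README does not say whether the CIFAR-10 script uses any); under that assumption the CIFAR-10 figure of 8192 corresponds to 1024 per GPU on 8 GPUs, and the ImageNet-64 recipe states the same relation explicitly (8 × 2048 = 16384). `per_device_batch` is a hypothetical helper.

```python
def per_device_batch(global_batch: int, world_size: int) -> int:
    """Hypothetical helper: per-GPU batch needed for a target global batch,
    assuming global = per_device * world_size (no gradient accumulation)."""
    if global_batch % world_size:
        raise ValueError(f"{global_batch} is not divisible by {world_size} ranks")
    return global_batch // world_size
```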
To skip computing DINO features live on the real images in each batch (DINO
features of generated images are still computed live), precompute a view bank
once, then train with `--queue-size 0` and `--precomputed-dino-features`:

```bash
# One-time: CUDA required; writes data/cifar10_train_dino_views.pt (~1.7 GiB bf16)
uv run python scripts/precompute_dino_features.py \
  --output data/cifar10_train_dino_views.pt

uv run python un0/train_cifar10.py \
  --checkpoint-dir checkpoints/cifar10 \
  --queue-size 0 \
  --precomputed-dino-features data/cifar10_train_dino_views.pt
```

Row i of the bank corresponds to index i of the Hugging Face train split. Queue
mode is incompatible with this path.
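Because bank rows are keyed by train index, a data pipeline using the bank has to carry each sample's dataset index through shuffling. A minimal sketch of that alignment contract; `IndexedDataset` and `bank_rows` are hypothetical names, and the repo's own loader may handle this differently:

```python
from typing import Any, Sequence, Tuple


class IndexedDataset:
    """Hypothetical wrapper: a map-style dataset whose items also carry their index,
    so the matching row of a precomputed feature bank can be looked up."""

    def __init__(self, base: Sequence[Any]):
        self.base = base

    def __len__(self) -> int:
        return len(self.base)

    def __getitem__(self, i: int) -> Tuple[int, Any]:
        return i, self.base[i]


def bank_rows(bank: Sequence[Any], indices: Sequence[int]) -> list:
    """Gather bank rows for a batch of dataset indices (row i <-> train index i)."""
    return [bank[i] for i in indices]
```

Wrapped in a `torch.utils.data.DataLoader`, the default collate turns the returned indices into a tensor; assuming the bank loads as a tensor whose first dimension is the train index, `bank[indices]` then gathers the matching rows in one indexing operation.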
ImageNet-64 uses a separate entry point, un0/train_imagenet.py, and the
1000-class model from build_imagenet64_model.
Bring your own ImageNet data. Unlike CIFAR-10, ImageNet-64 training does not download anything for you. The repo provides the reference 64×64 preprocessing transform and a reference filesystem-backed dataloader; you supply the source ImageNet images and adapt the loader to your own storage. What ships is enough to reproduce the data path, not a turnkey download.
1. Preprocess ImageNet into a 64×64 PNG tree. preprocess_imagenet in
scripts/imagenet_preprocessing.py is the reference transform: box-then-bicubic
resize with a center crop, matching OpenAI's VIRTUAL_imagenet64_labeled.npz
and the ADM pipeline so FID is comparable to ADM/EDM/DiT. Apply it to each
source image and store the result losslessly as
<root>/<split>/<class_id:05d>/<index:08d>.png — the layout
build_imagenet64_dataloader (in un0/imagenet_data.py) reads. How you iterate
the source images and where you persist the tree is up to you.
2. Train (8-GPU DDP):

   ```bash
   uv run torchrun --nproc_per_node=8 un0/train_imagenet.py \
     --data-root /data/imagenet64/train \
     --val-root /data/imagenet64/val \
     --batch-size 2048 \
     --fid-every-epochs 50 \
     --checkpoint-dir checkpoints/imagenet64 \
     --wandb-project <your-wandb-project>
   ```

   `--batch-size` is per device; on 8 GPUs, 2048 gives the reference global batch
   of 16384. `build_imagenet64_dataloader` is a reference loader over the
   preprocessed tree: point it at your own data and swap in your storage backend
   (object store, webdataset, a streaming loader) as long as it yields the same
   `{"data", "class_id"}` batch contract.
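The data-path conventions from step 1 can be sketched without touching pixels. `center_crop_plan` below computes only the geometry of a box-then-bicubic resize with a center crop, modeled on guided-diffusion's `center_crop_arr`; `scripts/imagenet_preprocessing.py` remains the reference and may differ in details. `tree_path` formats the `<root>/<split>/<class_id:05d>/<index:08d>.png` layout. Both names are hypothetical.

```python
from pathlib import PurePosixPath


def center_crop_plan(width: int, height: int, image_size: int = 64):
    """Hypothetical geometry-only sketch of box-then-bicubic resize + center crop.

    Returns (box_sizes, bicubic_size, crop_box); crop_box is
    (left, top, right, bottom) in the bicubic-resized image.
    """
    box_sizes = []
    # BOX-filter halvings while the short side is at least twice the target.
    while min(width, height) >= 2 * image_size:
        width, height = width // 2, height // 2
        box_sizes.append((width, height))
    # One bicubic resize so the short side equals image_size.
    scale = image_size / min(width, height)
    width, height = round(width * scale), round(height * scale)
    # Center crop to image_size x image_size.
    left, top = (width - image_size) // 2, (height - image_size) // 2
    return box_sizes, (width, height), (left, top, left + image_size, top + image_size)


def tree_path(root: str, split: str, class_id: int, index: int) -> str:
    """Path in the <root>/<split>/<class_id:05d>/<index:08d>.png layout."""
    return str(PurePosixPath(root, split, f"{class_id:05d}", f"{index:08d}.png"))
```

For a 500×375 source, the plan halves twice (250×187, then 125×93), resizes bicubically to 86×64, and crops columns 11 through 74. `tree_path("/data/imagenet64", "train", 207, 42)` gives `/data/imagenet64/train/00207/00000042.png`.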
For CIFAR-10 we report clean-FID (Parmar et al., CVPR 2022). Generated samples are class-balanced so that their label marginals match the real set.
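For reference, both FID variants in this README compute the Fréchet distance between Gaussians fitted to Inception-v3 features of real images (mean $\mu_r$, covariance $\Sigma_r$) and generated images ($\mu_g$, $\Sigma_g$). clean-FID and the ADM evaluator differ mainly in image preprocessing and feature-extractor implementation, which is why their numbers are not interchangeable:

$$
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right)
$$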
`un0/eval.py` loads a checkpoint, generates class-balanced samples, and scores
them against the CIFAR-10 train split using clean-fid's precomputed reference
statistics:

```bash
uv sync --group eval
uv run python un0/eval.py \
  --checkpoint checkpoints/cifar10/final.pt \
  --num-samples 50000 \
  --output fid.json
```

This prints `FID: <value>` and, with `--output`, writes a JSON summary. On first
use, clean-fid downloads the CIFAR-10 reference statistics (~60 MB). Runtime is a
few minutes on a single GPU, dominated by writing 50k PNGs to a temporary
directory for the reference pipeline.
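Class balancing means every class gets the same share of the sample budget; with 50,000 samples over CIFAR-10's 10 classes that is 5,000 each, matching the 5,000 images per class in the train split. A hypothetical sketch (whether `eval.py` rejects or pads budgets that do not divide evenly is not stated; this version rejects them):

```python
def balanced_labels(num_samples: int, num_classes: int) -> list[int]:
    """Hypothetical helper: labels with each class appearing equally often.

    Requires num_samples to be a multiple of num_classes so the generated
    label marginals match a class-balanced real set exactly.
    """
    if num_samples % num_classes:
        raise ValueError("num_samples must be a multiple of num_classes")
    per_class = num_samples // num_classes
    return [c for c in range(num_classes) for _ in range(per_class)]
```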
The only FID computed in this repo for ImageNet-64 is the in-training proxy
logged by un0/train_imagenet.py: clean-FID against custom validation statistics
built from --val-root, conditioning generation on the real validation labels
1-to-1. Under DDP the generation is sharded across ranks and scored on rank 0.
The headline ImageNet-64 numbers reported by ADM/EDM/DiT/EDM2 instead come from
OpenAI's ADM evaluator, scored against the VIRTUAL_imagenet64_labeled.npz
reference batch. Because the preprocessing transform matches that reference's
pixels, results from that evaluator are comparable to the published ones.
However, the ADM evaluator is a separate TensorFlow tool that reports a different
number from clean-FID, and running it is left to the reader.
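Because the in-training proxy conditions on the real validation labels 1-to-1, each rank has to generate exactly its share of those labels so rank 0 can reassemble the full set. One way to split them is contiguous sharding; `shard_labels` is a hypothetical helper, not taken from `train_imagenet.py`:

```python
def shard_labels(labels, rank, world_size):
    """Hypothetical helper: contiguous slice of labels generated by one DDP rank.

    Earlier ranks take one extra label when len(labels) is not divisible by
    world_size; concatenating shards in rank order reproduces the full list.
    """
    per_rank, extra = divmod(len(labels), world_size)
    start = rank * per_rank + min(rank, extra)
    stop = start + per_rank + (1 if rank < extra else 0)
    return labels[start:stop]
```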
To measure how much of the generation quality comes from the Kuramoto dynamics (versus the decoder alone), the model is run through an eight-experiment ablation under the release recipe (AdamW, dense coupling). Each experiment is swept over learning rate, one LR per GPU:
| ablation | flags | dynamics |
|---|---|---|
| decoder_only_raw | `--num-steps 0 --encoding raw` | none; decoder reads raw phases (no readout transform) |
| decoder_only | `--num-steps 0` | none; decoder reads the sin/cos readout of random phases |
| reservoir_euler1 | `--num-steps 1 --solver euler --freeze-dynamics` | frozen random |
| reservoir_euler10 | `--num-steps 10 --solver euler --freeze-dynamics` | frozen random |
| trained_euler1/2/5/10 | `--solver euler --num-steps 1/2/5/10` | trained |
The rows without trained dynamics (decoder_only*, reservoir_*) train only the decoder.
The two decoder_only rows isolate the readout transform: decoder_only feeds
the sin/cos readout to the decoder, decoder_only_raw feeds the raw phases
directly.
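The two decoder_only rows differ only in how phases are presented to the decoder. Phases live on a circle, so raw values jump by 2π at the wrap-around while their sin/cos features stay continuous. A minimal sketch of the two encodings, assuming sin and cos features are simply concatenated (the repo's exact feature layout may differ; `readout` is a hypothetical name):

```python
import math


def readout(phases, encoding="sincos"):
    """Hypothetical readout of oscillator phases into decoder inputs.

    "sincos": [sin(theta_1..N), cos(theta_1..N)], 2N features, continuous
              across the 0 / 2*pi wrap-around.
    "raw":    the N phases themselves, which jump at the wrap-around.
    """
    if encoding == "raw":
        return list(phases)
    if encoding == "sincos":
        return [math.sin(t) for t in phases] + [math.cos(t) for t in phases]
    raise ValueError(f"unknown encoding: {encoding!r}")
```

Two phases just either side of 0 (say 0.001 and 2π − 0.001) are nearly the same point on the circle: their raw values differ by about 6.28, while their sin/cos features differ by about 0.002.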
```bash
export WANDB_API_KEY=...
ablation_study/run_ablation.sh --wandb-project un0-ablations
```

Phase 1 sweeps the learning rate per experiment and ranks by FID; Phase 2 runs
the best LR per experiment at full length. The best LR per experiment is written
to `outputs/dynamics/best_lr.json`, and each run's FID lands in its own
`fid.json`. See `ablation_study/` for the runner and LR-sweep launcher.
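The Phase 1 selection step amounts to an argmin over FID per experiment. A hypothetical sketch; the input shape and the JSON schema written here are assumptions, not the actual format of `best_lr.json`:

```python
import json
from pathlib import Path


def best_lrs(sweep: dict) -> dict:
    """Hypothetical helper: lowest-FID learning rate per experiment.

    sweep maps experiment name -> {learning_rate: fid}.
    """
    return {exp: min(fids, key=fids.get) for exp, fids in sweep.items()}


def write_best_lrs(sweep: dict, path: str) -> None:
    """Write {experiment: best_lr} as JSON (assumed schema), creating parent dirs."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(best_lrs(sweep), indent=2))
```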
MIT — see LICENSE.
Third-party components:
- ImageNet preprocessing in `scripts/imagenet_preprocessing.py` is adapted from openai/guided-diffusion (MIT, Copyright (c) 2021 OpenAI).
- The DINOv2 backbone is loaded at runtime from facebookresearch/dinov2 (Apache-2.0); it is not vendored here.

