lance-mlx

Note: "Lance" here refers to ByteDance Intelligent Creation Lab's unified multimodal model (paper, weights), not Lance/LanceDB (the columnar data format).

MLX port of Lance for Apple Silicon. Lance is a 3B-active / ~12B-total parameter dual-stream Mixture-of-Transformer-Experts model that unifies image and video understanding, generation, and editing in a single framework. Weights are hosted on the mlx-community Hugging Face organization.

📦 Weights on Hugging Face (`mlx-community`)

The repos below live in the Lance MLX collection for one-click browsing.

Repo	Status	Use for
`mlx-community/Lance-3B-bf16`	🟢 Production	`t2i`, `image_edit`, `x2t_image` (full quality, ~15 GB)
`mlx-community/Lance-3B-8bit`	🟢 Production	Same as above, 2.7× faster, 16 GB Mac-friendly (~9 GB)
`mlx-community/Wan2.2-VAE-Lance-bf16`	🟢 Production	48-ch Wan2.2 VAE (standalone, shared by image + video pipelines)
`mlx-community/Lance-3B-Video-bf16`	🟢 Functional	`t2v` (painterly output is a known port bug, issue #2; see Status), `x2t_video`, `video_edit`

Status

🟡 Image MVP is production; video has a port quality issue under investigation (2026-05-21). All six Lance task families run end-to-end on Apple Silicon. The image tasks (t2i, image_edit, x2t_image) reproduce the quality of the bf16 PyTorch reference. The video generation pipelines (t2v, video_edit) produce painterly output where the Phase 0 PyTorch reference produces photorealistic, 3D-cinematic output; x2t_video is unaffected. This is a port-side numerical or routing bug that we have just identified (issue #2). We are documenting it openly rather than shipping the wrong framing: earlier model cards described the painterly look as "by design," which the oracle data shows is incorrect.

Capability	Status
Convert HF safetensors → MLX bf16 (both checkpoints + Wan2.2 VAE)	✅ `scripts/02_convert.py`, `scripts/06_convert_wan_vae.py`
Load `Lance_3B` + `Lance_3B_Video` into `LanceModel`	✅ 0 missing keys, dummy forward verified
x2t_image VQA (image → text answer)	✅ Production. Content-correct across all 6 oracle cases.
KV cache for fast autoregressive decode	✅ 1.7×–2.8× speedup on long generations
t2i (text → image generation)	✅ Production. Photorealistic, prompt-aligned output.
image_edit (instruction-based)	✅ Production. "Remove hat" preserves identity + style + signature; "Add pearl necklace" leaves rest intact.
t2v (text → video)	🚧 Port quality bug. Runs end-to-end, prompt-aligned content recognizable, but output is painterly where the PyTorch oracle is photorealistic 3D-cinematic. Tracked as issue #2.
x2t_video (video VQA)	✅ Validated against Phase 0 oracle. Cooking video → kitchen+pan+spatula+tomato+meat all content-correct in 17.5 s. (Unaffected by the t2v bug — pure ViT+UND-tower path.)
video_edit (instruction-based)	🚧 Inherits t2v quality issue. End-to-end works ("Change balls to red" recolors); cinematic fidelity blocked on the t2v fix.
8-bit + 4-bit quants + HF community variants	⏳ Phase 5b

Try it:

# Install
git clone https://github.com/xocialize/lance-mlx && cd lance-mlx && uv sync

# Download production-ready image MVP (~15 GB):
HF_HUB_DISABLE_XET=1 uv run huggingface-cli download mlx-community/Lance-3B-bf16

# t2i — photorealistic text-to-image:
HF_HUB_DISABLE_XET=1 uv run python scripts/07_t2i_demo.py \
    --prompt "A photorealistic tabby cat holding a colorful STOP sign." \
    --lance-weights ~/.cache/huggingface/hub/models--mlx-community--Lance-3B-bf16/snapshots/*/ \
    --vae-weights   ~/.cache/huggingface/hub/models--mlx-community--Lance-3B-bf16/snapshots/*/vae.safetensors

# image_edit — instruction-based editing:
HF_HUB_DISABLE_XET=1 uv run python scripts/13_image_edit_demo.py \
    --input-image my_photo.jpg \
    --instruction "Remove the hat from the painting." \
    --lance-weights .../Lance-3B-bf16 --vae-weights .../vae.safetensors

# x2t_image — image VQA:
HF_HUB_DISABLE_XET=1 uv run python scripts/04_x2t_image_demo.py \
    --case 03 \
    --lance-weights .../Lance-3B-bf16 \
    --vit-weights   .../Lance-3B-bf16/vit.safetensors
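
The demo commands above point at the Hugging Face cache with a shell glob (`snapshots/*/`). As an alternative, here is a minimal Python sketch that resolves the snapshot directory explicitly. `resolve_snapshot` is a hypothetical helper, not part of lance-mlx; it assumes the default cache location (`~/.cache/huggingface/hub`, no `HF_HOME` override) and the `models--<org>--<name>/snapshots/<revision>/` layout seen in the paths above.

```python
from __future__ import annotations

from pathlib import Path


def resolve_snapshot(repo_id: str, cache_root: Path | None = None) -> Path:
    """Return the most recently modified snapshot directory for repo_id.

    Hypothetical helper: assumes the default Hugging Face cache layout.
    """
    root = cache_root or Path.home() / ".cache" / "huggingface" / "hub"
    # "mlx-community/Lance-3B-bf16" -> "models--mlx-community--Lance-3B-bf16"
    repo_dir = root / ("models--" + repo_id.replace("/", "--")) / "snapshots"
    snapshots = [p for p in repo_dir.iterdir() if p.is_dir()] if repo_dir.is_dir() else []
    if not snapshots:
        raise FileNotFoundError(f"no snapshot of {repo_id} under {repo_dir}")
    # If several revisions are cached, prefer the one touched most recently.
    return max(snapshots, key=lambda p: p.stat().st_mtime)
```

The returned path could then be passed as `--lance-weights`, with `vae.safetensors` or `vit.safetensors` appended for the other flags.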

See HANDOFF.md for the phased roadmap; start with the "⚠ Verified findings (2026-05-19)" section, which supersedes earlier guesses. The Phase 0 parity-oracle capture runbook lives at Docs/RUNPOD_PHASE0.md, and per-phase technical notes are in notes/.

Quick start (after PyPI release)

uv pip install lance-mlx
# Image generation
lance-mlx generate --task t2i --prompt "..." --weights mlx-community/Lance-3B-bf16
# Image editing
lance-mlx generate --task image_edit --image foo.jpg --instruction "..." --weights mlx-community/Lance-3B-bf16
# Image understanding (VQA)
lance-mlx generate --task x2t_image --image foo.png --prompt "What is this?"
# Video generation (alpha)
lance-mlx generate --task t2v --prompt "..." --weights mlx-community/Lance-3B-Video-bf16

Tasks supported

t2i — text-to-image (768²)
t2v — text-to-video (480p, 12 fps, ≤121 frames)
image_edit — instruction-based image editing
video_edit — instruction-based video editing
x2t_image — image understanding / VQA / captioning
x2t_video — video understanding / VQA / captioning
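
To give a feel for the tensor sizes behind these tasks, the sketch below computes the VAE latent shape from the 16× spatial / 4× temporal compression and 48-channel latent listed under Architecture. Two things here are assumptions rather than facts from this README: the causal-VAE frame rule (frame counts of the form 4n + 1 map to n + 1 latent frames) and the 832-pixel width used for the 480p example. `latent_shape` is a hypothetical helper, not a lance-mlx function.

```python
def latent_shape(frames, height, width, spatial=16, temporal=4, channels=48):
    """Return (channels, latent_frames, latent_height, latent_width).

    Assumes a causal 3D VAE: the first frame is encoded on its own and every
    following group of `temporal` frames adds one latent frame.
    """
    if (frames - 1) % temporal != 0:
        raise ValueError("frames must have the form temporal * n + 1")
    if height % spatial or width % spatial:
        raise ValueError("height and width must be multiples of the spatial factor")
    return (channels, (frames - 1) // temporal + 1, height // spatial, width // spatial)


print(latent_shape(1, 768, 768))    # t2i at 768x768 -> (48, 1, 48, 48)
print(latent_shape(121, 480, 832))  # t2v, 121 frames, assumed 832x480 -> (48, 31, 30, 52)
```

Under these assumptions the 121-frame cap corresponds to 31 latent frames, which is why frame counts such as 120 would be rejected by the check.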

Architecture

Two expert towers (LLM_UND, LLM_GEN), each initialized from Qwen2.5-VL-3B-Instruct, with per-expert FFN, output projection, and QK-norm
Modality-deterministic routing: text + Qwen2.5-VL ViT semantic tokens → LLM_UND (autoregressive next-token); Wan2.2 3D causal VAE latent tokens → LLM_GEN (flow-matching velocity prediction)
MaPE — modality-aware RoPE with per-modality temporal offset
Wan2.2 3D causal VAE (16× spatial / 4× temporal compression, 48-channel latent). Lance bundles its own VAE; do NOT use the public 16-ch wan2.2_vae.safetensors.
Untied LM head
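
The modality-deterministic routing described above can be pictured as a fixed lookup table rather than a learned router. The following is a minimal sketch of that idea only; the `Modality` enum, `EXPERT_FOR` table, and `route` function are hypothetical names for illustration and are not the contents of `model/routing.py`.

```python
from enum import Enum


class Modality(Enum):
    TEXT = "text"
    VIT = "vit"          # Qwen2.5-VL ViT semantic tokens
    VAE_LATENT = "vae"   # Wan2.2 VAE latent tokens


# Fixed modality -> expert table: the token's modality alone decides the tower.
EXPERT_FOR = {
    Modality.TEXT: "LLM_UND",
    Modality.VIT: "LLM_UND",
    Modality.VAE_LATENT: "LLM_GEN",
}


def route(modalities):
    """Group token indices by the expert tower that processes them."""
    groups = {"LLM_UND": [], "LLM_GEN": []}
    for index, modality in enumerate(modalities):
        groups[EXPERT_FOR[modality]].append(index)
    return groups


sequence = [Modality.TEXT, Modality.VIT, Modality.VAE_LATENT,
            Modality.VAE_LATENT, Modality.TEXT]
print(route(sequence))  # {'LLM_UND': [0, 1, 4], 'LLM_GEN': [2, 3]}
```

Because the mapping depends only on token type, the same sequence always produces the same split, which makes the routing easy to unit-test in isolation.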

Building blocks reused

Blaizzy/mlx-vlm for the Qwen2.5-VL ViT and autoregressive decode infrastructure
Blaizzy/mlx-video for the Wan2.2 VAE and flow-matching sampler
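
For readers new to flow matching, here is a generic sketch of what a flow-matching sampler does: it integrates a predicted velocity field with fixed-step Euler updates. This is not the mlx-video API or the sampler used by this port. The function names, the t = 0 to t = 1 time direction, and the toy velocity field are all assumptions chosen to keep the example self-contained.

```python
def euler_sample(velocity, x0, steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy velocity field that points straight at a fixed target. A real pipeline
# would call the model's velocity prediction here instead.
target = [1.0, -2.0, 0.5]


def toy_velocity(x, t):
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]


sample = euler_sample(toy_velocity, x0=[0.0, 0.0, 0.0], steps=8)
print(sample)
```

With this toy field the final Euler step lands on the target, so the sketch doubles as a sanity check for the integration loop.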

Hardware

Minimum: Apple Silicon Mac with 16 GB unified memory (4-bit quantized image only)
Recommended: 32 GB+ for bf16 image, 64 GB+ for video
Reference platform: M5 Max 128 GB (macOS 26.2+ for Neural Accelerator support)
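
The memory tiers above can be turned into a quick pre-flight check. The sketch below simply encodes the thresholds from this section; `supported_workloads` and its labels are hypothetical and do not correspond to anything in the lance-mlx CLI.

```python
def supported_workloads(unified_memory_gb):
    """Map unified memory (GB) to the workloads listed in the Hardware section."""
    if unified_memory_gb >= 64:
        return ["quantized image", "bf16 image", "video"]
    if unified_memory_gb >= 32:
        return ["quantized image", "bf16 image"]
    if unified_memory_gb >= 16:
        return ["quantized image"]
    return []  # below the stated minimum


print(supported_workloads(16))   # ['quantized image']
print(supported_workloads(128))  # ['quantized image', 'bf16 image', 'video']
```

On macOS, `sysctl -n hw.memsize` reports installed memory in bytes, which could be converted to GB and fed into this check.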

Layout

.
├── HANDOFF.md                 phased port plan (this is the spec)
├── pyproject.toml             uv-managed
├── src/lance_mlx/
│   ├── __init__.py
│   ├── __main__.py            CLI entry point
│   ├── bench.py               Timer + RunRecord + JSONL logging
│   ├── io.py                  image/video IO + muxing
│   ├── model/
│   │   ├── lance_llm.py       dual-expert MoT backbone
│   │   ├── mape.py            modality-aware RoPE
│   │   ├── flow_head.py       velocity prediction head
│   │   └── routing.py         token modality routing
│   ├── pipeline/
│   │   ├── t2i.py             text-to-image flow loop
│   │   ├── t2v.py             text-to-video flow loop
│   │   ├── image_edit.py
│   │   ├── video_edit.py
│   │   └── understanding.py   x2t_image + x2t_video AR decode
│   └── convert.py             HF → MLX weight conversion
├── scripts/
│   ├── 00_capture_oracle.py   Phase 0 PyTorch reference capture (runs on cloud GPU)
│   ├── 01_inspect_keys.py     Phase 1a weight topology audit
│   ├── 02_convert.py          Phase 1e weight conversion
│   ├── 03_run_understanding.py Phase 2 x2t pipeline
│   ├── 04_run_t2i.py          Phase 3 T2I
│   ├── 05_quantize.py         Phase 5a quantization
│   └── 06_publish_hf.py       Phase 5c HF upload (dry-run default)
├── prompts/
│   ├── t2i_eval.json
│   ├── t2v_eval.json
│   └── understanding_eval.json
├── tests/
│   ├── fixtures/              Phase 0 PyTorch reference outputs
│   ├── test_routing.py
│   ├── test_mape.py
│   ├── test_vae_roundtrip.py
│   └── test_parity_t2i.py
├── notes/                     phase-by-phase educational notes
└── vendor/                    read-only reference clones

License

This MLX port: Apache 2.0.

Lance model weights: Apache 2.0 (ByteDance Intelligent Creation Lab). Wan2.2 VAE: Apache 2.0 (Alibaba). Qwen2.5-VL: Apache 2.0 (Alibaba).

See LICENSE and NOTICE for full attribution.

Citation

@article{fu2026lance,
  title={Lance: Unified Multimodal Modeling by Multi-Task Synergy},
  author={Fu, Fengyi and Huang, Mengqi and Wu, Shaojin and others},
  journal={arXiv preprint arXiv:2605.18678},
  year={2026}
}
