
lens-mlx

Apple MLX port of microsoft/Lens, a 3.8B-parameter text-to-image DiT conditioned on GPT-OSS text features and paired with a FLUX.2 VAE decoder, for inference on Apple Silicon.

Status: Phases 0–3 ✅. The port is functionally complete and generates images.

  • Parity vs the PyTorch reference: encoder cosine 0.998 · DiT cosine 0.999999 · VAE 57.65 dB · full end-to-end image PSNR 45.26 dB. 14/14 tests pass.
  • Speed: end-to-end generate() produces a 1024×1024 image in ~33 s on Apple Silicon (DiT in bf16, 20 steps, 38.8 GB peak memory).
  • Published to mlx-community (collection): Lens-3.8B-bf16 · Lens-3.8B-4bit (2.35 GB) · Lens-3.8B-8bit (4.39 GB). Load a converted repo with LensPipeline.from_pretrained(base, dit_repo="mlx-community/Lens-3.8B-4bit").
  • Per-phase docs live under docs/.
  • Next: Swift mirror.
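The README does not say exactly how the parity numbers are computed. Assuming the standard definitions (cosine similarity over flattened output tensors, PSNR over images scaled to [0, 1]), both metrics can be sketched in plain Python as below. The function names and the data_range default are illustrative assumptions; the repo's parity tests may flatten, scale, or aggregate differently.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two flattened tensors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of elements")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def psnr(ref: Sequence[float], test: Sequence[float], data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means closer to the reference.

    data_range is the span of valid pixel values (assumed 1.0 for [0, 1] images).
    """
    if len(ref) != len(test):
        raise ValueError("inputs must have the same number of elements")
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(data_range ** 2 / mse)


if __name__ == "__main__":
    reference = [0.0, 0.5, 1.0, 0.25]  # toy "PyTorch" output
    candidate = [0.01, 0.5, 0.99, 0.25]  # toy "MLX" output
    print(f"cosine {cosine_similarity(reference, candidate):.6f}")
    print(f"PSNR   {psnr(reference, candidate):.2f} dB")
```

Cosine similarity is insensitive to overall scale, which is why a PSNR-style metric is the more telling check on decoded images.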

[Figure: sample outputs, bf16 vs int4, same prompt and seed. int4 perturbs the sampling trajectory into a different, equally sharp image.]

import mlx.core as mx
from lens_mlx.pipeline_mlx import LensPipeline

pipe = LensPipeline.from_pretrained("weights/Lens", dit_dtype=mx.bfloat16)
img = pipe("A serene lake below snow-capped mountains, golden hour.",
           height=1024, width=1024, num_inference_steps=20, seed=42)
img.save("out.png")
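The quickstart runs num_inference_steps=20, and pipeline_mlx.py is described as handling denoising and CFG. A minimal plain-Python sketch of a flow-matching Euler sampler with classifier-free guidance follows. It assumes a velocity-predicting model and a linear noise schedule from 1.0 down to 0.0; the real pipeline's schedule (which may be shifted), its guidance scale, and how it batches the two branches are not documented here. euler_cfg_sample, linear_sigmas, and toy_velocity are hypothetical names, and toy_velocity stands in for the DiT.

```python
from typing import Callable, List

# velocity(x, t, conditional) -> predicted velocity; a stand-in for the DiT.
Velocity = Callable[[List[float], float, bool], List[float]]


def linear_sigmas(num_steps: int) -> List[float]:
    """Evenly spaced noise levels from 1.0 (pure noise) to 0.0 (clean); assumed schedule."""
    return [1.0 - i / num_steps for i in range(num_steps + 1)]


def euler_cfg_sample(x: List[float], velocity: Velocity,
                     num_steps: int, guidance: float) -> List[float]:
    """Integrate dx/dt = v from t = 1 to t = 0 with classifier-free guidance."""
    sigmas = linear_sigmas(num_steps)
    for t, t_next in zip(sigmas[:-1], sigmas[1:]):
        v_cond = velocity(x, t, True)     # prompt-conditioned prediction
        v_uncond = velocity(x, t, False)  # unconditional prediction
        # CFG: extrapolate from the unconditional toward the conditional velocity.
        v = [u + guidance * (c - u) for c, u in zip(v_cond, v_uncond)]
        # Euler step from t down to t_next (t_next < t, so the step is negative).
        x = [xi + (t_next - t) * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(x: List[float], t: float, conditional: bool) -> List[float]:
    """Exact velocity for a straight path x_t = (1 - t) * x0 + t * noise.

    The "clean" target x0 is [1.0, 1.0] with the prompt and [0.0, 0.0] without it.
    """
    x0 = [1.0, 1.0] if conditional else [0.0, 0.0]
    return [(xi - ti) / t for xi, ti in zip(x, x0)]


if __name__ == "__main__":
    noise = [0.3, -1.2]
    print(euler_cfg_sample(noise, toy_velocity, num_steps=20, guidance=1.0))  # close to [1.0, 1.0]
    print(euler_cfg_sample(noise, toy_velocity, num_steps=20, guidance=3.0))  # close to [3.0, 3.0]
```

With guidance 1.0 the sampler lands on the conditional target; larger values push past it, away from the unconditional prediction, which is the usual CFG trade-off between prompt adherence and fidelity.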

Pipeline

GPT-OSS-20B text features (hidden states from layers [5, 11, 17, 23])
  → Lens DiT (48-layer double-stream flow-matching)
  → FLUX.2 VAE decode
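The first stage feeds the DiT with GPT-OSS-20B features taken from several layers at once. A minimal plain-Python sketch of that capture with toy layers: it runs the blocks in order, keeps the outputs at the listed indices, and concatenates them. Whether the indices count 0-based block outputs or include the embedding layer, and whether Lens concatenates, stacks, or projects the captured states, are assumptions here. capture_hidden_states and concat_features are hypothetical names, not the gpt_oss wrapper in model/text_encoder.py.

```python
from typing import Callable, Dict, List, Sequence

# One token's hidden vector; a real encoder works on [batch, seq, hidden] arrays.
Hidden = List[float]
Layer = Callable[[Hidden], Hidden]

# Indices from the pipeline diagram, assumed to be 0-based transformer-block outputs.
CAPTURE_LAYERS = (5, 11, 17, 23)


def capture_hidden_states(x: Hidden, layers: Sequence[Layer],
                          capture: Sequence[int] = CAPTURE_LAYERS) -> List[Hidden]:
    """Run the blocks in order and return the outputs of the requested blocks."""
    last = max(capture)
    if last >= len(layers):
        raise ValueError("capture index beyond the number of layers")
    wanted = set(capture)
    kept: Dict[int, Hidden] = {}
    h = x
    for i, layer in enumerate(layers):
        h = layer(h)
        if i in wanted:
            kept[i] = h
        if i == last:
            break  # no need to run blocks past the deepest captured one
    return [kept[i] for i in capture]


def concat_features(states: Sequence[Hidden]) -> Hidden:
    """Join captured vectors along the feature axis (one possible fusion)."""
    out: Hidden = []
    for s in states:
        out.extend(s)
    return out


if __name__ == "__main__":
    def add_one(h: Hidden) -> Hidden:
        return [v + 1.0 for v in h]  # toy block: adds 1 to every feature

    feats = concat_features(capture_hidden_states([0.0, 10.0], [add_one] * 24))
    print(feats)  # [6.0, 16.0, 12.0, 22.0, 18.0, 28.0, 24.0, 34.0]
```

Taking several depths gives the DiT both shallower, more lexical features and deeper, more contextual ones, at the cost of a wider conditioning vector.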

Scope (v1)

  • Variant: Lens (RL-tuned, 20-step) only. Turbo / Base deferred.
  • Precision: bf16 first; quantization (int4 DiT) deferred to a later pass. 4-bit and 8-bit DiT builds have since been published to mlx-community.
  • Strategy: validate DiT parity on top of mflux's FLUX.2 VAE and flow-matching scaffolding, then move to this standalone fork (xocialize-code/lens-mlx), then to its Swift mirror.
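Quantizing the DiT replaces each weight with a low-bit approximation, which is why a 4-bit model can drift onto a different sampling trajectory from the same seed. A minimal plain-Python sketch of per-group affine quantization follows, in the general style of MLX's scheme (one scale and bias per group of weights; group_size 64 and bits 4 are common settings). It is an illustration only, not the conversion code in recipes/convert_lens.py, and MLX's exact rounding and bit-packing differ. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def quantize_group(w: Sequence[float], bits: int) -> Tuple[List[int], float, float]:
    """Affine quantization of one group: w ≈ q * scale + bias, q in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    lo, hi = min(w), max(w)
    scale = (hi - lo) / levels if hi > lo else 1.0  # constant group: any scale works
    q = [min(levels, max(0, round((x - lo) / scale))) for x in w]
    return q, scale, lo


def dequantize_group(q: Sequence[int], scale: float, bias: float) -> List[float]:
    return [qi * scale + bias for qi in q]


def quantize_roundtrip(weights: Sequence[float], group_size: int = 64,
                       bits: int = 4) -> List[float]:
    """Quantize then dequantize, group by group, to expose the rounding error."""
    out: List[float] = []
    for start in range(0, len(weights), group_size):
        q, scale, bias = quantize_group(weights[start:start + group_size], bits)
        out.extend(dequantize_group(q, scale, bias))
    return out


if __name__ == "__main__":
    w = [((i * 37) % 101) / 101 - 0.5 for i in range(256)]  # toy weights in [-0.5, 0.5)
    for bits in (4, 8):
        err = max(abs(a - b) for a, b in zip(w, quantize_roundtrip(w, 64, bits)))
        print(f"{bits}-bit max abs error: {err:.4f}")
```

Each dequantized weight is off by at most half a quantization step, (max - min) / (2**bits - 1) / 2 per group, so the 8-bit build sits far closer to bf16 than the 4-bit one.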

Layout

lens_mlx/
├── model/transformer.py     # LensTransformer2DModel  (Phase 2 — the bulk)
├── model/text_encoder.py    # gpt_oss capture wrapper  (Phase 1)
├── pipeline_mlx.py          # from_pretrained, denoise, CFG, decode (Phase 3)
├── resolution.py            # ported verbatim from upstream (pure Python)
└── utils/weights.py         # split-safetensors load, HF fetch
recipes/convert_lens.py      # per-component conversion recipe
tests/{parity,smoke}/        # PyTorch comes only from the dev-only [parity] extra
refs/Lens/                   # reference oracle (depth-1 clone)
refs/configs/                # checkpoint configs (reconciled in Phase 0)
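utils/weights.py is described as loading split safetensors. Hugging Face sharded checkpoints conventionally ship an index JSON (for example model.safetensors.index.json) whose "weight_map" maps each tensor name to the shard file that holds it. A loader can group names by shard so each file is opened once; the sketch below shows that step. load_index and group_by_shard are hypothetical helpers, not the repo's code, and the tensor and shard names in the demo are made up.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def load_index(index_path: str) -> dict:
    """Read a sharded-checkpoint index such as model.safetensors.index.json."""
    return json.loads(Path(index_path).read_text())


def group_by_shard(index: dict) -> Dict[str, List[str]]:
    """Map each shard file to the sorted tensor names it holds."""
    by_shard: Dict[str, List[str]] = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        by_shard[shard_file].append(tensor_name)
    return {shard: sorted(names) for shard, names in sorted(by_shard.items())}


if __name__ == "__main__":
    demo = {  # hypothetical index contents
        "metadata": {},
        "weight_map": {
            "blocks.1.attn.weight": "model-00002-of-00002.safetensors",
            "blocks.0.attn.weight": "model-00001-of-00002.safetensors",
            "blocks.0.mlp.weight": "model-00001-of-00002.safetensors",
        },
    }
    for shard, names in group_by_shard(demo).items():
        print(shard, names)
```

Each shard path could then be passed to mlx.core.load, which returns a dict of arrays for .safetensors files, keeping only the names the model needs.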

Dev setup

uv venv --python 3.12 .venv && source .venv/bin/activate
uv pip install -e ".[dev]"     # mlx, mlx-lm, mflux + parity (torch/transformers/diffusers)
pytest tests/smoke

License

Lens code and weights: MIT. GPT-OSS-20B: Apache-2.0. The FLUX.2 VAE weights license is unverified, so from_pretrained pulls the VAE from its original source rather than re-hosting a bundled copy. See handoff §8.
