Part of the Vector Lab — vector methods for vector theory.
Tier: comparative model tool. Object: generative trajectories of diffusion models.
Sibling instruments: Vectorscope · Manifold Atlas · Manifoldscope · Theoryscope · LLMbench
Manifold geometry and benchmarking for diffusion models.
Author: David M. Berry · Institution: University of Sussex · Version: 0.3.13 · Date: 28 April 2026 · Licence: MIT
Diffusion Atlas is a vector-native research tool for studying how diffusion models generate images geometrically. Where Manifold Atlas reads the static manifold of a frozen embedding model, Diffusion Atlas reads the generative trajectory of a denoising model: the path through latent space that produces an image. It unifies an Atlas view (interpretability and geometry) with a Bench view (scored compositional evaluation) in a single instrument.
The tool extends Vector Theory from text-embedding manifolds to generative diffusion. The asymmetry between the two regimes is itself a finding: the manifold framing migrates more cleanly to diffusion than to autoregressive text, because in diffusion the latent space is genuinely geometric rather than inferred from token logits. Position, orientation, and proximity are not metaphors here; they are the substrate on which generation runs.
Diffusion Atlas emerges from three converging research programmes.
Vector theory. Berry (2026) Vector Theory argues that the vectorial turn introduces a new computational regime in which definition is replaced by position, truth by orientation, argument by interpolation, and contradiction by cosine proximity. Diffusion models are this regime in its most explicit form: the entire generative process is a trajectory through a learned latent geometry. Diffusion Atlas operationalises this view with Atlas operations that read the geometry directly.
Compositionality and the limits of generation. Diffusion models excel at texture and aesthetic coherence but fail in characteristic ways on compositional tasks: object counts, attribute binding, spatial relations, negation. The GenEval benchmark and successors have made these failures legible. The Bench view in Diffusion Atlas operationalises compositional evaluation alongside Atlas-side geometry so that the cost of a model's geometric decisions is visible in the same session.
Comparative, multi-backend method. No single backend gives an honest picture of a diffusion model. Hosted APIs hide intermediate latents; local backends require GPU access. Diffusion Atlas is built to run the same operation across both, treating capability differences as findings rather than friction.
Diffusion Atlas distinguishes two analytical surfaces that most diffusion tools keep apart.
Atlas operations treat diffusion as a vector process. They read the geometry of latent space, the shape of denoising trajectories, and the structure of neighbourhoods around a chosen point. They are interpretive: they ask where in the latent space the model decides, where the manifold thickens, where it thins, and where small perturbations produce categorical jumps in the output.
Bench operations treat diffusion as a system to be scored. They run controlled task packs (single object, two objects, counting, colour, spatial relations) against the model and report per-category accuracy. They are evaluative: they ask what the model's geometric decisions cost in compositional fidelity.
The two views run on the same generated images. A finding in Bench (the model fails attribute binding for red-and-blue object pairs) can be cross-checked in Atlas (does the latent neighbourhood for that prompt show two distinct basins, or one collapsed mode?). This is the methodological reason for a single app rather than two.
| Operation | View | Core question | Backend |
|---|---|---|---|
| Denoise Trajectory | Atlas | What path does the model take through latent space? | Local (NDJSON stream) |
| Guidance Sweep | Atlas | How does CFG bend the trajectory? Where does mode collapse? | Hosted or Local |
| Latent Neighbourhood | Atlas | What does the local manifold look like around an anchor? | Hosted or Local |
| Compositional Bench | Bench | How well does the model bind, count, and place? | Hosted or Local |
| Library | — | What did we run, and what did it produce? | Local (IndexedDB) |
Trace the iterative denoising path through latent space. The local backend streams per-step latents over NDJSON, with a thumbnail decoded through the VAE every Nth step (configurable via the Preview every field). The client projects the latents to 3D via PCA or UMAP; alternatively, a 35mm-style Film view renders every captured step as a contact sheet, with white edge-print metadata strips and clickable RGB histograms per frame. The 3D view renders the path in Three.js with start (gold) and end (burgundy) markers; thumbnails appear as billboard sprites along the curve, with a Thumbnails every slider to thin the swarm so the curve geometry stays readable. Local backend required: hosted providers do not expose intermediate latents.
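The stream-then-project pipeline can be sketched in a few lines. The NDJSON field names (`step`, `latent`) and the hand-rolled SVD-based PCA are illustrative, not the app's actual wire format or reducer:

```python
import json
import numpy as np

def project_trajectory(ndjson_lines, dims=3):
    """Stack per-step latents from an NDJSON stream and PCA-project to `dims`.

    Each line is assumed to carry a flat `latent` array and a `step` index
    (hypothetical field names for illustration).
    """
    steps = [json.loads(line) for line in ndjson_lines if line.strip()]
    steps.sort(key=lambda s: s["step"])
    X = np.array([s["latent"] for s in steps], dtype=np.float32)  # (T, D)
    Xc = X - X.mean(axis=0)                  # centre before PCA
    # SVD of the centred matrix: rows of Vt are the principal axes
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:dims].T                  # (T, dims) path for the 3D view
```

The projected `(T, 3)` array is what the Three.js view would draw as the denoising curve.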
Temp / locked layers (v0.3.8): every completed run is automatically added to the layers list as a temporary layer (dashed border, italic label, neutral colour). Running again replaces the temp layer so the list does not fill with garbage. Click the padlock to lock a layer in place — locked layers survive future runs and gain a palette colour for overlay comparison. The seed input has shuffle/increment toggles next to the dice button: shuffle rolls a fresh random seed before every run; increment bumps the seed by +1 before every run (the standard "walk neighbouring seeds" pattern). The dice spins on shuffle and bumps upward on increment so the kind of roll is legible at a glance.
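A minimal sketch of the two behaviours described above — seed rolling and temp-layer replacement. Function names, the `locked` flag, and the layer shape are hypothetical stand-ins for the app's internals:

```python
import random

def next_seed(current, mode, rng=random):
    """Roll the seed before a run, mirroring the shuffle/increment toggles.

    "fixed" keeps the seed, "shuffle" draws a fresh random one, and
    "increment" walks to the neighbouring seed (+1).
    """
    if mode == "shuffle":
        return rng.randrange(2**32)
    if mode == "increment":
        return current + 1
    return current

def update_layers(layers, new_run):
    """Replace the previous temp layer with the new run; locked layers survive."""
    kept = [layer for layer in layers if layer.get("locked")]
    return kept + [{**new_run, "locked": False}]
```

Locking a layer therefore just flips `locked` to `True`, which exempts it from replacement on the next run.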
Generate the same prompt and seed across a list of CFG values (default 1, 2.5, 4, 7.5, 12). The image grid is keyed by CFG, with per-cell status while the run is in flight. The sweep runs sequentially per lane rather than via `Promise.all`, so a single failure doesn't tank the run, and rate-limited responses (Replicate's free tier triggers them quickly) are honoured: each cell shows a live "Retrying in Ns" countdown and then resumes. A drift curve above each grid plots normalised perceptual-hash distance from the baseline (the CFG value nearest 7.5), so the controllability surface — and where mode collapse begins — is visible at a glance.
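The drift curve reduces each image to a perceptual hash and plots its normalised Hamming distance from the baseline. A minimal sketch using a naive average hash — the app may use a different hash family:

```python
def average_hash(gray, hash_size=8):
    """64-bit average hash of a 2D greyscale image (nested lists or array).

    A stand-in for the app's perceptual hash, shown for the idea only.
    """
    h, w = len(gray), len(gray[0])
    # naive box-downsample to hash_size × hash_size
    small = [[gray[r * h // hash_size][c * w // hash_size]
              for c in range(hash_size)] for r in range(hash_size)]
    flat = [v for row in small for v in row]
    mean = sum(flat) / len(flat)
    return [1 if v > mean else 0 for v in flat]

def drift(a_hash, b_hash):
    """Normalised Hamming distance in [0, 1] — the drift-curve y-value."""
    return sum(x != y for x, y in zip(a_hash, b_hash)) / len(a_hash)
```

Identical images give drift 0; a grossly different render approaches 1, which is what makes the onset of mode collapse legible on the curve.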
Cross-backend comparison (v0.3.0): tick Compare with a second provider to run a parallel sweep against any other configured provider (Replicate ↔ Fal, hosted ↔ Local, etc.). Both lanes share the same prompt, seed, steps, and CFG list and run in parallel — independent rate limits, independent retry loops. Side-by-side grids and drift curves make geometry that is structural (consistent across backends) distinguishable from geometry that is contingent (specific to one provider's training).
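For reference, the quantity both lanes sweep is the classifier-free guidance scale of Ho and Salimans (2022), which combines the unconditional and conditional noise predictions. A one-line sketch:

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, cfg):
    """Classifier-free guidance: push the noise prediction along the
    conditional direction by a factor of `cfg`.

    cfg = 1 reproduces the conditional prediction unamplified; larger
    values bend the trajectory harder toward the prompt.
    """
    return eps_uncond + cfg * (eps_cond - eps_uncond)
```

This is the formula the sweep varies per cell; the drift curve then measures what that bending does to the output image.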
Sample k images around an anchor seed at a configurable radius. Seed offsets are deterministic, so runs are reproducible. Hosted mode samples by varying the seed (each seed maps to a different starting latent); true Gaussian perturbation of the initial latent at a chosen sigma is queued for the local backend.
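Both neighbourhood modes can be sketched as follows. The offset scheme and parameter names are illustrative, and the Gaussian mode mirrors the queued local-backend behaviour (jitter the initial latent itself by sigma):

```python
import numpy as np

def neighbourhood_seeds(anchor_seed, k, radius):
    """Deterministic seed offsets around an anchor — reproducible by
    construction. The exact offset function is a hypothetical example."""
    return [(anchor_seed + (i + 1) * radius) % 2**32 for i in range(k)]

def perturb_latent(z0, sigma, rng):
    """True Gaussian neighbourhood mode (queued for the local backend):
    perturb the initial latent rather than varying the seed."""
    return z0 + sigma * rng.standard_normal(z0.shape)
```

Seed variation samples k unrelated starting latents; latent perturbation samples a genuine epsilon-ball around one starting latent, which is why only the latter probes the local geometry directly.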
Cross-backend comparison (v0.3.1): tick Compare with a second provider to run the same anchor + seed offsets against another configured provider in parallel. Two thumbnail grids stack with their provider labels, so you can see how SD 1.5 on a 24 GB local box and a hosted flux-schnell organise the local manifold around the same prompt. Where the two neighbourhoods cluster differently is where the geometry is contingent on the model rather than structural to diffusion.
GenEval-lite task pack: 4 categories (single object, two objects, counting, colour binding) × 3 prompts each = 12 tasks. Generate the pack at a fixed seed; mark each result pass or fail with the live per-category scoring panel, or click Auto-score (CLIP) to score every image against its prompt with openai/clip-vit-base-patch32 on the local backend and set verdicts from cosine similarity at a configurable threshold (0.25 is the conventional cutoff). Each card shows the numeric score; verdicts can be overridden manually at any time.
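Auto-scoring reduces to a cosine threshold over CLIP embeddings. A sketch assuming pre-computed image and text embeddings (the app runs `openai/clip-vit-base-patch32` on the local backend; the function name here is hypothetical):

```python
import numpy as np

def clip_verdict(image_emb, text_emb, threshold=0.25):
    """Score one image against its prompt: cosine similarity of the two
    CLIP embeddings, pass iff at or above the threshold (0.25 is the
    conventional cutoff noted above)."""
    a = image_emb / np.linalg.norm(image_emb)
    b = text_emb / np.linalg.norm(text_emb)
    score = float(a @ b)
    return score, score >= threshold
```

The returned score is what each card displays; the boolean verdict is what the manual override can flip.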
All saved runs across Sweep, Neighbourhood, Bench, and Trajectory, grouped by kind, newest first. Stored entirely in the browser's IndexedDB; nothing leaves the machine. Per-run delete and a Clear all action.
Diffusion Atlas supports two classes of backend, selected per-session in Settings.
Hosted providers return final images via API. They are the right choice for Guidance Sweep, Latent Neighbourhood (seed-jitter mode), and Compositional Bench. They cannot serve Denoise Trajectory because they do not expose intermediate latents.
| Provider | Status | Notes | Sign up |
|---|---|---|---|
| Replicate | wired | Broad model selection (SDXL, SD3, FLUX, custom). Aggressive rate limits on the free tier. | replicate.com |
| Fal | wired | An order of magnitude more permissive on rate limits than Replicate. Best choice for sweep- and bench-heavy work. | fal.ai |
| Together | planned | OpenAI-compatible inference for image and chat. | together.ai |
| Stability AI | planned | Stability's own SD3, SDXL, and Stable Image series. | platform.stability.ai |
When you switch providers in Settings, the model id auto-updates to a sensible starting point for that provider (Replicate uses owner/model ids; Fal uses fal-ai/<model>). Override afterwards as needed.
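The provider-switch behaviour amounts to a lookup table of default model ids. The specific defaults below are illustrative examples of each provider's id scheme, not necessarily the app's choices:

```python
DEFAULT_MODEL = {
    # Illustrative defaults — the app's actual choices may differ.
    "replicate": "stability-ai/sdxl",   # Replicate's owner/model form
    "fal": "fal-ai/flux/schnell",       # Fal's fal-ai/<model> form
}

def on_provider_change(provider, current_model):
    """Settings behaviour sketch: switching provider resets the model id
    to a sensible default for that provider; unknown providers keep the
    current id, which the user can override afterwards."""
    return DEFAULT_MODEL.get(provider, current_model)
```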
The local backend is a small FastAPI service that wraps the diffusers library. It runs on your own hardware (CUDA or Apple Silicon MPS) and is the only way to run Denoise Trajectory or true latent-space neighbourhood sampling. No API key is needed; no data leaves your machine. Suitable models include Stable Diffusion 1.5, SDXL, SD3, and FLUX-schnell, depending on memory.
To use the local backend, install the FastAPI service in backend/ (see Getting Started) and select Local in Settings.
Why both Atlas and Bench in one app? In diffusion the manifold is the benchmark. Controllability, compositional generalisation, and mode coverage are all geometric properties of the latent space. Splitting them into separate tools forces you to compute the same geometry twice and pretend the scores are independent. Diffusion Atlas treats Bench as a derived view of the Atlas substrate so a finding in one is checkable in the other.
Why hosted plus local? Hosted is the path of least resistance and serves most of the Bench view and the cheaper Atlas operations. Local is the only way to study the trajectory itself, because hosted providers do not expose intermediate latents. Treating them as peer providers, with capability mismatches surfaced as typed errors, lets the same operation degrade gracefully across backends.
Why cache latents and images in IndexedDB? A single trajectory at 30 steps and 1024×1024 resolution can take 5 to 30 seconds. Identical queries should never cost twice. Latents (Float32Array) and images (Blob) are cached deterministically by the full generation parameters. Image blobs use LRU eviction with a configurable cap (default 500 MB).
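The caching scheme can be sketched in two parts: a deterministic key derived from canonically serialised generation parameters, and a byte-capped LRU for image blobs. Class and function names here are hypothetical, not the app's actual wrappers:

```python
import hashlib
import json
from collections import OrderedDict

def cache_key(params):
    """Deterministic key from the full generation parameters: canonical
    JSON (sorted keys, fixed separators) hashed, so identical queries
    always hit the same entry."""
    blob = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode()).hexdigest()

class BlobLRU:
    """Byte-capped LRU sketch of the image-blob cache (default 500 MB)."""
    def __init__(self, cap_bytes):
        self.cap, self.size, self.d = cap_bytes, 0, OrderedDict()

    def put(self, key, blob):
        if key in self.d:
            self.size -= len(self.d.pop(key))
        self.d[key] = blob
        self.size += len(blob)
        while self.size > self.cap:          # evict least-recently used
            _, old = self.d.popitem(last=False)
            self.size -= len(old)

    def get(self, key):
        if key in self.d:
            self.d.move_to_end(key)          # refresh recency on hit
            return self.d[key]
        return None
```

Keying on the canonical serialisation (rather than insertion order of a settings object) is what makes the "identical queries never cost twice" guarantee hold.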
Why a browser-only frontend? The instrument is for research. Running entirely in the browser, with API keys stored client-side and the local backend on localhost, keeps the deployment surface minimal and the data on the researcher's own machine. No tracking, no telemetry.
Why editable models/*.md files? The pace of model releases outruns any sensible rebuild cadence. Keeping the model registry in markdown lets researchers add a new model as soon as it appears, without touching compiled artefacts.
- Node.js 18 or later
- For local-backend operations (Denoise Trajectory, true latent-space neighbourhood sampling): a GPU (CUDA) or Apple Silicon (MPS) with at least 12 GB of available memory
- For hosted operations: an API key from at least one of Replicate, Fal, Together, or Stability AI
git clone https://github.com/vector-lab-tools/diffusion-atlas.git
cd diffusion-atlas
npm install
npm run dev
Open http://localhost:3000 in your browser.
The local backend is required for Denoise Trajectory and true latent-space neighbourhood sampling. It is optional otherwise.
cd backend
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --reload
The backend listens on http://localhost:8000. In Diffusion Atlas Settings, switch the backend to Local and confirm the URL.
- Click Settings (top right)
- Choose Hosted or Local as the backend
- For hosted: select a provider and paste your API key
- For local: confirm the FastAPI URL and select a model
- Close settings and start generating
Model lists are defined in markdown under public/models/, one file per provider (replicate.md, fal.md, together.md, stability.md, local.md). Add a model with one line:
model-id | Display Name | notes
Save the file and reload the app. Lines starting with # are comments.
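A parser for this registry format is only a few lines. This sketch mirrors the rules above (pipe-separated fields, `#` comments, blank lines ignored) but is not the app's actual loader:

```python
def parse_model_registry(text):
    """Parse a models/<provider>.md registry.

    Each non-comment line is `model-id | Display Name | notes`; the name
    falls back to the id, and notes default to empty.
    """
    models = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        parts = [p.strip() for p in line.split("|")]
        model_id = parts[0]
        name = parts[1] if len(parts) > 1 else model_id
        notes = parts[2] if len(parts) > 2 else ""
        models.append({"id": model_id, "name": name, "notes": notes})
    return models
```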
src/
app/
api/diffuse/ # Hosted + local image generation dispatcher
api/trajectory/ # Local-only NDJSON stream of per-step latents
api/bench/ # Scored task runs
api/keys/ # Server-side API key proxy (optional)
api/local-diffuse/ # Proxy to FastAPI backend
components/
operations/ # Denoise Trajectory, Guidance Sweep,
# Latent Neighbourhood, Compositional Bench
viz/ # TrajectoryThree, GuidanceGridPlot,
# NeighbourhoodScatter, BenchLeaderboard
layout/ # Header, TabNav, StatusBar, SettingsPanel,
# AboutModal, HelpDropdown
shared/ # ResultCard, DeepDive, ImageGalleryGrid,
# PromptChips, ErrorDisplay, OperationStub
context/ # DiffusionSettingsContext, LatentCacheContext,
# ImageBlobCacheContext
lib/
providers/ # Replicate, Fal, Together, Stability, Local
# plus dispatcher and shared types
operations/ # Pure compute for each operation
geometry/ # PCA, UMAP, latent distance utilities
bench/ # GenEval-lite tasks and scoring
cache/ # IndexedDB wrappers
export/ # PDF and CSV
types/ # Shared TypeScript types
backend/ # Local FastAPI service (optional)
main.py # FastAPI app
config/ # CORS, settings
models/ # diffusers pipeline session
operations/ # denoise_trajectory, guidance_sweep,
# latent_neighbourhood, generate
geometry/ # latent reduce / IO
Latent vectors and image blobs are cached in IndexedDB. Settings persist in localStorage. No server-side database, no authentication.
| Layer | Technology |
|---|---|
| Framework | Next.js 16 (App Router), React 19 |
| Language | TypeScript 5 (strict) |
| Styling | Tailwind CSS 3, CCS-WB editorial design system |
| Visualisation | Plotly.js (GL3D), Three.js (@react-three/fiber) |
| Dimensionality reduction | umap-js (browser-side), custom PCA |
| Caching | IndexedDB via idb |
| Validation | Zod |
| Local backend | FastAPI, diffusers, PyTorch |
Diffusion Atlas is a research instrument for the vector theory programme developed by David M. Berry. The vectorial turn introduces a new computational regime in which meaning is encoded as position and inference is performed as movement through a learned manifold. Diffusion models are this regime in its most explicit form: the entire generative process is a trajectory through latent space, and the final image is a projection of where that trajectory ended.
The Atlas operations test specific claims of the framework. Denoise Trajectory makes the path of generation visible as a curve. Guidance Sweep tests how aggressively the conditional vector pushes the trajectory off its unconditional course. Latent Neighbourhood tests the local geometry of the manifold around any chosen point. Compositional Bench, on the Bench side, measures the cost of these geometric decisions in tasks that require categorical compositionality (binding, counting, placement) which the manifold tends to handle as smooth interpolation rather than discrete combination.
- App shell with Atlas / Bench / Library tabs (v0.1.0)
- Vector Lab branding and theme-aware tool icon (v0.1.1)
- Provider abstraction with Replicate hosted provider (v0.1.2 – 0.1.7)
- Guidance Sweep with image grid and rate-limit retry (v0.1.8 – 0.1.9)
- Latent Neighbourhood with seed jitter (v0.1.10)
- Compositional Bench (GenEval-lite) with manual scoring (v0.1.11)
- Library browse for saved runs (v0.1.12)
- Local FastAPI backend skeleton with /generate (v0.2.0)
- Denoise Trajectory with NDJSON streaming and 3D PCA path (v0.2.1)
- Per-step preview thumbnails along the trajectory + drift curve in Guidance Sweep (v0.2.2)
- CLIP-based auto-scoring for Compositional Bench (v0.2.3)
- Fal.ai hosted provider + UMAP toggle on the trajectory projection (v0.2.4)
- Cross-backend agreement view in Guidance Sweep (v0.3.0) — same prompt + seed, two providers in parallel, side-by-side grids and drift curves
- Cross-backend comparison extended to Latent Neighbourhood (v0.3.1)
- Cross-backend comparison extended to Compositional Bench + Deep Dive panels with CSV/PDF/JSON export across all operations (v0.3.2)
- Per-step preview decoding, scrubber, camera roll modal, and image-statistics panel with R/G/B/Luma histograms (v0.3.3 – v0.3.6)
- Trajectory deep-dive expands per layer; per-frame metadata strip + clickable RGB histograms; smaller modals (v0.3.7)
- Temp-vs-locked layer model with per-row padlock toggle, shuffle/increment seed modes with per-mode dice animations, `DPMSolverMultistepScheduler` (DPM++ 2M Karras) swap to fix the SD 1.5 PNDM `index 1001` bounds bug, native-resolution auto-snap in Width/Height selects via a new `BackendHealthContext`, per-layer PDF grouping, sticky StatusBar, and IDB-resilience layer with `withDB()` retry-on-close (v0.3.8)
- Stop / abort button for in-flight trajectory runs (`AbortController`-wired); partial trajectories are kept as temp layers so a stopped run is still inspectable. Shuffle / increment seed modes propagated to Guidance Sweep, Latent Neighbourhood, and Compositional Bench, with a `seedRef` pattern that lets in-flight closures pick up the freshly rolled seed without waiting for React to re-render (v0.3.9)
- Memory hygiene for 24 GB unified-memory boxes: `PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.7` set before torch import, `torch.mps.empty_cache()` + `synchronize()` on every model swap, opt-in `MIXED_PRECISION_VAE` flag (fp16 U-Net + fp32 VAE) for SD 1.x/2.x, `POST /warmup` endpoint at 256×256 to cache MPS kernels without spiking activation memory, pre-load fit check with HTTP 413 + `overrideMemoryCheck` escape hatch, StatusBar memory warning when the loaded model wouldn't fit (v0.3.10)
- Stop everywhere + click-to-modal everywhere + CFG fixed: stop buttons on Sweep / Neighbourhood / Bench (matching Trajectory); cells across all four ops open the same `FrameModal` with RGB-overlay histogram and image-stats panel; CFG explained per operation with the proper formula `unconditional + CFG × (conditional − unconditional)` and a datalist of canonical values; CFG bug fix in Neighbourhood + Bench (both were reading `settings.defaults.cfg = 0` from the FLUX-schnell-tuned default and producing prompt-off noise on SD 1.5/SDXL — both now expose their own CFG input defaulting to 7.5); local-lane resolution now reads `nativeWidth`/`nativeHeight` from `/health` with a 512×512 fallback so 1024×SD-1.5 OOMs are eliminated; backend pre-flight resolution check (HTTP 422), empty-prompt + empty-negative-prompt defences, NaN-in-VAE detection with a diagnostic message; PDF exports gain Diffusion Atlas branding (top wordmark + per-page footer stamp); switched to DPM++ 2M (no Karras sigmas) to avoid the late-trajectory underflow that produced black images at CFG ≈ 7.5; Bench CLIP score line includes plain-English bands; Clippy bubble dismiss × (v0.3.11–0.3.12)
- Sweep + Neighbourhood get the contact-sheet aesthetic (dark sprocketed strip + white edge-print metadata band per lane, matching the Trajectory FilmStrip). Generic stacking primitive extracted to `useLayerStack` + `<LayerStackPanel>` — Trajectory refactored onto it; Sweep / Neighbourhood / Bench can adopt the same shape in ~20 lines each. Persistent layers in IDB (`op_state` store via structured clone — Float32Array latents survive). Per-operation Reset button wired through the shared panel. CFG plain-English captions per cell (`1 — no amplification`, `7.5 — balanced default`, `12 — aggressive`, etc.) via a shared `cfgCaption()`. CFG dropdown replaces the free-form input in Trajectory / Neighbourhood / Bench (Sweep keeps the multi-value list). Default CFG list dropped the fragile `2.5` slot (it sat in the empirical fp32-attention-NaN range on MPS); now `1, 4, 7.5, 12, 18`. Drift curve moved to Deep Dive with proper gridlines + axis labels + a plain-English caption; drawn natively in the PDF via `jsPDF` line primitives (no SVG-to-PNG roundtrip). PNG export for the contact-sheet film reel sits next to the SVG button (universal compatibility). Step Inspector is now a click-to-open modal with a 200×200 thumbnail, full scalar `dl`, RGB histogram of the decoded preview, and Play / Stop + speed selector so you can watch the denoising as a looping animation. `EulerDiscreteScheduler` pinned (DPM++ 2M produced sporadic NaN on MPS at certain CFGs); the scheduler is reset to a fresh instance on every forward pass to dodge the `step_index` leak that produced `IndexError: index 21 out of bounds`. bfloat16 attempt for SD 1.5 reverted because bf16's 7-bit mantissa quantises `alphas_cumprod` enough to break some scheduler index calculations on MPS. Cell error UX: long backend diagnostics shortened to `model couldn't render` / `out of memory` / `resolution too large` etc., with the verbatim message in the hover tooltip, separated by a divider. NaN tooltip explains IEEE-754 NaN concretely. Help dropdown gains a "What is MPS warmup?" entry (v0.3.13)
- Clippy easter egg — type `clippy` anywhere outside an input field to summon a diffusion-flavoured paperclip with quips on latents-vs-tokens, the vector turn, and meta-commentary on what you're doing in the app
- Hackerman easter egg
- Remote-backend deploy path (long-range): Atlas operations need a your-own-process diffusers runtime, currently localhost-only. The same `backend/` FastAPI app could be wrapped as a Modal / fal Custom App / Replicate Cog / RunPod Serverless container so researchers without a CUDA box can run Trajectory + sweeps at SDXL / SD3 / FLUX scale on rented GPU. The frontend already accepts a configurable `localBaseUrl` — generalising it to "Backend URL" + bearer token in Settings is a small change. Modal is the favoured target (NDJSON streaming maps directly onto async generators; warm-pool keeps the pipeline hot across a research session; scales to zero when idle; $30/month free tier covers most academic work). The Help / About panel would then describe three deploy modes: laptop, rented serverless GPU, or self-hosted server
- Object-detection-based bench scoring (proper GenEval rather than CLIP cosine)
- True Gaussian-perturbation neighbourhood mode for the local backend
- Together / Stability hosted providers
- Attention-map and cross-attention visualisation
- h-space steering
- Berry, D. M. (2026) 'Vector Theory', Stunlaw. Available at: https://stunlaw.blogspot.com/2026/02/vector-theory.html
- Berry, D. M. (2026) 'What is Vector Space?', Stunlaw. Available at: https://stunlaw.blogspot.com/2026/03/what-is-vector-space.html
- Berry, D. M. (2026) Artificial Intelligence and Critical Theory. MUP.
- Ho, J. and Salimans, T. (2022) 'Classifier-Free Diffusion Guidance'. Available at: https://arxiv.org/abs/2207.12598
- Ghosh, D. et al. (2023) 'GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment'. Available at: https://arxiv.org/abs/2310.11513
- Rombach, R. et al. (2022) 'High-Resolution Image Synthesis with Latent Diffusion Models'. CVPR. Available at: https://arxiv.org/abs/2112.10752
- Park, Y.-H. et al. (2023) 'Understanding the Latent Space of Diffusion Models through the Lens of Riemannian Geometry'. Available at: https://arxiv.org/abs/2307.12868
Concept and Design by David M. Berry, implemented with Claude Code. Design system adapted from the CCS Workbench.
MIT