Lens — Model Error Analysis Platform

Lens is a human-in-the-loop error analysis tool for image classifiers. It trains a model, collects failures, generates per-failure hypotheses using a vision-language model, clusters those hypotheses across three independent dimensions, and surfaces the results in a review UI and dashboard.

What it does

Train — fine-tune a model on a dataset and track per-class validation loss every epoch
Collect failures — save every misclassification with top-3 predictions and confidence scores
Hypothesize — send each failure image to a local VLM (qwen3-vl via Ollama), get a structured 3-part diagnosis:
- DATA ISSUE — is this a labeling error or an ambiguous image?
- VISUAL CAUSE — what specific visual property caused the confusion?
- FIX — what augmentation or data change would help?
Cluster — embed and cluster each dimension independently using BAAI/bge-base-en-v1.5 + KMeans, label clusters via LLM
Review — a human annotates failures in a web UI and records agree/disagree/skip verdicts
Dashboard — summary view of all clusters, data quality issues, and actionable fixes
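
The Collect failures and Hypothesize steps above can be sketched in plain Python. This is an illustrative sketch, not the code in train_cifar_resnet.py: the function names, the record fields, and the assumption that the VLM replies with one `HEADER: text` section per dimension are all hypothetical, and the logits and reply text are made up.

```python
import math
import re

CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]


def top_k(logits, class_names, k=3):
    """Softmax the logits and return the k most confident (class, probability) pairs."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    ranked = sorted(zip(class_names, (e / total for e in exps)),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def failure_record(image_id, logits, true_label, class_names):
    """Return a failure dict if the top prediction is wrong, else None."""
    top3 = top_k(logits, class_names)
    if top3[0][0] == true_label:
        return None
    return {"image_id": image_id, "true_label": true_label,
            "top3": [(name, round(p, 4)) for name, p in top3]}


# Assumed reply format: each section starts on its own line as "HEADER: text".
SECTION_RE = re.compile(r"^\s*(DATA ISSUE|VISUAL CAUSE|FIX)\s*:\s*(.*)$", re.IGNORECASE)


def parse_diagnosis(text):
    """Split a VLM reply into the three dimensions; missing sections stay empty."""
    parts = {"DATA ISSUE": "", "VISUAL CAUSE": "", "FIX": ""}
    current = None
    for line in text.splitlines():
        match = SECTION_RE.match(line)
        if match:
            current = match.group(1).upper()
            parts[current] = match.group(2).strip()
        elif current and line.strip():
            # Continuation line: append it to the section being read.
            parts[current] = (parts[current] + " " + line.strip()).strip()
    return parts


# Made-up logits for one validation image whose true label is "dog".
logits = [0.1, 0.2, 0.3, 2.5, 0.4, 2.1, 0.0, 0.2, 0.1, 0.3]
record = failure_record("img_0001", logits, "dog", CIFAR10_CLASSES)

# Made-up VLM reply in the assumed format.
reply = """DATA ISSUE: No, the label looks correct.
VISUAL CAUSE: Low resolution hides the snout,
so the silhouette resembles a cat.
FIX: Add more close-up dog images."""
record["diagnosis"] = parse_diagnosis(reply)
print(record)
```

Keeping the three sections as separate fields is what lets each dimension be embedded and clustered on its own later.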

Experiments

Experiment        Dataset    Model                    README
resnet_baseline   CIFAR-10   ResNet-18 (pretrained)   experiments/cifar_resnet/README.md

Stack

PyTorch + torchvision
Ollama (local LLM inference) — qwen3-vl:8b for hypotheses, qwen3:latest for cluster labels
SentenceTransformers — BAAI/bge-base-en-v1.5 for hypothesis embeddings
scikit-learn — KMeans + silhouette score
FastAPI + vanilla JS — review UI and dashboard
uv — Python environment management
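
One common way to combine KMeans with the silhouette score is to try several cluster counts and keep the one that scores highest. The sketch below shows that idea with scikit-learn; whether cluster_dimensions.py selects k this way is an assumption, `pick_k_by_silhouette` is a hypothetical helper, and the embeddings are synthetic stand-ins rather than real BAAI/bge-base-en-v1.5 output.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def pick_k_by_silhouette(embeddings, k_range=range(2, 7), seed=0):
    """Return (best_k, labels, scores) for the k with the highest silhouette score."""
    scores = {}
    best_k, best_labels = None, None
    for k in k_range:
        if k >= len(embeddings):  # silhouette needs at most n_samples - 1 clusters
            break
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(embeddings)
        scores[k] = silhouette_score(embeddings, labels)
        if best_k is None or scores[k] > scores[best_k]:
            best_k, best_labels = k, labels
    return best_k, best_labels, scores


# Synthetic stand-in for hypothesis embeddings: three well-separated blobs in 8-D.
rng = np.random.default_rng(0)
centers = np.eye(8)[:3] * 10.0
fake_embeddings = np.vstack([c + rng.normal(scale=0.1, size=(20, 8)) for c in centers])

best_k, labels, scores = pick_k_by_silhouette(fake_embeddings)
print(best_k, {k: round(float(s), 3) for k, s in scores.items()})
```

In the pipeline described here, the same selection would run once per dimension (data issue, visual cause, fix), so each dimension can end up with a different number of clusters.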

Setup

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies
uv sync

# Pull Ollama models
ollama pull qwen3-vl:8b
ollama pull qwen3:latest

Quick start

# Train + collect failures (no LLM)
uv run python train_cifar_resnet.py --run-name=my_run --no-augment --no-llm --epochs=3

# Generate hypotheses (samples 15 failures per class for speed)
uv run python train_cifar_resnet.py --run-name=my_run --no-augment --llm-samples=15

# Cluster across 3 dimensions
uv run python cluster_dimensions.py --run-name=my_run

# Launch review UI + dashboard
uv run python review_ui.py --run-name=my_run
# Open http://localhost:8000        (review)
# Open http://localhost:8000/dashboard  (cluster dashboard)

Repo structure

train_cifar_resnet.py     # CIFAR-10 ResNet-18 pipeline (train → failures → hypotheses → cluster)
cluster_dimensions.py     # Dimensional clustering (data issue / visual cause / fix)
review_ui.py              # FastAPI review UI and dashboard
analyze_training_log.py   # Per-class loss analysis across epochs
train_cifar.py            # CIFAR-10 scratch CNN (ablation experiments)
train_and_analyze.py      # MNIST pipeline (original prototype)
runs/                     # Per-experiment results (training logs, analysis, feedback)
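
As a rough illustration of the per-class loss analysis that analyze_training_log.py is described as doing, the sketch below averages per-sample validation losses within each class and flags classes whose loss rose between the first and last epoch. The helper names, the in-memory data layout, and the numbers are hypothetical; the actual log format under runs/ is not shown here.

```python
from collections import defaultdict


def per_class_mean_loss(losses, labels):
    """Average per-sample validation losses within each class."""
    totals, counts = defaultdict(float), defaultdict(int)
    for loss, label in zip(losses, labels):
        totals[label] += loss
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}


def classes_getting_worse(history):
    """Return classes whose mean loss in the last epoch exceeds the first epoch."""
    first, last = history[0], history[-1]
    return sorted(c for c in first if c in last and last[c] > first[c])


# Made-up per-sample losses for two epochs over three classes.
labels = ["cat", "cat", "dog", "dog", "ship", "ship"]
epoch_losses = [
    [1.2, 1.0, 1.4, 1.6, 0.5, 0.7],  # epoch 1
    [1.3, 1.1, 1.0, 1.2, 0.2, 0.4],  # epoch 2
]
history = [per_class_mean_loss(losses, labels) for losses in epoch_losses]
regressing = classes_getting_worse(history)
print(regressing)
```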
