mlstack turns Claude Code and Codex from one generic assistant into a team of ML/data specialists you can summon on demand.
Ten opinionated workflow skills for Claude Code and Codex. Analysis planning, statistical review, ML architecture review, notebook authoring, exploratory data analysis, feature engineering, model critique, performance optimization, retrospectives, and analysis shipping — all as slash commands.
- The agent takes your analysis request literally — it never asks if you're framing the problem correctly
- It will fit whatever model you asked for, even when the data violates that model's assumptions
- "Explore this dataset" gives inconsistent depth every time
- The agent writes code but never explains why it chose that approach over alternatives
- You still do methodology review by hand: check distributions, validate assumptions, eyeball residuals
- Notebooks are walls of code with no narrative thread
| Skill | Mode | What it does |
|---|---|---|
| `/plan-science-review` | Principal Investigator | Rethink the research question. Challenge framing, identify confounds, map the causal structure before anyone touches data. |
| `/plan-stats-review` | Biostatistician / Methods Lead | Lock in the analysis plan: study design, power analysis, assumption checks, multiple comparison strategy, pre-registration. |
| `/plan-ml-review` | ML Architect | Challenge the modeling strategy across data modalities: architecture choices, representation strategy, training dynamics, compute-performance tradeoffs. |
| `/review-notebook` | Paranoid methods reviewer | Find the statistical sins that pass a cursory glance but invalidate conclusions. Not a style nitpick pass. |
| `/eda` | Senior Data Analyst | Systematic exploratory analysis with structured reports, distribution profiles, missingness maps, and relationship matrices. |
| `/feature-eng` | ML Engineer | Feature engineering, selection, and transformation with domain-aware rationale and leakage detection. |
| `/model-critique` | ML Research Scientist | Adversarial model evaluation. Challenge every modeling choice: was this the right algorithm, the right metric, the right validation strategy? |
| `/review-perf` | ML Performance Engineer | Review training/inference code for efficiency: GPU utilization, data loading throughput, memory footprint, precision, unnecessary computation. |
| `/retro-analysis` | Analytics Manager | Team-aware retrospective: analysis velocity, reproducibility, insight yield, and per-person growth. |
| `/ship-analysis` | Release Analyst | Package analysis into a reproducible deliverable: freeze environment, validate outputs, generate executive summary, archive artifacts. |
You're a data scientist, ML engineer, or quantitative researcher who uses Claude Code or Codex as a force multiplier. You want your AI assistant to think like a team of specialists — not just write code, but challenge your methodology, catch statistical errors before they become retracted papers, and produce analyses that would survive peer review.
Requirements: Claude Code and/or Codex, Git.
Clone and run setup:
```bash
git clone https://github.com/tim-krausz/mlstack.git ~/.claude/skills/mlstack
cd ~/.claude/skills/mlstack && ./setup
```

The setup script:

- Registers all skills with Claude Code (`~/.claude/skills/`) and Codex (`~/.codex/skills/`)
- Creates or updates `CLAUDE.md` and `AGENTS.md` with the skill list (auto-detects the right directory — see below)
- Dynamically discovers skills from the repo — no hardcoded list to maintain (see the sketch after this list)
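The discovery step amounts to roughly the following, a minimal sketch assuming every top-level directory in the repo is one skill (illustrative only, not the actual setup code):

```bash
# Illustrative only: treat each top-level directory in the repo as a skill and
# symlink it into the platform skill directories. The real setup script may
# filter or validate differently.
mkdir -p ~/.claude/skills ~/.codex/skills
for skill in */; do
  name="${skill%/}"
  ln -sfn "$PWD/$name" ~/.claude/skills/"$name"
  ln -sfn "$PWD/$name" ~/.codex/skills/"$name"
done
```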
If you only use one platform:
```bash
# Claude Code only
cd ~/.claude/skills/mlstack && ./setup --claude

# Codex only
git clone https://github.com/tim-krausz/mlstack.git ~/.codex/skills/mlstack
cd ~/.codex/skills/mlstack && ./setup --codex
```

For a project-level install, copy the repo into your project's `.claude/skills/`:

```bash
cp -Rf ~/.claude/skills/mlstack .claude/skills/mlstack
rm -rf .claude/skills/mlstack/.git
cd .claude/skills/mlstack && ./setup
```

Setup auto-detects that it's inside `.claude/skills/` and updates `CLAUDE.md` and `AGENTS.md` at the project root.
- Skill symlinks in `~/.claude/skills/` (Claude Code) and/or `~/.codex/skills/` (Codex)
- A `## mlstack skills` section auto-injected into `CLAUDE.md` and `AGENTS.md` at the detected project root
- Nothing touches your PATH or runs in the background
mlstack is designed to coexist with gstack. All skill names are unique across both stacks — no collisions. If you have both installed, all skills from both stacks are available simultaneously.
```bash
cd ~/.claude/skills/mlstack
git pull origin main
./setup
```

The setup script re-discovers skills and updates all symlinks and documentation files. If you added new skills or renamed existing ones, setup handles it automatically.
To update a project-level install:
```bash
cp -Rf ~/.claude/skills/mlstack .claude/skills/mlstack
rm -rf .claude/skills/mlstack/.git
cd .claude/skills/mlstack && ./setup
```

To uninstall:

```bash
# Remove skill symlinks and the mlstack directory
cd ~/.claude/skills/mlstack && SKILLS=$(ls -d */ | sed 's/\///g'); cd ~
for s in $SKILLS; do rm -f ~/.claude/skills/$s ~/.codex/skills/$s; done
rm -rf ~/.claude/skills/mlstack ~/.codex/skills/mlstack
```

Then remove the `## mlstack skills` section from your `CLAUDE.md` and `AGENTS.md`.
```
Usage: ./setup [OPTIONS]

Options:
  --claude           Register skills with Claude Code only
  --codex            Register skills with Codex only
  --no-docs          Skip updating CLAUDE.md / AGENTS.md
  --working-dir DIR  Update CLAUDE.md / AGENTS.md in DIR (default: auto-detected)
  --help             Show this help message
```

Without `--claude` or `--codex`, setup registers with all detected platforms.
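As a usage example, assuming the flags combine (the help text does not say so explicitly), this registers skills with Codex only and leaves the documentation files alone:

```bash
cd ~/.claude/skills/mlstack && ./setup --codex --no-docs
```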
Working directory auto-detection: Setup walks up from its own location looking for `.claude/` or `.codex/` parent directories. If found, it targets the project root (the parent of `.claude/` or `.codex/`). This means `cd ~/.claude/skills/mlstack && ./setup` updates `~/CLAUDE.md` — not the mlstack directory itself. Use `--working-dir` to override.
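A minimal sketch of that walk-up, assuming plain Bash; the function name and structure are illustrative, not the setup script's actual internals:

```bash
# Illustrative only: climb from the script's own location until the current
# directory is named .claude or .codex, then report its parent as the project
# root. The real setup script may differ.
detect_project_root() {
  local dir
  dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
  while [ "$dir" != "/" ]; do
    case "$(basename "$dir")" in
      .claude|.codex)
        dirname "$dir"      # project root = parent of .claude/ or .codex/
        return 0
        ;;
    esac
    dir="$(dirname "$dir")"
  done
  return 1                  # no .claude/ or .codex/ ancestor found
}
```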
The script is idempotent — run it as many times as you want. It replaces the `## mlstack skills` section in-place if it already exists, preserving all other content in your documentation files.
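For illustration, an in-place section swap could look roughly like this, assuming the section runs from the `## mlstack skills` heading to the next `## ` heading and the regenerated section lives in a hypothetical `new-section.md` (this is not the setup script's actual code):

```bash
# Illustrative only: replace the existing "## mlstack skills" section with the
# freshly generated one, leaving every other line of the file untouched.
awk '
  /^## mlstack skills/ { while ((getline line < "new-section.md") > 0) print line; skip = 1; next }
  skip && /^## /       { skip = 0 }
  !skip                { print }
' CLAUDE.md > CLAUDE.md.tmp && mv CLAUDE.md.tmp CLAUDE.md
```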
- Start with `/plan-science-review` before touching data. This is your PI asking: "Is this the right question? What would actually constitute evidence? What are the confounders you haven't thought about?" It forces you to articulate the causal model before fitting anything.
- Lock in methods with `/plan-stats-review`. Your biostatistician reviews the analysis plan: sample size adequate? Assumptions checkable? Multiple comparisons handled? This catches the methodological landmines before you step on them.
- Choose your modeling strategy with `/plan-ml-review`. Your ML architect asks: is this the right representation for this data? Is a pretrained backbone appropriate? Is this architecture justified for this dataset size? Should you learn features end-to-end or hand-engineer them? This catches the modeling decisions that are invisible to classical statistics.
- Explore with `/eda`. Systematic exploration — not just `df.describe()`. Distribution profiling, missingness analysis, multicollinearity checks, target leakage scans. Produces a structured report you can reference throughout the project.
- Engineer features with `/feature-eng`. Domain-aware feature creation with explicit rationale for every transformation. Leakage detection built in. Tracks which features survived selection and why.
- Critique with `/model-critique` after training. Your adversarial reviewer asks: "Was nested cross-validation actually necessary here? Did you check if a simple baseline beats this? Your pipeline assumed normality but these features violate that assumption." This is the review that saves you from publishing embarrassing results.
- Optimize with `/review-perf` on your training and inference code. Your performance engineer profiles the implementation and finds the bottlenecks: GPU sitting idle waiting on the dataloader, full FP32 weights that should be BF16, redundant forward passes on frozen backbones. Every suggestion comes with an estimated speedup and a concrete code fix.
- Review notebooks with `/review-notebook` before sharing. Catches the sins: p-hacking patterns, undisclosed multiple comparisons, leaky preprocessing, conclusions that don't follow from the evidence.
- Ship with `/ship-analysis`. Freeze the environment, validate all outputs reproduce, generate an executive summary, archive everything. Your analysis is now a citable, reproducible artifact.
- Reflect with `/retro-analysis`. Track analysis velocity, reproducibility rate, insight yield, and methodology quality over time.
MIT