Merged
… README. New `templates/` directory for generation templates. The model card template now includes an acknowledgments table with references to the projects and papers PAWN builds on. Remove HF YAML frontmatter from the GitHub README.
- `templates/hf_model_card.md.j2`: Jinja2 template replacing the old `{PLACEHOLDER}` template. Includes acknowledgments, probe results, diagnostics, and all architecture/training details.
- `scripts/generate_model_cards.py`: fetches `metrics.jsonl` and `eval_results.json` from each HF model repo, renders the template, and optionally uploads. No hardcoded metrics — everything is pulled from the source of truth.

  Usage: `python scripts/generate_model_cards.py --push`
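The fetch-and-render flow can be sketched roughly as follows. This is a minimal stdlib sketch, not the script's actual code: `string.Template` stands in for Jinja2, and the record fields (`val_loss`, `top1_accuracy`) are hypothetical examples rather than the real metrics schema.

```python
import json
from string import Template

# Hypothetical stand-in for the real Jinja2 model card template.
CARD_TEMPLATE = Template(
    "# $model_name\n\n"
    "Best val loss: $val_loss\n"
    "Top-1 accuracy: $top1\n"
)

def render_card(metrics_jsonl: str, model_name: str) -> str:
    """Pick the best-val-loss record from a metrics.jsonl dump and render a card."""
    records = [json.loads(line) for line in metrics_jsonl.splitlines() if line.strip()]
    best = min(records, key=lambda r: r["val_loss"])
    return CARD_TEMPLATE.substitute(
        model_name=model_name,
        val_loss=best["val_loss"],
        top1=best["top1_accuracy"],
    )

metrics = "\n".join(json.dumps(r) for r in [
    {"step": 500, "val_loss": 1.9, "top1_accuracy": 0.41},
    {"step": 1000, "val_loss": 1.7, "top1_accuracy": 0.45},
])
card = render_card(metrics, "pawn-small")
```

The key property the script enforces is the same as here: every number in the rendered card is read from the fetched metrics, never typed into the template.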
- `cards/model/pawn-{small,base,large}.md`: generated model cards checked into the repo as the source of truth. Changes go through PRs.
- `cards/hf_model_card.md.j2`: Jinja2 template (moved from `templates/`).
- `.github/workflows/sync-model-cards.yml`: on push to main, uploads `cards/model/*.md` to the corresponding HF model repos as `README.md`.
- `scripts/generate_model_cards.py`: updated output dir to `cards/model/`; warns loudly on missing optional fields (`top5`, `legal_rate`) from older training runs but does not silently default to zero.
`fetch_best_metrics()` now merges `top5_accuracy`, `legal_move_rate`, and `perplexity` from the best val record that has them when the overall best-loss record doesn't. This handles the case where val loss was logged every 500 steps but backfilled extended metrics exist only at 5K-step checkpoint boundaries.
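The merge described above can be sketched like this. This is an illustrative sketch, not the PR's actual `fetch_best_metrics()` implementation; the field names follow the ones mentioned in this PR, and the helper name is hypothetical.

```python
EXTENDED_KEYS = ("top5_accuracy", "legal_move_rate", "perplexity")

def merge_best_metrics(val_records: list[dict]) -> dict:
    """Start from the best-loss val record; backfill extended metrics
    from the best record that actually carries them."""
    best = min(val_records, key=lambda r: r["val_loss"]).copy()
    missing = [k for k in EXTENDED_KEYS if k not in best]
    if missing:
        # Only checkpoint-boundary records carry the backfilled extended metrics.
        with_extended = [r for r in val_records if all(k in r for k in EXTENDED_KEYS)]
        if with_extended:
            donor = min(with_extended, key=lambda r: r["val_loss"])
            for k in missing:
                best[k] = donor[k]
    return best

records = [
    # Best loss, logged at a 500-step interval: no extended metrics.
    {"step": 4500, "val_loss": 1.62},
    # 5K-step checkpoint boundary: extended metrics present, slightly worse loss.
    {"step": 5000, "val_loss": 1.65, "top5_accuracy": 0.81,
     "legal_move_rate": 0.99, "perplexity": 5.2},
]
merged = merge_best_metrics(records)
```

The point is that the best loss is never replaced; only the missing extended fields are borrowed from the nearest record that has them, so nothing silently defaults to zero.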
thomas-schweich added a commit that referenced this pull request on Apr 2, 2026:
1. `is_alive` returns an `(alive, exit_code)` tuple — no more lost exit codes from zombie reaping (#1)
2. `_spawn` wraps `Popen` in try/finally for the log file handle (#2)
3. `kill()` no longer releases the GPU immediately — the monitor detects actual process exit and releases then, avoiding double-assignment during graceful shutdown (#3)
4. `Trial.from_dict` filters to known dataclass fields — survives schema evolution (#4)
5. `shutdown()` cancels monitor tasks cleanly (#6)
6. `recover()` emits trial_completed/trial_failed events for trials that died during server downtime (#7)
7. `check_health` NaN threshold scales with total_steps (#8)
thomas-schweich added a commit that referenced this pull request on Apr 2, 2026:
* Add pawn-lab MCP server for trial management

  Adds a stateful daemon that manages GPU-isolated training processes with Optuna-driven hyperparameter sweeps. Exposes 12 MCP tools (lab_status, lab_launch, lab_sweep, lab_kill, lab_results, etc.) over stdio so Claude Code can drive experiments without manual process management.

  Key features:
  - Process spawning with CUDA_VISIBLE_DEVICES isolation
  - Async metrics monitoring (polls metrics.jsonl every 5s)
  - Optuna ask/tell autopilot with auto-advance on completion
  - State persistence for recovery across MCP server restarts
  - Event log for trial completions, failures, and health warnings
  - Progress log rendering (pod_manager.md)

* Fix runner for ROCm and zombie process detection

  - GPU discovery via PyTorch instead of nvidia-smi (works on both CUDA and ROCm)
  - gpu_utilization() uses torch.cuda memory APIs
  - Fix zombie detection: _is_alive() now calls waitpid() first to reap child zombies before falling back to kill(pid, 0)
  - Derive step rate from elapsed/step when step_time is unavailable (adapter training scripts log epoch_time_s, not step_time)
  - Handle multi-word python commands (e.g. "uv run python")

* Gitignore checkpoints directory

* Fix sweep to use custom search space; improve tool descriptions

  - _autopilot_next now prefers sweep_distributions when explicitly configured, instead of always falling back to built-in distributions. This was causing custom search_space to be ignored.
  - Expanded all MCP tool descriptions with examples, parameter documentation, and usage guidance for future agents.

* Fix run dir discovery to pick most recent metrics.jsonl

* Add MCP log-message notifications for trial events

  When a trial completes, fails, or all GPUs go idle, the runner pushes a notification via the MCP session's send_log_message(). Claude Code surfaces these as log messages in the conversation, enabling an event-driven workflow instead of pure polling. The session is captured from the request context on the first tool call and passed to the runner via set_notify(). Falls back silently if notifications aren't supported.

* Refactor pawn-lab: split into modules, switch to FastMCP

  Split the monolithic runner.py (1085 lines) into focused modules:
  - state.py (73) — Trial dataclass, duration formatting
  - monitor.py (128) — metrics reading, health checks, process liveness
  - sweep.py (149) — Optuna search spaces, study management
  - runner.py (784) — core process lifecycle, GPU management, events
  - server.py (175) — FastMCP @mcp.tool decorators, lifespan context

  FastMCP replaces the manual 200-line TOOLS dict + dispatch with decorated async functions. Lifespan context initializes the runner on server startup. Background notifications use ctx.session captured on the first tool call. Switches dependency from mcp>=1.0.0 to fastmcp>=2.0.0.

* Lazy GPU discovery to avoid torch import at MCP server startup

* Move GPU discovery to subprocess to avoid torch CPU spin

  Torch's ROCm/HIP runtime spawns ~16 background threads that busy-spin at ~30% CPU permanently once imported. Moving GPU discovery to a subprocess keeps the MCP server process clean. Also simplified gpu_utilization() to not require torch.

* Fix lab_seed to accept JSON strings for params/values

* Fix seed_trial to use sweep distributions when available

* Persist custom search space specs across MCP server restarts

* Add logging to notification path for debugging

* Add manage-pod skill; persist sweep search space; debug logging

  - Un-ignore .claude/skills/manage-pod/ and track SKILL.md
  - Skill now instructs agents to maintain Lab Notes in CLAUDE.md as a handwritten research log for context compaction recovery
  - Persist sweep_search_space to lab_state.json so custom distributions survive MCP server restarts
  - Fix seed_trial to use sweep distributions matching the study
  - Add debug logging to notification path

* Remove autopilot, add lab_suggest and lab_log

  Simplify the runner by removing all autopilot/sweep machinery:
  - No more autopilot state, configure_sweep, pause, resume, pin, seed
  - No more notification plumbing (_notify, _push, _session)
  - _on_complete/_on_failed are now sync (no autopilot to await)

  Replace lab_sweep with lab_suggest: creates an ephemeral Optuna study seeded from completed trials and returns a suggestion. The agent decides whether to launch — no auto-advance.

  Add lab_log: returns the last N lines of a trial's stdout log for debugging failures without manual file access.

  Compact lab_status output: key HPs inline instead of full CLI commands. Results include wall_time per trial.

  runner.py: 583 lines (was 784). server.py: 154 lines (was 175). sweep.py: 87 lines (was 149). Total: 1048 (was 1332).

* Fold Optuna suggestion into lab_results, drop lab_suggest tool

  lab_results(suggest_strategy="bottleneck") now includes an optuna_suggestion field with suggested params, seeded from all completed trials. The suggestion is ephemeral (in-memory study, no persistence). Agents see it passively when reviewing results rather than needing to call a separate tool. 9 tools → 8 tools. server.py: 95 lines (was 154).

* Update manage-pod skill: remove autopilot references, document agent-driven loop

* Gitignore pawn-lab ephemeral files; remove stale optuna-storage

* Move lab state to runs/ directory; lab notes via PostCompact hook

  - Runner defaults workspace to runs/ locally (still /workspace on pods)
  - All ephemeral files (lab_state.json, lab_events.jsonl, pod_manager.md, sweep_results/, lab-notes.md) now live under gitignored runs/
  - PostCompact hook in .claude/settings.json re-injects runs/lab-notes.md after context compaction (not committed — project-local config)
  - Skill updated to reference runs/lab-notes.md instead of CLAUDE.md
  - Lab notes removed from CLAUDE.md (migrated to runs/lab-notes.md)

* Address code review: 7 fixes

  1. is_alive returns (alive, exit_code) tuple — no more lost exit codes from zombie reaping (#1)
  2. _spawn wraps Popen in try/finally for log file handle (#2)
  3. kill() no longer releases GPU immediately — monitor detects actual process exit and releases then, avoiding double-assignment during graceful shutdown (#3)
  4. Trial.from_dict filters to known dataclass fields — survives schema evolution (#4)
  5. shutdown() cancels monitor tasks cleanly (#6)
  6. recover() emits trial_completed/trial_failed events for trials that died during server downtime (#7)
  7. check_health NaN threshold scales with total_steps (#8)

* Fix Pareto front: use proper 2D dominance instead of 1D frontier

* lab_results: require strategy, always show 3 Optuna suggestions

  - strategy parameter is now required (determines search space)
  - Always generates 3 suggestions via an ephemeral Optuna study, even with zero completed trials (pure exploration from the prior)
  - Suggestions seeded from completed trials when available
  - Exhaustive strategy list in tool docstring

* Fix kill GPU release: poll until process actually exits
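The Pareto-front fix in the commits above (proper 2D dominance rather than a 1D frontier) can be illustrated with a small sketch. The objective names here are hypothetical and both are assumed minimized; this is not the runner's actual code.

```python
def pareto_front(points: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Return the non-dominated points for two minimized objectives,
    e.g. (val_loss, wall_time).

    A point p is dominated if some other point q is <= p on both
    objectives and strictly < on at least one. Sorting by a single
    objective (a 1D frontier) misses points that trade one objective
    for the other.
    """
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front

trials = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (1.5, 6.0)]
front = pareto_front(trials)
```

Here (1.0, 5.0) and (2.0, 3.0) survive as genuine trade-offs, while (3.0, 4.0) and (1.5, 6.0) are each dominated outright; a 1D sort on the first objective alone would have kept (1.5, 6.0).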
Summary
PAWN no longer mirrors the GitHub repo to HuggingFace. The individual model repos (pawn-small, pawn-base, pawn-large) and dataset repos (pawn-lichess-full, stockfish-nodes1) on HF serve their purpose — a mirror of the code repo added no value and caused the README to be formatted as a model card.

- Removed `sync-to-hf.yml`, the GitHub Action that pushed to HF on every commit to main
- Removed the HF YAML frontmatter from `README.md` so it reads as a normal GitHub README
- Added `templates/hf_model_card_template.md` — references to papers and projects PAWN builds on, to be included in the individual model cards on HF