
Remove HuggingFace mirror#8

Merged
thomas-schweich merged 13 commits into main from remove-hf-mirror
Mar 28, 2026

Conversation

@thomas-schweich
Owner

Summary

PAWN no longer mirrors the GitHub repo to HuggingFace. The individual model repos (pawn-small, pawn-base, pawn-large) and dataset repos (pawn-lichess-full, stockfish-nodes1) on HF already serve their purpose; a mirror of the code repo added no value and caused the README to be rendered as a model card.

  • Remove sync-to-hf.yml GitHub Action that pushed to HF on every commit to main
  • Strip HF YAML frontmatter from README.md so it reads as a normal GitHub README
  • Add acknowledgments table to templates/hf_model_card_template.md — references to papers and projects PAWN builds on, to be included in the individual model cards on HF

… README

New templates/ directory for generation templates. The model card
template now includes an acknowledgments table referencing the projects
and papers PAWN builds on. Remove HF YAML frontmatter from the GitHub
README.

templates/hf_model_card.md.j2: Jinja2 template replacing the old
{PLACEHOLDER} template. Includes acknowledgments, probe results,
diagnostics, and all architecture/training details.

scripts/generate_model_cards.py: fetches metrics.jsonl and
eval_results.json from each HF model repo, renders the template,
and optionally uploads. No hardcoded metrics — everything is pulled
from the source of truth.

Usage: python scripts/generate_model_cards.py --push

cards/model/pawn-{small,base,large}.md: generated model cards checked
into the repo as the source of truth. Changes go through PRs.
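The fetch-and-render flow can be sketched roughly as follows. This is an illustrative outline, not the actual script: `render_card` is a hypothetical name, and the only details taken from this PR are the file names (metrics.jsonl, eval_results.json, hf_model_card.md.j2) and the fact that metrics are pulled from the HF repos rather than hardcoded.

```python
import json

def render_card(repo_id: str, template_dir: str = "cards") -> str:
    """Sketch of the fetch-and-render flow (not the actual script)."""
    # Imports kept local so this sketch stays importable without the deps.
    from huggingface_hub import hf_hub_download
    from jinja2 import Environment, FileSystemLoader

    # Pull the training log and eval results straight from the HF model
    # repo: the source of truth, nothing hardcoded locally.
    metrics_path = hf_hub_download(repo_id, "metrics.jsonl")
    eval_path = hf_hub_download(repo_id, "eval_results.json")
    with open(metrics_path) as f:
        metrics = [json.loads(line) for line in f if line.strip()]
    with open(eval_path) as f:
        eval_results = json.load(f)

    env = Environment(loader=FileSystemLoader(template_dir))
    return env.get_template("hf_model_card.md.j2").render(
        metrics=metrics, eval_results=eval_results
    )
```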

cards/hf_model_card.md.j2: Jinja2 template (moved from templates/).

.github/workflows/sync-model-cards.yml: on push to main, uploads
cards/model/*.md to the corresponding HF model repos as README.md.

scripts/generate_model_cards.py: updated output dir to cards/model/,
warns loudly on missing optional fields (top5, legal_rate) from older
training runs but does not silently default to zero.
fetch_best_metrics() now merges top5_accuracy, legal_move_rate, and
perplexity from the best val record that has them, when the overall
best-loss record doesn't. This handles the case where val was logged
every 500 steps but backfilled extended metrics only exist at 5K-step
checkpoint boundaries.
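A minimal sketch of that merge behavior, assuming val records are dicts keyed by the metric names from the commit message (the record shape and helper name are assumptions):

```python
EXTENDED = ("top5_accuracy", "legal_move_rate", "perplexity")

def fetch_best_metrics(val_records: list[dict]) -> dict:
    """Best-loss record, backfilled with extended metrics when missing."""
    best = min(val_records, key=lambda r: r["val_loss"]).copy()
    missing = [k for k in EXTENDED if k not in best]
    if missing:
        # Among records that actually carry the extended metrics (e.g.
        # backfilled 5K-step checkpoints), take the one with the best
        # loss and merge only the fields the overall best record lacks.
        with_ext = [r for r in val_records if all(k in r for k in EXTENDED)]
        if with_ext:
            donor = min(with_ext, key=lambda r: r["val_loss"])
            for k in missing:
                best[k] = donor[k]
    return best
```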
@thomas-schweich merged commit af2531e into main Mar 28, 2026
@thomas-schweich deleted the remove-hf-mirror branch March 28, 2026 04:30
@claude bot mentioned this pull request Apr 2, 2026
thomas-schweich added a commit that referenced this pull request Apr 2, 2026
1. is_alive returns (alive, exit_code) tuple — no more lost exit
   codes from zombie reaping (#1)
2. _spawn wraps Popen in try/finally for log file handle (#2)
3. kill() no longer releases GPU immediately — monitor detects
   actual process exit and releases then, avoiding double-assignment
   during graceful shutdown (#3)
4. Trial.from_dict filters to known dataclass fields — survives
   schema evolution (#4)
5. shutdown() cancels monitor tasks cleanly (#6)
6. recover() emits trial_completed/trial_failed events for trials
   that died during server downtime (#7)
7. check_health NaN threshold scales with total_steps (#8)
thomas-schweich added a commit that referenced this pull request Apr 2, 2026
* Add pawn-lab MCP server for trial management

Adds a stateful daemon that manages GPU-isolated training processes
with Optuna-driven hyperparameter sweeps. Exposes 12 MCP tools
(lab_status, lab_launch, lab_sweep, lab_kill, lab_results, etc.)
over stdio so Claude Code can drive experiments without manual
process management.

Key features:
- Process spawning with CUDA_VISIBLE_DEVICES isolation
- Async metrics monitoring (polls metrics.jsonl every 5s)
- Optuna ask/tell autopilot with auto-advance on completion
- State persistence for recovery across MCP server restarts
- Event log for trial completions, failures, and health warnings
- Progress log rendering (pod_manager.md)
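The GPU isolation in the first bullet amounts to setting CUDA_VISIBLE_DEVICES in the child's environment. A hedged sketch, with `spawn_trial` a hypothetical name, that also shows the try/finally log-handle handling from the later review fixes:

```python
import os
import subprocess

def spawn_trial(cmd: list[str], gpu_index: int, log_path: str) -> subprocess.Popen:
    """Spawn a training process pinned to one GPU, logging to a file."""
    env = dict(os.environ)
    # Restrict the child to a single device; frameworks see it as device 0.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    log = open(log_path, "ab")
    try:
        return subprocess.Popen(cmd, env=env, stdout=log,
                                stderr=subprocess.STDOUT)
    finally:
        # Close the parent's handle whether or not Popen succeeded;
        # the child keeps its own duplicated file descriptor.
        log.close()
```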

* Fix runner for ROCm and zombie process detection

- GPU discovery via PyTorch instead of nvidia-smi (works on both
  CUDA and ROCm)
- gpu_utilization() uses torch.cuda memory APIs
- Fix zombie detection: _is_alive() now calls waitpid() first to
  reap child zombies before falling back to kill(pid, 0)
- Derive step rate from elapsed/step when step_time is unavailable
  (adapter training scripts log epoch_time_s, not step_time)
- Handle multi-word python commands (e.g. "uv run python")

* Gitignore checkpoints directory

* Fix sweep to use custom search space; improve tool descriptions

- _autopilot_next now prefers sweep_distributions when explicitly
  configured, instead of always falling back to built-in distributions.
  This was causing custom search_space to be ignored.
- Expanded all MCP tool descriptions with examples, parameter
  documentation, and usage guidance for future agents.

* Fix run dir discovery to pick most recent metrics.jsonl

* Add MCP log-message notifications for trial events

When a trial completes, fails, or all GPUs go idle, the runner pushes
a notification via the MCP session's send_log_message(). Claude Code
surfaces these as log messages in the conversation, enabling event-
driven workflow instead of pure polling.

The session is captured from the request context on the first tool
call and passed to the runner via set_notify(). Falls back silently
if notifications aren't supported.
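The capture-then-notify pattern can be sketched as below. `Runner`, `set_notify`, and `notify` are names taken loosely from the commit message; the only external call assumed is the MCP Python SDK's `ServerSession.send_log_message`, and failures are swallowed as described:

```python
class Runner:
    """Sketch of the session-capture notification path."""

    def __init__(self) -> None:
        self._session = None

    def set_notify(self, session) -> None:
        # Captured from the request context on the first tool call.
        self._session = session

    async def notify(self, text: str) -> None:
        if self._session is None:
            return  # no tool call has happened yet
        try:
            # Surfaces as a log message in the MCP client (Claude Code).
            await self._session.send_log_message(level="info", data=text)
        except Exception:
            pass  # fall back silently if the client lacks logging support
```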

* Refactor pawn-lab: split into modules, switch to FastMCP

Split the monolithic runner.py (1085 lines) into focused modules:
- state.py (73) — Trial dataclass, duration formatting
- monitor.py (128) — metrics reading, health checks, process liveness
- sweep.py (149) — Optuna search spaces, study management
- runner.py (784) — core process lifecycle, GPU management, events
- server.py (175) — FastMCP @mcp.tool decorators, lifespan context

FastMCP replaces the manual 200-line TOOLS dict + dispatch with
decorated async functions. Lifespan context initializes the runner
on server startup. Background notifications use ctx.session captured
on first tool call.

Switches dependency from mcp>=1.0.0 to fastmcp>=2.0.0.

* Lazy GPU discovery to avoid torch import at MCP server startup

* Move GPU discovery to subprocess to avoid torch CPU spin

Torch's ROCm/HIP runtime spawns ~16 background threads that
busy-spin at ~30% CPU permanently once imported. Moving GPU
discovery to a subprocess keeps the MCP server process clean.
Also simplified gpu_utilization() to not require torch.
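The subprocess approach can be sketched as follows: torch is imported only inside a short-lived child, so its background threads never live in the server process, and discovery degrades to an empty list when torch is unavailable. `discover_gpus` and the probe's output shape are illustrative, not the actual module:

```python
import json
import subprocess
import sys

# One-liner run in a throwaway interpreter; torch's threads die with it.
_PROBE = (
    "import json, torch;"
    "print(json.dumps("
    "[{'index': i, 'name': torch.cuda.get_device_name(i)}"
    " for i in range(torch.cuda.device_count())]))"
)

def discover_gpus() -> list[dict]:
    """List GPUs via a subprocess; empty list if torch is missing."""
    try:
        out = subprocess.run(
            [sys.executable, "-c", _PROBE],
            capture_output=True, text=True, timeout=60, check=True,
        ).stdout
        return json.loads(out)
    except (subprocess.SubprocessError, json.JSONDecodeError):
        return []
```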

* Fix lab_seed to accept JSON strings for params/values

* Fix seed_trial to use sweep distributions when available

* Persist custom search space specs across MCP server restarts

* Add logging to notification path for debugging

* Add manage-pod skill; persist sweep search space; debug logging

- Un-ignore .claude/skills/manage-pod/ and track SKILL.md
- Skill now instructs agents to maintain Lab Notes in CLAUDE.md
  as a handwritten research log for context compaction recovery
- Persist sweep_search_space to lab_state.json so custom
  distributions survive MCP server restarts
- Fix seed_trial to use sweep distributions matching the study
- Add debug logging to notification path

* Remove autopilot, add lab_suggest and lab_log

Simplify the runner by removing all autopilot/sweep machinery:
- No more autopilot state, configure_sweep, pause, resume, pin, seed
- No more notification plumbing (_notify, _push, _session)
- _on_complete/_on_failed are now sync (no autopilot to await)

Replace lab_sweep with lab_suggest: creates an ephemeral Optuna study
seeded from completed trials, returns a suggestion. The agent decides
whether to launch — no auto-advance.

Add lab_log: returns the last N lines of a trial's stdout log for
debugging failures without manual file access.

Compact lab_status output: key HPs inline instead of full CLI commands.
Results include wall_time per trial.

runner.py: 583 lines (was 784). server.py: 154 lines (was 175).
sweep.py: 87 lines (was 149). Total: 1048 (was 1332).

* Fold Optuna suggestion into lab_results, drop lab_suggest tool

lab_results(suggest_strategy="bottleneck") now includes an
optuna_suggestion field with suggested params, seeded from all
completed trials. The suggestion is ephemeral (in-memory study,
no persistence). Agents see it passively when reviewing results
rather than needing to call a separate tool.
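The ephemeral-study mechanic can be sketched with standard Optuna calls (`create_study` for in-memory storage, `add_trial` to seed from finished trials, `ask` to draw a suggestion). The function name and record shape are assumptions:

```python
def suggest_params(completed: list[dict], distributions: dict) -> dict:
    """Sketch: ephemeral in-memory study, seeded, one suggestion out.

    `completed` holds {"params": {...}, "value": float} records.
    Nothing is persisted; the study lives only for this call.
    """
    import optuna  # local import: suggestion machinery is optional

    optuna.logging.set_verbosity(optuna.logging.WARNING)
    study = optuna.create_study(direction="minimize")  # in-memory
    for rec in completed:
        # Seed the sampler with finished trials so suggestions are
        # informed rather than pure prior sampling.
        study.add_trial(optuna.trial.create_trial(
            params=rec["params"],
            distributions=distributions,
            value=rec["value"],
        ))
    trial = study.ask(distributions)
    return trial.params
```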

9 tools → 8 tools. server.py: 95 lines (was 154).

* Update manage-pod skill: remove autopilot references, document agent-driven loop

* Gitignore pawn-lab ephemeral files; remove stale optuna-storage

* Move lab state to runs/ directory; lab notes via PostCompact hook

- Runner defaults workspace to runs/ locally (still /workspace on pods)
- All ephemeral files (lab_state.json, lab_events.jsonl, pod_manager.md,
  sweep_results/, lab-notes.md) now live under gitignored runs/
- PostCompact hook in .claude/settings.json re-injects runs/lab-notes.md
  after context compaction (not committed — project-local config)
- Skill updated to reference runs/lab-notes.md instead of CLAUDE.md
- Lab notes removed from CLAUDE.md (migrated to runs/lab-notes.md)

* Address code review: 7 fixes

1. is_alive returns (alive, exit_code) tuple — no more lost exit
   codes from zombie reaping (#1)
2. _spawn wraps Popen in try/finally for log file handle (#2)
3. kill() no longer releases GPU immediately — monitor detects
   actual process exit and releases then, avoiding double-assignment
   during graceful shutdown (#3)
4. Trial.from_dict filters to known dataclass fields — survives
   schema evolution (#4)
5. shutdown() cancels monitor tasks cleanly (#6)
6. recover() emits trial_completed/trial_failed events for trials
   that died during server downtime (#7)
7. check_health NaN threshold scales with total_steps (#8)

* Fix Pareto front: use proper 2D dominance instead of 1D frontier
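Proper 2D dominance, assuming both objectives are minimized, can be sketched as: a point is on the front iff no other point is at least as good in both objectives and strictly better in one. (Illustrative, not the pawn-lab code.)

```python
def pareto_front(points: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """2D Pareto front for two minimized objectives."""
    def dominates(a: tuple[float, float], b: tuple[float, float]) -> bool:
        # a dominates b: no worse in either objective, strictly better in one.
        return (a[0] <= b[0] and a[1] <= b[1]
                and (a[0] < b[0] or a[1] < b[1]))
    return [p for p in points if not any(dominates(q, p) for q in points)]
```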

* lab_results: require strategy, always show 3 Optuna suggestions

- strategy parameter is now required (determines search space)
- Always generates 3 suggestions via ephemeral Optuna study, even
  with zero completed trials (pure exploration from the prior)
- Suggestions seeded from completed trials when available
- Exhaustive strategy list in tool docstring

* Fix kill GPU release: poll until process actually exits
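The fix amounts to polling for actual process exit before releasing the GPU, escalating to SIGKILL on timeout. A sketch under those assumptions (`kill_and_wait` is a hypothetical name; the real runner polls from its monitor loop):

```python
import signal
import subprocess
import time

def kill_and_wait(proc: subprocess.Popen, timeout: float = 10.0) -> int:
    """SIGTERM, poll until exit, SIGKILL as a last resort."""
    proc.send_signal(signal.SIGTERM)
    deadline = time.monotonic() + timeout
    while proc.poll() is None and time.monotonic() < deadline:
        time.sleep(0.1)
    if proc.poll() is None:
        proc.kill()  # graceful shutdown timed out
    # Only after this returns is the GPU safe to release; releasing on
    # kill() would risk double-assignment while the process lingers.
    return proc.wait()
```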