v2.0 Phase 3: Variant evolution orchestrator + Scientist + Spawner by ty13r · Pull Request #4 · ty13r/skillforge

ty13r · 2026-04-11T00:16:41Z

Summary

Phase 3 of v2.0: the variant evolution orchestrator. This is the phase where atomic evolution actually starts running. One commit, three waves, stacked on Phase 2 (PR #3 / ca7d154).

Wave 3-1: skillforge/engine/variant_evolution.py — run_variant_evolution(run) reads variant_evolutions rows for the parent run, sorts foundation-first, runs each dimension as a mini-evolution (design focused challenge → spawn N variants → run via Competitor → judging pipeline → pick winner → persist as Variant row → mark VariantEvolution complete), then calls a stub assembly that returns the foundation winner unchanged. Wired into evolution.py::run_evolution as a top-level dispatcher: run.evolution_mode == "atomic" delegates here, falls back to molecular cleanly if no rows exist.
Wave 3-2: design_variant_challenge(specialization, dimension) added to challenge_designer.py. Single streaming Anthropic call returning ONE focused Challenge per dimension.
Wave 3-3: spawn_variant_gen0(specialization, dimension, foundation_genome, pop_size) added to spawner.py. Focused mini-SKILL.md generator. For capability tier, the winning foundation is injected into the system prompt so spawned variants are compatible with the foundation's directory layout. Stamps dimension + tier keys into each spawned genome's frontmatter.
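The frontmatter stamping from Wave 3-3 can be pictured with a minimal sketch. It assumes each spawned SKILL.md opens with a `---`-delimited block of flat `key: value` lines. The helper name `stamp_variant_frontmatter` and the literal key names `dimension` and `tier` are hypothetical, and the real spawner may well use a YAML library instead of the hand parsing shown here.

```python
def stamp_variant_frontmatter(skill_md: str, dimension: str, tier: str) -> str:
    """Add or overwrite `dimension` and `tier` keys in a SKILL.md frontmatter block.

    Hypothetical helper: assumes a leading `---`-delimited block of flat
    `key: value` lines, parsed by hand rather than with a YAML library.
    """
    stamped = {"dimension": dimension, "tier": tier}
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        # No frontmatter yet: prepend a block holding only the stamped keys.
        header = ["---", *(f"{k}: {v}" for k, v in stamped.items()), "---"]
        return "\n".join(header + lines) + "\n"
    # Locate the closing `---` delimiter.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("unterminated frontmatter block")
    # Keep every existing key except stale copies of the stamped ones.
    kept = [ln for ln in lines[1:end] if ln.split(":", 1)[0].strip() not in stamped]
    kept += [f"{k}: {v}" for k, v in stamped.items()]
    return "\n".join(["---", *kept, "---", *lines[end + 1:]]) + "\n"


src = "---\nname: retry-strategy\ntier: old\n---\n# Body\n"
stamped_md = stamp_variant_frontmatter(src, "retry-strategy", "capability")
```

Overwriting rather than appending means a genome re-stamped for a different dimension never carries two conflicting `tier:` lines.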

Routes wiring

_classify_run_via_taxonomist in routes.py now persists VariantEvolution rows (one per dimension) when the final run mode is atomic, so the orchestrator has work to pick up. The route handler reorders save_run to come before the variant_evolution INSERTs so the FK on parent_run_id resolves.
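Why save_run has to come first can be shown with a small SQLite sketch. The two-table schema and the `demo_fk_ordering` helper are hypothetical stand-ins for the real runs table, save_run, and the variant_evolution INSERTs; the PR does not say which database engine or schema the project uses beyond the FK on parent_run_id.

```python
import sqlite3


def demo_fk_ordering(save_parent_first: bool) -> bool:
    """Return True if the child INSERT succeeds (hypothetical two-table schema)."""
    conn = sqlite3.connect(":memory:")
    # SQLite ships with FK enforcement off; it must be enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE runs (id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE variant_evolutions ("
        " id TEXT PRIMARY KEY,"
        " parent_run_id TEXT NOT NULL REFERENCES runs(id),"
        " dimension TEXT NOT NULL)"
    )
    try:
        if save_parent_first:
            # Stands in for save_run(run).
            conn.execute("INSERT INTO runs (id) VALUES ('run-1')")
        conn.execute(
            "INSERT INTO variant_evolutions VALUES ('ve-1', 'run-1', 'foundation')"
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # Raised as "FOREIGN KEY constraint failed" when the parent row is missing.
        return False
    finally:
        conn.close()
```

With the parent saved first the INSERT goes through; without it, the immediate FK check rejects the child row.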

Stub assembly (Phase 4 will replace)

The Engineer agent's real merge logic is Wave 4-1. Phase 3 ships with a stub that:

Returns the highest-fitness foundation variant as the composite when one exists.
Falls back to the highest-fitness capability variant if no foundation dimension was defined.
Emits assembly_started + assembly_complete events with synergy_ratio: null.

This unblocks the full atomic event sequence (taxonomy_classified → decomposition_complete → variant_evolution_started × N → variant_evolution_complete × N → assembly_started → assembly_complete → evolution_complete) without blocking on Phase 4.
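The stub's selection rule fits in a few lines. This is a minimal sketch assuming each dimension winner carries a tier label and a scalar fitness; the `Variant` dataclass, the `emit` callback, and the name `stub_assembly` are hypothetical stand-ins, not the module's actual API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Variant:
    """Hypothetical stand-in for a persisted Variant row."""
    dimension: str
    tier: str  # "foundation" or "capability"
    fitness: float


def stub_assembly(
    winners: list[Variant], emit: Callable[[str, dict[str, Any]], None]
) -> Optional[Variant]:
    """Pick the composite the way the Phase 3 stub is described: best foundation, else best capability."""
    emit("assembly_started", {})
    foundations = [v for v in winners if v.tier == "foundation"]
    # Fall back to capability winners only when no foundation dimension exists.
    pool = foundations or [v for v in winners if v.tier == "capability"]
    composite = max(pool, key=lambda v: v.fitness, default=None)
    # No real merge happens yet, so there is no synergy ratio to report.
    emit("assembly_complete", {"synergy_ratio": None})
    return composite
```

Injecting `emit` keeps the sketch free of any event-bus dependency and lets a test record the exact event order.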

Bug fixes (latent in earlier phases)

Phase 3 surfaced two test-isolation bugs from earlier waves:

test_list_family_variants_empty_by_default asserted that fams[0] had zero variants — but Phase 3's happy-path test now persists variants under a fam_phase3 family, polluting the global DB. Pinned the assertion to a known seed family (terraform-module-full) that no other test populates.
test_evolve_taxonomist_integration — the mocks bypass the real classify_and_decompose persistence path, so the routes.py downstream code that persists VariantEvolution rows (with FK to skill_families) failed because the mocked family was never inserted. Fix: tests now manually seed the mocked taxonomy + family before the request fires. Mock slugs prefixed with test-fixture- to avoid colliding with the bootstrap loader's testing/unit-tests/python triple.
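The first fix amounts to replacing positional indexing with a lookup by a slug no other test writes to. A minimal sketch, assuming the family list comes back as dicts with a `slug` key; `find_family` and the `variant_count` field are hypothetical.

```python
from typing import Any


def find_family(families: list[dict[str, Any]], slug: str) -> dict[str, Any]:
    """Return the family with the given slug, independent of list order."""
    for family in families:
        if family["slug"] == slug:
            return family
    raise LookupError(f"seed family {slug!r} not found")


# Another test's family may now sort first, so fams[0] is no longer the seed.
fams = [
    {"slug": "fam_phase3", "variant_count": 3},
    {"slug": "terraform-module-full", "variant_count": 0},
]
pinned = find_family(fams, "terraform-module-full")
```

Raising instead of returning None makes a missing seed fail loudly rather than silently passing an "empty" assertion.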

Quantitative signal

81 v2.0 backend tests passing (55 Phase 1 + 18 Phase 2 + 8 Phase 3)
ruff clean on all Phase 3 files
Frontend untouched in Phase 3 (Phase 4 / 5 surface the variant breakdown UI)

What's not in this PR

Real Engineer assembly (Wave 4-1) — composite skill merging with conflict resolution, integration test, refinement pass
Per-dimension breeding loops — Phase 3 spawns once and picks the winner. Wave 4 will add a multi-generation breeding loop per dimension
Variant breakdown UI (Phase 5)
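For contrast, the single-pass selection Phase 3 does ship (spawn once, judge, keep the best) reduces to an argmax over aggregated fitness. The PR mentions fitness-aggregation tests but not the formula, so this sketch assumes a plain mean of per-challenge scores; `aggregate_fitness`, `pick_winner`, and the first-listed tie-break are hypothetical.

```python
from statistics import fmean


def aggregate_fitness(scores: list[float]) -> float:
    """Hypothetical aggregation: mean of per-challenge judge scores, 0.0 if unjudged."""
    return fmean(scores) if scores else 0.0


def pick_winner(scores_by_variant: dict[str, list[float]]) -> str:
    """Highest aggregated fitness wins; max() keeps the first variant on ties."""
    if not scores_by_variant:
        raise ValueError("no variants to judge")
    return max(scores_by_variant, key=lambda vid: aggregate_fitness(scores_by_variant[vid]))
```

A multi-generation breeding loop would wrap this selection step, feeding the winners back into the next round of spawning.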

Test plan

uv run pytest tests/test_variant_evolution.py -q — 8/8 green
uv run pytest tests/test_taxonomist.py tests/test_evolve_taxonomist_integration.py -q — 18/18 green (Phase 2 baseline preserved)
uv run pytest tests/test_models_v2.py tests/test_db_v2.py tests/test_taxonomy_queries.py tests/test_taxonomy_api.py tests/test_report.py -q — 55/55 green (Phase 1 baseline preserved)
uv run ruff check skillforge/engine/variant_evolution.py skillforge/engine/evolution.py skillforge/agents/challenge_designer.py skillforge/agents/spawner.py skillforge/api/routes.py tests/test_variant_evolution.py — clean
Manual end-to-end via the browser (after merge — requires a real dev server + a real /new form submission with evolution_mode: atomic)

Next up

v2.0/phase4-engineer branch: the Engineer agent's real assembly logic, composite skill validation, integration test, and one refinement pass per assembly. Stacks on main once this PR merges.

🤖 Generated with Claude Code

Wave 3-1: Variant evolution orchestrator
- skillforge/engine/variant_evolution.py: new module. run_variant_evolution reads variant_evolutions rows for the parent run, sorts foundation-first, and runs each dimension as a mini-evolution: design focused challenge -> spawn N variants -> run through Competitor -> judging pipeline -> pick winner -> persist as Variant row -> mark VariantEvolution complete. Capability dimensions receive the winning foundation as grounding. Calls a stub assembly that returns the foundation winner unchanged (Phase 4 will replace it with the real Engineer).
- skillforge/engine/evolution.py: new top-level dispatcher. When run.evolution_mode == "atomic", delegates to run_variant_evolution, emits evolution_complete with evolution_mode="atomic", and fires the post-run report. Falls back to molecular cleanly if no rows exist.

Wave 3-2: Scientist (focused per-dimension challenge)
- skillforge/agents/challenge_designer.py: new design_variant_challenge function. Single streaming Anthropic call returning ONE focused Challenge per dimension. Reuses the existing JSON parser + retry path.

Wave 3-3: Spawner variant scope
- skillforge/agents/spawner.py: new spawn_variant_gen0 function. Focused mini-SKILL.md package generator. For capability tier, the winning foundation is injected into the system prompt so spawned variants are compatible with the foundation's directory layout. Stamps dimension + tier keys into each spawned genome's frontmatter.

Routes integration
- skillforge/api/routes.py: _classify_run_via_taxonomist now also persists VariantEvolution rows (one per dimension) when the final run mode is atomic. Reorders save_run before the INSERTs so the FK on parent_run_id resolves.

Test isolation fixes (latent bugs from earlier waves)
- tests/test_taxonomy_api.py::test_list_family_variants_empty_by_default pinned to a known seed family (terraform-module-full) instead of fams[0], which Phase 3 tests now populate.
- tests/test_evolve_taxonomist_integration.py: mocks bypass real persistence, so the downstream variant_evolution INSERTs in routes.py needed manual taxonomy + family seeding. Mock slugs prefixed with test-fixture- to avoid colliding with the bootstrap loader's testing/unit-tests/python triple.

tests/test_variant_evolution.py: 8 new tests covering tier sort, fitness aggregation, happy-path orchestration with full mock stack, empty-pending fallback, design_variant_challenge happy + multi-rejection, spawn_variant_gen0 happy + all-invalid rejection.

QA
- 81 isolated v2.0 tests pass (55 Phase 1 + 18 Phase 2 + 8 Phase 3)
- ruff clean on all Phase 3 files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
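The dispatcher and the foundation-first sort described above can be sketched together. Everything here is a hypothetical stand-in: the `VariantEvolutionRow` projection, the `"foundation"`/`"capability"` tier strings, the `"pending"` status value, and the injected `run_atomic`/`run_molecular` callables that replace run_variant_evolution and the molecular path so the sketch runs without a database or Anthropic calls.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed tier names; unknown tiers sort after both.
TIER_ORDER = {"foundation": 0, "capability": 1}


@dataclass
class VariantEvolutionRow:
    """Hypothetical projection of a variant_evolutions row."""
    dimension: str
    tier: str
    status: str = "pending"


def sort_foundation_first(rows: list[VariantEvolutionRow]) -> list[VariantEvolutionRow]:
    # sorted() is stable, so rows within a tier keep their stored order.
    return sorted(rows, key=lambda r: TIER_ORDER.get(r.tier, len(TIER_ORDER)))


def run_evolution(
    mode: str,
    rows: list[VariantEvolutionRow],
    run_atomic: Callable[[list[VariantEvolutionRow]], str],
    run_molecular: Callable[[], str],
) -> str:
    """Dispatch on evolution mode; fall back to molecular when no pending rows exist."""
    pending = [r for r in rows if r.status == "pending"]
    if mode == "atomic" and pending:
        return run_atomic(sort_foundation_first(pending))
    return run_molecular()
```

Running foundations first is what lets each capability dimension receive an already-chosen foundation winner as grounding.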

…#31) Full pipeline seed run #4, no shortcuts. 12 Spawner + 48 Competitor dispatches + 1 Engineer. Both v1 (seed) and v2 (spawn) competed against 2 challenges per dimension (24 challenges total, balanced medium/hard). Real competition results:
- v1 wins 2 dims (testing-workers 0.40 > 0.23, transactional-jobs 0.64 > 0.26)
- v2 wins 5 dims (retry-strategy, cron-scheduling, return-values, recurring-jobs, worker-philosophy)
- 5 ties (perform, args, unique, queues, telemetry)
- Mean winning fitness: 0.487

Composite: 619 lines, 12 capability sections, 3 cross-cutting examples, 10 common mistakes. Foundation: transactional-saga philosophy.

Co-authored-by: Matt (via Claude Code) <matt@skillforge.local>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
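The dispatch counts in the seed-run message are consistent with one Competitor dispatch per (version, challenge) pair. A quick check, assuming (the message does not say so explicitly) that each of the 12 Spawner dispatches produced the v2 variant for one dimension:

```python
DIMENSIONS = 12
CHALLENGES_PER_DIMENSION = 2
COMPETING_VERSIONS = 2  # v1 (seed) and v2 (spawn)

challenges = DIMENSIONS * CHALLENGES_PER_DIMENSION
competitor_dispatches = challenges * COMPETING_VERSIONS
spawner_dispatches = DIMENSIONS  # assumption: one v2 spawn per dimension

# Every dimension ends in exactly one outcome.
outcomes = {"v1": 2, "v2": 5, "tie": 5}
assert sum(outcomes.values()) == DIMENSIONS
```

The reported mean winning fitness (0.487) cannot be rechecked this way, since the message lists per-dimension scores only for the two v1 wins.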

ty13r merged commit eef3e5d into main Apr 11, 2026

This was referenced Apr 11, 2026

v2.0 Phase 5: Advanced UI + swap/evolve endpoints (v2.0 feature-complete) #6

Merged

post(v2.0): live atomic e2e test + 4 bug fixes + real integration check + journal #011 #7

Merged

ty13r mentioned this pull request Apr 12, 2026

seed(elixir-oban-worker): full 12-dimension run with real competition #31

Merged


ty13r deleted the v2.0/phase3-variant-evolution branch April 19, 2026 20:21

ty13r mentioned this pull request Apr 20, 2026

Known limitation: composite scorer is Elixir-scoped #58

Open
