v2.0 Phase 3: Variant evolution orchestrator + Scientist + Spawner #4

Merged
ty13r merged 1 commit into main from v2.0/phase3-variant-evolution on Apr 11, 2026

ty13r commented on Apr 11, 2026

Summary

Phase 3 of v2.0: the variant evolution orchestrator. This is the phase where atomic evolution actually starts running. One commit, three waves, stacked on Phase 2 (PR #3 / ca7d154).

  • Wave 3-1: skillforge/engine/variant_evolution.py. run_variant_evolution(run) reads variant_evolutions rows for the parent run, sorts foundation-first, runs each dimension as a mini-evolution (design focused challenge → spawn N variants → run via Competitor → judging pipeline → pick winner → persist as Variant row → mark VariantEvolution complete), then calls a stub assembly that returns the foundation winner unchanged. Wired into evolution.py::run_evolution as a top-level dispatcher: run.evolution_mode == "atomic" delegates here, falls back to molecular cleanly if no rows exist.
  • Wave 3-2: design_variant_challenge(specialization, dimension) added to challenge_designer.py. Single streaming Anthropic call returning ONE focused Challenge per dimension.
  • Wave 3-3: spawn_variant_gen0(specialization, dimension, foundation_genome, pop_size) added to spawner.py. Focused mini-SKILL.md generator. For capability tier, the winning foundation is injected into the system prompt so spawned variants are compatible with the foundation's directory layout. Stamps dimension + tier keys into each spawned genome's frontmatter.
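
The Wave 3-1 flow can be sketched roughly as follows. This is a minimal illustration, not the code in variant_evolution.py: the `VariantEvolution` fields, the `TIER_ORDER` mapping, the `run_dimension` callback, and the use of `None` to signal the molecular fallback are all assumptions standing in for the real challenge design, spawning, Competitor, and judging steps.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical tier ordering: foundation dimensions run before capability ones.
TIER_ORDER = {"foundation": 0, "capability": 1}


@dataclass
class VariantEvolution:
    """Illustrative stand-in for a variant_evolutions row (fields are assumed)."""
    dimension: str
    tier: str
    status: str = "pending"
    winner_fitness: Optional[float] = None


def sort_foundation_first(rows: list[VariantEvolution]) -> list[VariantEvolution]:
    # sorted() is stable, so rows within the same tier keep their original order.
    return sorted(rows, key=lambda r: TIER_ORDER.get(r.tier, len(TIER_ORDER)))


def run_variant_evolution_sketch(
    rows: list[VariantEvolution],
    run_dimension: Callable[[VariantEvolution], float],
) -> Optional[list[VariantEvolution]]:
    """Run each pending dimension once; return None to signal molecular fallback."""
    pending = [r for r in rows if r.status == "pending"]
    if not pending:
        return None  # nothing to do: the caller falls back to molecular mode
    ordered = sort_foundation_first(pending)
    for row in ordered:
        # run_dimension stands in for: design challenge -> spawn -> compete -> judge.
        row.winner_fitness = run_dimension(row)
        row.status = "complete"
    return ordered
```

Running foundation rows first matters because capability dimensions receive the winning foundation as grounding, so the foundation winner has to exist before any capability dimension is spawned.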

Routes wiring

_classify_run_via_taxonomist in routes.py now persists VariantEvolution rows (one per dimension) when the final run mode is atomic, so the orchestrator has work to pick up. The route handler reorders save_run to come before the variant_evolution INSERTs so the FK on parent_run_id resolves.
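
The ordering constraint can be reproduced in isolation. The sketch below uses an in-memory SQLite database with a deliberately simplified, hypothetical schema. Only the `variant_evolutions` table name and the `parent_run_id` column come from this PR; the `runs` table, the other columns, and the choice of SQLite are assumptions made to show why the parent row has to be saved first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.execute("CREATE TABLE runs (id TEXT PRIMARY KEY)")  # assumed parent table
conn.execute(
    """
    CREATE TABLE variant_evolutions (
        id INTEGER PRIMARY KEY,
        parent_run_id TEXT NOT NULL REFERENCES runs(id),
        dimension TEXT NOT NULL
    )
    """
)


def insert_variant_evolution(run_id: str, dimension: str) -> bool:
    """Return True if the child row was accepted, False on an FK violation."""
    try:
        conn.execute(
            "INSERT INTO variant_evolutions (parent_run_id, dimension) VALUES (?, ?)",
            (run_id, dimension),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# Child before parent: rejected because runs has no matching row yet.
rejected_first = insert_variant_evolution("run-1", "foundation")

# Parent first (the save_run step), then the child row resolves.
conn.execute("INSERT INTO runs (id) VALUES (?)", ("run-1",))
accepted_after = insert_variant_evolution("run-1", "foundation")
```

Here `rejected_first` ends up `False` and `accepted_after` ends up `True`, which mirrors the reason the route handler saves the run before inserting its variant_evolutions rows.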

Stub assembly (Phase 4 will replace)

The Engineer agent's real merge logic is Wave 4-1. Phase 3 ships with a stub that:

  • Returns the highest-fitness foundation variant as the composite when one exists.
  • Falls back to the highest-fitness capability variant if no foundation dimension was defined.
  • Emits assembly_started + assembly_complete events with synergy_ratio: null.

This unblocks the full atomic event sequence (taxonomy_classified → decomposition_complete → variant_evolution_started × N → variant_evolution_complete × N → assembly_started → assembly_complete → evolution_complete) without blocking on Phase 4.
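
A minimal sketch of the stub's selection rule, under stated assumptions: the `Variant` dataclass, the `emit` callback signature, and the empty `assembly_started` payload are hypothetical, and the fallback here triggers when no foundation variant is present, which only approximates "no foundation dimension was defined".

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Variant:
    """Illustrative stand-in for a persisted Variant row (fields are assumed)."""
    dimension: str
    tier: str  # "foundation" or "capability"
    fitness: float


def stub_assemble(
    variants: list[Variant],
    emit: Callable[[str, dict], None],
) -> Optional[Variant]:
    """Pick a composite without merging: best foundation, else best capability."""
    emit("assembly_started", {})
    foundations = [v for v in variants if v.tier == "foundation"]
    pool = foundations or [v for v in variants if v.tier == "capability"]
    composite = max(pool, key=lambda v: v.fitness) if pool else None
    # No real merge happened, so there is no synergy to report yet.
    emit("assembly_complete", {"synergy_ratio": None})
    return composite
```

Because the stub still emits both assembly events, downstream consumers of the event stream see the same sequence they will see once the real Engineer lands; only the `synergy_ratio` payload stays empty.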

Bug fixes (latent in earlier phases)

Phase 3 surfaced two test-isolation bugs from earlier waves:

  1. test_list_family_variants_empty_by_default asserted that fams[0] had zero variants — but Phase 3's happy-path test now persists variants under a fam_phase3 family, polluting the global DB. Pinned the assertion to a known seed family (terraform-module-full) that no other test populates.
  2. test_evolve_taxonomist_integration — the mocks bypass the real classify_and_decompose persistence path, so the routes.py downstream code that persists VariantEvolution rows (with FK to skill_families) failed because the mocked family was never inserted. Fix: tests now manually seed the mocked taxonomy + family before the request fires. Mock slugs prefixed with test-fixture- to avoid colliding with the bootstrap loader's testing/unit-tests/python triple.
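
The first fix amounts to looking a family up by a stable key instead of by list position. The sketch below illustrates that idea only: the dict shape, the `variants_for` helper, and treating `fam_phase3` as a slug are assumptions, not the project's actual API response or test code.

```python
def variants_for(families: list[dict], slug: str) -> list:
    """Look up one family by slug instead of trusting list position."""
    by_slug = {fam["slug"]: fam for fam in families}
    return by_slug[slug]["variants"]


# Hypothetical shared state after another test has persisted variants.
families = [
    {"slug": "fam_phase3", "variants": ["winner-a"]},
    {"slug": "terraform-module-full", "variants": []},
]

# A position-based check is order-dependent: families[0] is now populated.
assert families[0]["variants"] != []

# A slug-pinned check holds regardless of what other tests inserted.
assert variants_for(families, "terraform-module-full") == []
```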

Quantitative signal

  • 81 v2.0 backend tests passing (55 Phase 1 + 18 Phase 2 + 8 Phase 3)
  • ruff clean on all Phase 3 files
  • Frontend untouched in Phase 3 (Phase 4 / 5 surface the variant breakdown UI)

What's not in this PR

  • Real Engineer assembly (Wave 4-1) — composite skill merging with conflict resolution, integration test, refinement pass
  • Per-dimension breeding loops — Phase 3 spawns once and picks the winner. Wave 4 will add a multi-generation breeding loop per dimension
  • Variant breakdown UI (Phase 5)

Test plan

  • uv run pytest tests/test_variant_evolution.py -q — 8/8 green
  • uv run pytest tests/test_taxonomist.py tests/test_evolve_taxonomist_integration.py -q — 18/18 green (Phase 2 baseline preserved)
  • uv run pytest tests/test_models_v2.py tests/test_db_v2.py tests/test_taxonomy_queries.py tests/test_taxonomy_api.py tests/test_report.py -q — 55/55 green (Phase 1 baseline preserved)
  • uv run ruff check skillforge/engine/variant_evolution.py skillforge/engine/evolution.py skillforge/agents/challenge_designer.py skillforge/agents/spawner.py skillforge/api/routes.py tests/test_variant_evolution.py — clean
  • Manual end-to-end via the browser (after merge — requires a real dev server + a real /new form submission with evolution_mode: atomic)

Next up

v2.0/phase4-engineer branch — the Engineer agent's real assembly logic, composite skill validation, integration test, and one refinement pass per assembly. Stacks on this PR's main.

🤖 Generated with Claude Code

Wave 3-1 — Variant evolution orchestrator
- skillforge/engine/variant_evolution.py — new module. run_variant_evolution
  reads variant_evolutions rows for the parent run, sorts foundation-first,
  runs each dimension as a mini-evolution: design focused challenge ->
  spawn N variants -> run through Competitor -> judging pipeline -> pick
  winner -> persist as Variant row -> mark VariantEvolution complete.
  Capability dimensions receive the winning foundation as grounding.
  Calls a stub assembly that returns the foundation winner unchanged
  (Phase 4 will replace with the real Engineer).
- skillforge/engine/evolution.py — new top-level dispatcher. When
  run.evolution_mode == "atomic", delegates to run_variant_evolution,
  emits evolution_complete with evolution_mode="atomic", and fires the
  post-run report. Falls back to molecular cleanly if no rows exist.

Wave 3-2 — Scientist (focused per-dimension challenge)
- skillforge/agents/challenge_designer.py — new design_variant_challenge
  function. Single streaming Anthropic call returning ONE focused
  Challenge per dimension. Reuses the existing JSON parser + retry path.

Wave 3-3 — Spawner variant scope
- skillforge/agents/spawner.py — new spawn_variant_gen0 function.
  Focused mini-SKILL.md package generator. For capability tier, the
  winning foundation is injected into the system prompt so spawned
  variants are compatible with the foundation's directory layout.
  Stamps dimension + tier keys into each spawned genome's frontmatter.

Routes integration
- skillforge/api/routes.py — _classify_run_via_taxonomist now also
  persists VariantEvolution rows (one per dimension) when the final run
  mode is atomic. Reorders save_run before the INSERTs so the FK on
  parent_run_id resolves.

Test isolation fixes (latent bugs from earlier waves)
- tests/test_taxonomy_api.py::test_list_family_variants_empty_by_default
  pinned to a known seed family (terraform-module-full) instead of
  fams[0], which Phase 3 tests now populate.
- tests/test_evolve_taxonomist_integration.py mocks bypass real
  persistence, so routes.py downstream variant_evolution INSERTs needed
  manual taxonomy + family seeding. Mock slugs prefixed with
  test-fixture- to avoid colliding with the bootstrap loader's
  testing/unit-tests/python triple.

tests/test_variant_evolution.py — 8 new tests covering tier sort,
fitness aggregation, happy-path orchestration with full mock stack,
empty-pending fallback, design_variant_challenge happy + multi-rejection,
spawn_variant_gen0 happy + all-invalid rejection.

QA
- 81 isolated v2.0 tests pass (55 Phase 1 + 18 Phase 2 + 8 Phase 3)
- ruff clean on all Phase 3 files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ty13r merged commit eef3e5d into main on Apr 11, 2026
ty13r pushed a commit that referenced this pull request Apr 12, 2026
Full pipeline seed run #4 — no shortcuts. 12 Spawner + 48 Competitor
dispatches + 1 Engineer. Both v1 (seed) and v2 (spawn) competed
against 2 challenges per dimension (24 challenges total, balanced
medium/hard).

Real competition results:
- v1 wins 2 dims (testing-workers 0.40>0.23, transactional-jobs 0.64>0.26)
- v2 wins 5 dims (retry-strategy, cron-scheduling, return-values,
  recurring-jobs, worker-philosophy)
- 5 ties (perform, args, unique, queues, telemetry)
- Mean winning fitness: 0.487

Composite: 619 lines, 12 capability sections, 3 cross-cutting examples,
10 common mistakes. Foundation: transactional-saga philosophy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ty13r added a commit that referenced this pull request Apr 12, 2026
…#31)

Co-authored-by: Matt (via Claude Code) <matt@skillforge.local>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ty13r deleted the v2.0/phase3-variant-evolution branch on April 19, 2026 20:21