
refactor(agents): decompose 4 backend hotspot files into packages#57

Merged
ty13r merged 4 commits into main from refactor/agent-decomposition
Apr 20, 2026

@ty13r ty13r commented Apr 20, 2026

Summary

Decomposes the four remaining backend hotspots — each agent/engine file over the 500-LOC Python ceiling in docs/clean-code.md §2. Every split follows the pure-planner / thin-I/O-shell pattern: prompt builders + data helpers in private modules, LLM/SDK/disk I/O in one clearly-labeled module, top-level orchestrators call through.

| File | Before (LOC) | After (main) | Package layout |
|---|---|---|---|
| managed_agents.py | 620 | 7-submodule package | _constants, environments, skills, agents, sessions, output |
| variant_evolution.py | 620 | 345 (dimension.py) | _helpers, dimension, assembly, main |
| breeder.py | 629 | 213 (_reports.py) | _ranking, _prompts, _reports, main, bible |
| spawner.py | 763 | 411 (main.py) | _helpers, _prompts, main |

Largest submodule anywhere is 416 LOC, under the 500-LOC cap.
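
As a rough illustration of the pure-planner / thin-I/O-shell split named in the summary, here is a minimal sketch. Every name in it (`build_mutation_prompt`, `request_mutation`, the `complete` callable) is hypothetical and not taken from the skillforge codebase; it only shows the shape, with the prompt builder as a pure function and the single I/O call behind an injectable boundary.

```python
from typing import Callable


# Pure planner: no I/O, deterministic, easy to unit-test.
def build_mutation_prompt(skill_name: str, weaknesses: list[str]) -> str:
    bullets = "\n".join(f"- {w}" for w in weaknesses)
    return f"Improve the skill '{skill_name}'. Address:\n{bullets}"


# Thin I/O shell: the only function that touches the (hypothetical) LLM client.
def request_mutation(
    skill_name: str,
    weaknesses: list[str],
    complete: Callable[[str], str],
) -> str:
    prompt = build_mutation_prompt(skill_name, weaknesses)
    return complete(prompt)


# A stub stands in for the real client, so no network call is made here.
def fake_complete(prompt: str) -> str:
    return f"stubbed response to {len(prompt)} chars"


prompt = build_mutation_prompt("csv-parser", ["slow on large files"])
reply = request_mutation("csv-parser", ["slow on large files"], fake_complete)
```

Because the planner never performs I/O, tests can assert on the prompt text directly, and only the shell needs a stub or a patch.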

Test-patch compatibility

Two of the four packages (breeder and spawner) required a lazy-lookup shim for tests that call patch("skillforge.agents.breeder.breed_next_gen") (or similar) on the package root. A submodule that binds the reference at import time keeps pointing at the original function, so the patch never reaches it; resolving the name through the package namespace at call time lets the patch take effect.

This was load-bearing — without the shim the breeder/spawner test suites silently made real LLM calls (11 minutes of API spend before first failure on the spawner).
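
The difference between the two binding styles can be reproduced in isolation. The sketch below builds a throwaway in-memory package called `demo_pkg` (a hypothetical name; it is not the skillforge code) and compares an import-time reference with a call-time lookup while the package attribute is patched.

```python
import sys
import types
from unittest.mock import patch

# Throwaway in-memory package so the example is self-contained.
pkg = types.ModuleType("demo_pkg")
sys.modules["demo_pkg"] = pkg


def breed_next_gen() -> str:
    return "real call"


pkg.breed_next_gen = breed_next_gen

# Early binding: this is what `from demo_pkg import breed_next_gen` does
# inside a submodule. The submodule holds its own reference to the original.
early_ref = pkg.breed_next_gen


def breed_early() -> str:
    return early_ref()


# Lazy lookup: resolve the attribute on the package at call time.
def breed_lazy() -> str:
    return sys.modules["demo_pkg"].breed_next_gen()


with patch.object(pkg, "breed_next_gen", return_value="mocked"):
    early_result = breed_early()  # still reaches the original function
    lazy_result = breed_lazy()    # sees the patched attribute

after_patch = breed_lazy()  # the patch is undone when the context exits
```

In a test suite the early-bound variant is the dangerous one: the test appears to be mocked, but the original function still runs.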

Also re-exported a handful of private helpers (_extract_skill_name_from_md, _normalize_output_path, _extract_lessons_and_report, _read_bible_patterns, BIBLE_DIR) through each package's __init__ so existing monkeypatch.setattr(module, "PRIVATE_NAME", ...) calls resolve.

Test plan

  • uv run ruff check skillforge — clean
  • uv run mypy skillforge — 83 files pass (up from 65)
  • uv run pytest tests/ — 411 passed, 2 skipped (unchanged)
  • cd frontend && npm run build / lint / format:check / test — untouched, still green

Public API unchanged

Every import site keeps its original path — from skillforge.agents import breeder; breeder.breed(...), from skillforge.agents.managed_agents import upload_skill, from skillforge.engine.variant_evolution import run_variant_evolution all still work.

🤖 Generated with Claude Code

Matt (via Claude Code) and others added 4 commits April 20, 2026 01:19
Seven-submodule decomposition along SDK-resource-lifecycle seams:

  managed_agents/__init__.py         barrel + full docstring with all
                                     the Step-0 smoke test SDK quirks
  managed_agents/_constants.py (35)  beta headers + make_client + the
                                     $0.08/hr session rate
  managed_agents/environments.py (54) create / archive env
  managed_agents/skills.py      (167) upload + 3-step archive dance +
                                     archive_skill_safe + name extractor
  managed_agents/agents.py       (57) create / archive competitor agent
  managed_agents/sessions.py    (124) create / archive session +
                                     send_user_message + event polling
  managed_agents/output.py      (211) post-run trace introspection —
                                     written_files, bash-write parsing,
                                     token usage, runtime cost
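
For the runtime-cost item, the arithmetic implied by the $0.08/hr session rate can be sketched as below. The constant and function names are hypothetical, and the sketch assumes simple linear billing by elapsed time; whether output.py rounds or bills in larger increments is not stated here.

```python
SESSION_RATE_USD_PER_HOUR = 0.08  # rate quoted above for _constants.py


def session_runtime_cost(runtime_seconds: float) -> float:
    """Return the runtime cost in USD, assuming linear billing by elapsed time."""
    if runtime_seconds < 0:
        raise ValueError("runtime_seconds must be non-negative")
    return runtime_seconds / 3600 * SESSION_RATE_USD_PER_HOUR


one_hour = session_runtime_cost(3600)     # a full hour at the quoted rate
quarter_hour = session_runtime_cost(900)  # a quarter of the hourly rate
```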

Every public name is re-exported from the package __init__ so 38 call
sites keep their ``from skillforge.agents import managed_agents`` +
``managed_agents.upload_skill(...)`` usage unchanged.

Tests against two private helpers (_extract_skill_name_from_md,
_normalize_output_path) were accessing them on the module directly;
those are re-exported through the barrel so test patches continue to
resolve.
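
A minimal sketch of the barrel idea, using a throwaway package written to a temp directory. The package name `demo_managed_agents` and the helper bodies are hypothetical; only the re-export shape is the point. Listing the private helper in `__all__` is one way to mark the import as an intentional re-export so an unused-import lint such as F401 does not flag it.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_managed_agents"
pkg_dir.mkdir()

# Submodule holding one public function and one private helper.
(pkg_dir / "skills.py").write_text(textwrap.dedent('''
    def upload_skill(name):
        return f"uploaded {name}"

    def _extract_skill_name_from_md(text):
        # Hypothetical behaviour: use the first markdown heading as the name.
        return text.lstrip("# ").splitlines()[0]
'''), encoding="utf-8")

# Barrel: re-export both names so callers and tests use the package root.
(pkg_dir / "__init__.py").write_text(textwrap.dedent('''
    from .skills import _extract_skill_name_from_md, upload_skill

    __all__ = ["_extract_skill_name_from_md", "upload_skill"]
'''), encoding="utf-8")

sys.path.insert(0, str(root))
demo = importlib.import_module("demo_managed_agents")

uploaded = demo.upload_skill("csv-parser")
name = demo._extract_skill_name_from_md("# csv-parser\nbody")
```

With the barrel in place, a call such as `monkeypatch.setattr(demo, "_extract_skill_name_from_md", ...)` finds the attribute on the package instead of raising `AttributeError`.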

QA: ruff + mypy + 411 pytest (unchanged) all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Decomposed by orchestration level — the mini-evolution loop, the
assembly step, and the top-level run entry each live in their own file:

  variant_evolution/__init__.py    barrel + re-exports run_variant_evolution
  variant_evolution/_helpers.py    constants + _tier_sort_key
                                   + _aggregate_fitness
  variant_evolution/dimension.py   _run_dimension_mini_evolution
                                   (challenge -> spawn -> compete ->
                                   score -> judge -> breed -> pick winner)
  variant_evolution/assembly.py    _real_assembly (Engineer call +
                                   integration check)
  variant_evolution/main.py        run_variant_evolution orchestrator

Largest submodule is dimension.py at 345 LOC, under the 500-LOC ceiling
in docs/clean-code.md §2. Prior to this split, the monolith held a
single 311-LOC function (_run_dimension_mini_evolution) alongside the
assembly logic and the main loop — the file was 620 LOC and every
refactor touched everything.
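
A ceiling like this is straightforward to check mechanically. The sketch below is a hypothetical helper, not part of the repo, and it assumes that "LOC" means physical lines including blanks and comments; docs/clean-code.md may count differently.

```python
import tempfile
from pathlib import Path

LOC_CEILING = 500  # ceiling cited from docs/clean-code.md §2


def files_over_ceiling(root: Path, ceiling: int = LOC_CEILING) -> dict[str, int]:
    """Map each .py file under root whose line count exceeds ceiling to that count."""
    offenders: dict[str, int] = {}
    for path in sorted(root.rglob("*.py")):
        with path.open(encoding="utf-8") as handle:
            line_count = sum(1 for _ in handle)
        if line_count > ceiling:
            offenders[path.relative_to(root).as_posix()] = line_count
    return offenders


# Demo on a throwaway tree: one 620-line file and one 345-line file.
demo_root = Path(tempfile.mkdtemp())
(demo_root / "monolith.py").write_text("x = 1\n" * 620, encoding="utf-8")
(demo_root / "dimension.py").write_text("x = 1\n" * 345, encoding="utf-8")

offenders = files_over_ceiling(demo_root)
```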

Test-access surface preserved: tests/test_variant_evolution.py imports
_aggregate_fitness and _tier_sort_key directly from the package, so the
__init__ re-exports them.

Also rolled in: _extract_skill_name_from_md and _normalize_output_path
added to the managed_agents package __all__ (they were already
re-exported for test access, just needed the __all__ entry to satisfy
F401).

QA: ruff + mypy + 411 pytest all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Decomposed by responsibility. The six section comments in the monolith
("slot allocation", "ranking", "main breed", "mutation prompts",
"lessons + reports", "bible publishing") each correspond to a submodule:

  breeder/__init__.py   barrel — re-exports breed + compute_slots +
                        rank_skills + publish_findings_to_bible, plus
                        legacy aliases (breed_next_gen, spawn_gen0,
                        BIBLE_DIR) that tests patch on the package root
  breeder/_ranking.py   compute_slots + rank_skills + _aggregate_fitness
  breeder/_prompts.py   diagnostic + crossover instruction templates
                        + breeding-context formatter (pure)
  breeder/_reports.py   _extract_lessons_and_report + siblings (LLM calls)
  breeder/main.py       breed() + _carry_elite (orchestrator)
  breeder/bible.py      publish_findings_to_bible (disk I/O)

Largest submodule is _reports.py at 213 LOC, under the 500-LOC Python
ceiling in docs/clean-code.md §2.

Test-patch compatibility
------------------------
Tests patch three functions on the package root:
``breeder.breed_next_gen``, ``breeder.spawn_gen0``,
``breeder._extract_lessons_and_report``. Those patches don't propagate
to bindings in submodules, so ``main.breed()`` now resolves each
through the package namespace at call time (``_pkg().breed_next_gen``
etc.). BIBLE_DIR follows the same pattern in bible.py.
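
One possible shape for a `_pkg()` helper, shown for the BIBLE_DIR case. This is a sketch under assumptions: the package name `demo_breeder`, the `findings.md` file name, and the `sys.modules[__package__]` lookup are illustrative choices, and the real helper may be implemented differently.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path
from unittest.mock import patch

root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_breeder"
pkg_dir.mkdir()

(pkg_dir / "__init__.py").write_text(textwrap.dedent('''
    from pathlib import Path

    from .bible import publish_findings_to_bible

    BIBLE_DIR = Path("bible")  # the name tests patch on the package root
'''), encoding="utf-8")

(pkg_dir / "bible.py").write_text(textwrap.dedent('''
    import sys

    def _pkg():
        # Look the package root up at call time, never at import time.
        return sys.modules[__package__]

    def publish_findings_to_bible(text):
        target = _pkg().BIBLE_DIR / "findings.md"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")
        return target
'''), encoding="utf-8")

sys.path.insert(0, str(root))
demo_breeder = importlib.import_module("demo_breeder")

# Redirect BIBLE_DIR to a scratch directory, as a test would.
scratch = Path(tempfile.mkdtemp())
with patch.object(demo_breeder, "BIBLE_DIR", scratch):
    written = demo_breeder.publish_findings_to_bible("lesson one")
```

Had bible.py used `from . import BIBLE_DIR` at import time instead, the patched value would not be seen and the write would go to the original location.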

QA: ruff + mypy + 411 pytest (unchanged) all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Decomposed along the pure-planner / thin-I/O-shell seam called out in
docs/clean-code.md §7. Four submodules:

  spawner/__init__.py     barrel — re-exports four entry points plus
                          every helper tests patch on the package root
                          (_generate, _read_bible_patterns, BIBLE_DIR, ...)
  spawner/_helpers.py    _generate (LLM streaming) + _parse_genomes
                         + _auto_repair_missing_references
                         + _validate_genomes + _read_bible_patterns
  spawner/_prompts.py    all _build_*_system_prompt string templates
                         + embedded JSON schema constants
                         (pure — no I/O, no LLM calls)
  spawner/main.py        four public entry points:
                         spawn_gen0, breed_next_gen,
                         spawn_from_parent, spawn_variant_gen0

Largest submodule is main.py at 411 LOC, under the 500-LOC ceiling.

Test-patch compatibility
------------------------
Same pattern as the breeder split: tests patch ``spawner._generate``,
``spawner._read_bible_patterns``, and ``spawner.BIBLE_DIR`` on the
package root. Those patches do not propagate to direct imports in
submodules, so ``main._generate`` / ``main._read_bible_patterns``
and ``_helpers._read_bible_patterns`` now resolve the reference
through the package namespace at call time.

Without these shims the test suite made real LLM calls for 11 minutes
before first failure — the fix is load-bearing for both test speed
and API-cost safety.

QA: ruff + mypy (83 files) + 411 pytest all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ty13r ty13r merged commit fcc007f into main Apr 20, 2026
2 checks passed
@ty13r ty13r deleted the refactor/agent-decomposition branch April 20, 2026 07:06