DDD Cultivation — The Full Story: Decisions, Failures, and Evidence #40
xg-gh-25 started this conversation in Show and tell
Most AI agent "knowledge" is either a dumped README that rots in 2 weeks, or a RAG pipeline that retrieves fragments without judgment. We built a third option: structured domain knowledge that grows from normal work, feeds multiple delivery engines, and never goes stale — because the system that uses it is also the system that maintains it.
TL;DR
The Problem: Domain-Blind AI
Every AI coding agent faces the same gap: the model knows programming but not your domain.
This isn't hypothetical. We have 9 CMHK skills that use DDD context. Before DDD was populated, each skill took 3-5 iterations to get the SQL right. After: first-attempt accuracy for domain-specific logic.
Why existing approaches fail:
The Solution: 3-Layer Architecture
Layer 1: Interface (What Humans See)
4 markdown documents per project, each answering one judgment axis:
Why exactly 4? We tested with 2 (too sparse — agent lacked judgment context), 6 (too fragmented — context switching cost), and settled on 4 because they map cleanly to the decision tree an agent traverses: desirability → feasibility → history → timing.
Layer 2: Intelligence (What Machines Maintain)
Key insight: Health drives AI trust, not human action. Stale content doesn't generate tasks; it's marked `[!]` in session briefings so the agent knows "trust this section less." Zero human maintenance burden.
Layer 3: Orchestration (What Runs Automatically)
7 independent feed channels, fault-isolated (one crash ≠ all fail):
`[!]` warnings, refresh proposals
Time budget: 25 seconds total, 5 proposals max per session, mtime filter (30 days). These limits came from Failure #2 below.
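To make the fault isolation and the limits concrete, here is a minimal sketch of a channel runner. Only the 25-second budget and the 5-proposal cap come from the post; `run_channels`, the `Channel` type, and the shape of a proposal (a plain string) are hypothetical names chosen for illustration, not the actual SwarmAI code.

```python
import time
from typing import Callable, Iterable

# Limits quoted in the post; everything else in this sketch is hypothetical.
TIME_BUDGET_SECONDS = 25.0
MAX_PROPOSALS_PER_SESSION = 5

Channel = Callable[[], Iterable[str]]  # a channel yields proposal strings


def run_channels(channels: dict[str, Channel],
                 time_budget: float = TIME_BUDGET_SECONDS,
                 max_proposals: int = MAX_PROPOSALS_PER_SESSION,
                 clock: Callable[[], float] = time.monotonic):
    """Run each channel in isolation and return (proposals, errors)."""
    deadline = clock() + time_budget
    proposals: list[str] = []
    errors: dict[str, str] = {}
    for name, channel in channels.items():
        # Stop starting new channels once either limit is reached.
        if clock() >= deadline or len(proposals) >= max_proposals:
            break
        try:
            for proposal in channel():
                proposals.append(proposal)
                if len(proposals) >= max_proposals:
                    break
        except Exception as exc:
            # One crashing channel is recorded and skipped, not propagated.
            errors[name] = repr(exc)
    return proposals, errors
```

Note that this sketch checks the deadline only between channels, so a single slow channel can still overrun the budget. Enforcing it strictly would need per-channel timeouts, which are left out here.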
Key Decisions (Why We Chose What We Chose)
D1: Judgment Substrate, Not Knowledge Base
DDD answers "what helps AI judge better?" — not "what's interesting?" This filter rejects activity logs, status updates, raw metrics. Only judgment-shaping content enters DDD.
Why: Information saturation kills agents faster than information scarcity. A 10,000-line TECH.md where 80% is status updates makes the 20% that matters invisible.
D2: Reuse Existing Extraction, Don't Build New
The memory pipeline already produces StructuredSummary (decisions, lessons, corrections) from every session. DDD Cultivation consumes the SAME output — zero new LLM calls.
Why: Every new extraction pipeline is a maintenance liability. Reusing means DDD quality improves as memory quality improves — compound, not additive.
D3: Tiered Autonomy (Additive Auto-Apply + Risky Escalation)
Original design: binary "propose everything, never write silently." Problem: proposal fatigue. Nobody reviews 50 proposals.
Revised design:
Why: Additive-only changes have zero risk. A new entry in IMPROVEMENT.md "What Failed" can't corrupt existing content. Modifications can — those need judgment.
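The additive-versus-modification split can be sketched as a small triage function. The `Proposal` fields and the rule "empty `old_text` means additive" are assumptions made for this example; the post does not describe how proposals are actually represented.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    AUTO_APPLY = "auto_apply"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class Proposal:
    # Hypothetical proposal shape, for illustration only.
    doc: str       # e.g. "IMPROVEMENT.md"
    section: str   # e.g. "What Failed"
    old_text: str  # empty string means nothing existing is replaced
    new_text: str


def triage(proposal: Proposal) -> Action:
    """Auto-apply pure additions; escalate anything that rewrites or removes text."""
    is_additive = proposal.old_text == "" and proposal.new_text.strip() != ""
    return Action.AUTO_APPLY if is_additive else Action.ESCALATE
```

Under these assumptions, a new "What Failed" entry is applied silently, while an edit to an existing TECH.md paragraph lands in the review queue.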
D4: Entity Index as Text, Not Graph Database
At 5 projects / ~120 entities, a flat text routing table in PROJECTS.md is sufficient. Agent reads it directly from system prompt — no query language, no database, no infrastructure.
Scalability trigger: Reconsider at ~500 entities or when cross-project routing errors exceed 3%.
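In the post the agent reads the routing table directly from its prompt, so no code is involved. The sketch below only makes the structure and the upgrade trigger concrete. The `entity -> project` line format and the entity names are invented for illustration; the 500-entity and 3% thresholds are the ones stated above.

```python
# Hypothetical index format: one "entity -> project" pair per line.
ENTITY_INDEX = """\
monthly report -> CMHK
session briefing -> SwarmAI
"""


def parse_entity_index(text: str) -> dict[str, str]:
    """Parse 'entity -> project' lines into a lowercase lookup table."""
    index: dict[str, str] = {}
    for line in text.splitlines():
        if "->" not in line:
            continue
        entity, project = (part.strip() for part in line.split("->", 1))
        if entity and project:
            index[entity.lower()] = project
    return index


def route(query: str, index: dict[str, str]) -> set[str]:
    """Return every project whose entity name appears in the query."""
    lowered = query.lower()
    return {project for entity, project in index.items() if entity in lowered}


def should_reconsider(entity_count: int, routing_error_rate: float) -> bool:
    """Apply the scalability trigger: ~500 entities or more than 3% routing errors."""
    return entity_count >= 500 or routing_error_rate > 0.03
```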
D5: All Filesystem, No SQLite
Proposal volume: ~90 pending max. Git auditability matters more than query speed.
Scalability trigger: If volume exceeds ~500 proposals (5+ months with no approvals), migrate to SQLite.
D6: Intentional Duplication Across Tiers
The same fact can exist in DailyActivity (ephemeral, raw), MEMORY.md (agent-scoped, curated), and DDD (project-scoped, structured). Different consumers, different lifecycles, allowed overlap.
Why: Suppressing DDD proposals because "it's in MEMORY.md" loses the project-scoped perspective. A fact in MEMORY says "I learned X." The same fact in TECH.md says "when working on this project, X matters."
D7: Progressive Loading (Section-Level, Not Document-Level)
Large TECH.md files (97K for SwarmAI) never load fully. Agent reads section TOC, pulls only relevant sections (~500 tokens each).
Budget impact: Active project ~5K + entity index ~2K + cross-project pulls ~1.5K = ~8.5K additional tokens worst-case. At 91K effective budget, DDD adds 9% overhead.
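A minimal sketch of section-level loading, assuming TECH.md uses standard `#` markdown headings and that relevance is decided by a keyword match on heading titles. Both are assumptions; the post does not say how sections are selected. The budget arithmetic at the end reproduces the figures above.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_sections(markdown: str) -> dict[str, str]:
    """Map each heading title to the text beneath it, up to the next heading."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            current = match.group(2)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {title: "\n".join(body).strip() for title, body in sections.items()}


def load_relevant(markdown: str, keywords: list[str]) -> dict[str, str]:
    """Return only the sections whose title contains one of the keywords."""
    wanted = [keyword.lower() for keyword in keywords]
    return {
        title: body
        for title, body in split_sections(markdown).items()
        if any(keyword in title.lower() for keyword in wanted)
    }


# Worst-case token budget from the post.
EXTRA_TOKENS = {"active_project": 5_000, "entity_index": 2_000, "cross_project_pulls": 1_500}
extra_total = sum(EXTRA_TOKENS.values())   # 8,500 tokens
overhead = extra_total / 91_000            # roughly 0.093, i.e. about 9%
```

This sketch treats any line starting with `#` as a heading, so a `#` comment inside a fenced code block would be misread as a section boundary, and duplicate titles overwrite each other. A real loader would need to handle both.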
What Actually Failed
Failure 1: T2 Keyword Classifier — 100% False Negative on Production
What happened: Built a classifier to route corrections to the right DDD doc (TECH vs IMPROVEMENT). Tested with 29 synthetic corrections containing magic words ("daemon", "subprocess", "nc -z"). All tests green. Adversarial clean.
Reality: 5/5 real corrections from production returned `None`.
Root cause: The test data was crafted by the same author who wrote the regex. Real corrections are narrative and behavioral ("Agent opened DMG instead of installing"), with zero keyword overlap.
Fix: PE-1 fallback (bypass keyword gate entirely). Added RP31+RP32 to pipeline REVIEW patterns.
Lesson: Mock data written by the same person who wrote the matcher will ALWAYS pass. Test with real production data or don't test at all.
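The failure mode is easy to reproduce in a few lines. The regex below is a stand-in built from the magic words quoted above, not the actual classifier, and the default destination in `classify_with_fallback` is an assumption: the post only says PE-1 bypasses the keyword gate, not where unmatched corrections end up.

```python
import re
from typing import Optional

# Hypothetical keyword gate modeled on the quoted magic words.
TECH_KEYWORDS = re.compile(r"\b(daemon|subprocess)\b|nc -z", re.IGNORECASE)


def classify_by_keyword(correction: str) -> Optional[str]:
    """Return a target doc only when a known keyword appears, else None."""
    return "TECH.md" if TECH_KEYWORDS.search(correction) else None


def classify_with_fallback(correction: str, default: str = "IMPROVEMENT.md") -> str:
    """Never drop a correction: unmatched text goes to a default doc (assumed)."""
    return classify_by_keyword(correction) or default
```

Run against the production example from the post, `classify_by_keyword("Agent opened DMG instead of installing")` returns `None`, while the synthetic "daemon failed to restart" routes to TECH.md. That is the gap between author-written test data and real corrections.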
Failure 2: Auto-Cultivation Hook — O(n) on No-Op Path
What happened: Hook passed all quality gates (TDD 7/7, adversarial HIGH fixed). Shipped.
Reality: Every session scanned ALL 141 `run.json` files to find... zero uncultivated runs (99% of invocations do nothing).
Root cause: Review stage focused on action-path correctness. Nobody analyzed the no-op path: "what happens when there's nothing to do?"
Fix: Added RP30 pattern rule ("hook no-op path scaling"). Time budget + mtime filter implemented.
Lesson: The no-op path is the most-executed path. If it's O(n), you have a systematic waste bug that's invisible to unit tests.
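A minimal sketch of the mtime filter, assuming runs live in one directory per run with a `run.json` inside. That layout and the function name are assumptions; only the 30-day window comes from the post.

```python
import time
from pathlib import Path
from typing import Optional

MAX_AGE_DAYS = 30  # the mtime window quoted in the post


def recent_run_files(root: Path,
                     now: Optional[float] = None,
                     max_age_days: int = MAX_AGE_DAYS) -> list[Path]:
    """Return run.json files modified within the window, without opening any of them."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86_400
    recent = []
    for path in root.glob("*/run.json"):  # assumed layout: <root>/<run-id>/run.json
        if path.stat().st_mtime >= cutoff:
            recent.append(path)
    return sorted(recent)
```

The filter still stats every file, so the no-op path remains linear in metadata lookups. What it avoids is opening and parsing each `run.json`. Making the no-op path constant-time would need something extra, for example a stored "last cultivated" marker, which the post does not describe.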
Failure 3: v1 Batch-on-Close Timing Gap
What happened: v1 only cultivated at session close. A multi-session feature produced knowledge across 3 sessions. Session 2 and 3 lacked the context from Session 1's cultivation (hadn't run yet).
Root cause: Batch-on-close means knowledge is always one session behind.
Fix: v2 moved to event-driven. Channel 2 fires on pipeline REFLECT (immediate). Session N's output is available to Session N+1 within seconds of session close.
Failure 4: Silent REVIEW Skip in 1029-Line Changeset
What happened: DDD cultivation hook extension (17 tests, 4 commits, 1029-line diff). Pipeline REVIEW stage completed via `run-update` without reading a single line.
Root cause: EVALUATE and PLAN were thorough, creating false confidence. REVIEW felt "already validated."
Fix: Check 8d added — review effort must be proportional to changeset size.
Lesson: Thoroughness in early stages creates a cognitive trap that makes shortcuts in later stages feel justified.
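The post does not say how Check 8d measures review effort. One way to express "proportional to changeset size" is a gate on the ratio of lines read to lines changed. The function name, the `lines_read` input, and the 0.5 threshold below are all assumptions for illustration.

```python
def review_gate(diff_lines: int, lines_read: int, min_ratio: float = 0.5) -> bool:
    """Return True when the REVIEW stage may complete (hypothetical rule)."""
    if diff_lines < 0 or lines_read < 0:
        raise ValueError("line counts must be non-negative")
    if diff_lines == 0:
        return True  # nothing to review
    return lines_read / diff_lines >= min_ratio
```

Under this rule, the 1029-line changeset with zero lines read would have been blocked from completing REVIEW.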
Real Metrics (Measured, Not Projected)
Weekly Output (2026-05-18)
DDD Health (Live Dashboard)
Implementation Size
Anti-Patterns (Don't Do This)
When This Approach Breaks Down
DDD Cultivation is designed for one builder + AI or small teams. Honest limitations:
We're at 5 projects, 1 builder, ~80 proposals/week. Nowhere near these limits. The design is intentionally simple for our current scale, with clear triggers for when to upgrade each dimension.
The Compound Effect
DDD isn't valuable in isolation. Its value comes from feeding multiple engines simultaneously:
What Pipeline learns → makes Pollinate smarter.
Example: Pipeline delivers a CMHK monthly report, discovers that the `territory_owner` column doesn't exist (TECH.md pitfall). Next week, Pollinate generates GTM content for the same BU. It won't reference `territory_owner` data because TECH.md now says it doesn't exist.
What Pollinate learns → makes Pipeline safer.
Example: Pollinate produces brand content, discovers a non-goal ("not multi-tenant SaaS" in PRODUCT.md). Pipeline won't build SaaS features because the same PRODUCT.md gates its EVALUATE stage.
This cross-pollination happens WITHOUT the engines communicating. They share a substrate. The substrate grows. Both get smarter. That's the compound flywheel.
Starting From Zero (If You Want to Build This)
Phase 1 (1 hour): Create 4 empty docs per project. Write PRODUCT.md (vision, priorities, non-goals). Fill TECH.md with architecture decisions you already know. Leave IMPROVEMENT.md and PROJECT.md minimal.
Phase 2 (automatic): Work normally. After each significant session, extract 2-3 lessons into IMPROVEMENT.md ("What Failed" or "What Worked"). This is the only human effort required.
Phase 3 (when ready): Build the Intelligence layer. Health scoring tells you what's stale. Maturity tracking tells you what's trusted. Entry lifecycle tells you what to archive.
Phase 4 (optional): Build Orchestration. 7 channels automate extraction. Tiered autonomy handles the routine. You only review escalations.
Most teams will get 80% of the value from Phase 1-2 alone. The machine layers (3-4) are for when you want zero-maintenance knowledge that stays fresh indefinitely.
Summary: Why DDD Cultivation Works
From 28 DDD sections (2026-03-24) to 110+ (2026-05-16). Zero documentation sprints. All from normal work — 8 automated channels + 1 manual habit (extract 2-3 lessons after significant sessions). The knowledge that survives is the knowledge that costs nothing to maintain.
Published from SwarmAI — where 5,100 lines of living knowledge feed every decision, and the system that uses it is the same system that grows it. Source