Replies: 1 comment
Genuine question I'm wrestling with: If "dreaming" (background reflection) has near-zero ROI compared to in-session corrections... should we kill the background evolution job entirely and only evolve through live failures? Our data says yes: 27 corrections, all from real-time user pushback. Zero from background reflection cycles. But I can't shake the feeling that there's a threshold effect: maybe after 100+ corrections the background synthesizer starts finding cross-pattern insights humans miss. Anyone running background reflection that's actually produced behavioral changes they can point to?
Hook
"AI Dreaming" sounds like science fiction. Anthropic's Claude Code has it. Community projects are popping up. People are excited.
But when you look under the hood, dreaming is just... background note-taking. And it hits a ceiling fast.
Here's what dreaming actually is, why it plateaus, and where the real self-improvement happens.
What "Dreaming" Actually Does
Strip away the mystique:
`~/.claude/projects/<slug>/memory/*.md`
That's it. It's `grep interesting things from transcripts → append to memory files`.
The output is stuff like:
Useful? Yes. Revolutionary? No. Your intern could do this with a highlighter.
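To make "grep and append" concrete, here is a minimal sketch of an L0 consolidation pass. It is an illustration, not how Claude Code actually does it: the `INTERESTING` pattern, the `consolidate` function, and the note format are all assumptions made up for this example.

```python
import re
from pathlib import Path

# Hypothetical heuristic for "interesting" transcript lines.
# The real selection logic is not described in this post.
INTERESTING = re.compile(r"\b(always|never|prefer|remember)\b", re.IGNORECASE)


def consolidate(transcript: str, memory_file: Path) -> list[str]:
    """Append matching transcript lines to memory_file, skipping duplicates."""
    existing = set(memory_file.read_text().splitlines()) if memory_file.exists() else set()
    new_notes = []
    for line in transcript.splitlines():
        note = f"- {line.strip()}"
        if INTERESTING.search(line) and note not in existing:
            existing.add(note)
            new_notes.append(note)
    if new_notes:
        memory_file.parent.mkdir(parents=True, exist_ok=True)
        with memory_file.open("a") as fh:
            fh.write("\n".join(new_notes) + "\n")
    return new_notes
```

Note what this never touches: it adds text the agent may read later, but nothing about how the agent decides.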
The L0 Ceiling
We call this Level 0 optimization — improving the agent's knowledge without improving its judgment.
Here's the hierarchy:
Dreaming operates exclusively at L0.
Why L0 Plateaus
We ran a confidence-gated L0 optimization pipeline for 6 weeks. Results:
The math is simple: if your agent's judgment is good, it self-corrects during execution regardless of skill text quality. If judgment is bad, no amount of memory text saves you.
A strong OS with mediocre apps > a weak OS with perfect apps.
The Architectural Gap Nobody's Talking About
Claude Code's dreaming has a deeper problem (issue #59904 nails it):
Dreaming writes to auto memory. Auto memory isn't guaranteed to load.
The flow:
A rule is discovered (e.g. "`rm -rf` in production dir")
It is written to `memory/security.md`
Next session: `MEMORY.md` first 200 lines load. Topic files load on demand
`security.md` is not loaded → rule not in context → violation recurs
The promotion path from "discovered" to "always enforced" is completely missing. The agent can know something and simultaneously never apply it.
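The gap is easy to reproduce with a toy model of session start. This is a simplified sketch under stated assumptions, not Claude Code's actual loader: `build_context` and `demo` are hypothetical names, the rule wording is invented for illustration, and the only figure taken from the flow above is the 200-line limit on `MEMORY.md`.

```python
import tempfile
from pathlib import Path

INDEX_LINE_LIMIT = 200  # from the flow above: first 200 lines of MEMORY.md load


def build_context(memory_dir: Path, requested_topics: tuple[str, ...] = ()) -> str:
    """Return the memory text in context at session start, under the assumed model."""
    index_lines = (memory_dir / "MEMORY.md").read_text().splitlines()[:INDEX_LINE_LIMIT]
    parts = ["\n".join(index_lines)]
    for topic in requested_topics:  # topic files load only when something asks for them
        parts.append((memory_dir / topic).read_text())
    return "\n".join(parts)


def demo() -> tuple[bool, bool]:
    """Check whether the rule is visible by default and when explicitly requested."""
    rule = "Never run rm -rf in a production directory"  # illustrative wording
    with tempfile.TemporaryDirectory() as tmp:
        memory_dir = Path(tmp)
        (memory_dir / "MEMORY.md").write_text("# Memory index\n- see security.md\n")
        (memory_dir / "security.md").write_text(rule + "\n")
        default_has_rule = rule in build_context(memory_dir)
        on_demand_has_rule = rule in build_context(memory_dir, ("security.md",))
    return default_has_rule, on_demand_has_rule
```

In this model the rule exists on disk in both cases, but it only reaches the context when something requests `security.md`. Nothing in the default path does.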
Where Real Evolution Happens
Every correction that actually changed our agent's behavior modified principles or rules (L1-L2), not memory text (L0).
Examples of L0 changes that didn't stick:
Examples of L1-L2 changes that stuck permanently:
The pattern: knowledge doesn't change behavior. Principles do.
What Actually Works (Lessons From 27 Corrections)
After tracking 27 agent corrections over 2 months:
The progression that works:
Not:
Practical Implications for Builders
If you're building agent self-improvement:
Don't stop at L0. Memory consolidation is table stakes. It's the equivalent of taking notes in a lecture — necessary but not sufficient for learning.
Build a promotion path. Knowledge discovered must have a mechanism to become enforced. Without this, dreaming is write-only memory.
Track correction classes, not instances. If the same type of mistake recurs 3+ times with different content, your agent has a judgment bug, not a knowledge gap.
Mechanical gates > passive knowledge. A code-enforced checkpoint that blocks completion is worth 100 memory entries that say "remember to check."
Measure: did the same correction class stop recurring? That's the only real metric. Memory file count, consolidation frequency — these are vanity metrics.
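A rough sketch of how the last three points could fit together: count corrections by class, promote a class to an enforced gate once it hits the 3+ mark, and measure whether it recurs afterwards. Everything here is hypothetical (`CorrectionTracker`, `gate`, `recurrence_after`, and the class labels used with them); it is one possible shape, not the pipeline we ran.

```python
from collections import Counter
from dataclasses import dataclass, field

RECURRENCE_THRESHOLD = 3  # "3+ times" from the list above


@dataclass
class CorrectionTracker:
    counts: Counter = field(default_factory=Counter)
    enforced: set = field(default_factory=set)

    def record(self, correction_class: str) -> bool:
        """Log one correction; return True if the class was just promoted to a gate."""
        self.counts[correction_class] += 1
        hit_threshold = self.counts[correction_class] >= RECURRENCE_THRESHOLD
        if hit_threshold and correction_class not in self.enforced:
            self.enforced.add(correction_class)
            return True
        return False

    def gate(self, passed_checks: set) -> list:
        """Return enforced classes whose check has not passed; non-empty blocks completion."""
        return sorted(self.enforced - passed_checks)


def recurrence_after(events: list, correction_class: str, promoted_at: int) -> int:
    """Count occurrences of a class after index promoted_at (the metric that matters)."""
    return sum(1 for e in events[promoted_at + 1:] if e == correction_class)
```

Usage would look like calling `record("skipped-verification")` on every user pushback, refusing to mark a task done while `gate(...)` returns anything, and watching `recurrence_after` for promoted classes. If that number stays at zero, the change stuck.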
The Honest Assessment
Dreaming as shipped in Claude Code today is:
(`/dream` command doesn't work)
Is the concept valuable? Absolutely: background processing of session history is a good idea. But the current implementation is closer to "auto-save highlights" than "artificial dreaming."
The name oversells. The reality is a starting point, not a destination.
What Would Actual "Dreaming" Look Like?
If we take the sleep/dreaming metaphor seriously:
That means:
None of this requires running between sessions. It requires a different target — not "what do I know" but "how do I decide."
The gap between "remembering facts" and "improving judgment" is the gap between a notebook and a brain. Dreaming, as currently implemented, is a notebook that writes itself. Useful. But let's not confuse it with cognition.