You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
You don't need a CS degree to understand how AI coding agents work. You just need to understand one idea: the wrapper matters more than the engine.
The Aha Moment
Here's something that surprised me: Claude Code, Cursor, Kiro, Devin — they all work the same way under the hood.
Every single one of these AI coding tools follows this pattern:
Your App (the wrapper)
│
├── spawns a subprocess
│ ↓
│ AI Engine (a CLI tool)
│ - reads instructions from stdin
│ - calls APIs and tools
│ - writes results to stdout
│ ↑
└── reads the output, manages everything else
That "AI Engine" is just a command-line program. It has no memory. It doesn't know what happened yesterday. It doesn't know when to restart itself. It's a stateless worker that does exactly one thing: take an instruction, think about it, use tools, return a result.
Everything else — memory, self-healing, running in the background, learning from mistakes — comes from the wrapper. The industry calls this wrapper an Agent Harness.
The Self-Driving Car Analogy
This maps perfectly to how self-driving cars work:
Car Component
Agent Component
What It Does
Engine
AI CLI (Claude, Kiro, etc.)
Raw power — moves things forward
Sensors
Context files, tools, MCPs
Perceives the world
Control system
Pipeline, routing logic
Makes moment-to-moment decisions
Navigation
Goal decomposition
Plans multi-step journeys
The whole car
Agent Harness
Everything working together
You can swap the engine. A Tesla can run on different motor versions. An Agent Harness can swap from Claude to Kiro to Gemini. The engine is commodity. The system around it is the product.
Five Levels of Agent Autonomy
Just like self-driving cars have levels (L1 through L5), agent harnesses do too:
Level 1: Assist
What the harness does: Formats a prompt, makes one API call, shows you the result.
What you do: Everything else. You decide what to ask, when to ask it, what to do with the answer.
Example: ChatGPT web interface, API playground.
Car analogy: Cruise control. It holds speed, you do everything else.
Level 2: Copilot
What the harness does: Watches what you're doing, suggests completions, routes to the right tool.
What you do: Accept or reject suggestions. Direct every interaction.
Example: GitHub Copilot, Cursor Tab.
Car analogy: Lane-keeping assist. Nudges the wheel, but you're driving.
Level 3: Agent
What the harness does: Takes a goal, breaks it into steps, executes multiple tools, asks you at checkpoints.
What you do: Set the goal. Review at checkpoints. Approve the result.
Example: Claude Code in terminal, Kiro IDE, Windsurf.
Car analogy: Highway autopilot. Handles the highway, but you take over for exits and cities.
Level 4: Autonomous
What the harness does: Everything L3 does, PLUS: runs 24/7 without you present, remembers across sessions, self-heals when things break, handles multiple tasks concurrently.
What you do: Set intent. Check in occasionally. Handle the exceptions it escalates.
Car analogy: Full self-driving in mapped areas. You set the destination, the car handles the journey. But you might need to take over in new situations.
Level 5: Self-Evolving
What the harness does: Everything L4 does, PLUS: decomposes high-level goals into sub-tasks spanning days, learns from its own mistakes, expands its own capabilities.
What you do: Set direction. ("Make this product better by Friday.")
Example: No complete L5 exists yet (as of 2026). Some systems have L5 capabilities in narrow domains.
Car analogy: A car that redesigns its own navigation algorithm after getting lost once.
What Makes Each Level Different (The Capabilities Stack)
Each level is cumulative — L4 includes everything from L1-L3:
Capability
L1
L2
L3
L4
L5
Execute a single tool call
Yes
Yes
Yes
Yes
Yes
Inject relevant context automatically
Yes
Yes
Yes
Yes
Multi-step reasoning with tools
Yes
Yes
Yes
Remember things across sessions
Yes
Yes
Run without a human present
Yes
Yes
Recover from crashes automatically
Yes
Yes
Handle multiple tasks at once
Yes
Yes
Improve its own behavior over time
Yes
Break big goals into multi-day plans
Yes
The Key Insight: The Engine Is Not The Moat
Here's what most people get wrong: they think the AI model is the product. It's not. The model is the engine. The harness is the product.
Why? Because:
Models are interchangeable. Today it's Claude. Tomorrow it might be Gemini or an open-source model. If your harness is well-designed, swapping takes a day.
The harness is where intelligence compounds. Memory, learned preferences, domain knowledge, self-correction patterns — all of this lives in the harness, not the model.
The model has no lifecycle. It doesn't know when to wake up, when to retry, when to escalate. The harness provides all lifecycle intelligence.
Everyone has access to the same models. Anyone can call Claude's API. What you can't easily replicate is 100 days of accumulated memory + 27 self-corrections + domain expertise across 7 projects.
Under The Hood: How It Actually Works
For the technically curious, here's the actual mechanism:
# 1. Set up the environment (what the AI engine will "know")os.environ["CLAUDE_CODE_USE_BEDROCK"] ="true"os.environ["AWS_REGION"] ="us-east-1"# 2. Spawn the AI engine as a subprocessprocess=subprocess.Popen(
["claude"], # The CLI toolstdin=subprocess.PIPE, # We send instructions herestdout=subprocess.PIPE, # We read results here
)
# 3. Send a message (JSON over stdin)process.stdin.write(json.dumps({
"type": "user_message",
"content": "Fix the bug in auth.py"
}))
# 4. Read the streaming response (JSON over stdout)forlineinprocess.stdout:
event=json.loads(line)
# Handle: text, tool_calls, errors, completion
That's it. That's the entire interface between a harness and its engine. Everything else — the memory system, the job scheduler, the self-healing, the context engineering — is harness logic that wraps this simple pipe.
Why This Matters For You
If you're using these tools: Understanding the level helps you set expectations. Don't expect L3 tools (Claude Code terminal) to remember what you did yesterday — they can't. That's an L4 capability.
If you're building with AI: Focus on the harness, not the model. Your competitive advantage is in context engineering, memory architecture, and lifecycle management — not in which model you call.
If you're evaluating AI tools: Ask "what level is this?" A tool claiming to be "autonomous" but requiring you to restart it after every crash is L3 at best, regardless of marketing.
The Market Map (2026)
Product
Level
Key Harness Capability
ChatGPT / Claude.ai
L1-2
Conversation memory (L2)
GitHub Copilot
L2
Code context awareness
Cursor
L2-3
Multi-file reasoning
Claude Code (terminal)
L3
Tool use + checkpoints
Kiro IDE
L3
SDD specs + multi-file
Windsurf
L3
Multi-step flows
Devin
L3-4
Background execution
SwarmAI
L4 (L5 partial)
Full harness: daemon + memory + evolution
The race isn't about who has the best engine. It's about who builds the best harness.
Published from SwarmAI — an L4 Agent Harness built by one person + AI, proving that harness engineering is the real multiplier.
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
Uh oh!
There was an error while loading. Please reload this page.
-
The Aha Moment
Here's something that surprised me: Claude Code, Cursor, Kiro, Devin — they all work the same way under the hood.
Every single one of these AI coding tools follows this pattern:
That "AI Engine" is just a command-line program. It has no memory. It doesn't know what happened yesterday. It doesn't know when to restart itself. It's a stateless worker that does exactly one thing: take an instruction, think about it, use tools, return a result.
Everything else — memory, self-healing, running in the background, learning from mistakes — comes from the wrapper. The industry calls this wrapper an Agent Harness.
The Self-Driving Car Analogy
This maps perfectly to how self-driving cars work:
You can swap the engine. A Tesla can run on different motor versions. An Agent Harness can swap from Claude to Kiro to Gemini. The engine is commodity. The system around it is the product.
Five Levels of Agent Autonomy
Just like self-driving cars have levels (L1 through L5), agent harnesses do too:
Level 1: Assist
What the harness does: Formats a prompt, makes one API call, shows you the result.
What you do: Everything else. You decide what to ask, when to ask it, what to do with the answer.
Example: ChatGPT web interface, API playground.
Car analogy: Cruise control. It holds speed, you do everything else.
Level 2: Copilot
What the harness does: Watches what you're doing, suggests completions, routes to the right tool.
What you do: Accept or reject suggestions. Direct every interaction.
Example: GitHub Copilot, Cursor Tab.
Car analogy: Lane-keeping assist. Nudges the wheel, but you're driving.
Level 3: Agent
What the harness does: Takes a goal, breaks it into steps, executes multiple tools, asks you at checkpoints.
What you do: Set the goal. Review at checkpoints. Approve the result.
Example: Claude Code in terminal, Kiro IDE, Windsurf.
Car analogy: Highway autopilot. Handles the highway, but you take over for exits and cities.
Level 4: Autonomous
What the harness does: Everything L3 does, PLUS: runs 24/7 without you present, remembers across sessions, self-heals when things break, handles multiple tasks concurrently.
What you do: Set intent. Check in occasionally. Handle the exceptions it escalates.
Example: SwarmAI (with daemon + jobs + memory + self-healing), Devin.
Car analogy: Full self-driving in mapped areas. You set the destination, the car handles the journey. But you might need to take over in new situations.
Level 5: Self-Evolving
What the harness does: Everything L4 does, PLUS: decomposes high-level goals into sub-tasks spanning days, learns from its own mistakes, expands its own capabilities.
What you do: Set direction. ("Make this product better by Friday.")
Example: No complete L5 exists yet (as of 2026). Some systems have L5 capabilities in narrow domains.
Car analogy: A car that redesigns its own navigation algorithm after getting lost once.
What Makes Each Level Different (The Capabilities Stack)
Each level is cumulative — L4 includes everything from L1-L3:
The Key Insight: The Engine Is Not The Moat
Here's what most people get wrong: they think the AI model is the product. It's not. The model is the engine. The harness is the product.
Why? Because:
Models are interchangeable. Today it's Claude. Tomorrow it might be Gemini or an open-source model. If your harness is well-designed, swapping takes a day.
The harness is where intelligence compounds. Memory, learned preferences, domain knowledge, self-correction patterns — all of this lives in the harness, not the model.
The model has no lifecycle. It doesn't know when to wake up, when to retry, when to escalate. The harness provides all lifecycle intelligence.
Everyone has access to the same models. Anyone can call Claude's API. What you can't easily replicate is 100 days of accumulated memory + 27 self-corrections + domain expertise across 7 projects.
Under The Hood: How It Actually Works
For the technically curious, here's the actual mechanism:
That's it. That's the entire interface between a harness and its engine. Everything else — the memory system, the job scheduler, the self-healing, the context engineering — is harness logic that wraps this simple pipe.
Why This Matters For You
If you're using these tools: Understanding the level helps you set expectations. Don't expect L3 tools (Claude Code terminal) to remember what you did yesterday — they can't. That's an L4 capability.
If you're building with AI: Focus on the harness, not the model. Your competitive advantage is in context engineering, memory architecture, and lifecycle management — not in which model you call.
If you're evaluating AI tools: Ask "what level is this?" A tool claiming to be "autonomous" but requiring you to restart it after every crash is L3 at best, regardless of marketing.
The Market Map (2026)
The race isn't about who has the best engine. It's about who builds the best harness.
Published from SwarmAI — an L4 Agent Harness built by one person + AI, proving that harness engineering is the real multiplier.
Beta Was this translation helpful? Give feedback.
All reactions