One `docker run`, and you have a self-hosted AI computer. Inside the container: Chromium streamed to your browser, a Bytebot-style agent that can see, click, type and scroll, and a teach mode that records your demos as adaptive skills. Powered by your Claude Pro/Max subscription: no API key, no host install, no cloud agent watching your real screen. Bonus: your local Claude Code can also drive the same container over MCP.

A single Docker image that turns any Mac into a self-hosted AI personal computer.
Open localhost:8080, add any website as a "WebApp", teach the agent skills by
demonstration, and drive the same container from external Claude Code via MCP.

Status: v1.0, milestone complete. Docker base + Xvfb, Chrome streaming via noVNC, Teach + Replay (SQLite-persisted), Anthropic chat agent with computer-use tools, and a Streamable HTTP MCP server on `:8090`. One image, one `docker run`, end-to-end demo.

```bash
# 1. Build (Mac, amd64 via Rosetta; most reliable):
docker buildx build --platform linux/amd64 -t selfclaude .
# Or native arm64 / Linux:
docker build -t selfclaude .

# 2. Run with a persistent volume so taught skills survive restarts:
docker volume create selfclaude-data
docker run --rm -p 8080:8080 -p 8090:8090 \
  -v selfclaude-data:/data \
  --name selfclaude \
  selfclaude
```

3. Open http://localhost:8080 in your browser.
4. Paste your Anthropic API key (sk-ant-...) into the header field; click Save.
5. Click "+ Add WebApp", paste a URL (e.g. https://example.com), and watch the
streamed Chrome window appear in the main pane.

`curl http://localhost:8080/health` returns `{"status":"ok",...}` once the container is up. The MCP entry point is `http://localhost:8090/mcp`, which expects a POSTed JSON-RPC payload (a plain `curl` GET is not enough); see Phase 5 below.
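
A startup script usually has to wait for the health endpoint before opening the UI. The following is a minimal sketch of that wait loop, assuming Node 18+ (global `fetch`). The helper names `isHealthy` and `waitForHealth`, the retry count, and the delay are illustrative and not part of the project.

```javascript
// Sketch: poll GET /health until it reports {"status":"ok",...}.
// Helper names, retry count and delay are assumptions for illustration.

// True only when the body is JSON with status === "ok".
function isHealthy(bodyText) {
  try {
    return JSON.parse(bodyText).status === "ok";
  } catch {
    return false; // not JSON, or JSON without a readable status field
  }
}

// Resolves to true once the endpoint is healthy, false after all retries.
async function waitForHealth(url, { retries = 30, delayMs = 1000, fetchFn = fetch } = {}) {
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      const res = await fetchFn(url);
      if (res.ok && isHealthy(await res.text())) return true;
    } catch {
      // Connection refused while the container is still starting; retry.
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false;
}
```

A caller might run `waitForHealth("http://localhost:8080/health")` right after `docker run` and only open the browser once it resolves to `true`.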

| Component | Why it's here |
|---|---|
| Node 20 (bookworm-slim base) | Backend runtime |
| Xvfb on display `:99` | Headless X server; Chrome draws here |
| fluxbox | Tiny window manager; xdotool is unreliable without a real WM |
| chromium + `chromium-browser` symlink | Browser runtime (covers amd64 + arm64 from Debian repos) |
| x11vnc | Streams individual Chrome windows to noVNC |
| xdotool, wmctrl, xprop, xdpyinfo, maim | Window discovery + input dispatch + screenshots |
| sqlite3, dumb-init | Skill persistence (Phase 3) and proper PID 1 signal handling |
| `@anthropic-ai/sdk` | Phase 4 chat agent: direct Anthropic Messages API streaming |
| `@modelcontextprotocol/sdk` | Phase 5 MCP HTTP server on `:8090/mcp` |

```bash
# 1. Container is up, health endpoint responds
curl -fsS http://localhost:8080/health | grep -q '"status":"ok"'
# 2. X stack is alive
docker exec selfclaude xdpyinfo -display :99 >/dev/null
# 3. Required tools are on PATH
docker exec selfclaude sh -c 'for t in chromium-browser x11vnc xdotool wmctrl xprop xdpyinfo maim sqlite3; do command -v "$t" || echo "MISSING:$t"; done'
# 4. Node server is running under dumb-init
docker exec selfclaude ps -A -o pid,ppid,cmd | head -20
```

```bash
docker images selfclaude --format '{{.Size}}'   # expect ≤ 2 GB
```

Once a WebApp is streaming, you can record a sequence of clicks and keystrokes, save it as a named skill, and replay it on demand.

- Start the container with the volume so skills persist (see Quickstart step 2).
- Add a WebApp from the UI (e.g. `https://example.com`) and wait for the stream to come online (status pill turns green).
- Click Teach in the toolbar. The button flips red and the canvas cursor becomes a crosshair. Perform the workflow you want to teach. Clicks and keystrokes are captured into an action log, and the streamed Chrome window receives them in real time so you see what you're recording.
- Click Stop. A modal opens showing how many events were captured. Name the skill (lowercase letters, digits, hyphens, e.g. `compose-mail`) and click Save skill. The skill appears in the sidebar SKILLS section under the active WebApp.
- Click the ▶ button on a skill row to replay it. The streamed Chrome window performs the recorded clicks and keystrokes against the same window, preserving the original timing (capped at 2 seconds between events).
- Click the ✕ button to delete a skill. Skills are scoped per-WebApp.

| Method | Path | Description |
|---|---|---|
| `POST` | `/api/skills` | Create a skill from a recorded action_log. Body: `{webappId, name, actionLog}`. Returns 201 with `{id, webappId, name, createdAt, eventCount}`. |
| `GET` | `/api/skills?webappId=<id>` | List skills scoped to a WebApp. |
| `GET` | `/api/skills/:id` | Get one skill including its full action_log. |
| `DELETE` | `/api/skills/:id` | Delete a skill. Returns 200 on success, 404 if missing. |
| `POST` | `/api/skills/:id/replay` | Replay a skill against its WebApp's wid. Returns 202 Accepted immediately; the dispatch loop runs server-side. Per-event errors are logged + skipped (best-effort). |

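
As a rough illustration of the create endpoint, the sketch below builds the request a client could send to `POST /api/skills`. The helper name `buildCreateSkillRequest` and the `SKILL_NAME_RE` pattern are assumptions: the pattern mirrors the naming rule shown in the UI (lowercase letters, digits, hyphens), and the server may validate differently.

```javascript
// Hypothetical client helper for POST /api/skills; not part of the project.
// Assumed naming rule from the UI: lowercase letters, digits, hyphens.
const SKILL_NAME_RE = /^[a-z0-9-]+$/;

// Returns { url, init } suitable for fetch(url, init).
function buildCreateSkillRequest(baseUrl, webappId, name, actionLog) {
  if (!SKILL_NAME_RE.test(name)) {
    throw new Error(`invalid skill name: ${name}`);
  }
  return {
    url: `${baseUrl}/api/skills`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // Body shape documented in the table above: {webappId, name, actionLog}
      body: JSON.stringify({ webappId, name, actionLog }),
    },
  };
}
```

With Node 18+ this could be used as `const { url, init } = buildCreateSkillRequest("http://localhost:8080", id, "compose-mail", log); const res = await fetch(url, init);`, after which a 201 status signals that the skill was stored.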
```json
{
  "version": 1,
  "webappId": "728e617e",
  "startedAt": 1234567890000,
  "endedAt": 1234567920000,
  "events": [
    {"type": "click", "button": 1, "x": 320, "y": 180, "ts": 1234567891000},
    {"type": "key", "key": "Return", "ts": 1234567892000},
    {"type": "type", "text": "hello@example.com", "ts": 1234567893000}
  ]
}
```

Event types:

- `click`: `button` is `1` (left), `2` (middle), or `3` (right). `x`, `y` are in canvas pixel space (1280×720).
- `key`: `key` is a DOM `KeyboardEvent.key` value (e.g. `Enter`, `Backspace`, `ArrowUp`, `F5`); the server translates to xdotool keysyms via `mapDomKeyToXdotool`.
- `type`: `text` is literal text (1–4096 chars). Used for printable-char sequences. Coalesced from consecutive single-char keystrokes during recording when no modifier is held.

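
The coalescing rule for `type` events can be sketched as follows. This is not the project's recorder: the raw keystroke shape (`key`, `ts`, `ctrl`, `alt`, `meta`), the choice to treat only Ctrl/Alt/Meta as modifiers, and the choice to stamp a run with its first keystroke's `ts` are all assumptions made for illustration.

```javascript
// Sketch of keystroke coalescing; input shape and helper name are hypothetical.
const MAX_TEXT = 4096; // documented upper bound for a "type" event's text

// rawKeys: [{ key, ts, ctrl?, alt?, meta? }] in recording order (keys only).
function coalesceKeystrokes(rawKeys) {
  const events = [];
  let run = null; // the "type" event currently being extended, if any
  for (const k of rawKeys) {
    const printable = k.key.length === 1 && !k.ctrl && !k.alt && !k.meta;
    if (!printable) {
      run = null; // a named key or a modified key ends the current run
      events.push({ type: "key", key: k.key, ts: k.ts });
    } else if (run && run.text.length < MAX_TEXT) {
      run.text += k.key; // extend the current run in place
    } else {
      run = { type: "type", text: k.key, ts: k.ts };
      events.push(run);
    }
  }
  return events;
}
```

In a full recorder a click would presumably end the current run as well; that case is left out here to keep the sketch short.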

Replay sleeps `min(ev.ts - prevTs, 2000)` ms between events: recorded gaps are reproduced as they were, except that any gap longer than 2 seconds is shortened to 2 seconds.
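
The timing rule can be written out as a small function. This is a sketch under two assumptions that the text does not state: the first event is dispatched without delay, and a negative gap (out-of-order timestamps) is treated as zero. The name `replayDelays` is illustrative.

```javascript
// Sketch of the replay timing rule: sleep min(ev.ts - prevTs, 2000) ms.
const MAX_GAP_MS = 2000;

// Returns the number of milliseconds to sleep before each event.
function replayDelays(events, maxGapMs = MAX_GAP_MS) {
  const delays = [];
  let prevTs = null;
  for (const ev of events) {
    // Assumption: no wait before the first event.
    const gap = prevTs === null ? 0 : ev.ts - prevTs;
    // Assumption: clamp negative gaps to 0, then apply the documented cap.
    delays.push(Math.min(Math.max(gap, 0), maxGapMs));
    prevTs = ev.ts;
  }
  return delays;
}
```

For the three sample events in the action log above, spaced one second apart, this yields delays of 0, 1000 and 1000 ms.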
Each event is dispatched against the bound wid using the chained
xdotool pattern (see src/input-dispatcher.js):
```bash
# click event
xdotool windowactivate --sync <wid> windowfocus --sync <wid> \
  mousemove --window <wid> --sync <x> <y> click --clearmodifiers <button>
# key event
xdotool windowactivate --sync <wid> windowfocus --sync <wid> \
  key --clearmodifiers <keysym>
# type event
xdotool windowactivate --sync <wid> windowfocus --sync <wid> \
  type --clearmodifiers --delay 0 <text>
```

Why the chain (and not `xdotool click --window <wid>` directly): Chrome (and many GTK apps) detect synthetic XSendEvent input, which arrives flagged `send_event=True`, and drop it. The chain forces real X11 focus first (`windowactivate` + `windowfocus`, both `--sync`), then dispatches the input to the focused window without `--window` on the verb, which is what real user input looks like to the application.

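
To make the chained pattern concrete, here is a sketch that turns one action-log event into an xdotool argument list. It is not the code in `src/input-dispatcher.js`: `buildXdotoolArgs` and `domKeyToKeysym` are hypothetical names, and the small key table only stands in for the project's `mapDomKeyToXdotool`, whose actual contents are not shown in this document.

```javascript
// Illustrative DOM KeyboardEvent.key -> X keysym table (assumed, incomplete).
const KEYSYMS = new Map([
  ["Enter", "Return"],
  ["Backspace", "BackSpace"],
  ["ArrowUp", "Up"],
  ["ArrowDown", "Down"],
  ["ArrowLeft", "Left"],
  ["ArrowRight", "Right"],
  [" ", "space"],
]);

// Keys without an entry (F5, Tab, Escape, letters) pass through unchanged.
function domKeyToKeysym(key) {
  return KEYSYMS.get(key) ?? key;
}

// Builds the argv for one event, following the chained pattern shown above.
function buildXdotoolArgs(wid, ev) {
  const w = String(wid);
  const focus = ["windowactivate", "--sync", w, "windowfocus", "--sync", w];
  switch (ev.type) {
    case "click":
      return [...focus, "mousemove", "--window", w, "--sync",
        String(ev.x), String(ev.y), "click", "--clearmodifiers", String(ev.button)];
    case "key":
      return [...focus, "key", "--clearmodifiers", domKeyToKeysym(ev.key)];
    case "type":
      return [...focus, "type", "--clearmodifiers", "--delay", "0", ev.text];
    default:
      throw new Error(`unknown event type: ${ev.type}`);
  }
}
```

Passing the result to `execFile("xdotool", args)` from `node:child_process` avoids a shell, so recorded text is handed to xdotool as a single argument rather than being re-parsed.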
The header has a chat panel wired to Anthropic's Messages API streaming endpoint. Once you've pasted your API key in the header, you can ask the agent to do things in plain English — it has 4 tools at its disposal:

| Tool | What it does |
|---|---|
| `list_webapps` | List the active streamed WebApps (id, url, state). |
| `list_skills` | List saved skills (optionally scoped to a `webappId`). |
| `replay_skill` | Replay a saved skill; same path as the UI ▶ button. |
| `screenshot` | Capture a PNG of a WebApp window and feed it back to the model. |

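
For orientation, the four tools could be declared to the Messages API roughly as below. The `name` / `description` / `input_schema` shape is the tool format the Anthropic Messages API accepts; the parameter names `webappId` and `skillId` are taken from the MCP tool signatures later in this document and are only assumed to match what the chat agent uses. The constant name `AGENT_TOOLS` is illustrative.

```javascript
// Sketch of tool declarations for the Anthropic Messages API `tools` field.
// Parameter names are assumptions borrowed from the MCP tool signatures.
const AGENT_TOOLS = [
  {
    name: "list_webapps",
    description: "List the active streamed WebApps (id, url, state).",
    input_schema: { type: "object", properties: {} },
  },
  {
    name: "list_skills",
    description: "List saved skills, optionally scoped to a webappId.",
    input_schema: {
      type: "object",
      properties: { webappId: { type: "string" } },
    },
  },
  {
    name: "replay_skill",
    description: "Replay a saved skill.",
    input_schema: {
      type: "object",
      properties: { skillId: { type: "string" } },
      required: ["skillId"],
    },
  },
  {
    name: "screenshot",
    description: "Capture a PNG of a WebApp window.",
    input_schema: {
      type: "object",
      properties: { webappId: { type: "string" } },
      required: ["webappId"],
    },
  },
];
```

An array like this would be passed as the `tools` option of a Messages API request, and the model's `tool_use` blocks would then name one of these four tools.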
How it talks to the backend:

| Endpoint | Purpose |
|---|---|
| `POST /api/key` | Set the in-memory Anthropic key (never persisted to disk). |
| `GET /api/key` | Returns `{hasKey: true…` |
| `WS /ws/agent` | Bidirectional chat stream: `{type:"user_message", text}` from the client; the server emits `text_delta`, `tool_use`, `tool_result`, `agent_thinking`, `message_end`, `error` events. |

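
A client of `/ws/agent` has to fold the server's event stream into UI state. The reducer below is a sketch of one way to do that. Only the event type names come from this document; the payload fields (`text` on `text_delta`, `name` on `tool_use`, `message` on `error`) and the state shape are assumptions.

```javascript
// Sketch of a client-side reducer for /ws/agent events.
// Payload field names and the state shape are assumed, not documented.
function initialChatState() {
  return { text: "", activeTool: null, thinking: false, done: false, error: null };
}

function reduceAgentEvent(state, ev) {
  switch (ev.type) {
    case "agent_thinking":
      return { ...state, thinking: true };
    case "text_delta":
      return { ...state, thinking: false, text: state.text + (ev.text ?? "") };
    case "tool_use":
      return { ...state, activeTool: ev.name ?? "unknown" }; // show tool status
    case "tool_result":
      return { ...state, activeTool: null };
    case "message_end":
      return { ...state, thinking: false, done: true };
    case "error":
      return { ...state, done: true, error: ev.message ?? "unknown error" };
    default:
      return state; // ignore event types this sketch does not know
  }
}
```

In a browser this would typically be driven from `ws.onmessage`, parsing each frame with `JSON.parse` and feeding it to `reduceAgentEvent`, after sending `{type:"user_message", text}` to start a turn.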
When the agent calls replay_skill, the chat panel shows a live tool-use status
indicator and the streamed Chrome window in the main pane visibly performs the
recorded actions — judges can read the state off the screen at a glance.
The container also exposes itself as a Streamable HTTP MCP server on a separate port (8090), so external Claude Code or Claude Desktop on the host Mac can drive the same WebApps and skills as if they were the chat agent's tools.

```bash
# On the host (Mac), with Claude Code installed:
claude mcp add selfclaude --transport http http://localhost:8090/mcp
```

That's it: the server registers under `selfclaude`, and Claude Code can now invoke any of the 5 MCP tools:

| MCP Tool | Same code path as |
|---|---|
| `add_webapp(url)` | UI "+ Add WebApp" button → `POST /api/webapps` |
| `list_webapps()` | UI sidebar list → `GET /api/webapps` |
| `list_skills(webappId?)` | UI Skills section → `GET /api/skills?webappId=...` |
| `replay_skill(skillId)` | UI ▶ button + chat agent → `POST /api/skills/:id/replay` |
| `screenshot(webappId)` | Chat agent's screenshot tool → `maim -i 0x<wid>` (returns base64 PNG) |

Single source of truth: src/mcp-server.js imports the same backend modules as
src/agent-chat.js and the HTTP routes in src/server.js. No duplicated logic,
no parallel state.
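
For Claude Desktop, drop this into `~/Library/Application Support/Claude/claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "selfclaude": {
      "transport": "http",
      "url": "http://localhost:8090/mcp"
    }
  }
}
```

Security note: the MCP server has no authentication and is meant for localhost use only, which is fine for a hackathon or demo. Don't expose port 8090 to the internet. Because `-p 8090:8090` publishes the port on all host interfaces by default, use `-p 127.0.0.1:8090:8090` if the host is on an untrusted network.
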
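
Outside of Claude Code, a script can talk to the MCP endpoint with plain JSON-RPC over HTTP. The sketch below builds a `tools/call` request and parses the reply. A Streamable HTTP server may answer with either `application/json` or a `text/event-stream` body, so the parser handles both. The helper names `buildToolCall` and `parseMcpResponse` are hypothetical, and which of the two response forms this container uses is not stated here.

```javascript
// Sketch of a minimal JSON-RPC helper pair for the MCP endpoint.
// Helper names are illustrative; they are not part of any SDK.

// Builds a tools/call request object for POSTing to /mcp.
function buildToolCall(id, name, args = {}) {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Returns the JSON-RPC messages contained in a response body.
function parseMcpResponse(contentType, bodyText) {
  if (contentType.includes("text/event-stream")) {
    const messages = [];
    // SSE events are separated by blank lines; payload sits in "data:" lines.
    for (const block of bodyText.split(/\r?\n\r?\n/)) {
      const data = block
        .split(/\r?\n/)
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).replace(/^ /, ""))
        .join("\n");
      if (data) messages.push(JSON.parse(data));
    }
    return messages;
  }
  return [JSON.parse(bodyText)]; // plain application/json reply
}
```

The request would be sent with `Content-Type: application/json` and `Accept: application/json, text/event-stream`, the same headers the curl smoke tests in this section use.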
```bash
# initialize
curl -sS -X POST http://localhost:8090/mcp \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json, text/event-stream' \
  --data '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke","version":"0"}}}'
# list tools
curl -sS -X POST http://localhost:8090/mcp \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json, text/event-stream' \
  --data '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}'
# add a WebApp
curl -sS -X POST http://localhost:8090/mcp \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json, text/event-stream' \
  --data '{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"add_webapp","arguments":{"url":"https://example.com"}}}'
```

```text
[0:00 – 0:20]  "Stanford, I brought my own AI computer. One docker command."
               $ docker run -p 8080:8080 -p 8090:8090 selfclaude
               (Open localhost:8080, paste Anthropic key, hit Save)
[0:20 – 0:50]  "Let's add a WebApp: Gmail." Click "+ Add WebApp", paste URL.
               A streamed Chrome window appears in 5 seconds.
               "That Chrome is running INSIDE the container. I'm just watching
               via noVNC. Mouse and keyboard work end-to-end."
[0:50 – 1:30]  "Now let's teach it something. Click Teach."
               (Filter unread, open the first email; ~30 seconds of clicks.)
               Click Stop → name it "check-inbox" → Save.
[1:30 – 2:10]  "Run the skill from the chat panel."
               (Type "run check-inbox"; the agent calls replay_skill and the
               streamed Chrome visibly performs the recorded actions.)
[2:10 – 2:50]  "But the killer demo: I can drive this container from MY Claude
               Code on the host, over MCP."
               (In a host terminal:)
               $ claude mcp add selfclaude --transport http http://localhost:8090/mcp
               $ claude → "use selfclaude to replay check-inbox"
               (External Claude Code calls replay_skill via MCP →
               container's Chrome window plays the workflow live.)
[2:50 – 3:00]  "Self-hosted. My subscription. My computer. I teach it skills
               and drive it from anywhere: UI, chat, or Claude Code over MCP.
               AI personal computer in a box."
```

If something fails 5 minutes before going on stage, kill features in this order — the floor that survives is still demoable:

| Drop | When | Fallback |
|---|---|---|
| 1. MCP HTTP server (`:8090`) | If port 8090 / SDK / Claude Code MCP transport breaks | Skip the "external Claude Code" segment of the demo. UI demo (add WebApp + teach + replay + chat agent) is still complete on its own. |
| 2. Chat agent (`/ws/agent`, Anthropic SDK) | If Anthropic API key / streaming / tool loop breaks | Drop the "agent runs the skill" beat. Click the manual ▶ Replay button next to the skill; it uses the same backend code path. |
| 3. Teach persistence (SQLite) | If `/data/skills.db` write fails | Run without `-v selfclaude-data:/data`: skills go to the container layer, lost on `rm` but live during the demo window. |
| ⛔ DO NOT drop | These keep the floor | Add WebApp → noVNC stream → manual replay button. Without these the box is just an empty Docker image. |

- Phase 1: Docker base + Node server skeleton (DONE)
- Phase 2: Add WebApp via URL → Chrome spawn → noVNC live stream (DONE)
- Phase 3: Teach + Replay (record canvas events, replay via xdotool) (DONE)
- Phase 4: Anthropic chat agent + 4-tool toolbox (DONE)
- Phase 5: Streamable HTTP MCP server (`http://localhost:8090/mcp`) + demo polish (DONE)