utopusc/selfclaude

selfclaude — AI Computer in a Box

One docker run, and you have a self-hosted AI computer. Inside the container: Chromium streamed to your browser, a Bytebot-style agent that can see, click, type and scroll, and a teach mode that records your demos as adaptive skills. Powered by your Claude Pro/Max subscription — no API key, no host install, no cloud agent watching your real screen. Bonus: your local Claude Code can also drive the same container over MCP.

A single Docker image that turns any Mac into a self-hosted AI personal computer. Open localhost:8080, add any website as a "WebApp", teach the agent skills by demonstration, and drive the same container from external Claude Code via MCP.

Status: v1.0 — milestone complete. Docker base + Xvfb, Chrome streaming via noVNC, Teach + Replay (SQLite-persisted), Anthropic chat agent with computer-use tools, and a Streamable HTTP MCP server on :8090. One image, one docker run, end-to-end demo.

Quickstart (5 steps)

# 1. Build (Mac amd64 via Rosetta — most reliable):
docker buildx build --platform linux/amd64 -t selfclaude .

# Or native arm64 / Linux:
docker build -t selfclaude .

# 2. Run with a persistent volume so taught skills survive restarts:
docker volume create selfclaude-data
docker run --rm -p 8080:8080 -p 8090:8090 \
  -v selfclaude-data:/data \
  --name selfclaude \
  selfclaude
3. Open http://localhost:8080 in your browser.
4. Paste your Anthropic API key (sk-ant-...) into the header field; click Save.
5. Click "+ Add WebApp", paste a URL (e.g. https://example.com), and watch the
   streamed Chrome window appear in the main pane.

curl http://localhost:8080/health returns {"status":"ok",...} once the container is up. The MCP entry point is http://localhost:8090/mcp, which accepts JSON-RPC payloads over POST; see Phase 5 below.
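
If you script the startup, a small readiness poll saves guessing when the container is up. The following is a minimal sketch, not part of the repository: the function name waitForHealthy and its options are hypothetical, and it relies only on the documented fact that /health returns a JSON body with status "ok".

```javascript
// Poll a health endpoint until it reports {"status":"ok"} or attempts run out.
// waitForHealthy and its options are illustrative names, not project APIs.
// fetchFn is injectable so the sketch can be exercised without a container.
async function waitForHealthy(url, { attempts = 30, delayMs = 1000, fetchFn = fetch } = {}) {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetchFn(url);
      if (res.ok) {
        const body = await res.json();
        if (body.status === "ok") return true;
      }
    } catch {
      // Connection refused while the container is still booting: retry.
    }
    if (i < attempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}
```

Usage on Node 18+ (which ships a global fetch): await waitForHealthy("http://localhost:8080/health") resolves to true once the server answers, or false after the attempts are exhausted.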

What's Inside

| Component | Why it's here |
| --- | --- |
| Node 20 (bookworm-slim base) | Backend runtime |
| Xvfb on display :99 | Headless X server; Chrome draws here |
| fluxbox | Tiny window manager; xdotool is unreliable without a real WM |
| chromium + chromium-browser symlink | Browser runtime (covers amd64 + arm64 from Debian repos) |
| x11vnc | Streams individual Chrome windows to noVNC |
| xdotool, wmctrl, xprop, xdpyinfo, maim | Window discovery + input dispatch + screenshots |
| sqlite3, dumb-init | Skill persistence (Phase 3) and proper PID 1 signal handling |
| @anthropic-ai/sdk | Phase 4 chat agent: direct Anthropic Messages API streaming |
| @modelcontextprotocol/sdk | Phase 5 MCP HTTP server on :8090/mcp |

Verifying the Floor

# 1. Container is up, health endpoint responds
curl -fsS http://localhost:8080/health | grep -q '"status":"ok"'

# 2. X stack is alive
docker exec selfclaude xdpyinfo -display :99 >/dev/null

# 3. Required tools are on PATH
docker exec selfclaude sh -c 'for t in chromium-browser x11vnc xdotool wmctrl xprop xdpyinfo maim sqlite3; do command -v $t || echo MISSING:$t; done'

# 4. Node server is running under dumb-init
docker exec selfclaude ps -A -o pid,ppid,cmd | head -20

Image Size Sanity

docker images selfclaude --format '{{.Size}}'   # expect ≤ 2 GB

Phase 3 — Teach + Replay

Once a WebApp is streaming, you can record a sequence of clicks and keystrokes, save it as a named skill, and replay it on demand.

  1. Start the container with the volume so skills persist (see Quickstart step 2).

  2. Add a WebApp from the UI (e.g. https://example.com) and wait for the stream to come online (status pill turns green).

  3. Click Teach in the toolbar. The button flips red and the canvas cursor becomes a crosshair. Perform the workflow you want to teach — clicks and keystrokes are captured into an action log; the streamed Chrome window receives them in real time so you see what you're recording.

  4. Click Stop. A modal opens showing how many events were captured. Name the skill (lowercase letters, digits, hyphens — e.g. compose-mail) and click Save skill. The skill appears in the sidebar SKILLS section under the active WebApp.

  5. Click the ▶ button on a skill row to replay it. The streamed Chrome window performs the recorded clicks and keystrokes against the same window, preserving the original timing (capped at 2 seconds between events).

  6. Click the delete button on a skill row to remove that skill. Skills are scoped per-WebApp.

Endpoints (for curl / MCP integration)

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/skills | Create a skill from a recorded action_log. Body: {webappId, name, actionLog}. Returns 201 with {id, webappId, name, createdAt, eventCount}. |
| GET | /api/skills?webappId=<id> | List skills scoped to a WebApp. |
| GET | /api/skills/:id | Get one skill including its full action_log. |
| DELETE | /api/skills/:id | Delete a skill. Returns 200 on success, 404 if missing. |
| POST | /api/skills/:id/replay | Replay a skill against its WebApp's wid. Returns 202 Accepted immediately; the dispatch loop runs server-side. Per-event errors are logged + skipped (best-effort). |
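
As an illustration of the create endpoint, here is a hedged sketch that builds the POST /api/skills request and checks the skill name against the rule from the Teach steps (lowercase letters, digits, hyphens). The helper name buildCreateSkillRequest and the regular expression are assumptions for this example; the server's actual validation may be stricter.

```javascript
// Skill names: lowercase letters, digits, hyphens (per the Teach steps).
// Assumption: at least one character; any length limit is not documented here.
const SKILL_NAME_RE = /^[a-z0-9-]+$/;

// Hypothetical helper: returns the URL and fetch() init for POST /api/skills.
function buildCreateSkillRequest(baseUrl, webappId, name, actionLog) {
  if (!SKILL_NAME_RE.test(name)) {
    throw new Error(`invalid skill name: ${JSON.stringify(name)}`);
  }
  return {
    url: `${baseUrl}/api/skills`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // Body shape taken from the endpoint table: {webappId, name, actionLog}.
      body: JSON.stringify({ webappId, name, actionLog }),
    },
  };
}
```

With a running container, const { url, init } = buildCreateSkillRequest("http://localhost:8080", id, "compose-mail", log) followed by await fetch(url, init) should come back with status 201 according to the table above.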

action_log shape

{
  "version": 1,
  "webappId": "728e617e",
  "startedAt": 1234567890000,
  "endedAt":   1234567920000,
  "events": [
    {"type": "click", "button": 1, "x": 320, "y": 180, "ts": 1234567891000},
    {"type": "key",   "key":  "Enter",                  "ts": 1234567892000},
    {"type": "type",  "text": "hello@example.com",      "ts": 1234567893000}
  ]
}

Event types:

  • click: button is 1 (left), 2 (middle), or 3 (right). x, y are in canvas pixel space (1280×720).
  • key: key is a DOM KeyboardEvent.key value (e.g. Enter, Backspace, ArrowUp, F5); the server translates it to an xdotool keysym via mapDomKeyToXdotool.
  • type: text is literal text (1–4096 chars), used for printable-character sequences. During recording, consecutive single-character keystrokes are coalesced into one type event when no modifier is held.
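
The coalescing rule for type events can be sketched as follows. This is an illustration under assumptions, not the recorder's code: the function name coalesceKeystrokes and the modifierHeld flag on raw key events are hypothetical, and the merged event is assumed to keep the timestamp of its first keystroke.

```javascript
const MAX_TYPE_CHARS = 4096; // upper bound on type.text from the event list above

// Merge runs of single-character key events (no modifier held) into type events.
// modifierHeld is a hypothetical field; the real recorder may track this differently.
function coalesceKeystrokes(events) {
  const out = [];
  for (const ev of events) {
    const printable = ev.type === "key" && ev.key.length === 1 && !ev.modifierHeld;
    const last = out[out.length - 1];
    if (printable && last && last.type === "type" && last.text.length < MAX_TYPE_CHARS) {
      last.text += ev.key; // extend the current run
    } else if (printable) {
      out.push({ type: "type", text: ev.key, ts: ev.ts }); // start a new run
    } else {
      out.push({ ...ev }); // clicks, named keys, and modified keys pass through
    }
  }
  return out;
}
```

Pressing h, i, then Enter would therefore yield one type event with text "hi" followed by one key event for Enter.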

Replay sleeps min(ev.ts - prevTs, 2000) ms between events — never faster than recorded, never longer than 2 seconds between events.
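
The timing rule is easy to check by hand. The sketch below computes the sleep before each event; the name replayDelays is hypothetical, and two details are assumptions: the first event is measured against startedAt, and out-of-order timestamps are clamped to zero rather than producing a negative sleep.

```javascript
const MAX_GAP_MS = 2000; // cap from the replay rule above

// Return the sleep (in ms) that precedes each event during replay.
// Assumption: the first event's gap is measured from startedAt.
function replayDelays(events, startedAt) {
  let prevTs = startedAt;
  return events.map((ev) => {
    const gap = Math.min(Math.max(ev.ts - prevTs, 0), MAX_GAP_MS);
    prevTs = ev.ts;
    return gap;
  });
}
```

For the sample action_log above (events one second apart, starting one second after startedAt) this gives 1000 ms before each event; a 30-second pause in the recording would be replayed as 2000 ms.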

Replay dispatch (xdotool chain)

Each event is dispatched against the bound wid using the chained xdotool pattern (see src/input-dispatcher.js):

# click event
xdotool windowactivate --sync <wid> windowfocus --sync <wid> \
  mousemove --window <wid> --sync <x> <y> click --clearmodifiers <button>

# key event
xdotool windowactivate --sync <wid> windowfocus --sync <wid> \
  key --clearmodifiers <keysym>

# type event
xdotool windowactivate --sync <wid> windowfocus --sync <wid> \
  type --clearmodifiers --delay 0 <text>

Why the chain (and not xdotool click --window <wid> directly): with --window on the verb, xdotool delivers the event through XSendEvent, and the X server marks such events with send_event=True. Chrome (like many GTK apps) checks that flag and drops the input. The chain forces real X11 focus first (windowactivate + windowfocus, both --sync), then dispatches the input to the focused window without --window on the verb, which is what real user input looks like to the application.
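
To make the chain concrete, here is a sketch that turns one action_log event into an xdotool argument vector following the three patterns above. It is not the contents of src/input-dispatcher.js: the function name xdotoolArgs is hypothetical, and the small key table is an illustrative stand-in for mapDomKeyToXdotool covering only a few common keys.

```javascript
// Illustrative subset of a DOM KeyboardEvent.key -> X keysym table.
// The project's real mapDomKeyToXdotool may cover more keys or differ in detail.
const DOM_TO_KEYSYM = new Map([
  ["Enter", "Return"], ["Backspace", "BackSpace"], ["Escape", "Escape"], ["Tab", "Tab"],
  ["ArrowUp", "Up"], ["ArrowDown", "Down"], ["ArrowLeft", "Left"], ["ArrowRight", "Right"],
  [" ", "space"],
]);

// Build the argv for one event: focus the window first, then send unaddressed input.
function xdotoolArgs(wid, ev) {
  const w = String(wid);
  const focus = ["windowactivate", "--sync", w, "windowfocus", "--sync", w];
  switch (ev.type) {
    case "click":
      return [...focus, "mousemove", "--window", w, "--sync", String(ev.x), String(ev.y),
              "click", "--clearmodifiers", String(ev.button)];
    case "key":
      // Keys missing from the table (e.g. F5) are passed through unchanged.
      return [...focus, "key", "--clearmodifiers", DOM_TO_KEYSYM.get(ev.key) ?? ev.key];
    case "type":
      return [...focus, "type", "--clearmodifiers", "--delay", "0", ev.text];
    default:
      throw new Error(`unknown event type: ${ev.type}`);
  }
}
```

Passing the result to execFile("xdotool", args) from node:child_process, rather than building a shell string, keeps recorded text from being interpreted by a shell.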

Phase 4 — Chat Agent

The header has a chat panel wired to Anthropic's Messages API streaming endpoint. Once you've pasted your API key in the header, you can ask the agent to do things in plain English — it has 4 tools at its disposal:

| Tool | What it does |
| --- | --- |
| list_webapps | List the active streamed WebApps (id, url, state). |
| list_skills | List saved skills (optionally scoped to a webappId). |
| replay_skill | Replay a saved skill; same path as the UI ▶ button. |
| screenshot | Capture a PNG of a WebApp window and feed it back to the model. |

How it talks to the backend:

| Endpoint | Purpose |
| --- | --- |
| POST /api/key | Set the in-memory Anthropic key (never persisted to disk). |
| GET /api/key | Reports whether a key is currently set, via a hasKey field. |
| WS /ws/agent | Bidirectional chat stream: the client sends {type:"user_message", text}; the server emits text_delta, tool_use, tool_result, agent_thinking, message_end, and error events. |
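
A client of /ws/agent mostly folds the server's event stream into UI state. The reducer below is a sketch of that idea only. The event type names come from the table above, but the payload fields used here (text on text_delta, name on tool_use, message on error) are assumptions, as is the state shape.

```javascript
// Hypothetical UI state for one agent reply.
const initialAgentState = { text: "", activeTool: null, done: false, error: null };

// Fold one server event into the state. Payload field names are assumed, not documented.
function applyAgentEvent(state, ev) {
  switch (ev.type) {
    case "text_delta":
      return { ...state, text: state.text + (ev.text ?? "") };
    case "tool_use":
      return { ...state, activeTool: ev.name ?? "unknown" }; // drives the tool-use indicator
    case "tool_result":
      return { ...state, activeTool: null };
    case "message_end":
      return { ...state, activeTool: null, done: true };
    case "error":
      return { ...state, done: true, error: ev.message ?? "agent error" };
    default:
      return state; // agent_thinking and unknown events leave the state untouched
  }
}
```

In a browser this would sit behind something like ws.onmessage = (m) => { state = applyAgentEvent(state, JSON.parse(m.data)); }, assuming the server sends each event as a JSON text frame.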

When the agent calls replay_skill, the chat panel shows a live tool-use status indicator and the streamed Chrome window in the main pane visibly performs the recorded actions — judges can read the state off the screen at a glance.

Phase 5 — MCP Integration

The container also exposes itself as a Streamable HTTP MCP server on a separate port (8090), so external Claude Code or Claude Desktop on the host Mac can drive the same WebApps and skills as if they were the chat agent's tools.

# On the host (Mac), with Claude Code installed:
claude mcp add selfclaude --transport http http://localhost:8090/mcp

That's it — the server registers under selfclaude and Claude Code can now invoke any of the 5 MCP tools:

| MCP Tool | Same code path as |
| --- | --- |
| add_webapp(url) | UI "+ Add WebApp" button → POST /api/webapps |
| list_webapps() | UI sidebar list → GET /api/webapps |
| list_skills(webappId?) | UI Skills section → GET /api/skills?webappId=... |
| replay_skill(skillId) | UI ▶ button + chat agent → POST /api/skills/:id/replay |
| screenshot(webappId) | Chat agent's screenshot tool → maim -i 0x<wid> (returns base64 PNG) |

Single source of truth: src/mcp-server.js imports the same backend modules as src/agent-chat.js and the HTTP routes in src/server.js. No duplicated logic, no parallel state.

For Claude Desktop, drop this into ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "selfclaude": {
      "transport": "http",
      "url": "http://localhost:8090/mcp"
    }
  }
}

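Security note: the MCP server has no auth and is meant for localhost-only hackathon/demo use. Note that -p 8090:8090 publishes the port on all host interfaces by default; use -p 127.0.0.1:8090:8090 to bind it to loopback only, and don't expose port 8090 to the internet.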

Smoke testing the MCP endpoint manually

# initialize
curl -sS -X POST http://localhost:8090/mcp \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json, text/event-stream' \
  --data '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke","version":"0"}}}'

# list tools
curl -sS -X POST http://localhost:8090/mcp \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json, text/event-stream' \
  --data '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}'

# add a WebApp
curl -sS -X POST http://localhost:8090/mcp \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json, text/event-stream' \
  --data '{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"add_webapp","arguments":{"url":"https://example.com"}}}'
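
The same three requests can be assembled from Node. This sketch only builds the URL, headers, and JSON-RPC body that the curl commands above send; the helper names mcpRequest and toolCall are hypothetical, and response handling is left out because the server may answer with either plain JSON or an event stream.

```javascript
const MCP_URL = "http://localhost:8090/mcp";

// Build a fetch()-ready JSON-RPC 2.0 request for the MCP endpoint.
function mcpRequest(id, method, params = {}) {
  return {
    url: MCP_URL,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Accept: "application/json, text/event-stream",
      },
      body: JSON.stringify({ jsonrpc: "2.0", id, method, params }),
    },
  };
}

// Convenience wrapper for tools/call, e.g. toolCall(3, "add_webapp", { url }).
function toolCall(id, name, args = {}) {
  return mcpRequest(id, "tools/call", { name, arguments: args });
}
```

For example, const { url, init } = toolCall(3, "add_webapp", { url: "https://example.com" }) followed by await fetch(url, init) mirrors the last curl command. If the server assigns a session during initialize (an Mcp-Session-Id response header in the Streamable HTTP transport), later requests need to send that header back; whether this server does so is not stated here.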

Demo Script (3 minutes)

[0:00 – 0:20]  "Stanford, I brought my own AI computer. One docker command."
               $ docker run -p 8080:8080 -p 8090:8090 selfclaude
               (Open localhost:8080, paste Anthropic key, hit Save)

[0:20 – 0:50]  "Let's add a WebApp — Gmail." Click "+ Add WebApp", paste URL.
               A streamed Chrome window appears in 5 seconds.
               "That Chrome is running INSIDE the container. I'm just watching
                via noVNC. Mouse and keyboard work end-to-end."

[0:50 – 1:30]  "Now let's teach it something. Click Teach."
               (Filter unread, open the first email — ~30 seconds of clicks.)
               Click Stop → name it "check-inbox" → Save.

[1:30 – 2:10]  "Run the skill from the chat panel."
               (Type "run check-inbox" — agent calls replay_skill, the
                streamed Chrome visibly performs the recorded actions.)

[2:10 – 2:50]  "But the killer demo: I can drive this container from MY Claude
                Code on the host, over MCP."
               (In a host terminal:)
               $ claude mcp add selfclaude --transport http http://localhost:8090/mcp
               $ claude → "use selfclaude to replay check-inbox"
               (External Claude Code calls replay_skill via MCP →
                container's Chrome window plays the workflow live.)

[2:50 – 3:00]  "Self-hosted. My subscription. My computer. I teach it skills
                and drive it from anywhere — UI, chat, or Claude Code over MCP.
                AI personal computer in a box."

Cutdown Ladder (DEMO-03)

If something fails 5 minutes before going on stage, kill features in this order — the floor that survives is still demoable:

| Drop | When | Fallback |
| --- | --- | --- |
| 1. MCP HTTP server (:8090) | If port 8090 / SDK / Claude Code MCP transport breaks | Skip the "external Claude Code" segment of the demo. UI demo (add WebApp + teach + replay + chat agent) is still complete on its own. |
| 2. Chat agent (/ws/agent, Anthropic SDK) | If Anthropic API key / streaming / tool loop breaks | Drop the "agent runs the skill" beat. Click the manual ▶ Replay button next to the skill; it uses the same backend code path. |
| 3. Teach persistence (SQLite) | If /data/skills.db write fails | Run without -v selfclaude-data:/data; skills go to the container layer, lost on rm but live during the demo window. |
| ⛔ DO NOT drop | These keep the floor | Add WebApp → noVNC stream → manual replay button. Without these the box is just an empty Docker image. |

Roadmap

  • Phase 1: Docker base + Node server skeleton — DONE
  • Phase 2: Add WebApp via URL → Chrome spawn → noVNC live stream — DONE
  • Phase 3: Teach + Replay (record canvas events, replay via xdotool) — DONE
  • Phase 4: Anthropic chat agent + 4-tool toolbox — DONE
  • Phase 5: Streamable HTTP MCP server (http://localhost:8090/mcp) + demo polish — DONE
