selfclaude — AI Computer in a Box

One docker run, and you have a self-hosted AI computer. Inside the container: Chromium streamed to your browser, a Bytebot-style agent that can see, click, type and scroll, and a teach mode that records your demos as adaptive skills. Powered by your Claude Pro/Max subscription — no API key, no host install, no cloud agent watching your real screen. Bonus: your local Claude Code can also drive the same container over MCP.

A single Docker image that turns any Mac into a self-hosted AI personal computer. Open localhost:8080, add any website as a "WebApp", teach the agent skills by demonstration, and drive the same container from external Claude Code via MCP.

Status: v1.0 — milestone complete. Docker base + Xvfb, Chrome streaming via noVNC, Teach + Replay (SQLite-persisted), Anthropic chat agent with computer-use tools, and a Streamable HTTP MCP server on :8090. One image, one docker run, end-to-end demo.

Quickstart (5 steps)

# 1. Build (Mac amd64 via Rosetta — most reliable):
docker buildx build --platform linux/amd64 -t selfclaude .

# Or native arm64 / Linux:
docker build -t selfclaude .

# 2. Run with a persistent volume so taught skills survive restarts:
docker volume create selfclaude-data
docker run --rm -p 8080:8080 -p 8090:8090 \
  -v selfclaude-data:/data \
  --name selfclaude \
  selfclaude

3. Open http://localhost:8080 in your browser.
4. Paste your Anthropic API key (sk-ant-...) into the header field; click Save.
5. Click "+ Add WebApp", paste a URL (e.g. https://example.com), and watch the
   streamed Chrome window appear in the main pane.

Once the container is up, curl http://localhost:8080/health returns {"status":"ok",...}. The MCP entry point is http://localhost:8090/mcp, which accepts JSON-RPC payloads via POST; see Phase 5 below.
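The readiness check can also be scripted. The following is a minimal sketch for Node 20 (which ships a global fetch); the helper names isHealthyBody and waitForHealth, the attempt count, and the delay are illustrative assumptions, not part of selfclaude. Only the /health URL and the "status":"ok" field come from the text above.

```javascript
// Returns true when a /health response body reports status "ok".
// Only the "status" field is documented; any other fields are ignored.
function isHealthyBody(bodyText) {
  try {
    const parsed = JSON.parse(bodyText);
    return parsed !== null && typeof parsed === "object" && parsed.status === "ok";
  } catch {
    return false; // not JSON (e.g. the server is still starting)
  }
}

// Polls the health endpoint until it reports ok or the attempts run out.
// `attempts` and `delayMs` are arbitrary defaults chosen for illustration.
async function waitForHealth(url = "http://localhost:8080/health", attempts = 30, delayMs = 1000) {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetch(url);
      if (res.ok && isHealthyBody(await res.text())) return true;
    } catch {
      // connection refused while the container boots: fall through and retry
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false;
}
```

Calling waitForHealth() after docker run resolves to true once the endpoint answers, or to false if the container never becomes healthy within the attempts allowed.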

What's Inside

| Component | Why it's here |
| --- | --- |
| Node 20 (bookworm-slim base) | Backend runtime |
| Xvfb on display `:99` | Headless X server; Chrome draws here |
| fluxbox | Tiny window manager; xdotool is unreliable without a real WM |
| chromium + chromium-browser symlink | Browser runtime (covers amd64 + arm64 from Debian repos) |
| x11vnc | Streams individual Chrome windows to noVNC |
| xdotool, wmctrl, xprop, xdpyinfo, maim | Window discovery + input dispatch + screenshots |
| sqlite3, dumb-init | Skill persistence (Phase 3) and proper PID 1 signal handling |
| `@anthropic-ai/sdk` | Phase 4 chat agent: direct Anthropic Messages API streaming |
| `@modelcontextprotocol/sdk` | Phase 5 MCP HTTP server on `:8090/mcp` |

Verifying the Floor

# 1. Container is up, health endpoint responds
curl -fsS http://localhost:8080/health | grep -q '"status":"ok"'

# 2. X stack is alive
docker exec selfclaude xdpyinfo -display :99 >/dev/null

# 3. Required tools are on PATH
docker exec selfclaude sh -c 'for t in chromium-browser x11vnc xdotool wmctrl xprop xdpyinfo maim sqlite3; do command -v $t || echo MISSING:$t; done'

# 4. Node server is running under dumb-init
docker exec selfclaude ps -A -o pid,ppid,cmd | head -20

Image Size Sanity

docker images selfclaude --format '{{.Size}}'   # expect ≤ 2 GB

Phase 3 — Teach + Replay

Once a WebApp is streaming, you can record a sequence of clicks and keystrokes, save it as a named skill, and replay it on demand.

1. Start the container with the volume so skills persist (see Quickstart step 2).
2. Add a WebApp from the UI (e.g. https://example.com) and wait for the stream to come online (status pill turns green).
3. Click Teach in the toolbar. The button flips red and the canvas cursor becomes a crosshair. Perform the workflow you want to teach. Clicks and keystrokes are captured into an action log, and the streamed Chrome window receives them in real time so you see what you're recording.
4. Click Stop. A modal opens showing how many events were captured. Name the skill (lowercase letters, digits, hyphens, e.g. compose-mail) and click Save skill. The skill appears in the sidebar SKILLS section under the active WebApp.
5. Click the ▶ button on a skill row to replay it. The streamed Chrome window performs the recorded clicks and keystrokes against the same window, preserving the original timing (capped at 2 seconds between events).
6. Click the ✕ button to delete a skill. Skills are scoped per-WebApp.

Endpoints (for `curl` / MCP integration)

| Method | Path | Description |
| --- | --- | --- |
| `POST` | `/api/skills` | Create a skill from a recorded action_log. Body: `{webappId, name, actionLog}`. Returns 201 with `{id, webappId, name, createdAt, eventCount}`. |
| `GET` | `/api/skills?webappId=<id>` | List skills scoped to a WebApp. |
| `GET` | `/api/skills/:id` | Get one skill including its full `action_log`. |
| `DELETE` | `/api/skills/:id` | Delete a skill. Returns 200 on success, 404 if missing. |
| `POST` | `/api/skills/:id/replay` | Replay a skill against its WebApp's X11 window id (wid). Returns 202 Accepted immediately; the dispatch loop runs server-side. Per-event errors are logged and skipped (best-effort). |
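As a hedged illustration of these endpoints, the sketch below creates a skill and then replays it from Node 20 using the global fetch. The helper names (isValidSkillName, buildCreateSkillRequest, createAndReplay) are hypothetical, the base URL assumes the Quickstart port mapping, and the client-side name check mirrors the naming rule from the steps above without claiming the server enforces it the same way.

```javascript
const BASE_URL = "http://localhost:8080"; // assumes the Quickstart port mapping

// Skill names are described as lowercase letters, digits, and hyphens.
// Checking on the client is a convenience; the server's exact rule may differ.
function isValidSkillName(name) {
  return typeof name === "string" && /^[a-z0-9-]+$/.test(name);
}

// Builds the fetch() arguments for POST /api/skills without sending anything.
function buildCreateSkillRequest(webappId, name, actionLog) {
  if (!isValidSkillName(name)) {
    throw new Error(`invalid skill name: ${name}`);
  }
  return {
    url: `${BASE_URL}/api/skills`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ webappId, name, actionLog }),
    },
  };
}

// Creates a skill, then triggers a replay. Expects 201 and 202 respectively.
async function createAndReplay(webappId, name, actionLog) {
  const { url, init } = buildCreateSkillRequest(webappId, name, actionLog);
  const created = await fetch(url, init);
  if (created.status !== 201) throw new Error(`create failed: ${created.status}`);
  const skill = await created.json(); // {id, webappId, name, createdAt, eventCount}

  const replay = await fetch(`${BASE_URL}/api/skills/${skill.id}/replay`, { method: "POST" });
  if (replay.status !== 202) throw new Error(`replay failed: ${replay.status}`);
  return skill;
}
```

Because the replay endpoint answers 202 before the dispatch loop finishes, a successful createAndReplay call only confirms that the replay was accepted, not that every event succeeded.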

action_log shape

{
  "version": 1,
  "webappId": "728e617e",
  "startedAt": 1234567890000,
  "endedAt":   1234567920000,
  "events": [
    {"type": "click", "button": 1, "x": 320, "y": 180, "ts": 1234567891000},
    {"type": "key",   "key":  "Enter",                  "ts": 1234567892000},
    {"type": "type",  "text": "hello@example.com",      "ts": 1234567893000}
  ]
}

Event types:

click — button is 1 (left), 2 (middle), or 3 (right). x, y are in canvas pixel space (1280×720).
key — key is a DOM KeyboardEvent.key value (e.g. Enter, Backspace, ArrowUp, F5); the server translates to xdotool keysyms via mapDomKeyToXdotool.
type — text is literal text (1–4096 chars). Used for printable-char sequences. Coalesced from consecutive single-char keystrokes during recording when no modifier is held.
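The constraints listed above can be checked before a log is saved or replayed. The following is a sketch only: validateEvent and validateActionLog are hypothetical names, the zero-based half-open coordinate range is an assumption, and nothing here claims to match the server's own validation.

```javascript
const CANVAS_WIDTH = 1280;
const CANVAS_HEIGHT = 720;
const MAX_TEXT_LENGTH = 4096;

// Returns null when the event looks well-formed, otherwise a reason string.
function validateEvent(ev) {
  if (ev === null || typeof ev !== "object") return "event must be an object";
  if (!Number.isFinite(ev.ts)) return "ts must be a number";

  switch (ev.type) {
    case "click":
      if (![1, 2, 3].includes(ev.button)) return "button must be 1, 2, or 3";
      // Assumption: coordinates are zero-based, so the valid range is half-open.
      if (!Number.isInteger(ev.x) || ev.x < 0 || ev.x >= CANVAS_WIDTH) return "x out of canvas";
      if (!Number.isInteger(ev.y) || ev.y < 0 || ev.y >= CANVAS_HEIGHT) return "y out of canvas";
      return null;
    case "key":
      if (typeof ev.key !== "string" || ev.key.length === 0) return "key must be a non-empty string";
      return null;
    case "type":
      if (typeof ev.text !== "string") return "text must be a string";
      // Note: length counts UTF-16 code units, which may differ from the server's count.
      if (ev.text.length < 1 || ev.text.length > MAX_TEXT_LENGTH) return "text must be 1-4096 chars";
      return null;
    default:
      return `unknown event type: ${ev.type}`;
  }
}

// Collects problems across a whole action_log, tagged with the event index.
function validateActionLog(log) {
  const events = Array.isArray(log && log.events) ? log.events : [];
  return events
    .map((ev, i) => ({ index: i, reason: validateEvent(ev) }))
    .filter((r) => r.reason !== null);
}
```

An empty array from validateActionLog means every event passed these checks; otherwise each entry names the offending event index and the reason.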

Replay sleeps min(ev.ts - prevTs, 2000) ms between events. Gaps of up to 2 seconds are reproduced as recorded; longer pauses are shortened to 2 seconds.
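The timing rule can be sketched as a small pure function. The name replayDelays is hypothetical, and two details are assumptions rather than documented behavior: the first event is dispatched with no initial wait, and negative gaps from out-of-order timestamps are clamped to zero.

```javascript
const MAX_GAP_MS = 2000; // cap from the text: at most 2 seconds between events

// Returns the sleep (in ms) to apply before each event; the first event gets 0.
function replayDelays(events) {
  const delays = [];
  let prevTs = null;
  for (const ev of events) {
    // Math.max guards against out-of-order timestamps (an added assumption).
    const gap = prevTs === null ? 0 : Math.max(0, Math.min(ev.ts - prevTs, MAX_GAP_MS));
    delays.push(gap);
    prevTs = ev.ts;
  }
  return delays;
}
```

For the three sample events in the action_log shape above, which are 1000 ms apart, this yields delays of 0, 1000, and 1000 ms; a recorded 7.5 second pause would be replayed as 2000 ms.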

Replay dispatch (xdotool chain)

Each event is dispatched against the bound wid using the chained xdotool pattern (see src/input-dispatcher.js):

# click event
xdotool windowactivate --sync <wid> windowfocus --sync <wid> \
  mousemove --window <wid> --sync <x> <y> click --clearmodifiers <button>

# key event
xdotool windowactivate --sync <wid> windowfocus --sync <wid> \
  key --clearmodifiers <keysym>

# type event
xdotool windowactivate --sync <wid> windowfocus --sync <wid> \
  type --clearmodifiers --delay 0 <text>

Why the chain (and not xdotool click --window <wid> directly): with --window, xdotool delivers the event via XSendEvent, and the X server marks such events with send_event=True. Chrome (and many GTK apps) treat flagged events as synthetic and drop them. The chain forces real X11 focus first (windowactivate + windowfocus, both --sync), then dispatches the input to the focused window without --window on the verb (click, key, type), which is what real user input looks like to the application.
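The three command shapes above can be generated from an action_log event. This is a sketch under stated assumptions, not the contents of src/input-dispatcher.js: buildXdotoolArgs and domKeyToKeysym are hypothetical names, and the key table is a small illustrative subset of what mapDomKeyToXdotool might cover.

```javascript
// Hypothetical subset of a DOM key -> X keysym table; the real
// mapDomKeyToXdotool in src/input-dispatcher.js may cover more keys.
const DOM_TO_KEYSYM = {
  Enter: "Return",
  Backspace: "BackSpace",
  Escape: "Escape",
  Tab: "Tab",
  ArrowUp: "Up",
  ArrowDown: "Down",
  ArrowLeft: "Left",
  ArrowRight: "Right",
  " ": "space",
};

function domKeyToKeysym(key) {
  // Keys without an entry (e.g. "F5", "a") pass through unchanged.
  return Object.prototype.hasOwnProperty.call(DOM_TO_KEYSYM, key) ? DOM_TO_KEYSYM[key] : key;
}

// Builds the argv for one chained xdotool invocation (no shell involved).
function buildXdotoolArgs(wid, ev) {
  const focus = ["windowactivate", "--sync", String(wid), "windowfocus", "--sync", String(wid)];
  switch (ev.type) {
    case "click":
      return [
        ...focus,
        "mousemove", "--window", String(wid), "--sync", String(ev.x), String(ev.y),
        "click", "--clearmodifiers", String(ev.button),
      ];
    case "key":
      return [...focus, "key", "--clearmodifiers", domKeyToKeysym(ev.key)];
    case "type":
      return [...focus, "type", "--clearmodifiers", "--delay", "0", ev.text];
    default:
      throw new Error(`unsupported event type: ${ev.type}`);
  }
}
```

Passing the resulting array to execFile from node:child_process (with "xdotool" as the command) keeps recorded text out of any shell, so characters such as quotes or semicolons in a type event are delivered literally.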

Phase 4 — Chat Agent

The header has a chat panel wired to Anthropic's Messages API streaming endpoint. Once you've pasted your API key in the header, you can ask the agent to do things in plain English — it has 4 tools at its disposal:

| Tool | What it does |
| --- | --- |
| `list_webapps` | List the active streamed WebApps (id, url, state). |
| `list_skills` | List saved skills (optionally scoped to a webappId). |
| `replay_skill` | Replay a saved skill (same path as the UI ▶ button). |
| `screenshot` | Capture a PNG of a WebApp window and feed it back to the model. |

How it talks to the backend:

| Endpoint | Purpose |
| --- | --- |
| `POST /api/key` | Set the in-memory Anthropic key (never persisted to disk). |
| `GET /api/key` | Returns `{hasKey: true}` once a key has been set. |
| `WS /ws/agent` | Bidirectional chat stream. The client sends `{type:"user_message", text}`; the server emits `text_delta`, `tool_use`, `tool_result`, `agent_thinking`, `message_end`, `error` events. |

When the agent calls replay_skill, the chat panel shows a live tool-use status indicator and the streamed Chrome window in the main pane visibly performs the recorded actions — judges can read the state off the screen at a glance.
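A client of /ws/agent has to fold the server's event stream into UI state. The reducer below is a sketch of one way to do that. Only the event type names and the user_message frame come from the table above; the payload fields text_delta.text, tool_use.name, and error.message are assumptions, as are the function names.

```javascript
// Client -> server frame, as documented: {type: "user_message", text}.
function userMessageFrame(text) {
  return JSON.stringify({ type: "user_message", text });
}

function initialChatState() {
  return { reply: "", activeTool: null, done: false, error: null };
}

// Folds one server event into the UI state and returns the new state.
// ASSUMED payload fields (not documented): text_delta.text, tool_use.name,
// error.message. Adjust to whatever the server actually sends.
function applyAgentEvent(state, event) {
  switch (event.type) {
    case "text_delta":
      return { ...state, reply: state.reply + (event.text ?? "") };
    case "tool_use":
      return { ...state, activeTool: event.name ?? "unknown" };
    case "tool_result":
      return { ...state, activeTool: null };
    case "agent_thinking":
      return state; // could drive a spinner; ignored in this sketch
    case "message_end":
      return { ...state, done: true };
    case "error":
      return { ...state, done: true, error: event.message ?? "unknown error" };
    default:
      return state; // ignore event types this sketch does not know
  }
}
```

In a browser, each WebSocket message would be parsed with JSON.parse and passed through applyAgentEvent; a non-null activeTool corresponds to the live tool-use status indicator described above.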

Phase 5 — MCP Integration

The container also exposes itself as a Streamable HTTP MCP server on a separate port (8090), so external Claude Code or Claude Desktop on the host Mac can drive the same WebApps and skills as if they were the chat agent's tools.

# On the host (Mac), with Claude Code installed:
claude mcp add selfclaude --transport http http://localhost:8090/mcp

That's it — the server registers under selfclaude and Claude Code can now invoke any of the 5 MCP tools:

| MCP Tool | Same code path as |
| --- | --- |
| `add_webapp(url)` | UI "+ Add WebApp" button → `POST /api/webapps` |
| `list_webapps()` | UI sidebar list → `GET /api/webapps` |
| `list_skills(webappId?)` | UI Skills section → `GET /api/skills?webappId=...` |
| `replay_skill(skillId)` | UI ▶ button + chat agent → `POST /api/skills/:id/replay` |
| `screenshot(webappId)` | Chat agent's `screenshot` tool → `maim -i 0x<wid>` (returns base64 PNG) |

Single source of truth: src/mcp-server.js imports the same backend modules as src/agent-chat.js and the HTTP routes in src/server.js. No duplicated logic, no parallel state.

For Claude Desktop, drop this into ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "selfclaude": {
      "transport": "http",
      "url": "http://localhost:8090/mcp"
    }
  }
}

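Security note: the MCP server has no authentication and is meant for localhost use only, which is fine for hackathon/demo use. Don't expose port 8090 to the internet. Note that `-p 8090:8090` publishes the port on all host interfaces by default; publishing with `-p 127.0.0.1:8090:8090` restricts it to the loopback interface.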

Smoke testing the MCP endpoint manually

# initialize
curl -sS -X POST http://localhost:8090/mcp \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json, text/event-stream' \
  --data '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke","version":"0"}}}'

# list tools
curl -sS -X POST http://localhost:8090/mcp \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json, text/event-stream' \
  --data '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}'

# add a WebApp
curl -sS -X POST http://localhost:8090/mcp \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json, text/event-stream' \
  --data '{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"add_webapp","arguments":{"url":"https://example.com"}}}'
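The same smoke test can be scripted from Node 20. This is a hedged sketch: buildToolCall, parseMcpBody, and callTool are hypothetical helpers, the parser assumes each server-sent event carries its JSON on a single data: line, and it does not handle session headers, which a stateful Streamable HTTP server may require after initialize.

```javascript
const MCP_URL = "http://localhost:8090/mcp";

// Builds a JSON-RPC 2.0 tools/call request like the curl payload above.
function buildToolCall(id, name, args = {}) {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Extracts JSON-RPC messages from a response body that may be either plain
// JSON or a text/event-stream whose "data:" lines carry JSON.
function parseMcpBody(contentType, bodyText) {
  if (!contentType.includes("text/event-stream")) {
    return [JSON.parse(bodyText)];
  }
  return bodyText
    .split(/\r?\n/)
    .filter((line) => line.startsWith("data:"))
    .map((line) => JSON.parse(line.slice("data:".length).trim()));
}

// Sends one tool call. Not invoked here; it needs the container to be running.
async function callTool(id, name, args) {
  const res = await fetch(MCP_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify(buildToolCall(id, name, args)),
  });
  const messages = parseMcpBody(res.headers.get("content-type") ?? "", await res.text());
  return messages.find((m) => m.id === id); // the response matching our request id
}
```

With the container running, callTool(3, "add_webapp", { url: "https://example.com" }) sends the same payload as the last curl command above.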

Demo Script (3 minutes)

[0:00 – 0:20]  "Stanford, I brought my own AI computer. One docker command."
               $ docker run -p 8080:8080 -p 8090:8090 selfclaude
               (Open localhost:8080, paste Anthropic key, hit Save)

[0:20 – 0:50]  "Let's add a WebApp — Gmail." Click "+ Add WebApp", paste URL.
               A streamed Chrome window appears in 5 seconds.
               "That Chrome is running INSIDE the container. I'm just watching
                via noVNC. Mouse and keyboard work end-to-end."

[0:50 – 1:30]  "Now let's teach it something. Click Teach."
               (Filter unread, open the first email — ~30 seconds of clicks.)
               Click Stop → name it "check-inbox" → Save.

[1:30 – 2:10]  "Run the skill from the chat panel."
               (Type "run check-inbox" — agent calls replay_skill, the
                streamed Chrome visibly performs the recorded actions.)

[2:10 – 2:50]  "But the killer demo: I can drive this container from MY Claude
                Code on the host, over MCP."
               (In a host terminal:)
               $ claude mcp add selfclaude --transport http http://localhost:8090/mcp
               $ claude → "use selfclaude to replay check-inbox"
               (External Claude Code calls replay_skill via MCP →
                container's Chrome window plays the workflow live.)

[2:50 – 3:00]  "Self-hosted. My subscription. My computer. I teach it skills
                and drive it from anywhere — UI, chat, or Claude Code over MCP.
                AI personal computer in a box."

Cutdown Ladder (DEMO-03)

If something fails 5 minutes before going on stage, kill features in this order — the floor that survives is still demoable:

| Drop | When | Fallback |
| --- | --- | --- |
| 1. MCP HTTP server (`:8090`) | If port 8090 / SDK / Claude Code MCP transport breaks | Skip the "external Claude Code" segment of the demo. UI demo (add WebApp + teach + replay + chat agent) is still complete on its own. |
| 2. Chat agent (`/ws/agent`, Anthropic SDK) | If Anthropic API key / streaming / tool loop breaks | Drop the "agent runs the skill" beat. Click the manual ▶ Replay button next to the skill; same backend code path. |
| 3. Teach persistence (SQLite) | If `/data/skills.db` write fails | Run without `-v selfclaude-data:/data`; skills go to the container layer, lost on rm but live during the demo window. |
| ⛔ DO NOT drop | These keep the floor | Add WebApp → noVNC stream → manual replay button. Without these the box is just an empty Docker image. |

Roadmap

Phase 1: Docker base + Node server skeleton — DONE
Phase 2: Add WebApp via URL → Chrome spawn → noVNC live stream — DONE
Phase 3: Teach + Replay (record canvas events, replay via xdotool) — DONE
Phase 4: Anthropic chat agent + 4-tool toolbox — DONE
Phase 5: Streamable HTTP MCP server (http://localhost:8090/mcp) + demo polish — DONE
