GitHub - the-agents-work/taw-computer: Give any AI a real computer. Open source MCP server with Ubuntu sandbox, browser automation, desktop control, and 30+ tools. Works with Claude Code, Cursor, Claude Desktop.

Your AI writes code. This lets it use a computer.

Quick Start · Demo · Hosted Version · Contributing · Report Bug

What if your AI could do everything you do on a computer?

Not just write code — but open a browser, click buttons, fill forms, run servers, test in real browsers, install anything, and see the screen?

taw-computer is an open-source MCP server that gives AI agents a full Ubuntu desktop inside Docker. Your AI connects, gets a real computer, and works like a human would.

No internal LLM. No chat UI. Your AI is the brain. This is the body.

Demo

📹 Demo coming soon — star this repo to get notified!

Why taw-computer?

Other tools let AI write code. taw-computer lets AI use a computer.

	ChatGPT / Claude	Cursor / Copilot	Lovable / Bolt	taw-computer
Write code	✅	✅	✅	✅
Run shell commands	❌	Limited	Sandboxed	Full Ubuntu
Browse the web	❌	❌	❌	Real Chromium
See & click the screen	❌	❌	❌	Desktop + VNC
Install any software	❌	❌	❌	apt/npm/pip
Test in real browser	❌	❌	Preview only	Playwright + CDP
Persist across sessions	❌	❌	✅	Snapshots
Self-hostable	❌	❌	❌	100% yours

Quick start

Get running in a few minutes (the first Docker build produces an image of roughly 5GB, so allow extra time on a slow connection):

# 1. Clone & build
git clone https://github.com/the-agents-work/taw-computer.git
cd taw-computer
docker build -f images/Dockerfile.taw -t taw-computer-base .

# 2. Install & start
npm install && npm start

Then add to your AI client:

Claude Code (.mcp.json in your project root; the repo includes .mcp.json.example as a starting point)

{
  "mcpServers": {
    "taw-computer": {
      "command": "npx",
      "args": ["tsx", "/path/to/taw-computer/mcp/index.ts"]
    }
  }
}

Cursor

Add to Cursor MCP settings (Settings → MCP Servers) — same JSON format as above.

Claude Desktop

Add to claude_desktop_config.json — same JSON format as above.

Any MCP client

taw-computer speaks standard MCP over stdio. Any client that supports MCP can connect.
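
To make "standard MCP over stdio" concrete: the MCP stdio transport exchanges JSON-RPC 2.0 messages, one per line. The sketch below shows only that framing. The helper names (`encodeMessage`, `LineDecoder`) are invented for illustration and the `clientInfo` values are placeholders; in practice a client would rely on an MCP SDK rather than hand-rolling this.

```typescript
// Minimal sketch of newline-delimited JSON-RPC framing, as used by MCP's stdio transport.
// Helper names are illustrative; they are not part of taw-computer or any SDK.

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Serialize one message as a single line. JSON.stringify never emits raw
// newlines, so the trailing "\n" is the only delimiter.
function encodeMessage(msg: JsonRpcRequest): string {
  return JSON.stringify(msg) + "\n";
}

// Accumulates output chunks and returns complete messages as lines arrive.
class LineDecoder {
  private buffer = "";

  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an incomplete line (or "" if the chunk ended on "\n").
    this.buffer = lines.pop() ?? "";
    return lines.filter((l) => l.trim() !== "").map((l) => JSON.parse(l));
  }
}

// The first message a client sends is `initialize`.
const initialize: JsonRpcRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "initialize",
  params: {
    protocolVersion: "2024-11-05", // one published MCP revision; use what your client supports
    capabilities: {},
    clientInfo: { name: "example-client", version: "0.0.0" }, // placeholder values
  },
};

const wire = encodeMessage(initialize);
const decoder = new LineDecoder();
// Feed the line in two chunks to show that partial reads are handled.
const first = decoder.push(wire.slice(0, 20));
const second = decoder.push(wire.slice(20));
```

The same framing applies in the SSH setup, because SSH simply pipes stdin and stdout across the network.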

Remote server (SSH) — run on a beefy machine, use from your laptop

Got a powerful server / Mac Mini / VPS? Run taw-computer there and connect from anywhere:

{
  "mcpServers": {
    "taw-computer": {
      "command": "ssh",
      "args": ["user@your-server", "cd /path/to/taw-computer && npx tsx mcp/index.ts"]
    }
  }
}

Your laptop (Claude Code)
    ↕ SSH (stdin/stdout piped over network)
Remote server (taw-computer + Docker)
    ↕ Docker
Ubuntu sandbox
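
The local and SSH entries shown earlier differ only in `command` and `args`. As a rough sketch, both can be generated from one function; `buildConfig` and the `Target` type are hypothetical helpers, and the paths and host are the same placeholders used in the JSON examples.

```typescript
// Sketch: build the "mcpServers" entry for a local or SSH-hosted taw-computer.
// buildConfig and Target are illustrative helpers, not part of the project.

interface McpServerEntry {
  command: string;
  args: string[];
}

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

type Target =
  | { kind: "local"; repoPath: string }
  | { kind: "ssh"; host: string; repoPath: string };

function buildConfig(target: Target): McpConfig {
  const entry: McpServerEntry =
    target.kind === "local"
      ? // Local: run the server directly with tsx.
        { command: "npx", args: ["tsx", `${target.repoPath}/mcp/index.ts`] }
      : // Remote: let ssh carry stdin/stdout to the server process.
        {
          command: "ssh",
          args: [target.host, `cd ${target.repoPath} && npx tsx mcp/index.ts`],
        };
  return { mcpServers: { "taw-computer": entry } };
}

const local = buildConfig({ kind: "local", repoPath: "/path/to/taw-computer" });
const remote = buildConfig({
  kind: "ssh",
  host: "user@your-server",
  repoPath: "/path/to/taw-computer",
});

// Paste the printed JSON into your client's MCP config file.
console.log(JSON.stringify(remote, null, 2));
```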

Setup:

1. On the server: install Docker, clone the repo, build the image, and run npm install
2. On the server: enable and start SSH (sudo systemctl enable --now ssh)
3. On your laptop: ssh-copy-id user@your-server (passwordless login)
4. Add the MCP config above, and you're done

Watch via VNC: open http://your-server:6080 in your browser.

That's it. Now tell your AI: "Create a VM and build me a website" — and watch it work.

What can it do?

🖥️ "Build me a landing page"

AI creates a VM → scaffolds Next.js → writes components → starts dev server → opens browser to check → iterates until it looks right

🌐 "Go to Amazon and find the best laptop under $1000"

AI opens Chromium → navigates to Amazon → searches → scrolls → extracts prices → compares → reports back

🧪 "Run E2E tests on my deployed app"

AI launches Playwright → navigates to your URL → fills forms → clicks buttons → asserts results → reports failures

🔧 "Set up a PostgreSQL database with sample data"

AI runs apt install postgresql → creates database → writes seed script → runs it → verifies with queries

📸 "What does my app look like on mobile?"

AI takes desktop screenshot → resizes viewport → screenshots again → compares → suggests CSS fixes

How it works

┌─────────────────────────────────────────────────────┐
│  Your AI Client                                     │
│  Claude Code · Cursor · Claude Desktop · any MCP    │
└───────────────────────┬─────────────────────────────┘
                        │ MCP protocol (stdio)
┌───────────────────────▼─────────────────────────────┐
│  taw-computer MCP server          30+ tools         │
│  vm · shell · files · browser · desktop · search    │
└───────────────────────┬─────────────────────────────┘
                        │ Docker API
┌───────────────────────▼─────────────────────────────┐
│  Ubuntu 22.04 Sandbox           isolated container  │
│                                                     │
│   bash    Chromium + CDP    xfce4 Desktop + VNC     │
│     git npm pip curl          Playwright            │
│       python node              xdotool scrot        │
│                                                     │
│   /workspace ← your project files live here         │
└─────────────────────────────────────────────────────┘

30+ tools

VM Management — create, destroy, snapshot, resume

Tool	What it does
`vm_create`	Spin up a new sandbox. Returns VNC URL to watch live
`vm_destroy`	Destroy (auto-saves snapshot for later)
`vm_reset`	Destroy + delete snapshot (fresh start)
`vm_restart`	Restart container, keep all files
`vm_status`	CPU, RAM, disk, uptime, top processes
`vm_list`	List running sandboxes
`vm_rename`	Rename a VM
`snapshot_list`	List saved snapshots
`snapshot_delete`	Delete a snapshot
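
The difference between `vm_destroy` and `vm_reset` is easiest to see as state transitions. What follows is a toy in-memory model of the behavior described in the table, not the project's implementation. The class and method names are invented, and resuming from a snapshot when a VM of the same name is created again is an assumption based on the "snapshot, resume" wording.

```typescript
// Toy model of the VM lifecycle from the table above. Not taw-computer's code.
class ToyVmStore {
  private running = new Map<string, string[]>(); // VM name -> files
  private snapshots = new Map<string, string[]>(); // VM name -> saved files

  // Assumption: creating a VM whose name has a snapshot resumes from it.
  create(name: string): string[] {
    const files = [...(this.snapshots.get(name) ?? [])];
    this.running.set(name, files);
    return files;
  }

  write(name: string, file: string): void {
    const files = this.running.get(name);
    if (!files) throw new Error(`VM ${name} is not running`);
    files.push(file);
  }

  // vm_destroy: stop the VM but keep a snapshot for later.
  destroy(name: string): void {
    const files = this.running.get(name);
    if (!files) return;
    this.snapshots.set(name, [...files]);
    this.running.delete(name);
  }

  // vm_reset: stop the VM and delete its snapshot.
  reset(name: string): void {
    this.running.delete(name);
    this.snapshots.delete(name);
  }

  // snapshot_list: names that currently have a saved snapshot.
  snapshotList(): string[] {
    return Array.from(this.snapshots.keys());
  }
}

const store = new ToyVmStore();
store.create("demo");
store.write("demo", "/workspace/app.ts");
store.destroy("demo");
const afterDestroy = store.snapshotList(); // snapshot kept
const resumed = store.create("demo"); // files survive destroy
store.reset("demo");
const fresh = store.create("demo"); // nothing survives reset
```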

Shell & Files — full Ubuntu command line + file ops

Tool	What it does
`exec`	Run any command: git, npm, pip, curl, docker, anything
`fs_read`	Read a file
`fs_write`	Write a file (creates parent dirs)
`fs_edit`	Find-and-replace in a file
`fs_list`	ls / recursive find
`fs_search`	grep for patterns
`code_search`	ripgrep with regex, file types, context
`file_upload`	Upload file into VM (base64, max 50MB)
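
Because `file_upload` takes base64 content with a 50MB cap, a client may want to check the size before encoding. This is a sketch only: the argument names (`path`, `content_base64`) are hypothetical, and it assumes the limit applies to the raw bytes and means 50 MiB, neither of which the README states.

```typescript
// Sketch: prepare a payload for a base64 file upload with a 50 MB cap.
// Argument names are hypothetical; check the tool's schema before relying on them.
const MAX_UPLOAD_BYTES = 50 * 1024 * 1024; // assumption: "50MB" = 50 MiB of raw bytes

interface UploadArgs {
  path: string;
  content_base64: string;
}

function prepareUpload(path: string, data: Uint8Array): UploadArgs {
  if (data.byteLength > MAX_UPLOAD_BYTES) {
    throw new Error(
      `File is ${data.byteLength} bytes; limit is ${MAX_UPLOAD_BYTES}`,
    );
  }
  // Buffer is a Node.js global; it copies the bytes and encodes them.
  return { path, content_base64: Buffer.from(data).toString("base64") };
}

// Base64 emits 4 characters per 3 input bytes (padded), so the encoded
// payload is about a third larger than the file itself.
function encodedLength(rawBytes: number): number {
  return Math.ceil(rawBytes / 3) * 4;
}

const uploadArgs = prepareUpload(
  "/workspace/hello.txt",
  new TextEncoder().encode("hello"),
);
```

For larger artifacts, fetching them from inside the VM with `exec` (for example via curl or git) avoids the base64 overhead altogether.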

Browser (CDP/Playwright) — real browser, not a simulator

Tool	What it does
`browser_navigate`	Go to URL, wait for load
`browser_snapshot`	Screenshot + numbered overlays on every clickable element
`browser_click_ref`	Click element #N from snapshot
`browser_type_ref`	Type into element #N
`browser_extract`	Read page text (CSS selector or full page)
`browser_eval`	Run JavaScript in page
`browser_wait_for`	Wait for selector / text / network idle
`browser_console_logs`	Read console.log, console.error, etc.
`browser_network_errors`	Catch 404s, CORS errors, failed requests
`browser_run_test`	Run a Playwright test script
`browser_open`	Open Chrome via desktop (fallback)
`browser_close`	Kill Chrome
`web_search`	Google search → top 8 results

Desktop — see and control the GUI

Tool	What it does
`desktop_screenshot`	JPEG screenshot of the whole desktop
`desktop_click`	Click at (x, y)
`desktop_type`	Type text into focused window
`desktop_key`	Key combos: ctrl+c, alt+tab, Return, etc.
`desktop_scroll`	Scroll up/down
`desktop_drag`	Drag from A to B
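
For reference, the tools in the four tables above can be collected into one typed catalogue, which also shows what a call looks like on the wire. The `tools/call` request shape comes from the MCP specification; the `toolCall` helper is illustrative, and arguments are passed through untouched because this README does not document each tool's argument schema.

```typescript
// Tool names copied from the tables above, grouped by category.
const TOOLS = {
  vm: [
    "vm_create", "vm_destroy", "vm_reset", "vm_restart", "vm_status",
    "vm_list", "vm_rename", "snapshot_list", "snapshot_delete",
  ],
  shell: [
    "exec", "fs_read", "fs_write", "fs_edit", "fs_list",
    "fs_search", "code_search", "file_upload",
  ],
  browser: [
    "browser_navigate", "browser_snapshot", "browser_click_ref",
    "browser_type_ref", "browser_extract", "browser_eval",
    "browser_wait_for", "browser_console_logs", "browser_network_errors",
    "browser_run_test", "browser_open", "browser_close", "web_search",
  ],
  desktop: [
    "desktop_screenshot", "desktop_click", "desktop_type",
    "desktop_key", "desktop_scroll", "desktop_drag",
  ],
} as const;

type Category = keyof typeof TOOLS;
type ToolName = (typeof TOOLS)[Category][number];

const allTools: string[] = [
  ...TOOLS.vm, ...TOOLS.shell, ...TOOLS.browser, ...TOOLS.desktop,
];

// Build an MCP `tools/call` request (hypothetical helper).
function toolCall(
  id: number,
  name: ToolName,
  args: Record<string, unknown> = {},
) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name, arguments: args },
  };
}

const request = toolCall(2, "vm_create");
console.log(`${allTools.length} tools listed`, JSON.stringify(request));
```

Typing `name` as `ToolName` means a misspelled tool is caught at compile time rather than as a server error.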

Set-of-Mark: how browser automation actually works

Many "computer use" tools have the model predict pixel coordinates from a raw screenshot. taw-computer uses Set-of-Mark prompting instead: the AI sees a numbered badge on every interactive element:

Step 1: browser_snapshot
        → AI sees screenshot with [1] Login  [2] Search  [3] Cart  ...

Step 2: browser_click_ref(ref=2)
        → clicks the Search box precisely

Step 3: browser_type_ref(ref=2, text="laptop", submit=true)  
        → types and presses Enter

Step 4: browser_snapshot
        → sees new page with results [4] [5] [6] ...

No coordinate guessing. No CSS selector fragility. The AI sees what it's clicking.
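
The core of the idea fits in a few lines: number the interactive elements, then resolve a number back to a concrete target. The sketch below illustrates that mapping only and is not the code in browser.ts. Element discovery and badge drawing are left out, all names are invented, and clicking the centre of the bounding box is an assumption.

```typescript
// Illustration of Set-of-Mark: number interactive elements, then act by number.
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface Mark {
  ref: number;
  label: string;
  box: Box;
}

// Assign refs 1..N in reading order (top to bottom, then left to right).
function assignMarks(elements: { label: string; box: Box }[]): Mark[] {
  return [...elements]
    .sort((a, b) => a.box.y - b.box.y || a.box.x - b.box.x)
    .map((el, i) => ({ ref: i + 1, ...el }));
}

// Resolve a ref to a click point at the centre of its box.
function clickPoint(marks: Mark[], ref: number): { x: number; y: number } {
  const mark = marks.find((m) => m.ref === ref);
  if (!mark) throw new Error(`No element with ref ${ref}; take a new snapshot`);
  return {
    x: mark.box.x + mark.box.width / 2,
    y: mark.box.y + mark.box.height / 2,
  };
}

const marks = assignMarks([
  { label: "Cart", box: { x: 900, y: 10, width: 60, height: 30 } },
  { label: "Login", box: { x: 20, y: 10, width: 80, height: 30 } },
  { label: "Search", box: { x: 300, y: 10, width: 400, height: 30 } },
]);

// The model only ever says "2"; the tool turns that into a precise target.
const target = clickPoint(marks, 2);
```

Refs are only meaningful for the snapshot that produced them, which is why the flow takes a fresh `browser_snapshot` after each navigation.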

VNC — watch your AI work in real time

Every sandbox comes with a noVNC web viewer. Open the URL in your browser and watch:

🖱️ AI navigating websites and clicking buttons
⌨️ AI writing code in the terminal
🏗️ AI building and testing applications
🐛 AI debugging by inspecting the screen

Perfect for demos, debugging, and building trust in AI agents.

What's inside each sandbox

	Included
OS	Ubuntu 22.04
Desktop	xfce4 + Xvfb + x11vnc + noVNC
Browser	Playwright Chromium (native arm64 + amd64)
Languages	Node.js 20, Python 3, build-essential
CLI	git, curl, wget, jq, ripgrep, tree, nano, vim
DB clients	PostgreSQL, MariaDB, Redis
Dev tools	GitHub CLI, yq, httpie
Automation	xdotool, scrot, imagemagick, xclip

Configuration

Variable	Default	Description
`MAX_SANDBOXES`	`3`	Max concurrent VMs
`SANDBOX_TYPE`	`auto`	`auto` / `docker` / `firecracker`
`DOCKER_IMAGE`	`taw-computer-base`	Base image
`DOCKER_MEMORY_MB`	`4096`	RAM per container
`DOCKER_CPUS`	`2`	CPUs per container
`DESKTOP_RESOLUTION`	`1280x720`	Screen resolution
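
As a sketch of how these variables could be read with the defaults from the table, consider the loader below. It is not the project's sandbox/config.ts; the `SandboxConfig` shape, the function names, and the validation rules are assumptions.

```typescript
// Sketch: read the environment variables above, falling back to the table's defaults.
interface SandboxConfig {
  maxSandboxes: number;
  sandboxType: "auto" | "docker" | "firecracker";
  dockerImage: string;
  dockerMemoryMb: number;
  dockerCpus: number;
  desktopResolution: { width: number; height: number };
}

type Env = Record<string, string | undefined>;

function numFrom(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw === "") return fallback;
  const n = Number(raw);
  if (!Number.isFinite(n) || n <= 0) {
    throw new Error(`${key} must be a positive number, got "${raw}"`);
  }
  return n;
}

function loadConfig(env: Env): SandboxConfig {
  const type = env["SANDBOX_TYPE"] ?? "auto";
  if (type !== "auto" && type !== "docker" && type !== "firecracker") {
    throw new Error(`Unknown SANDBOX_TYPE "${type}"`);
  }
  const res = /^(\d+)x(\d+)$/.exec(env["DESKTOP_RESOLUTION"] ?? "1280x720");
  if (!res) throw new Error("DESKTOP_RESOLUTION must look like 1280x720");
  return {
    maxSandboxes: numFrom(env, "MAX_SANDBOXES", 3),
    sandboxType: type,
    dockerImage: env["DOCKER_IMAGE"] ?? "taw-computer-base",
    dockerMemoryMb: numFrom(env, "DOCKER_MEMORY_MB", 4096),
    dockerCpus: numFrom(env, "DOCKER_CPUS", 2),
    desktopResolution: { width: Number(res[1]), height: Number(res[2]) },
  };
}

const defaults = loadConfig({});
const custom = loadConfig({ MAX_SANDBOXES: "5", DESKTOP_RESOLUTION: "1920x1080" });

// Worst-case host RAM if every sandbox uses its full allowance.
const worstCaseRamMb = custom.maxSandboxes * custom.dockerMemoryMb;
```

In a real process you would call `loadConfig(process.env)`. The last line is a reminder that raising `MAX_SANDBOXES` multiplies the memory the host needs.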

Requirements

	Minimum
Docker	Docker Desktop or Docker Engine
Node.js	20+
RAM	~4GB per sandbox
Disk	~5GB for base image

Project structure

taw-computer/
├── mcp/
│   ├── index.ts            # MCP server — stdio, 30+ tool handlers
│   └── browser.ts          # Playwright CDP + Set-of-Mark engine
├── sandbox/
│   ├── SandboxManager.ts   # Abstract interface
│   ├── DockerSandbox.ts    # Docker implementation
│   ├── FirecrackerSandbox.ts # Firecracker microVM (optional)
│   ├── NetworkManager.ts   # Network isolation
│   ├── config.ts           # Env-based config
│   └── index.ts            # Auto-detect backend
├── images/
│   └── Dockerfile.taw      # Ubuntu sandbox image
├── .github/
│   ├── workflows/ci.yml    # CI: typecheck + Docker build
│   └── ISSUE_TEMPLATE/     # Bug report + feature request
├── package.json
├── CONTRIBUTING.md
└── LICENSE (MIT)

Contributing

We'd love your help! See CONTRIBUTING.md.

Ideas for first contributions:

🎨 Record a demo GIF for this README
📝 Write a tutorial ("Build X with taw-computer")
🔧 Add a new MCP tool (audio? clipboard? multi-tab?)
🐳 Build a slimmer Docker image
🧪 Add automated tests
📦 Support Podman / containerd

Hosted version

Don't want to self-host? shipkit.cc — managed taw-computer with:

Chat UI (just type what you want)
Auth & team collaboration
One-click app sharing
No Docker setup needed

FAQ

How is this different from Lovable / Bolt / v0?

Those are closed-source, hosted-only products that generate code. taw-computer gives AI a real computer — it can run servers, browse the web, install anything, and interact with any desktop app. It's also open source and self-hostable.

How is this different from Open Interpreter / OpenHands?

Open Interpreter runs code on your local machine (risky). OpenHands uses its own LLM orchestration. taw-computer is just the computer: no built-in LLM, no opinions about orchestration. Your existing AI client (Claude Code, Cursor, etc.) is the brain. taw-computer is a pure MCP server.

Is it safe? Can the AI break my system?

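Each sandbox is an isolated Docker container with its own filesystem, network, and process space, and containers have memory/CPU/PID limits. By default, nothing inside can read or write files on your host. Containers do share the host kernel, so treat this as strong isolation rather than an absolute guarantee. When you're done, destroy the VM.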
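
To show what memory, CPU, and PID limits look like in Docker terms, here is a sketch that translates them into `docker run` flags. `--memory`, `--cpus`, and `--pids-limit` are standard Docker CLI flags, but the function, the container name, and the PID limit value are assumptions for illustration; the server itself talks to the Docker API, where equivalent settings exist.

```typescript
// Sketch: translate resource limits into `docker run` arguments.
// The helper and the pidsLimit value are illustrative, not taken from the project.
interface Limits {
  memoryMb: number;
  cpus: number;
  pidsLimit: number;
}

function dockerRunArgs(image: string, name: string, limits: Limits): string[] {
  return [
    "run",
    "--detach",
    "--name", name,
    "--memory", `${limits.memoryMb}m`, // hard RAM cap for the container
    "--cpus", String(limits.cpus), // CPU share, may be fractional
    "--pids-limit", String(limits.pidsLimit), // caps process count (fork bombs)
    image,
  ];
}

const argv = dockerRunArgs("taw-computer-base", "sandbox-1", {
  memoryMb: 4096, // DOCKER_MEMORY_MB default
  cpus: 2, // DOCKER_CPUS default
  pidsLimit: 512, // assumed value; the README does not state one
});

console.log(["docker", ...argv].join(" "));
```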

Can I use it with GPT-4 / Gemini / local models?

Yes — any AI client that supports MCP can connect. The server doesn't care which LLM is behind the client.

Does it work on Mac / Windows / Linux?

Yes. Anywhere Docker runs, taw-computer runs. The sandbox image supports both arm64 (Apple Silicon) and amd64 (Intel/AMD).

Star History

If taw-computer is useful to you, give it a ⭐ — it helps others find it.

License

MIT — do whatever you want with it.
