An agentic Model Context Protocol (MCP) server for CUA Cloud - delegate desktop automation tasks to an autonomous vision-based agent. Images never leave the server; only text summaries are returned.
Production URL: https://cua-mcp-server.vercel.app/mcp
CUA (Computer Use Agent) provides cloud-based virtual machine sandboxes that AI agents can control. This MCP server exposes CUA's capabilities through a clean task-delegation API:
- Create and manage VMs (Linux, Windows, macOS)
- Delegate tasks - "Open Chrome and navigate to google.com"
- Get text summaries - No images in your context window
- Query screen state - Vision-based descriptions without taking action
Claude Code (Orchestrator)
│
│ run_task("Open Chrome and go to google.com")
▼
┌─────────────────────────────────────────────────────────────┐
│ CUA MCP Server (Agentic) │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ Internal Agent Loop │ │
│ │ 1. screenshot() → CUA sandbox │ │
│ │ 2. screenshot → Claude API (computer_use tool) │ │
│ │ 3. Claude returns: click(x,y) / type("text") / done │ │
│ │ 4. Execute action on sandbox │ │
│ │ 5. Loop until complete │ │
│ └───────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
▼
{ success: true, summary: "Opened Chrome...", steps_taken: 5 }
(TEXT ONLY - no images)
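The internal loop in the diagram can be sketched roughly as follows. This is a minimal illustration, not the code in lib/agent/execute.ts: the `Sandbox` interface, the `DecideNextAction` callback (a stand-in for the Claude API call), and the `AgentAction` shape are all hypothetical names introduced here. Here `steps_taken` counts loop iterations, which is also an assumption.

```typescript
// Hypothetical types for illustration; the project's real definitions are not shown in this README.
type AgentAction =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "done"; summary: string };

interface Sandbox {
  screenshot(): Promise<string>; // base64 image; never returned to the caller
  execute(action: AgentAction): Promise<void>;
}

// Stand-in for the Claude API call that uses the computer-use tool.
type DecideNextAction = (task: string, screenshot: string) => Promise<AgentAction>;

interface TaskResult {
  success: boolean;
  summary: string;
  steps_taken: number;
}

async function runAgentLoop(
  task: string,
  sandbox: Sandbox,
  decide: DecideNextAction,
  maxSteps: number = 100,
): Promise<TaskResult> {
  for (let step = 1; step <= maxSteps; step++) {
    const screenshot = await sandbox.screenshot(); // 1. capture the screen
    const action = await decide(task, screenshot); // 2-3. model picks the next action
    if (action.kind === "done") {
      // Only text leaves the loop; the screenshot is discarded.
      return { success: true, summary: action.summary, steps_taken: step };
    }
    await sandbox.execute(action); // 4. act on the sandbox, then 5. loop
  }
  return {
    success: false,
    summary: "Step limit reached before the task completed",
    steps_taken: maxSteps,
  };
}
```

Injecting the sandbox and the model call as parameters keeps the loop testable without a real VM or API key.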
api/mcp.ts # MCP protocol handler
lib/
├── agent/ # Modular agent architecture
│ ├── index.ts # Public exports
│ ├── types.ts # Type definitions
│ ├── config.ts # Model configurations
│ ├── execute.ts # Main agent loop
│ ├── describe.ts # Screen description
│ ├── progress.ts # Progress tracking
│ └── actions/ # Action handler registry (16 handlers)
├── cua-client.ts # CUA Cloud API client
└── tool-schemas.ts # MCP tool definitions
| Tool | Description |
|---|---|
| list_sandboxes | List all CUA cloud sandboxes with their current status |
| get_sandbox | Get details of a specific sandbox including API URLs |
| start_sandbox | Start a stopped sandbox |
| stop_sandbox | Stop a running sandbox |
| restart_sandbox | Restart a sandbox |
Note: Create and delete sandboxes via the CUA Dashboard - the Cloud API doesn't expose these operations.
| Tool | Description |
|---|---|
| describe_screen | Get a text description of current screen state using vision AI. No actions taken. |
| run_task | Execute a computer task autonomously. Returns immediately with task_id for polling. |
| get_task_progress | Poll progress of running tasks. Returns current step, last action, and reasoning. |
| get_task_history | Retrieve results of a previously executed task by ID. |
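The table describes run_task as returning a task_id for polling, so a caller might chain the three task tools along these lines. This is a sketch under stated assumptions: `CallTool` and `Sleep` are hypothetical injected helpers, the `task_id` argument name for get_task_progress and get_task_history is assumed, and so is a `status` field that reads "running" while the task is in flight. Check the real tool schemas before relying on any of these names.

```typescript
// Hypothetical transport: calls an MCP tool and returns its decoded JSON payload.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<Record<string, unknown>>;
// Hypothetical delay helper, injected so the sketch has no timer dependency.
type Sleep = (ms: number) => Promise<void>;

async function runTaskAndWait(
  callTool: CallTool,
  sleep: Sleep,
  sandboxName: string,
  task: string,
  pollIntervalMs: number = 2000,
  maxPolls: number = 375,
): Promise<Record<string, unknown>> {
  // Start the task; the response is expected to carry a task_id.
  const started = await callTool("run_task", { sandbox_name: sandboxName, task });
  const taskId = started["task_id"];
  if (typeof taskId !== "string") {
    throw new Error("run_task did not return a task_id");
  }

  for (let poll = 0; poll < maxPolls; poll++) {
    const progress = await callTool("get_task_progress", { task_id: taskId });
    // ASSUMPTION: a `status` field equal to "running" means the task is still in flight.
    if (progress["status"] !== "running") {
      // Fetch the stored final result by ID.
      return callTool("get_task_history", { task_id: taskId });
    }
    await sleep(pollIntervalMs);
  }
  throw new Error("Task " + taskId + " was still running after " + maxPolls + " polls");
}
```

The default of 375 polls at 2 seconds covers the 750-second default timeout; both numbers are illustrative.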
- Go to cua.ai/signin
- Navigate to Dashboard > API Keys > New API Key
- Copy your API key (starts with sk_cua-api01_...)
Add to your ~/.claude.json:
{
"mcpServers": {
"cua": {
"command": "npx",
"args": ["-y", "mcp-remote", "https://cua-mcp-server.vercel.app/mcp"]
}
}
}

You: "List my CUA sandboxes"
Claude: [Uses list_sandboxes tool]
You: "Start my-sandbox"
Claude: [Uses start_sandbox tool]
You: "Open Firefox and go to google.com on my-sandbox"
Claude: [Uses run_task with task="Open Firefox and navigate to google.com"]
→ Returns: { success: true, summary: "Opened Firefox, navigated to google.com", steps_taken: 4 }
You: "What's currently on the screen?"
Claude: [Uses describe_screen tool]
→ Returns: { description: "Firefox browser showing Google homepage with search box..." }
You: "On my-sandbox, open Chrome, go to github.com, and search for 'mcp server'"
Claude uses run_task:
- task: "Open Chrome browser, navigate to github.com, find the search box, type 'mcp server' and press Enter"
- Returns summary of what happened (no screenshots in your context)
You: "What's on the screen right now?"
Claude uses describe_screen:
- focus: "ui" (or "text" or "full")
- Returns text description of UI elements, buttons, text content
You: "Is there a login button visible?"
Claude uses describe_screen:
- question: "Is there a login button visible?"
- Returns: "Yes, there is a blue 'Sign In' button in the top right corner..."
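Outside a chat client, the same calls are plain JSON-RPC `tools/call` requests sent to the /mcp endpoint, and the tool's result comes back as a JSON string inside a text content item, so it has to be parsed a second time. The helpers below sketch both directions. `buildToolCall`, `describeScreenCall`, and `parseToolResult` are hypothetical names introduced here; the `error` member is the standard JSON-RPC 2.0 error object rather than anything specific to this server.

```typescript
interface ToolCallRequest {
  jsonrpc: "2.0";
  method: "tools/call";
  id: number;
  params: { name: string; arguments: Record<string, unknown> };
}

interface ToolCallResponse {
  jsonrpc: "2.0";
  id: number;
  result?: { content: { type: string; text: string }[] };
  error?: { code: number; message: string };
}

type Focus = "ui" | "text" | "full";

// Wrap a tool name and its arguments in a JSON-RPC envelope.
function buildToolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", method: "tools/call", id, params: { name, arguments: args } };
}

// Build a describe_screen request; focus and question are only sent when provided.
function describeScreenCall(
  id: number,
  sandboxName: string,
  options: { focus?: Focus; question?: string } = {},
): ToolCallRequest {
  const args: Record<string, unknown> = { sandbox_name: sandboxName };
  if (options.focus !== undefined) args["focus"] = options.focus;
  if (options.question !== undefined) args["question"] = options.question;
  return buildToolCall(id, "describe_screen", args);
}

// Decode the JSON payload carried in the first text content item.
function parseToolResult(response: ToolCallResponse): Record<string, unknown> {
  if (response.error) {
    throw new Error("Tool call failed: " + response.error.message);
  }
  const first = response.result?.content.find((item) => item.type === "text");
  if (!first) {
    throw new Error("Response contained no text content");
  }
  return JSON.parse(first.text) as Record<string, unknown>;
}
```

The request object can be serialized with `JSON.stringify` and sent with any HTTP client using `Content-Type: application/json`.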
- Vercel account with Pro plan (for 800s function timeout)
- Vercel Blob storage
- Anthropic API key
# Clone the repository
git clone https://github.com/anthropics/cua-mcp-server.git
cd cua-mcp-server
# Install dependencies
npm install
# Deploy to Vercel
vercel --prod

| Variable | Description | Required |
|---|---|---|
| CUA_API_KEY | Your CUA Cloud API key | Yes |
| ANTHROPIC_API_KEY | Anthropic API key for vision processing | Yes |
| BLOB_READ_WRITE_TOKEN | Vercel Blob token (auto-added) | Yes |
| CUA_API_BASE | Custom API base URL (default: https://api.cua.ai) | No |
| CUA_MODEL | Model to use: claude-opus-4-5 (default) or claude-sonnet-4-5 | No |
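A startup check for these variables might look like the sketch below. `loadConfig`, `ServerConfig`, and `Env` are hypothetical names, not the project's actual code. The environment is passed in as a plain object (for example `process.env`) so the function stays easy to test. Two choices here are assumptions: CUA_API_KEY is left optional because the key can instead arrive per request in the X-CUA-API-Key header, and an unrecognized CUA_MODEL is rejected rather than silently falling back to the default.

```typescript
type Env = Record<string, string | undefined>;

const SUPPORTED_MODELS = ["claude-opus-4-5", "claude-sonnet-4-5"] as const;
type ModelName = (typeof SUPPORTED_MODELS)[number];

interface ServerConfig {
  cuaApiKey: string | undefined; // may be supplied per request via header instead
  anthropicApiKey: string;
  blobToken: string;
  cuaApiBase: string;
  model: ModelName;
}

// Return the variable's value, or fail loudly when it is missing or empty.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (!value) {
    throw new Error("Missing required environment variable: " + name);
  }
  return value;
}

function loadConfig(env: Env): ServerConfig {
  const requested = env["CUA_MODEL"] ?? "claude-opus-4-5";
  const model = SUPPORTED_MODELS.find((m) => m === requested);
  if (!model) {
    // ASSUMPTION: reject unknown models instead of defaulting.
    throw new Error("Unsupported CUA_MODEL: " + requested);
  }
  return {
    cuaApiKey: env["CUA_API_KEY"],
    anthropicApiKey: requireVar(env, "ANTHROPIC_API_KEY"),
    blobToken: requireVar(env, "BLOB_READ_WRITE_TOKEN"),
    cuaApiBase: env["CUA_API_BASE"] ?? "https://api.cua.ai",
    model,
  };
}
```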
- Go to your Vercel project dashboard
- Navigate to Storage → Create → Blob
- The BLOB_READ_WRITE_TOKEN will be automatically added
If you don't want to store the CUA API key on the server:
{
"mcpServers": {
"cua": {
"command": "npx",
"args": [
"-y", "mcp-remote",
"https://your-deployment.vercel.app/mcp",
"--header", "X-CUA-API-Key: sk_cua-api01_your-key-here"
]
}
}
}

URL: POST /mcp
Content-Type: application/json
{
"jsonrpc": "2.0",
"method": "tools/call",
"id": 1,
"params": {
"name": "run_task",
"arguments": {
"sandbox_name": "s-linux-abc123",
"task": "Open Firefox and navigate to google.com",
"max_steps": 30,
"timeout_seconds": 120
}
}
}

Response:
{
"jsonrpc": "2.0",
"id": 1,
"result": {
"content": [{
"type": "text",
"text": "{\"task_id\":\"task_123...\",\"success\":true,\"summary\":\"Opened Firefox, navigated to google.com\",\"steps_taken\":4,\"duration_ms\":8500}"
}]
}
}

{
"jsonrpc": "2.0",
"method": "tools/call",
"id": 2,
"params": {
"name": "describe_screen",
"arguments": {
"sandbox_name": "s-linux-abc123",
"focus": "ui",
"question": "Is there a search box visible?"
}
}
}

| Model | Env Variable | Tool Version | Features |
|---|---|---|---|
| Claude Opus 4.5 (default) | CUA_MODEL=claude-opus-4-5 | computer_20251124 | Zoom support, higher accuracy |
| Claude Sonnet 4.5 | CUA_MODEL=claude-sonnet-4-5 | computer_20250124 | Faster, lower cost |
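One way to encode this table is a lookup keyed by model name, as in the sketch below. The names `MODEL_CONFIGS`, `getModelConfig`, and `isActionAvailable` are hypothetical and not taken from lib/agent/config.ts. The tool versions and the zoom restriction come from this README.

```typescript
type ModelName = "claude-opus-4-5" | "claude-sonnet-4-5";

interface ModelConfig {
  toolVersion: string;   // computer-use tool version sent to the Claude API
  supportsZoom: boolean; // whether the zoom action is offered to the model
}

const DEFAULT_MODEL: ModelName = "claude-opus-4-5";

const MODEL_CONFIGS: Record<ModelName, ModelConfig> = {
  "claude-opus-4-5": { toolVersion: "computer_20251124", supportsZoom: true },
  "claude-sonnet-4-5": { toolVersion: "computer_20250124", supportsZoom: false },
};

function getModelConfig(model: ModelName = DEFAULT_MODEL): ModelConfig {
  return MODEL_CONFIGS[model];
}

// zoom is the only action gated by model in this sketch.
function isActionAvailable(action: string, model: ModelName): boolean {
  return action !== "zoom" || MODEL_CONFIGS[model].supportsZoom;
}
```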
The agent can perform the following actions autonomously:
UI Actions:
- screenshot - Capture current screen
- left_click, right_click, double_click, triple_click, middle_click - Mouse clicks at coordinates
- mouse_move - Move cursor to coordinates
- left_click_drag - Click and drag from start to end coordinates
- left_mouse_down, left_mouse_up - Press/release mouse button
- scroll - Scroll up/down/left/right
- wait - Pause execution
- zoom - View specific screen region at full resolution (Opus 4.5 only, defaults to center if no coordinate)
Keyboard:
- type - Type text
- key - Press key or key combination (e.g., "ctrl+c")
- hold_key - Hold a modifier key down (auto-releases after next action)
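A handler registry for actions like these could be structured as a map from action name to an async function. The sketch below registers three of the sixteen actions. `ActionInput`, `SandboxDriver`, and `dispatch` are hypothetical names, and the driver methods are placeholders rather than the real CUA client API.

```typescript
// Hypothetical shapes for illustration only.
interface ActionInput {
  action: string;
  coordinate?: [number, number];
  text?: string;
}

interface SandboxDriver {
  click(x: number, y: number, button: "left" | "right" | "middle"): Promise<void>;
  typeText(text: string): Promise<void>;
  pressKeys(keys: string[]): Promise<void>;
}

// Each handler performs the action and returns a short text description of it.
type ActionHandler = (input: ActionInput, driver: SandboxDriver) => Promise<string>;

function requireCoordinate(input: ActionInput): [number, number] {
  if (!input.coordinate) throw new Error(input.action + " requires a coordinate");
  return input.coordinate;
}

function requireText(input: ActionInput): string {
  if (input.text === undefined) throw new Error(input.action + " requires text");
  return input.text;
}

const handlers = new Map<string, ActionHandler>();

handlers.set("left_click", async (input, driver) => {
  const [x, y] = requireCoordinate(input);
  await driver.click(x, y, "left");
  return "Clicked at (" + x + ", " + y + ")";
});

handlers.set("type", async (input, driver) => {
  const text = requireText(input);
  await driver.typeText(text);
  return "Typed " + text.length + " characters";
});

handlers.set("key", async (input, driver) => {
  // A combination such as "ctrl+c" is split into its individual keys.
  const keys = requireText(input).split("+");
  await driver.pressKeys(keys);
  return "Pressed " + keys.join("+");
});

// Look up the handler by name and run it; unknown actions are an error.
async function dispatch(input: ActionInput, driver: SandboxDriver): Promise<string> {
  const handler = handlers.get(input.action);
  if (!handler) throw new Error("No handler registered for action: " + input.action);
  return handler(input, driver);
}
```

Adding an action then means registering one more entry, with no change to the dispatch path.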
| Constraint | Value |
|---|---|
| Function timeout | 800 seconds (Vercel Pro) |
| Max steps per task | 100 |
| Default steps | 100 |
| Default timeout | 750 seconds |
| Task history TTL | 24 hours |
| Display resolution | Dynamic (default 1024x768) |
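The step and timeout limits in this table suggest that request arguments are normalized before a task starts. The sketch below shows one way to do that. `normalizeTaskOptions` and `LIMITS` are hypothetical names. Capping `timeout_seconds` at the 750-second default, so that it stays under the 800-second function timeout, is an assumption; the README does not say how larger values are handled.

```typescript
const LIMITS = {
  maxSteps: 100,
  defaultSteps: 100,
  defaultTimeoutSeconds: 750,
  functionTimeoutSeconds: 800, // Vercel Pro ceiling; shown for reference
} as const;

interface TaskOptions {
  max_steps?: number;
  timeout_seconds?: number;
}

function clamp(value: number, min: number, max: number): number {
  return Math.min(Math.max(value, min), max);
}

function normalizeTaskOptions(options: TaskOptions = {}): Required<TaskOptions> {
  const steps = options.max_steps ?? LIMITS.defaultSteps;
  const timeout = options.timeout_seconds ?? LIMITS.defaultTimeoutSeconds;
  return {
    // Documented ceiling: at most 100 steps per task.
    max_steps: clamp(Math.floor(steps), 1, LIMITS.maxSteps),
    // ASSUMPTION: never exceed the default, which already sits below the function timeout.
    timeout_seconds: clamp(timeout, 1, LIMITS.defaultTimeoutSeconds),
  };
}
```

With the arguments from the run_task request example (`max_steps: 30`, `timeout_seconds: 120`), both values pass through unchanged.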
| OS | Size | CPU | RAM | Use Case |
|---|---|---|---|---|
| Linux | small | 2 | 4GB | Development, testing |
| Linux | medium | 4 | 8GB | Build tasks, CI/CD |
| Linux | large | 8 | 16GB | Heavy workloads |
| Windows | small | 2 | 4GB | Basic Windows apps |
| Windows | medium | 4 | 8GB | Office, development |
| Windows | large | 8 | 16GB | Enterprise apps |
| macOS | small | 2 | 4GB | iOS development |
| macOS | medium | 4 | 8GB | Xcode builds |
| macOS | large | 8 | 16GB | Heavy compilation |
- north-america - US East (lowest latency for US users)
- europe - EU West
- asia - Asia Pacific
Set CUA_API_KEY environment variable in Vercel or pass via X-CUA-API-Key header.
The server needs an Anthropic API key for vision processing. Add it to your Vercel environment variables.
- Default timeout is 750 seconds
- Reduce task complexity or break into smaller steps
- Check if sandbox is responsive with describe_screen
- Default is 100 steps (max 100)
- Break complex tasks into smaller subtasks
- Use more specific task descriptions
MIT