Feat/agent mode and mcp by hi-lei · Pull Request #14 · verda-cloud/verda-cli

hi-lei · 2026-04-07T11:26:00Z

Summary

--agent mode: Global flag that forces JSON output, disables interactive prompts, and returns structured errors with semantic exit codes. Agents and scripts get deterministic, parseable behavior.
MCP server (verda mcp serve): Model Context Protocol server over stdio with 18 tools — discovery, cost estimation, VM lifecycle, SSH, and volumes. Works with Claude Code, Cursor, and any MCP-compatible agent. Instant handshake (<300ms), credentials deferred to first tool call.
Standardized error classification: ClassifyError() maps all errors to structured JSON with codes (AUTH_ERROR, API_ERROR, NOT_FOUND, MISSING_REQUIRED_FLAGS, etc.) and semantic exit codes.
Smart defaults in create_vm: Location auto-picked from available stock, OS volume defaults to 50GB, all SSH keys attached if none specified.
Integration test suite: 44 tests covering CLI, agent mode, and MCP server against staging API with credential profiles.
README restructured: MCP setup instructions, command reference moved to docs/commands.md.

MCP Tools (18)

| Category | Tools |
| --- | --- |
| Discovery | `list_locations`, `list_instance_types`, `check_availability`, `list_images`, `vm_availability` |
| Cost | `get_balance`, `estimate_cost`, `get_running_costs` |
| VM | `list_vms`, `describe_vm`, `create_vm`, `vm_action` |
| SSH | `list_ssh_keys`, `add_ssh_key`, `get_ssh_command` |
| Volume | `list_volumes`, `create_volume`, `list_volumes_in_trash` |

Test plan

Unit tests pass (go test ./...)
Lint clean (make lint)
Integration tests pass against staging (make test.integration)
MCP handshake < 5s
MCP tools tested from Cursor (live VM deploy, balance check, availability)
Agent mode structured errors for missing flags, auth errors, confirmation required

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add two features for AI agent integration with Verda CLI:

**--agent mode:**
- Global `--agent` flag (and `VERDA_AGENT=1` env var) that forces JSON output, disables interactive prompts, and returns structured errors
- AgentError type with JSON serialization and semantic exit codes
- Agent prompter that returns structured errors instead of blocking
- vm create: returns MISSING_REQUIRED_FLAGS error instead of wizard
- vm action: adds --action and --yes flags for non-interactive use

**MCP server (`verda mcp serve`):**
- Model Context Protocol server over stdio for Claude Code, Cursor, etc.
- 15 tools: discovery (locations, instance types, images, availability), cost (balance, estimate, running), VM lifecycle (create, list, describe, action), SSH (keys, ssh command), volumes (list, create, trash)
- Uses Verda Go SDK directly, reuses CLI credential resolution
- Configured via standard MCP client settings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
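To make the AgentError idea concrete, here is a minimal sketch of a structured error with JSON serialization and an exit code. The field names, the `"error"` wrapper key, the `missingFlags` helper, and the exit code value 2 are all assumptions for illustration; the PR text does not show the actual schema (docs/agent-errors.md is the authoritative reference).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AgentError is a hypothetical structured error for --agent mode.
// Field names and exit code values are assumptions, not the CLI's real schema.
type AgentError struct {
	Code     string         `json:"code"`
	Message  string         `json:"message"`
	Details  map[string]any `json:"details,omitempty"`
	ExitCode int            `json:"-"` // process exit status, not serialized
}

func (e *AgentError) Error() string { return e.Code + ": " + e.Message }

// JSON renders the error as one JSON object wrapped in an "error" key.
func (e *AgentError) JSON() string {
	b, err := json.Marshal(map[string]*AgentError{"error": e})
	if err != nil {
		// Fall back to a fixed payload so the caller always gets valid JSON.
		return `{"error":{"code":"ERROR","message":"failed to encode error"}}`
	}
	return string(b)
}

// missingFlags (hypothetical helper) builds the error a command could
// return instead of starting an interactive wizard.
func missingFlags(flags ...string) *AgentError {
	return &AgentError{
		Code:     "MISSING_REQUIRED_FLAGS",
		Message:  "missing required flags",
		Details:  map[string]any{"missing": flags},
		ExitCode: 2, // assumed value
	}
}

func main() {
	err := missingFlags("--action")
	fmt.Println(err.JSON()) // an agent-mode CLI would write this to stderr
	fmt.Println("exit code:", err.ExitCode)
}
```

Keeping the exit code out of the JSON body (`json:"-"`) is one possible design: the process status carries it, so the payload stays focused on what the agent needs to parse.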

Add ClassifyError() that maps all errors to structured AgentError types in --agent mode. Errors are classified by priority:

1. Already an AgentError (from explicit command checks)
2. SDK APIError → mapped by HTTP status (401→AUTH, 404→NOT_FOUND, 402→INSUFFICIENT_BALANCE)
3. SDK ValidationError → VALIDATION_ERROR with field/reason
4. Auth-related message heuristic → AUTH_ERROR
5. Fallback → generic ERROR

This ensures agents always receive JSON errors on stderr, never plain text. Also adds docs/agent-errors.md with the complete error format specification for developers and AI agents.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
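The priority order above can be sketched with `errors.As`. This is an illustration only: `APIError` and `ValidationError` here are stand-ins for the SDK's types, the keywords used for the auth heuristic are guesses, and mapping other HTTP statuses to API_ERROR is an assumption based on the code list in the PR summary.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// AgentError is a trimmed-down stand-in for the CLI's structured error.
type AgentError struct{ Code, Message string }

func (e *AgentError) Error() string { return e.Code + ": " + e.Message }

// APIError and ValidationError are hypothetical stand-ins for SDK error types.
type APIError struct {
	StatusCode int
	Message    string
}

func (e *APIError) Error() string { return fmt.Sprintf("api error %d: %s", e.StatusCode, e.Message) }

type ValidationError struct{ Field, Reason string }

func (e *ValidationError) Error() string { return e.Field + ": " + e.Reason }

// ClassifyError walks the priority list from most to least specific.
func ClassifyError(err error) *AgentError {
	if err == nil {
		return nil
	}
	var ae *AgentError
	if errors.As(err, &ae) { // 1. already classified
		return ae
	}
	var apiErr *APIError
	if errors.As(err, &apiErr) { // 2. map by HTTP status
		code := "API_ERROR" // assumed default for other statuses
		switch apiErr.StatusCode {
		case 401:
			code = "AUTH_ERROR"
		case 402:
			code = "INSUFFICIENT_BALANCE"
		case 404:
			code = "NOT_FOUND"
		}
		return &AgentError{Code: code, Message: apiErr.Message}
	}
	var ve *ValidationError
	if errors.As(err, &ve) { // 3. validation failures keep field and reason
		return &AgentError{Code: "VALIDATION_ERROR", Message: ve.Field + ": " + ve.Reason}
	}
	msg := strings.ToLower(err.Error())
	if strings.Contains(msg, "unauthorized") || strings.Contains(msg, "credentials") {
		return &AgentError{Code: "AUTH_ERROR", Message: err.Error()} // 4. heuristic (keywords assumed)
	}
	return &AgentError{Code: "ERROR", Message: err.Error()} // 5. fallback
}

func main() {
	wrapped := fmt.Errorf("describe vm: %w", &APIError{StatusCode: 404, Message: "instance not found"})
	fmt.Println(ClassifyError(wrapped).Code)
	fmt.Println(ClassifyError(errors.New("invalid client credentials")).Code)
	fmt.Println(ClassifyError(errors.New("connection reset")).Code)
}
```

Using `errors.As` rather than a type switch means wrapped errors (for example via `fmt.Errorf` with `%w`) are still classified by their underlying type.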

Move MCP server code from internal/verda-cli/mcp/ into internal/verda-cli/cmd/mcp/ to match the existing convention where command logic lives alongside Cobra commands. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

MCP clients (Cursor, Claude Code) have a ~10s timeout for the server handshake. Previously the server authenticated with the Verda API during startup, which could exceed this timeout. Now the MCP server starts instantly and defers client creation to the first tool call via NewLazyServer(). The handshake completes in milliseconds; auth happens only when a tool is actually invoked. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

PersistentPreRunE runs opts.Complete() for every command, which resolves credentials from ~/.verda/credentials and can be slow (network calls for token exchange). This caused MCP clients like Cursor to time out during the handshake. Now mcp serve skips opts.Complete() at startup and defers credential resolution to the lazy client func called on first tool invocation. The handshake completes in ~0.3s instead of 14+s. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
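The deferred-client pattern described in the two commits above can be sketched with `sync.Once`. The names `lazyClient`, `newLazyClient`, and `Client` are hypothetical; the PR mentions `NewLazyServer()` but does not show its signature, so this only illustrates the general technique.

```go
package main

import (
	"fmt"
	"sync"
)

// Client is a hypothetical stand-in for the authenticated API client.
type Client struct{ Profile string }

// lazyClient defers an expensive build step until the first Get call
// and reuses the result (including any error) afterwards.
type lazyClient struct {
	once   sync.Once
	build  func() (*Client, error)
	client *Client
	err    error
}

func newLazyClient(build func() (*Client, error)) *lazyClient {
	return &lazyClient{build: build}
}

// Get runs the build function at most once, even under concurrent calls.
func (l *lazyClient) Get() (*Client, error) {
	l.once.Do(func() { l.client, l.err = l.build() })
	return l.client, l.err
}

func main() {
	builds := 0
	lc := newLazyClient(func() (*Client, error) {
		builds++ // stands in for slow credential resolution and token exchange
		return &Client{Profile: "test"}, nil
	})
	fmt.Println("handshake done, builds:", builds)
	for i := 0; i < 2; i++ {
		if _, err := lc.Get(); err != nil {
			fmt.Println("tool call failed:", err)
			return
		}
	}
	fmt.Println("after two tool calls, builds:", builds)
}
```

One trade-off of this particular sketch: `sync.Once` also caches a failed build, so a transient auth failure would persist until the server restarts. An implementation that should retry would need a mutex and a nil check instead.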


Integration tests run the actual verda binary against a real API using credential profiles configured in ~/.verda/credentials:

- [test]: valid staging credentials
- [test-invalid]: wrong client_id/secret
- [test-empty]: no client_id/secret

Test coverage:
- Auth: show with valid/invalid profiles, agent mode
- Discovery: locations, instance-types (all/gpu/cpu), availability, images
- Cost: balance, estimate (with/without storage), running, invalid type
- VM: list, list with status filter, describe invalid ID, SSH keys, volumes
- Agent mode: forced JSON output, MISSING_REQUIRED_FLAGS for vm create/action, CONFIRMATION_REQUIRED for destructive actions, auth errors
- MCP: handshake speed (<5s), tools/list (17 tools), tool calls (list_locations, get_balance, estimate_cost, describe_vm invalid ID), auth errors (no creds, invalid creds), missing required args

Run with: make test.integration

Guarded by //go:build integration so regular go test skips them. Added !tests/ negation to .gitignore (overrides *tests metasyntactic rule).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Fix MCP test client: capture stderr, better error messages on failure
- Fix requireProfile: use locations call to verify both creds and API
- Fix agent mode tests: require profile before testing, better assertions
- Use 1A6000.10V instance type for staging tests
- Add MCP test helpers: assertToolSuccess, assertToolError, extractToolText
- Add MCP tests for list_instance_types, list_ssh_keys, list_volumes, list_vms

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Makefile test.integration defaults VERDA_BIN to $PWD/bin/verda so tests always use the local build, not the system verda
- Agent mode auth error tests skip gracefully when binary times out (no stderr output) instead of failing with parse errors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… lookup The /instance-types/{type} endpoint 404s on staging. Match the CLI's approach: fetch all types via /instance-types and filter client-side. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Makefile uses $(CURDIR)/bin/verda as default VERDA_BIN
- MCP estimate_cost fetches all instance types and filters client-side (matches CLI approach, avoids 404 on per-type endpoint)
- Use 1A6000.10V instance type for staging integration tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


New MCP tool that combines availability + specs + pricing in one call. Returns only instance types currently in stock, sorted by price. Filters: location, instance_type, gpu_only, cpu_only, spot. This replaces the need to chain check_availability + list_instance_types + estimate_cost — agents can answer "what can I deploy?" with one call. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
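As an illustration of what "in stock, sorted by price, with filters" could look like, here is a small sketch. The `Offer` and `Filter` types, the `availableOffers` function, the location codes, prices, and all type names other than 1A6000.10V are hypothetical, and the spot semantics (return only offers whose spot flag equals the filter) is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// Offer is a hypothetical merged record of availability, specs, and pricing.
type Offer struct {
	InstanceType string
	Location     string
	GPUs         int
	PricePerHour float64
	Spot         bool
	InStock      bool
}

// Filter mirrors the tool's filter names; empty strings mean "any".
type Filter struct {
	Location, InstanceType string
	GPUOnly, CPUOnly, Spot bool
}

// availableOffers keeps in-stock offers that pass the filter, cheapest first.
func availableOffers(offers []Offer, f Filter) []Offer {
	out := make([]Offer, 0, len(offers))
	for _, o := range offers {
		switch {
		case !o.InStock,
			f.Location != "" && o.Location != f.Location,
			f.InstanceType != "" && o.InstanceType != f.InstanceType,
			f.GPUOnly && o.GPUs == 0,
			f.CPUOnly && o.GPUs > 0,
			o.Spot != f.Spot: // assumed semantics for the spot filter
			continue
		}
		out = append(out, o)
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].PricePerHour < out[j].PricePerHour })
	return out
}

func main() {
	offers := []Offer{
		{InstanceType: "1A6000.10V", Location: "LOC-A", GPUs: 1, PricePerHour: 1.20, InStock: true},
		{InstanceType: "8GPU.example", Location: "LOC-A", GPUs: 8, PricePerHour: 9.50, InStock: false},
		{InstanceType: "CPU.example", Location: "LOC-B", GPUs: 0, PricePerHour: 0.10, InStock: true},
		{InstanceType: "2GPU.example", Location: "LOC-B", GPUs: 2, PricePerHour: 0.90, InStock: true},
	}
	for _, o := range availableOffers(offers, Filter{GPUOnly: true}) {
		fmt.Printf("%s in %s at %.2f/h\n", o.InstanceType, o.Location, o.PricePerHour)
	}
}
```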

- list_ssh_keys: add 'search' param for name filtering
- create_vm: ssh_key_ids now accepts names or IDs, resolved automatically (e.g. 'meng' matches 'meng@datacrunch.io')
- Add resolveSSHKeyIDs helper for name-to-ID resolution
- Add stderr startup logs for MCP server debugging

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
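One way the name-to-ID resolution could work is sketched below. Only the helper name `resolveSSHKeyIDs` and the 'meng' example come from the commit message; the signature, the `SSHKey` type, the `resolveOne` helper, the key IDs, the case-insensitive substring rule, and the choice to reject ambiguous names are all assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// SSHKey is a hypothetical, minimal view of an account SSH key.
type SSHKey struct{ ID, Name string }

// resolveSSHKeyIDs turns a mix of IDs and names into IDs.
func resolveSSHKeyIDs(refs []string, keys []SSHKey) ([]string, error) {
	ids := make([]string, 0, len(refs))
	for _, ref := range refs {
		id, err := resolveOne(ref, keys)
		if err != nil {
			return nil, err
		}
		ids = append(ids, id)
	}
	return ids, nil
}

// resolveOne prefers an exact ID match, then a unique substring match on the name.
func resolveOne(ref string, keys []SSHKey) (string, error) {
	for _, k := range keys {
		if k.ID == ref {
			return k.ID, nil
		}
	}
	needle := strings.ToLower(ref)
	var matches []SSHKey
	for _, k := range keys {
		if strings.Contains(strings.ToLower(k.Name), needle) {
			matches = append(matches, k)
		}
	}
	switch len(matches) {
	case 1:
		return matches[0].ID, nil
	case 0:
		return "", fmt.Errorf("no SSH key matches %q", ref)
	default:
		return "", fmt.Errorf("%q is ambiguous: matches %d keys", ref, len(matches))
	}
}

func main() {
	keys := []SSHKey{{ID: "k-1", Name: "meng@datacrunch.io"}, {ID: "k-2", Name: "ci-deploy"}}
	ids, err := resolveSSHKeyIDs([]string{"meng", "k-2"}, keys)
	fmt.Println(ids, err)
}
```

Rejecting ambiguous names, rather than picking the first match, keeps the tool from silently attaching the wrong key; whether the actual helper does this is not stated in the PR.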

- Location is now optional: if omitted, automatically finds a location with stock for the requested instance type
- SSH keys are now optional: if omitted, uses the most recent SSH key in the account as default
- Reduces the number of tool calls needed for a deploy from 3-4 to 1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When create_vm is called without ssh_key_ids, return the list of available SSH keys so the agent can ask the user which one to use, instead of silently picking the most recent key. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…dance Tool description now lists what info to gather before calling: instance_type, image, hostname, ssh_key_ids, os_volume_size_gb. Guides agent to use vm_availability and list_images first. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Return an MCP error with explicit instruction to ASK THE USER when no SSH key is provided to create_vm. Lists available key names in the error message so the agent can present choices. Previously returned data that agents interpreted as informational and silently picked a key. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

MCP tools are non-interactive -- they can't pause to ask the user. By making these fields required in the schema, the agent (Cursor, Claude Code) is forced to gather the info from the user before calling the tool. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

If user doesn't mention SSH keys, attach all account keys so the VM is accessible with any of their keys. If user specifies a name or ID, only that key is attached. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When no SSH key is specified in create_vm, the response now includes a note listing which keys were attached, e.g.: "No SSH key specified — attached all 3 account keys: meng@datacrunch.io, ..." This helps the agent inform the user what happened. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- os_volume_size_gb defaults to 50 if not provided
- Updated tool description: only instance_type, image, hostname are truly required; ssh_key_ids, os_volume_size_gb, location all have sensible defaults

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
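Taken together, the create_vm defaults described in the commits above (50 GB OS volume, all account keys when none are given, location picked from stock) could be applied in one step. The sketch below is hypothetical throughout: `CreateVMInput`, `LocationStock`, `applyDefaults`, the location codes, the key IDs, and the note wording are invented for illustration, and "first location with stock" is an assumed selection rule.

```go
package main

import (
	"fmt"
	"strings"
)

const defaultOSVolumeGB = 50

// CreateVMInput is a hypothetical view of the create_vm arguments.
type CreateVMInput struct {
	InstanceType, Image, Hostname, Location string
	SSHKeyIDs                               []string
	OSVolumeSizeGB                          int
}

// LocationStock lists which instance types a location currently has in stock.
type LocationStock struct {
	Location      string
	InstanceTypes []string
}

// applyDefaults fills optional fields and returns a note describing key attachment.
func applyDefaults(in CreateVMInput, stock []LocationStock, accountKeyIDs []string) (CreateVMInput, string, error) {
	if in.OSVolumeSizeGB == 0 {
		in.OSVolumeSizeGB = defaultOSVolumeGB
	}
	note := ""
	if len(in.SSHKeyIDs) == 0 {
		in.SSHKeyIDs = append([]string(nil), accountKeyIDs...) // copy, do not alias
		note = fmt.Sprintf("no SSH key specified, attached all %d account keys: %s",
			len(accountKeyIDs), strings.Join(accountKeyIDs, ", "))
	}
	if in.Location == "" {
		for _, s := range stock {
			for _, t := range s.InstanceTypes {
				if t == in.InstanceType {
					in.Location = s.Location // first location with stock wins (assumed rule)
					return in, note, nil
				}
			}
		}
		return in, note, fmt.Errorf("no location has %s in stock", in.InstanceType)
	}
	return in, note, nil
}

func main() {
	in := CreateVMInput{InstanceType: "1A6000.10V", Image: "ubuntu", Hostname: "demo"}
	stock := []LocationStock{
		{Location: "LOC-A", InstanceTypes: []string{"CPU.example"}},
		{Location: "LOC-B", InstanceTypes: []string{"1A6000.10V"}},
	}
	out, note, err := applyDefaults(in, stock, []string{"k-1", "k-2"})
	fmt.Println(out.Location, out.OSVolumeSizeGB, out.SSHKeyIDs, err)
	fmt.Println(note)
}
```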

- README: streamlined with install, getting started, MCP setup, agent mode, and links to detailed docs
- docs/commands.md: full command reference moved from README
- Keep gif at top for visual impact (interactive + non-interactive modes)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

IsAgentError only matches *AgentError types, but SDK errors (auth, API) are plain errors. Check opts.Agent to ensure all errors in --agent mode are classified and output as structured JSON. Also fix perfsprint lint in credentials.go. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


hi-lei and others added 28 commits April 6, 2026 16:01

chore: add .worktrees/ to .gitignore

6907bf1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chore: copy build to /usr/local/bin/verda-test during integration tests

b22ee74

Keeps the Cursor MCP server binary up to date with the latest build. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: mark MCP and agent mode as beta

8296bec

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: rewrite MCP examples from customer perspective

9da1198

Focus on what GPU customers actually care about: availability, pricing, deployment, and cost monitoring. Remove SSH example. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: pre-allocate slices to satisfy prealloc linter

41d9280

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

hi-lei merged commit a5c57bc into main Apr 7, 2026
13 checks passed

hi-lei deleted the feat/agent-mode-and-mcp branch April 7, 2026 11:33
