Feat/agent mode and mcp #14

Merged
hi-lei merged 28 commits into main from feat/agent-mode-and-mcp on Apr 7, 2026

hi-lei commented Apr 7, 2026

Summary

  • --agent mode: Global flag that forces JSON output, disables interactive prompts, and returns structured errors with semantic exit codes. Agents and scripts get deterministic, parseable behavior.

  • MCP server (verda mcp serve): Model Context Protocol server over stdio with 18 tools covering discovery, cost estimation, VM lifecycle, SSH, and volumes. Works with Claude Code, Cursor, and any MCP-compatible agent. The handshake completes in about 0.3s because credential resolution is deferred to the first tool call.

  • Standardized error classification: ClassifyError() maps all errors to structured JSON with codes (AUTH_ERROR, API_ERROR, NOT_FOUND, MISSING_REQUIRED_FLAGS, etc.) and semantic exit codes.

  • Smart defaults in create_vm: Location auto-picked from available stock, OS volume defaults to 50GB, all SSH keys attached if none specified.

  • Integration test suite: 44 tests covering CLI, agent mode, and MCP server against staging API with credential profiles.

  • README restructured: MCP setup instructions, command reference moved to docs/commands.md.

MCP Tools (18)

| Category | Tools |
| --- | --- |
| Discovery | list_locations, list_instance_types, check_availability, list_images, vm_availability |
| Cost | get_balance, estimate_cost, get_running_costs |
| VM | list_vms, describe_vm, create_vm, vm_action |
| SSH | list_ssh_keys, add_ssh_key, get_ssh_command |
| Volume | list_volumes, create_volume, list_volumes_in_trash |

Test plan

  • Unit tests pass (go test ./...)
  • Lint clean (make lint)
  • Integration tests pass against staging (make test.integration)
  • MCP handshake < 5s
  • MCP tools tested from Cursor (live VM deploy, balance check, availability)
  • Agent mode structured errors for missing flags, auth errors, confirmation required

🤖 Generated with Claude Code

hi-lei and others added 28 commits April 6, 2026 16:01
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add two features for AI agent integration with Verda CLI:

**--agent mode:**
- Global `--agent` flag (and `VERDA_AGENT=1` env var) that forces JSON
  output, disables interactive prompts, and returns structured errors
- AgentError type with JSON serialization and semantic exit codes
- Agent prompter that returns structured errors instead of blocking
- vm create: returns MISSING_REQUIRED_FLAGS error instead of wizard
- vm action: adds --action and --yes flags for non-interactive use
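As an illustration of the structured error described above, here is a minimal sketch of an error type that serializes to JSON and carries an exit code. The field names, the JSON keys, and the exit-code value are assumptions made for the example; the actual format is the one specified in docs/agent-errors.md.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// AgentError is a hypothetical shape for a structured error. The field
// names and JSON keys are assumptions, not the CLI's actual schema.
type AgentError struct {
	Code     string `json:"code"`
	Message  string `json:"message"`
	ExitCode int    `json:"exit_code"`
}

// Error implements the error interface.
func (e *AgentError) Error() string {
	return fmt.Sprintf("%s: %s", e.Code, e.Message)
}

// JSON renders the error as a single-line JSON document.
func (e *AgentError) JSON() string {
	b, err := json.Marshal(e)
	if err != nil {
		return `{"code":"ERROR","message":"failed to encode error","exit_code":1}`
	}
	return string(b)
}

// missingFlags builds the kind of error a non-interactive "vm create"
// could return instead of launching the wizard.
func missingFlags(flags ...string) *AgentError {
	return &AgentError{
		Code:     "MISSING_REQUIRED_FLAGS",
		Message:  fmt.Sprintf("missing required flags: %v", flags),
		ExitCode: 2, // assumed value, chosen only for the example
	}
}

func main() {
	err := missingFlags("--instance-type", "--image")
	// Structured errors go to stderr so stdout stays parseable.
	fmt.Fprintln(os.Stderr, err.JSON())
	// A real CLI would finish with os.Exit(err.ExitCode).
}
```

Keeping the exit code on the error value lets a single top-level handler both print the JSON and choose the process exit status.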

**MCP server (`verda mcp serve`):**
- Model Context Protocol server over stdio for Claude Code, Cursor, etc.
- 15 tools: discovery (locations, instance types, images, availability),
  cost (balance, estimate, running), VM lifecycle (create, list, describe,
  action), SSH (keys, ssh command), volumes (list, create, trash)
- Uses Verda Go SDK directly, reuses CLI credential resolution
- Configured via standard MCP client settings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add ClassifyError() that maps all errors to structured AgentError types
in --agent mode. Errors are classified by priority:

1. Already an AgentError (from explicit command checks)
2. SDK APIError → mapped by HTTP status (401→AUTH, 404→NOT_FOUND, 402→INSUFFICIENT_BALANCE)
3. SDK ValidationError → VALIDATION_ERROR with field/reason
4. Auth-related message heuristic → AUTH_ERROR
5. Fallback → generic ERROR

This ensures agents always receive JSON errors on stderr, never plain text.

Also adds docs/agent-errors.md with the complete error format specification
for developers and AI agents.
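The priority order above could be sketched roughly as follows. The `APIError` and `ValidationError` types here are stand-ins for the SDK's types, the `classify` function is a hypothetical simplification of ClassifyError(), and the keyword list in the heuristic step is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Stand-in types: the real SDK and CLI types may have different fields.
type AgentError struct {
	Code    string
	Message string
}

func (e *AgentError) Error() string { return e.Code + ": " + e.Message }

type APIError struct {
	StatusCode int
	Message    string
}

func (e *APIError) Error() string {
	return fmt.Sprintf("api error %d: %s", e.StatusCode, e.Message)
}

type ValidationError struct {
	Field  string
	Reason string
}

func (e *ValidationError) Error() string { return e.Field + ": " + e.Reason }

// classify maps any error to an AgentError, checking the most specific
// cases first and falling back to a generic code.
func classify(err error) *AgentError {
	if err == nil {
		return nil
	}
	// 1. Already classified by an explicit command check.
	var agentErr *AgentError
	if errors.As(err, &agentErr) {
		return agentErr
	}
	// 2. API errors are mapped by HTTP status.
	var apiErr *APIError
	if errors.As(err, &apiErr) {
		switch apiErr.StatusCode {
		case 401:
			return &AgentError{Code: "AUTH_ERROR", Message: apiErr.Message}
		case 402:
			return &AgentError{Code: "INSUFFICIENT_BALANCE", Message: apiErr.Message}
		case 404:
			return &AgentError{Code: "NOT_FOUND", Message: apiErr.Message}
		default:
			return &AgentError{Code: "API_ERROR", Message: apiErr.Message}
		}
	}
	// 3. Validation errors keep the field and reason.
	var valErr *ValidationError
	if errors.As(err, &valErr) {
		return &AgentError{Code: "VALIDATION_ERROR", Message: valErr.Field + ": " + valErr.Reason}
	}
	// 4. Message heuristic for auth failures (keywords are assumptions).
	msg := strings.ToLower(err.Error())
	if strings.Contains(msg, "unauthorized") || strings.Contains(msg, "credentials") {
		return &AgentError{Code: "AUTH_ERROR", Message: err.Error()}
	}
	// 5. Fallback.
	return &AgentError{Code: "ERROR", Message: err.Error()}
}

func main() {
	err := fmt.Errorf("describe vm: %w", &APIError{StatusCode: 404, Message: "instance not found"})
	fmt.Println(classify(err).Code) // prints NOT_FOUND
}
```

Using `errors.As` rather than a type switch means wrapped errors are still classified by their underlying type.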

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move MCP server code from internal/verda-cli/mcp/ into
internal/verda-cli/cmd/mcp/ to match the existing convention
where command logic lives alongside Cobra commands.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MCP clients (Cursor, Claude Code) have a ~10s timeout for the server
handshake. Previously the server authenticated with the Verda API
during startup, which could exceed this timeout.

Now the MCP server starts instantly and defers client creation to the
first tool call via NewLazyServer(). The handshake completes in
milliseconds; auth happens only when a tool is actually invoked.
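One common way to get this behavior is to wrap client construction in `sync.Once`. The sketch below assumes a hypothetical `Client` type and `newLazyClient` helper; it is not the body of NewLazyServer(), only an illustration of deferring an expensive step until first use.

```go
package main

import (
	"fmt"
	"sync"
)

// Client stands in for the SDK client; its fields are assumptions.
type Client struct{ Token string }

// lazyClient builds the client at most once, on first use.
type lazyClient struct {
	once    sync.Once
	factory func() (*Client, error)
	client  *Client
	err     error
}

func newLazyClient(factory func() (*Client, error)) *lazyClient {
	return &lazyClient{factory: factory}
}

// Get runs the factory on the first call and caches its result,
// including any error, for all later calls.
func (l *lazyClient) Get() (*Client, error) {
	l.once.Do(func() {
		l.client, l.err = l.factory()
	})
	return l.client, l.err
}

func main() {
	calls := 0
	lc := newLazyClient(func() (*Client, error) {
		calls++ // a real factory would resolve credentials here
		return &Client{Token: "example"}, nil
	})

	// The handshake happens without touching the factory.
	fmt.Println("handshake done, factory calls:", calls) // 0

	// Each tool handler asks for the client when it runs.
	_, _ = lc.Get()
	_, _ = lc.Get()
	fmt.Println("after two tool calls, factory calls:", calls) // 1
}
```

Note that this sketch also caches a failed attempt, so an auth error on the first tool call is returned on every later call; a server that should retry after the user fixes credentials would need a mutex-guarded variant instead.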

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PersistentPreRunE runs opts.Complete() for every command, which
resolves credentials from ~/.verda/credentials and can be slow
(network calls for token exchange). This caused MCP clients like
Cursor to time out during the handshake.

Now mcp serve skips opts.Complete() at startup and defers credential
resolution to the lazy client func called on first tool invocation.
The handshake completes in ~0.3s instead of 14+s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Integration tests run the actual verda binary against a real API using
credential profiles configured in ~/.verda/credentials:

  [test]          — valid staging credentials
  [test-invalid]  — wrong client_id/secret
  [test-empty]    — no client_id/secret

Test coverage:
- Auth: show with valid/invalid profiles, agent mode
- Discovery: locations, instance-types (all/gpu/cpu), availability, images
- Cost: balance, estimate (with/without storage), running, invalid type
- VM: list, list with status filter, describe invalid ID, SSH keys, volumes
- Agent mode: forced JSON output, MISSING_REQUIRED_FLAGS for vm create/action,
  CONFIRMATION_REQUIRED for destructive actions, auth errors
- MCP: handshake speed (<5s), tools/list (17 tools), tool calls
  (list_locations, get_balance, estimate_cost, describe_vm invalid ID),
  auth errors (no creds, invalid creds), missing required args

Run with: make test.integration
Guarded by //go:build integration so regular `go test` skips them.
Added a !tests/ negation to .gitignore (overrides the *tests ignore rule).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix MCP test client: capture stderr, better error messages on failure
- Fix requireProfile: use locations call to verify both creds and API
- Fix agent mode tests: require profile before testing, better assertions
- Use 1A6000.10V instance type for staging tests
- Add MCP test helpers: assertToolSuccess, assertToolError, extractToolText
- Add MCP tests for list_instance_types, list_ssh_keys, list_volumes, list_vms

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Makefile test.integration defaults VERDA_BIN to $PWD/bin/verda
  so tests always use the local build, not the system verda
- Agent mode auth error tests skip gracefully when binary times out
  (no stderr output) instead of failing with parse errors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… lookup

The /instance-types/{type} endpoint 404s on staging. Match the CLI's
approach: fetch all types via /instance-types and filter client-side.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Makefile uses $(CURDIR)/bin/verda as default VERDA_BIN
- MCP estimate_cost fetches all instance types and filters client-side
  (matches CLI approach, avoids 404 on per-type endpoint)
- Use 1A6000.10V instance type for staging integration tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Keeps the Cursor MCP server binary up to date with the latest build.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New MCP tool that combines availability + specs + pricing in one call.
Returns only instance types currently in stock, sorted by price.

Filters: location, instance_type, gpu_only, cpu_only, spot.

This replaces the need to chain check_availability + list_instance_types
+ estimate_cost — agents can answer "what can I deploy?" with one call.
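A rough sketch of the filter-then-sort step, assuming the availability, spec, and pricing data have already been joined into one record per offer. The `Offer` and `Filter` types are hypothetical, and only three of the listed filters (location, gpu_only, cpu_only) are modeled.

```go
package main

import (
	"fmt"
	"sort"
)

// Offer is a hypothetical joined record: one instance type in one location.
type Offer struct {
	InstanceType string
	Location     string
	GPUs         int
	PricePerHour float64
	InStock      bool
}

// Filter models a subset of the tool's filters.
type Filter struct {
	Location string // empty means any location
	GPUOnly  bool
	CPUOnly  bool
}

// available keeps in-stock offers that pass the filter, cheapest first.
func available(offers []Offer, f Filter) []Offer {
	out := make([]Offer, 0, len(offers))
	for _, o := range offers {
		if !o.InStock {
			continue
		}
		if f.Location != "" && o.Location != f.Location {
			continue
		}
		if f.GPUOnly && o.GPUs == 0 {
			continue
		}
		if f.CPUOnly && o.GPUs > 0 {
			continue
		}
		out = append(out, o)
	}
	// Stable sort keeps the input order for offers with equal prices.
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].PricePerHour < out[j].PricePerHour
	})
	return out
}

func main() {
	// Placeholder data; names, locations, and prices are made up.
	offers := []Offer{
		{InstanceType: "gpu-large", Location: "LOC-A", GPUs: 8, PricePerHour: 9.0, InStock: true},
		{InstanceType: "gpu-small", Location: "LOC-A", GPUs: 1, PricePerHour: 1.2, InStock: true},
		{InstanceType: "gpu-mid", Location: "LOC-B", GPUs: 4, PricePerHour: 4.5, InStock: false},
	}
	for _, o := range available(offers, Filter{GPUOnly: true}) {
		fmt.Printf("%s in %s at %.2f per hour\n", o.InstanceType, o.Location, o.PricePerHour)
	}
}
```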

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- list_ssh_keys: add 'search' param for name filtering
- create_vm: ssh_key_ids now accepts names or IDs, resolved automatically
  (e.g. 'meng' matches 'meng@datacrunch.io')
- Add resolveSSHKeyIDs helper for name-to-ID resolution
- Add stderr startup logs for MCP server debugging
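Name-to-ID resolution might look like the sketch below. The commit only states that 'meng' matches 'meng@datacrunch.io'; the exact rule used here (exact ID first, then case-insensitive substring on the name, with ambiguous matches rejected) is an assumption, as are the `SSHKey` type and the helper's signature.

```go
package main

import (
	"fmt"
	"strings"
)

// SSHKey holds the two fields the resolver needs; the SDK type is
// assumed to have more.
type SSHKey struct {
	ID   string
	Name string
}

// resolveSSHKeyIDs turns a mix of key IDs and key names into IDs.
func resolveSSHKeyIDs(keys []SSHKey, refs []string) ([]string, error) {
	ids := make([]string, 0, len(refs))
	for _, ref := range refs {
		id, err := resolveOne(keys, ref)
		if err != nil {
			return nil, err
		}
		ids = append(ids, id)
	}
	return ids, nil
}

// resolveOne prefers an exact ID match, then a unique name match.
func resolveOne(keys []SSHKey, ref string) (string, error) {
	for _, k := range keys {
		if k.ID == ref {
			return k.ID, nil
		}
	}
	needle := strings.ToLower(ref)
	var matches []SSHKey
	for _, k := range keys {
		if strings.Contains(strings.ToLower(k.Name), needle) {
			matches = append(matches, k)
		}
	}
	switch len(matches) {
	case 1:
		return matches[0].ID, nil
	case 0:
		return "", fmt.Errorf("no SSH key matches %q", ref)
	default:
		return "", fmt.Errorf("%q is ambiguous: it matches %d keys", ref, len(matches))
	}
}

func main() {
	// Placeholder IDs; the key name is the one from the commit message.
	keys := []SSHKey{
		{ID: "k1", Name: "meng@datacrunch.io"},
		{ID: "k2", Name: "ci-bot"},
	}
	ids, err := resolveSSHKeyIDs(keys, []string{"meng", "k2"})
	fmt.Println(ids, err) // [k1 k2] <nil>
}
```

Rejecting ambiguous names rather than picking the first match keeps the agent from attaching an unintended key.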

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Location is now optional: if omitted, automatically finds a location
  with stock for the requested instance type
- SSH keys are now optional: if omitted, uses the most recent SSH key
  in the account as default
- Reduces the number of tool calls needed for a deploy from 3-4 to 1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When create_vm is called without ssh_key_ids, return the list of
available SSH keys so the agent can ask the user which one to use,
instead of silently picking the most recent key.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…dance

Tool description now lists what info to gather before calling:
instance_type, image, hostname, ssh_key_ids, os_volume_size_gb.
Guides agent to use vm_availability and list_images first.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Return an MCP error with explicit instruction to ASK THE USER when
no SSH key is provided to create_vm. Lists available key names in
the error message so the agent can present choices.

Previously returned data that agents interpreted as informational
and silently picked a key.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MCP tools are non-interactive -- they can't pause to ask the user.
By making these fields required in the schema, the agent (Cursor,
Claude Code) is forced to gather the info from the user before
calling the tool.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
If user doesn't mention SSH keys, attach all account keys so the VM
is accessible with any of their keys. If user specifies a name or ID,
only that key is attached.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When no SSH key is specified in create_vm, the response now includes
a note listing which keys were attached, e.g.:
"No SSH key specified — attached all 3 account keys: meng@datacrunch.io, ..."

This helps the agent inform the user what happened.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- os_volume_size_gb defaults to 50 if not provided
- Updated tool description: only instance_type, image, hostname are
  truly required; ssh_key_ids, os_volume_size_gb, location all have
  sensible defaults
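Taken together, the defaults for create_vm could be applied in one step before the API call. This is a sketch under assumptions: the `CreateVMArgs` struct, the `applyDefaults` helper, and the shape of the stock lookup (instance type mapped to locations that currently have stock) are all hypothetical.

```go
package main

import "fmt"

// CreateVMArgs mirrors the tool parameters named in the description.
type CreateVMArgs struct {
	InstanceType   string
	Image          string
	Hostname       string
	Location       string
	SSHKeyIDs      []string
	OSVolumeSizeGB int
}

// applyDefaults fills in the optional fields. stock maps an instance type
// to the locations that have it available; accountKeyIDs lists every SSH
// key in the account.
func applyDefaults(args CreateVMArgs, stock map[string][]string, accountKeyIDs []string) (CreateVMArgs, error) {
	if args.OSVolumeSizeGB == 0 {
		args.OSVolumeSizeGB = 50
	}
	if len(args.SSHKeyIDs) == 0 {
		// Copy so later changes to args do not alias the caller's slice.
		args.SSHKeyIDs = append([]string(nil), accountKeyIDs...)
	}
	if args.Location == "" {
		locations := stock[args.InstanceType]
		if len(locations) == 0 {
			return args, fmt.Errorf("no location has %s in stock", args.InstanceType)
		}
		args.Location = locations[0]
	}
	return args, nil
}

func main() {
	// Placeholder image, hostname, location, and key IDs.
	args := CreateVMArgs{InstanceType: "1A6000.10V", Image: "example-image", Hostname: "demo"}
	stock := map[string][]string{"1A6000.10V": {"LOC-A"}}
	out, err := applyDefaults(args, stock, []string{"k1", "k2"})
	fmt.Println(out.Location, out.OSVolumeSizeGB, out.SSHKeyIDs, err)
}
```

Values the caller does provide are left untouched, so an explicit location or key list always wins over the defaults.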

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- README: streamlined with install, getting started, MCP setup,
  agent mode, and links to detailed docs
- docs/commands.md: full command reference moved from README
- Keep gif at top for visual impact (interactive + non-interactive modes)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
IsAgentError only matches *AgentError types, but SDK errors (auth,
API) are plain errors. Check opts.Agent to ensure all errors in
--agent mode are classified and output as structured JSON.

Also fix perfsprint lint in credentials.go.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Focus on what GPU customers actually care about: availability,
pricing, deployment, and cost monitoring. Remove SSH example.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@hi-lei hi-lei merged commit a5c57bc into main Apr 7, 2026
13 checks passed
@hi-lei hi-lei deleted the feat/agent-mode-and-mcp branch April 7, 2026 11:33