feat: Watch Management API — REST + ov CLI + MCP (RFC #2104)#2110
Conversation
Net-new MCP capability — main branch's add_resource tool had no watch entrypoint. Watch_interval > 0 triggers periodic full re-ingest via existing WatchScheduler, requires explicit `to` URI to avoid collision in WatchManager._uri_to_task reverse index. Part of P3 of RFC volcengine#2104 (Watch Management API).
Implements P1+P2 of RFC volcengine#2104 (Watch Management API). New /api/v1/watches router exposing: - GET /watches (list, with active_only filter; supports ?to_uri= for single lookup) - GET /watches/{task_id} - PATCH /watches/{task_id} or ?to_uri= (partial update: watch_interval, is_active, reason, instruction — orthogonal pause via is_active) - DELETE /watches/{task_id} or ?to_uri= - POST /watches/{task_id}/trigger or /watches/trigger?to_uri= (immediate refresh, does not wait) Reuses WatchManager primitives directly — zero backend changes. Translates watch_manager.PermissionDeniedError (plain Exception) to the OpenVikingError-rooted variant so the global handler renders 403. Tests in tests/server/test_api_watches.py cover empty list, full lifecycle, by-uri lookup, active_only filter, dual-key behavior, missing key, 404, extra-fields rejection, trigger-by-uri, and partial PATCH.
Implements P4 of RFC volcengine#2104. MCP gets the minimum closure: list + cancel. Pause/resume/trigger/set-interval are intentionally not exposed — those are power-user operations belonging on the ov CLI, not in the agent's system prompt. cancel_watch uses to_uri as the primary key (agents don't track UUIDs). list_watches returns a compact one-line-per-task format with URI, interval, status, and next-run timestamp. Tests cover empty/seeded list, cancel-by-URI, cancel-not-found, plus the add_resource watch_interval-without-to error hint added in the prior commit on this branch.
Implements P5 of RFC volcengine#2104. Full parity with the REST /watches surface: ov watch ls [--active-only] ov watch show <key> ov watch rm <key> ov watch pause <key> ov watch resume <key> ov watch set-interval <key> <minutes> ov watch trigger <key> <key> auto-detects by prefix: a "viking://" string routes to the by-uri HTTP endpoint, anything else is treated as a task_id and routes to the path endpoint. Saves users from remembering UUIDs. Adds patch() and post_with_query() helpers to BaseClient (PATCH was absent; post_with_query supports POST .../trigger?to_uri=...). Output flows through the existing OutputFormat (Table / JSON) helper unchanged.
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
PR Code Suggestions ✨No code suggestions found for the PR. |
There was a problem hiding this comment.
Pull request overview
Implements RFC #2104 — Watch Management API across three control planes (REST, MCP, and the ov CLI). The change is a thin layer over existing WatchManager primitives: new REST endpoints under /api/v1/watches with dual-key (path {task_id} or ?to_uri=) resolution, two new MCP tools (list_watches, cancel_watch) plus watch_interval/to params on add_resource, and a new ov watch subcommand group with full REST parity. Tests cover the REST lifecycle and MCP tool variants; a Rust unit test validates URI-vs-task-id key classification.
Changes:
- Adds REST routes (list/get/patch/delete/trigger) with dual-key support, permission-error translation, and partial-update semantics.
- Extends MCP
add_resourcewithwatch_interval/to, and addslist_watches/cancel_watchtools. - Adds
ov watch ls/show/rm/pause/resume/set-interval/triggerCLI pluspatch/post_with_queryHTTP client helpers.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| openviking/server/routers/watches.py | New router implementing REST /watches lifecycle with dual-key resolution and exception translation. |
| openviking/server/routers/init.py | Exports the new watches_router. |
| openviking/server/app.py | Registers the watches router on the FastAPI app. |
| openviking/server/mcp_endpoint.py | Adds watch_interval/to to add_resource and new list_watches/cancel_watch MCP tools. |
| crates/ov_cli/src/main.rs | Wires the new Watch subcommand group into the CLI dispatcher. |
| crates/ov_cli/src/commands/mod.rs | Declares the new watch command module. |
| crates/ov_cli/src/commands/watch.rs | New CLI handlers for ls/show/rm/pause/resume/set-interval/trigger plus is_uri classifier and unit tests. |
| crates/ov_cli/src/client.rs | Adds typed list/get/patch/delete/trigger_watch_* helpers and exposes generic patch/post_with_query. |
| crates/ov_cli/src/base_client.rs | Adds low-level patch and post_with_query methods with query-string + JSON body support. |
| tests/server/test_api_watches.py | New end-to-end tests covering the REST lifecycle, by-URI lookup, active filtering, and edge cases. |
| tests/server/test_mcp_endpoint.py | Adds tests for add_resource watch requirement and the new MCP list_watches/cancel_watch tools. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
REST router (`watches.py`):
- _resolve_task: relax dual-key handling. When both {task_id} and ?to_uri=
are supplied, accept if they refer to the same task (useful cross-key
check), reject only on disagreement. Expose ?to_uri= on the by-id GET /
PATCH / DELETE / trigger routes so this is reachable from HTTP.
- _patch_impl: drop fragile "not found" substring matching. _resolve_task
pre-checks existence; if WatchManager.update_task still raises ValueError
(race window), map to 404 — matches the pre-check behavior. Add a
docstring explaining why ConflictError and other OpenVikingError-rooted
exceptions are intentionally allowed to bubble for the global handler.
MCP (`mcp_endpoint.py`):
- add_resource: reject negative watch_interval. Previously a negative value
silently bypassed the `watch_interval > 0 requires to` check and was
forwarded with undefined semantics.
- cancel_watch: treat delete_task returning False as idempotent success.
Race-removal between our lookup and delete is a benign concurrent cancel
— the desired post-condition holds either way.
Tests:
- Rename test_dual_key_conflict_returns_400 → test_dual_key_matching_accepted
(the matching case is now valid). Add test_dual_key_mismatch_returns_400
for the genuine conflict path.
- Rename test_patch_to_uri_conflict_returns_409 → test_patch_rejects_unknown_field
to match what it actually verifies.
CLI (`commands/watch.rs`):
- set_interval: local-validate `minutes > 0` before issuing the HTTP request
so we fail fast with a clear message instead of a generic server 400.
Addresses: comments 1, 2, 3, 4, 6, 7, 8 on PR volcengine#2110.
REST router (`watches.py`): - _trigger_impl: dispatch schedule_task via asyncio.create_task so the HTTP request returns immediately. WatchScheduler.schedule_task awaits _execute_task inline (full re-ingest can take many seconds); keeping the trigger truly fire-and-forget matches the documented contract. - _resolve_task: document the deliberate 404-for-no-permission collapse (avoids leaking task existence to unauthorized callers across tenants). - by-id route ?to_uri= Query: expand description to warn that a wrong cross-key value blocks the operation with 400. MCP (`mcp_endpoint.py`): - list_watches: drop dead `try/except PermissionDeniedError` — WatchManager .get_all_tasks silently filters and does not raise. Replace with a comment pointing at the filtering line. - cancel_watch: rename captured-but-unused `ok` to `_` and clarify the inline comment to state that the return value is intentionally ignored (idempotent on the race-removal case). CLI (`commands/watch.rs`): - Extract `classify_key` and `validate_interval_minutes` as pure helpers, unit-tested directly. The previous "boolean predicate" test gave false coverage — would have passed even if the guard was removed. - classify_key now rejects `Viking://` (wrong case) and other-scheme `://` inputs (e.g. `http://`) with a clear Parse error, instead of silently treating them as task_ids that 404 server-side. Tests: - test_full_lifecycle and test_trigger_by_uri: fix monkeypatch signature (`(self, task_id)` instead of `(task_id)`) — class-level setattr binds self. Wait on an asyncio.Event so the assertion does not race the background asyncio.create_task started by the new fire-and-forget trigger implementation. Addresses: comments 9, 10, 11, 12, 13, 14, 15, 16 on PR volcengine#2110.
REST router (`watches.py`): - Hold strong references to fire-and-forget trigger tasks. The event loop only keeps weak refs to tasks created via asyncio.create_task, so a discarded handle can be garbage-collected mid-execution and silently abort the refresh. Add a module-level `_BACKGROUND_TRIGGER_TASKS` set and use add_done_callback(set.discard) to release on completion. - list_or_get_watch: when `?to_uri=` and `active_only=true` are both supplied, also 404 paused tasks. Previously the to_uri branch returned any matching task regardless of active_only, which was inconsistent with the list-branch contract. Tests: - Add test_add_resource_rejects_negative_watch_interval covering the guard introduced in the prior commit. Addresses: comments 17, 18, 19 on PR volcengine#2110.
Two ergonomic changes after PR review feedback: 1. `ov watch *` → `ov task watch *`. The top-level `Watch` command moved under `TaskCommands` since both are forms of background work tracking (watch = recurring subscription, task = single-execution record). REST stays at /api/v1/watches — only the CLI surface is nested. 2. `set-interval <key> <minutes>` → `update <key> [flags]`. The PATCH endpoint already supported watch_interval / is_active / reason / instruction, but the CLI only exposed watch_interval. `update` now covers all four via flags, with at-least-one-flag enforced. `pause` / `resume` shortcuts are kept since they're the highest-frequency ops and `update --active=false` reads worse for them. Helper `build_update_body` is extracted as a pure function so the empty- flag, partial-flag, and interval-validation paths are unit-testable without an HTTP client.
qin-ctx
left a comment
There was a problem hiding this comment.
Requesting changes for the REST watch_interval validation issue. I also left non-blocking comments for stale CLI wording after the command moved under ov task watch.
|
|
||
| model_config = ConfigDict(extra="forbid") | ||
|
|
||
| watch_interval: Optional[float] = None |
There was a problem hiding this comment.
[Bug] (blocking) UpdateWatchRequest.watch_interval accepts any float, so PATCH /api/v1/watches/{task_id} can persist watch_interval <= 0 through the public REST management API. For example, sending {"watch_interval": -1} to an active task makes WatchManager.update_task deactivate it and store the negative interval. A later resume with {"is_active": true} then raises ValueError("watch_interval must be > 0 for active tasks"), which this router currently maps to 404. CLI and MCP already reject non-positive intervals, so REST should match that contract. Please validate this field at the request boundary, for example with Field(gt=0) or a model validator, and return a real invalid-argument response with regression tests for 0 and negative values.
There was a problem hiding this comment.
Great catch and fully valid — REST was the odd one out. Fixed by adding a field_validator on UpdateWatchRequest.watch_interval that rejects non-positive values at the request boundary (None still means "leave unchanged"). The validator's docstring spells out the original failure mode (deactivate-store-then-resume-blows-up) so the rationale survives.
Regression test added covering -1, 0, and -42.5 — all assert 422 now. The three control planes (REST/CLI/MCP) are now uniform on "non-positive is invalid at the boundary".
| fn validate_interval_minutes(minutes: f64) -> Result<()> { | ||
| if !(minutes > 0.0) { | ||
| return Err(Error::Parse(format!( | ||
| "minutes must be > 0 (got {minutes}). To pause a watch task, use `ov watch pause`." |
There was a problem hiding this comment.
[Suggestion] (non-blocking) This user-facing error still points to the old ov watch pause command after the CLI moved under ov task watch. The nearby comment also still mentions ov watch set-interval. Please update these strings to ov task watch pause and the current ov task watch update --interval ... flow so users do not follow a command path that no longer parses.
There was a problem hiding this comment.
You're right — stale after the CLI nesting refactor. Updated both:
- The error string in
validate_interval_minutesnow points toov task watch pause. - The helper's docstring now references
ov task watch update(replacing the obsoleteov watch set-interval).
| # MCP exposes the minimum closure: list + cancel. Pause/resume/trigger/ | ||
| # set-interval are intentionally NOT exposed — they're either low-value for | ||
| # agents or invite unwanted autonomous decisions. Power users should reach | ||
| # for the REST API or `ov watch` CLI for those operations. |
There was a problem hiding this comment.
[Suggestion] (non-blocking) This comment still points power users to ov watch, but the PR moved the command under ov task watch. The same block also mentions the removed set-interval verb. Please update this to the current CLI surface, for example ov task watch and update --interval, so the MCP-side guidance stays aligned with the shipped command names.
There was a problem hiding this comment.
Refreshed the comment block. The new wording mentions ov task watch * (CLI surface), explicitly lists pause/resume/trigger/update --interval as the actions that intentionally aren't on MCP, and drops the removed set-interval reference.
REST router (`watches.py`): - UpdateWatchRequest: add field_validator rejecting watch_interval <= 0 at the request boundary. Previously REST accepted any float and forwarded to WatchManager.update_task, which deactivated the task and stored the bad cadence. A later resume (`is_active=true`) then failed inside update_task with ValueError that this router maps to 404 — misleading callers about the root cause. Now matches the CLI/MCP boundary checks so all three control planes agree on "non-positive is invalid". Test added covering 0, -1, and -42.5. CLI (`commands/watch.rs`): - Update the pause-hint error string from `ov watch pause` to `ov task watch pause` and update the helper docstring to reference `ov task watch update --interval` after the CLI nesting refactor. MCP (`mcp_endpoint.py`): - Refresh the watch-management section comment: mention `ov task watch *` (not `ov watch`) and `update --interval` (not the removed `set-interval` verb), keeping MCP-side guidance aligned with the shipped command names. Addresses: 3 review comments from @qin-ctx on PR volcengine#2110 (1 blocking, 2 non-blocking).
Description
Implements RFC #2104 — Watch Management API across all three control planes (REST / ov CLI / MCP). Watch tasks created via
POST /resourceswithwatch_intervalpreviously had no list / pause / delete / trigger / update endpoint — this PR closes that gap.Zero backend changes.
WatchManageralready exposes every primitive needed. This PR is purely a router / MCP-tool / CLI-command layer.Related Issue
Implements RFC discussion: #2104
(See the v2 follow-up comment for the 4 design deltas finalized during PR review — dual-key relaxation, fire-and-forget trigger, CLI nesting under
ov task, and theset-interval→updaterename.)Type of Change
Changes Made
P1 + P2 — REST
/api/v1/watches(openviking/server/routers/watches.py):GET /watches— list (with?active_only=); also acts as single-lookup via?to_uri=. When both are supplied a paused task at that URI also 404s, keeping the active_only contract consistent across branches.GET /watches/{task_id}(with optional?to_uri=cross-key sanity check)PATCH /watches/{task_id}or?to_uri=— partial update ofwatch_interval,is_active,reason,instruction.is_activeis orthogonal towatch_intervalfor pause-without-losing-cadence.DELETE /watches/{task_id}or?to_uri=POST /watches/{task_id}/triggeror/watches/trigger?to_uri=— fire-and-forget, returns immediately. Background task held via a module-level_BACKGROUND_TRIGGER_TASKS: setwithadd_done_callback(discard)so the asyncio loop's weak-ref policy doesn't GC the refresh mid-execution.{task_id}and?to_uri=are supplied, accept if they refer to the same task (cross-key sanity check); reject with 400 only on disagreement.P3 — MCP
add_resourcegains watch params (openviking/server/mcp_endpoint.py):watch_interval: float = 0+to: str = ""parameters. Negative values rejected with a clear error;watch_interval > 0requiresto(server-side hard constraint with hint message for agents). Main branch's MCP had no watch entrypoint.P4 — MCP minimum closure (
openviking/server/mcp_endpoint.py):list_watches()— one line per task, notask_idexposed (agents use URI)cancel_watch(to_uri)— delete by URI; idempotent on race-removalP5 —
ov task watchsubcommand group (crates/ov_cli/src/commands/watch.rs):ov task(sibling ofov task status/ov task list) since watch is a form of background work tracking, parallel to single-execution task records.ls [--active-only],show,rm,pause,resume,update,triggerupdate <key> [--interval N] [--active true|false] [--reason ...] [--instruction ...]unifies all four mutable PATCH fields (replacing the originalset-intervalsingle-field verb). At-least-one-flag enforced.<key>auto-classification (classify_keyhelper):viking://prefix → by-URI route; other-scheme://keys (http://,Viking://, etc.) rejected locally with a clear Parse error instead of silently 404ing server-side.BaseClient::patchandBaseClient::post_with_query(PATCH was absent; the latter supportsPOST .../trigger?to_uri=).Testing
Test coverage:
tests/server/test_api_watches.py(new, 11 tests) — empty list, full lifecycle, by-URI lookup, active_only filter (both branches), dual-key matching accepted, dual-key mismatch 400, missing key (422), 404 cases, extra-fields rejection (422), trigger-by-URI fire-and-forget timing, partial PATCH preserves unset fieldstests/server/test_mcp_endpoint.py(extended, 6 new tests) —watch_intervalrequiresto, negativewatch_intervalrejected,list_watchesempty + with seed,cancel_watchby URI,cancel_watchnot-foundcrates/ov_cli/src/commands/watch.rs— Rust unit tests forclassify_key(uri/task_id/case-insensitive reject/other-scheme reject) andvalidate_interval_minutesandbuild_update_body(no-flag rejection / full-field / partial-field / interval validation). 10 tests total.cargo build -p ov_cliclean;cargo test -p ov_cli commands::watch::passesuvx ruff format --check+uvx ruff checkclean on all modified filesChecklist
Additional Notes
Key design decisions (full rationale in RFC #2104 + the v2 delta comment):
{task_id}in path or?to_uri=in query.WatchManageralready maintains both indexes, so zero added cost. When both are supplied they must refer to the same task — useful as a cross-key sanity check from clients with both pieces of information; only mismatch returns 400.PermissionDeniedErrortranslation:openviking/resource/watch_manager.py:104defines its ownPermissionDeniedError(Exception)which is NOT a subclass ofOpenVikingError. Without explicit translation in the router, it falls through the global exception handler as 500. Each REST handler catcheswm_mod.PermissionDeniedErrorand re-raisesopenviking_cli.exceptions.PermissionDeniedErrorso the global handler returns 403.DELETEreturns 404 on missing:WatchManager.delete_taskreturnsbool;Falseis translated toNotFoundErrorrather than{ok: false}, avoiding contract drift for CLI consumers._resolve_taskdeliberately collapses "doesn't exist" and "no permission" into 404 to avoid leaking task existence to unauthorized callers — a security-first stance documented inline.WatchScheduler.schedule_taskawaits_execute_taskinline (full re-ingest, can take many seconds). The HTTP handler dispatches viaasyncio.create_taskand returns immediately. Strong refs held in_BACKGROUND_TRIGGER_TASKS: setbecause asyncio only retains weak refs.ov task watch *rather than top-levelov watch *. Watch (recurring subscription) and task (single-execution record) are both background-activity tracking concepts; co-locating them simplifies the top-level command tree.updateoverset-interval: the underlying PATCH covers four fields; oneupdatesubcommand with flags scales better than aset-Xverb per field.pause/resumeretained as high-frequency shortcuts.Out of scope (RFC §6 Open Questions, follow-ups):
POST /resources.watch_intervallong-term deprecationCommit layout (logical phases, for reviewer-friendly diffing):
feat(mcp): add watch_interval + to params to add_resource(P3)feat(server): add REST /watches management endpoints(P1+P2)feat(mcp): add list_watches and cancel_watch tools(P4)feat(cli): add ov watch subcommand group(P5)fix: address Copilot review on PR #2110fix: address second round of Copilot review on PR #2110fix: address third round of Copilot review on PR #2110refactor(cli): nest watch under task; replace set-interval with update