feat(mcp): progressive single-entrypoint upload for local files#1847
feat(mcp): progressive single-entrypoint upload for local files#1847t0saki wants to merge 4 commits intovolcengine:mainfrom
Conversation
Extends `add_resource` MCP tool to handle local-file paths via a server-orchestrated
two-step flow, eliminating the need for `ov` CLI in sandboxed agent environments
(Claude web, Manus) where local FS is unavailable and CLI install is blocked.
Behavior:
- Remote URL → unchanged.
- Local path → server mints a 6-char base62 token, returns prose Step 1 / Step 2
instructions pointing at /api/v1/resources/temp_upload_signed.
Agent uploads, then re-calls add_resource(temp_file_id=...).
- temp_file_id → resolved against per-tenant subdir, ingested via existing pipeline.
Token: in-memory dict, 10-min TTL, dict.pop doubles as replay protection.
Per-tenant temp-dir isolation ({root}/{aid}/{uid}/{tfid}); legacy CLI uploads
keep flat layout via dual-lookup in resolve_uploaded_temp_file_id.
Public base URL resolves env > config > listen-host fallback (12-factor: runtime
env trumps image-baked config; production deployments behind MCP proxy + nginx
must set OPENVIKING_PUBLIC_BASE_URL since the agent-facing URL is not derivable
from the server's request scope).
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Explore these optional code suggestions:
|
There was a problem hiding this comment.
Pull request overview
Adds a progressive local-file upload flow for the MCP add_resource tool so sandboxed MCP clients can upload bytes via a short-lived token and then ingest via temp_file_id, without requiring the ov CLI.
Changes:
- Introduces an in-memory upload token store and a new signed temp-upload REST route that writes uploads into per-tenant temp directories.
- Extends MCP
add_resourceto (a) return upload instructions for local paths and (b) ingest previously-uploadedtemp_file_ids. - Updates temp-file resolution/cleanup to support both per-tenant and legacy flat temp-upload layouts, plus adds server config for public base URL + upload limits.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
openviking/server/upload_token_store.py |
New in-memory short-lived token issuance/consume mechanism for signed uploads. |
openviking/server/routers/resources.py |
Adds /api/v1/resources/temp_upload_signed, factors _resolve_temp_or_path, and updates temp cleanup for per-tenant layout. |
openviking/server/mcp_endpoint.py |
Extends MCP add_resource with progressive upload instructions + temp_file_id ingestion and public base URL resolution. |
openviking/server/local_input_guard.py |
Extends resolve_uploaded_temp_file_id to prefer per-tenant layout with fallback to legacy flat layout. |
openviking/server/dependencies.py |
Adds global ServerConfig accessor for MCP tools. |
openviking/server/config.py |
Adds public_base_url, upload_signed_ttl_seconds, upload_signed_max_bytes config fields. |
openviking/server/app.py |
Registers ServerConfig into the new dependency getter at startup. |
tests/server/test_upload_token_store.py |
Unit tests for token issuance/consume/expiry/collision handling. |
tests/server/test_resources_temp_upload_signed.py |
Endpoint tests for signed upload behavior, size limits, and legacy temp upload compatibility. |
tests/server/test_mcp_endpoint.py |
Tests progressive MCP add_resource behavior (local-path instructions, remote URL ingestion, temp_file_id ingest). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if server_config is not None | ||
| else _DEFAULT_UPLOAD_TTL_SECONDS | ||
| ) | ||
| suffix = Path(path).suffix |
| for file_path in temp_dir.rglob("*"): | ||
| if not file_path.is_file(): | ||
| continue | ||
| try: | ||
| file_age = now - file_path.stat().st_mtime | ||
| if file_age > max_age_seconds: | ||
| file_path.unlink(missing_ok=True) | ||
|
|
||
| # Also clean up corresponding .ov_upload.meta file | ||
| if not file_path.name.endswith(".ov_upload.meta"): | ||
| meta_path = temp_dir / f"{file_path.name}.ov_upload.meta" | ||
| if meta_path.exists(): | ||
| meta_path.unlink(missing_ok=True) | ||
| except OSError: | ||
| continue | ||
| if file_age <= max_age_seconds: | ||
| continue | ||
| file_path.unlink(missing_ok=True) | ||
| if not file_path.name.endswith(".ov_upload.meta"): | ||
| meta_path = file_path.parent / f"{file_path.name}.ov_upload.meta" | ||
| if meta_path.exists(): | ||
| meta_path.unlink(missing_ok=True) |
| bytes_written = 0 | ||
| try: | ||
| with open(target_path, "wb") as out: | ||
| while True: | ||
| chunk = await file.read(64 * 1024) | ||
| if not chunk: | ||
| break | ||
| bytes_written += len(chunk) | ||
| if bytes_written > max_bytes: | ||
| raise HTTPException(status_code=413, detail="upload exceeds max_bytes") | ||
| out.write(chunk) | ||
| except HTTPException: | ||
| target_path.unlink(missing_ok=True) | ||
| raise |
| @router.post("/resources/temp_upload_signed") | ||
| async def temp_upload_signed( | ||
| request: Request, | ||
| file: UploadFile = File(...), | ||
| token: str = Query(..., min_length=1), | ||
| temp_file_id: str = Query(..., min_length=1), | ||
| ): | ||
| """Upload via short-lived signed token. Used by the MCP progressive-upload flow. |
| """ | ||
| import json | ||
|
|
||
| if not _TEMP_FILE_ID_RE.match(temp_file_id): |
… hint Adds a third fallback layer between explicit operator config and listen-host fallback: capture X-Forwarded-Host / X-Forwarded-Proto / Host headers in the MCP identity middleware and use them when neither OPENVIKING_PUBLIC_BASE_URL nor ServerConfig.public_base_url is set. Resolution order is now: env > config > X-Forwarded-* > Host > listen-host. The first two are explicit; the rest are inferred. When an inferred source is used, the add_resource prose response appends a troubleshooting hint asking the user to set OPENVIKING_PUBLIC_BASE_URL on the server if upload fails — because inferred URLs can be wrong if the reverse-proxy chain doesn't forward X-Forwarded headers, or if the server listens on 0.0.0.0. Documents the variable in docker-compose.yml (commented-out env block) and in the MCP integration guides (zh + en) — covers when it's required and the full resolution chain.
…e-upload # Conflicts: # openviking/server/mcp_endpoint.py # tests/server/test_mcp_endpoint.py
- Relax temp_file_id regex from `[a-zA-Z0-9]+` extension to any non-separator
chars, and dedupe to a single TEMP_FILE_ID_RE in local_input_guard. The old
pattern rejected `Path("report.my-file").suffix == ".my-file"` and similar
legitimate filenames, breaking the progressive upload flow.
- Hoist `_resolve_temp_or_path` import to module level in mcp_endpoint
(verified no circular import).
- Add `_is_safe_namespace_component` defense-in-depth at the signed-upload
route so a future code path that mints tokens from less-trusted input
still cannot escape the per-tenant directory.
- Broaden partial-file cleanup to any exception via try/finally + flag,
not just HTTPException — prevents OSError/IO failures from leaving
half-written files behind.
- Scope `_cleanup_temp_files` to the tenant subdir at the signed-upload
route to bound the rglob scan; the legacy `/temp_upload` route still
cleans the root level.
- Add round-trip test for unusual filename extensions (.my-file, .bak~, .中文).
|
Thanks for the review. Walked through each comment; here's the disposition: Comment on Comment on Reviewer-Guide hint (in-function import of Reviewer-Guide hint (defense-in-depth namespace check at signed-upload route) — fixed. Comment on Comment on Comment on All 50 new tests still green; ruff clean. |
Description
Adds local-file upload support to the MCP
add_resourcetool via a progressive single-entrypoint flow, eliminating the need for theovCLI in sandboxed agent environments (Claude web, Manus, etc.) where the local filesystem is inaccessible and CLI installation is blocked.The agent always calls one tool —
add_resource— and the server orchestrates a two-step dance for local files: mint a short-lived upload token, return proseStep 1 / Step 2instructions pointing at a new HTTP-POST endpoint, agent uploads, agent re-callsadd_resource(temp_file_id=…)to ingest.This complements the local-path rejection flow on PR #1846 (which restricts
add_resourceto remote URLs only and points local-file users atovCLI) — that PR remains correct for desktop Claude Code, this one extends the surface to cover sandboxed clients without changing the MCP tool count.Related Issue
None — driven by ongoing discussion of Claude web / Manus file-upload UX.
Type of Change
Changes Made
openviking/server/upload_token_store.py— in-memorydict[token, TokenInfo]indexed by 6-char base62 token.dict.popdoubles as the consume-and-burn primitive (no separate replay set, no on-disk state, no signing key). 10-min TTL by default, configurable viaServerConfig.upload_signed_ttl_seconds.POST /api/v1/resources/temp_upload_signed— does NOT useDepends(get_request_context); the token carries the bound(account_id, user_id, temp_file_id). Writes to per-tenant subdir{upload_dir}/{aid}/{uid}/{tfid}plus.ov_upload.metasidecar. EnforcesContent-Lengthand streaming size cap (upload_signed_max_bytes, default 100MB).add_resourceextended — new signature(path="", temp_file_id="", description=""). Three branches: remote URL → unchanged; local path → mint token, return prose multi-step upload instruction (URL on its own indented line for verbatim copy by the agent);temp_file_idset → resolve via per-tenant lookup, ingest via existing service pipeline.resolve_uploaded_temp_file_iddual-lookup — accepts optionalaccount_id/user_idkwargs; tries per-tenant subdir first, falls back to flat layout so legacy CLIPOST /temp_uploadkeeps working unchanged._cleanup_temp_filesrecurses subdirs —Path.rglobinstead ofiterdirso per-tenant cleanup happens.ServerConfig.public_base_url>http://{host}:{port}. Production deployments behind MCP proxy + nginx must setOPENVIKING_PUBLIC_BASE_URL(the agent-facing URL is not derivable from the server's request scope)._resolve_temp_or_pathhelper — extracted from the RESTadd_resourceroute so the MCP tool reuses the same temp-file resolution logic.Testing
22 new tests across 3 files, all green:
tests/server/test_upload_token_store.pytests/server/test_resources_temp_upload_signed.pytests/server/test_mcp_endpoint.py(4 added + 1 replaced)Regression check: same
-k 'resources or temp_upload or add_resource or local_input_security'filter onmainvs this branch — 8 failures + 2 errors identical on both (all pre-existing). Branch adds 12 net new passing tests (37 → 49 in that filter).Lint:
ruff check+ruff format --checkclean on all touched files.Checklist
Additional Notes
Security envelope of the token scheme. 6-char base62 = 62^6 ≈ 5.7×10^10 combinations. With 10-min TTL and a sustained 1K req/s brute force, expected attempts ≈ 6×10^5 → ~1 in 95K hit per live token. The token is bound to a specific
temp_file_id(server-chosen UUID, 122 bits) and a specific tenant subdir; even a forged token can only write into that tenant's per-tenant temp dir under the server-chosen filename — no cross-tenant write, no attacker-chosen path. If production traffic patterns demand more headroom, bump_TOKEN_LENGTHinupload_token_store.py.Single-worker only. The token store is process-local. Documented limitation; multi-worker shared state (sqlite/redis) is out of scope for this PR.
Deployment topology. OpenViking prod commonly sits behind an MCP proxy (for OAuth-only clients) plus nginx reverse proxy. The agent-facing URL is the MCP proxy's external endpoint, which the server can't infer from
request.urlorX-Forwarded-*(those at most reflect nginx's upstream). So the resolver requires explicitOPENVIKING_PUBLIC_BASE_URLfor production; local dev getshttp://{listen_host}:{listen_port}automatically.Out of scope (potential follow-ups). (a) Migrating legacy CLI
POST /temp_uploadto also write per-tenant — non-breaking but removes the dual-lookup. (b) Inline base64 escape hatch (add_resource_inline) for tiny files in environments where outbound HTTP from the agent sandbox is blocked. (c) Multi-worker shared replay store.