
feat(claude-code-plugin): session-start profile injection, /ov status command, tool-output capture cleanup#1914

Merged
ZaynJarvis merged 4 commits into volcengine:main from
t0saki:feat/cc-plugin-session-start-profile-injection
May 8, 2026

t0saki commented May 8, 2026

Description

Three independent claude-code-memory-plugin improvements bundled together (file-disjoint, all CC-plugin-scoped):

  1. Session-start profile injection — every session begins with the user's profile.md plus a description-annotated listing of preferences/ and entities/, instead of relying on auto-recall to hit the right keyword.
  2. /ov slash command — tight five-section status report (server health, identity, last injection / recall, toggle state, auth source) for visibility into plugin state.
  3. Tool-output capture cleanup (cherry-picked from fix/cc-memory-plugin-drop-tool-output) — drop tool_result content by default; keep tool input verbatim. Renames TOOL_BLOCK_MAX_CHARS → TOOL_RESULT_MAX_CHARS, default 0 (drop). Applies symmetrically to auto-capture.mjs and subagent-stop.mjs.

Type of Change

  • New feature (non-breaking change that adds functionality)
  • Bug fix (non-breaking change that fixes an issue) — tool-output capture default

Changes Made

1. Session-start profile injection

  • scripts/session-start.mjs — restructured so profile injection runs on every source (startup/clear/resume/compact); existing archive-overview injection stays gated on resume/compact. Both halves compose into a single <openviking-context source="..."> envelope.
  • New scripts/lib/profile-inject.mjs — recursive ls (/api/v1/fs/ls?recursive=true&output=agent) flattens the two-level <owner>/<file>.md layout under preferences/entities. Listing headers show full URIs; child entries are relative paths so the agent can reconstruct any leaf URI by concatenation.
  • CJK-aware token estimator — codepoint ≥ 0x3000 counts at 1.5 tokens, else chars/4. The plugin's flat chars/4 heuristic undercounts CJK content by 4-6×; this localized fix means a "10k token budget" reflects real tokenizer cost for Chinese content.
  • Head + middle-elision + tail truncation — when profile exceeds its sub-cap, keep first 8 lines (identity block) and as many trailing lines as fit (recent timeline), elide the noisy middle.
  • New env vars in scripts/config.mjs:
    • OPENVIKING_NO_AUTO_INJECT (bool, default false) — kill switch for the new injection; auto-recall is unaffected.
    • OPENVIKING_PROFILE_TOKEN_BUDGET (int, default 10000) — total cap; profile gets up to half, listings split the remainder evenly, with "... +N more, use `memory_recall`" truncation tails.
  • Audit file — composed payload mirrored to ~/.openviking/last_inject.md on every session_start.
  • Subagents skipped: subagent-start.mjs is unmodified.

2. /ov slash command

  • commands/ov.md + scripts/ov-status.mjs — five-section report: server URL + /health latency, resolved identity, last session-start injection, last auto-recall, toggle state.
  • Final line shows where url + api_key were actually resolved from (env / ovcli.conf / default), per the same priority chain config.mjs uses, rather than enumerating every file on disk that could have contributed.
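The first-hit-wins lookup behind that final line can be sketched as a pure function. This is a minimal illustration, not the plugin's code: it assumes the config files are already parsed into plain objects and that ovcli.conf stores `url` and `api_key` keys. The env var names and the ov.conf fields (`server.url`, `claude_code.apiKey`, `server.root_api_key`) are the ones named elsewhere on this page; `firstHit` and `describeAuthSource` are hypothetical names.

```javascript
// Sketch of the "Auth: url from X, api_key from Y" line. For each field the
// first non-empty value in the chain wins. Parsed-config shapes are assumed.
function firstHit(candidates) {
  for (const [label, value] of candidates) {
    if (value) return label;
  }
  return null;
}

function describeAuthSource({ env = {}, cliConf = null, ovConf = null } = {}) {
  const urlSrc =
    firstHit([
      ["env", env.OPENVIKING_URL || env.OPENVIKING_BASE_URL],
      ["ovcli.conf", cliConf?.url], // assumed key name
      ["ov.conf", ovConf?.server?.url],
    ]) ?? "default";
  const keySrc =
    firstHit([
      ["env", env.OPENVIKING_API_KEY || env.OPENVIKING_BEARER_TOKEN],
      ["ovcli.conf", cliConf?.api_key], // assumed key name
      ["ov.conf", ovConf?.claude_code?.apiKey || ovConf?.server?.root_api_key],
    ]) ?? "(none)";
  return `Auth: url from ${urlSrc}, api_key from ${keySrc}`;
}
```

Checking the resolved value per field, rather than whether a config file exists, is what keeps the report from saying "(none)" when the key actually comes from ov.conf.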

3. Tool-output capture cleanup (cherry-picked)

  • Tool output (tool_result content) is mostly noise for memory extraction — agent prose around the call already summarizes the meaningful bit. Storing raw 4 KB of fetched markdown inflates session size and extraction token cost.
  • Tool input (agent-authored URLs / paths / commands) should not be truncated — pathologically long input is itself signal.
  • Both changes applied to auto-capture.mjs and subagent-stop.mjs.
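A minimal sketch of the new capture policy, assuming Claude Code transcript content blocks shaped like `{type: "text"}`, `{type: "tool_use", name, input}` and `{type: "tool_result", content}`. `formatToolInput` follows the commit's description (JSON-serialize structured input, no length cap). `renderBlocks`, `resultText`, the `[tool_use ...]` labels and the `...` truncation suffix are hypothetical.

```javascript
// Tool input: agent-authored, kept verbatim. Structured input is JSON-serialized.
function formatToolInput(input) {
  return typeof input === "string" ? input : JSON.stringify(input);
}

// tool_result content may be a plain string or an array of text blocks.
function resultText(content) {
  if (typeof content === "string") return content;
  if (Array.isArray(content)) {
    return content.filter((c) => c?.type === "text").map((c) => c.text).join("\n");
  }
  return "";
}

function renderBlocks(blocks, toolResultMaxChars = 0) {
  const parts = [];
  for (const block of blocks) {
    if (block.type === "text") {
      parts.push(block.text);
    } else if (block.type === "tool_use") {
      // Agent-authored input is kept whole, however long.
      parts.push(`[tool_use ${block.name}] ${formatToolInput(block.input)}`);
    } else if (block.type === "tool_result") {
      if (toolResultMaxChars <= 0) continue; // default 0: drop tool output entirely
      const text = resultText(block.content);
      const kept = text.length > toolResultMaxChars ? `${text.slice(0, toolResultMaxChars)}...` : text;
      parts.push(`[tool_result] ${kept}`);
    }
  }
  return parts.join("\n");
}
```

Setting TOOL_RESULT_MAX_CHARS above 0 corresponds to passing a positive `toolResultMaxChars` here, which keeps a truncated copy of the output for replay-style archives.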

Testing

  • Manual end-to-end smoke against ov-dev.tosaki.top:
    • Default budget (10000): profile + 27 prefs + 115 entities, ~5.9 KB / ~1.7k CJK-aware tokens, no truncation.
    • OPENVIKING_PROFILE_TOKEN_BUDGET=1500: head + ... [profile middle elided] ... + tail, listings truncated with +N more tails.
    • OPENVIKING_NO_AUTO_INJECT=1: hook returns bare {"decision":"approve"} — no injection block.
    • Username-less config (url + api_key only): plugin correctly resolves user space via /api/v1/system/status driven by api_key alone.
    • /ov status command: prints all five sections; Auth: url from env, api_key from env when env vars are set, file path otherwise.
  • node --check passes on all modified/new files.
  • No new automated tests added — the plugin has no test harness today.
  • Tested on:
    • Linux
    • macOS
    • Windows

Checklist

  • My code follows the project's coding style (matches existing plugin patterns: ES modules, makeFetchJSON closure, writeJsonState for runtime state, <openviking-context> envelope for additionalContext).
  • I have performed a self-review of my code.
  • Comments explain non-obvious decisions (CJK estimator rationale, head+tail elision rationale, why auth-source is computed instead of file-listed).
  • No documentation changes — README sections for new env vars and /ov command are deferred to a follow-up if reviewers want them.
  • My changes generate no new warnings.

Additional Notes

  • plugin.json version intentionally not bumped in this PR — happy to bump to 0.3.0 if reviewers prefer.
  • Out of scope (deferred per design discussion):
    • Splitting profile.md into stable / timeline halves upstream in memory_extractor — handled there if/when noise becomes a problem.
    • Auto-recall ↔ session-start injection dedup — defer until measured overlap is meaningful.
    • Top-N filtering / per-entry frontmatter descriptions — current ls returns empty abstract for leaf .md files; filenames remain self-describing.

t0saki added 2 commits May 8, 2026 17:18
…ssion start

Previously, profile/preferences/entities only reached the agent when the
user's prompt happened to trigger semantic auto-recall (UserPromptSubmit).
Trivial first prompts (e.g. `git status`) left the agent with no identity
context.

Session-start hook now always builds a profile injection block —
profile.md plus a description-annotated recursive ls of preferences/ and
entities/ — composed into the same <openviking-context source="..."> envelope
that already carries archive context on resume/compact. Subagents are
unaffected (they go through subagent-start.mjs).
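The source gating and single-envelope composition described above can be sketched as follows. The function names and the `fetchProfile`/`fetchArchive` callbacks are hypothetical, and the closing `</openviking-context>` tag is assumed. The early return mirrors the later review fix that skips the `/health` probe when neither half would run.

```javascript
// Which halves of the SessionStart injection run for a given source.
function plannedParts(source, { noAutoInject = false } = {}) {
  return {
    profile: !noAutoInject, // every source: startup/clear/resume/compact
    archive: source === "resume" || source === "compact",
  };
}

// Both halves share one envelope; null means "inject nothing".
function composeEnvelope(source, profileBlock, archiveBlock) {
  const body = [profileBlock, archiveBlock].filter(Boolean).join("\n\n");
  if (!body) return null;
  return `<openviking-context source="${source}">\n${body}\n</openviking-context>`;
}

// Short-circuit before any network call when nothing would be injected.
async function buildInjection(source, opts, { fetchProfile, fetchArchive }) {
  const plan = plannedParts(source, opts);
  if (!plan.profile && !plan.archive) return null;
  const [profile, archive] = await Promise.all([
    plan.profile ? fetchProfile() : null,
    plan.archive ? fetchArchive() : null,
  ]);
  return composeEnvelope(source, profile, archive);
}
```

A `null` result corresponds to the bare `{"decision":"approve"}` hook response described in the Testing section.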

Budget enforcement uses a CJK-aware token estimate (codepoint >= 0x3000
counts at 1.5 tokens, else chars/4) so a "10k token budget" reflects real
tokenizer cost for Chinese content rather than the 4-6× undercount the
flat chars/4 heuristic produces.
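A sketch of the estimator as described. Summing the two parts before rounding up is an assumption; iterating with `for...of` counts Unicode code points rather than UTF-16 units.

```javascript
// CJK-aware token estimate: code points at or above U+3000 cost 1.5 tokens,
// everything else a quarter token. Sum-then-ceil is an assumed combination.
function estimateTokens(text) {
  if (!text) return 0;
  let cjk = 0;
  let other = 0;
  for (const ch of text) {
    // for...of walks code points, so surrogate pairs count once
    if (ch.codePointAt(0) >= 0x3000) cjk++;
    else other++;
  }
  return Math.ceil(cjk * 1.5 + other / 4);
}
```

The threshold is coarse: anything at or above U+3000, emoji included, is charged 1.5 tokens. A two-character Chinese string estimates at 3 tokens here, versus 1 under flat chars/4.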

Profile truncation on overflow keeps the head (identity facts) and tail
(most-recent timeline events), eliding the noisy middle, instead of
hard-cutting at the head.

Each invocation mirrors the composed payload to ~/.openviking/last_inject.md
for user-facing audit.

New env vars / config (config.mjs):
- OPENVIKING_NO_AUTO_INJECT (bool, default false) — kill switch for the new
  injection; auto-recall is unaffected.
- OPENVIKING_PROFILE_TOKEN_BUDGET (int, default 10000) — total cap for the
  block; profile gets up to half, listings split the remainder.
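One reading of that split, as a sketch. The profile is capped at half the total, and the two listings share whatever the profile did not use. Whether unused profile budget rolls over to the listings, and how invalid env values are handled, are assumptions. Only the default of 10000 comes from the text; `profileBudgetFromEnv` and `splitBudget` are hypothetical names.

```javascript
// Reads OPENVIKING_PROFILE_TOKEN_BUDGET, falling back to the documented
// default of 10000 for missing or non-positive values (fallback rule assumed).
function profileBudgetFromEnv(env) {
  const n = Number.parseInt(env.OPENVIKING_PROFILE_TOKEN_BUDGET ?? "", 10);
  return Number.isFinite(n) && n > 0 ? n : 10000;
}

// Profile may take up to half; the two listings share what remains evenly.
function splitBudget(totalTokens, profileTokens) {
  const profileCap = Math.floor(totalTokens / 2);
  const profileUsed = Math.min(profileTokens, profileCap);
  const perListing = Math.floor((totalTokens - profileUsed) / 2);
  return { profileCap, profileUsed, preferences: perListing, entities: perListing };
}
```

With the default budget and a 1200-token profile, each listing would get 4400 tokens under this reading.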
Tight five-section status report covering: server URL + /health latency,
resolved identity (account/user/agent), last session-start injection
(size, age, audit-file path), last auto-recall (item count, top score,
token budget use), and toggle state for the three injection paths
(auto-inject / auto-recall / auto-capture).

Final line shows where url + api_key were actually resolved from (env vs
ovcli.conf vs default), per the same priority chain config.mjs uses —
rather than enumerating every file on disk that *could have* contributed.

Reuses existing ~/.openviking/state/ files (last-recall.json,
last-session-event.json) and the audit file written by session-start.mjs;
no new server-side state.

github-actions Bot commented May 8, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🏅 Score: 85
🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: feat(claude-code-plugin): session-start profile injection

Relevant files:

  • examples/claude-code-memory-plugin/scripts/config.mjs
  • examples/claude-code-memory-plugin/scripts/lib/profile-inject.mjs
  • examples/claude-code-memory-plugin/scripts/session-start.mjs

Sub-PR theme: feat(claude-code-plugin): add /ov status slash command

Relevant files:

  • examples/claude-code-memory-plugin/commands/ov.md
  • examples/claude-code-memory-plugin/scripts/ov-status.mjs

⚡ Recommended focus areas for review

Token Budget Inconsistency

The elideProfile function uses a fixed maxChars = maxTokens * 4 heuristic to truncate content, but the rest of the module uses a CJK-aware token estimator (estimateTokens). This can cause the profile block to exceed the token budget for CJK-heavy content, since 1 CJK char is counted as 1.5 tokens (not 0.25 tokens as chars/4 implies).

function elideProfile(content, maxTokens) {
  const maxChars = Math.max(400, maxTokens * 4);
  if (content.length <= maxChars) return content;

  const HEAD_LINES = 8;
  const ELLIPSIS = "\n... [profile middle elided] ...\n";
  const lines = content.split("\n");

  const fallbackHeadTruncate = () =>
    content.slice(0, maxChars).trimEnd() + "\n... [profile truncated]";

  if (lines.length <= HEAD_LINES + 4) return fallbackHeadTruncate();

  const head = lines.slice(0, HEAD_LINES).join("\n");
  const reserveForTail = maxChars - head.length - ELLIPSIS.length;
  if (reserveForTail < 200) return fallbackHeadTruncate();

  let tailChars = 0;
  let tailStart = lines.length;
  for (let i = lines.length - 1; i > HEAD_LINES; i--) {
    const lineLen = lines[i].length + 1;
    if (tailChars + lineLen > reserveForTail) break;
    tailChars += lineLen;
    tailStart = i;
  }
  if (tailStart >= lines.length - 1) return fallbackHeadTruncate();

  return `${head}${ELLIPSIS}${lines.slice(tailStart).join("\n")}`;
}

github-actions Bot commented May 8, 2026

PR Code Suggestions ✨

No code suggestions found for the PR.

Copilot AI left a comment

Pull request overview

Adds session-start context injection to the Claude Code OpenViking memory plugin so the agent receives identity/profile context at the beginning of every session (startup/clear/resume/compact), and introduces a new /ov slash command to display a concise plugin/server status report.

Changes:

  • Restructures session-start.mjs to compose a profile block (always, unless disabled) plus an archive block (resume/compact only) into a single <openviking-context source="..."> payload and mirrors it to ~/.openviking/last_inject.md.
  • Introduces scripts/lib/profile-inject.mjs to fetch profile.md and list preferences/ + entities/ with abstracts under a token budget, including head/tail elision logic for oversized profiles.
  • Adds /ov command wiring (commands/ov.md) and implementation (scripts/ov-status.mjs) to report health/identity/last injection + recall/toggles/auth source.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| examples/claude-code-memory-plugin/scripts/session-start.mjs | Composes session-start profile + archive injections into a single context envelope and writes an audit file. |
| examples/claude-code-memory-plugin/scripts/lib/profile-inject.mjs | New helper to build the <user-profile> + <available-memories> block with budgeting and elision. |
| examples/claude-code-memory-plugin/scripts/ov-status.mjs | New CLI script backing /ov to print server/identity/state/toggle/auth source status. |
| examples/claude-code-memory-plugin/scripts/config.mjs | Adds env/config support for OPENVIKING_NO_AUTO_INJECT and OPENVIKING_PROFILE_TOKEN_BUDGET. |
| examples/claude-code-memory-plugin/commands/ov.md | Registers the /ov slash command to invoke the status script. |


session-start.mjs header comment, as reviewed:
    * 1. Profile injection (every source: startup/clear/resume/compact unless
    * OPENVIKING_NO_AUTO_INJECT=1): full profile.md + description-annotated
    * ls of preferences/ and entities/. Total capped at
    * OPENVIKING_PROFILE_TOKEN_BUDGET (default 5000 tokens).

session-start.mjs local token estimator, as reviewed:
    function estimateTokens(text) {
      return text ? Math.ceil(text.length / 4) : 0;
    }

profile-inject.mjs header comment, as reviewed:
    * viking://user/<space>/memories/preferences/ (ls with abstracts)
    * viking://user/<space>/memories/entities/ (ls with abstracts)
    *
    * Budget enforced via chars/4 token estimate (matches auto-recall.mjs:205-207).
Comment on lines +128 to +149 (profile-inject.mjs, elideProfile; the same lines appear in the full function quoted in the Reviewer Guide above)
Comment on lines +157 to +178 (profile-inject.mjs, formatListing; excerpt):
    function formatListing(headerUri, entries, budgetTokens) {
      if (entries.length === 0) return { lines: [], used: 0, dropped: 0 };
      // Header is the full directory URI; child lines are relative paths so the
      // agent can reconstruct each leaf's full URI by concatenation while the
      // listing itself stays compact.
      const header = ` ${headerUri}/`;
      const lines = [header];
      let used = estimateTokens(header);
      let included = 0;
      for (let i = 0; i < entries.length; i++) {
        const e = entries[i];
        const desc = e.abstract
          ? ` — ${e.abstract.replace(/\s+/g, " ").slice(0, 200)}`
          : "";
        const line = ` - ${e.name}${desc}`;
        const tokens = estimateTokens(line);
        if (used + tokens > budgetTokens && included > 0) {
          const remaining = entries.length - i;
          const tail = ` ... +${remaining} more, use \`memory_recall\``;
          lines.push(tail);
          return { lines, used: used + estimateTokens(tail), dropped: remaining };
        }
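profile-inject.mjs, fields of the returned result object (excerpt; profileBytes holds a UTF-16 character count and was later renamed profileChars):
    chars: block.length,
    tokens: estimateTokens(block),
    profileUri,
    profileBytes: profile?.length ?? 0,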
ov-status.mjs header comment, as reviewed:
    * - Last session-start injection (size, age, audit path)
    * - Last auto-recall (item count, top score, token budget use)
    * - Toggle state for the three injection paths
    * - Active config file + env overrides currently in effect
Comment on lines +111 to +123 (ov-status.mjs, auth source detection):
    // 5. Auth source — single source of truth for url+api_key. We compute which
    // file/env actually drove each value rather than listing every file on disk
    // that "could have" contributed: that gives the false impression of competing
    // sources when there's just one chain (env → ovcli.conf → ov.conf → default,
    // first hit wins per field).
    const cliConfPath = expandHome(process.env.OPENVIKING_CLI_CONFIG_FILE
      || join(homedir(), ".openviking", "ovcli.conf"));
    const cliExists = existsSync(cliConfPath);
    const urlSrc = process.env.OPENVIKING_URL || process.env.OPENVIKING_BASE_URL
      ? "env" : (cliExists ? homeShort(cliConfPath) : "default");
    const keySrc = process.env.OPENVIKING_API_KEY || process.env.OPENVIKING_BEARER_TOKEN
      ? "env" : (cliExists ? homeShort(cliConfPath) : "(none)");
    console.log(`Auth: url from ${urlSrc}, api_key from ${keySrc}`);
Comment on lines 113 to 123 (session-start.mjs, in main()):
    @@ -105,30 +123,75 @@ async function main() {
      return;
…erbatim

After volcengine#1849 / volcengine#1850 the plugin captured tool I/O at a 4 KB-per-block cap
under one knob (TOOL_BLOCK_MAX_CHARS = 4096). Field thinking surfaced
two refinements:

1. **Tool *output* (tool_result content) is mostly noise for memory
   extraction.** Memory extraction cares about user preferences, project
   context, decisions, and what the agent did — not about the bytes a
   tool happened to return. The agent's prose around the tool call almost
   always summarizes the meaningful bit ("I checked the docs and confirmed
   X"); the raw 4 KB of fetched markdown adds nothing the prose doesn't
   already cover. Storing it just inflates session size and extraction
   token cost.

   Renamed TOOL_BLOCK_MAX_CHARS → TOOL_RESULT_MAX_CHARS and changed
   default to 0. When 0, tool_result blocks are dropped entirely. Operators
   wanting replay-style archives can set >0 to retain truncated output.

2. **Tool *input* should not be truncated.** Inputs are agent-authored
   (URLs, file paths, queries, commands). They're usually short, and a
   pathologically long input is itself signal worth surfacing — a
   memory extractor seeing "agent ran `bash` with a 10 KB script" learns
   something the truncated form would hide.

   Replaced truncateForLog(block.input) with formatToolInput(block.input),
   which JSON-serializes structured inputs but applies no length cap.

Both changes apply symmetrically to auto-capture.mjs and subagent-stop.mjs.
@t0saki t0saki changed the title feat(claude-code-plugin): session-start profile injection + /ov status command feat(claude-code-plugin): session-start profile injection, /ov status command, tool-output capture cleanup May 8, 2026
Nine review comments, all valid:

profile-inject.mjs:
- header doc said chars/4 but estimateTokens is CJK-aware → fixed
- estimateTokens now exported so callers can log token counts that match
  the budget logic
- elideProfile derived maxChars from maxTokens*4, but the estimator counts
  CJK at 1.5 tokens/char → for CJK profiles the truncated string could
  still bust the token cap. New tokensToCharsBudget() converts using the
  content's actual CJK density
- formatListing always included header + first entry, so very small budgets
  silently violated the cap. Now: stub-out when header alone exceeds
  budget; only emit "+N more" tail when it fits; close silently otherwise
- profileBytes was UTF-16 char count, labeled "B" → renamed to profileChars
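A sketch of what such a conversion could look like, reusing the same rule (1.5 tokens per code point at or above U+3000, a quarter token otherwise). Only the name and purpose of `tokensToCharsBudget` come from the commit message; the body is an assumption. It averages density over the whole profile, so it approximates the bound when CJK text is concentrated in the kept head or tail. It also counts code points while `String.prototype.slice` counts UTF-16 units; the two agree for BMP text, which covers common CJK.

```javascript
// Hypothetical tokensToCharsBudget: converts a token cap into a character cap
// using the content's own mix of CJK (1.5 tok) and other (0.25 tok) code points.
function tokensToCharsBudget(content, maxTokens) {
  let cjk = 0;
  let total = 0;
  for (const ch of content) {
    total++;
    if (ch.codePointAt(0) >= 0x3000) cjk++;
  }
  if (total === 0) return maxTokens * 4; // no content: plain chars/4
  const tokensPerChar = (cjk * 1.5 + (total - cjk) / 4) / total;
  return Math.floor(maxTokens / tokensPerChar);
}
```

For pure ASCII this reduces to the old `maxTokens * 4`, while a pure-CJK profile gets only `maxTokens / 1.5` characters.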

session-start.mjs:
- header doc said budget=5000, code default is 10000 → doc fix
- local estimateTokens was flat chars/4 while injection enforces CJK-aware
  budget → import the shared estimator from profile-inject so logs match
  reality
- /health probe ran even when no injection path would fire (e.g.
  NO_AUTO_INJECT=1 + startup) → short-circuit before the network call
- profileBytes references updated to profileChars

ov-status.mjs:
- header doc said "Active config file + env overrides" but the bottom
  block was removed earlier → header fixed to describe Auth source
- auth source detection only considered env + ovcli.conf; could misreport
  "(none)" when key was actually coming from ov.conf claude_code.apiKey
  or server.root_api_key. Now mirrors config.mjs's full priority chain
  (env → ovcli.conf → ov.conf → default)

t0saki commented May 8, 2026

Addressed all 9 Copilot comments in the fourth commit on this branch (fix(claude-code-plugin): address Copilot review on PR #1914).

| # | File:line | Resolution |
| --- | --- | --- |
| 1 | session-start.mjs:12 | header doc updated to match default 10000 |
| 2 | session-start.mjs:63 | imported the CJK-aware estimateTokens from profile-inject.mjs; debug logs now reflect real budget |
| 3 | profile-inject.mjs:9 | header doc updated: "CJK-aware" |
| 4 | profile-inject.mjs (elideProfile) | new tokensToCharsBudget() converts maxTokens → maxChars using this content's CJK density; truncated profile is now guaranteed to fit the token sub-cap |
| 5 | profile-inject.mjs (formatListing) | header alone over budget → emit single-line stub; "+N more" tail only emitted when it fits; close silently otherwise |
| 6 | profile-inject.mjs:246 | profileBytes → profileChars (and log line updated) |
| 7 | ov-status.mjs:11 | header doc updated to describe "Auth source" instead of the dropped config-file listing |
| 8 | ov-status.mjs:123 | auth source detection extended to follow config.mjs's full priority: env → ovcli.conf → ov.conf (claude_code.apiKey / server.root_api_key / server.url) → default |
| 9 | session-start.mjs:123 | short-circuit before /health probe when neither injection path will run for the current source/config |
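The corrected budget handling in `formatListing` (row 5) can be sketched as below. It keeps the listing layout quoted in the Copilot review (full directory URI as header, relative child names underneath) and the CJK-aware estimator. The stub text, the `: ` separator and the indentation are assumptions, and the stub is the one case where a single line may exceed the cap.

```javascript
function estimateTokens(text) {
  if (!text) return 0;
  let cjk = 0;
  let other = 0;
  for (const ch of text) {
    if (ch.codePointAt(0) >= 0x3000) cjk++;
    else other++;
  }
  return Math.ceil(cjk * 1.5 + other / 4);
}

function formatListing(headerUri, entries, budgetTokens) {
  if (entries.length === 0) return { lines: [], used: 0, dropped: 0 };
  const header = `  ${headerUri}/`;
  const headerTokens = estimateTokens(header);
  if (headerTokens > budgetTokens) {
    // Stub: keep only the directory pointer, drop every entry.
    return { lines: [header], used: headerTokens, dropped: entries.length };
  }
  const lines = [header];
  let used = headerTokens;
  for (let i = 0; i < entries.length; i++) {
    const e = entries[i];
    const desc = e.abstract ? `: ${e.abstract.replace(/\s+/g, " ").slice(0, 200)}` : "";
    const line = `    - ${e.name}${desc}`;
    const cost = estimateTokens(line);
    if (used + cost > budgetTokens) {
      const remaining = entries.length - i;
      const tail = `    ... +${remaining} more, use \`memory_recall\``;
      const tailCost = estimateTokens(tail);
      // Emit the "+N more" hint only when it fits; otherwise close silently.
      if (used + tailCost <= budgetTokens) {
        lines.push(tail);
        used += tailCost;
      }
      return { lines, used, dropped: remaining };
    }
    lines.push(line);
    used += cost;
  }
  return { lines, used, dropped: 0 };
}
```

Unlike the reviewed version, the first entry is no longer forced in, so `used` stays within `budgetTokens` whenever the header itself fits.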

Smoke-tested: default budget, NO_AUTO_INJECT=1 (now skips network probe and returns immediately), PROFILE_TOKEN_BUDGET=500 (formatListing stays within cap, emits truncation tails), /ov (auth source line correct).

ZaynJarvis left a comment

Let's try it out and see how it feels.

ZaynJarvis merged commit 31bcfd2 into volcengine:main May 8, 2026
5 checks passed
github-project-automation Bot moved this from Backlog to Done in OpenViking project May 8, 2026
jwayong added a commit to jwayong/OpenViking that referenced this pull request May 8, 2026
* feat(copilot-plugin): scaffold examples/copilot/ npm workspace

Sets up the workspace root + three packages (shared / vscode-extension /
cli-plugin) with TypeScript + Vitest. Each package has a stub source file
and smoke test wired to the shared package, so consumers fail at the
package boundary rather than via missing types.

`npm run typecheck` and `npm test` from examples/copilot/ exercise all
three workspaces.

Refs #3

* feat(copilot-plugin): port config loader to shared/src/config.ts

Typed PluginConfig + loadConfig({agentIdDefault, hostOverrides}) +
isPluginEnabled. Resolution chain hostOverrides > env > ovcli.conf >
ov.conf > defaults preserves the Claude Code plugin's config contract so
one config file drives all three plugins.

ov.conf gains a new `copilot` block for tuning fields; the legacy
`claude_code` block remains a fall-through. envBool/num/str helpers live
in util/env.ts so other shared modules can reuse them.
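The `envBool`/`envNum`/`envStr` helpers are not shown in the commit. The sketch below is one plausible shape, written in plain JavaScript to match the rest of this page; the accepted truthy/falsy spellings and the fall-back-on-garbage behaviour are assumptions.

```javascript
// Hypothetical env helpers: an empty or unparseable value yields the fallback.
const TRUTHY = new Set(["1", "true", "yes", "on"]);
const FALSY = new Set(["0", "false", "no", "off"]);

function envBool(env, name, fallback) {
  const raw = (env[name] ?? "").trim().toLowerCase();
  if (TRUTHY.has(raw)) return true;
  if (FALSY.has(raw)) return false;
  return fallback; // unset, empty, or unrecognised
}

function envNum(env, name, fallback) {
  const raw = (env[name] ?? "").trim();
  if (raw === "") return fallback;
  const n = Number(raw);
  return Number.isFinite(n) ? n : fallback;
}

function envStr(env, name, fallback) {
  const raw = env[name];
  return raw === undefined || raw === "" ? fallback : raw;
}
```

Taking `env` as a parameter instead of reading `process.env` directly keeps the helpers trivially testable, which fits the 27-test precedence matrix described above.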

27 unit tests cover precedence, copilot-vs-claude_code blocks,
bypass-glob expansion (CSV vs array vs hostOverrides), enable/disable
matrix, and the clamps/floors.

Refs #4

* feat(copilot-plugin): add debug/logger.ts JSONL logger with secret redaction

Append-only JSONL logger for the Copilot plugins. No-op when cfg.debug is
false so it's safe to wire into hot paths. Redacts apiKey / bearer /
token / secret / password / authorization values at any depth so nothing
sensitive lands on disk. Cycle-safe via a WeakSet visited tracker —
self-referential fields emit "[CIRCULAR]" instead of overflowing the
stack. Rotates at 10 MB by renaming to <path>.1 (overwriting any prior
backup).
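A minimal sketch of the redaction pass. It assumes exact matching on a normalised key name (lower-cased, non-letters dropped, so `api_key` and `apiKey` match but a numeric `tokens` field does not) and a `[REDACTED]` placeholder. The WeakSet holds only the objects on the current path, so a value referenced twice from sibling fields is serialized normally and only true cycles become `[CIRCULAR]`. BigInt values are stringified so `JSON.stringify` cannot throw. File rotation is left out.

```javascript
// Key names (after normalisation) whose values are never written to disk.
const SECRET_KEYS = new Set(["apikey", "bearer", "token", "secret", "password", "authorization"]);
const isSecretKey = (key) => SECRET_KEYS.has(key.toLowerCase().replace(/[^a-z]/g, ""));

function redact(value, ancestors = new WeakSet()) {
  if (typeof value === "bigint") return value.toString(); // JSON.stringify throws on BigInt
  if (value === null || typeof value !== "object") return value;
  if (ancestors.has(value)) return "[CIRCULAR]";
  ancestors.add(value);
  let out;
  if (Array.isArray(value)) {
    out = value.map((v) => redact(v, ancestors));
  } else {
    out = {};
    for (const [k, v] of Object.entries(value)) {
      out[k] = isSecretKey(k) ? "[REDACTED]" : redact(v, ancestors);
    }
  }
  ancestors.delete(value); // only objects on the current path count as cycles
  return out;
}

// One JSONL line per event.
function toJsonlLine(event) {
  return JSON.stringify({ ts: new Date().toISOString(), ...redact(event) }) + "\n";
}
```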

Path resolution comes from cfg.debugLogPath, which is already
env-/host-overridable via OPENVIKING_DEBUG_LOG per #4.

12 unit tests cover disabled mode, JSONL line shape, child scope,
redaction (top-level / nested / array / case-insensitive), rotation, and
the cycle / BigInt fail-safes.

Refs #14

* feat(copilot-plugin): add util/async-writer.ts detached-write helper

spawnDetached(opts) wraps child_process.spawn with detached:true, stdio
["pipe","ignore","ignore"], env merged on top of process.env, and
child.unref() so the parent doesn't wait. Worker errors hit the
debug-log via the optional logger; the parent never sees them. Sync
spawn failures (rare on POSIX) come back as {detached:false, error}.
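A sketch of `spawnDetached` using Node's `child_process.spawn` with the options listed above. The shape of the argument object and the `logger` callback signature are assumptions. Runtime spawn failures such as ENOENT arrive later as `error` events, so they are logged and swallowed; only synchronous throws (for example, invalid arguments) produce `detached: false`.

```javascript
import { spawn } from "node:child_process";

function spawnDetached({ command, args = [], payload = "", env = {}, logger = null }) {
  try {
    const child = spawn(command, args, {
      detached: true,
      stdio: ["pipe", "ignore", "ignore"],
      env: { ...process.env, ...env }, // caller env layered over the parent's
    });
    // Runtime failures (e.g. ENOENT) are emitted asynchronously: log, never throw.
    child.on("error", (err) => logger?.("async_writer_error", { message: err.message }));
    // The worker may exit before draining stdin; ignore the resulting pipe error.
    child.stdin.on("error", () => {});
    child.stdin.end(payload);
    child.unref(); // parent does not wait for the worker
    return { detached: true };
  } catch (error) {
    // Synchronous spawn failures fall back to the caller (e.g. a sync write path).
    return { detached: false, error };
  }
}
```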

runWriteTask picks the async or sync path based on cfg.writePathAsync,
falls back to the syncHandler when no asyncSpawn factory is provided or
the spawn returns detached:false. Always resolves — sync-handler errors
are caught and logged so the host hot path stays safe.

The shared package can't know where each host's worker entrypoint lives,
so the asyncSpawn(payload) factory is supplied per-host (the VS Code
extension and CLI plugin will provide their own worker scripts in later
phases). Naming + behaviour parity with the CC plugin's
scripts/lib/async-writer.mjs is intentional.

8 unit tests cover: stdin payload delivery to a real worker fixture, env
passthrough, nonexistent-command tolerance, sync handler invocation,
sync handler error swallow, async path returning in <150ms while worker
delays 200ms, spawn-failure fallback to sync, and async=true with no
asyncSpawn falling back to sync.

Refs #13

* feat(copilot-plugin): add session/id.ts with deriveSessionId

deriveSessionId(host, hostSessionId) returns
cp-<sha256(host + ':' + hostSessionId)>. The cp- prefix lets the
OpenViking server distinguish Copilot sessions from Claude Code (cc-)
sessions by URI alone, and using host as part of the digest input
guarantees the VS Code extension and the CLI plugin never collide on a
single OpenViking session even when they share an upstream session id.
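The derivation is a one-liner with Node's built-in `crypto` module. This sketch follows the formula given above: `cp-` plus the hex SHA-256 of `host + ':' + hostSessionId`.

```javascript
import { createHash } from "node:crypto";

// cp-<sha256(host + ":" + hostSessionId)>: pure and deterministic.
function deriveSessionId(host, hostSessionId) {
  const digest = createHash("sha256").update(`${host}:${hostSessionId}`).digest("hex");
  return `cp-${digest}`;
}
```

The separator rules out collisions only while host names themselves contain no `:`, which holds for a fixed set of host identifiers.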

Pure + deterministic: no clock, no env, no fs. The ':' separator
prevents collisions like ("ab","c") vs ("a","bc"). Hosts that don't
expose a stable session id (some CLI invocations) feed in a digest of
cwd + start-time as hostSessionId — the shared package doesn't dictate
how that's computed.

Eleven unit tests cover format (cp- prefix + 64 hex), determinism,
host- and hostSessionId-sensitivity, the empty-id edge case, the
':' separator collision-prevention, and three pinned SHA-256 vectors so
any accidental algorithm/prefix/separator change breaks the test loudly.

Refs #12

* feat(copilot-plugin): add OVClient HTTP REST client

Typed wrapper over the OpenViking REST API the CC plugin's hooks
already exercise in production:
  GET    /health
  POST   /api/v1/search/find
  POST   /api/v1/sessions/{id}/messages   (looped per turn)
  POST   /api/v1/sessions/{id}/commit
  GET    /api/v1/sessions/{id}/context?token_budget=N

Headers (Authorization + X-OpenViking-{Account,User,Agent}) are sent
conditionally on the corresponding cfg field being non-empty, matching
ov-session.mjs's behaviour so the same ovcli.conf drives both plugins
identically.
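Conditional header construction might look like the sketch below. The `Bearer` scheme, the `Content-Type` default and the config field names (`apiKey`, `account`, `user`, `agentId`) are assumptions; only the header names come from the text.

```javascript
// Only non-empty config fields produce headers (field names are assumed).
function buildHeaders(cfg) {
  const headers = { "Content-Type": "application/json" };
  if (cfg.apiKey) headers.Authorization = `Bearer ${cfg.apiKey}`;
  if (cfg.account) headers["X-OpenViking-Account"] = cfg.account;
  if (cfg.user) headers["X-OpenViking-User"] = cfg.user;
  if (cfg.agentId) headers["X-OpenViking-Agent"] = cfg.agentId;
  return headers;
}
```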

Bypass at the client layer: cfg.bypassSession or a glob match in
cfg.bypassSessionPatterns against bypassContext.cwd /
bypassContext.hostSessionId short-circuits every method with a
synthetic ok result — [] for recall, null for archive overview,
{skipped:N} for writes — so the host hot path stays cheap and silent
for scratch sessions.

Errors are normalised: HTTP non-2xx, server-side body.status==='error',
and thrown fetch errors all collapse to {ok:false, error:{message,
status?}}. fetchArchiveOverview maps 404 to ok:true with null overview
since "session not yet on the server" is the resume-prime norm rather
than an error.

Timeout via AbortController, default cfg.timeoutMs with a per-call
override on recall. fetchImpl is injectable for tests; in production it
defaults to global fetch (Node 22+).
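A sketch of the timeout and error-normalisation path, with `fetchImpl` injected as described. The server envelope (`{status, result, error: {message}}`) and the `HTTP <status>` fallback message are assumptions. `fetchArchiveOverview` shows the 404-to-null mapping against the context endpoint listed above; `requestJson` is a hypothetical helper name.

```javascript
// Timeout via AbortController; every failure mode collapses to
// { ok: false, error: { message, status? } }.
async function requestJson(fetchImpl, url, { method = "GET", headers = {}, body, timeoutMs = 5000 } = {}) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(url, {
      method,
      headers,
      body: body === undefined ? undefined : JSON.stringify(body),
      signal: controller.signal,
    });
    const data = await res.json().catch(() => null); // tolerate empty/non-JSON bodies
    if (!res.ok) {
      return { ok: false, error: { message: data?.error?.message ?? `HTTP ${res.status}`, status: res.status } };
    }
    if (data?.status === "error") {
      // Server-side failure wrapped in a 2xx response (assumed envelope).
      return { ok: false, error: { message: data.error?.message ?? "server error", status: res.status } };
    }
    return { ok: true, data: data?.result ?? data };
  } catch (err) {
    const message = err?.name === "AbortError" ? `timeout after ${timeoutMs}ms` : String(err?.message ?? err);
    return { ok: false, error: { message } };
  } finally {
    clearTimeout(timer);
  }
}

// "Session not yet on the server" is normal on resume, so 404 maps to ok + null.
async function fetchArchiveOverview(fetchImpl, baseUrl, sessionId, tokenBudget, headers = {}) {
  const url = `${baseUrl}/api/v1/sessions/${encodeURIComponent(sessionId)}/context?token_budget=${tokenBudget}`;
  const r = await requestJson(fetchImpl, url, { headers });
  if (!r.ok && r.error.status === 404) return { ok: true, overview: null };
  return r.ok ? { ok: true, overview: r.data } : r;
}
```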

20 unit tests cover header injection (with + without auth), wire shape
of each endpoint, bucket flattening on recall, multi-turn append with
mid-batch failure, commit force flag, 404→null on archive, URL
encoding, error-mapping (HTTP / wrapped status / thrown), abort-on-
timeout via fake timers, bypass via session/cwd/host-id, and the
no-fetch construction guard.

Refs #5

* feat(copilot-plugin): add capture/sanitize.ts injected-block stripper

Port of stripInjectedBlocks + sanitize from
examples/claude-code-memory-plugin/scripts/auto-capture.mjs, with two
additions: a new <copilot-context> marker (symmetric with
<openviking-context> for any host that surfaces a Copilot-side block)
and a public INJECTED_BLOCK_PATTERNS catalogue so Phase 0 spike
findings can append host-specific markers without touching the
sanitiser's call sites.

Two entry points by design:
- stripInjectedBlocks preserves whitespace (newlines, code fences) so
  output is safe to push back into OV as-is.
- sanitize additionally collapses whitespace; only suitable for
  classification (trigger detection, capture-or-skip), never for
  storage.

Six patterns covered: <relevant-memories>, <openviking-context>,
<copilot-context>, <system-reminder>, ^[Subagent Context]$ lines, and
NUL bytes.
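The pattern catalogue and the two entry points could be sketched like this. Allowing attributes on the opening tag (so `<openviking-context source="...">` is caught) and matching non-greedily are assumptions about how the real patterns are written. As the text stresses, `sanitize` is for classification only.

```javascript
// Tag-delimited blocks injected by the plugins; attributes on the opening tag are allowed.
const blockPattern = (tag) => new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?</${tag}>`, "g");

const INJECTED_BLOCK_PATTERNS = [
  blockPattern("relevant-memories"),
  blockPattern("openviking-context"),
  blockPattern("copilot-context"),
  blockPattern("system-reminder"),
  /^\[Subagent Context\]$/gm, // marker line on its own
  /\u0000/g, // NUL bytes
];

// Storage-safe: removes markers but leaves newlines and code fences alone.
function stripInjectedBlocks(text) {
  let out = text;
  for (const re of INJECTED_BLOCK_PATTERNS) out = out.replace(re, "");
  return out;
}

// Classification only: additionally collapses all whitespace.
function sanitize(text) {
  return stripInjectedBlocks(text).replace(/\s+/g, " ").trim();
}
```

Because the catalogue is a plain exported array, host-specific markers found later can be appended without touching either function.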

19 unit tests cover marker removal (one per kind, multiple-per-line,
multi-line, all-mixed), whitespace preservation, idempotency, strict
sanitize collapsing, and a self-referential pollution scenario where a
recall block from turn N is embedded in turn N+1 — the stripped output
must contain zero marker bytes while the user's actual content
survives intact.

Refs #9

* feat(copilot-plugin): add recall/rank.ts pure ranking pipeline

1:1 port of the ranking logic in
examples/claude-code-memory-plugin/scripts/auto-recall.mjs (originally
lifted from openclaw-plugin/memory-ranking.ts). Boost magnitudes,
stopword list, and tokenisation regex are preserved verbatim so the
Copilot plugins behave identically against the same fixtures.

Pipeline (rankRecallHits): filter by clampScore >= scoreThreshold →
sort desc by rankItem(profile) → dedupeItems → truncate to recallLimit.
Sort is V8-stable so ties preserve input order, which is the contract.

Boosts (rankItem):
- leaf +0.12 for level=2 or *.md URIs
- event +0.10 when query is temporal AND item is in events/
- preference +0.08 when query is preference AND item is preferences/
- lexical overlap up to +0.20, computed against the URI + abstract
  text, capped at the first 8 query tokens
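To make the boost table concrete, here is a sketch of `rankItem`. The boost magnitudes and the 8-token cap come from the text. The cue regexes, the tokeniser, the absent stopword list and the proportional formula for the lexical boost (0.20 times the share of query tokens found) are assumptions, so scores will not match the real plugin exactly.

```javascript
// Hypothetical cue lists and tokeniser; only the boost sizes come from the text.
const TEMPORAL = /\b(yesterday|today|last|when|recent|ago)\b/i;
const PREFERENCE = /\b(prefer|like|favorite|always|never)\b/i;

const clampScore = (s) => (Number.isFinite(s) ? Math.min(1, Math.max(0, s)) : 0);
const tokenize = (text) => text.toLowerCase().match(/[a-z0-9]+/g) ?? [];

function buildQueryProfile(query) {
  return {
    tokens: tokenize(query).slice(0, 8), // only the first 8 query tokens count
    temporal: TEMPORAL.test(query),
    preference: PREFERENCE.test(query),
  };
}

// Up to +0.20, proportional to the share of query tokens found (assumed formula).
function lexicalOverlapBoost(profile, item) {
  if (profile.tokens.length === 0) return 0;
  const haystack = `${item.uri} ${item.abstract ?? ""}`.toLowerCase();
  const hits = profile.tokens.filter((t) => haystack.includes(t)).length;
  return 0.2 * (hits / profile.tokens.length);
}

function rankItem(profile, item) {
  let rank = clampScore(item.score);
  if (item.level === 2 || item.uri.endsWith(".md")) rank += 0.12; // leaf
  if (profile.temporal && item.uri.includes("/events/")) rank += 0.1;
  if (profile.preference && item.uri.includes("/preferences/")) rank += 0.08;
  return rank + lexicalOverlapBoost(profile, item);
}
```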

Dedupe rule: events/cases dedupe by URI (so distinct events with
matching abstracts don't collapse); everything else dedupes by abstract
content with the URI as a fallback key.
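The dedupe rule and the surrounding pipeline (threshold, sort, dedupe, truncate) can be sketched as below. The `/events/` and `/cases/` path test and passing the ranking function in as `rankFn` are illustrative choices. `Array.prototype.sort` has been required to be stable since ES2019, which gives the tie-ordering contract.

```javascript
const clampScore = (s) => (Number.isFinite(s) ? Math.min(1, Math.max(0, s)) : 0);

// Events/cases dedupe by URI; everything else by abstract, URI as fallback key.
const isEventOrCaseItem = (item) => /\/(events|cases)\//.test(item.uri);

function dedupeItems(items) {
  const seen = new Set();
  const out = [];
  for (const item of items) {
    const key = isEventOrCaseItem(item)
      ? `uri:${item.uri}`
      : `abs:${item.abstract?.trim() || item.uri}`;
    if (seen.has(key)) continue; // first occurrence is the highest-ranked one
    seen.add(key);
    out.push(item);
  }
  return out;
}

function rankRecallHits(items, rankFn, { scoreThreshold = 0, recallLimit = Infinity } = {}) {
  const ranked = items
    .filter((it) => clampScore(it.score) >= scoreThreshold)
    .map((it) => ({ it, rank: rankFn(it) }))
    .sort((a, b) => b.rank - a.rank) // stable since ES2019: ties keep input order
    .map(({ it }) => it);
  return dedupeItems(ranked).slice(0, Math.max(0, recallLimit));
}
```

Deduping after the sort is what guarantees that, of two items with the same abstract, the higher-ranked one survives.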

recallTokenBudget is intentionally NOT enforced here — that lives in
recall/format.ts (#7), since budget logic needs per-item content
resolution (HTTP fetch for level=2 items) which doesn't fit a pure
function. Documented in the file header.

estimateTokens (chars/4) is exported from this module so the formatter
and ranker speak the same units.

27 unit tests cover clampScore, estimateTokens, buildQueryProfile,
lexicalOverlapBoost (with the 8-token slice cap), rankItem (each boost
path verified independently), isEventOrCaseItem + dedupeItems
(event-by-URI vs default-by-abstract), and the rankRecallHits pipeline
(threshold filter, limit truncation, query-aware sort, stable ordering
on tied ranks, dedupe-after-sort keeps highest-ranked, empty + negative
recallLimit edge cases).

Refs #6

* feat(copilot-plugin): add recall/format.ts <openviking-context> renderer

formatRecallBlock(items, opts) emits the exact block shape the CC
plugin's auto-recall.mjs produces, so anything downstream that already
parses the CC output works against the Copilot output too — including
capture/sanitize.ts's stripper (round-trip verified by a sanitize-
based test).

Token budget: front-of-list items render with full content until the
budget is exhausted; subsequent items degrade to URI-only hints
instead of being dropped (preserves a useful pointer even when
content can't fit). The first content line always lands even if it
alone overflows the budget — recall returning one very long memory
is still better than an empty block. Mirrors openclaw spec §6.2.
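The budget-then-degrade behaviour can be sketched independently of the exact block markup. The `- [type NN%] ...` line format and not charging hints against the budget are assumptions. The first-item rule, the URI-only fallback after exhaustion, the whole-percent score, the `item` type fallback and the chars/4 estimate follow the text; `layoutWithinBudget` is a hypothetical name.

```javascript
// Flat chars/4 estimate, as the recall ranker exports it.
const estimateTokens = (t) => (t ? Math.ceil(t.length / 4) : 0);

function layoutWithinBudget(items, tokenBudget) {
  const lines = [];
  let budgetUsed = 0;
  let contentCount = 0;
  let hintCount = 0;
  let exhausted = false;
  for (const item of items) {
    const label = `[${item.type ?? "item"} ${Math.round(item.score * 100)}%]`;
    const contentLine = `- ${label} ${item.content}`;
    const cost = estimateTokens(contentLine);
    // The first content line always lands, even if it alone overflows.
    if (!exhausted && (contentCount === 0 || budgetUsed + cost <= tokenBudget)) {
      lines.push(contentLine);
      budgetUsed += cost;
      contentCount++;
    } else {
      exhausted = true; // everything after this degrades to a URI-only hint
      lines.push(`- ${label} ${item.uri}`);
      hintCount++;
    }
  }
  return { lines, contentCount, hintCount, budgetUsed };
}
```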

Content resolution chain (resolveItemContent):
  - preferAbstract=true and abstract present → abstract
  - level=2 + fetchContent provided → fetchContent(uri); null/throw
    falls back to abstract → URI
  - otherwise → abstract → URI

maxContentChars caps each line; over-cap content is sliced + "..."
suffixed. Score rendered as whole-number percent. Type label comes
from item.type (set by OVClient when it flattens buckets); falls
back to "item" when missing.

Returns {block, contentCount, hintCount, budgetUsed} so the host can
log telemetry (mirrors the CC plugin's "injection_built" log line).
Empty input returns {block: null, ...} so the caller can skip
injection cleanly.

Also: small OVClient enhancement (still issue #5's contract — adding
a feature, not changing behaviour). flattenRecallBuckets now stamps
the singularised bucket name onto each hit as `type` ("memories" →
"memory") so the formatter can label items without the host
stitching the source label back. Server-set `type` fields are
preserved.
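A sketch of the type stamping, assuming the server response maps bucket names to hit arrays (for example `memories`, `resources`, `skills`). The naive singulariser is illustrative only; the real one may special-case names.

```typescript
// Hypothetical hit and response shapes.
type RecallHitRaw = { uri: string; type?: string; [key: string]: unknown };
type RecallBuckets = Record<string, RecallHitRaw[] | undefined>;

// "memories" -> "memory", "resources" -> "resource", "skills" -> "skill".
function singularise(bucket: string): string {
  if (bucket.endsWith("ies")) return bucket.slice(0, -3) + "y";
  if (bucket.endsWith("s")) return bucket.slice(0, -1);
  return bucket;
}

function flattenRecallBuckets(buckets: RecallBuckets): RecallHitRaw[] {
  const out: RecallHitRaw[] = [];
  for (const [bucket, hits] of Object.entries(buckets)) {
    for (const hit of hits ?? []) {
      // Keep a server-set type; otherwise stamp the singular bucket name on a copy.
      out.push(hit.type ? hit : { ...hit, type: singularise(bucket) });
    }
  }
  return out;
}
```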

20 unit tests for formatRecallBlock cover block shape (open/close/
header verbatim, line format, score rounding, type fallback,
sanitize round-trip), token budget (URI-hint degradation, first-
item-always-included, budgetUsed accounting, tokenBudget=0 edge
case), content resolution (preferAbstract honoured, fetchContent
invoked, null/throw fallbacks, missing-abstract → URI, non-level-2
ignores fetchContent), and recallMaxContentChars truncation. Plus
2 OVClient tests for the type-tagging behaviour.

Refs #7

* feat(copilot-plugin): add capture/transcript.ts canonicaliser + adapters

canonicaliseTranscript(turns, opts) is the shared core: sanitize each
turn via stripInjectedBlocks → drop empty → drop assistant turns when
captureAssistantTurns=false → drop turns whose sanitized text exceeds
captureMaxLength. Preserves input order, never mutates input.

captureMaxLength is intentionally a *rejection threshold*, not a
truncation cap, mirroring the CC plugin's auto-capture.mjs:shouldCapture
semantic. Tool I/O inlining can balloon a turn easily, and storing a
half-truncated message is worse than skipping it. The commit-queue (#10)
decides whether to retry skipped turns elsewhere.
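A sketch of the canonicalise core under stated assumptions: the turn shape is hypothetical, `stripInjectedBlocks` is passed in rather than imported from capture/sanitize.ts, and `trim() === ""` is the assumed emptiness test. The length check skips an overlong turn whole instead of slicing it, which is the rejection-threshold semantic described above.

```typescript
type Role = "user" | "assistant";

interface CanonicalTurnInput {
  role: Role;
  text?: string;
}

interface CanonicalTurn {
  role: Role;
  text: string;
}

interface CanonicaliseOptions {
  captureAssistantTurns: boolean;
  captureMaxLength: number; // 0 disables the check
  stripInjectedBlocks: (text: string) => string;
}

function canonicaliseTranscript(
  turns: readonly CanonicalTurnInput[],
  opts: CanonicaliseOptions,
): CanonicalTurn[] {
  const out: CanonicalTurn[] = [];
  for (const turn of turns) {
    const text = opts.stripInjectedBlocks(turn.text ?? "");
    if (text.trim() === "") continue; // empty or block-only turn
    if (turn.role === "assistant" && !opts.captureAssistantTurns) continue;
    // Rejection threshold, not truncation: an overlong turn is skipped entirely.
    if (opts.captureMaxLength > 0 && text.length > opts.captureMaxLength) continue;
    out.push({ role: turn.role, text }); // new object, input order preserved
  }
  return out;
}
```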

Two host adapters:
- fromVSCodeChatHistory uses duck-typed structural interfaces
  (VSCodeChatRequestTurnLike, VSCodeChatResponseTurnLike) so the
  shared package never imports `vscode`. Discriminates user vs
  assistant by `prompt: string` vs `response: string`. Hosts
  pre-flatten the real ChatResponseTurn.response parts array to a
  string before handing the turn off. Unknown turn shapes silently
  skipped (forward-compatible against future VS Code API additions).
- fromCaptureToolArgs handles the upcoming CLI MCP openviking_capture
  tool's `{user, assistant?}` payload — produces one or two
  CanonicalTurnInputs and defers to the core.
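Duck-typed adapters, sketched. The two interface names mirror the commit; the type guards and the `TurnInput` shape are assumptions. In the real module both adapters hand their output to canonicaliseTranscript rather than returning raw turns.

```typescript
// Structural stand-ins: the shared package never imports `vscode`.
interface VSCodeChatRequestTurnLike {
  prompt: string;
}
interface VSCodeChatResponseTurnLike {
  response: string; // host pre-flattens the response parts array to a string
}

interface TurnInput {
  role: "user" | "assistant";
  text: string;
}

function isRequestTurn(turn: unknown): turn is VSCodeChatRequestTurnLike {
  return typeof turn === "object" && turn !== null &&
    typeof (turn as { prompt?: unknown }).prompt === "string";
}

function isResponseTurn(turn: unknown): turn is VSCodeChatResponseTurnLike {
  return typeof turn === "object" && turn !== null &&
    typeof (turn as { response?: unknown }).response === "string";
}

function fromVSCodeChatHistory(history: readonly unknown[]): TurnInput[] {
  const out: TurnInput[] = [];
  for (const turn of history) {
    if (isRequestTurn(turn)) out.push({ role: "user", text: turn.prompt });
    else if (isResponseTurn(turn)) out.push({ role: "assistant", text: turn.response });
    // Any other shape is skipped silently (forward-compatible with new API turns).
  }
  return out;
}

// CLI MCP tool payload `{user, assistant?}` becomes one or two turns.
function fromCaptureToolArgs(args: { user: string; assistant?: string }): TurnInput[] {
  const turns: TurnInput[] = [{ role: "user", text: args.user }];
  if (args.assistant) turns.push({ role: "assistant", text: args.assistant });
  return turns;
}
```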

21 unit tests cover sanitisation (injected-block strip, multi-marker
mix, whitespace preservation), empty handling (block-only, whitespace-
only, undefined-text defensive), captureAssistantTurns on/off,
captureMaxLength rejection semantics (drops overlong, keeps at-cap,
0 disables), input order + non-mutation, VS Code adapter
discrimination + forward-compatible skip + filter + strip, and CLI
adapter (user-only, paired, empty-assistant fallback, strip).

Refs #11

* feat(copilot-plugin): add capture/commit-queue.ts CommitQueue

Per-session commit queue ties together OVClient (#5), async-writer
(#13), estimateTokens (#6), and the debug logger (#14). enqueue(turns)
appends to OV synchronously (so the server knows about the turns
before the next recall), accumulates a chars/4 token counter, and
when the counter crosses commitTokenThreshold dispatches a commit —
detached via runWriteTask when async=true + asyncSpawn provided,
awaited inline otherwise. flush() is the explicit force-commit for
SessionEnd / SubagentStop / PreCompact paths.

Failure modes:
- appendTurns failure → tokens NOT accumulated, no commit triggered
  (turns aren't on the server, so committing would archive nothing)
- commit failure → caught + logged, never thrown to host hot path;
  counter still resets eagerly so subsequent enqueues start fresh
  (data is on the server and the next commit catches it)

Double-commit guard: an in-flight dispatch sets flushInFlight which
short-circuits subsequent triggers until it resolves. Detached
dispatches resolve almost instantly; the guard mostly protects
against truly racing callers.

Queue takes client: CommitClient = Pick<OVClient, "appendTurns" |
"commit"> so tests can use a tiny mock without constructing a full
OVClient (no fetch stub needed).
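A trimmed CommitQueue sketch showing the Pick<> client type, token accumulation, the append-failure short-circuit, the eager counter reset and the flushInFlight guard. The OVResult/OVClientLike shapes are hypothetical, and the async-detach path (runWriteTask + asyncSpawn) is omitted: every commit is awaited inline here.

```typescript
// Hypothetical result and client shapes; `health` stands in for the rest of OVClient.
interface OVResult {
  ok: boolean;
}
interface OVClientLike {
  appendTurns(sessionId: string, turns: string[]): Promise<OVResult>;
  commit(sessionId: string): Promise<OVResult>;
  health(): Promise<OVResult>;
}
type CommitClient = Pick<OVClientLike, "appendTurns" | "commit">;

class CommitQueue {
  private readonly client: CommitClient;
  private readonly sessionId: string;
  private readonly threshold: number;
  private pendingTokens = 0;
  private flushInFlight = false;

  constructor(client: CommitClient, sessionId: string, commitTokenThreshold: number) {
    this.client = client;
    this.sessionId = sessionId;
    this.threshold = commitTokenThreshold;
  }

  get pending(): number {
    return this.pendingTokens;
  }

  /** Resolves true when this enqueue triggered a commit. */
  async enqueue(turns: string[]): Promise<boolean> {
    if (turns.length === 0) return false;
    const appended = await this.client
      .appendTurns(this.sessionId, turns)
      .catch((): OVResult => ({ ok: false }));
    // Turns are not on the server, so neither count them nor commit.
    if (!appended.ok) return false;
    this.pendingTokens += turns.reduce((sum, t) => sum + Math.ceil(t.length / 4), 0);
    if (this.pendingTokens < this.threshold) return false; // >= crosses the threshold
    await this.flush();
    return true;
  }

  async flush(): Promise<void> {
    if (this.flushInFlight) return; // double-commit guard
    this.flushInFlight = true;
    this.pendingTokens = 0; // eager reset: data is on the server either way
    try {
      await this.client.commit(this.sessionId);
    } catch {
      // The real queue logs here; failures never reach the host's hot path.
    } finally {
      this.flushInFlight = false;
    }
  }
}
```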

14 unit tests cover append + token accumulation (empty no-op,
sub-threshold no commit, multi-enqueue accumulation, single-enqueue
threshold cross, exactly-at-threshold dispatch via >= comparison),
append-failure short-circuit, flush below threshold + with zero
pending, async-vs-sync dispatch (sync awaits 30ms inline, async with
asyncSpawn returns in <150ms while mock delays commit 200ms, async
without asyncSpawn falls back to inline per runWriteTask contract),
failure tolerance (commit fail doesn't throw + counter still
resets), and the double-commit guard via a re-entrant flush() during
an in-flight commit.

Refs #10

* feat(copilot-plugin): add recall/cache.ts short-TTL LRU cache

RecallCache amortises duplicate recall round-trips within a single
turn. In VS Code, the @openviking participant and the
openviking_recall LM tool can both fire on the same user prompt;
without the cache that's two HTTP calls to OV for an identical
(query, sessionId) pair. Default TTL 5s, default 64 entries.

Key shape: (query, sessionId, scope?) joined by Unit-Separator so
component fields with delimiters can't collide. JS Map preserves
insertion order, so LRU is implemented by delete + re-set on access
(promotes MRU end); eviction picks the first iterator entry. Expired
entries are deleted on read so size doesn't leak.

getOrFetch is a deliberate cache-miss-equals-no-cache path: it calls
the supplied fetch exactly once on miss and returns its result
verbatim. Successful results land in the cache; errors pass through
unchanged so the next turn can retry. Cache miss is therefore
bit-identical to wiring fetch directly — only side effect on a miss
is one extra cache.set after resolve.

ttlMs=0 disables caching entirely so hosts can flip the cache off
without rewriting call sites. Clock is injectable for deterministic
time tests. The TTL is fixed at write time — a hit promotes MRU
position but does NOT extend TTL (test pins this so future
optimisations don't silently change the contract).
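A sketch of the Map-backed LRU plus TTL behaviour described above: delete-and-re-set promotes a key to the MRU end, the first iterator key is evicted, reads delete expired entries, and hits never extend TTL. The constructor option names are assumptions.

```typescript
interface RecallCacheOptions {
  ttlMs?: number; // default 5s; 0 disables caching
  maxEntries?: number; // default 64
  now?: () => number; // injectable clock for tests
}

interface CacheEntry<V> {
  value: V;
  expiresAt: number;
}

class RecallCache<V> {
  private readonly entries = new Map<string, CacheEntry<V>>();
  private readonly ttlMs: number;
  private readonly maxEntries: number;
  private readonly now: () => number;

  constructor(opts: RecallCacheOptions = {}) {
    this.ttlMs = opts.ttlMs ?? 5000;
    this.maxEntries = opts.maxEntries ?? 64;
    this.now = opts.now ?? (() => Date.now());
  }

  // Unit Separator (U+001F) keeps fields containing ordinary delimiters from colliding.
  static key(query: string, sessionId: string, scope = ""): string {
    return [query, sessionId, scope].join("\u001f");
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // delete on read so expired entries don't leak
      return undefined;
    }
    // Delete + re-set moves the key to the MRU end; expiresAt is left untouched.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    if (this.ttlMs <= 0) return;
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    // Map iterates in insertion order, so the first key is the LRU entry.
    while (this.entries.size > this.maxEntries) {
      const lru = this.entries.keys().next();
      if (lru.done) break;
      this.entries.delete(lru.value);
    }
  }

  async getOrFetch(key: string, fetcher: () => Promise<V>): Promise<V> {
    const cached = this.get(key);
    if (cached !== undefined) return cached;
    const value = await fetcher(); // rejections propagate and are never cached
    this.set(key, value);
    return value;
  }
}
```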

17 unit tests cover defaults exposure, hit/miss + key discrimination
+ re-set TTL refresh, TTL expiry behaviour + hit-doesn't-extend
+ ttlMs=0 disable, LRU eviction (over-cap + hit-promotes-past-LRU
+ maxEntries=1), getOrFetch (no-fetch on hit, exactly-once on miss,
error not cached, miss-bit-identical-to-direct-fetch), and clear().

Refs #8

Phase 1 shared package is now feature-complete. Twelve modules:
config, env utils, debug logger, async-writer, session id, OVClient,
sanitize, rank, format, transcript, commit-queue, cache.

* feat(copilot-plugin): scaffold VS Code extension manifest + activation

Real VS Code manifest with chatParticipants (openviking.memory),
languageModelTools (empty — populated by #22 + #26),
configuration (8 openviking.* properties; full ~20 in #19), and an
esbuild → vsce pipeline that produces openviking-copilot.vsix from
src/extension.ts. The bundle is CJS so VS Code's loader works
cleanly, with `vscode` marked external. The extension package no
longer carries `type: module` since the bundled output is CJS.

Two-file activation split:
- extension-core.ts (vscode-free) — buildActivationHandle reads
  PluginConfig with agentIdDefault: copilot-vscode, builds
  OVClient + DebugLogger, exposes a queue registry. Returns null
  when isPluginEnabled() is false so disabled installs cost
  nothing. registerCommitQueue is idempotent. runDeactivate
  parallel-flushes every registered queue with try/catch around
  each so one failing flush can't block the others.
- extension.ts (the vscode adapter) — thin: imports vscode, maps
  workspace settings to Partial<PluginConfig> overrides, hands off
  to extension-core, registers runDeactivate as a Disposable.
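A vscode-free sketch of the queue registry and parallel flush. `ActivationHandle` and `FlushableQueue` are trimmed, hypothetical shapes, and using a Set for idempotent registration is an assumption.

```typescript
// Hypothetical minimal shapes; the real handle also carries cfg, client, logger.
interface FlushableQueue {
  flush(): Promise<void>;
}

interface ActivationHandle {
  queues: Set<FlushableQueue>;
}

// Registering the same queue twice is a no-op because the registry is a Set.
function registerCommitQueue(handle: ActivationHandle, queue: FlushableQueue): void {
  handle.queues.add(queue);
}

async function runDeactivate(handle: ActivationHandle | null): Promise<void> {
  if (handle === null) return; // plugin disabled: nothing was activated
  // Flush in parallel; each flush gets its own try/catch so one throwing
  // queue cannot stop its siblings from finishing.
  await Promise.all(
    Array.from(handle.queues, async (queue) => {
      try {
        await queue.flush();
      } catch {
        // The real extension logs this failure.
      }
    }),
  );
}
```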

Splitting the core out of the adapter is the standard pattern for
testable VS Code extensions: extension-core.test.ts runs in plain
Vitest, no @vscode/test-electron needed for the activation logic
itself.

Tests: 13 new extension-core tests cover gating (disabled by
default, enabledOverride forces, env force-enable), config flow
(hostOverrides win, env precedence, agent-id default, logger +
client wired), and queue lifecycle (idempotent register, parallel
flush on deactivate, no-op for null handle, throwing flush
doesn't block siblings, empty registry completes cleanly).
Crucially, beforeEach points OPENVIKING_CONFIG_FILE /
OPENVIKING_CLI_CONFIG_FILE at /tmp paths that don't exist so
tests are deterministic regardless of the developer's real
~/.openviking/ovcli.conf.

`npm run package` produces a 9.05 KB .vsix containing the manifest
+ a 22.96 KB CJS bundle. .vscodeignore strips test/source/config
files from the artefact.

Refs #15

* feat(copilot-plugin): register @openviking chat participant + slash commands

Splits the chat participant across two files following the
extension-core pattern from #15:
- participant-core.ts (vscode-free) — buildRecallContext (recall +
  rank + format with cache short-circuit), runStore (enqueue + force
  flush), runForget (uri validation + DELETE). All return user-facing
  message strings; no throws.
- participant.ts — vscode adapter. registerOpenVikingParticipant
  derives the OV session id from the first workspace folder via
  deriveSessionId("copilot-vscode", ...), constructs ParticipantState
  with a per-session CommitQueue and RecallCache, registers the queue
  with the activation handle so it gets flushed on deactivate.

Manifest declares /recall, /store, /forget on the openviking.memory
participant. handleRequest dispatches on request.command:
  - /recall — buildRecallContext, render block in a code fence (or
    "_No relevant memories found._" when empty)
  - /store — runStore (rejects empty input, force-flushes after
    enqueue so the memory archives immediately)
  - /forget — runForget (rejects non-viking:// URIs without a network
    call, surfaces server errors as ⚠️)
  - default — buildRecallContext, push the block as the first stream
    chunk, delegate to request.model.sendRequest with the recalled
    context as a leading user message, pipe LM tokens into the
    stream, then captureTurn (sanitise via canonicaliseTranscript,
    enqueue against the per-session CommitQueue) once the stream
    completes. autoCapture=false short-circuits the capture path.

OVClient gains forget(uri, {recursive?}) — small enhancement to #5
that issues DELETE /api/v1/fs?uri=...&recursive=... with the same
bypass + tenant-header behaviour as the rest of the client. The
internal fetchJSON method-type widens to include "DELETE".
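A sketch of the request URL the new forget(uri, {recursive?}) issues, plus runForget's scheme check. The helper names, the trailing-slash handling and the error text are hypothetical; headers and bypass handling stay with the existing client.

```typescript
// Hypothetical helper; only the DELETE path and query parameters come from the commit.
function buildForgetUrl(baseUrl: string, uri: string, opts: { recursive?: boolean } = {}): string {
  const base = baseUrl.replace(/\/+$/, ""); // assumption: tolerate a trailing slash
  let url = `${base}/api/v1/fs?uri=${encodeURIComponent(uri)}`;
  if (opts.recursive) url += "&recursive=true";
  return url;
}

// runForget-style guard: reject non-viking:// URIs before any network call.
function forgetUriError(uri: string): string | null {
  return uri.startsWith("viking://") ? null : `Not a viking:// URI: ${uri}`;
}
```

The resulting URL would then be sent with method "DELETE" through the client's usual fetch path.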

extension.ts wires registerOpenVikingParticipant after activation;
the participant disposable lands in context.subscriptions so VS Code
disposes it cleanly.

Tests: 14 new participant-core tests (recall short-circuits, hits
render, scoreThreshold drops below-floor hits, second call hits the
cache, transport error → empty result, store rejects empty +
appends + force-flushes + reports failure, forget validates uri +
issues DELETE + surfaces error) + 3 new OVClient.forget tests
(URL-encodes uri, appends &recursive=true, bypass short-circuit).
228/228 passing across all workspaces. Bundle still builds cleanly
to dist/extension.cjs at 45.5 KB.

Refs #16

* feat(copilot-plugin): extract capture/on-response.ts entry point

Pulls the canonicalise + enqueue dance out of participant.ts into a
dedicated capture/on-response.ts module. captureChatTurn(opts) is the
single entry point that takes raw user/assistant text + the
participant's cfg/queue/logger, runs through canonicaliseTranscript
(sanitise → filter assistant if disabled → cap-overlong), and feeds
the surviving turns to queue.enqueue. Returns {enqueued, skipped,
triggeredCommit, pendingAfter} for telemetry; always resolves.

VS Code's current chat-extension API does not expose a global
"any participant produced a response" event — the participant's
request handler IS the subscription point. Phase 3 (#25) will plug
additional sources (e.g. default-chat events, contingent on Phase 0)
into this same entry point. Documented in the file header so the
abstraction's intent is obvious to whoever picks up #25.

Bypass is transparent: OVClient short-circuits at appendTurns +
commit when cfg.bypassSession is true or any bypassSessionPatterns
match. Tests confirm capturing in a bypassed session reports success
to the caller while issuing zero HTTP calls.

Async-detach: with cfg.writePathAsync=true + an asyncSpawn factory
on the CommitQueue, the actual commit RTT happens in a detached
worker — verified by a test that wraps client.commit in a forever-
blocking promise and asserts the total elapsed stays under 150ms.

8 new tests across 4 groups: gating (autoCapture=false /
empty-after-canonical → no enqueue / no network), happy paths
(both-roles enqueued, user-only mode drops assistant, strip-injected-
blocks pollution-test on BOTH user AND assistant text), bypass
transparency (bypassSession + bypassSessionPatterns), and the
async-detached path.

participant.ts now delegates captureTurn → captureChatTurn — no
behavioural change, just consolidation. Bundle still builds clean
(46.4 KB).

Refs #17

* feat(copilot-plugin): add MCP server definition provider for VS Code

Wire OpenViking's HTTP MCP endpoint (`/mcp`) into Copilot Chat via
VS Code's runtime mcpServerDefinitionProviders API:

- package.json contributes mcpServerDefinitionProviders[id=openviking]
- src/mcp/manifest.ts is the pure builder: takes a resolved
  PluginConfig and emits {name, uri, headers}. Authorization +
  X-OpenViking-{Account,User,Agent} headers attached only when their
  cfg fields are non-empty, exactly matching OVClient.buildHeaders so
  MCP and REST traffic land with the same identity on the server.
  baseUrl trailing slashes are stripped before appending /mcp.
- src/mcp/register.ts is the vscode adapter:
  registerOpenVikingMcpProvider(handle) defensively feature-detects
  both vscode.lm.registerMcpServerDefinitionProvider and
  vscode.McpHttpServerDefinition. When either is missing on an older
  VS Code build, logs `mcp_provider_unavailable` and returns a no-op
  disposable so the extension still loads cleanly. The provider re-
  reads cfg every call so workspace-settings changes flow through on
  the next invocation.
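The pure manifest builder reduces to a few conditionals. This sketch assumes a Bearer scheme for Authorization, the config field names shown, and a display name of "openviking"; the real builder mirrors OVClient.buildHeaders, which is not reproduced here.

```typescript
// Hypothetical subset of PluginConfig; field names are assumptions.
interface McpConfig {
  baseUrl: string;
  apiKey?: string;
  account?: string;
  user?: string;
  agentId?: string;
}

interface McpServerManifest {
  name: string;
  uri: string;
  headers: Record<string, string>;
}

function buildMcpManifest(cfg: McpConfig): McpServerManifest {
  const headers: Record<string, string> = {};
  // Each header is attached only when its config field is non-empty.
  if (cfg.apiKey) headers["Authorization"] = `Bearer ${cfg.apiKey}`; // scheme assumed
  if (cfg.account) headers["X-OpenViking-Account"] = cfg.account;
  if (cfg.user) headers["X-OpenViking-User"] = cfg.user;
  if (cfg.agentId) headers["X-OpenViking-Agent"] = cfg.agentId;
  return {
    name: "openviking", // hypothetical display name
    uri: `${cfg.baseUrl.replace(/\/+$/, "")}/mcp`, // strip trailing slashes first
    headers,
  };
}
```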

Design choice (documented in the file header): runtime provider over
static .mcp.json. The connection details live in PluginConfig which
the extension already resolves with the env > host > ovcli.conf >
ov.conf chain — the dynamic provider injects *resolved* values so we
never need `${VAR}` substitution and never write the apiKey to a
JSON file on disk. This sidesteps the "header substitution syntax"
question Phase 0 was meant to spike for the static-file path.

extension.ts now registers the MCP provider after the participant;
the disposable lands in context.subscriptions for clean teardown.

10 unit tests cover MCP_PROVIDER_ID parity with the manifest entry,
name + uri shape (default + trailing-slash variants),
local-only-mode (empty headers), and remote / multi-tenant mode
(Authorization, all 4 tenant headers, only-populated-fields, header
shape mirrors OVClient).

246/246 tests passing across all workspaces. Bundle 47.9 KB.

Refs #18

* feat(copilot-plugin): full settings schema + SecretStorage Set API Key

Expand the VS Code surface from 8 hand-maintained settings to all 25
PluginConfig fields per PLAN.md §8.2, plus a SecretStorage-backed
`OpenViking: Set API Key` command so users never need to inline the
apiKey into settings.json.

Single source-of-truth: settings-schema.ts exports an
OPENVIKING_SETTINGS array of typed descriptors {key, type, default,
cfgField, secret?, enumValues?}. The manifest's
contributes.configuration block is hand-authored to match (so VS
Code gets exactly the JSON-schema fragments it needs); a
drift-detection test compares the two and breaks the build if
either side gets out of sync. Adding a new setting now means
touching settings-schema.ts + the manifest — extension.ts's
readWorkspaceOverrides automatically picks it up by iterating the
descriptor list.

Set-API-Key flow split (matches the extension-core pattern):
- commands-core.ts (vscode-free) — runSetApiKeyCommand(secrets,
  input, opts?). password:true prompt, non-empty validate, trim,
  store, info message. Returns {saved, reason?: 'cancelled' | 'empty'}
  so callers can branch + tests can assert.
- commands.ts (vscode adapter) — wires vscode.window.showInputBox +
  vscode.window.showInformationMessage + context.secrets into the
  duck-typed SecretStorageLike + InputProvider shapes.

extension.ts becomes async (returns Promise<void>):
1. Register commands FIRST so the user can reach Set API Key even
   when the plugin is currently disabled.
2. Await context.secrets.get(SECRETS_API_KEY) and layer it above
   settings.json's apiKey value (SecretStorage > settings >
   env > ovcli.conf > ov.conf > defaults).
3. readWorkspaceOverrides drives off OPENVIKING_SETTINGS — no more
   hand-maintained mapping; the loop dispatches on descriptor.type
   (string / boolean / number / string-array / enum) and writes
   into PluginConfig fields by descriptor.cfgField.
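A sketch of the descriptor-driven loop plus the SecretStorage layering. The descriptor fields follow the commit; the per-type validity checks, skipping empty strings, and the `apiKey` cfgField name are assumptions, and `get` stands in for the VS Code configuration lookup.

```typescript
type SettingType = "string" | "boolean" | "number" | "string-array" | "enum";

interface SettingDescriptor {
  key: string;
  type: SettingType;
  default: unknown;
  cfgField: string;
  secret?: boolean;
  enumValues?: readonly string[];
}

function readWorkspaceOverrides(
  catalogue: readonly SettingDescriptor[],
  get: (key: string) => unknown,
): Record<string, unknown> {
  const overrides: Record<string, unknown> = {};
  for (const d of catalogue) {
    const raw = get(d.key);
    let ok = false;
    switch (d.type) {
      case "string":
        ok = typeof raw === "string" && raw !== "";
        break;
      case "boolean":
        ok = typeof raw === "boolean";
        break;
      case "number":
        ok = typeof raw === "number" && Number.isFinite(raw);
        break;
      case "string-array":
        ok = Array.isArray(raw) && raw.every((v) => typeof v === "string");
        break;
      case "enum":
        ok = typeof raw === "string" && d.enumValues !== undefined && d.enumValues.includes(raw);
        break;
    }
    // Only well-typed values become overrides; anything else falls through to lower layers.
    if (ok) overrides[d.cfgField] = raw;
  }
  return overrides;
}

// SecretStorage sits above settings.json in the precedence chain.
function applySecretApiKey(
  overrides: Record<string, unknown>,
  secretApiKey: string | undefined,
): Record<string, unknown> {
  return secretApiKey ? { ...overrides, apiKey: secretApiKey } : overrides;
}
```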

Manifest:
- All 25 properties with markdownDescription. apiKey carries an
  explicit ⚠️ warning + nudge toward the SecretStorage command.
  Numeric properties get min/max bounds matching the floors in
  config.ts. captureMode is enum-restricted to semantic | keyword.
- New `commands` contribution declares the Set API Key command
  with category: "OpenViking" so it groups in the command palette.

17 new tests:
- settings-schema.test.ts (10): catalogue shape (25 entries, prefix
  invariant, secret/enum singletons), exported constants, drift
  detection (each schema entry → manifest property with matching
  type, default match, no orphan manifest properties), apiKey
  warning text, Set-API-Key command declared in manifest.
- commands-core.test.ts (7): happy path (password:true / store /
  trim / custom secretKey / validate function), cancel
  (undefined → reason:cancelled, no store, no info), empty
  (whitespace / "" → reason:empty).

263/263 passing across all workspaces. Bundle 52.88 KB, vsix 18.15 KB.

Refs #19

* feat(copilot-plugin): scaffold openviking-copilot-mcp CLI bin

Real npm manifest with:
- bin: openviking-copilot-mcp → ./dist/mcp-server.js
- dependencies: @openviking/copilot-shared (workspace) +
  @modelcontextprotocol/sdk@^1.21.0 (used in #21)
- esbuild bundle script (scripts/build.mjs) reads
  package.json#version and injects it as the __OV_CLI_VERSION__
  define so --version reports the published value without runtime
  fs reads. Single ESM file out, source shebang preserved
  (banner removed mid-flight after the first build duplicated
  the shebang and broke Node's loader).
- prepack runs build, so `npm pack` always produces a fresh
  artefact.

cli.ts is the testable runMain(argv, opts) — vscode-free, takes
injectable stdout/stderr/loadConfig/isPluginEnabled. Flags:
  --help / -h     usage to stdout, exit 0
  --version / -v  semver to stdout, exit 0
  --check         loadConfig({agentIdDefault:'copilot-cli'}),
                  redacted summary to stdout (apiKey shown as
                  '<set, N chars>', never the value), exit 0/3
                  reflecting isPluginEnabled
  default         stub-to-stderr pointing at #21 where the real
                  MCP server bootstrap lands, exit 0
  unknown arg     error-to-stderr + exit 2
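A sketch of the runMain dispatch and apiKey redaction with injected I/O, matching the flags and exit codes listed above. The usage text, stub message, `<unset>` placeholder and the `summary` callback are hypothetical.

```typescript
interface CliIo {
  stdout: (line: string) => void;
  stderr: (line: string) => void;
  version: string; // injected at build time in the real bin
  isPluginEnabled: () => boolean;
  summary: () => string; // stand-in for loadConfig + redacted formatting
}

// Never print the key itself, only whether it is set and its length.
function redactApiKey(apiKey: string | undefined): string {
  return apiKey ? `<set, ${apiKey.length} chars>` : "<unset>";
}

function runMain(argv: readonly string[], io: CliIo): number {
  const arg: string | undefined = argv[0];
  switch (arg) {
    case "--help":
    case "-h":
      io.stdout("usage: openviking-copilot-mcp [--help | --version | --check]");
      return 0;
    case "--version":
    case "-v":
      io.stdout(io.version);
      return 0;
    case "--check":
      io.stdout(io.summary());
      return io.isPluginEnabled() ? 0 : 3;
    case undefined:
      io.stderr("MCP server bootstrap lands in #21");
      return 0;
    default:
      io.stderr(`unknown argument: ${arg}`);
      return 2;
  }
}
```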

mcp-server.ts is a thin shebang-bearing shim that calls
runMain(process.argv.slice(2)) and exits with the returned code.
Top-level fatal handler logs stack to stderr.

10 unit tests cover --help / -h, --version / -v, --check (calls
loadConfig with the right agentIdDefault, summary content,
apiKey redaction with regex match on '<set, N chars>',
exit-code reflects enabled state), default invocation
(stub-to-stderr + exit 0), and unknown-arg handling.

End-to-end verified: npm run build → 13.2 kB ESM bundle, runs
cleanly under node, --check reads my real ~/.openviking/ov.conf
and prints the resolved baseUrl + agentId. npm pack →
openviking-copilot-cli-memory-0.0.0.tgz (4.4 kB, 2 files).

Old smoke.test.ts removed (replaced by cli.test.ts which exercises
the real entry point). .gitignore picks up *.tgz.

Refs #20

* feat(copilot-plugin): add CLI MCP server tools

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>

* feat(copilot-plugin): add Phase 2 recall tools (#34)

* feat(copilot-plugin): add recall tools

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>

* fix(copilot-plugin): type recall config

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>

---------

Co-authored-by: Januar <januar@gmail.com>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>

* docs(copilot): document VS Code default chat capture limitation (#35)

Co-authored-by: Januar <januar@gmail.com>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>

* feat(copilot-cli): add capture MCP tool (#36)

Co-authored-by: Januar <januar@gmail.com>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>

* feat(copilot-cli): add copilot() shell-wrapper fallback

Optional degraded-fidelity capture path for the GitHub Copilot CLI.
The openviking_capture MCP tool from #26 stays the primary capture
mechanism; this wrapper closes one specific gap: captures the model
recorded mid-session whose pending tokens never crossed
commitTokenThreshold, so no automatic commit ever fired.

Three pieces:

1. cli.ts gains `--commit-flush --session=<id>`. Loads cfg, builds an
   OVClient, force-commits the given session id. Bypass is honoured
   automatically because OVClient.commit short-circuits internally.
   Maps OVResult to exit codes (0=ok, 1=transport error with HTTP
   status if present, 2=missing/empty --session). commitFlush is
   injectable for tests so they don't hit a real server. The argv
   parser learns --key=value form alongside the existing --flag form.

2. server.ts wires OPENVIKING_CLI_SESSION_ID into runStdioMcpServer:
   when set, the MCP server uses it as defaultSessionId so the
   in-process openviking_capture calls and the wrapper's post-exit
   --commit-flush both target the same OV session.

3. wrapper/copilot.sh is a sourceable bash/zsh function. Per
   invocation it derives `cp-<uuid>` (uuidgen) or `cp-$$-$(date +%s)`
   (fallback), exports OPENVIKING_CLI_SESSION_ID, runs `command
   copilot "$@"`, then on exit fires `openviking-copilot-mcp
   --commit-flush --session=<id>`. Preserves the user's exit code
   regardless of what the post-exit call does.

   Configuration env vars:
   - OPENVIKING_BYPASS_SESSION=1   skip the wrapper entirely (no env
                                    var set, no post-exit call)
   - OPENVIKING_WRAPPER_QUIET=1    swallow commit-flush stderr too
   - OPENVIKING_DEBUG=1            wrapper + MCP-server hook log
                                    lands in ~/.openviking/logs/
   - OPENVIKING_CLI_SESSION_ID    (the wrapper sets it)

Why "degraded fidelity": the wrapper does NOT see the user's prompts
or the assistant's responses. Capture itself still requires the model
to call openviking_capture; without that, the OV session has nothing
pending and the post-exit commit archives nothing. The wrapper's
value is closing the model-called-capture-but-didn't-trigger-commit
edge case. README spells this out with a diagram and a "when NOT to
use" section so users have the right expectations.

Tests:
- 6 cli.test.ts cases for --commit-flush (missing session, success,
  transport error, HTTP status formatting, whitespace trim, empty
  after trim)
- 1 stdio integration in server.test.ts: spawns the real bin with
  OPENVIKING_CLI_SESSION_ID=cp-wrapper-coord-test +
  OPENVIKING_BYPASS_SESSION=true, calls openviking_capture without
  a sessionId, asserts the response's sessionId is the wrapper
  value
- Manual end-to-end shell smoke: wrapper exports env var, runs
  fake-copilot (exit 42), preserves the 42 exit code, then calls
  openviking-copilot-mcp --commit-flush --session=cp-<uuid>. Bypass
  mode skips entirely with empty env var and no post-exit call.

299/299 tests passing across all workspaces; typecheck clean.

Refs #27

* feat(task): add async task tracking for add-resource and add-skill operations (#1763)

* feat: add async task tracking for add-resource/add-skill/write operations

- Return task_id when add-resource/add-skill/write called without --wait
- Add 'ov task status <task_id>' and 'ov task list' CLI commands
- Bridge RequestWaitTracker and TaskTracker via background monitor coroutines
- Format TaskRecord timestamps as ISO 8601 in to_dict()
- Always generate telemetry_id (remove 'not wait' condition)
- Extract _create_write_task helper to eliminate code duplication
- Add unregister_wait_telemetry in _monitor_write_queue for consistency
- Update CLI async prompt from 'ov wait' to 'ov task status <task_id>'
- Add 7 new tests covering async task tracking

* refactor: remove task_id from write operations, keep only add-resource/add-skill

Write operations (write/create_file/write_memory) are primarily called
by internal flows like session commit, which have their own task tracking.
Only add-resource and add-skill are user-facing CLI commands that need
task_id for async progress tracking.

* fix: address PR review feedback

1. Fix async failure path leaking request-scoped tracker/telemetry state
   - Add monitor_started flag to ensure cleanup when monitor coroutine
     hasn't been launched yet
   - The finally block now cleans up when `wait` is true, `telemetry_id` is
     empty, or `monitor_started` is false

2. Fix TaskRecord timestamp API compatibility break
   - Keep original created_at/updated_at as float (backward compatible)
   - Add new created_at_iso/updated_at_iso fields with ISO 8601 strings

3. Fix ruff format lint failure on resource_service.py

4. Add regression test for async failure cleanup
   - test_add_resource_async_failure_cleans_up_tracker verifies no
     RequestWaitTracker or telemetry registry state leaks when processor
     raises before task/monitor creation

* fix: add missing unregister_wait_telemetry in add_skill finally block

* fix: remove unused imports in test_add_resource_async_failure_cleans_up_tracker

* fix: queue failure shows as completed and business error creates unreachable task

- _monitor_queue_processing: check error_count from build_queue_status,
  mark task as failed when queue processing has errors
- add_resource: skip task creation when process_resource returns
  status=error, preventing unreachable ghost tasks
- Improve test_add_resource_async_failure_cleans_up_tracker: patch
  internal processor instead of add_resource itself to cover finally
  cleanup logic
- Add test_add_skill_async_returns_task_id for add_skill coverage
- Add test_add_resource_business_error_no_task regression test
- Add test_monitor_marks_failed_on_queue_error regression test

* test: move add_skill tests to test_session_task_tracking.py

Move add_skill task tracking tests from test_api_resources.py to
test_session_task_tracking.py where other task tracking tests live,
and add sync no-task-id coverage.

* test: update async task queryable assertion to include failed status

Queue errors now correctly mark task as failed (Bug 1 fix), so the
test assertion must accept 'failed' as a valid terminal status.

* refactor: use explicit return result for business error path

* fix: remove duplicate asyncio imports in test_api_resources.py

* fix: avoid task creation on watch conflict

---------

Co-authored-by: qin-ctx <qinhaojie.exe@bytedance.com>

* ci(copilot-plugin): add three workflows for shared / vscode / cli

Per PLAN.md §9.5. All three .github/workflows/copilot-*.yml files
gate on `paths: examples/copilot/**` so unrelated changes skip them
cleanly:

- copilot-shared.yml runs on ubuntu-latest only (pure-TS doesn't
  benefit from a matrix). Builds shared, typechecks, runs the new
  `test:coverage` script which exits non-zero when v8 line coverage
  on packages/shared/src drops below 80%. Uploads the coverage
  report as an artefact.
- copilot-vscode.yml runs on ubuntu/macos/windows × Node 22 with
  fail-fast:false. Typecheck + Vitest unit suite + esbuild bundle.
  Linux job additionally runs `vsce package` as a smoke for the
  publishable .vsix.
- copilot-cli.yml runs on the same matrix. Builds the bin first so
  the stdio integration test in server.test.ts can spawn it. After
  the suite passes, runs `--help` and `--version` against the built
  bundle as a real-process smoke. Linux job runs `npm pack` and
  uploads the tarball. Adds workflow_dispatch + nightly schedule so
  a future real-Copilot-CLI fixture can plug into the same job
  without changing trigger config.

Supporting changes:
- packages/shared/vitest.config.ts gains v8 coverage config with
  thresholds.lines:80 (current measurement: 91.85% — comfortably
  clear). Includes lcov reporter so coverage tooling like Codecov
  can pick it up later.
- packages/shared/package.json gains a `test:coverage` script.
- root package.json adds @vitest/coverage-v8 as a dev dep.
- cli-plugin/scripts/build.mjs guards chmodSync behind
  `process.platform !== 'win32'` + try/catch so Windows CI doesn't
  fail with EPERM. The chmod is a no-op on Windows anyway since
  NTFS doesn't track POSIX mode bits.

Local validation: 299/299 tests green, typecheck clean, coverage
gate passes at 91.85% lines, exit code 0.

Acceptance status:
- [x] Coverage gate ≥80% lines on packages/shared/src/
- [x] Workflows skip when no relevant files changed (paths filter)
- [⏳] All three pass on a no-op PR — verifiable on next GitHub
      Actions run after this commit lands

Refs #30

* fix(storage): use UTC date comparison for modTime display in ls (#1909)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(openclaw-plugin): support OpenClaw install and ClawHub release flow (#1904)

Squashes OpenClaw plugin install compatibility, setup helper source-build support, built dist output, and ClawHub release workflow updates into one publish commit.

Co-authored-by: LinQiang391 <linqiang391@users.noreply.github.com>

* feat(rebuild): add rebuild api scaffold (#1592)

* feat(admin): add rebuild api scaffold

feat: add admin rebuild API

fix: harden admin rebuild execution

feat(cli): add rebuild command support

fix(rebuild): support namespace rebuild routing

refactor(rebuild): unify memory semantic rebuild mode

refactor(rebuild): move http endpoint to content route

fix(rebuild): skip root namespace vectorization

fix(rebuild): harden namespace classification

refactor: rename rebuild api to reindex

refactor: rename reindex executor module

refactor(reindex): remove unused reason field

* fix(reindex): tighten namespace URI handling

Share segment-based Viking URI classification across context inference and reindex execution, add skill namespace support, and require root reindex requests to select an account.

* refactor: reuse indexing pipeline in reindex

* Revert "refactor: reuse indexing pipeline in reindex"

This reverts commit 2725fe6733bf65df5baac9fc45d643469aff5d43.

* fix(reindex): respect semantic vectorization skips

Avoid scheduling semantic DAG vectorization work during semantic_and_vectors reindex, and keep resource vector text selection aligned with normal vectorize_file handling for non-text files.

---------

Co-authored-by: qin-ctx <qinhaojie.exe@bytedance.com>

* feat: ov CLI support (#1916)

* feat: support ov config switch and ov config setup-cli

* feat: support ov config switch and ov config setup-cli

* feat: add ov tui image preview

---------

Co-authored-by: openviking <openviking@example.com>

* feat(claude-code-plugin): session-start profile injection, /ov status command, tool-output capture cleanup (#1914)

* feat(claude-code-plugin): inject user profile + memory listings on session start

Previously, profile/preferences/entities only reached the agent when the
user's prompt happened to trigger semantic auto-recall (UserPromptSubmit).
Trivial first prompts (e.g. `git status`) left the agent with no identity
context.

Session-start hook now always builds a profile injection block —
profile.md plus a description-annotated recursive ls of preferences/ and
entities/ — composed into the same <openviking-context source="..."> envelope
that already carries archive context on resume/compact. Subagents are
unaffected (they go through subagent-start.mjs).

Budget enforcement uses a CJK-aware token estimate (each character with
codepoint >= 0x3000 counts as 1.5 tokens; every other character counts at
the flat chars/4 rate), so a "10k token budget" reflects real tokenizer
cost for Chinese content rather than the 4-6× undercount that the flat
chars/4 heuristic produces.
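A minimal Python sketch of that estimator, assuming characters are classified one at a time and the total is rounded up. The plugin itself is JavaScript (`profile-inject.mjs`), so this is only an illustration of the heuristic, not its code. The 6× figure follows directly: a CJK character costs 1.5 tokens here versus 0.25 under flat chars/4.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough CJK-aware token estimate (illustrative sketch, not the plugin's code)."""
    # Characters at or above U+3000 (CJK punctuation, kana, ideographs, ...)
    # are weighted at 1.5 tokens each.
    cjk = sum(1 for ch in text if ord(ch) >= 0x3000)
    # Everything else uses the flat chars/4 heuristic.
    other = len(text) - cjk
    # Round up so a non-empty string never estimates to zero tokens.
    return math.ceil(cjk * 1.5 + other / 4)


# Flat chars/4 would call "你好世界" 1 token; the weighted estimate says 6.
print(estimate_tokens("hello world"), estimate_tokens("你好世界"))
```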

Profile truncation on overflow keeps the head (identity facts) and tail
(most-recent timeline events), eliding the noisy middle, instead of
hard-cutting at the head.
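Head-and-tail elision can be sketched as below. The function name `elide_middle`, the marker text and the even head/tail split are assumptions for illustration; this version cuts at raw character offsets, whereas the real `elideProfile` may choose its cut points differently.

```python
ELISION_MARKER = "\n\n[... middle of profile elided ...]\n\n"  # hypothetical marker


def elide_middle(profile: str, max_chars: int, head_share: float = 0.5) -> str:
    """Keep the head and tail of `profile`, dropping the middle to fit `max_chars`."""
    if len(profile) <= max_chars:
        return profile  # already fits, nothing to elide
    room = max_chars - len(ELISION_MARKER)
    if room <= 0:
        return profile[:max_chars]  # budget too small for the marker; hard-cut
    head_len = int(room * head_share)  # identity facts live at the top
    tail_len = room - head_len  # most-recent timeline events live at the bottom
    # Guard tail_len == 0: profile[-0:] would return the whole string.
    tail = profile[-tail_len:] if tail_len > 0 else ""
    return profile[:head_len] + ELISION_MARKER + tail
```

With a long profile such as `"IDENTITY\n" + "x" * 500 + "\nLATEST EVENT"` and a 120-char cap, the result still starts with `IDENTITY` and ends with `LATEST EVENT`.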

Each invocation mirrors the composed payload to ~/.openviking/last_inject.md
for user-facing audit.

New env vars / config (config.mjs):
- OPENVIKING_NO_AUTO_INJECT (bool, default false) — kill switch for the new
  injection; auto-recall is unaffected.
- OPENVIKING_PROFILE_TOKEN_BUDGET (int, default 10000) — total cap for the
  block; profile gets up to half, listings split the remainder.
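One way to read the two knobs, as a hedged Python sketch: the accepted truthy spellings, the fallback on malformed integers, and the idea that listings share whatever the profile did not use (rather than a fixed half) are all assumptions; `plan_budget` is a hypothetical helper, not a function in `config.mjs`.

```python
import os

TRUTHY = {"1", "true", "yes", "on"}  # assumed accepted spellings


def env_flag(env, name, default=False):
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY


def env_int(env, name, default):
    try:
        return int(env.get(name, default))
    except (TypeError, ValueError):
        return default  # malformed value: fall back to the default


def plan_budget(profile_tokens_used, env=None):
    """Return per-section token caps, or None when injection is disabled."""
    env = os.environ if env is None else env
    if env_flag(env, "OPENVIKING_NO_AUTO_INJECT"):
        return None  # kill switch; auto-recall is handled elsewhere
    total = env_int(env, "OPENVIKING_PROFILE_TOKEN_BUDGET", 10000)
    profile_cap = total // 2  # profile may use up to half
    remainder = total - min(profile_tokens_used, profile_cap)
    # preferences/ and entities/ listings split whatever the profile left over.
    return {
        "profile": profile_cap,
        "preferences": remainder // 2,
        "entities": remainder - remainder // 2,
    }
```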

* feat(claude-code-plugin): add /ov slash command for plugin status

Tight five-section status report covering: server URL + /health latency,
resolved identity (account/user/agent), last session-start injection
(size, age, audit-file path), last auto-recall (item count, top score,
token budget use), and toggle state for the three injection paths
(auto-inject / auto-recall / auto-capture).

Final line shows where url + api_key were actually resolved from (env vs
ovcli.conf vs default), per the same priority chain config.mjs uses —
rather than enumerating every file on disk that *could have* contributed.

Reuses existing ~/.openviking/state/ files (last-recall.json,
last-session-event.json) and the audit file written by session-start.mjs;
no new server-side state.

* fix(cc-memory-plugin): drop tool output by default; keep tool input verbatim

After #1849 / #1850 the plugin captured tool I/O at a 4 KB-per-block cap
under one knob (TOOL_BLOCK_MAX_CHARS = 4096). Further field experience
surfaced two refinements:

1. **Tool *output* (tool_result content) is mostly noise for memory
   extraction.** Memory extraction cares about user preferences, project
   context, decisions, and what the agent did — not about the bytes a
   tool happened to return. The agent's prose around the tool call almost
   always summarizes the meaningful bit ("I checked the docs and confirmed
   X"); the raw 4 KB of fetched markdown adds nothing the prose doesn't
   already cover. Storing it just inflates session size and extraction
   token cost.

   Renamed TOOL_BLOCK_MAX_CHARS → TOOL_RESULT_MAX_CHARS and changed
   default to 0. When 0, tool_result blocks are dropped entirely. Operators
   wanting replay-style archives can set >0 to retain truncated output.

2. **Tool *input* should not be truncated.** Inputs are agent-authored
   (URLs, file paths, queries, commands). They're usually short, and a
   pathologically long input is itself signal worth surfacing — a
   memory extractor seeing "agent ran `bash` with a 10 KB script" learns
   something the truncated form would hide.

   Replaced truncateForLog(block.input) with formatToolInput(block.input),
   which JSON-serializes structured inputs but applies no length cap.

Both changes apply symmetrically to auto-capture.mjs and subagent-stop.mjs.
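The capture policy after this change can be sketched in Python as follows. The block shapes (`text` / `tool_use` with `name` and `input` / `tool_result` with `content`) follow Claude's content-block format; the `[tool_use: ...]` rendering and the `...[truncated]` suffix are hypothetical, not what `auto-capture.mjs` actually emits.

```python
import json


def format_tool_input(tool_input) -> str:
    """Serialize tool input verbatim: no length cap (mirrors formatToolInput's intent)."""
    if isinstance(tool_input, str):
        return tool_input
    return json.dumps(tool_input, ensure_ascii=False)


def render_block(block: dict, tool_result_max_chars: int = 0):
    """Render one content block for capture, or return None to drop it."""
    kind = block.get("type")
    if kind == "text":
        return block.get("text", "")
    if kind == "tool_use":
        return f"[tool_use: {block.get('name')}] {format_tool_input(block.get('input'))}"
    if kind == "tool_result":
        if tool_result_max_chars <= 0:
            return None  # default: tool output dropped entirely
        content = block.get("content", "")
        text = content if isinstance(content, str) else json.dumps(content, ensure_ascii=False)
        if len(text) > tool_result_max_chars:
            text = text[:tool_result_max_chars] + "...[truncated]"
        return f"[tool_result] {text}"
    return None  # other block types are not captured in this sketch


def render_message(blocks, tool_result_max_chars: int = 0) -> str:
    rendered = (render_block(b, tool_result_max_chars) for b in blocks)
    return "\n".join(r for r in rendered if r is not None)
```

With the default of 0, a 10 KB `bash` script in `input` survives intact while the tool's 4 KB result disappears; setting a positive cap restores truncated output for replay-style archives.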

* fix(claude-code-plugin): address Copilot review on PR #1914

Nine review comments, all valid:

profile-inject.mjs:
- header doc said chars/4 but estimateTokens is CJK-aware → fixed
- estimateTokens now exported so callers can log token counts that match
  the budget logic
- elideProfile derived maxChars from maxTokens*4, but the estimator counts
  CJK at 1.5 tokens/char → for CJK profiles the truncated string could
  still bust the token cap. New tokensToCharsBudget() converts using the
  content's actual CJK density
- formatListing always included header + first entry, so very small budgets
  silently violated the cap. Now: stub-out when header alone exceeds
  budget; only emit "+N more" tail when it fits; close silently otherwise
- profileBytes was UTF-16 char count, labeled "B" → renamed to profileChars
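A Python sketch of the two budget fixes, under stated assumptions: `tokens_to_chars_budget` uses the text's average tokens-per-char (so it approximates and is not a hard bound on every prefix), a stubbed listing is assumed to be an empty string, and the `+N more` wording is hypothetical. The listing invariant does hold exactly: the output's estimate never exceeds `max_tokens`.

```python
import math


def estimate_tokens(text: str) -> int:
    # Same CJK-aware heuristic: >= U+3000 is 1.5 tokens, everything else 1/4.
    cjk = sum(1 for ch in text if ord(ch) >= 0x3000)
    return math.ceil(cjk * 1.5 + (len(text) - cjk) / 4)


def tokens_to_chars_budget(text: str, max_tokens: int) -> int:
    """Convert a token cap into a char cap using `text`'s own CJK density."""
    if not text:
        return max_tokens * 4
    cjk = sum(1 for ch in text if ord(ch) >= 0x3000)
    tokens_per_char = (cjk * 1.5 + (len(text) - cjk) / 4) / len(text)
    return int(max_tokens / tokens_per_char)


def format_listing(header: str, entries: list, max_tokens: int) -> str:
    """Header + as many entries as fit; never exceeds max_tokens."""
    used = estimate_tokens(header)
    if used > max_tokens:
        return ""  # stub out: even the header does not fit
    lines = [header]
    for i, entry in enumerate(entries):
        cost = estimate_tokens("\n" + entry)
        if used + cost > max_tokens:
            tail = f"  ... +{len(entries) - i} more"
            if used + estimate_tokens("\n" + tail) <= max_tokens:
                lines.append(tail)  # only when the tail itself fits
            break  # otherwise close silently
        lines.append(entry)
        used += cost
    return "\n".join(lines)
```

Charging each entry with its leading newline keeps the running total an upper bound on the estimate of the joined string.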

session-start.mjs:
- header doc said budget=5000, code default is 10000 → doc fix
- local estimateTokens was flat chars/4 while injection enforces CJK-aware
  budget → import the shared estimator from profile-inject so logs match
  reality
- /health probe ran even when no injection path would fire (e.g.
  NO_AUTO_INJECT=1 + startup) → short-circuit before the network call
- profileBytes references updated to profileChars
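The short-circuit amounts to deciding which injection halves would run before touching the network. A hedged Python sketch, using the source names listed earlier (startup/clear/resume/compact); the helper names are hypothetical.

```python
ARCHIVE_SOURCES = {"resume", "compact"}  # archive overview only on these


def planned_injections(source: str, no_auto_inject: bool) -> list:
    plans = []
    if not no_auto_inject:
        plans.append("profile")  # profile block runs on every source
    if source in ARCHIVE_SOURCES:
        plans.append("archive")
    return plans


def should_probe_health(source: str, no_auto_inject: bool) -> bool:
    # Skip the /health network call when nothing would be injected anyway.
    return bool(planned_injections(source, no_auto_inject))
```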

ov-status.mjs:
- header doc said "Active config file + env overrides" but the bottom
  block was removed earlier → header fixed to describe Auth source
- auth source detection only considered env + ovcli.conf; could misreport
  "(none)" when key was actually coming from ov.conf claude_code.apiKey
  or server.root_api_key. Now mirrors config.mjs's full priority chain
  (env → ovcli.conf → ov.conf → default)
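A Python sketch of reporting *where* the key came from while walking that chain. The env var name `OPENVIKING_API_KEY` and the ovcli.conf field `api_key` are hypothetical, and checking `claude_code.apiKey` before `server.root_api_key` inside ov.conf is an assumption; `config.mjs` is the authority on the real order.

```python
def resolve_api_key(env: dict, ovcli_conf: dict, ov_conf: dict):
    """Return (api_key, source_label) following env -> ovcli.conf -> ov.conf -> default."""
    key = env.get("OPENVIKING_API_KEY")  # hypothetical env var name
    if key:
        return key, "env"
    key = ovcli_conf.get("api_key")  # hypothetical ovcli.conf field name
    if key:
        return key, "ovcli.conf"
    claude_code = ov_conf.get("claude_code", {})
    if claude_code.get("apiKey"):
        return claude_code["apiKey"], "ov.conf (claude_code.apiKey)"
    server = ov_conf.get("server", {})
    if server.get("root_api_key"):
        return server["root_api_key"], "ov.conf (server.root_api_key)"
    return None, "(none)"  # nothing configured anywhere
```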

* fix(wizard): use volcengine provider for BytePlus + add Custom VLM option (#1915)

BytePlus was configured with provider="openai" but the BytePlus
endpoint uses the Volcengine API (multimodal_embeddings at
/embeddings/multimodal). Using the OpenAI SDK sends requests to the
wrong path (/embeddings with string input), causing 500 errors or
hangs.

Also adds a "Custom (OpenAI-compatible)" VLM option to both the cloud
and local wizard flows, so users can point to any OpenAI-compatible
endpoint (e.g., MiMo, vLLM, LiteLLM proxies).

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(oauth): native OAuth 2.1 authorization for MCP clients (#1870)

* feat(oauth): hand-sewn OAuth 2.1 M1+M2 (config, JWT, storage, /oauth/token, JWT discriminator)

Snapshot before evaluating migration to mcp.server.auth SDK provider. The
hand-rolled HS256 JWT implementation in openviking/server/oauth/jwt.py is
the main candidate for replacement: its surface area is small but it would
require careful crypto review by maintainers, while the official MCP SDK
already ships an OAuth provider wired into FastMCP.

Included so far:
- OAuthConfig + integration into OpenVikingConfig (default disabled)
- openviking/server/oauth/{jwt,storage,otp,router}.py
- POST /oauth/token (authorization_code + refresh_token, PKCE S256, RFC 6749 errors)
- JWT discriminator in resolve_identity (fail-closed; ResolvedIdentity.from_oauth)
- WWW-Authenticate Bearer hint on /mcp 401 (RFC 9728)
- 49 OAuth-specific unit/integration tests (all passing)
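The PKCE S256 check that `/oauth/token` has to perform (and that the SDK takes over in the next commit) is defined by RFC 7636; a minimal Python sketch, with hypothetical helper names:

```python
import base64
import hashlib
import hmac


def s256_challenge(code_verifier: str) -> str:
    """code_challenge = BASE64URL(SHA256(ASCII(code_verifier))), unpadded (RFC 7636)."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_pkce_s256(code_verifier: str, code_challenge: str) -> bool:
    # RFC 7636 requires a 43-128 character verifier.
    if not 43 <= len(code_verifier) <= 128:
        return False
    # Constant-time comparison of the recomputed challenge.
    return hmac.compare_digest(s256_challenge(code_verifier), code_challenge)


# Example verifier/challenge pair from RFC 7636, Appendix B.
verifier = "dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk"
print(verify_pkce_s256(verifier, "E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM"))
```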

Not yet implemented (M3 / MVP gap):
- /oauth/register (DCR), /oauth/authorize (HTML + OTP submit), well-known metadata
- POST /api/v1/auth/otp REST endpoint

* refactor(oauth): switch to mcp.server.auth SDK provider, drop hand-sewn JWT

Replaces the hand-rolled HS256 JWT signer / token endpoint / DCR with
the OAuth 2.1 surface shipped in mcp.server.auth. We supply a Provider
that adapts the existing OAuthStore (SQLite) to the SDK Protocol, plus
two custom routes the SDK doesn't own: an OTP-entry HTML page (the URL
provider.authorize() returns) and POST /api/v1/auth/otp for issuing
OTPs against an existing API key.

Net result: all OAuth crypto is now the SDK's responsibility (PKCE
S256, redirect_uri matching, error formatting). The OpenViking-side code
contains zero cryptography — access tokens are opaque random strings
prefixed with `ovat_` and looked up in SQLite by SHA-256 hash. Refresh
tokens, auth codes, OTPs use the same scheme.
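The opaque-token scheme can be sketched in Python with the standard library alone. The table columns and TTL below are hypothetical (the real `oauth_access_tokens` table carries more state); the point is that only `sha256(token)` is ever written, so a database leak does not yield usable bearer tokens.

```python
import hashlib
import secrets
import sqlite3

ACCESS_PREFIX = "ovat_"


def hash_token(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


class OpaqueTokenStore:
    """Opaque bearer tokens; only their SHA-256 hashes are persisted (sketch)."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        # Hypothetical schema; the real table has more columns.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS oauth_access_tokens ("
            " token_hash TEXT PRIMARY KEY, account_id TEXT, user_id TEXT, expires_at REAL)"
        )

    def issue(self, account_id: str, user_id: str, now: float, ttl: float = 3600) -> str:
        token = ACCESS_PREFIX + secrets.token_urlsafe(32)  # random, no embedded claims
        self.conn.execute(
            "INSERT INTO oauth_access_tokens VALUES (?, ?, ?, ?)",
            (hash_token(token), account_id, user_id, now + ttl),
        )
        return token

    def load(self, token: str, now: float):
        if not token.startswith(ACCESS_PREFIX):
            return None  # not one of ours; caller tries other auth schemes
        row = self.conn.execute(
            "SELECT account_id, user_id, expires_at FROM oauth_access_tokens WHERE token_hash = ?",
            (hash_token(token),),
        ).fetchone()
        if row is None or row[2] <= now:
            return None  # unknown or expired: fail closed
        return row[0], row[1]
```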

Highlights:
- openviking/server/oauth/provider.py: OpenVikingOAuthProvider implements
  the 8-method SDK Protocol, including subclassing AuthorizationCode /
  RefreshToken / AccessToken to pin (account_id, user_id, role) per
  token. Refresh-token replay triggers per-user chain revocation.
- openviking/server/oauth/storage.py: adds oauth_access_tokens and
  oauth_pending_authorizations tables; peek_auth_code / peek_refresh
  for non-destructive lookups; revoke_user_tokens cascades all OAuth
  state for an (account, user) pair when a key is rotated.
- openviking/server/oauth/router.py: minimal authorize page (inline
  HTML with frame-ancestors 'none') + OTP endpoint authenticated via
  existing get_request_context dependency.
- openviking/server/auth.py: replaces JWT discriminator with prefix
  match + provider.load_access_token; still fail-closed.
- openviking/server/app.py: mounts SDK routes via create_auth_routes
  alongside our authorize-page + OTP routes.
- Deletes openviking/server/oauth/jwt.py and tests/server/oauth/test_jwt.py.
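Refresh rotation with replay detection, reduced to an in-memory Python sketch. The class name and record layout are hypothetical; the behavior mirrors the description: a refresh token is one-shot, and presenting an already-consumed one revokes every token for that (account, user).

```python
import hashlib
import secrets


class RefreshTokenFamily:
    """In-memory sketch of refresh rotation with replay-triggered revocation."""

    def __init__(self):
        self._records = {}  # sha256(token) -> {"account", "user", "used"}

    @staticmethod
    def _h(token: str) -> str:
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def issue(self, account: str, user: str) -> str:
        token = "ovrt_" + secrets.token_urlsafe(32)
        self._records[self._h(token)] = {"account": account, "user": user, "used": False}
        return token

    def rotate(self, token: str):
        rec = self._records.get(self._h(token))
        if rec is None:
            return None  # unknown or already revoked
        if rec["used"]:
            # Replay of a consumed token: assume theft, revoke the whole family.
            self.revoke_user(rec["account"], rec["user"])
            return None
        rec["used"] = True  # keep the record so a later replay is detectable
        return self.issue(rec["account"], rec["user"])

    def revoke_user(self, account: str, user: str) -> None:
        doomed = [h for h, r in self._records.items() if (r["account"], r["user"]) == (account, user)]
        for h in doomed:
            del self._records[h]
```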

Tests: 32 passing, including a full DCR -> OTP -> authorize page ->
token-exchange -> /mcp lookup happy path, refresh rotation, and replay
detection. Existing test_auth.py regression unchanged.

Phase 1 status for full Claude.ai connectivity:
- WWW-Authenticate hint: already present on /mcp 401 (from M2)
- /.well-known/oauth-protected-resource (RFC 9728): not currently
  emitted by the SDK; a small custom route is still TODO.

* docs(oauth): rewrite design doc to reflect mcp.server.auth SDK approach

The earlier draft described a hand-sewn HS256 JWT plan; the implementation
took a different route after discovering mcp.server.auth ships a complete
RFC 6749 / 7591 / 8414 server. Updated to reflect:

- SDK owns the protocol surface (DCR, /authorize parsing, /token, metadata,
  PKCE, redirect_uri matching, error codes).
- OpenViking only contributes a Provider implementation, the OTP-entry
  HTML page, and POST /api/v1/auth/otp.
- Tokens are opaque (ovat_ / ovrt_ / ovac_ prefixes) — no JWT, no crypto
  on our side.
- Implementation status: M1/M2/M3 done; only RFC 9728 protected-resource
  metadata + reverse-proxy issuer derivation remain for full Claude.ai
  end-to-end connectivity.

* feat(oauth): add /.well-known/oauth-protected-resource (RFC 9728)

The /mcp 401 path already advertises this URL via WWW-Authenticate
Bearer resource_metadata="...", but the endpoint itself didn't exist —
clients fetched it and got a 404, which silently broke the discovery
chain even though /.well-known/oauth-authorization-server worked. Wire
up the resource metadata document so the full RFC 9728 → RFC 8414
discovery chain works end-to-end.

Uses mcp.shared.auth.ProtectedResourceMetadata pydantic model. Reads
X-Forwarded-Proto/Host so the published resource URL matches what the
client used (matches our existing WWW-Authenticate behavior).

Cache-Control: max-age=3600 — metadata is stable across requests.
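Assembling the document from forwarded headers, as a Python sketch. `resource` and `authorization_servers` are RFC 9728 fields; the `/mcp` resource path, the assumption that the authorization server shares the public origin, and the lowercase header dict are illustrative choices, not the shipped route.

```python
def first_forwarded(value):
    # X-Forwarded-* may carry a comma-separated chain; the first hop is the client-facing one.
    return value.split(",")[0].strip() if value else None


def protected_resource_metadata(headers: dict, scheme: str, host: str, resource_path: str = "/mcp"):
    """Build an RFC 9728 metadata document and response headers (sketch)."""
    proto = first_forwarded(headers.get("x-forwarded-proto")) or scheme
    public_host = first_forwarded(headers.get("x-forwarded-host")) or host
    origin = f"{proto}://{public_host}"
    body = {
        "resource": origin + resource_path,  # the protected resource the client used
        "authorization_servers": [origin],  # assumed to share the public origin
    }
    return body, {"Cache-Control": "max-age=3600"}
```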

* feat(console): add OTP issuance button in Settings panel

Adds a "Get OTP" button under the Settings panel of the 8020 web
console. Clicking it issues an OAuth OTP via the user's existing API
key (already loaded into sessionStorage) and displays it inline with
a copy-to-clipboard button.

Replaces the previous workflow of users having to:
  curl -X POST -H "X-Api-Key: $KEY" http://<host>:1933/api/v1/auth/otp

…with a single button-click flow that the user can reach from any
machine with a browser.

Wires:
- console/app.py: new POST /console/api/v1/ov/auth/otp proxy route,
  forwarding to upstream /api/v1/auth/otp. Not gated by write_enabled
  since OTP issuance is an authentication artifact, not data mutation.
- index.html: new OAuth section in the Settings panel with otpBox
  (hidden until OTP is generated) and a Copy button.
- app.js: getOtpBtn click handler calls callConsole, otpCopyBtn copies
  to clipboard. Clear failure messages when the user has no API key
  loaded yet.

This is the lightweight half of the Console-OAuth integration. The
fuller "same-origin auto-authorize" flow (Phase 2) — where the
authorize page detects sessionStorage and submits the OTP form
automatically — is still TBD and will reuse this proxy route.

* feat(oauth): device-flow style authorize page + console verify form

Pivots the OTP flow direction so the UX matches OAuth 2.0 Device
Authorization Grant (RFC 8628) more closely:

  Old (push):  user goes to console -> Get OTP -> copy -> paste in
               client's authorize page -> submit -> redirect.
  New (pull):  client's authorize page DISPLAYS a 6-char code -> user
               types it into the console verify form -> page polls -> redirect.

This removes one tab switch and aligns with how users mentally model
authorization ("I'm approving the request shown over there from
where I'm already signed in"). The legacy POST /api/v1/auth/otp +
"Get OTP" button are kept under a collapsed details element for any
scripted/CLI flows that still drive the older pattern.

Also wires OPENVIKING_PUBLIC_BASE_URL env var as the highest-priority
public origin override, used consistently by:
  - /.well-known/oauth-protected-resource
  - WWW-Authenticate header
  - authorize page links
  - SDK issuer at app start.
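The resulting priority can be sketched in Python: the explicit override wins, then `X-Forwarded-*`, then the raw request. Function names and the lowercase header dict are assumptions; the header format matches the `resource_metadata="..."` hint described above.

```python
def public_origin(env: dict, headers: dict, scheme: str, host: str) -> str:
    """OPENVIKING_PUBLIC_BASE_URL wins; otherwise X-Forwarded-*; otherwise the raw request."""
    override = (env.get("OPENVIKING_PUBLIC_BASE_URL") or "").strip()
    if override:
        return override.rstrip("/")
    proto = (headers.get("x-forwarded-proto") or scheme).split(",")[0].strip()
    fwd_host = (headers.get("x-forwarded-host") or host).split(",")[0].strip()
    return f"{proto}://{fwd_host}"


def www_authenticate(origin: str) -> str:
    # The same origin should feed the 401 hint, the metadata document, and the issuer.
    return f'Bearer resource_metadata="{origin}/.well-known/oauth-protected-resource"'
```

Deriving every public URL from one function is what keeps the 401 hint, the metadata document and the SDK issuer from disagreeing behind a proxy.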

Server changes:
- storage.py: oauth_pending_authorizations gains display_code,
  verified, verified_account_id/user_id/role columns; new
  find_pending_by_display_code + mark_pending_verified.
- provider.authorize() now generates display_code at pending creation
  and returns the page URL.
- router.py:
  * GET /oauth/authorize/page — renders the code + same-origin quick-
    authorize panel (sessionStorage detection, but click still required
    so authorization is never silent).
  * GET /oauth/authorize/page/status — polled by the page until verified;
    response carries the redirect_url with auth_code on approval.
  * POST /api/v1/auth/oauth-verify — authenticated; binds caller
    identity to a pending row (decision=approve|deny).
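The pending-row lifecycle (create with a display code, verify once from the console, poll until a one-shot redirect) can be sketched in memory in Python. The code alphabet, 10-minute TTL, response bodies and the use of 410 for unknown codes are assumptions; only the `ovac_` prefix and the 410-on-consumed/expired behavior come from the description.

```python
import secrets
from urllib.parse import urlencode

# Assumed alphabet: uppercase letters and digits minus look-alikes (0/O, 1/I).
CODE_ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"


class PendingAuthorizations:
    """In-memory sketch of display_code pending rows (not the real storage.py)."""

    def __init__(self, ttl: float = 600.0):
        self.ttl = ttl
        self.rows = {}

    def create(self, redirect_uri: str, state: str, now: float) -> str:
        code = "".join(secrets.choice(CODE_ALPHABET) for _ in range(6))
        while code in self.rows:  # retry on the rare collision
            code = "".join(secrets.choice(CODE_ALPHABET) for _ in range(6))
        self.rows[code] = {"redirect_uri": redirect_uri, "state": state, "created": now,
                           "decision": None, "identity": None, "consumed": False}
        return code

    def _live(self, code: str, now: float):
        row = self.rows.get(code.strip().upper())
        if row is None or row["consumed"] or now - row["created"] > self.ttl:
            return None
        return row

    def verify(self, code: str, identity: tuple, decision: str, now: float) -> bool:
        """Console side: bind the caller's identity and record approve/deny once."""
        row = self._live(code, now)
        if row is None or row["decision"] is not None or decision not in ("approve", "deny"):
            return False
        row["decision"], row["identity"] = decision, identity
        return True

    def status(self, code: str, now: float):
        """Polled by the authorize page; returns (http_status, body)."""
        row = self._live(code, now)
        if row is None:
            return 410, {"error": "expired_or_consumed"}
        if row["decision"] is None:
            return 200, {"status": "pending"}
        row["consumed"] = True  # one-shot: the redirect is handed out once
        if row["decision"] == "deny":
            query = {"error": "access_denied", "state": row["state"]}
        else:
            query = {"code": "ovac_" + secrets.token_urlsafe(24), "state": row["state"]}
        return 200, {"status": row["decision"], "redirect_url": row["redirect_uri"] + "?" + urlencode(query)}
```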

Console changes:
- Settings panel: new "Authorize an MCP client" section with code input
  and Authorize/Deny buttons. Legacy "Get OTP" still available under
  details.
- console proxy gains POST /console/api/v1/ov/auth/oauth-verify.

Tests: 38 OAuth tests passing, including a full device-flow happy path,
deny path, idempotency (one-shot pending), unknown-code rejection,
status-410 on consumed/expired, refresh rotation, OPENVIKING_PUBLIC_BASE_URL
override, and X-Forwarded-* fallback.

* docs(oauth): add 11-oauth guide + Caddy/nginx templates + .env-driven compose

Adds a top-level OAuth 2.1 guide (zh/en) covering the production path
end-to-end. Opens with a 5-step recommended setup so readers don't have
to wade through the rationale before they can deploy. Drops the "MCP"
qualifier from the doc name — OAuth 2.1 here is generic and serves any
OAuth client, not just MCP.

- docs/{en,zh}/guides/11-oauth.md: new. Recommended setup at the top,
  then background, full device flow, HTTP-local vs HTTPS-production
  deployment, Caddy + nginx templates, docker-compose with the shipped
  Caddy service, curl walkthrough, config reference, troubleshooting.
- docker-compose.yml: replace the prior PR's commented-out hint with a
  single OPENVIKING_PUBLIC_BASE_URL var (read by both the openviking
  service and an optional Caddy reverse-proxy service that's also
  shipped commented-out). Same env var drives Caddy via
  {$OPENVIKING_PUBLIC_BASE_URL}, so the public domain is configured
  once in .env.
- docs/{en,zh}/guides/06-mcp-integration.md: replace the "OAuth Proxy
  (planned, use community Cloudflare Worker)" section with a short
  pointer to the new 11-oauth guide. The community proxy is still
  mentioned as an alternative.

Same env-variable design also matches what the MCP add_resource tool
expects (it already reads OPENVIKING_PUBLIC_BASE_URL), so deployments
get a single source of truth for the public address.

* fix(oauth): read API key from localStorage on authorize page

The same-origin "Quick authorize" panel was reading sessionStorage,
which is per-tab. Since the OAuth authorize page opens in a different
tab from the console, the panel never showed up even when the user was
signed in.

The console persists the API key in localStorage as well (key
"ov_console_api_key" — see static/console_settings.js's
LEGACY_API_KEY_STORAGE_KEY) for cross-tab use, and that copy is what
the authorize page should consult.

Switch the page JS to localStorage first, fall back to sessionStorage
for resilience. No console-side change needed; the localStorage entry
has been written by the console all along.

* docs: add public access guide + default port 1934 aggregated proxy

- Add Caddyfile with :1934 HTTP aggregated proxy (merges 1933+8020)
- Enable Caddy service by default in docker-compose.yml on port 1934
- Add docs/{en,zh}/guides/12-public-access.md with full HTTPS setup guide
- Simplify 11-oauth.md: replace inline reverse proxy config with refs to 12
- Add HTTPS requirement callout to OAuth recommended setup
- Update 03-deployment.md to mention port 1934 as recommended entry point

* fix(oauth): address Copilot review + ruff format

- Update oauth_config.py docstrings to describe opaque tokens, not JWT
  (we switched away from JWT during implementation)
- Remove unused authorize_rate_limit_per_min config field — was never
  enforced anywhere in router/storage, dead config misled operators
- Wrap all OAuthStore read paths in self._lock (matching writes); the
  shared sqlite3.Connection with check_same_thread=False is not safe
  for concurrent cursor use across threads
- Clarify provider.exchange_refresh_token comment that replay revokes
  the entire (account, user) family, not just the (client, account,
  user) chain — broader blast radius is intentional
- ruff format: 8 files reformatted to satisfy CI lint
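The locking rule from the third bullet, as a small Python sketch: one shared connection opened with `check_same_thread=False`, and every read as well as every write serialized through the same `threading.Lock`. The `kv` table is a hypothetical stand-in for the OAuth tables.

```python
import sqlite3
import threading


class LockedStore:
    """One shared sqlite3 connection; every read *and* write holds the same lock."""

    def __init__(self, path: str = ":memory:"):
        # check_same_thread=False lets several threads use this connection,
        # which is only safe because all access below is serialized.
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()
        with self._lock:
            self._conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
            self._conn.commit()

    def put(self, key: str, value: str) -> None:
        with self._lock:
            self._conn.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, value))
            self._conn.commit()

    def get(self, key: str):
        with self._lock:  # reads take the lock too, not just writes
            row = self._conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return None if row is None else row[0]
```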

* perf(docker): add cargo + ccache cache mounts to py-builder stage

The two heavy RUN steps in py-builder (uv sync + maturin build) re-execute
on every Python source change because the upstream COPY layer for openviking/
invalidates the cache. Each rerun was ~510s + ~115s ≈ 10 min of wasted work
even though Rust/C++ source was unchanged.

Add BuildKit cache mounts so cargo and the C++ engine compilation can skip
work whose inputs are unchanged:

- Mount /cargo-target, cargo registry, and cargo git so cargo's incremental
  build artifacts persist across layer reruns. Pin CARGO_TARGET_DIR so the
  path stays stable when uv builds wheels in ephemeral isolated tempdirs.
- Install ccache and prepend /usr/lib/ccache to PATH so cmake (which calls
  shutil.which("gcc")) resolves the ccache wrapper. ccache is path-agnostic,
  so it benefits the cmake_build subdir even though setup.py recreates it
  in a fresh tempdir each wheel build.
- Mount /root/.ccache so the ccache hash store persists across reruns.

Expected: hot rebuilds on Python-only changes drop step 15 from ~510s to
~60-120s (uv wheel packaging overhead remains; cargo + g++ skip on cache hit).

* perf(docker): drop redundant second maturin build step

The second RUN step in py-builder built ragfs-python a second time and
extracted its .so into the installed openviking package. This was
redundant: setup.py's build_ragfs_python_artifact() already runs maturin
during step 15 (uv sync --no-editable), and because build_meta passes
'bdist_wheel' through PEP 517, _should_require_ragfs_artifact() returns
True and the build fails closed if maturin can't produce ragfs_python.so.
The .so is then bundled into the wheel via package_data and installed
into /app/.venv on wheel install. The second step's only effect was to
overwrite the same file, costing ~115s per build.

Verified after the fact by inspecting the installed venv and importing
ragfs_python in the runtime container.

* feat(oauth): bind OAuth token lifetime to authorizing API key

Previously OAuth tokens lived independently of the API key that authorized
them. Rotating a user's key did not invalidate already-issued OAuth access /
refresh tokens, so a compromised key remained dangerous even after rotation.

Tie every OAuth token to the SHA-256 fingerprint of the API key whose holder
authorized it:

- APIKeyManager grows get_user_key_fingerprint(account_id, user_id) ->
  sha256(stored_key_value). The stored value is whatever sits in
  user_info["key"] (plaintext key or argon2id hash), written once on
  create / regenerate and never mutated in place, so the fp is stable per
  key-generation and changes the moment regenerate_key runs.

- OAuth storage gains an authorizing_key_fp column on oauth_codes,
  oauth_pending_authorizations (verified_key_fp), oauth_refresh_tokens, and
  oauth_access_tokens. ALTER TABLE migration guarded by PRAGMA table_info
  for dev DBs that predate the field.

- Provider data classes thread the fp through authorize ->
  exchange_authorization_code -> _issue_token_pair, and refresh rotation
  preserves it from the consumed token's record.

- Router endpoints capture the caller's current fp at the only two
  identity-binding moments: /api/v1/auth/otp (caller) and
  /api/v1/auth/oauth-verify (verifier). If the manager returns None
  (ROOT key, trusted-mode identity, or removed user), refuse to issue
  OAuth state -- there is no key whose lifecycle we could honor.

- auth.py:_try_resolve_oauth_token recomputes the user's current fp on
  every OAuth bearer auth and demands strict equality via
  hmac.compare_digest. NULL / empty / mismatch all fail closed with a
  401 telling the client to re-authorize.
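The three mechanical pieces above (fingerprinting the stored key value, the `PRAGMA table_info`-guarded column migration, and the fail-closed comparison) fit in a short Python sketch. Function names are hypothetical; table and column names passed to `ensure_column` are assumed to be trusted constants, since they are interpolated into SQL.

```python
import hashlib
import hmac
import sqlite3


def key_fingerprint(stored_key_value):
    """sha256 over whatever is stored for the user's key (plaintext or argon2id hash)."""
    if not stored_key_value:
        return None  # ROOT / trusted-mode / removed user: nothing to bind to
    return hashlib.sha256(stored_key_value.encode("utf-8")).hexdigest()


def ensure_column(conn: sqlite3.Connection, table: str, column: str, decl: str = "TEXT") -> None:
    """Add `column` only if missing; PRAGMA table_info rows are (cid, name, type, ...)."""
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column not in existing:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")


def oauth_token_still_valid(token_fp, current_stored_key) -> bool:
    """Fail closed unless the token's fp equals the user's *current* key fp."""
    current_fp = key_fingerprint(current_stored_key)
    if not token_fp or current_fp is None:
        return False
    return hmac.compare_digest(token_fp, current_fp)
```

Rotating the key changes the stored value, hence the fingerprint, so every token minted under the old key stops matching without a separate revocation pass.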

Crypto notes: sha256 over a 256-bit-random API key (or its argon2id hash)
is preimage-safe, so an oauth.db leak does not reveal the API key. No new
secret material introduced; the fp is derived deterministically from data
that already exists.

Tests: 3 new lifecycle tests in test_auth_integration (rotation rejected,
user-removed rejected, missing-fp fail-closed), 3 new router tests
(no-fp caller / verifier rejected, fp recorded on access + refresh), 2 new
APIKeyManager tests (fp changes on rotate / vanishes on remove).
Pre-existing inserts in test_storage updated to pass _FP. 82/82 OAuth +
APIKeyManager tests pass.

* docs(oauth): document OAuth lifetime ≤ authorizing key lifetime

The fingerprint binding landed in the previous commit; users need to know
that key rotation now auto-invalidates derived OAuth tokens (no separate
revoke step) and that ROOT / trusted-mode identities cannot issue OAuth.

Updates both en and zh under docs/guides/11-oauth.md, replacing the
"operator should also revoke ..." paragraph with the new automatic
behavior + brief note on the SHA-256 fingerprint scheme.

* fix(oauth): close 4 review findings on token lifecycle

External security review of #1870 surfaced four real gaps in the OAuth
implementation. All four directly affect the lifecycle / privilege model.

P1: role downgrade did not invalidate OAuth tokens
  set_role rewrit…