smriti daemon: long-running ingest service with file-watching and debounce #70

@ashu17706

Problem

smriti ingest claude is currently wired as a fire-and-forget hook running after every Claude Code Stop event. Because each invocation pays the full cold-start tax (Bun runtime + module graph + SQLite open + eventually embedding-model load) and the actual scan can run for minutes against a multi-month DB, concurrent fires across multiple Claude sessions stack up faster than they finish.

Today I had 42 concurrent smriti ingest claude processes running, oldest from 4 days ago, with a combined ~13,449 minutes (~9 CPU-days) of duplicate work. Short-term mitigation: I wrapped the hook in lockf -t 0 /tmp/smriti-ingest.lock so only one runs at a time. That stops the pile-up but doesn't address the underlying cost.

Postmortem write-up: docs/papers/stop-hook-never-stopped.md.

Goal

A long-lived smriti daemon that pays cold-start costs once and turns the Stop hook into a ~5ms poke. Eliminates the process-pile-up class of bugs by construction (single process), warms the embedding model, and replaces the current full-tree rescan with FS-event-driven incremental ingest.

The config constants DAEMON_PID_FILE and DAEMON_DEBOUNCE_MS in src/config.ts already exist — this issue completes that planned work.

What this daemon is really for (and when it gets urgent)

Three things become "warm" inside a long-running daemon, and they are nowhere near equal in cost:

| Warm thing | Cold-start cost | RAM footprint |
| --- | --- | --- |
| Bun runtime + module graph | ~100–200ms (TS parse + module resolution + Bun init) | ~30–60 MB |
| SQLite WAL connection + prepared statements | ~20–50ms (file open + WAL recovery + N × prepare()) | ~10 MB |
| Embedding model | ~2–5 seconds (read ~300MB GGUF, init context) | ~300–500 MB resident |

The embedding model costs roughly 8–40× the other two combined (2–5 s against ~120–250 ms). The daemon's real economic case isn't "amortize cold start" in general; it's specifically to amortize the embedding model. Items 1 and 2 are nice-to-have; item 3 is the entire argument.

That reframes the urgency of this issue around a single design choice: where does the embedding model live?

| Embedding model location | Daemon needed for perf? | Reasoning |
| --- | --- | --- |
| In-process (node-llama-cpp loaded inside QMD/Smriti) | Yes: only way to keep it warm across ingests | Current planned trajectory; this issue is sized for this case |
| Local Ollama (separate daemon Ollama already owns) | Probably not: Ollama is already the warm host | Smriti just needs locking + a tight CLI invocation per fire |
| Remote API (Anthropic / OpenAI / Voyage / etc.) | No (perf-wise) | Network round-trip dominates; local cold start is invisible |

Urgency vs correctness

The 42-concurrent-process incident this issue grew out of was a locking problem, not a warmth problem. The lockf -t 0 mitigation already shipped in ~/.claude/hooks/save-memory.sh solves the locking part end-to-end — pile-ups can no longer happen.

So the residual value of building this daemon decomposes as:

  • Locking / no-pile-up — already solved by lockf. No daemon needed.
  • FS-event-driven incremental ingest (no full rescan) — real win regardless of embedding host; ~constant whether item 3 is hot or cold.
  • Amortize Bun + SQLite cold start — ~150ms per fire. Real but not transformative.
  • Amortize embedding model load — 2–5s per fire. Transformative, but only if item 3 lives in our process.

If Smriti commits to in-process embedding, the daemon is correct + urgent.
If Smriti delegates embedding to Ollama or a remote API, the daemon is correct but not urgent: lockf plus a tight CLI plus FS watching delivers most of the visible benefit at a fraction of the cost. The Ollama path in particular is interesting: Ollama is itself a daemon, so we'd be paying for two daemons to keep one model warm.

Decision gate

Before committing engineering time to this issue:

  • Confirm whether in-process embedding (via QMD's node-llama-cpp path) is part of the near-term roadmap, or whether embedding is delegated to Ollama / remote API.
  • If delegated: downgrade this issue to "future architecture" and promote lockf from temporary fix to permanent default. Re-scope the work to "FS watcher + debounce in front of the existing CLI," which is a much smaller change.
  • If in-process: proceed with the full design below.

Prior art: QMD's MCP daemon

QMD (our upstream) already solves half of this problem. qmd mcp --http --daemon is a long-lived HTTP MCP server that keeps embedding/rerank/generation models warm so AI agents querying via MCP don't pay the cold-start tax. The relevant pattern is in zero8dotdev/qmd:src/cli/qmd.ts around the mcp subcommand handler:

// Single-instance guard via PID file
const pidPath = resolve(cacheDir, "mcp.pid");
if (existsSync(pidPath)) {
  const existingPid = parseInt(readFileSync(pidPath, "utf-8").trim());
  try {
    process.kill(existingPid, 0);                 // alive?
    console.error(`Already running (PID ${existingPid}).`);
    process.exit(1);
  } catch { /* stale — continue */ }
}

// Detach
const child = nodeSpawn(process.execPath, spawnArgs, {
  detached: true,
  stdio: ["ignore", logFd, logFd],
});
child.unref();
writeFileSync(pidPath, String(child.pid));

And the lifecycle shape:

  • qmd mcp --http --daemon: start detached
  • qmd mcp stop: read PID file, SIGTERM, unlink
  • qmd status: kill(pid, 0) liveness probe, silently clean up stale PID files
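
The stop and status halves of that lifecycle can be sketched as below. This is a minimal illustration of the pattern, not QMD's actual code: `isAlive`, `readPid`, `daemonStatus`, and `stopDaemon` are hypothetical names, and only Node-compatible `fs` and `process` calls are used.

```typescript
import { existsSync, readFileSync, unlinkSync } from "node:fs";

// Hypothetical helpers illustrating the PID-file lifecycle; not QMD's code.

/** Signal 0 delivers nothing; it only checks that the PID can be signalled. */
function isAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

/** Read a PID file; null if it is missing or does not hold a positive integer. */
function readPid(pidPath: string): number | null {
  if (!existsSync(pidPath)) return null;
  const pid = Number.parseInt(readFileSync(pidPath, "utf-8").trim(), 10);
  return Number.isInteger(pid) && pid > 0 ? pid : null;
}

type Status = { running: true; pid: number } | { running: false };

/** status: liveness probe that silently removes a stale or corrupt PID file. */
function daemonStatus(pidPath: string): Status {
  const pid = readPid(pidPath);
  if (pid !== null && isAlive(pid)) return { running: true, pid };
  if (existsSync(pidPath)) unlinkSync(pidPath);
  return { running: false };
}

/** stop: read PID file, SIGTERM, unlink. Returns false if nothing was running. */
function stopDaemon(pidPath: string): boolean {
  const status = daemonStatus(pidPath);
  if (!status.running) return false;
  process.kill(status.pid, "SIGTERM");
  unlinkSync(pidPath);
  return true;
}
```

One limitation to keep in mind: a PID can be reused by an unrelated process after a crash, so a liveness probe alone can report a false "running".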

What we inherit and what we add

| Dimension | QMD does it | Smriti needs it |
| --- | --- | --- |
| Warm models for query/search (read side) | qmd mcp --daemon | inherited via QMD |
| Warm models for ingest/index (write side) | ❌ user runs qmd update manually | gap this issue fills |
| FS watching for auto-ingest | ❌ not applicable (user-driven) | new in Smriti |
| Debouncing event bursts | ❌ not applicable | new in Smriti |
| Single-instance guard | ✅ PID file + kill(pid, 0) | borrow this pattern |
| Detach mechanism | spawn({ detached: true }).unref() | borrow this pattern |
| Lifecycle CLI (stop / status) | ✅ clean shape | mirror this |

QMD's indexing model is user-driven (run qmd update when you want), so it never had the auto-fire pile-up problem and its authors chose not to build FS watching or debouncing. Smriti is deliberately event-driven (auto-ingest after each Claude turn) because that's the only way to keep memory fresh without user friction, but that is exactly the choice that creates this issue's problem.

So the Smriti daemon is: QMD's MCP-daemon pattern applied to the write side, plus FS watcher and debounce queue.

Design

Single-instance: socket bind or PID file?

Two viable patterns:

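  1. Unix socket bind (~/.cache/smriti/daemon.sock / \\.\pipe\smriti on Windows). Doubles as the IPC transport for the hook poke. Windows releases a named pipe when the process exits, but a path-based Unix socket file outlives a crash or SIGKILL, so startup needs a small stale check: connect to the path, and on ECONNREFUSED unlink it and bind again.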
  2. PID file + kill(pid, 0) liveness probe — what QMD uses. Needs a small stale-cleanup branch but is dead simple.

Recommendation: socket bind. We need a socket anyway for the hook poke (see "Hook contract change" below), and a single mechanism doing single-instance + IPC is less surface area than maintaining both a PID file and a socket. QMD picked PID file because their daemon's IPC transport is already an HTTP port (which itself enforces single-instance), so a PID file is just for find/kill. We don't have an existing port to lean on — the socket is doing real work.

We'll still write the daemon PID next to the socket (e.g. ~/.cache/smriti/daemon.pid) for diagnostics: error messages and smriti daemon status, the same use QMD makes of its PID file. It is not load-bearing.
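
A minimal sketch of the socket-bind guard, assuming Node-compatible `net` APIs under Bun. `acquireSocket` and `onPoke` are hypothetical names. One caveat the sketch accounts for: on Linux and macOS a path-based socket file is left behind when the process is killed, so a failed bind is followed by a connect probe to tell a live daemon from a leftover file.

```typescript
import { createServer, connect, Server } from "node:net";
import { unlinkSync } from "node:fs";

// Hypothetical sketch; `acquireSocket` and `onPoke` are illustrative names.
// Resolves with the listening server, or null when a live daemon owns the path.
function acquireSocket(sockPath: string, onPoke: () => void): Promise<Server | null> {
  return new Promise((resolve, reject) => {
    const server = createServer((conn) => {
      conn.on("error", () => {}); // a client reset must not crash the daemon
      onPoke();                   // any connection counts as a poke
      conn.end();                 // close right away so the hook's `nc` returns
    });
    const listen = (): void => {
      server.listen(sockPath, () => resolve(server));
    };

    server.once("error", (err: NodeJS.ErrnoException) => {
      if (err.code !== "EADDRINUSE") return reject(err);
      // The path exists. Probe it to tell a live daemon from a leftover file.
      const probe = connect(sockPath);
      probe.once("connect", () => {
        probe.destroy();
        resolve(null); // someone is listening: we are the second instance
      });
      probe.once("error", (probeErr: NodeJS.ErrnoException) => {
        if (probeErr.code !== "ECONNREFUSED") return reject(probeErr);
        unlinkSync(sockPath);         // stale file from a crashed daemon
        server.once("error", reject); // a second failure is fatal
        listen();
      });
    });
    listen();
  });
}
```

Two daemons starting at the same instant can still race between the probe and the unlink. If that matters, the check could be wrapped in the same file lock the hook already uses.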

Boot work (once)

  • Open SQLite (WAL mode already)
  • Load embedding model into memory and keep it warm
  • Subscribe to FS events on all configured agent log roots (~/.claude/projects, ~/.codex, ~/.cline/tasks, Copilot storage dir, etc.) via fs.watch
    • macOS / Windows: { recursive: true } works natively
    • Linux: walk the tree at boot, watch every directory (a non-recursive watch only reports changes to a directory's direct children), and re-watch on dir-create events. Consider chokidar to abstract this.
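
The Linux fallback can be sketched as one non-recursive `fs.watch` per directory. This is an illustration under assumptions, not a drop-in for src/daemon/watcher.ts: `listDirs` and `watchTree` are hypothetical names, and the sketch does not handle watcher-limit errors or prune watchers for deleted directories.

```typescript
import { watch, readdirSync } from "node:fs";
import type { Dirent, FSWatcher } from "node:fs";
import { join } from "node:path";

// Hypothetical sketch; `listDirs` and `watchTree` are illustrative names.

/** Every directory at or below root. Unreadable or vanished paths yield nothing. */
function listDirs(root: string): string[] {
  let entries: Dirent[];
  try {
    entries = readdirSync(root, { withFileTypes: true });
  } catch {
    return [];
  }
  const dirs = [root];
  for (const entry of entries) {
    // Symlinks report isDirectory() === false here, which avoids cycles.
    if (entry.isDirectory()) dirs.push(...listDirs(join(root, entry.name)));
  }
  return dirs;
}

/** Watch root and all subdirectories; returns a function that stops watching. */
function watchTree(root: string, onEvent: (path: string) => void): () => void {
  const watchers = new Map<string, FSWatcher>();

  const add = (dir: string): void => {
    if (watchers.has(dir)) return;
    const watcher = watch(dir, (_eventType, name) => {
      if (name === null) return;
      const full = join(dir, name);
      // A newly created directory needs its own watcher; no-op for plain files.
      for (const created of listDirs(full)) add(created);
      onEvent(full);
    });
    watcher.on("error", () => {
      watcher.close();
      watchers.delete(dir);
    });
    watchers.set(dir, watcher);
  };

  for (const dir of listDirs(root)) add(dir);
  return () => {
    for (const watcher of watchers.values()) watcher.close();
    watchers.clear();
  };
}
```

Files written into a new directory before its watcher is attached can be missed, which is one reason the hook poke remains useful as a second trigger.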

Per-project debounce queue

On any FS event under a project's log directory:

  1. Resolve projectId from the path.
  2. Reset a per-project timer to SMRITI_DAEMON_DEBOUNCE_MS (default 30s).
  3. When the timer fires, run incremental ingest for that project's new content only.

Debouncing matters because Claude Code writes session JSONLs line-by-line — firing on the first write would re-ingest the same partial session repeatedly.
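
The three steps above can be sketched as a small timer map. The class and method names are hypothetical, and treating the first path segment under the log root as the project ID is an assumption about the log layout, not a confirmed rule.

```typescript
import { relative, sep } from "node:path";

// Hypothetical sketch; names and the "first path segment is the project"
// rule are assumptions, not Smriti's actual layout.

/** Step 1: map a changed file to a project ID, or null if it lies outside root. */
function projectIdFromPath(root: string, filePath: string): string | null {
  const first = relative(root, filePath).split(sep)[0];
  return first === undefined || first === "" || first === ".." ? null : first;
}

type Ingest = (projectId: string) => void | Promise<void>;

class DebounceQueue {
  private readonly timers = new Map<string, ReturnType<typeof setTimeout>>();
  private readonly delayMs: number;
  private readonly ingest: Ingest;

  constructor(delayMs: number, ingest: Ingest) {
    this.delayMs = delayMs;
    this.ingest = ingest;
  }

  /** Step 2: (re)start the project's timer; an earlier pending timer is dropped. */
  touch(projectId: string): void {
    clearTimeout(this.timers.get(projectId));
    const timer = setTimeout(() => {
      // Step 3: the project has been quiet for delayMs, so ingest it once.
      this.timers.delete(projectId);
      Promise.resolve()
        .then(() => this.ingest(projectId))
        .catch((err) => console.error(`ingest failed for ${projectId}:`, err));
    }, this.delayMs);
    this.timers.set(projectId, timer);
  }

  /** Projects with a timer still running (useful for a status report). */
  pending(): string[] {
    return [...this.timers.keys()];
  }

  /** Cancel all timers, e.g. on shutdown. */
  clear(): void {
    for (const timer of this.timers.values()) clearTimeout(timer);
    this.timers.clear();
  }
}
```

Usage would look like `const queue = new DebounceQueue(30_000, ingestProject)` followed by `queue.touch(id)` on every FS event, where `ingestProject` stands in for whatever incremental-ingest function the daemon exposes. The sketch does not stop a second ingest from starting while a slow one is still running for the same project.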

Hook contract change

Stop hook becomes a thin notifier:

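#!/bin/bash
SOCK="$HOME/.cache/smriti/daemon.sock"
# Poke the daemon. If the socket is missing or nobody is listening behind it
# (nc fails to connect), fall back to a locked one-shot ingest.
if [ -S "$SOCK" ] && : | nc -U "$SOCK" 2>/dev/null; then
  :
else
  /usr/bin/lockf -t 0 /tmp/smriti-ingest.lock smriti ingest claude 2>/dev/null
fi
exit 0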

The poke is a hint, not a request — the daemon's own FS watch is the authoritative trigger. The lockf fallback keeps the system working when the daemon isn't running.

Auto-start on first use

If smriti ingest or any other entry point sees no socket, it forks a detached daemon and continues. This mirrors QMD's --daemon flag behaviour but is implicit rather than explicit — the user shouldn't have to remember to start anything:

Bun.spawn(["smriti", "daemon"], { stdio: ["ignore", "ignore", "ignore"] }).unref();
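
A slightly fuller sketch of the same idea, written against Node-compatible `net` and `child_process` APIs rather than `Bun.spawn`. `daemonReachable`, `spawnDetached`, and `ensureDaemon` are hypothetical names, and the 200 ms probe timeout is an arbitrary choice.

```typescript
import { connect } from "node:net";
import { spawn } from "node:child_process";

// Hypothetical sketch; function names and the timeout value are assumptions.

/** Resolves true if something accepts connections on the socket path. */
function daemonReachable(sockPath: string, timeoutMs = 200): Promise<boolean> {
  return new Promise((resolve) => {
    const sock = connect(sockPath);
    const done = (ok: boolean): void => {
      sock.destroy();
      resolve(ok);
    };
    sock.setTimeout(timeoutMs, () => done(false));
    sock.once("connect", () => done(true));
    sock.once("error", () => done(false)); // ENOENT or ECONNREFUSED: no daemon
  });
}

/** Default starter: detached child with no inherited stdio, so the caller can exit. */
function spawnDetached(): void {
  const child = spawn("smriti", ["daemon"], { detached: true, stdio: "ignore" });
  child.on("error", () => {}); // e.g. binary not on PATH; the caller carries on
  child.unref();
}

/** Start the daemon if it is not reachable; the starter is injectable for tests. */
async function ensureDaemon(
  sockPath: string,
  start: () => void = spawnDetached,
): Promise<"running" | "started"> {
  if (await daemonReachable(sockPath)) return "running";
  start();
  return "started";
}
```

Two entry points racing here may both spawn a daemon. That is tolerable as long as the daemon's own single-instance check makes the loser exit quietly.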

Lifecycle commands (mirror QMD's shape)

  • smriti daemon — run in foreground (for systemd/launchd integration and debugging)
  • smriti daemon start — fork-and-detach
  • smriti daemon stop — connect to socket, send shutdown, wait for clean exit (QMD does this via SIGTERM from PID file; ours can do it inline over the socket and fall back to PID-file SIGTERM)
  • smriti daemon status — report PID, uptime, pending queues, last ingest timestamps

Naming and CLI shape deliberately mirror qmd mcp stop / qmd status so users moving between the two have one mental model.
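
One possible shape for the socket-side handling of poke, stop, and status is sketched below. The wire format (one text line per connection, with an empty line meaning poke) is purely an assumption made for illustration; the issue does not specify one. The status fields follow the list above.

```typescript
// Hypothetical wire format: the client sends one line, then closes.
//   ""        -> poke (what `: | nc -U "$SOCK"` sends)
//   "stop"    -> graceful shutdown
//   "status"  -> one JSON line describing the daemon

interface DaemonState {
  pid: number;
  startedAt: number;                  // epoch ms at boot
  pending: string[];                  // projects with a debounce timer running
  lastIngest: Record<string, number>; // projectId -> epoch ms of last ingest
}

type Reply =
  | { kind: "poke" }
  | { kind: "stop" }
  | { kind: "status"; body: string }
  | { kind: "error"; body: string };

/** Pure request dispatcher; the caller performs the side effects for each kind. */
function handleRequest(raw: string, state: DaemonState, now: number = Date.now()): Reply {
  const command = raw.trim();
  switch (command) {
    case "":
      return { kind: "poke" };
    case "stop":
      return { kind: "stop" };
    case "status":
      return {
        kind: "status",
        body: JSON.stringify({
          pid: state.pid,
          uptimeMs: now - state.startedAt,
          pending: state.pending,
          lastIngest: state.lastIngest,
        }),
      };
    default:
      return { kind: "error", body: `unknown command: ${command}` };
  }
}
```

Keeping the dispatcher free of I/O lets the CLI client and the daemon share one definition of the protocol, and makes it easy to unit-test.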

Tasks

  • src/daemon/server.ts: socket bind, single-instance check, lifecycle signals
  • src/daemon/watcher.ts: FS watch abstraction with Linux recursion fallback
  • src/daemon/queue.ts: per-project debounce queue
  • src/daemon/client.ts: smriti daemon stop/status and auto-start helpers
  • PID file at DAEMON_PID_FILE for diagnostics + smriti daemon status (matches QMD's pattern; not load-bearing for mutual exclusion)
  • Wire smriti daemon subcommand in src/index.ts
  • Update ~/.claude/hooks/save-memory.sh template + docs to use poke-with-fallback
  • Tests: socket bind contention, debounce coalescing, FS watch under Linux/macOS, auto-start race
  • Update CLAUDE.md quick reference and docs/internal/ingest-architecture.md
  • Postmortem reference: link docs/papers/stop-hook-never-stopped.md from the daemon docs

Out of scope (for now)

  • Cross-machine daemon (still single-user, single-machine)
  • launchd/systemd unit files — auto-start covers the common case; service files can come later
  • IPC API beyond poke/stop/status — we may want smriti daemon reindex <project> etc. eventually
  • Replacing in-process ingest entirely — the one-shot smriti ingest path stays as the fallback when no daemon is running
  • Combining with QMD's MCP daemon into a single process — they have different deps and lifecycles; keep separate

Acceptance

  1. With the daemon running, a Claude Code Stop hook completes in <50ms.
  2. New session files in ~/.claude/projects/ are reflected in smriti search within DAEMON_DEBOUNCE_MS + ingest time of being written, with no manual smriti ingest needed.
  3. Killing the daemon (SIGKILL) leaves no stale socket or DB lock that blocks the next start; the next invocation detects any leftover socket file and starts cleanly.
  4. Running smriti daemon twice on the same machine: second invocation exits 0 with a message, doesn't pile up.
  5. Concurrent Claude Code sessions across multiple projects never produce more than one smriti process (the daemon itself).
  6. smriti daemon stop / smriti daemon status behave like QMD's qmd mcp stop / qmd status — same UX, same error semantics for stale state.
