Problem
smriti ingest claude is currently wired as a fire-and-forget hook running after every Claude Code Stop event. Because each invocation pays the full cold-start tax (Bun runtime + module graph + SQLite open + eventually embedding-model load) and the actual scan can run for minutes against a multi-month DB, concurrent fires across multiple Claude sessions stack up faster than they finish.
Today I had 42 concurrent smriti ingest claude processes running, oldest from 4 days ago, with a combined ~13,449 minutes (~9 CPU-days) of duplicate work. Short-term mitigation: I wrapped the hook in lockf -t 0 /tmp/smriti-ingest.lock so only one runs at a time. That stops the pile-up but doesn't address the underlying cost.
Postmortem write-up: docs/papers/stop-hook-never-stopped.md.
Goal
A long-lived smriti daemon that pays cold-start costs once and turns the Stop hook into a ~5ms poke. Eliminates the process-pile-up class of bugs by construction (single process), warms the embedding model, and replaces the current full-tree rescan with FS-event-driven incremental ingest.
The config constants DAEMON_PID_FILE and DAEMON_DEBOUNCE_MS in src/config.ts already exist — this issue completes that planned work.
What this daemon is really for (and when it gets urgent)
Three things become "warm" inside a long-running daemon, and they are nowhere near equal in cost:
| Warm thing | Cold-start cost | RAM footprint |
|---|---|---|
| Bun runtime + module graph | ~100–200ms (TS parse + module resolution + Bun init) | ~30–60 MB |
| SQLite WAL connection + prepared statements | ~20–50ms (file open + WAL recovery + N × prepare()) | ~10 MB |
| Embedding model | ~2–5 seconds (read ~300MB GGUF, init context) | ~300–500 MB resident |
The embedding model costs roughly 8–40× as much as the other two combined (2–5 s against 120–250 ms). The daemon's real economic case isn't "amortize cold start" in general; it is specifically to amortize the embedding model. Items 1 and 2 are nice-to-have; item 3 is the entire argument.
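Plugging the table's ranges in directly shows the spread of that ratio. This is only arithmetic on the figures quoted above; the variable names are illustrative.

```typescript
// Cold-start ranges from the table above, in milliseconds.
type Range = { lo: number; hi: number };

const bunRuntime: Range = { lo: 100, hi: 200 };
const sqlite: Range = { lo: 20, hi: 50 };
const embeddingModel: Range = { lo: 2000, hi: 5000 };

// Combined cost of the two cheap items.
const cheap: Range = {
  lo: bunRuntime.lo + sqlite.lo, // 120 ms
  hi: bunRuntime.hi + sqlite.hi, // 250 ms
};

// Smallest ratio: cheapest model load against the slowest runtime + DB start.
const minRatio = embeddingModel.lo / cheap.hi; // 2000 / 250 = 8
// Largest ratio: slowest model load against the fastest runtime + DB start.
const maxRatio = embeddingModel.hi / cheap.lo; // 5000 / 120 ≈ 41.7

console.log(
  `embedding model is ${minRatio.toFixed(0)}x to ${maxRatio.toFixed(0)}x the other two combined`,
);
```

Even at the low end of the range, the model load dominates the per-fire cost.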
That reframes the urgency of this issue around a single design choice: where does the embedding model live?
| Embedding model location | Daemon needed for perf? | Reasoning |
|---|---|---|
| In-process (node-llama-cpp loaded inside QMD/Smriti) | Yes: only way to keep it warm across ingests | Current planned trajectory; this issue is sized for this case |
| Local Ollama (separate daemon Ollama already owns) | Probably not: Ollama is already the warm host | Smriti just needs locking + a tight CLI invocation per fire |
| Remote API (Anthropic / OpenAI / Voyage / etc.) | No (perf-wise) | Network round-trip dominates; local cold start is invisible |
Urgency vs correctness
The 42-concurrent-process incident this issue grew out of was a locking problem, not a warmth problem. The lockf -t 0 mitigation already shipped in ~/.claude/hooks/save-memory.sh solves the locking part end-to-end — pile-ups can no longer happen.
So the residual value of building this daemon decomposes as:
- Locking / no-pile-up: already solved by lockf. No daemon needed.
- FS-event-driven incremental ingest (no full rescan): a real win regardless of embedding host; roughly constant whether item 3 (the embedding model) is hot or cold.
- Amortize Bun + SQLite cold start: ~150ms per fire. Real but not transformative.
- Amortize embedding model load: 2–5s per fire. Transformative, but only if item 3 lives in our process.
If Smriti commits to in-process embedding, the daemon is correct + urgent.
If Smriti delegates embedding to Ollama or a remote API, the daemon is correct but not urgent — lockf plus a tight CLI plus FS watching delivers most of the visible benefit at a fraction of the cost. The Ollama path in particular is interesting: Ollama is itself a daemon, so we'd be paying for two daemons to keep one model warm.
Decision gate
Before committing engineering time to this issue:
- Decide whether in-process embedding (node-llama-cpp path) is part of the near-term roadmap, or whether embedding is delegated to Ollama / remote API.
- If embedding is delegated: promote lockf from temporary fix to permanent default. Re-scope the work to "FS watcher + debounce in front of the existing CLI," which is a much smaller change.
Prior art: QMD's MCP daemon
QMD (our upstream) already solves half of this problem. qmd mcp --http --daemon is a long-lived HTTP MCP server that keeps embedding/rerank/generation models warm so AI agents querying via MCP don't pay the cold-start tax. The relevant pattern is in zero8dotdev/qmd:src/cli/qmd.ts around the mcp subcommand handler:
```ts
// Single-instance guard via PID file
const pidPath = resolve(cacheDir, "mcp.pid");
if (existsSync(pidPath)) {
  const existingPid = parseInt(readFileSync(pidPath, "utf-8").trim());
  try {
    process.kill(existingPid, 0); // alive?
    console.error(`Already running (PID ${existingPid}).`);
    process.exit(1);
  } catch { /* stale; continue */ }
}

// Detach
const child = nodeSpawn(process.execPath, spawnArgs, {
  detached: true,
  stdio: ["ignore", logFd, logFd],
});
child.unref();
writeFileSync(pidPath, String(child.pid));
```
And the lifecycle shape:
qmd mcp --http --daemon — start detached
qmd mcp stop — read PID file, SIGTERM, unlink
qmd status — kill(pid, 0) liveness probe, silently clean up stale PID files
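The liveness probe and stale-file cleanup behind that lifecycle can be sketched as below. This is a minimal illustration using Node-compatible `node:fs` and `process.kill`, not QMD's actual code; `isProcessAlive`, `writePidFile`, and `readLivePid` are hypothetical names. One detail the sketch adds on its own initiative: `EPERM` from signal 0 is treated as "alive", since it means the process exists but belongs to another user.

```typescript
import { existsSync, readFileSync, unlinkSync, writeFileSync } from "node:fs";

// True if a process with this PID exists. Signal 0 runs the existence and
// permission checks without delivering any signal.
function isProcessAlive(pid: number): boolean {
  if (!Number.isInteger(pid) || pid <= 0) return false;
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but is owned by someone else, so still alive.
    // ESRCH (no such process) and anything else count as not running.
    return (err as { code?: string }).code === "EPERM";
  }
}

// Record the daemon's PID (hypothetical helper; the caller chooses the path).
function writePidFile(pidPath: string, pid: number): void {
  writeFileSync(pidPath, String(pid));
}

// Returns the PID of a live daemon, or null. A PID file that points at a dead
// process (or holds garbage) is removed on the way out.
function readLivePid(pidPath: string): number | null {
  if (!existsSync(pidPath)) return null;
  const pid = Number.parseInt(readFileSync(pidPath, "utf-8").trim(), 10);
  if (isProcessAlive(pid)) return pid;
  unlinkSync(pidPath);
  return null;
}
```

A status command would call `readLivePid` and report "not running" on `null`; a stop command would send `SIGTERM` to the returned PID and then unlink the file. Note that a PID file cannot distinguish the original daemon from an unrelated process that later reuses the same PID.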
What we inherit and what we add
| Dimension | QMD does it | Smriti needs it |
|---|---|---|
| Warm models for query/search (read side) | ✅ qmd mcp --daemon | inherited via QMD |
| Warm models for ingest/index (write side) | ❌ user runs qmd update manually | gap this issue fills |
| FS watching for auto-ingest | ❌ not applicable (user-driven) | new in Smriti |
| Debouncing event bursts | ❌ not applicable | new in Smriti |
| Single-instance guard | ✅ PID file + kill(pid, 0) | borrow this pattern |
| Detach mechanism | ✅ spawn({ detached: true }).unref() | borrow this pattern |
| Lifecycle CLI (stop / status) | ✅ clean shape | mirror this |
QMD's indexing model is user-driven (run qmd update when you want), so they never had the auto-fire pile-up problem and chose not to build FS watching or debouncing. Smriti is deliberately event-driven (auto-ingest after each Claude turn) because that's the only way to keep memory fresh without user friction, but that is exactly the choice that creates this issue's problem.
So the Smriti daemon is: QMD's MCP-daemon pattern applied to the write side, plus FS watcher and debounce queue.
Design
Single-instance: socket bind or PID file?
Two viable patterns:
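- Unix socket bind (~/.cache/smriti/daemon.sock / \\.\pipe\smriti on Windows). The kernel closes the listener when the process exits, so a connect attempt is a reliable liveness check. On Unix the socket file itself survives a crash, so startup needs one stale-recovery step (connect fails, unlink, re-bind); Windows named pipes disappear on their own. Doubles as the IPC transport for the hook poke.
- PID file + kill(pid, 0) liveness probe: what QMD uses. Needs a small stale-cleanup branch but is dead simple.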
Recommendation: socket bind. We need a socket anyway for the hook poke (see "Hook contract change" below), and a single mechanism doing single-instance + IPC is less surface area than maintaining both a PID file and a socket. QMD picked PID file because their daemon's IPC transport is already an HTTP port (which itself enforces single-instance), so a PID file is just for find/kill. We don't have an existing port to lean on — the socket is doing real work.
We'll still write the daemon PID into the socket directory (e.g. ~/.cache/smriti/daemon.pid) for diagnostics, the same role QMD's PID file plays: error messages and smriti daemon status. It is not load-bearing for mutual exclusion.
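A minimal sketch of the socket-bind guard, assuming Node's `node:net` API (which Bun aims to be compatible with) and a Unix socket path. The function names are illustrative, not existing Smriti code. One assumption worth flagging: a path-based Unix socket file can outlive a crashed process, so the sketch probes with a connect and unlinks the file only when nobody answers.

```typescript
import { connect, createServer, Server } from "node:net";
import { unlinkSync } from "node:fs";

// Resolves true if something is accepting connections on the socket path.
function isListening(sockPath: string): Promise<boolean> {
  return new Promise((resolve) => {
    const probe = connect(sockPath);
    probe.once("connect", () => {
      probe.destroy();
      resolve(true);
    });
    probe.once("error", () => resolve(false)); // e.g. ECONNREFUSED, ENOENT
  });
}

// Binds a server to the path; rejects with the listen error on failure.
function listen(sockPath: string): Promise<Server> {
  return new Promise((resolve, reject) => {
    const server = createServer((conn) => {
      conn.on("error", () => {}); // peer may already be gone
      conn.end(); // a bare connection is treated as a poke
    });
    server.once("error", reject);
    server.listen(sockPath, () => {
      server.removeListener("error", reject);
      resolve(server);
    });
  });
}

// Returns the bound server, or null when another daemon already owns the socket.
async function acquireSingleInstance(sockPath: string): Promise<Server | null> {
  try {
    return await listen(sockPath);
  } catch (err) {
    if ((err as { code?: string }).code !== "EADDRINUSE") throw err;
  }
  if (await isListening(sockPath)) return null; // a live daemon answered
  unlinkSync(sockPath); // stale file left behind by a crashed daemon
  return listen(sockPath);
}
```

The daemon's entry point would call `acquireSingleInstance` once at boot and exit with a message when it returns `null`. The sketch does not handle two daemons racing through the stale-file branch at the same instant; a production version would need to close that window.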
Boot work (once)
- Open SQLite (WAL mode already)
- Load embedding model into memory and keep it warm
- Subscribe to FS events on all configured agent log roots (~/.claude/projects, ~/.codex, ~/.cline/tasks, Copilot storage dir, etc.) via fs.watch
  - macOS / Windows: { recursive: true } works natively
  - Linux: walk the tree at boot, watch every directory (not only the leaves, since new subdirectories show up as events on their parent), and re-watch on dir-create events. Consider chokidar to abstract this.
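The Linux fallback described above could look roughly like this. It is a sketch built on Node-compatible `fs.watch` and `readdirSync`, written on the assumption that recursive watching is unavailable; `listDirs` and `watchTree` are hypothetical names, and whether a given Bun or Node release already supports `{ recursive: true }` on Linux should be checked before adopting it.

```typescript
import { mkdirSync, readdirSync, watch } from "node:fs";
import { join } from "node:path";

type Watcher = ReturnType<typeof watch>;

// Every directory under root (root included), depth-first.
function listDirs(root: string): string[] {
  const dirs = [root];
  for (const entry of readdirSync(root, { withFileTypes: true })) {
    if (entry.isDirectory()) dirs.push(...listDirs(join(root, entry.name)));
  }
  return dirs;
}

// Watches root and all of its descendants with one non-recursive watcher per
// directory, adding watchers as new subdirectories appear.
function watchTree(root: string, onChange: (path: string) => void): { close(): void } {
  mkdirSync(root, { recursive: true }); // a log root may not exist yet
  const watchers = new Map<string, Watcher>();

  const add = (dir: string): void => {
    if (watchers.has(dir)) return;
    const w = watch(dir, (_event, name) => {
      if (!name) return;
      const full = join(dir, String(name));
      onChange(full);
      try {
        // If the event was a directory being created, watch it and anything
        // already inside it. readdirSync throws for plain files; ignored.
        for (const d of listDirs(full)) add(d);
      } catch { /* not a directory, or already gone */ }
    });
    w.on("error", () => {
      w.close();
      watchers.delete(dir);
    });
    watchers.set(dir, w);
  };

  for (const d of listDirs(root)) add(d);
  return {
    close: () => {
      watchers.forEach((w) => w.close());
      watchers.clear();
    },
  };
}
```

Each agent log root would get one `watchTree` call at boot, with `onChange` feeding the debounce queue. Files written into a new directory before its watcher is attached can be missed, which is one reason the Stop-hook poke remains useful as a backstop.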
Per-project debounce queue
On any FS event under a project's log directory:
- Resolve projectId from the path.
- Reset a per-project timer to SMRITI_DAEMON_DEBOUNCE_MS (default 30s).
- When the timer fires, run incremental ingest for that project's new content only.
Debouncing matters because Claude Code writes session JSONLs line-by-line — firing on the first write would re-ingest the same partial session repeatedly.
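The per-project timer reset can be sketched as a small class. `DebounceQueue` and `resolveProjectId` are hypothetical names, and the path-to-project rule shown (first path segment under the log root) is an assumption for illustration, not Smriti's actual mapping.

```typescript
type Timer = ReturnType<typeof setTimeout>;

// Hypothetical mapping: the first path segment under a log root names the project.
function resolveProjectId(root: string, filePath: string): string | null {
  const prefix = root.endsWith("/") ? root : root + "/";
  if (!filePath.startsWith(prefix)) return null;
  const [first = ""] = filePath.slice(prefix.length).split("/");
  return first.length > 0 ? first : null;
}

class DebounceQueue {
  private readonly timers = new Map<string, Timer>();
  private readonly delayMs: number;
  private readonly ingest: (projectId: string) => void;

  constructor(delayMs: number, ingest: (projectId: string) => void) {
    this.delayMs = delayMs;
    this.ingest = ingest;
  }

  // Call on every FS event: restarts the quiet-period timer for that project.
  touch(projectId: string): void {
    const existing = this.timers.get(projectId);
    if (existing !== undefined) clearTimeout(existing);
    const timer = setTimeout(() => {
      this.timers.delete(projectId);
      this.ingest(projectId);
    }, this.delayMs);
    this.timers.set(projectId, timer);
  }

  // Projects whose timer is still running, e.g. for a status report.
  pending(): string[] {
    return Array.from(this.timers.keys());
  }
}
```

With a 30-second delay, a session that keeps appending lines triggers a single ingest 30 seconds after its last write, and projects debounce independently of each other.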
Hook contract change
Stop hook becomes a thin notifier:
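```bash
#!/bin/bash
SOCK="$HOME/.cache/smriti/daemon.sock"
# Poke the daemon if its socket accepts a connection. If there is no socket, or
# the connect fails (e.g. a stale socket file left by a crashed daemon), fall
# back to the locked one-shot ingest.
if [ -S "$SOCK" ] && : | nc -U "$SOCK" 2>/dev/null; then
  :
else
  /usr/bin/lockf -t 0 /tmp/smriti-ingest.lock smriti ingest claude 2>/dev/null
fi
exit 0
```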
The poke is a hint, not a request — the daemon's own FS watch is the authoritative trigger. The lockf fallback keeps the system working when the daemon isn't running.
Auto-start on first use
If smriti ingest or any other entry point sees no socket, it forks a detached daemon and continues. This mirrors QMD's --daemon flag behaviour but is implicit rather than explicit — the user shouldn't have to remember to start anything:
```ts
Bun.spawn(["smriti", "daemon"], { stdio: ["ignore", "ignore", "ignore"] }).unref();
```
Lifecycle commands (mirror QMD's shape)
smriti daemon — run in foreground (for systemd/launchd integration and debugging)
smriti daemon start — fork-and-detach
smriti daemon stop — connect to socket, send shutdown, wait for clean exit (QMD does this via SIGTERM from PID file; ours can do it inline over the socket and fall back to PID-file SIGTERM)
smriti daemon status — report PID, uptime, pending queues, last ingest timestamps
Naming and CLI shape deliberately mirror qmd mcp stop / qmd status so users moving between the two have one mental model.
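One way the daemon could interpret what arrives over the socket is sketched below. The wire format here is an assumption made for illustration (one short text command per connection, with an empty payload counting as a poke); the issue itself only names poke, stop, and status. `DaemonState`, `Reply`, and `handleRequest` are hypothetical.

```typescript
type DaemonState = {
  pid: number;
  startedAtMs: number;
  pendingProjects: string[];
  lastIngestAt: Record<string, string>; // projectId -> ISO timestamp
};

type Reply = { body: string; shutdown: boolean };

// Pure request handler: maps one payload to a reply plus a shutdown flag.
// Keeping it free of I/O makes it easy to test without a real socket.
function handleRequest(payload: string, state: DaemonState, nowMs: number): Reply {
  const command = payload.trim();
  switch (command) {
    case "": // the hook connects and closes without writing anything
    case "poke":
      return { body: "ok\n", shutdown: false };
    case "stop":
      return { body: "stopping\n", shutdown: true };
    case "status":
      return {
        body:
          JSON.stringify({
            pid: state.pid,
            uptimeSec: Math.floor((nowMs - state.startedAtMs) / 1000),
            pendingProjects: state.pendingProjects,
            lastIngestAt: state.lastIngestAt,
          }) + "\n",
        shutdown: false,
      };
    default:
      return { body: `error: unknown command "${command}"\n`, shutdown: false };
  }
}
```

The socket server would write `body` back to the client and, when `shutdown` is true, flush pending work and close the listener before exiting.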
Tasks
- src/daemon/server.ts: socket bind, single-instance check, lifecycle signals
- src/daemon/watcher.ts: FS watch abstraction with Linux recursion fallback
- src/daemon/queue.ts: per-project debounce queue
- src/daemon/client.ts: smriti daemon stop / status and auto-start helpers
- DAEMON_PID_FILE for diagnostics + smriti daemon status (matches QMD's pattern; not load-bearing for mutual exclusion)
- smriti daemon subcommand in src/index.ts
- Update ~/.claude/hooks/save-memory.sh template + docs to use poke-with-fallback
- Update CLAUDE.md quick reference and docs/internal/ingest-architecture.md
- Link docs/papers/stop-hook-never-stopped.md from the daemon docs
Out of scope (for now)
- Cross-machine daemon (still single-user, single-machine)
- launchd/systemd unit files: auto-start covers the common case; service files can come later
- IPC API beyond poke/stop/status: we may want smriti daemon reindex <project> etc. eventually
- Replacing in-process ingest entirely: the one-shot smriti ingest path stays as the fallback when no daemon is running
- Combining with QMD's MCP daemon into a single process: they have different deps and lifecycles; keep separate
Acceptance
- With the daemon running, a Claude Code Stop hook completes in <50ms.
- New session files in ~/.claude/projects/ are reflected in smriti search within DEBOUNCE_MS + ingest_time of being written; no manual smriti ingest needed.
- Killing the daemon (SIGKILL) leaves no stale socket or DB lock that blocks a restart; next invocation starts cleanly.
- Running smriti daemon twice on the same machine: second invocation exits 0 with a message, doesn't pile up.
- Concurrent Claude Code sessions across multiple projects never produce more than one smriti process (the daemon itself).
- smriti daemon stop / smriti daemon status behave like QMD's qmd mcp stop / qmd status: same UX, same error semantics for stale state.