smriti daemon: long-running ingest service with file-watching and debounce

## Problem

`smriti ingest claude` is currently wired as a fire-and-forget hook running after every Claude Code Stop event. Because each invocation pays the full cold-start tax (Bun runtime + module graph + SQLite open + eventually embedding-model load) and the actual scan can run for minutes against a multi-month DB, concurrent fires across multiple Claude sessions stack up faster than they finish.

Today I had **42 concurrent `smriti ingest claude` processes** running, oldest from 4 days ago, with a combined ~13,449 minutes (~9 CPU-days) of duplicate work. Short-term mitigation: I wrapped the hook in `lockf -t 0 /tmp/smriti-ingest.lock` so only one runs at a time. That stops the pile-up but doesn't address the underlying cost.

Postmortem write-up: `docs/papers/stop-hook-never-stopped.md`.

## Goal

A long-lived `smriti daemon` that pays cold-start costs once and turns the Stop hook into a ~5ms poke. Eliminates the process-pile-up class of bugs by construction (single process), warms the embedding model, and replaces the current full-tree rescan with FS-event-driven incremental ingest.

The config constants `DAEMON_PID_FILE` and `DAEMON_DEBOUNCE_MS` in `src/config.ts` already exist — this issue completes that planned work.

## What this daemon is really for (and when it gets urgent)

Three things become "warm" inside a long-running daemon, and they are nowhere near equal in cost:

| Warm thing | Cold-start cost | RAM footprint |
|---|---|---|
| Bun runtime + module graph | ~100–200ms (TS parse + module resolution + Bun init) | ~30–60 MB |
| SQLite WAL connection + prepared statements | ~20–50ms (file open + WAL recovery + N × `prepare()`) | ~10 MB |
| Embedding model | **~2–5 seconds** (read ~300MB GGUF, init context) | **~300–500 MB resident** |

The embedding model is roughly 8–40× the cost of the other two combined (2–5 s against 120–250 ms). The daemon's real economic case isn't "amortize cold start" in general; it is specifically **amortize the embedding model**. Items 1 and 2 are nice-to-have; item 3 is the entire argument.

That reframes the urgency of this issue around a single design choice: **where does the embedding model live?**

| Embedding model location | Daemon needed for perf? | Reasoning |
|---|---|---|
| **In-process** (`node-llama-cpp` loaded inside QMD/Smriti) | **Yes** — only way to keep it warm across ingests | Current planned trajectory; this issue is sized for this case |
| **Local Ollama** (separate daemon Ollama already owns) | Probably not — Ollama is already the warm host | Smriti just needs locking + a tight CLI invocation per fire |
| **Remote API** (Anthropic / OpenAI / Voyage / etc.) | No (perf-wise) | Network round-trip dominates; local cold start is invisible |

### Urgency vs correctness

The 42-concurrent-process incident this issue grew out of was a **locking** problem, not a **warmth** problem. The `lockf -t 0` mitigation already shipped in `~/.claude/hooks/save-memory.sh` solves the locking part end-to-end — pile-ups can no longer happen.

So the residual value of building this daemon decomposes as:

- **Locking / no-pile-up** — already solved by `lockf`. No daemon needed.
- **FS-event-driven incremental ingest (no full rescan)** — real win regardless of embedding host; ~constant whether item 3 is hot or cold.
- **Amortize Bun + SQLite cold start** — ~150ms per fire. Real but not transformative.
- **Amortize embedding model load** — 2–5s per fire. Transformative, but only if item 3 lives in our process.

If Smriti commits to in-process embedding, the daemon is **correct + urgent**.
If Smriti delegates embedding to Ollama or a remote API, the daemon is **correct but not urgent** — `lockf` plus a tight CLI plus FS watching delivers most of the visible benefit at a fraction of the cost. The Ollama path in particular is interesting: Ollama is itself a daemon, so we'd be paying for two daemons to keep one model warm.

### Decision gate

Before committing engineering time to this issue:

- [ ] Confirm whether in-process embedding (via QMD's `node-llama-cpp` path) is part of the near-term roadmap, or whether embedding is delegated to Ollama / remote API.
- [ ] If delegated: downgrade this issue to "future architecture" and promote `lockf` from temporary fix to permanent default. Re-scope the work to "FS watcher + debounce in front of the existing CLI," which is a much smaller change.
- [ ] If in-process: proceed with the full design below.

## Prior art: QMD's MCP daemon

QMD (our upstream) already solves half of this problem. `qmd mcp --http --daemon` is a long-lived HTTP MCP server that keeps embedding/rerank/generation models warm so AI agents querying via MCP don't pay the cold-start tax. The relevant pattern is in [`zero8dotdev/qmd:src/cli/qmd.ts`](https://github.com/zero8dotdev/qmd/blob/main/src/cli/qmd.ts) around the `mcp` subcommand handler:

```ts
// Single-instance guard via PID file
const pidPath = resolve(cacheDir, "mcp.pid");
if (existsSync(pidPath)) {
  const existingPid = parseInt(readFileSync(pidPath, "utf-8").trim());
  try {
    process.kill(existingPid, 0);                 // alive?
    console.error(`Already running (PID ${existingPid}).`);
    process.exit(1);
  } catch { /* stale — continue */ }
}

// Detach
const child = nodeSpawn(process.execPath, spawnArgs, {
  detached: true,
  stdio: ["ignore", logFd, logFd],
});
child.unref();
writeFileSync(pidPath, String(child.pid));
```

And the lifecycle shape:

- `qmd mcp --http --daemon` — start detached
- `qmd mcp stop` — read PID file, `SIGTERM`, unlink
- `qmd status` — `kill(pid, 0)` liveness probe, silently clean up stale PID files

### What we inherit and what we add

| Dimension | QMD does it | Smriti needs it |
|---|---|---|
| Warm models for **query/search** (read side) | ✅ `qmd mcp --http --daemon` | inherited via QMD |
| Warm models for **ingest/index** (write side) | ❌ user runs `qmd update` manually | **gap this issue fills** |
| FS watching for auto-ingest | ❌ not applicable (user-driven) | **new in Smriti** |
| Debouncing event bursts | ❌ not applicable | **new in Smriti** |
| Single-instance guard | ✅ PID file + `kill(pid, 0)` | borrow this pattern |
| Detach mechanism | ✅ `spawn({ detached: true }).unref()` | borrow this pattern |
| Lifecycle CLI (`stop` / `status`) | ✅ clean shape | mirror this |

QMD's indexing model is user-driven (run `qmd update` when you want), so it never had the auto-fire pile-up problem and had no reason to build FS watching or debouncing. Smriti is deliberately event-driven (auto-ingest after each Claude turn) because that's the only way to keep memory fresh without user friction, but that is exactly the choice that creates this issue's problem.

So the Smriti daemon is: **QMD's MCP-daemon pattern applied to the write side, plus FS watcher and debounce queue.**

## Design

### Single-instance: socket bind or PID file?

Two viable patterns:

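1. **Unix socket bind** (`~/.cache/smriti/daemon.sock` / `\\.\pipe\smriti` on Windows). The kernel closes the listening socket when the process exits, so a connect attempt is a reliable liveness probe. On Unix the socket *file* can outlive a crashed daemon, so startup still needs a small stale-socket branch (connect is refused, then unlink and re-bind); Windows named pipes disappear with the process. Doubles as the IPC transport for the hook poke.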
2. **PID file + `kill(pid, 0)` liveness probe** — what QMD uses. Needs a small stale-cleanup branch but is dead simple.

**Recommendation: socket bind.** We need a socket anyway for the hook poke (see "Hook contract change" below), and a single mechanism doing single-instance + IPC is less surface area than maintaining both a PID file and a socket. QMD picked PID file because their daemon's IPC transport is already an HTTP port (which itself enforces single-instance), so a PID file is just for find/kill. We don't have an existing port to lean on — the socket is doing real work.

We'll still write the daemon PID next to the socket (e.g. `~/.cache/smriti/daemon.pid`) for diagnostics, the same use QMD makes of its PID file: error messages and `smriti daemon status`. It is not load-bearing for mutual exclusion.
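A minimal sketch of socket-bind single-instance acquisition, written against `node:net` (which Bun also provides). The function names, the `onConnection` parameter, and `defaultSockPath` are hypothetical and not taken from the Smriti codebase. The sketch assumes that a refused connection means the socket file is left over from a dead daemon.

```typescript
import * as fs from "node:fs";
import * as net from "node:net";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical default, using the path named in this issue.
export function defaultSockPath(): string {
  return path.join(os.homedir(), ".cache", "smriti", "daemon.sock");
}

// Try to bind; resolves with the listening server or rejects with the bind error.
function listen(sockPath: string, onConnection: (c: net.Socket) => void): Promise<net.Server> {
  return new Promise((resolve, reject) => {
    const server = net.createServer(onConnection);
    server.once("error", reject);
    server.listen(sockPath, () => {
      server.removeListener("error", reject);
      resolve(server);
    });
  });
}

// True if something accepts connections on the socket path.
function isListening(sockPath: string): Promise<boolean> {
  return new Promise((resolve) => {
    const probe = net.connect(sockPath);
    probe.once("connect", () => {
      probe.destroy();
      resolve(true);
    });
    probe.once("error", () => resolve(false));
  });
}

// Returns the bound server, or null when a live daemon already owns the path.
export async function acquireSocket(
  sockPath: string,
  onConnection: (c: net.Socket) => void,
): Promise<net.Server | null> {
  try {
    return await listen(sockPath, onConnection);
  } catch (err) {
    if ((err as { code?: string }).code !== "EADDRINUSE") throw err;
  }
  if (await isListening(sockPath)) return null; // live daemon: the caller should exit
  fs.unlinkSync(sockPath); // leftover file from a daemon that died without cleanup
  return listen(sockPath, onConnection);
}
```

Two processes that both find a stale file at the same instant could still race between the unlink and the re-bind. If that window matters in practice, an advisory lock around this function would close it.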

### Boot work (once)

- Open SQLite (WAL mode already)
- Load embedding model into memory and keep it warm
- Subscribe to FS events on all configured agent log roots (`~/.claude/projects`, `~/.codex`, `~/.cline/tasks`, Copilot storage dir, etc.) via `fs.watch`
  - macOS / Windows: `{ recursive: true }` works natively
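  - Linux: recursive watching has historically been unavailable here (Node added it in v19.1), so confirm what the Bun version in use supports. The fallback is to walk the tree at boot, watch every directory (not only the leaves, since files can appear at any level), and add a watch on each dir-create event. Consider `chokidar` to abstract this.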
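The per-directory fallback for Linux might look roughly like the sketch below. It uses only `node:fs` and `node:path`. `listDirs`, `watchTree`, and `defaultRoots` are hypothetical names, and the root list is copied from the bullet above, not read from any real config.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Log roots named in this issue; the real list would come from config.
export function defaultRoots(): string[] {
  const home = os.homedir();
  return [".claude/projects", ".codex", ".cline/tasks"].map((p) => path.join(home, p));
}

// Root plus every directory beneath it: the set a non-recursive watcher must cover.
export function listDirs(root: string): string[] {
  const dirs = [root];
  for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
    // Dirent.isDirectory() is false for symlinks, which avoids link cycles.
    if (entry.isDirectory()) dirs.push(...listDirs(path.join(root, entry.name)));
  }
  return dirs;
}

// Watch a tree one directory at a time; returns a function that stops watching.
export function watchTree(root: string, onChange: (file: string) => void): () => void {
  const watchers = new Map<string, fs.FSWatcher>();

  const add = (dir: string): void => {
    if (watchers.has(dir)) return;
    const watcher = fs.watch(dir, (_event, name) => {
      if (!name) return;
      const full = path.join(dir, name);
      try {
        // A directory created after boot needs watches of its own.
        if (fs.statSync(full).isDirectory()) listDirs(full).forEach((d) => add(d));
      } catch {
        // The entry vanished between the event and the stat; nothing to add.
      }
      onChange(full);
    });
    // Drop the watch if the directory goes away or the watcher fails.
    watcher.on("error", () => {
      watcher.close();
      watchers.delete(dir);
    });
    watchers.set(dir, watcher);
  };

  listDirs(root).forEach((d) => add(d));
  return () => {
    watchers.forEach((w) => w.close());
    watchers.clear();
  };
}
```

Files written into a new directory before its watch is installed can be missed, so a real implementation would probably scan a freshly added directory once after watching it.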

### Per-project debounce queue

On any FS event under a project's log directory:

1. Resolve `projectId` from the path.
2. Reset a per-project timer to `SMRITI_DAEMON_DEBOUNCE_MS` (default 30s).
3. When the timer fires, run incremental ingest for that project's new content only.

Debouncing matters because Claude Code writes session JSONLs line-by-line — firing on the first write would re-ingest the same partial session repeatedly.
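The three steps above can be sketched as a small timer map. This is an illustration under assumptions, not the planned `src/daemon/queue.ts`. `projectIdFromPath` assumes each project's logs sit in one directory directly under the watched root, and both names are hypothetical.

```typescript
import * as path from "node:path";

// Assumption: the first path segment below the watched root identifies the project.
export function projectIdFromPath(root: string, file: string): string | null {
  const rel = path.relative(root, file);
  const outside = rel === "" || rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  return outside ? null : rel.split(path.sep)[0];
}

export class DebounceQueue {
  private readonly timers = new Map<string, ReturnType<typeof setTimeout>>();
  private readonly delayMs: number;
  private readonly ingest: (projectId: string) => void;

  constructor(delayMs: number, ingest: (projectId: string) => void) {
    this.delayMs = delayMs;
    this.ingest = ingest;
  }

  // Record activity for a project: restart its timer from zero.
  touch(projectId: string): void {
    const existing = this.timers.get(projectId);
    if (existing !== undefined) clearTimeout(existing);
    const timer = setTimeout(() => {
      this.timers.delete(projectId);
      this.ingest(projectId);
    }, this.delayMs);
    this.timers.set(projectId, timer);
  }

  // Projects whose timer is still running (useful for a status report).
  pending(): string[] {
    return Array.from(this.timers.keys());
  }

  // Cancel everything, e.g. on shutdown.
  clear(): void {
    this.timers.forEach((timer) => clearTimeout(timer));
    this.timers.clear();
  }
}
```

With the 30s default, a session that keeps appending lines produces a single ingest about 30 seconds after its last write, and a burst in one project never delays another project's timer.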

### Hook contract change

Stop hook becomes a thin notifier:

```bash
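#!/bin/bash
SOCK="$HOME/.cache/smriti/daemon.sock"
# Poke the daemon if its socket accepts a connection. If the socket is missing,
# or is a stale file left by a crashed daemon (nc fails), fall back to a locked
# one-shot ingest.
if ! { [ -S "$SOCK" ] && : | nc -U "$SOCK" 2>/dev/null; }; then
  /usr/bin/lockf -t 0 /tmp/smriti-ingest.lock smriti ingest claude 2>/dev/null
fi
exit 0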
```

The poke is a hint, not a request — the daemon's own FS watch is the authoritative trigger. The lockf fallback keeps the system working when the daemon isn't running.
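On the daemon side, treating the poke as a hint could look like the sketch below: any connection counts as a poke, the payload is discarded, and the connection is closed at once so the hook returns quickly. `servePokes` and `poke` are hypothetical names, and the sketch assumes the socket path is free to bind.

```typescript
import * as net from "node:net";

// Accept connections on the socket; each one is a poke with no payload.
export function servePokes(sockPath: string, onPoke: () => void): Promise<net.Server> {
  return new Promise((resolve, reject) => {
    const server = net.createServer((conn) => {
      conn.on("error", () => {}); // a vanished client is not an error worth reporting
      conn.resume(); // discard anything the client sends
      conn.end(); // close straight away so the hook is not kept waiting
      onPoke();
    });
    server.once("error", reject);
    server.listen(sockPath, () => resolve(server));
  });
}

// Client side: resolves true if the daemon accepted the connection in time.
export function poke(sockPath: string, timeoutMs: number = 50): Promise<boolean> {
  return new Promise((resolve) => {
    const conn = net.connect(sockPath);
    const timer = setTimeout(() => {
      conn.destroy();
      resolve(false);
    }, timeoutMs);
    conn.once("connect", () => {
      clearTimeout(timer);
      conn.destroy();
      resolve(true);
    });
    conn.once("error", () => {
      clearTimeout(timer);
      resolve(false);
    });
  });
}
```

A `false` result from `poke` corresponds to the hook's fallback branch: the caller would run the locked one-shot ingest instead.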

### Auto-start on first use

If `smriti ingest` or any other entry point sees no socket, it forks a detached daemon and continues. This mirrors QMD's `--daemon` flag behaviour but is implicit rather than explicit — the user shouldn't have to remember to start anything:

```ts
Bun.spawn(["smriti", "daemon"], { stdio: ["ignore", "ignore", "ignore"] }).unref();
```
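The probe-then-spawn decision around that call might be structured as below, with the probe and the spawn passed in as functions so the logic can be tested without a real daemon. All names are hypothetical. Waiting for readiness is an optional extra beyond what this issue specifies, so by default the helper spawns and continues.

```typescript
export type StartResult = "already-running" | "ready" | "spawned";

export interface EnsureOptions {
  attempts?: number; // how many times to re-probe after spawning (0 = do not wait)
  intervalMs?: number; // pause between probes
}

export async function ensureDaemon(
  probe: () => Promise<boolean>, // e.g. "does the socket accept a connection?"
  spawnDaemon: () => void, // e.g. the Bun.spawn call shown above
  options: EnsureOptions = {},
): Promise<StartResult> {
  if (await probe()) return "already-running";
  spawnDaemon();
  const attempts = options.attempts ?? 0;
  const intervalMs = options.intervalMs ?? 50;
  for (let i = 0; i < attempts; i++) {
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    if (await probe()) return "ready";
  }
  // Spawned but not confirmed: the caller carries on with its own work.
  return "spawned";
}
```

If two entry points both see no socket and both spawn, the single-instance check settles it: the daemon that loses the bind exits, so the worst case is one short-lived extra process.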

### Lifecycle commands (mirror QMD's shape)

- `smriti daemon` — run in foreground (for systemd/launchd integration and debugging)
- `smriti daemon start` — fork-and-detach
- `smriti daemon stop` — connect to socket, send shutdown, wait for clean exit (QMD does this via `SIGTERM` from PID file; ours can do it inline over the socket and fall back to PID-file `SIGTERM`)
- `smriti daemon status` — report PID, uptime, pending queues, last ingest timestamps

Naming and CLI shape deliberately mirror `qmd mcp stop` / `qmd status` so users moving between the two have one mental model.
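For the PID-file side of `status` and the `SIGTERM` fallback in `stop`, a sketch along the following lines may help. The state names, function names, and `defaultPidFile` are hypothetical. The real `status` output would also carry uptime, pending queues, and last ingest timestamps, which this sketch leaves out.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

export type DaemonState =
  | { state: "running"; pid: number }
  | { state: "stale"; pid: number | null }
  | { state: "stopped" };

// Hypothetical default, using the diagnostic PID path named in this issue.
export function defaultPidFile(): string {
  return path.join(os.homedir(), ".cache", "smriti", "daemon.pid");
}

// kill(pid, 0) delivers no signal; it only checks that the process exists.
export function isAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return (err as { code?: string }).code === "EPERM";
  }
}

export function readState(pidFile: string): DaemonState {
  let text: string;
  try {
    text = fs.readFileSync(pidFile, "utf-8");
  } catch {
    return { state: "stopped" }; // no PID file at all
  }
  const pid = Number.parseInt(text.trim(), 10);
  // Reject garbage and non-positive values (pid 0 would address the process group).
  if (!Number.isInteger(pid) || pid <= 0) return { state: "stale", pid: null };
  return isAlive(pid) ? { state: "running", pid } : { state: "stale", pid };
}

// Fallback stop path: SIGTERM via the PID file, then remove the file.
export function stopViaPidFile(pidFile: string): DaemonState {
  const before = readState(pidFile);
  if (before.state === "running") process.kill(before.pid, "SIGTERM");
  if (before.state !== "stopped") fs.rmSync(pidFile, { force: true });
  return before;
}
```

Returning the state observed before the stop lets the CLI print different messages for "stopped a running daemon" and "cleaned up a stale PID file".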

## Tasks

- [ ] `src/daemon/server.ts` — socket bind, single-instance check, lifecycle signals
- [ ] `src/daemon/watcher.ts` — FS watch abstraction with Linux recursion fallback
- [ ] `src/daemon/queue.ts` — per-project debounce queue
- [ ] `src/daemon/client.ts` — `smriti daemon stop/status` and auto-start helpers
- [ ] PID file at `DAEMON_PID_FILE` for diagnostics + `smriti daemon status` (matches QMD's pattern; not load-bearing for mutual exclusion)
- [ ] Wire `smriti daemon` subcommand in `src/index.ts`
- [ ] Update `~/.claude/hooks/save-memory.sh` template + docs to use poke-with-fallback
- [ ] Tests: socket bind contention, debounce coalescing, FS watch under Linux/macOS, auto-start race
- [ ] Update `CLAUDE.md` quick reference and `docs/internal/ingest-architecture.md`
- [ ] Postmortem reference: link `docs/papers/stop-hook-never-stopped.md` from the daemon docs

## Out of scope (for now)

- Cross-machine daemon (still single-user, single-machine)
- launchd/systemd unit files — auto-start covers the common case; service files can come later
- IPC API beyond poke/stop/status — we may want `smriti daemon reindex <project>` etc. eventually
- Replacing in-process ingest entirely — the one-shot `smriti ingest` path stays as the fallback when no daemon is running
- Combining with QMD's MCP daemon into a single process — they have different deps and lifecycles; keep separate

## Acceptance

1. With the daemon running, a Claude Code Stop hook completes in <50ms.
2. New session files in `~/.claude/projects/` are reflected in `smriti search` within `DAEMON_DEBOUNCE_MS + ingest_time` of being written; no manual `smriti ingest` is needed.
3. Killing the daemon (SIGKILL) leaves nothing that blocks a restart: a leftover socket file is detected as stale and replaced, no DB lock persists, and the next invocation starts cleanly.
4. Running `smriti daemon` twice on the same machine: second invocation exits 0 with a message, doesn't pile up.
5. Concurrent Claude Code sessions across multiple projects never produce more than one `smriti` process (the daemon itself).
6. `smriti daemon stop` / `smriti daemon status` behave like QMD's `qmd mcp stop` / `qmd status` — same UX, same error semantics for stale state.


