
perf: async analytics/leaderboard + SQLite FTS5 + live partial results#148

Closed
apstenku123 wants to merge 14 commits into vakovalskii:main from apstenku123:perf/async-jobs-sqlite-fts5

Conversation

@apstenku123

Summary

  • Non-blocking loadSessions() cold path + async background warmer for parse/cost caches
  • Async /api/analytics/cost and /api/leaderboard jobs with live partial results (UI shows $0 → $5562 → $11344 → ... → final as sessions aggregate)
  • Persistent SQLite + FTS5 index (~/.codedash/cache/index.sqlite) for sessions, messages, daily stats, and aggregate result cache
  • Shared groupSessionsByConversation helper used by Timeline / All Sessions / Projects / Cloud Sync / Activity — collapses codex exec retries of the same prompt into one representative card with +N more badge
  • Fixes getActiveSessions O(N) lsof blocking (80 s → 350 ms), fixes a broken regex that missed codex-up / codex-up-exec, and adds the lsof -a flag so the pid filter actually applies
  • Fixes massive user_messages overcount: Claude type=user entries include tool_result blocks (28x inflation measured on a real session)
  • Detects scripted sub-agent runs (originator=codex_exec + 9 first-prompt regex patterns) and hides them from default counts; ?include_helpers=1 opts in

Why

On a corpus of 4873 sessions / 1.1 GB of JSONL the GUI hung:

  • loadSessions() cold: 109 seconds (re-parsed every Claude file synchronously, plus 112 s of synchronous git rev-parse × 56 projects, plus a synchronous parseClaudeSessionFile on a 199 MB file)
  • /api/active: 80 seconds (matched 34 processes named codex-up* and ran per-pid lsof + loadSessions inside the loop)
  • /api/analytics/cost + /api/leaderboard: blocking, client got ERR_TIMED_OUT
  • Spammed [ACTIVE] pid=... codex/waiting on every /api/active poll

Chrome tabs accumulated hung keep-alive connections and the server eventually looked dead.

After

| metric | v6.15.10 cold | this branch |
|---|---|---|
| loadSessions() | 109 s | 72 ms (cacheOnly) |
| /api/active | 80 s (34 procs) | 350 ms cold, 0 ms warm |
| searchFullText | 3–10 s (in-mem rebuild) | 5 ms (FTS5 MATCH) |
| /api/analytics/cost first byte | ~30 s blocking | ~20 ms (partial) |
| Full analytics, cold run | 30+ s blocking | ~15 s (with live UI) |
| Full analytics, repeat visit | 30+ s (recomputed) | instant (SQLite aggregate_cache hit) |

On the test corpus leaderboard counts went from 2869 sessions / 867k prompts (inflated by retries + tool_results + scripted helpers) to a realistic 612 unique conversations / 8458 real user prompts / $53,637 real spend / 9-day streak.

Architectural notes

  • WAL-mode SQLite via _execAsync(sql) that spawns sqlite3 -cmd '.timeout 30000' <db> and streams SQL into stdin, so concurrent writers wait instead of erroring with database is locked. Reads don't block writers.
  • Fingerprint for aggregate_cache is quantized to 5-minute buckets so live codex processes appending to their rollout files don't invalidate the cache on every request.
  • Persistent cache dir at ~/.codedash/cache/ (was os.tmpdir() — macOS tmpdir cleanup wipes hours of parse work). Legacy tmpdir paths are migrated once on load.
  • Helper detection — two signals:
    1. session_meta.payload.originator === 'codex_exec' (standard codex exec)
    2. Regex match on first user prompt: ^You are in /, ^Read-only task\., ^Work (only )?in /, ^Pair-local .* lane, ^## X Y Agent, ^Read $OMX_, etc.
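The two-signal detection described above could be sketched roughly like this (a hypothetical reconstruction from this PR description — the field names `session_meta.payload.originator` and the prompt patterns are quoted from it, but `first_user_prompt` and the function shape are assumptions):

```javascript
// Hypothetical sketch of the two-signal helper-session detection.
// A subset of the scripted first-prompt patterns quoted in this PR:
const HELPER_PROMPT_PATTERNS = [
  /^You are in \//,
  /^Read-only task\./,
  /^Work (only )?in \//,
  /^Pair-local .* lane/,
];

function isHelperSession(session) {
  // Signal 1: standard `codex exec` originator recorded in session metadata
  if (session.session_meta?.payload?.originator === 'codex_exec') return true;
  // Signal 2: first user prompt matches a known scripted shape
  const first = (session.first_user_prompt || '').trim();
  return HELPER_PROMPT_PATTERNS.some(re => re.test(first));
}
```

Sessions flagged this way are hidden from default counts unless the caller opts in with `?include_helpers=1`.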

New endpoints

  • GET /api/warming → {running, done, total, phase} for background parse progress
  • GET /api/sqlite-status → {backfill, index: {sessions, messages, files, db_bytes}}
  • GET /api/analytics/cost now returns {status: 'running'|'done'|'error', progress, partialResult, result} (backwards-compat: done spreads result fields at the top level so existing UI code still works)
  • GET /api/leaderboard — same async job shape
  • GET /api/sessions?include_helpers=1 — opt in to scripted helper sessions (default hidden; X-Helper-Count header reports the filtered count)
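A client-side poll loop matching this job shape could look like the following sketch (hedged: `fetchJson` and `render` are placeholder injection points, not names from this PR; the real frontend would pass `() => fetch('/api/analytics/cost').then(r => r.json())` and a 500 ms interval):

```javascript
// Poll an async job endpoint returning {status, progress, partialResult, result}
// until it leaves the 'running' state, rendering live partials along the way.
async function pollJob(fetchJson, render, intervalMs = 500) {
  for (;;) {
    const job = await fetchJson();
    if (job.status !== 'running') {
      // backwards-compat: 'done' spreads result fields at the top level
      render(job.result ?? job, 1);
      return job;
    }
    render(job.partialResult, job.progress); // live numbers climb in the UI
    await new Promise(r => setTimeout(r, intervalMs));
  }
}
```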

Upgrade note

After installing, users should hard-reload once (Cmd+Shift+R) — the poll loops in the split frontend modules are new code, and stale cached JS will show a stuck Loading… spinner.

Test plan

  • 4873-session corpus: cold loadSessions 109 s → 72 ms
  • Live partial result visible in Analytics poll loop
  • Cache hit path: after first computation, subsequent /api/analytics/cost returns status: 'done' with the cached result
  • SQLite FTS5 search: analytics query returns 3 highlighted snippets in 58 ms on a 1.2M-message corpus
  • Helper filter: 4873 → 2869 sessions by default, ?include_helpers=1 restores full set
  • getActiveSessions correctly picks up codex-up processes with cwd /Volumes/external/sources/nanochat (not kernel addresses)
  • Server survives SIGTERM with periodic cache flush (no lost parse work)
  • Persistent aggregate_cache survives process restart — verified via sqlite3 ~/.codedash/cache/index.sqlite "SELECT kind, length(result_json) FROM aggregate_cache"

Stops the GUI from hanging on users with large session histories
(4800+ codex rollouts / 1 GB+ JSONL). Tested on a 4873-session corpus
where cold Analytics previously took 100+ seconds to respond.

## Core changes

- **Non-blocking loadSessions** — sync path reads metadata only, uses
  the parse/cost disk caches, and queues uncached files for a
  background warmer. Cold `/api/sessions` now returns in ~300ms
  instead of blocking for 100+ seconds.
- **Async background jobs** for `/api/analytics/cost` and
  `/api/leaderboard`. HTTP returns a `{status, progress, partialResult}`
  snapshot immediately; the client polls at 500ms and the job publishes
  a live partial aggregate on each chunk so users see real numbers
  climb ($0 → $5562 → $11344 → ... → final).
- **Incremental cost aggregator** (`createCostAggregator`) — extracted
  from `getCostAnalytics` so the job can merge sessions one chunk at a
  time and finalize a snapshot per yield.
- **SQLite + FTS5 index** (`src/sqlite-index.js`) at
  `~/.codedash/cache/index.sqlite` — persistent sessions/messages/
  messages_fts (FTS5 porter+unicode61)/daily_stats/files_seen/
  aggregate_cache tables. Full-text search runs via `MATCH` in ~5 ms
  on 1.2M messages. Uses async `spawn` with `-cmd .timeout 30000` so
  concurrent writers don't deadlock.
- **Aggregate result cache** persisted in `aggregate_cache` with a
  5-min quantized fingerprint (`count|max_ts_bucket|filters|helpers`).
  Repeat visits within the bucket are instant; active codex writes
  don't invalidate the cache on every request.
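The 5-minute-bucket fingerprint idea can be sketched as follows (the `count|max_ts_bucket|filters|helpers` layout is quoted from this PR; the exact field names such as `mtime_ms` are assumptions):

```javascript
// Quantize the newest session mtime to 5-minute buckets so live codex
// processes appending to rollout files don't change the fingerprint —
// and therefore don't invalidate aggregate_cache — on every request.
const BUCKET_MS = 5 * 60 * 1000;

function aggregateFingerprint(sessions, filters, helpersFlag) {
  const maxTs = sessions.reduce((m, s) => Math.max(m, s.mtime_ms || 0), 0);
  const bucket = Math.floor(maxTs / BUCKET_MS);
  return `${sessions.length}|${bucket}|${filters}|${helpersFlag}`;
}
```

Two requests within the same bucket hit the cached aggregate; adding or removing a session (count change) or crossing a bucket boundary recomputes.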

## Hot-path fixes found along the way

- `getActiveSessions` ran `lsof` per matching process with a 2s
  timeout and called `loadSessions()` inside the loop → blocked the
  event loop for minutes when 30+ codex-up wrappers were running.
  Now: single `ps` call, tight regex (catches `codex`/`codex-up`/
  `codex-up-exec` as binary names), batched `lsof -a -d cwd -Fpn -p`
  (the `-a` flag is critical; without it lsof ORs conditions and
  returns cwds for unrelated processes), pid→cwd cache, 3s result
  cache, no inner loadSessions. 80s → 350ms cold, 0ms cached.
- `resolveGitRoot` called `git rev-parse` synchronously per unique
  project path — 56 projects × 2s = 112s of blocked event loop. Now
  queued to a background resolver with its own disk cache in
  `~/.codedash/cache/git-root-cache.json`.
- `scanCodexSessions` was O(n²) via `.find()` on an array, and
  re-read every rollout file on each call. Now: `Map<sid, session>`
  + `cacheOnly` parse mode + background warmer drains uncached files.
- `parseClaudeSessionFileAsync` — streaming read via `readline` for
  files >5 MB (user had a 199 MB session file). Yields to the event
  loop every 2000 lines via `await new Promise(r => setImmediate(r))`
  so HTTP requests aren't starved during parse.
- Persistent cache paths moved from `os.tmpdir()` to
  `~/.codedash/cache/` so macOS tmpdir cleanup doesn't wipe a
  morning's worth of parse work.
- Flush handlers on SIGINT/SIGTERM + periodic flush every 50 entries
  — killing the server mid-warm no longer loses hours of progress.

## Accuracy fixes

- `parseClaudeSessionFile` counted every `type=user` entry as a user
  prompt, but Claude Code stores tool_results as `type=user` with
  `content: [{type:'tool_result', ...}]`. One measured session had
  480 type=user entries but only 17 real user prompts — a 28x
  overcount. Now checks `content` for a real `text` block.
- `isSystemMessage` extended to skip Codex runtime injections that
  were counted as user prompts: `<cwd>`, `<turn_aborted>`,
  `<ide_selection>`, `<command_output>`, `# CLAUDE.md`,
  `Warning: The maximum number of unified exec`, `AUTOSTEERING:`,
  `[Sub-agent results]`.
- **Helper session detection** — codex rollouts with
  `session_meta.payload.originator === 'codex_exec'` (or scripted
  first-message patterns like `You are in /...`, `Read-only task.`,
  `Work in /...`, `Pair-local ... lane`, `## X Y Agent`, etc) are
  flagged `is_helper: true`. `/api/sessions` filters them by default;
  `?include_helpers=1` opts in. On the test corpus this removes
  2166/4873 scripted sub-agent runs.
- Leaderboard now counts **unique conversations** (via
  `group_key`), not retries. Real cost is summed over all rollouts
  (actual money spent); session count uses deduped groups. On the
  test corpus: 2869 → 612 unique conversations, 867k → 8458
  real prompts, cost stays at the real $53,637.
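The tool_result overcount fix can be illustrated with a hedged sketch (the entry shape `content: [{type:'tool_result', ...}]` is from this PR; the exact accessor paths are assumptions):

```javascript
// Only count a type=user entry as a real prompt if its content carries
// an actual text block — Claude Code stores tool_results as type=user
// entries too, which inflated prompt counts up to 28x.
function isRealUserPrompt(entry) {
  if (entry.type !== 'user') return false;
  const content = entry.message?.content ?? entry.content;
  if (typeof content === 'string') return content.trim().length > 0;
  if (Array.isArray(content)) {
    return content.some(b => b.type === 'text' && b.text?.trim());
  }
  return false;
}
```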

## Shared grouping helper

- `computeSessionGroupKey(s)` in data.js: `tool::project::firstMsg[0..200]`
  (or `helper::project` for helpers) — computed once per session on
  load, exposed as `s.group_key`.
- `groupSessionsByConversation(sessions)` in frontend/app.js — shared
  by **Timeline**, **All Sessions**, **Projects view**,
  **Cloud Sync**, and **Activity/Heatmap**. One helper, one
  representative per group, `+N more` badge on cards.
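A rough sketch of the grouping scheme (the key layout `tool::project::firstMsg[0..200]` and `helper::project` are quoted from this PR; field names like `first_user_prompt` and the returned shape are assumptions):

```javascript
// Group key: retries/resumes of the same prompt in the same project by
// the same tool collapse into one conversation group.
function computeSessionGroupKey(s) {
  if (s.is_helper) return `helper::${s.project || ''}`;
  const firstMsg = (s.first_user_prompt || '').slice(0, 200);
  return `${s.tool || ''}::${s.project || ''}::${firstMsg}`;
}

// One representative per group plus a count for the "+N more" badge.
function groupSessionsByConversation(sessions) {
  const groups = new Map();
  for (const s of sessions) {
    const key = s.group_key || computeSessionGroupKey(s);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(s);
  }
  return [...groups.values()].map(g => ({ representative: g[0], more: g.length - 1 }));
}
```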

## Logging

- `CODEDASH_LOG=0` (default) silences stdout spam (previous
  `[ACTIVE] pid=... codex/waiting cpu=0%` lines were emitted on
  every /api/active poll). `ERROR`/`WARN`/`JOB` still go to stdout.
- All logs (including verbose tags) always go to
  `~/.codedash/logs/server.log` with timestamps. Set
  `CODEDASH_LOG=1` to also mirror to stdout.

## Exports / new endpoints

- `loadSessionsAsync(progressCb)` — async variant with incremental
  mtime-based change detection.
- `getWarmingStatus()` + `/api/warming` — background parse progress.
- `getSqliteBackfillStatus()` + `/api/sqlite-status` — FTS5 ingest
  progress + index size.
- `createCostAggregator()`, `computeSessionCostForAnalytics(session,
  opencodeCache)`, `buildOpencodeCostCache(sessions)` — so the async
  jobs can stream-aggregate without re-wiring `getCostAnalytics`.
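The aggregator's shape might look roughly like this (a hypothetical sketch of the interface described above — the real `createCostAggregator` tracks more fields; `cost` and `user_prompts` are assumed names):

```javascript
// Incremental aggregator: the async job calls add() per session chunk
// and snapshot() per yield to publish a live partial result.
function createCostAggregator() {
  let totalCost = 0, prompts = 0, sessions = 0;
  return {
    add(session) {
      sessions += 1;
      totalCost += session.cost || 0;
      prompts += session.user_prompts || 0;
    },
    snapshot() {
      return { sessions, totalCost, prompts };
    },
  };
}
```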

## Benchmark (user's machine, 4873 sessions / 1.1 GB JSONL)

```
               before (v6.15.10)    after
loadSessions   109 s   (cold)      72 ms   (cacheOnly path)
/api/active    80 s    (34 procs)  350 ms  cold, 0 ms warm
search         3–10 s  (rebuild)   5 ms    (FTS5 MATCH)
analytics      30 s    (blocking)  first partial in ~500 ms,
                                   full in ~15 s, instant on cache hit
leaderboard    35 s    (blocking)  first partial in ~500 ms,
                                   full in ~15 s, instant on cache hit
```

Browser cache note: after upgrading, users should hard-reload
(Cmd+Shift+R) once so the split frontend modules re-load — the poll
loops are new code, and stale cached JS will show the initial
"Loading..." spinner without progressing.
Copilot AI review requested due to automatic review settings April 9, 2026 00:10

Copilot AI left a comment


Pull request overview

This PR refactors codedash’s analytics/leaderboard/search/session-loading paths to avoid long synchronous stalls by introducing background warming, async “job” endpoints with live partial results, and a persistent SQLite (FTS5) index/cache under ~/.codedash/cache/.

Changes:

  • Add a persistent SQLite + FTS5 index and aggregate-result cache for fast full-text search and repeat analytics/leaderboard visits.
  • Convert /api/analytics/cost and /api/leaderboard into async background jobs that stream progress + partial snapshots to the UI.
  • Add session de-duplication by conversation group (retry/resume collapsing) across multiple frontend views, plus default filtering of helper/sub-agent sessions.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 10 comments.

Summary per file:

| File | Description |
|---|---|
| src/sqlite-index.js | New sqlite3 CLI wrapper, schema, FTS search, and persistent aggregate cache helpers. |
| src/server.js | Adds async analytics/leaderboard job endpoints, new status endpoints, helper filtering, and routes search via SQLite. |
| src/data.js | Implements cache-only session loading + background warmer, SQLite ingest/backfill, helper detection, grouping keys, and streaming analytics aggregation. |
| src/frontend/analytics.js | Adds polling loop and live "partial results" rendering for analytics. |
| src/frontend/leaderboard.js | Adds polling loop and progress UI for leaderboard. |
| src/frontend/app.js | Adds shared grouping helper + UI badges for collapsed conversation groups. |
| src/frontend/heatmap.js | Uses shared grouping helper to dedupe sessions before counting. |
| src/frontend/cloud.js | Uses shared grouping helper and shows "+N more" aggregate info. |
| src/frontend/styles.css | Adds styles for group badges and analytics/leaderboard progress UI. |
Comments suppressed due to low confidence (1)

src/data.js:671

  • parseClaudeSessionFile() compares entry.timestamp directly to numeric firstTs/lastTs. Elsewhere (e.g. _computeSessionDailyBreakdown) timestamps are handled as either numbers or ISO strings. If entry.timestamp is a string here, the comparisons can produce incorrect first/last timestamps. Normalize entry.timestamp to an epoch-ms number before comparing/assigning.
      if (entry.timestamp) {
        if (entry.timestamp < firstTs) firstTs = entry.timestamp;
        if (entry.timestamp > lastTs) lastTs = entry.timestamp;
      }


Comment on lines +712 to +716:

```javascript
let served = false;
try {
  const sessions = loadSessions();
  const filtered = includeHelpers ? sessions : sessions.filter(s => !s.is_helper);
  const fingerprint = _analyticsFingerprint(filtered, '', '', includeHelpers ? 'h1' : 'h0');
```
Copilot AI, Apr 9, 2026:

Same concurrency issue as analytics: the leaderboard endpoint starts _runLeaderboardJob() only after an async cache lookup resolves. Concurrent requests for the same key can start multiple jobs and race to overwrite _jobs.leaderboard. Consider deduplicating in-flight work per key (store a promise/state immediately) so only one job runs per key at a time.
Comment on lines +411 to +420:

```javascript
function _ensureSqliteBackfillRunning() {
  if (_sqliteBackfillRunning) return;
  let sqliteIndex;
  try { sqliteIndex = require('./sqlite-index'); } catch { return; }

  _sqliteBackfillRunning = true;
  _sqliteBackfillStatus.running = true;
  _sqliteBackfillStatus.startedAt = Date.now();
  _sqliteBackfillStatus.phase = 'scanning';
  _sqliteBackfillStatus.done = 0;
```
Copilot AI, Apr 9, 2026:

_ensureSqliteBackfillRunning() is described as a "one-shot" task, but there is no guard for the completed state. After the backfill finishes (_sqliteBackfillRunning resets to false), subsequent loadSessions() calls will start another full scan/backfill. Consider adding a persistent "completed" flag (or checking _sqliteBackfillStatus.phase === 'done') to avoid repeated full rescans.
Comment on lines +34 to +38:

```javascript
function _exec(sql, opts) {
  opts = opts || {};
  const args = ['-cmd', _CMD_BUSY, DB_FILE];
  if (opts.json) args.push('-json');
  args.push(sql);
```
Copilot AI, Apr 9, 2026:

_exec()/_execAsync() append -json after DB_FILE in the sqlite3 argv. The sqlite3 CLI stops parsing options once it sees the database filename, so -json in that position is treated as part of the SQL and can break all JSON queries (e.g. _execJson, loadAllFilesSeen, getIndexStatus, aggregate cache reads). Build args with options first (e.g. ['-cmd', _CMD_BUSY, '-json', DB_FILE]).
apstenku123 and others added 13 commits April 9, 2026 12:42
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…lskii#5, vakovalskii#10)

- vakovalskii#10: Move `-json` flag BEFORE DB_FILE in both `_exec()` and
  `_execAsync()`. sqlite3 CLI stops option parsing at the database
  filename, so `-json` after it was silently ignored (fell back to
  default text output on strict-POSIX systems).

- vakovalskii#3: Add synchronous placeholder job in the leaderboard endpoint
  before the async cache lookup, same pattern as the analytics
  endpoint fix from earlier commits. Prevents concurrent requests
  from each starting their own `_runLeaderboardJob()`.

- vakovalskii#5: Guard `_ensureSqliteBackfillRunning()` with a completed-phase
  check (`_sqliteBackfillStatus.phase === 'done'`). Without this,
  every `loadSessions()` call after the initial backfill finished
  would trigger another full directory scan.
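The corrected argv ordering can be sketched as follows (a hedged illustration of the fix, not the PR's exact `_exec` code; `dbFile` and the options object are placeholder names):

```javascript
// Build a sqlite3 CLI argv with all options BEFORE the database filename.
// The sqlite3 CLI stops option parsing at the db file, so a -json placed
// after it would be treated as SQL text instead of a flag.
function sqliteArgs(dbFile, sql, { json = false } = {}) {
  const args = ['-cmd', '.timeout 30000'];
  if (json) args.push('-json');
  args.push(dbFile, sql);
  return args;
}
```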
- select s (session) inside tab for correct iTerm2 focus
- normalize tty: strip /dev/ prefix for reliable matching
- Add `src/embeddings.js` — optional vector search using
  @huggingface/transformers (pure JS ONNX, no Python/torch needed).
  Model: Xenova/all-MiniLM-L6-v2, 384-dim embeddings, ~23 MB ONNX.
  Falls back gracefully when npm package isn't installed.

- `/api/search?q=X&mode=text|semantic|hybrid` — three search modes:
  - `text`: FTS5 keyword MATCH (existing behavior, default fallback)
  - `semantic`: pure cosine similarity against pre-computed session
    embeddings
  - `hybrid`: FTS5 for recall (top 200) → vector re-rank for
    precision, combined score = 0.3×fts_matches + 0.7×similarity

- SQLite `session_embeddings` table stores pre-computed embeddings
  per session. Populated during SQLite backfill phase 2 (after FTS5
  ingest). Batched 32 at a time with setImmediate yields.

- `/api/embeddings/status` — reports model availability, dim, count.

- Frontend: Text / Hybrid / Semantic toggle buttons next to search
  bar. Default: hybrid. Re-runs search on mode change.

- Cherry-pick upstream v6.15.11: iTerm2 focus fix (select session in
  tab, normalize tty /dev/ prefix).
Rewrites embeddings.js to match the codex-git retrieval architecture
(kb_hybrid_search.rs + kb_embedding_store.rs):

6-stage pipeline (per Memento paper, arXiv 2603.18743, Figure 8):
  Stage 1: FTS5 sparse recall (top-20)
  Stage 2: Dense embedding recall (top-20)
  Stage 3: Reciprocal Rank Fusion (k=60, Cormack et al. 2009)
  Stage 4: Utility reranking (per-entry success/failure rate)
  Stage 5: Threshold filter
  Stage 6: Top-k

Weights from Memento paper results:
  BM25=0.3 (Recall@1=0.32), Embedding=0.7 (Recall@1=0.54)
  Utility: final = rrf * (0.7 + 0.3 * utility_rate)
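Stage 3 (Reciprocal Rank Fusion, k=60) with the quoted 0.3/0.7 weights can be sketched as (a minimal illustration; the real pipeline fuses FTS5 and embedding result lists with more metadata per hit):

```javascript
// Weighted RRF: score(id) = Σ weight_list / (k + rank + 1) over the lists
// containing id, then sort descending. k=60 per Cormack et al. 2009.
function rrfFuse(sparseIds, denseIds, k = 60, wSparse = 0.3, wDense = 0.7) {
  const score = new Map();
  const add = (ids, w) => ids.forEach((id, rank) => {
    score.set(id, (score.get(id) || 0) + w / (k + rank + 1));
  });
  add(sparseIds, wSparse); // FTS5/BM25 recall list
  add(denseIds, wDense);   // embedding recall list
  return [...score.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}
```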

3-level provider chain (matches codex-git priorities):
  1. Local: MiniLM-L6-v2 (default, 384d, 23MB) or Qwen3-Embedding-0.6B
     (1024d, configurable via ~/.codedash/embedding-config.json)
  2. API: OpenAI-compatible /embeddings endpoint (GitHub Models at
     models.github.ai, Copilot proxy, or any provider)
  3. TF-IDF fallback: 256-dim bag-of-words hash (always available)

New: utility tracker (SQLite search_utility table) — records click/
expand/ignore per session×query, feeds Stage 4 reranking. POST
/api/search/utility endpoint for frontend to report outcomes.

GET /api/embeddings/status now reports: model, dim, count, available
models, config, and pipeline parameters.

Tested: 36 results in 313ms for "megatron training" with full
6-stage pipeline on 500 pre-computed embeddings.
## copilot-client.js (NEW)
- Auto-discovers GitHub Copilot OAuth tokens from:
  ~/.config/github-copilot/apps.json (preferred, VS Code refresh)
  ~/.copilot/auth/credential.json (Copilot CLI fallback)
- Tries ALL available tokens (not just first) — handles stale tokens
- Token exchange via GET api.github.com/copilot_internal/v2/token
  Returns dynamic endpoint (enterprise vs individual)
- chatCompletion(messages, {model, max_tokens, reasoning_effort})
  Default: gpt-4.1 (free for Pro). Also: gpt-5-mini with xhigh
- summarizeSession(messages) — truncated first+last 10, GPT summary
- Session token cached until expiry, auto-refresh 60s before

## Progressive message loading
- GET /api/session/:id?offset=0&limit=50 — paginated
- GET /api/session/:id/stream — SSE chunks of 50 messages
- Frontend: loads first 50 immediately, "Load more" button
- Message role filters: All | User | Assistant | Tools
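The pagination shape for `?offset=0&limit=50` might reduce to something like this (a hypothetical sketch of the response payload; the real handler also resolves the session by id):

```javascript
// Slice a message array into one page and report whether more remain,
// so the frontend can show the "Load more" button.
function paginateMessages(messages, offset = 0, limit = 50) {
  const slice = messages.slice(offset, offset + limit);
  return {
    messages: slice,
    offset,
    limit,
    total: messages.length,
    hasMore: offset + slice.length < messages.length,
  };
}
```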

## Summarize
- POST /api/summarize/:id — calls Copilot gpt-4.1
- GET /api/copilot/status — auth state, model, api_base
- Frontend: "Summarize" button in detail header
- Summary rendered in styled box above messages

## Other
- Disabled update nag banner (updateAvailable always false)
- Merged upstream changes
- Tests: copilot-client (7 tests), embeddings (8 tests)
  Both pass with real Copilot API calls

Tested: gpt-4.1 "Hello!" 1.6s, gpt-5-mini xhigh "4" 687ms
@vakovalskii
Owner

Account suspended. Also conflicts with current codebase and introduces unnecessary complexity (SQLite FTS5 dependency).

vakovalskii added a commit that referenced this pull request Apr 10, 2026
Merged: #156 (star sync), #155 (clipboard fallback), #159 (bind URL fix),
#157 (session name vs first prompt), #160 (MCP badges toggle), #100 (Warp launch API)
Closed: #128 (dup), #148 (banned), #161 (bad diff)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
