perf: async analytics/leaderboard + SQLite FTS5 + live partial results #148
apstenku123 wants to merge 14 commits into vakovalskii:main
Stops the GUI from hanging for users with large session histories
(4800+ codex rollouts / 1 GB+ JSONL). Tested on a 4873-session corpus
where cold Analytics previously took 100+ seconds to respond.
## Core changes
- **Non-blocking loadSessions** — sync path reads metadata only, uses
the parse/cost disk caches, and queues uncached files for a
background warmer. Cold `/api/sessions` now returns in ~300ms
instead of blocking for 100+ seconds.
- **Async background jobs** for `/api/analytics/cost` and
`/api/leaderboard`. HTTP returns a `{status, progress, partialResult}`
snapshot immediately; the client polls at 500ms and the job publishes
a live partial aggregate on each chunk so users see real numbers
climb ($0 → $5562 → $11344 → ... → final).
- **Incremental cost aggregator** (`createCostAggregator`) — extracted
from `getCostAnalytics` so the job can merge sessions one chunk at a
time and finalize a snapshot per yield.
- **SQLite + FTS5 index** (`src/sqlite-index.js`) at
`~/.codedash/cache/index.sqlite` — persistent sessions/messages/
messages_fts (FTS5 porter+unicode61)/daily_stats/files_seen/
aggregate_cache tables. Full-text search runs via `MATCH` in ~5 ms
on 1.2M messages. Uses async `spawn` with `-cmd .timeout 30000` so
concurrent writers wait for the lock (up to 30 s) instead of failing
with `database is locked`.
- **Aggregate result cache** persisted in `aggregate_cache` with a
5-min quantized fingerprint (`count|max_ts_bucket|filters|helpers`).
Repeat visits within the bucket are instant; active codex writes
don't invalidate the cache on every request.
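A minimal sketch of the quantized fingerprint, assuming each session carries an epoch-ms `last_ts` field (a hypothetical name) and that the filters are a date range. Because the newest timestamp is floored to a 5-minute bucket, writes that only bump timestamps inside the current bucket keep the cache key stable.

```javascript
// Sketch of a 5-minute quantized cache key: count|max_ts_bucket|filters|helpers.
// The `last_ts` field (epoch ms) is an assumption, not the PR's actual schema.
const BUCKET_MS = 5 * 60 * 1000;

function analyticsFingerprint(sessions, from, to, helpersFlag) {
  let maxTs = 0;
  for (const s of sessions) {
    if (typeof s.last_ts === 'number' && s.last_ts > maxTs) maxTs = s.last_ts;
  }
  // Timestamps that move within the same 5-min bucket do not change the key.
  const bucket = Math.floor(maxTs / BUCKET_MS);
  return [sessions.length, bucket, `${from}..${to}`, helpersFlag].join('|');
}
```

A new session still changes the key immediately (the count moves), so only in-place appends to existing rollouts are absorbed by the bucket.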
## Hot-path fixes found along the way
- `getActiveSessions` ran `lsof` per matching process with a 2s
timeout and called `loadSessions()` inside the loop → blocked the
event loop for minutes when 30+ codex-up wrappers were running.
Now: single `ps` call, tight regex (catches `codex`/`codex-up`/
`codex-up-exec` as binary names), batched `lsof -a -d cwd -Fpn -p`
(the `-a` flag is critical; without it lsof ORs conditions and
returns cwds for unrelated processes), pid→cwd cache, 3s result
cache, no inner loadSessions. 80s → 350ms cold, 0ms cached.
- `resolveGitRoot` called `git rev-parse` synchronously per unique
project path — 56 projects × 2s = 112s of blocked event loop. Now
queued to a background resolver with its own disk cache in
`~/.codedash/cache/git-root-cache.json`.
- `scanCodexSessions` was O(n²) via `.find()` on an array, and
re-read every rollout file on each call. Now: `Map<sid, session>`
+ `cacheOnly` parse mode + background warmer drains uncached files.
- `parseClaudeSessionFileAsync` — streaming read via `readline` for
files >5 MB (user had a 199 MB session file). Yields to the event
loop every 2000 lines via `await new Promise(r => setImmediate(r))`
so HTTP requests aren't starved during parse.
- Persistent cache paths moved from `os.tmpdir()` to
`~/.codedash/cache/` so macOS tmpdir cleanup doesn't wipe a
morning's worth of parse work.
- Flush handlers on SIGINT/SIGTERM + periodic flush every 50 entries
— killing the server mid-warm no longer loses hours of progress.
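The streaming parse with periodic yields can be sketched as below, assuming one JSON object per line. The per-line work (tallying `type` values) is illustrative only; `scanJsonlAsync` is a hypothetical name, not the PR's `parseClaudeSessionFileAsync`.

```javascript
// Sketch: stream a large JSONL file line by line and yield to the event loop
// every YIELD_EVERY lines so HTTP handlers can run in between.
const fs = require('node:fs');
const readline = require('node:readline');

const YIELD_EVERY = 2000;

async function scanJsonlAsync(file) {
  const rl = readline.createInterface({
    input: fs.createReadStream(file, { encoding: 'utf8' }),
    crlfDelay: Infinity,
  });
  const counts = {};
  let lines = 0;
  let bad = 0;
  for await (const line of rl) {
    if (!line.trim()) continue;
    lines++;
    try {
      const entry = JSON.parse(line);
      counts[entry.type] = (counts[entry.type] || 0) + 1;
    } catch {
      bad++; // tolerate a truncated last line from a file still being written
    }
    if (lines % YIELD_EVERY === 0) {
      await new Promise((r) => setImmediate(r)); // let queued I/O callbacks run
    }
  }
  return { lines, bad, counts };
}
```

Memory stays bounded by the longest line rather than the file size, which is what makes a 199 MB session file tractable.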
## Accuracy fixes
- `parseClaudeSessionFile` counted every `type=user` entry as a user
prompt, but Claude Code stores tool_results as `type=user` with
`content: [{type:'tool_result', ...}]`. One measured session had
480 type=user entries but only 17 real user prompts — a 28x
overcount. Now checks `content` for a real `text` block.
- `isSystemMessage` extended to skip Codex runtime injections that
were counted as user prompts: `<cwd>`, `<turn_aborted>`,
`<ide_selection>`, `<command_output>`, `# CLAUDE.md`,
`Warning: The maximum number of unified exec`, `AUTOSTEERING:`,
`[Sub-agent results]`.
- **Helper session detection** — codex rollouts with
`session_meta.payload.originator === 'codex_exec'` (or scripted
first-message patterns like `You are in /...`, `Read-only task.`,
`Work in /...`, `Pair-local ... lane`, `## X Y Agent`, etc.) are
flagged `is_helper: true`. `/api/sessions` filters them by default;
`?include_helpers=1` opts in. On the test corpus this removes
2166/4873 scripted sub-agent runs.
- Leaderboard now counts **unique conversations** (via
`group_key`), not retries. Real cost is summed over all rollouts
(actual money spent); session count uses deduped groups. On the
test corpus: 2869 → 612 unique conversations, 867k → 8458
real prompts, cost stays at the real $53,637.
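The two accuracy checks can be sketched as follows. This assumes Claude Code entries shaped like `{ type: 'user', message: { content } }`, where `content` is a string or an array of blocks, and it includes only the unambiguous subset of the helper first-prompt patterns listed above. All function names here are illustrative.

```javascript
// Prefixes of runtime injections that should not count as user prompts (from the PR text).
const SYSTEM_PREFIXES = [
  '<cwd>', '<turn_aborted>', '<ide_selection>', '<command_output>',
  '# CLAUDE.md', 'Warning: The maximum number of unified exec',
  'AUTOSTEERING:', '[Sub-agent results]',
];

function isSystemMessage(text) {
  const t = text.trimStart();
  return SYSTEM_PREFIXES.some((p) => t.startsWith(p));
}

// A type=user entry counts as a prompt only if it carries real text,
// not just tool_result blocks.
function isRealUserPrompt(entry) {
  if (!entry || entry.type !== 'user') return false;
  const content = entry.message && entry.message.content;
  if (typeof content === 'string') {
    return content.trim() !== '' && !isSystemMessage(content);
  }
  if (!Array.isArray(content)) return false;
  return content.some(
    (b) => b && b.type === 'text' && typeof b.text === 'string' &&
      b.text.trim() !== '' && !isSystemMessage(b.text)
  );
}

// Partial list of first-prompt patterns; not exhaustive.
const HELPER_PATTERNS = [
  /^You are in \//, /^Read-only task\./, /^Work (only )?in \//, /^Pair-local .* lane/,
];

function isHelperSession(sessionMeta, firstUserMessage) {
  const originator = sessionMeta && sessionMeta.payload && sessionMeta.payload.originator;
  if (originator === 'codex_exec') return true;
  const first = (firstUserMessage || '').trimStart();
  return HELPER_PATTERNS.some((re) => re.test(first));
}
```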
## Shared grouping helper
- `computeSessionGroupKey(s)` in data.js: `tool::project::firstMsg[0..200]`
(or `helper::project` for helpers) — computed once per session on
load, exposed as `s.group_key`.
- `groupSessionsByConversation(sessions)` in frontend/app.js — shared
by **Timeline**, **All Sessions**, **Projects view**,
**Cloud Sync**, and **Activity/Heatmap**. One helper, one
representative per group, `+N more` badge on cards.
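A sketch of the key and the grouping pass, assuming sessions expose `tool`, `project`, `first_message`, `is_helper` and an epoch-ms `last_ts`. Picking the most recently active session as the representative, and the `group_more` field, are illustrative choices rather than the PR's exact behavior.

```javascript
// Grouping key: tool::project::firstMsg[0..200], or helper::project for helpers.
function computeSessionGroupKey(s) {
  if (s.is_helper) return `helper::${s.project}`;
  const first = (s.first_message || '').slice(0, 200);
  return `${s.tool}::${s.project}::${first}`;
}

// One representative per group plus the count of collapsed siblings ("+N more").
function groupSessionsByConversation(sessions) {
  const groups = new Map(); // insertion order = order of first appearance
  for (const s of sessions) {
    const key = s.group_key || computeSessionGroupKey(s);
    const g = groups.get(key);
    if (!g) {
      groups.set(key, { rep: s, more: 0 });
    } else {
      g.more++;
      if ((s.last_ts || 0) > (g.rep.last_ts || 0)) g.rep = s; // keep the newest
    }
  }
  return [...groups.values()].map(({ rep, more }) => ({ ...rep, group_more: more }));
}
```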
## Logging
- `CODEDASH_LOG=0` (default) silences stdout spam (previous
`[ACTIVE] pid=... codex/waiting cpu=0%` lines were emitted on
every /api/active poll). `ERROR`/`WARN`/`JOB` still go to stdout.
- All logs (including verbose tags) always go to
`~/.codedash/logs/server.log` with timestamps. Set
`CODEDASH_LOG=1` to also mirror to stdout.
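The logging policy reduces to one decision per line. In this sketch only the `CODEDASH_LOG` semantics and the always-on `ERROR`/`WARN`/`JOB` tags come from the PR; the line format and `createLogger` are assumptions. Synchronous appends keep the example short; a real server would likely buffer.

```javascript
const fs = require('node:fs');
const path = require('node:path');
const os = require('node:os');

const ALWAYS_STDOUT = new Set(['ERROR', 'WARN', 'JOB']);

// Verbose tags reach stdout only when CODEDASH_LOG=1.
function shouldMirrorToStdout(tag, envValue) {
  return envValue === '1' || ALWAYS_STDOUT.has(tag);
}

function formatLogLine(tag, msg, now = new Date()) {
  return `${now.toISOString()} [${tag}] ${msg}\n`;
}

function createLogger(logFile = path.join(os.homedir(), '.codedash', 'logs', 'server.log')) {
  fs.mkdirSync(path.dirname(logFile), { recursive: true });
  return function log(tag, msg) {
    const line = formatLogLine(tag, msg);
    fs.appendFileSync(logFile, line); // every tag always goes to the file
    if (shouldMirrorToStdout(tag, process.env.CODEDASH_LOG)) process.stdout.write(line);
  };
}
```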
## Exports / new endpoints
- `loadSessionsAsync(progressCb)` — async variant with incremental
mtime-based change detection.
- `getWarmingStatus()` + `/api/warming` — background parse progress.
- `getSqliteBackfillStatus()` + `/api/sqlite-status` — FTS5 ingest
progress + index size.
- `createCostAggregator()`, `computeSessionCostForAnalytics(session,
opencodeCache)`, `buildOpencodeCostCache(sessions)` — so the async
jobs can stream-aggregate without re-wiring `getCostAnalytics`.
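How an incremental aggregator and a chunked job fit together can be sketched as below. The sketch reuses the name `createCostAggregator` for readability, but its `add(session, cost)`/`snapshot()` methods, the `date` field, and the hypothetical `runCostJob`/`jobResponse` helpers are assumptions modelled on the endpoint description, not the PR's code.

```javascript
function createCostAggregator() {
  let total = 0;
  let sessions = 0;
  const byDay = new Map();
  return {
    add(session, cost) {
      total += cost;
      sessions++;
      byDay.set(session.date, (byDay.get(session.date) || 0) + cost);
    },
    snapshot() {
      const days = [...byDay].sort(([a], [b]) => a.localeCompare(b));
      return { totalCost: total, sessions, byDay: Object.fromEntries(days) };
    },
  };
}

// Merge one chunk at a time and publish a partial snapshot after each chunk.
async function runCostJob(job, sessions, costOf, chunkSize = 200) {
  const agg = createCostAggregator();
  job.status = 'running';
  job.progress = { done: 0, total: sessions.length };
  try {
    for (let i = 0; i < sessions.length; i += chunkSize) {
      for (const s of sessions.slice(i, i + chunkSize)) agg.add(s, costOf(s));
      job.progress.done = Math.min(i + chunkSize, sessions.length);
      job.partialResult = agg.snapshot(); // what a poller sees mid-run
      await new Promise((r) => setImmediate(r)); // let HTTP requests through
    }
    job.result = agg.snapshot();
    job.status = 'done';
  } catch (err) {
    job.status = 'error';
    job.error = String(err);
  }
  return job;
}

// A finished job also spreads result fields at the top level (backwards compat).
function jobResponse(job) {
  const base = { status: job.status, progress: job.progress, partialResult: job.partialResult };
  return job.status === 'done' ? { ...job.result, ...base, result: job.result } : base;
}
```

A 500 ms poller would simply call `jobResponse(job)` until `status` is no longer `'running'`.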
## Benchmark (user's machine, 4873 sessions / 1.1 GB JSONL)
```
before (v6.15.10) after
loadSessions 109 s (cold) 72 ms (cacheOnly path)
/api/active 80 s (34 procs) 350 ms cold, 0 ms warm
search 3–10 s (rebuild) 5 ms (FTS5 MATCH)
analytics 30 s (blocking) first partial in ~500 ms,
full in ~15 s, instant on cache hit
leaderboard 35 s (blocking) first partial in ~500 ms,
full in ~15 s, instant on cache hit
```
Browser cache note: after upgrading, users should hard-reload
(Cmd+Shift+R) once so the split frontend modules re-load — the poll
loops are new code and old cached JS will show the initial
"Loading..." spinner without progressing.
Pull request overview
This PR refactors codedash’s analytics/leaderboard/search/session-loading paths to avoid long synchronous stalls by introducing background warming, async “job” endpoints with live partial results, and a persistent SQLite (FTS5) index/cache under ~/.codedash/cache/.
Changes:
- Add a persistent SQLite + FTS5 index and aggregate-result cache for fast full-text search and repeat analytics/leaderboard visits.
- Convert `/api/analytics/cost` and `/api/leaderboard` into async background jobs that stream progress + partial snapshots to the UI.
- Add session de-duplication by conversation group (retry/resume collapsing) across multiple frontend views, plus default filtering of helper/sub-agent sessions.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| src/sqlite-index.js | New sqlite3 CLI wrapper, schema, FTS search, and persistent aggregate cache helpers. |
| src/server.js | Adds async analytics/leaderboard job endpoints, new status endpoints, helper filtering, and routes search via SQLite. |
| src/data.js | Implements cache-only session loading + background warmer, SQLite ingest/backfill, helper detection, grouping keys, and streaming analytics aggregation. |
| src/frontend/analytics.js | Adds polling loop and live “partial results” rendering for analytics. |
| src/frontend/leaderboard.js | Adds polling loop and progress UI for leaderboard. |
| src/frontend/app.js | Adds shared grouping helper + UI badges for collapsed conversation groups. |
| src/frontend/heatmap.js | Uses shared grouping helper to dedupe sessions before counting. |
| src/frontend/cloud.js | Uses shared grouping helper and shows “+N more” aggregate info. |
| src/frontend/styles.css | Adds styles for group badges and analytics/leaderboard progress UI. |
Comments suppressed due to low confidence (1)
src/data.js:671
`parseClaudeSessionFile()` compares `entry.timestamp` directly to numeric `firstTs`/`lastTs`. Elsewhere (e.g. `_computeSessionDailyBreakdown`) timestamps are handled as either numbers or ISO strings. If `entry.timestamp` is a string here, the comparisons can produce incorrect first/last timestamps. Normalize `entry.timestamp` to an epoch-ms number before comparing/assigning.
if (entry.timestamp) {
if (entry.timestamp < firstTs) firstTs = entry.timestamp;
if (entry.timestamp > lastTs) lastTs = entry.timestamp;
}
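The normalization the comment asks for could look like this sketch. Treating an all-digit string as epoch milliseconds (rather than seconds) is an assumption, and `toEpochMs`/`updateBounds` are hypothetical names.

```javascript
// Coerce a number, numeric string, or ISO-8601 string to epoch ms (or null).
function toEpochMs(ts) {
  if (typeof ts === 'number') return Number.isFinite(ts) ? ts : null;
  if (typeof ts === 'string' && ts.trim() !== '') {
    if (/^\d+$/.test(ts)) return Number(ts); // assumed to already be ms
    const ms = Date.parse(ts);
    return Number.isNaN(ms) ? null : ms;
  }
  return null;
}

// The quoted loop body, comparing numbers only.
function updateBounds(bounds, entry) {
  const t = toEpochMs(entry.timestamp);
  if (t === null) return bounds;
  if (t < bounds.firstTs) bounds.firstTs = t;
  if (t > bounds.lastTs) bounds.lastTs = t;
  return bounds;
}
```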
let served = false;
try {
  const sessions = loadSessions();
  const filtered = includeHelpers ? sessions : sessions.filter(s => !s.is_helper);
  const fingerprint = _analyticsFingerprint(filtered, '', '', includeHelpers ? 'h1' : 'h0');
Same concurrency issue as analytics: the leaderboard endpoint starts _runLeaderboardJob() only after an async cache lookup resolves. Concurrent requests for the same key can start multiple jobs and race to overwrite _jobs.leaderboard. Consider deduplicating in-flight work per key (store a promise/state immediately) so only one job runs per key at a time.
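A sketch of the suggested per-key deduplication: the first request registers its promise synchronously, before anything is awaited, so concurrent requests for the same key share one job. `startJob` is a hypothetical stand-in for `_runLeaderboardJob()`.

```javascript
function createJobRegistry(startJob) {
  const inFlight = new Map();
  return function getOrStart(key) {
    const existing = inFlight.get(key);
    if (existing) return existing;
    const p = Promise.resolve()
      .then(() => startJob(key))
      .finally(() => inFlight.delete(key)); // allow a fresh run once finished
    inFlight.set(key, p); // registered before the caller yields
    return p;
  };
}
```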
function _ensureSqliteBackfillRunning() {
  if (_sqliteBackfillRunning) return;
  let sqliteIndex;
  try { sqliteIndex = require('./sqlite-index'); } catch { return; }
  _sqliteBackfillRunning = true;
  _sqliteBackfillStatus.running = true;
  _sqliteBackfillStatus.startedAt = Date.now();
  _sqliteBackfillStatus.phase = 'scanning';
  _sqliteBackfillStatus.done = 0;
_ensureSqliteBackfillRunning() is described as a "one-shot" task, but there is no guard for the completed state. After the backfill finishes (_sqliteBackfillRunning resets to false), subsequent loadSessions() calls will start another full scan/backfill. Consider adding a persistent "completed" flag (or checking _sqliteBackfillStatus.phase === 'done') to avoid repeated full rescans.
src/sqlite-index.js
function _exec(sql, opts) {
  opts = opts || {};
  const args = ['-cmd', _CMD_BUSY, DB_FILE];
  if (opts.json) args.push('-json');
  args.push(sql);
_exec()/_execAsync() append -json after DB_FILE in the sqlite3 argv. The sqlite3 CLI stops parsing options once it sees the database filename, so -json in that position is treated as part of the SQL and can break all JSON queries (e.g. _execJson, loadAllFilesSeen, getIndexStatus, aggregate cache reads). Build args with options first (e.g. ['-cmd', _CMD_BUSY, '-json', DB_FILE]).
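A sketch of the corrected argv order, plus the quoting needed when SQL and FTS5 queries are built as text for the CLI. The DB path and busy timeout mirror the PR; `buildSqliteArgs`, `sqlString`, `ftsQuery` and `searchSql` are hypothetical helpers. FTS5 treats double-quoted strings as phrases (with `""` as an escaped quote) and joins adjacent phrases with an implicit AND.

```javascript
const os = require('node:os');
const path = require('node:path');

const DB_FILE = path.join(os.homedir(), '.codedash', 'cache', 'index.sqlite');
const CMD_BUSY = '.timeout 30000';

// All options, including -json, precede the database filename; SQL comes last.
function buildSqliteArgs(sql, opts = {}) {
  const args = ['-cmd', CMD_BUSY];
  if (opts.json) args.push('-json');
  args.push(DB_FILE, sql);
  return args;
}

// Escape a value for a single-quoted SQL string literal.
function sqlString(value) {
  return `'${String(value).replace(/'/g, "''")}'`;
}

// Quote each term so '-', ':' or '*' in user input is not parsed as FTS5 syntax.
function ftsQuery(text) {
  return text
    .split(/\s+/)
    .filter(Boolean)
    .map((t) => `"${t.replace(/"/g, '""')}"`)
    .join(' ');
}

// messages_fts is the PR's table name; selecting only rowid is an assumption.
function searchSql(q, limit = 50) {
  return `SELECT rowid FROM messages_fts WHERE messages_fts MATCH ${sqlString(ftsQuery(q))} ORDER BY rank LIMIT ${Number(limit) | 0};`;
}
```

With Node's `child_process.spawn('sqlite3', buildSqliteArgs(searchSql(q), { json: true }))`, stdout then carries a JSON array of rows.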
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…lskii#5, vakovalskii#10)
- vakovalskii#10: Move `-json` flag BEFORE DB_FILE in both `_exec()` and `_execAsync()`. sqlite3 CLI stops option parsing at the database filename, so `-json` after it was silently ignored (fell back to default text output on strict-POSIX systems).
- vakovalskii#3: Add a synchronous placeholder job in the leaderboard endpoint before the async cache lookup, same pattern as the analytics endpoint fix from earlier commits. Prevents concurrent requests from each starting their own `_runLeaderboardJob()`.
- vakovalskii#5: Guard `_ensureSqliteBackfillRunning()` with a completed-phase check (`_sqliteBackfillStatus.phase === 'done'`). Without this, every `loadSessions()` call after the initial backfill finished would trigger another full directory scan.
- select s (session) inside tab for correct iTerm2 focus
- normalize tty: strip /dev/ prefix for reliable matching
- Add `src/embeddings.js` — optional vector search using
@huggingface/transformers (pure JS ONNX, no Python/torch needed).
Model: Xenova/all-MiniLM-L6-v2, 384-dim embeddings, ~23 MB ONNX.
Falls back gracefully when npm package isn't installed.
- `/api/search?q=X&mode=text|semantic|hybrid` — three search modes:
- `text`: FTS5 keyword MATCH (existing behavior, default fallback)
- `semantic`: pure cosine similarity against pre-computed session
embeddings
- `hybrid`: FTS5 for recall (top 200) → vector re-rank for
precision, combined score = 0.3×fts_matches + 0.7×similarity
- SQLite `session_embeddings` table stores pre-computed embeddings
per session. Populated during SQLite backfill phase 2 (after FTS5
ingest). Batched 32 at a time with setImmediate yields.
- `/api/embeddings/status` — reports model availability, dim, count.
- Frontend: Text / Hybrid / Semantic toggle buttons next to search
bar. Default: hybrid. Re-runs search on mode change.
- Cherry-pick upstream v6.15.11: iTerm2 focus fix (select session in
tab, normalize tty /dev/ prefix).
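The first `hybrid` mode (FTS5 recall, then vector re-rank) can be sketched as follows. The PR does not say how `fts_matches` is scaled, so dividing by the maximum in the candidate set is an assumption, as are the candidate and embedding shapes.

```javascript
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// candidates: [{ id, ftsMatches }] from FTS5 (e.g. top 200);
// embeddings: Map of id -> Float32Array; queryVec: the query embedding.
function hybridRerank(candidates, embeddings, queryVec, { wFts = 0.3, wSim = 0.7 } = {}) {
  const maxMatches = Math.max(1, ...candidates.map((c) => c.ftsMatches));
  return candidates
    .map((c) => {
      const vec = embeddings.get(c.id);
      const similarity = vec ? cosine(queryVec, vec) : 0;
      const score = wFts * (c.ftsMatches / maxMatches) + wSim * similarity;
      return { ...c, similarity, score };
    })
    .sort((x, y) => y.score - x.score);
}
```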
Rewrites embeddings.js to match the codex-git retrieval architecture
(kb_hybrid_search.rs + kb_embedding_store.rs):
6-stage pipeline (per Memento paper, arXiv 2603.18743, Figure 8):
Stage 1: FTS5 sparse recall (top-20)
Stage 2: Dense embedding recall (top-20)
Stage 3: Reciprocal Rank Fusion (k=60, Cormack et al. 2009)
Stage 4: Utility reranking (per-entry success/failure rate)
Stage 5: Threshold filter
Stage 6: Top-k
Weights from Memento paper results:
BM25=0.3 (Recall@1=0.32), Embedding=0.7 (Recall@1=0.54)
Utility: final = rrf * (0.7 + 0.3 * utility_rate)
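Stages 3 to 6 can be sketched as below. Multiplying each list's `1/(k + rank)` term by its weight (BM25 = 0.3, embedding = 0.7) is an assumption about how the weights are applied; with equal weights this reduces to plain RRF. A `utilityRate` of 0 for unseen ids is also an assumption.

```javascript
// lists: [{ ids: [...ranked ids], weight }]; ranks are 1-based.
function fuseAndRank(lists, { k = 60, utilityRate = () => 0, threshold = 0, topK = 10 } = {}) {
  const scores = new Map();
  for (const { ids, weight } of lists) {
    ids.forEach((id, i) => {
      scores.set(id, (scores.get(id) || 0) + weight / (k + i + 1)); // Stage 3: RRF
    });
  }
  return [...scores]
    .map(([id, rrf]) => ({ id, rrf, score: rrf * (0.7 + 0.3 * utilityRate(id)) })) // Stage 4
    .filter((r) => r.score >= threshold) // Stage 5
    .sort((a, b) => b.score - a.score)
    .slice(0, topK); // Stage 6
}
```

An item ranked well in the dense list (weight 0.7) can overtake one ranked first only by BM25, which is the intended bias toward embedding recall.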
3-level provider chain (matches codex-git priorities):
1. Local: MiniLM-L6-v2 (default, 384d, 23MB) or Qwen3-Embedding-0.6B
(1024d, configurable via ~/.codedash/embedding-config.json)
2. API: OpenAI-compatible /embeddings endpoint (GitHub Models at
models.github.ai, Copilot proxy, or any provider)
3. TF-IDF fallback: 256-dim bag-of-words hash (always available)
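A sketch of a 256-dimension hashed bag-of-words fallback. The tokenizer, the FNV-1a hash over UTF-16 code units, and the log-scaled term frequency are assumptions. IDF weighting, which needs corpus statistics, is left out. Vectors are L2-normalized so a dot product equals cosine similarity.

```javascript
const DIM = 256;

// 32-bit FNV-1a over UTF-16 code units.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

function hashEmbed(text) {
  const v = new Float32Array(DIM);
  const tokens = text.toLowerCase().match(/[\p{L}\p{N}_]+/gu) || [];
  for (const t of tokens) v[fnv1a(t) % DIM] += 1;
  let norm = 0;
  for (let i = 0; i < DIM; i++) {
    if (v[i] > 0) v[i] = 1 + Math.log(v[i]); // sublinear term frequency
    norm += v[i] * v[i];
  }
  norm = Math.sqrt(norm);
  if (norm > 0) for (let i = 0; i < DIM; i++) v[i] /= norm;
  return v;
}

function dot(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}
```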
New: utility tracker (SQLite search_utility table) — records click/
expand/ignore per session×query, feeds Stage 4 reranking. POST
/api/search/utility endpoint for frontend to report outcomes.
GET /api/embeddings/status now reports: model, dim, count, available
models, config, and pipeline parameters.
Tested: 36 results in 313ms for "megatron training" with full
6-stage pipeline on 500 pre-computed embeddings.
## copilot-client.js (NEW)
- Auto-discovers GitHub Copilot OAuth tokens from:
~/.config/github-copilot/apps.json (preferred, VS Code refresh)
~/.copilot/auth/credential.json (Copilot CLI fallback)
- Tries ALL available tokens (not just first) — handles stale tokens
- Token exchange via GET api.github.com/copilot_internal/v2/token
Returns dynamic endpoint (enterprise vs individual)
- chatCompletion(messages, {model, max_tokens, reasoning_effort})
Default: gpt-4.1 (free for Pro). Also: gpt-5-mini with xhigh
- summarizeSession(messages) — truncated first+last 10, GPT summary
- Session token cached until expiry, auto-refresh 60s before
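A sketch of "cache until expiry, refresh 60 s before", assuming the token exchange returns `{ token, expires_at }` with `expires_at` in Unix seconds. `fetchToken` is a hypothetical injected function wrapping that HTTP call. Concurrent callers share a single in-flight refresh.

```javascript
const REFRESH_MARGIN_MS = 60 * 1000;

function isFresh(cached, nowMs) {
  return Boolean(cached) && nowMs < cached.expires_at * 1000 - REFRESH_MARGIN_MS;
}

function createTokenCache(fetchToken, now = Date.now) {
  let cached = null;
  let pending = null; // one refresh shared by concurrent callers
  return async function getToken() {
    if (isFresh(cached, now())) return cached.token;
    if (!pending) {
      pending = fetchToken()
        .then((t) => { cached = t; return t; })
        .finally(() => { pending = null; });
    }
    return (await pending).token;
  };
}
```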
## Progressive message loading
- GET /api/session/:id?offset=0&limit=50 — paginated
- GET /api/session/:id/stream — SSE chunks of 50 messages
- Frontend: loads first 50 immediately, "Load more" button
- Message role filters: All | User | Assistant | Tools
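Offset/limit paging, the role filter, and SSE framing can be sketched as follows. The message shape (`{ role, ... }`), the `'tool'` role name, and the response fields are assumptions for illustration. `JSON.stringify` escapes newlines, so each frame has a single `data:` line.

```javascript
const ROLE_FILTERS = {
  all: () => true,
  user: (m) => m.role === 'user',
  assistant: (m) => m.role === 'assistant',
  tools: (m) => m.role === 'tool',
};

function pageMessages(messages, { offset = 0, limit = 50, role = 'all' } = {}) {
  const keep = ROLE_FILTERS[role] || ROLE_FILTERS.all;
  const filtered = messages.filter(keep);
  const start = Math.max(0, offset | 0);
  const page = filtered.slice(start, start + Math.max(1, limit | 0));
  const next = start + page.length;
  return { messages: page, total: filtered.length, nextOffset: next < filtered.length ? next : null };
}

// One Server-Sent Events frame: optional event name, JSON data, blank line.
function sseFrame(data, event) {
  return (event ? `event: ${event}\n` : '') + `data: ${JSON.stringify(data)}\n\n`;
}
```

The "Load more" button only needs `nextOffset`; a `null` value hides it.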
## Summarize
- POST /api/summarize/:id — calls Copilot gpt-4.1
- GET /api/copilot/status — auth state, model, api_base
- Frontend: "Summarize" button in detail header
- Summary rendered in styled box above messages
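The "first + last 10" truncation behind `summarizeSession` can be sketched as below. The per-message character cap, the system prompt wording, and the omission marker are assumptions. The output is an OpenAI-style chat `messages` array.

```javascript
function buildSummaryPrompt(messages, { edge = 10, maxChars = 2000 } = {}) {
  const omitted = messages.length - edge * 2;
  const picked = omitted <= 0
    ? messages
    : [
        ...messages.slice(0, edge),
        { role: 'system', content: `[${omitted} messages omitted]` },
        ...messages.slice(-edge),
      ];
  const transcript = picked
    .map((m) => `${m.role}: ${String(m.content).slice(0, maxChars)}`)
    .join('\n\n');
  return [
    { role: 'system', content: 'Summarize this coding session: goal, key changes, and open issues.' },
    { role: 'user', content: transcript },
  ];
}
```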
## Other
- Disabled update nag banner (updateAvailable always false)
- Merged upstream changes
- Tests: copilot-client (7 tests), embeddings (8 tests)
Both pass with real Copilot API calls
Tested: gpt-4.1 "Hello!" 1.6s, gpt-5-mini xhigh "4" 687ms
Account suspended. Also conflicts with the current codebase and introduces unnecessary complexity (SQLite FTS5 dependency).
Summary
- `loadSessions()` cold path + async background warmer for parse/cost caches
- `/api/analytics/cost` and `/api/leaderboard` jobs with live partial results (UI shows `$0 → $5562 → $11344 → ... → final` as sessions aggregate)
- SQLite + FTS5 index (`~/.codedash/cache/index.sqlite`) for sessions, messages, daily stats, and aggregate result cache
- `groupSessionsByConversation` helper used by Timeline / All Sessions / Projects / Cloud Sync / Activity: collapses `codex exec` retries of the same prompt into one representative card with a `+N more` badge
- `getActiveSessions`: O(N) `lsof` blocking (80s → 350ms), a broken regex that missed `codex-up`/`codex-up-exec`, and the `lsof -a` flag so the pid filter actually applies
- `type=user` entries include `tool_result` blocks (28x inflation measured on a real session)
- Detects scripted helper sessions (`originator=codex_exec` + 9 first-prompt regex patterns) and hides them from default counts; `?include_helpers=1` opts in
Why
On a corpus of 4873 sessions / 1.1 GB of JSONL, the GUI hung:
- `loadSessions()` cold: 109 seconds (re-parsed every Claude file synchronously, plus 112 s of synchronous `git rev-parse` × 56 projects, plus a synchronous `parseClaudeSessionFile` on a 199 MB file)
- `/api/active`: 80 seconds (matched 34 processes named `codex-up*` and ran per-pid `lsof` + `loadSessions` inside the loop)
- `/api/analytics/cost` + `/api/leaderboard`: blocking; the client got `ERR_TIMED_OUT`
- stdout spam: `[ACTIVE] pid=... codex/waiting` on every `/api/active` poll

Chrome tabs accumulated hung keep-alive connections and the server eventually looked dead.
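The `/api/active` fix (one `ps` pass filtered by binary name, then a single batched `lsof`) can be sketched as below. The `ps` line layout (`<pid> <command path>`, as from `ps -axo pid=,comm=` on macOS) and the helper names are assumptions; the `lsof` flags are the ones listed in the PR. In `lsof -F` output a `p<pid>` line starts each process and `n<name>` carries the path; other field lines such as `f` are ignored.

```javascript
const path = require('node:path');

// codex, codex-up, codex-up-exec as the executable's basename.
const CODEX_BIN = /^codex(-up(-exec)?)?$/;

function codexPids(psOutput) {
  const pids = [];
  for (const line of psOutput.split('\n')) {
    const m = line.trim().match(/^(\d+)\s+(.+)$/);
    if (m && CODEX_BIN.test(path.basename(m[2]))) pids.push(Number(m[1]));
  }
  return pids;
}

// -a ANDs the selections, so only the cwd entries of the listed pids come back.
function lsofArgs(pids) {
  return ['-a', '-d', 'cwd', '-Fpn', '-p', pids.join(',')];
}

function parseLsofCwd(output) {
  const cwdByPid = new Map();
  let pid = null;
  for (const line of output.split('\n')) {
    if (line.startsWith('p')) pid = Number(line.slice(1));
    else if (line.startsWith('n') && pid !== null) cwdByPid.set(pid, line.slice(1));
  }
  return cwdByPid;
}
```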
After
[Table: `loadSessions()`, `/api/active`, `searchFullText` (FTS5 `MATCH`), `/api/analytics/cost` first byte, `aggregate_cache` hit]

On the test corpus, leaderboard counts went from 2869 sessions / 867k prompts (inflated by retries, tool_results, and scripted helpers) to a realistic 612 unique conversations / 8458 real user prompts / $53,637 real spend / 9-day streak.
Architectural notes
- SQLite access goes through `_execAsync(sql)`, which spawns `sqlite3 -cmd '.timeout 30000' <db>` and streams SQL into stdin, so concurrent writers wait instead of erroring with `database is locked`. Reads don't block writers.
- Persistent caches live in `~/.codedash/cache/` (was `os.tmpdir()`; macOS tmpdir cleanup wipes hours of parse work). Legacy tmpdir paths are migrated once on load.
- Helper detection: `session_meta.payload.originator === 'codex_exec'` (standard `codex exec`), or a first prompt matching `^You are in /`, `^Read-only task\.`, `^Work (only )?in /`, `^Pair-local .* lane`, `^## X Y Agent`, `^Read $OMX_`, etc.
New endpoints
- `GET /api/warming`: `{running, done, total, phase}` for background parse
- `GET /api/sqlite-status`: `{backfill, index: {sessions, messages, files, db_bytes}}`
- `GET /api/analytics/cost` now returns `{status: 'running'|'done'|'error', progress, partialResult, result}` (backwards-compat: `done` spreads `result` fields at the top level so existing UI code still works)
- `GET /api/leaderboard`: same async job shape
- `GET /api/sessions?include_helpers=1`: opt in to scripted helper sessions (default hidden; `X-Helper-Count` header reports the filtered count)
Upgrade note
After installing, users should hard-reload (Cmd+Shift+R) once; the poll loops in the split frontend modules are new code, and old cached JS will show a stuck `Loading…` spinner.
Test plan
- `loadSessions` 109 s → 72 ms
- `/api/analytics/cost` returns `status: 'done'` with the cached result
- `analytics` query returns 3 highlighted snippets in 58 ms on a 1.2M-message corpus
- `?include_helpers=1` restores full set
- `getActiveSessions` correctly picks up `codex-up` processes with cwd `/Volumes/external/sources/nanochat` (not kernel addresses)
- `sqlite3 ~/.codedash/cache/index.sqlite "SELECT kind, length(result_json) FROM aggregate_cache"`