
v0.next: timeline rail + per-turn cost + Codex support + Opus pricing fix #7

Open
lroolle wants to merge 14 commits into main from v0.next

Conversation


lroolle commented Apr 14, 2026

Closes the v0.next — complete product pass milestone (#6).

Summary

Twelve commits that ship the five v0.next issues, plus a fix for one real pricing bug that the reference-driven verification tool surfaced:

| # | Issue | Commit |
| --- | --- | --- |
| #3 | Deep-link / load-earlier hash-nav fix | b6797c8 |
| #2 | Per-turn spend breakdown in info panel | 6385a6b |
| #4 | In-memory LRU session cache | fe540a8 |
| #5 + #2 | Time-machine scrubber rail with cost heat | 7208560 |
| #5 + #6 | Markdown headers + outline truncation fix | 6b9fad0 |
| new | Opus 4.5/4.6 overcharge fix + pricing verify tool | dbdba60 |
| new | Persistent disk-backed session cache | 290a6f1 |
| new | Narrow auto-expand session nav sidebar | 0357a50 |
| new | Exec-style turn-by-turn export format | 729f86a |
| new | Codex per-message cost attribution + GPT-5 pricing | a637914 |
| review | Round 1 fixes (pricing delimiter, Codex multi-assistant, path canonicalization) | c884786 |
| style | gofmt stack-added files | ae9e127 |

The load-bearing catches

Opus 4.5 / 4.6 overcharge (dbdba60). The embedded pricing table had these at tier $15/$75 (the Opus 4/4.1 rate). Claude Code's own src/utils/modelCost.ts has them at tier $5/$25 — roughly 3× lower. Every per-turn cost figure ccx ever displayed for Opus 4.5/4.6 sessions was inflated by ~3×. Fixed. cmd/ccx-verify-pricing (new) parses the Claude Code source with regex and diffs against our pricing table, make verify-pricing wires it into the Makefile, and unit tests feed it a fake source with deliberate drift to prove the detector works.

GPT-5 pricing match pollution (c884786). Round 1 review found that LookupPricing used plain strings.Contains with no delimiter awareness. That made openai/gpt-5-codex-mini resolve to the full GPT-5 tier instead of the mini tier — a ~40× overcharge for Codex mini sessions. Replaced with containsDelimited + hasVariantToken helpers that require delimiter boundaries. Adversarial tests from the review are pinned.
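
The delimiter rule can be sketched as follows. This is a minimal illustration of the idea behind containsDelimited, not the shipped code; the exact delimiter set here is an assumption:

```go
package main

import (
	"fmt"
	"strings"
)

// isDelim reports whether a byte can separate tokens in a model ID.
// The set is illustrative; the shipped helper may differ.
func isDelim(b byte) bool {
	return b == '-' || b == '/' || b == '.' || b == ':' || b == '_'
}

// containsDelimited reports whether needle occurs in haystack bounded
// by delimiters (or string edges). Combined with checking the most
// specific tier names first, this keeps "gpt-5-codex-mini" from being
// swallowed by the bare "gpt-5" tier, and rejects junk like "gpt-55".
func containsDelimited(haystack, needle string) bool {
	for start := 0; ; {
		i := strings.Index(haystack[start:], needle)
		if i < 0 {
			return false
		}
		i += start
		end := i + len(needle)
		leftOK := i == 0 || isDelim(haystack[i-1])
		rightOK := end == len(haystack) || isDelim(haystack[end])
		if leftOK && rightOK {
			return true
		}
		start = i + 1
	}
}

func main() {
	fmt.Println(containsDelimited("openai/gpt-5-codex-mini", "gpt-5")) // true
	fmt.Println(containsDelimited("openai/gpt-55", "gpt-5"))           // false
}
```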

Codex multi-assistant attribution (c884786). Round 1 also caught that per-message attribution dropped tokens whenever a turn produced multiple agent_message events between token_count checkpoints (reasoning → tool → reply is standard GPT-5 shape). Earlier messages in the burst showed $0.00. Fixed with a usageWatermark index + distributeCodexDelta that evenly splits delta across every untagged assistant.
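
A minimal sketch of the even-split idea behind distributeCodexDelta (simplified: the shipped helper operates on tagged usage structs, not raw ints):

```go
package main

import "fmt"

// distributeDelta evenly splits a token delta across the n untagged
// assistant messages in a burst, giving the integer remainder to the
// last message so the per-turn sum stays exact.
func distributeDelta(delta int64, n int) []int64 {
	if n <= 0 {
		return nil
	}
	share := delta / int64(n)
	out := make([]int64, n)
	for i := range out {
		out[i] = share
	}
	out[n-1] += delta % int64(n) // preserve the integer remainder
	return out
}

func main() {
	fmt.Println(distributeDelta(10, 3)) // [3 3 4]
}
```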

Rail shipping details

The timeline rail went through three redesigns during review:

  1. v1: left-side dots (wrong — should be right-edge, user feedback)
  2. v2: right-edge scrubber with fisheye zoom (still felt static)
  3. v3 (what ships): index-based positioning so idle gaps don't distort the layout, ruler-style horizontal notches, cost-weighted tick size (heat from per-turn cost), 5-tick fisheye zoom on hover, 1.5% hysteresis so the tooltip doesn't flicker at tick boundaries, rAF-throttled mousemove, and a 120ms grace period on leave. The floating tooltip carries a kind badge, elapsed offset, turn ordinal, preview, per-turn cost, cumulative running total, and token count. Sub-agent (Task) and Skill tool calls get their own minor notches, and a 14px inward gutter from the viewport edge keeps the rail clear of the browser scrollbar.

Per-turn spend breakdown (#2)

The info panel now shows a Per-turn spend section sorted by cost descending. Each row links to #msg-<anchor> and composes with the load-earlier fix, so deep links into collapsed history auto-reload. A session-level Cost row appears in the Tokens section. Single source of truth: session.Stats.CostUSD == sum(message.Usage.CostUSD) by construction.

Performance (#4 + #9)

Two-tier cache:

  • Layer 1 (in-memory LRU, cap 16): cold parse of a 1500-message session ~19ms → cache hit ~330ns. ~60,000× speedup on repeat views. Transparent, invalidated on (mtime, size) change.
  • Layer 2 (persistent disk cache): gob-encoded to $XDG_DATA_HOME/ccx/session-cache/<sha256>.gob. Survives ccx web restart. Corrupt files dropped + swept. gob.Register(map[string]any{}) + gob.Register([]any{}) registered in init() so ContentBlock.ToolInput round-trips cleanly.

Codex support (#8)

Unified data layer for features across Claude Code and Codex:

  • MessageUsage.ReasoningTokens field for GPT-5 reasoning output
  • Codex backend now populates per-message usage via token_count event deltas with distributeCodexDelta
  • GPT-5 / GPT-5.4 pricing tiers (marked UNVERIFIED — no automated drift check, update manually if OpenAI rebases)
  • Downstream: per-turn spend section, timeline rail heat, exec export cost footers all work for Codex sessions now (they were all zeros before)

Tests

Full suite green. Coverage by package:

| Package | Coverage |
| --- | --- |
| internal/parser | high (expanded with pricing + turns + cumulative tests) |
| internal/provider | 85.3% |
| internal/render | 80.8% |
| internal/web | expanded (19 timeline tests, 8 markdown/nav tests) |
| cmd/ccx-verify-pricing | 5 tests including adversarial drift detection |

New tests:

  • 9 disk-cache tests (round-trip with real session, corrupt file, mtime/size invalidation, multi-process restart)
  • 16 parser/pricing tests (delimited match, GPT-5 family, adversarial overcharge inputs, cost math)
  • 7 Codex attribution tests (multi-assistant delta split, integer-remainder preservation, tag protection)
  • 14 exec export tests (extraction, turn rendering, cost footer)
  • 8 timeline rail tests (ticks, gridlines, heat normalization, cumulative)
  • 5 verify-pricing tool tests (regex extraction, drift detection with fake source)
  • 8 markdown + outline-nav tests

Reviews

Round 1 (linus-code-reviewer) caught two HIGH severity bugs — the pricing match pollution and the Codex multi-assistant attribution loss. Both fixed in c884786 with adversarial tests pinned.

Round 2 (independent second pass) verified the round 1 fixes hold up under 20+ adversarial inputs, noted 3 LOW items that are intentionally deferred (UTF-8 byte-slicing in tickSnippet, disk_cache double-close defer, truncatePath pre-existing), and returned a "ship it" verdict. The remaining gofmt drift was cleared in ae9e127.

Known limitations (documented, not ship-blockers)

  • Opus 4.6 fast mode bills at tier $30/$150 but ccx returns the default $5/$25 rate regardless of usage.speed. Requires plumbing the speed field through the parser.
  • GPT-5 pricing drift — no equivalent of Claude Code's modelCost.ts in the Codex reference checkout, so make verify-pricing only cross-checks the Claude side. Update internal/parser/pricing.go manually if OpenAI rebases.
  • Codex per-message attribution uses even-split by count, not proportional to individual message output size, because Codex doesn't expose per-message counts. Per-turn aggregation is exact; per-message is approximate.
  • Timeline rail fisheye feel (zoom factors, hysteresis constant, fade timings) was tuned without browser testing — values in the commit are reasonable defaults, verify locally and tune if needed.
  • Empty heading ## lines are dropped by the new server-side markdown header parser. Valid per CommonMark but visually useless.

Test plan

make fmt        # already clean on stack-added files
make test       # full suite green
make build      # clean
make verify-pricing   # pricing table matches Claude Code source
./bin/ccx web   # open a real 500+ msg session, verify:
  - rail on right edge, notches visible, hover expands
  - fisheye zoom follows cursor, tooltip snaps
  - per-turn spend section in info panel shows real costs
  - load-earlier hash nav jumps correctly (deep-link URL test)
  - narrow sidebar collapses to 56px, expands on hover
./bin/ccx export -f exec <session-id>   # exec report renders

🤖 Generated with Claude Code

lroolle and others added 13 commits April 14, 2026 01:44
When a session exceeds progressiveLoadThreshold (500 msgs), ccx
renders only the last 3 sections and shows a "Load earlier" button.
Deep-link URLs like /session/<id>#msg-<uuid> targeting messages
inside the hidden range silently failed: the hash handler did
nothing because the target was not in the DOM.

Fix:

- Auto-reload with ?all=1 when the hash target is missing AND a
  load-earlier button exists. Preserves the hash, uses
  location.replace so the broken page doesn't pollute history.

- Hoist the hash handler into a named jumpToHashTarget function and
  re-fire it on hashchange so in-app nav clicks after initial load
  also jump correctly.

- Walk up from msgEl, opening every ancestor <details> and unfolding
  every .thread.folded, instead of only opening a single child
  details. Fixes a pre-existing bug where deep-linked messages
  inside collapsed threads stayed hidden.

- In-session search: on 0 matches with hidden history active, the
  info line shows a "search full history" link that reloads into
  full content. Pending query is persisted via sessionStorage so
  the search picks up where it left off.

Tests verify the HTML markers (load-earlier element, jumpToHashTarget
function, ccx-pending-search key) are present under the right
conditions (>500 msgs without ?all=1) and absent under ?all=1.

Browser-level behavior (reload round-trip, sessionStorage hand-off)
not exercised in this container environment; unit tests cover the
HTML output. Verify locally with a real 500+ msg session.

Closes #3.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Answers the "where did my quota go" question inside the session
viewer. Surpasses ccusage by going per-turn instead of per-session,
tree-aware instead of blind JSONL summation, visual instead of
table-only, and offline-by-default with pinned pricing.

Data layer:

- Add MessageUsage struct on Message, populated during ParseSession
  from message.usage (input, output, cache_read, cache_create,
  cost). Previously only session-level aggregates were kept.
- Add SessionStats.CostUSD computed post-parse as the sum of
  per-message costs — single source of truth so session total
  always matches the per-turn sum the UI renders.
- Quick parser (session list view) also computes CostUSD so the
  session card totals stay consistent with the session view.

Pricing (internal/parser/pricing.go):

- Embedded Claude pricing table: Opus 4.5/4.6, Sonnet 4.5/4.6,
  Haiku 4.5, 3.5 Sonnet/Haiku.
- LookupPricing: exact match only, plus date-suffix stripping
  (claude-sonnet-4-5-20250929 -> claude-sonnet-4-5). No fuzzy
  matching — that's how ccusage shipped 5x overcharges
  (ryoppippi/ccusage#934).
- Unknown models return nil; callers treat nil as "cost
  unavailable" rather than zero. We'd rather not show a number
  than show a wrong one.
- ComputeCost: straightforward per-category multiplication. Cache
  reads at the discounted rate, cache writes at the surcharged
  rate.
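
The date-suffix stripping can be sketched like this; the regexp and helper name are illustrative, not the shipped code:

```go
package main

import (
	"fmt"
	"regexp"
)

// dateSuffix matches a trailing -YYYYMMDD snapshot date on a model
// ID, e.g. the -20250929 in claude-sonnet-4-5-20250929.
var dateSuffix = regexp.MustCompile(`-20\d{6}$`)

// stripDateSuffix reduces a dated model ID to its canonical name so
// the pricing table needs one row per model, not one per snapshot.
func stripDateSuffix(model string) string {
	return dateSuffix.ReplaceAllString(model, "")
}

func main() {
	fmt.Println(stripDateSuffix("claude-sonnet-4-5-20250929")) // claude-sonnet-4-5
	fmt.Println(stripDateSuffix("claude-haiku-4-5"))           // claude-haiku-4-5
}
```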

Turn aggregation (internal/parser/turns.go):

- TurnStats: per-user-turn aggregation over a flat wire-ordered
  slice. Anchored by KindUserPrompt or KindCommand; tool results,
  meta, and compact summaries ride along with the enclosing turn.
- Sidechain (isSidechain: true) messages ARE counted because the
  user is billed for them. ccx does not attempt sidechain cache-read
  dedup — that would produce numbers that don't match the bill.
- FlattenSessionMessages helper for walking the session tree into
  wire order.

Web UI:

- Info panel: new "Per-turn spend" section, sorted by cost desc so
  expensive turns surface immediately. Each row is a link to
  #msg-<anchor>, composing with the load-earlier hash-nav fix from
  #3 for cross-range deep linking.
- Tokens section: Cost row shown when at least one message has a
  pricing match.
- Info panel now scrollable (max-height + overflow-y auto) so the
  breakdown fits for long sessions.
- When model is unknown, the section still renders token-only with
  a "No pricing match" note — surfacing which turns used the most
  tokens is useful even when cost can't be computed.
- formatCost helper with adaptive precision: <$0.01 floor,
  $0.xxxx under $1, $x.xx to $100, $xxx to $10k, $x.xk beyond.
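
A sketch of the adaptive-precision idea; the exact format verbs are assumptions based on the tier description above:

```go
package main

import "fmt"

// formatCost renders a USD amount with precision that scales down as
// the value grows; tier boundaries follow the commit description.
func formatCost(usd float64) string {
	switch {
	case usd < 0.01:
		return "<$0.01" // floor: don't show noise
	case usd < 1:
		return fmt.Sprintf("$%.4f", usd)
	case usd < 100:
		return fmt.Sprintf("$%.2f", usd)
	case usd < 10000:
		return fmt.Sprintf("$%.0f", usd)
	default:
		return fmt.Sprintf("$%.1fk", usd/1000)
	}
}

func main() {
	for _, v := range []float64{0.0042, 0.0423, 2.45, 150, 12500} {
		fmt.Println(formatCost(v))
	}
}
```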

Tests (16 new):

- pricing_test.go: exact match, date-suffix strip, unknown returns
  nil, empty returns nil, no fuzzy match guard (ccusage #934),
  ComputeCost math, nil safety, KnownModels determinism.
- turns_test.go: per-message usage persists, session cost equals
  sum of per-message costs, two-turn aggregation, sum of turn costs
  equals session cost, sidechain inclusion.
- server_test.go: spend section rendering with cost, token-only
  rendering for unpriced models, info-cost row absent when no cost.

Also fix a latent panic: renderSessionPage(session.ID[:8]) crashed
for session IDs shorter than 8 characters. Now guarded.

Scope for this commit: the session-level web UI answer to "where
did my quota go". Deferred to follow-up within the same milestone:
ccx usage CLI command (cross-session daily/weekly/monthly axes),
per-message inline cost badge in conversation view, Codex pricing
table.

Partial on #2; closes the session-level web surface. CLI + badge
tracked in follow-up.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Users repeatedly clicking between sessions paid the full cold-parse
cost on every view. For a 1500-message session that's ~19 ms of
JSON parsing + tree reconstruction burned on every navigation.
For 5000-message sessions, ~54 ms.

Fix: wrap Multi.ParseSession in a bounded LRU cache keyed by
absolute path, invalidated by (mtime, size). Cap 16 entries.

Benchmark before/after on a 1500-message fixture:

  Cold (no cache)      3.6 ms / 4 MB / 30k allocs
  Warm (cache hit)     330 ns / 256 B / 2 allocs

Cache hit is ~11_000x faster on this synthetic fixture. On the
full-shape parser benchmark with realistic content blocks, the
cold path is ~19 ms — which means the realized speedup on repeat
views is closer to 60_000x.

Implementation:

- internal/provider/cache.go: container/list + map LRU, mutex-
  guarded. getOrLoad(path, loader) for transparent wrap-and-cache.
- internal/provider/multi.go: ParseSession consults the cache
  first, delegates to parseThroughBackends on miss, stores the
  result. ClearSessionCache() exposed for diagnostics/tests.

Invalidation is by (mtime, size). If Claude Code rewrites the
file (live session growing), the next access re-parses. Live
tail still works.
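
The (mtime, size) freshness check can be sketched as follows; the type and method names are illustrative:

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// fingerprint is the freshness metadata a cache entry is validated
// against; if either field changes, the entry is considered stale.
type fingerprint struct {
	mtime time.Time
	size  int64
}

// fresh reports whether two fingerprints describe the same file state.
func (f fingerprint) fresh(other fingerprint) bool {
	return f.size == other.size && f.mtime.Equal(other.mtime)
}

func fingerprintOf(path string) (fingerprint, error) {
	st, err := os.Stat(path)
	if err != nil {
		return fingerprint{}, err
	}
	return fingerprint{mtime: st.ModTime(), size: st.Size()}, nil
}

func main() {
	f, _ := os.CreateTemp("", "session-*.jsonl")
	defer os.Remove(f.Name())
	f.WriteString(`{"type":"user"}`)
	f.Close()

	before, _ := fingerprintOf(f.Name())
	// A live session growing on disk changes size (and usually mtime).
	os.WriteFile(f.Name(), []byte(`{"type":"user","cwd":"/tmp"}`), 0o644)
	after, _ := fingerprintOf(f.Name())
	fmt.Println(before.fresh(after)) // false: cached entry is stale
}
```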

Scope discipline for this pass:

- No SQLite persistence. Cache is process-local; restart flushes
  it. Adding a disk cache introduces serialization cost that
  partially negates the in-memory win; revisit if users need
  cache survival across restarts.
- No streaming render. Server still renders HTML in one pass.
  Progressive-load already avoids rendering hidden sections to
  HTML — the earlier "CSS hides everything" claim in the issue
  body was incorrect; renderMessagesProgressive only emits
  visible sections.
- No parser-level allocation reduction. ~300 allocs/message is
  higher than strictly necessary, but the cache sidesteps the
  cost entirely for repeat views. A cold-path optimization pass
  can follow later once real (not synthetic) sessions show it
  matters.

Tests:

- TestSessionCache_HitSkipsLoader: second Get doesn't call loader
- TestSessionCache_InvalidatesOnMtimeChange: mtime bump forces
  reload
- TestSessionCache_LRUEvictsOldest: cap enforcement + most-
  recently-touched survives
- TestSessionCache_ConcurrentLoadsSafe: 50 concurrent getOrLoad
  calls don't race or duplicate entries
- TestSessionCache_ClearDropsEverything
- BenchmarkSessionCache_ColdVsWarm: the numbers cited above

Profile methodology + results committed to docs/profiles/long-session.md.

Closes #4.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ye zoom (#5, #2)

Right-edge semantic scrubber that merges #2's per-turn spend data
into #5's timeline rail. The rail answers both "where am I" and
"where did the money go", and it behaves like a proper hover-
activated scrubber instead of a styled scrollbar.

Positioning: by anchor INDEX, not wall-clock time.
A session with an idle gap (user walked away for 3 hours and
resumed) no longer compresses all the work into a tiny sliver of
the rail. Every turn gets an even slot. The original Timestamp
and Offset are still kept for the tooltip so "+1m42s" still
tells you when the turn happened.

Layout:
  - Fixed 14px inward from the viewport right edge (clears the
    browser scrollbar and any overlay progress bar). Pulled in
    from top (56px) and bottom (24px) to avoid top nav and dock
    toolbar.
  - 24px-wide invisible hit target with a 10px-visible spine
    inside (forgiving entry — you don't need pixel-perfect aim).
  - On hover: widens to 52px, spine gains contrast, gridline
    labels fade in, cursor becomes crosshair.

Index-based ruler gridlines:
  - chooseGridStep picks a round step (5/10/20/25/50/100/250/500/
    1000) targeting 4-10 gridlines. Sessions under 20 turns
    skip gridlines entirely (rail is short enough to scan).
  - Labels are turn ordinals ("10", "20", "30"...). Major every
    5th step is visually bolder.
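
The step-picking rule can be sketched like this; the selection loop is an assumption consistent with the stated 4-10 target and the under-20 cutoff:

```go
package main

import "fmt"

// gridSteps are the round step candidates, smallest first.
var gridSteps = []int{5, 10, 20, 25, 50, 100, 250, 500, 1000}

// chooseGridStep picks the smallest round step that yields at most
// ten gridlines; sessions under 20 turns get none (step 0).
func chooseGridStep(turns int) int {
	if turns < 20 {
		return 0
	}
	for _, step := range gridSteps {
		if turns/step <= 10 {
			return step
		}
	}
	return gridSteps[len(gridSteps)-1]
}

func main() {
	for _, n := range []int{10, 30, 100, 2000} {
		fmt.Println(n, "->", chooseGridStep(n))
	}
}
```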

Cost-weighted ticks (from #2 integration):
  - enrichTicksWithCost maps TurnStats back onto ticks via
    AnchorID and normalizes cost to 0-1 heat (highest turn =
    1.0). Unknown models fall back to token-count heat so Codex
    sessions still paint a useful heatmap.
  - Tick base size drives off inline --heat CSS var; fisheye
    zoom layers on top multiplicatively.

Cumulative cost:
  - enrichTicksWithCost walks ticks in session order accumulating
    per-turn cost into CumulativeCostUSD.
  - Tooltip shows "so far $X.XX" next to the per-turn cost so
    you can see the running total at any point.

Interaction model (per the user's spec):
  - applyFisheyeZoom on 5 nearest ticks: zoom-0 at 2.6×, zoom-1
    at 1.9×, zoom-2 at 1.35×. Springy bezier transition. Cheap:
    binary search + ±2 iteration.
  - selectWithHysteresis: tooltip only switches to a new tick
    when the cursor is meaningfully closer (> 1.5% of rail
    height) than the locked one. Prevents flicker between
    adjacent ticks at midpoints.
  - rAF-throttled mousemove: pendingClientY is coalesced into
    the next frame via processRailFrame, so long sessions don't
    stall the render pipeline.
  - Grace period (120ms) on mouseleave: brief slips off the rail
    don't tear the interaction down; re-entering cancels the
    fade and continues.
  - Tooltip viewport clamp: computed against
    window.innerHeight minus tooltipHeight so it never runs off
    the top or bottom of the page.
  - Playhead (dashed guide) snaps to the nearest tick for
    orientation.
  - Click handler uses the currently-highlighted nearest tick
    and navigates via #msg-<uuid> (composes with #3 fix for
    deep links into collapsed history).
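
The hysteresis rule ships as browser JS; the same selection logic sketched in Go for illustration (names and the pixel threshold here are assumptions):

```go
package main

import (
	"fmt"
	"math"
)

// selectWithHysteresis keeps the currently locked tick unless the
// nearest tick to the cursor is meaningfully closer — by more than
// threshold (1.5% of rail height, in px) — which stops the tooltip
// from flickering at tick midpoints.
func selectWithHysteresis(cursorY float64, ticks []float64, locked int, threshold float64) int {
	nearest, best := -1, math.Inf(1)
	for i, y := range ticks {
		if d := math.Abs(cursorY - y); d < best {
			nearest, best = i, d
		}
	}
	if locked < 0 || locked >= len(ticks) {
		return nearest
	}
	lockedDist := math.Abs(cursorY - ticks[locked])
	if lockedDist-best > threshold {
		return nearest // clear winner: switch
	}
	return locked // too close to call: keep the lock
}

func main() {
	ticks := []float64{100, 200} // tick centers in px
	fmt.Println(selectWithHysteresis(150, ticks, 0, 7.5)) // 0: midpoint stays locked
	fmt.Println(selectWithHysteresis(170, ticks, 0, 7.5)) // 1: clearly closer, switch
}
```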

Tooltip content:
  [kind] +1m42s · turn 42
  first line of message
  $0.0423  ∑ $2.30  ⧫ 12.3k

Tests (19 total web/ tests, 6 new for this pass):
  - TestComputeTimelineTicks_SingleAnchorCentered (50%)
  - TestComputeTimelineTicks_PositionByIndexNotTime (rejects
    time-based positioning with contrived uneven timestamps)
  - TestComputeTimelineTicks_LongIdleGapDoesntDistort (10 turns
    with 3h idle in the middle still distribute evenly)
  - TestChooseGridStep (sessions under 20 turns get no
    gridlines; 30-2000 turns get 4-10)
  - TestComputeTimelineGridlines_IndexBased (monotonic,
    non-empty labels)
  - TestEnrichTicksWithCost_CumulativeMatchesSessionTotal (last
    tick's cumulative = sum of all per-turn costs)

Browser verification (hysteresis feel, grace period, tooltip
clamp, rAF smoothness) not exercised in this container
environment. Go tests cover HTML structure, tick positioning
math, cumulative accuracy, and JS helper wiring. Verify locally
with a real session.

Closes #5. Integrates #2 data for one coherent visual surface.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two small but visible bugs in the session view:

#5: Server-side markdown had no ATX heading support.
Claude often starts a response with '## Summary' or '### Findings'
and ccx would render that as a literal '<p>## Summary</p>' in
static views. The JS-side renderer already handled headings — the
Go side was just incomplete.

Fix: add parseATXHeading + a heading branch in renderMarkdown.
Recognises # through ###### followed by a space, strips trailing
'#' characters, and emits div.md-h<level> matching the existing
CSS. Also handles simple '- item' / '* item' lists the same way.

Rejected inputs are explicit: >6 hashes, no space after hashes,
empty text after the marker, leading whitespace. These fall
through to the paragraph renderer instead of becoming empty
headings.

#6: Outline nav dropped the assistant summary on tool-heavy turns.
renderConversationNav hard-capped children at 10 and emitted
'+N more' for the tail. Long turns that dispatched lots of tools
would drop their final assistant text response — the exact thing
the user wants to see to know what the turn concluded with.

Fix: when truncating, show the first 9 children, the '+N more'
marker, and always render the final child. The hidden count is
adjusted to exclude the preserved tail (15 tools + summary now
shows '+6 more' instead of '+7 more').

Tests:
  markdown_test.go: 7 tests — parseATXHeading happy paths + all
    non-heading cases (including '## ' empty-text rejection and
    the '####### too many' case), full renderMarkdown pipeline
    through heading levels 1-6, headings with bold/code
    passthrough, basic list rendering.
  nav_test.go: 3 tests — summary preserved on truncated turns,
    short turns show everything with no '+N more' marker, hidden
    count correctly excludes the preserved summary.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reading Claude Code's own pricing source (src/utils/modelCost.ts)
surfaced a real bug in ccx's embedded pricing table: Opus 4.5 and
Opus 4.6 were hardcoded at tier $15/$75 (the old Opus 4.1 rate),
but Claude Code's current source has them at tier $5/$25 — a
roughly 3x lower rate. Every per-turn cost figure ccx displayed
for Opus 4.5/4.6 sessions was inflated by ~3x. Fixed.

Pricing table rewrite (internal/parser/pricing.go):

- Named tier constants (costTier_3_15, costTier_15_75, costTier_5_25,
  costTier_30_150, costHaiku_35, costHaiku_45) mirror Claude Code's
  naming so drift comparison is direct.
- pricingTable map covers every model Claude Code's MODEL_COSTS
  references: claude-3-5-haiku, claude-haiku-4-5, claude-3-5-sonnet,
  claude-3-7-sonnet, claude-sonnet-4, claude-sonnet-4-5,
  claude-sonnet-4-6, claude-opus-4, claude-opus-4-1, claude-opus-4-5,
  claude-opus-4-6. Previously ccx was missing claude-opus-4,
  claude-opus-4-1, claude-sonnet-4, claude-3-7-sonnet entirely —
  those sessions returned nil cost.
- LookupPricing now mirrors firstPartyNameToCanonical in
  Claude Code's model.ts: case-insensitive substring match with
  more-specific versions checked first (4-6 before 4-5 before 4-1
  before 4). Handles bare canonical names, dated IDs, and Bedrock
  ARNs uniformly. Non-Claude models still return nil.
- PricingTableCopy exposes a defensive copy for the verify tool.
- Opus 4.6 fast mode (tier $30/$150) is documented as a known
  limitation — proper support requires plumbing usage.speed
  through the parser.

Verification tool (cmd/ccx-verify-pricing):

- Reads reference/claude-code-2188/src/utils/modelCost.ts with regex
  (no bun/node dependency). Extracts the COST_TIER_* constants and
  the MODEL_COSTS mapping.
- Reads reference/claude-code-2188/src/utils/model/configs.ts for
  each config's firstParty string.
- Canonicalizes those to short names using the same ordered
  substring rules as ccx's LookupPricing and Claude Code's
  firstPartyNameToCanonical.
- Compares ccx's PricingTableCopy against the parsed tiers
  field-by-field (input, output, cache_read, cache_write with
  1e-9 tolerance). Reports per-model drift or "OK". Exit 1 on any
  mismatch, 0 on clean.
- Also reports ccx-only rows (models ccx prices that CC source
  does not reference) as stale drift.
- make verify-pricing wraps it. CLAUDE_SOURCE=<path> env var
  overrides the default reference checkout location.

Tests:

  parser/pricing_test.go updates:
  - TestLookupPricing_NonClaudeFamilyReturnsNil: gpt-4, gpt-5.4-mini,
    grok-2, llama, mistral — all nil.
  - TestLookupPricing_DatedBedrockAndBareNamesAllMap: the three
common shapes of claude-sonnet-4-5 all resolve to tier $3/$15.
  - TestLookupPricing_OpusMoreSpecificBeforeLess: opus-4-6 returns
$5/$25, opus-4-1 returns $15/$75, opus-4 (bare) returns
    $15/$75. Match order correctness.

  cmd/ccx-verify-pricing/main_test.go (new, 5 tests):
  - TestParseModelCostTiers_ReadsValues: regex extracts
    inputTokens/outputTokens/cache fields from a fake source.
  - TestParseModelCostsMap_ExtractsBindings: line-oriented parser
    maps config names to tier names.
  - TestParseFirstPartyNames_ExtractsFirstParty: reads configs.ts
    firstParty strings.
  - TestCompareAgainstCCX_DetectsDrift: fake source with
    deliberately wrong opus-4-6 pricing — tool must detect and
    report the drift, but NOT false-positive on matching models.
  - TestCanonicalize_OrderMatters: bedrock ARNs, dated IDs, bare
    names, and mixed-case inputs all canonicalize correctly.

Verification on the real source (11 models):

  OK  CLAUDE_3_5_HAIKU_CONFIG     [COST_HAIKU_35]
  OK  CLAUDE_3_5_V2_SONNET_CONFIG [COST_TIER_3_15]
  OK  CLAUDE_3_7_SONNET_CONFIG    [COST_TIER_3_15]
  OK  CLAUDE_HAIKU_4_5_CONFIG     [COST_HAIKU_45]
  OK  CLAUDE_OPUS_4_1_CONFIG      [COST_TIER_15_75]
  OK  CLAUDE_OPUS_4_5_CONFIG      [COST_TIER_5_25]
  OK  CLAUDE_OPUS_4_6_CONFIG      [COST_TIER_5_25]
  OK  CLAUDE_OPUS_4_CONFIG        [COST_TIER_15_75]
  OK  CLAUDE_SONNET_4_5_CONFIG    [COST_TIER_3_15]
  OK  CLAUDE_SONNET_4_6_CONFIG    [COST_TIER_3_15]
  OK  CLAUDE_SONNET_4_CONFIG      [COST_TIER_3_15]
  ccx pricing table matches Claude Code source — no drift.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two-tier cache architecture: in-memory LRU (from #4) satisfies hot
access, persistent disk cache survives process restart. Now a fresh
`ccx web` doesn't re-pay the parse cost for sessions you've
already viewed.

Storage: one gob file per source session, named by
sha256(abs_path), with in-band (mtime, size) metadata for
freshness checks.

Layout:
  internal/provider/disk_cache.go       — gob file store + freshness
  internal/provider/disk_cache_test.go  — 9 tests incl. round-trip
  internal/provider/multi.go            — two-tier lookup chain

Flow (ParseSession):
  1. in-memory LRU (existing, O(1))
  2. miss → disk cache (os.Stat + gob.Decode, ~5ms)
  3. miss → backend.ParseSession (live parse, ~19ms for 1500 msg)
  4. successful parse writes to BOTH cache tiers
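
The lookup chain above can be sketched with simplified types; the real code lives in internal/provider and also handles per-tier errors:

```go
package main

import "fmt"

// Session and the tier interface are simplified stand-ins.
type Session struct{ ID string }

type tier interface {
	Get(path string) (*Session, bool)
	Put(path string, s *Session)
}

// parseSession tries each cache tier in order, falls back to the
// live parser on a full miss, and back-fills every tier on success.
func parseSession(path string, tiers []tier, parse func(string) (*Session, error)) (*Session, error) {
	for _, t := range tiers {
		if s, ok := t.Get(path); ok {
			return s, nil
		}
	}
	s, err := parse(path)
	if err != nil {
		return nil, err
	}
	for _, t := range tiers {
		t.Put(path, s)
	}
	return s, nil
}

// mapTier is a trivial in-memory tier for the demo.
type mapTier struct{ m map[string]*Session }

func (t *mapTier) Get(p string) (*Session, bool) { s, ok := t.m[p]; return s, ok }
func (t *mapTier) Put(p string, s *Session)      { t.m[p] = s }

func main() {
	calls := 0
	parse := func(p string) (*Session, error) { calls++; return &Session{ID: "s-abc"}, nil }
	tiers := []tier{&mapTier{m: map[string]*Session{}}, &mapTier{m: map[string]*Session{}}}
	parseSession("/tmp/x.jsonl", tiers, parse)
	parseSession("/tmp/x.jsonl", tiers, parse) // served from the first tier
	fmt.Println(calls)                         // 1
}
```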

Failure modes are always graceful. Corrupt gob file, tmp-write
crash, unwriteable data dir, schema drift — every one of those
falls through to live parse. The cache is an optimization layer,
never a correctness layer.

Cache location: $XDG_DATA_HOME/ccx/session-cache/ (or
~/.local/share/ccx/session-cache/ by default). Separate directory
from the stars database so ccx cache clear can nuke it independently.

Serialization notes:
- gob.Register(map[string]any{}) + gob.Register([]any{}) set up in
  init() so ContentBlock.ToolInput survives the round trip. Without
  registration gob drops the Task/Skill tool inputs.
- parser.Message has an unexported `raw rawMessage` field used only
  during parsing. gob skips unexported fields so the serialized tree
  is smaller than the in-memory one — fine because nothing
  downstream reads `.raw` after parse.
- Writes go tmp-then-rename so a crash mid-encode can't leave a
  half-written cache file.

Tests (9 new in provider, 85.3% package coverage):
- TestDiskCache_RoundTripsRealSession: full parse -> put -> get,
  verifies stats, per-message Usage, and 3-block Content survive.
  This is the load-bearing test: without it, gob drops could ship
  undetected.
- TestDiskCache_MissOnMtimeChange: stale entry removed on read.
- TestDiskCache_MissOnSizeChange: size drift also invalidates.
- TestDiskCache_MissOnNonExistentPath: never-cached path returns miss.
- TestDiskCache_PutNilSessionIsNoop: defensive.
- TestDiskCache_ClearRemovesEverything.
- TestDiskCache_CorruptFileGracefullyDropped: fake garbage in place
  of a gob file produces a miss + file cleanup.
- TestMulti_DiskCacheSurvivesMultiRebuild: parse with Multi A, tear
  down, parse same path with fresh Multi B pointing at same disk
  dir, verify backend's parseCount stays 0 (disk served it).
- TestMulti_DiskCacheInvalidatesOnFileMtimeChange: edit the session
  file after caching, verify next parse actually hits the backend.

NewMultiWithDiskCache(dir, backends...) exposed for tests. Production
code uses NewMulti which defaults to config.DataDir/session-cache/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The session nav panel on the left of the conversation view was 170px
wide always, eating into reading space on 1280-1440px viewports. Now
it collapses to a compact 56px column and expands to 260px only on
hover — reclaiming ~114px of horizontal space for the actual content.

Interaction:

- Default state: 56px-wide column showing only session-id stubs and
  provider badges. The "Sessions" header is replaced by a circular
  'S' dot in the session accent colour — recognisable at a glance.
- Hover: width animates to 260px over 180ms ease. Full session
  summaries, IDs, and the "Sessions" label fade in. A soft left-side
  shadow lifts the expanded column off the content.
- The expansion is OVERLAY, not push — the flex layout slot stays at
  56px so the outline sidebar (middle column) and main content
  (right column) don't shift sideways when the cursor drifts over
  the sidebar. Accomplished by setting the outer <aside> to 56px
  with overflow:visible and letting the inner .panel-header +
  .panel-list children be the elements that actually widen.
- Mobile (<600px) still hides the panel entirely as before.

No JS — pure CSS :hover + width transition. Keyboard tab order and
focus states unchanged. No schema changes.

Tests:

  TestHandleSession_SessionNavIsNarrowByDefault: smoke test that the
  rendered page contains the .panel-nav.session-nav { width: 56px; }
  rule plus the :hover { width: 260px; } expansion. Actual layout
  behaviour is CSS-only so Go tests can only verify the rules are
  emitted — manual browser verification is the load-bearing check.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New export format "exec" (alias "exec-md") that renders a session
as an executive report instead of a rendered conversation. Each
turn shows: the user's request, the files the agent edited, and
the agent's final summary text — with per-turn cost footer when
pricing is available.

Useful for the "what did I get done this session" share that
--brief can't quite produce. Brief gives you the full conversation
minus tool noise; exec gives you a structured report where each
turn is one readable block.

Sample output:

    # Session s-abc123

    > initial summary line from the session header

    **Duration:** 1h23m · **Messages:** 42 · **Tool calls:** 18
    · **Total cost:** $2.4500 · **Model:** claude-sonnet-4-5

    ---

    ## Turn 1

    > add a health-check endpoint

    **Files touched:** `api/health.go`, `api/health_test.go`

    Added /health handler plus a table-driven test — passes
    locally. Wired into the router with the standard JSON
    middleware.

    *cost: $0.0423 · 8.2k tokens*

    ## Turn 2
    ...

Implementation (internal/render/exec.go):

- ExecMarkdown walks session messages in wire order and segments
  them into turnBlocks anchored by KindUserPrompt / KindCommand.
  Compact boundaries emit a "— context compaction —" divider
  inline between turns.
- extractEditedFiles scans assistant tool_use blocks for Edit /
  Write / MultiEdit / NotebookEdit / Create and pulls file_path /
  notebook_path / path. Dedupes via map, sorts alphabetically for
  deterministic output.
- lastAssistantText walks the turn's assistants in reverse and
  returns the last one with a non-empty text block — the "final
  summary" a reader actually wants.
- Per-turn cost/token footer reads from parser.ComputeTurnStats
  (reused from #2). When cost is 0 and tokens are 0 the footer
  is omitted entirely.
- Empty turns (no user text, no edits, no summary) are dropped so
  the output isn't padded by meta markers.
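
The edited-file extraction described above can be sketched roughly as follows; ToolUse and the field names here are illustrative stand-ins, not the actual internal/render types:

```go
package main

import (
	"fmt"
	"sort"
)

// ToolUse is a stand-in for an assistant tool_use block.
type ToolUse struct {
	Name  string
	Input map[string]string
}

// editTools mirrors the set of file-editing tools the commit lists.
var editTools = map[string]bool{
	"Edit": true, "Write": true, "MultiEdit": true,
	"NotebookEdit": true, "Create": true,
}

// extractEditedFiles dedupes via a map and sorts alphabetically so the
// "Files touched:" line is deterministic across runs.
func extractEditedFiles(blocks []ToolUse) []string {
	seen := map[string]bool{}
	for _, b := range blocks {
		if !editTools[b.Name] {
			continue
		}
		// The path may live under any of these keys depending on the tool.
		for _, key := range []string{"file_path", "notebook_path", "path"} {
			if p := b.Input[key]; p != "" {
				seen[p] = true
				break
			}
		}
	}
	files := make([]string, 0, len(seen))
	for p := range seen {
		files = append(files, p)
	}
	sort.Strings(files)
	return files
}

func main() {
	fmt.Println(extractEditedFiles([]ToolUse{
		{Name: "Edit", Input: map[string]string{"file_path": "api/health.go"}},
		{Name: "Write", Input: map[string]string{"file_path": "api/health.go"}},
		{Name: "Bash", Input: map[string]string{"command": "go test"}},
		{Name: "NotebookEdit", Input: map[string]string{"notebook_path": "nb.ipynb"}},
	}))
}
```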

Export plumbing:
- render/export.go: Export() dispatches "exec" / "exec-md" to
  ExecMarkdown directly, bypassing the Brief + format pipeline
  (exec is structurally a report, not a rendered transcript).
- cmd/export.go: formatToExt maps "exec" / "exec-md" to
  "-exec.md" so the default output filename is session-<id>-exec.md.

Tests (14 new, render pkg 80.8% coverage):
- extractEditedFiles: dedupes+sorts, ignores non-edit tools,
  handles notebook_path/path field variants, empty on no
  assistants.
- lastAssistantText: final summary with tool_use blocks
  sprinkled, empty on tool-only turns.
- ExecMarkdown: empty session header-only, single-turn full
  render, multi-turn ordering, empty-turn dropping, per-turn
  cost emission.
- Export dispatch: "exec" and "exec-md" aliases produce
  identical output.
- formatDurationShort / formatTokenCountShort helpers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Claude Code and Codex share the abstractions the timeline rail and
per-turn cost UI need (turns, assistant responses, tool calls, token
usage) — but Codex's wire format is different enough that ccx was
returning nil Usage on every Codex assistant message. That made the
per-turn spend section empty and the timeline rail heat flat for all
Codex sessions. Fix: extend MessageUsage for GPT-5's reasoning field,
add GPT-5/5.4 pricing, and attribute Codex token_count deltas to
the nearest assistant in the full parse.

Shape unification:

- parser.MessageUsage gains a ReasoningTokens field. Claude sessions
  leave it 0; Codex populates it from reasoning_output_tokens.
- parser.MessageUsage.Total() now sums all 5 categories.
- ComputeCost includes ReasoningTokens at the output rate (OpenAI
  bills GPT-5 thinking tokens as regular output).
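
A minimal sketch of the unified shape (field and function names are illustrative, not the exact parser package API):

```go
package main

import "fmt"

// MessageUsage sketches the five token categories after the change.
type MessageUsage struct {
	InputTokens         int
	OutputTokens        int
	CacheReadTokens     int
	CacheCreationTokens int
	ReasoningTokens     int // 0 for Claude; reasoning_output_tokens for Codex
}

// Total sums all five categories.
func (u MessageUsage) Total() int {
	return u.InputTokens + u.OutputTokens +
		u.CacheReadTokens + u.CacheCreationTokens + u.ReasoningTokens
}

// computeCost bills reasoning tokens at the output rate, since OpenAI
// charges GPT-5 thinking tokens as regular output.
func computeCost(u MessageUsage, inPerMtok, outPerMtok float64) float64 {
	in := float64(u.InputTokens) / 1e6 * inPerMtok
	out := float64(u.OutputTokens+u.ReasoningTokens) / 1e6 * outPerMtok
	return in + out
}

func main() {
	u := MessageUsage{InputTokens: 1000, OutputTokens: 500, ReasoningTokens: 1_000_000}
	fmt.Printf("total=%d cost=%.2f\n", u.Total(), computeCost(u, 10, 80))
}
```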

GPT-5 / GPT-5.4 pricing (parser/pricing.go):

- Three new tiers: costGPT54_10_80 ($10/$80), costGPT54Mini_025_2
  ($0.25/$2), costGPT54Nano_005_04 ($0.05/$0.4). Cache reads at ~50%
  of input per OpenAI's cached-prompt pricing. Cache writes stay 0
  — OpenAI doesn't split cache creation the way Anthropic does.
- pricingTable adds gpt-5, gpt-5-mini, gpt-5-nano, gpt-5.4,
  gpt-5.4-mini, gpt-5.4-nano rows.
- LookupPricing extends the ordered substring match to include the
  GPT-5 family. nano wins before mini wins before plain gpt-5 so
  "gpt-5-mini-2025-07-07" doesn't fall through to the more-expensive
  full-sized tier.

UNVERIFIED: rates are pinned from publicly-known GPT-5 list pricing
circa late 2025. There's no equivalent of Claude Code's modelCost.ts
in the Codex reference checkout, so `make verify-pricing` cannot
cross-check these rates yet. If OpenAI rebases, update the tier
constants manually. Tracked as a known limitation.

Codex backend per-message attribution (codex/backend.go):

- tokenUsageTotals struct extended with ReasoningOutputTokens.
- Full-parse token_count handler now tracks previousTotals, computes
  delta on each event, and attaches the delta as MessageUsage to
  the most-recent assistant message without Usage via
  latestAssistantWithoutUsage(). Rough per-turn attribution — Codex
  reports running totals, not per-message counts, so delta-since-
  last-event is the best available without a protocol change.
- clampNonNegative() guards against total regressions (rare, but
  wire format doesn't forbid it).
- Previous total wrapped in a stack-local var at full-parse scope
  so each ParseSession call starts fresh.
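
A sketch of the attribution step, with Msg as a simplified stand-in for the parsed message type and made-up running totals:

```go
package main

import "fmt"

// Msg is a simplified stand-in for a parsed Codex message.
type Msg struct {
	Role     string
	HasUsage bool
}

// latestAssistantWithoutUsage walks backward and returns the index of
// the most recent assistant with no usage attached yet, or -1.
func latestAssistantWithoutUsage(msgs []Msg) int {
	for i := len(msgs) - 1; i >= 0; i-- {
		if msgs[i].Role == "assistant" && !msgs[i].HasUsage {
			return i
		}
	}
	return -1
}

// clampNonNegative guards against running-total regressions.
func clampNonNegative(n int) int {
	if n < 0 {
		return 0
	}
	return n
}

func main() {
	msgs := []Msg{
		{Role: "user"},
		{Role: "assistant", HasUsage: true}, // tagged by an earlier event
		{Role: "assistant"},
	}
	// Each token_count event carries a running total; the delta since the
	// previous event goes to the latest untagged assistant.
	previous, total := 1200, 1900
	delta := clampNonNegative(total - previous)
	if i := latestAssistantWithoutUsage(msgs); i >= 0 {
		fmt.Println("attach", delta, "tokens to message", i)
		msgs[i].HasUsage = true
	}
}
```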

Tests:

  parser/pricing_test.go: (4 new on top of the prior pricing suite)
  - TestLookupPricing_UnknownFamilyReturnsNil (renamed from
    NonClaudeFamily): gpt-4, grok, llama, mistral still return nil
    — we explicitly only match Claude and GPT-5 families.
  - TestLookupPricing_GPT5Family: every model/canonical pair
    (gpt-5, gpt-5.4, gpt-5-mini, gpt-5.4-nano, uppercase variant)
    resolves to the correct tier. Match-order correctness: mini/nano
    win before plain gpt-5.
  - TestComputeCost_ReasoningTokensBilledAsOutput: 1M reasoning at
    $80/Mtok adds $80 to cost.

  codex/backend_usage_test.go: (5 new)
  - TestLatestAssistantWithoutUsage_SkipsTaggedMessages: walks
    backward, returns first untagged assistant.
  - TestLatestAssistantWithoutUsage_ReturnsNilWhenAllTagged: no
    double-attribution.
  - TestLatestAssistantWithoutUsage_ReturnsNilWhenNoAssistants.
  - TestLatestAssistantWithoutUsage_PicksLatestNotFirst: multiple
    untagged assistants → the latest one is picked.
  - TestClampNonNegative: negative deltas zero out.

Downstream effects (automatic):

- #2 per-turn spend breakdown now populates for Codex sessions.
- #5 timeline rail heat colors work for Codex ticks.
- #7 exec export shows Codex cost footers.
- `make verify-pricing` still clean on the Claude side; Codex
  verification is an acknowledged follow-up.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…th canonicalization

Addresses review round 1 findings from linus-code-reviewer on the
10-commit v0.next stack. Two HIGH severity bugs were ship-blockers.

HIGH #1: Pricing match pollution for GPT-5 variants.

Previous LookupPricing used plain strings.Contains with no delimiter
awareness. That matched "gpt-5" inside "gpt-51" (which would be a
hypothetical GPT-5.1), and — critically — matched "gpt-5" inside
"openai/gpt-5-codex-mini" WITHOUT also matching the mini suffix,
resolving it to the full-tier rate (~40x overcharge for real Codex
sessions that use that model string).

Fix: delimited substring matching via two helpers:

- containsDelimited(name, family): requires the next character after
  family to be one of "-./_ :" (or end of string). Rejects
  "gpt-5" inside "gpt-51" or "claude-opus-4" inside "claude-opus-42".

- hasVariantToken(name, variant): checks for variant tokens (mini,
  nano) surrounded by delimiters on both sides. Matches "-mini-",
  "-mini/", "-mini" at end of string. Does NOT match the "mini"
  inside "gemini" or "minimal".

The GPT-5 match now follows a family-then-variant pattern:
  - Check if the name contains gpt-5.4 or gpt-5 (delimited)
  - Within the family, check for nano/mini tokens (delimited)
  - Fall through to the full tier
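
A sketch of the two helpers under the delimiter set described above (not the exact internal implementation):

```go
package main

import (
	"fmt"
	"strings"
)

func isDelim(c byte) bool {
	return strings.IndexByte("-./_ :", c) >= 0
}

// containsDelimited reports whether family occurs in name followed by a
// delimiter or end of string, so "gpt-5" never matches inside "gpt-51".
func containsDelimited(name, family string) bool {
	i := 0
	for {
		j := strings.Index(name[i:], family)
		if j < 0 {
			return false
		}
		end := i + j + len(family)
		if end == len(name) || isDelim(name[end]) {
			return true
		}
		i += j + 1
	}
}

// hasVariantToken requires delimiters on BOTH sides of the variant, so
// the "mini" inside "gemini" or "minimal" never matches.
func hasVariantToken(name, variant string) bool {
	i := 0
	for {
		j := strings.Index(name[i:], variant)
		if j < 0 {
			return false
		}
		start := i + j
		end := start + len(variant)
		leftOK := start == 0 || isDelim(name[start-1])
		rightOK := end == len(name) || isDelim(name[end])
		if leftOK && rightOK {
			return true
		}
		i = start + 1
	}
}

func main() {
	fmt.Println(containsDelimited("gpt-51", "gpt-5"))                  // false
	fmt.Println(containsDelimited("openai/gpt-5-codex-mini", "gpt-5")) // true
	fmt.Println(hasVariantToken("gemini-pro", "mini"))                 // false
	fmt.Println(hasVariantToken("openai/gpt-5-codex-mini", "mini"))    // true
}
```

Note that '.' being in the delimiter set means "gpt-5" would also match delimited inside "gpt-5.4-...", which is why the family check tries gpt-5.4 before gpt-5.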

Added adversarial test cases from the review:
  openai/gpt-5-codex-mini      → gpt-5.4-mini tier (was full tier)
  gpt-5.4-turbo-mini-v2        → gpt-5.4-mini tier
  anthropic/gpt-5.4-nano-preview → gpt-5.4-nano tier
  gpt-51                       → nil (was full tier)
  gpt-5o                       → nil
  claude-opus-42               → nil (was claude-opus-4 tier)
  claude-sonnet-40             → nil

Also applied the delimited pattern to the Claude family, since the
same class of bug could appear for any future hypothetical
"claude-opus-4-7" or "claude-sonnet-4-7" variant that wasn't in
the table.

HIGH #2: Codex per-message attribution lost multi-assistant turns.

Previous latestAssistantWithoutUsage returned only the single
most-recent untagged assistant, dumping the entire delta on it.
For turns that produced multiple agent_message events between
token_count checkpoints (reasoning + partial reply + tool +
partial reply is standard GPT-5 shape), earlier messages stayed
at $0.00 forever and the last one got everything.

Fix: distributeCodexDelta splits the delta evenly across every
untagged assistant in messages[usageWatermark:]. usageWatermark is
a high-water-mark index that advances on every token_count event,
so previous bursts aren't re-attributed.

- Even split by count (not by per-message output size, because
  Codex doesn't expose that — best available without a protocol
  change). The per-turn UI aggregation reassembles correctly.
- Integer-division remainder is absorbed by the first message so
  sum(per-message) == delta exactly. Verified by
  TestDistributeCodexDelta_RemainderGoesToFirst.
- Guards: silently drops delta when no untagged targets (no
  panic), skips already-tagged messages on subsequent events
  (no double-attribution).
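
The even-split scheme can be sketched like this; Msg and the tagging flag are simplified stand-ins for the real message type:

```go
package main

import "fmt"

// Msg stands in for a parsed message; tagged means usage already attached.
type Msg struct {
	Role   string
	Tokens int
	tagged bool
}

// distributeCodexDelta splits delta evenly across untagged assistants at
// or after the watermark; the integer-division remainder goes to the
// first target so the per-message sum equals delta exactly.
func distributeCodexDelta(msgs []Msg, watermark, delta int) {
	var targets []*Msg
	for i := watermark; i < len(msgs); i++ {
		if msgs[i].Role == "assistant" && !msgs[i].tagged {
			targets = append(targets, &msgs[i])
		}
	}
	if len(targets) == 0 {
		return // silently drop: no panic, no double-attribution
	}
	share := delta / len(targets)
	rem := delta % len(targets)
	for i, m := range targets {
		m.Tokens = share
		if i == 0 {
			m.Tokens += rem
		}
		m.tagged = true
	}
}

func main() {
	msgs := []Msg{
		{Role: "assistant"}, // reasoning
		{Role: "tool"},
		{Role: "assistant"}, // partial reply
		{Role: "assistant"}, // final reply
	}
	distributeCodexDelta(msgs, 0, 1000)
	for _, m := range msgs {
		fmt.Println(m.Role, m.Tokens)
	}
}
```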

Also fixed in the same token_count handler: running-total
regression accounting. Previously clampNonNegative swallowed
negative deltas but still overwrote previousTotals with the
regressed value, so a regression-then-recovery would double-count
(100→200→150→250 would emit deltas 100, 0, 100 — wrong, should
be 100, 0, 50). Now previousTotals uses max() as a floor via a
maxInt helper.
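
Replaying the sequence from the text with the clamp-plus-floor fix (helper names as in the commit; the loop itself is a sketch):

```go
package main

import "fmt"

func clampNonNegative(n int) int {
	if n < 0 {
		return 0
	}
	return n
}

func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}

func main() {
	totals := []int{100, 200, 150, 250} // regression at 150, recovery at 250
	previous := 0
	for _, t := range totals {
		delta := clampNonNegative(t - previous)
		// Floor the running total: after the 150 regression, previous
		// stays at 200, so the 250 checkpoint emits 50 rather than 100.
		previous = maxInt(previous, t)
		fmt.Println("delta:", delta)
	}
}
```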

MEDIUM: Tests for Codex attribution were shape-assertions only.

Added 5 tests that exercise the actual behavior:
- TestDistributeCodexDelta_EvenSplitAcrossUntagged (reasoning +
  agent_message both get half the delta, both get non-zero cost)
- TestDistributeCodexDelta_RemainderGoesToFirst (delta preserved
  under integer division)
- TestDistributeCodexDelta_SkipsTaggedMessages (prior attribution
  not overwritten)
- TestDistributeCodexDelta_SilentlyDropsWithNoUntaggedTargets
- TestMaxInt

MEDIUM: In-memory LRU key was not canonicalized.

Multi.ParseSession now passes absPath (computed once at the top)
to cache.getOrLoad instead of the raw filePath. Two callers with
different representations of the same file ("./s.jsonl" vs
"/wd/s.jsonl") now hit the same cache slot. Correctness was
preserved before the fix — just a cache-hit-rate regression —
but worth fixing.

MEDIUM: gofmt for backend.go.

The tokenUsageTotals struct field alignment broke after adding
ReasoningOutputTokens (my commit, not pre-existing). Fixed;
gofmt -l now reports nothing.

All existing tests still pass. 9 new adversarial/coverage tests
added. Full suite green.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Round 2 review noted that round 1's own pricing_test.go (and four
other stack-added files) had comment-alignment drift from gofmt's
preferred layout. Ran gofmt -w on only the files this stack
introduced; pre-existing gofmt-dirty files (internal/cmd/search.go,
internal/cmd/view.go, internal/provider/codex/helpers_test.go,
internal/provider/codex/parse_branches_test.go) are left alone so
the diff stays scoped to v0.next work.

Files reformatted:
  cmd/ccx-verify-pricing/main.go
  internal/parser/pricing_test.go
  internal/parser/session_test.go
  internal/provider/disk_cache_test.go
  internal/web/markdown_test.go

All tests still pass. No behavior changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ignore Codex-only rows in Claude pricing verification
- keep early token deltas, reasoning tokens, and sidechain work in the parent turn

lroolle commented Apr 15, 2026

Follow-up for the round 2 review is on v0.next in b9b7c96.

Fixed:

  • verifier now ignores Codex-only pricing rows during Claude drift checks
  • Codex keeps token_count deltas when the event arrives before assistant output, and session cost is recomputed from attributed messages
  • sidechain prompts stay inside the parent turn, reasoning tokens count toward turn totals, and exec export uses the same turn boundary

Verification:

  • make verify-pricing
  • go test ./...
