feat: usage dashboard with token stats and cost tracking#15
Merged
Conversation
…ng indexing Add TokenStatRow struct and Database::replace_token_stats() to write per-(date, model) token usage aggregates to session_token_stats table. Compute token stats from parsed messages during both full reindex (Indexer) and single-source sync (SourceSyncService), grouping by date extracted from message timestamps and model name.
Adds a Tauri command that queries session_token_stats joined with sessions to return aggregated usage statistics including totals, daily usage by provider, per-model costs, per-project costs, and recent session cost breakdowns. Uses two-step queries for project and session costs to ensure accurate per-model pricing.
Dashboard panel with range selector (7d/30d/90d/All), toggleable provider chips, cost summary hero, CSS-only stacked daily bar chart, and sortable tables for cost-by-model, cost-by-project, and recent sessions.
This updates the usage pipeline to match verified provider log behavior more closely, especially for Claude and Codex. The data path now deduplicates Claude usage entries, groups by local-day boundaries, persists provider-scoped usage costs, and derives Codex token deltas from cumulative token_count events. The UI pass also fixes the clear-usage confirmation flow, adds scoped usage clearing, improves chart inspection, and makes usage caveats visible where the pricing table is still best-effort. Constraint: Existing sessions.db data was produced by older aggregation rules and requires a rebuild to reflect corrected usage rows Constraint: Codex reference output separates cached and reasoning token display, while the current app UI still shows a simplified aggregate view Rejected: Keep using UTC date slicing for daily grouping | did not match verified local-day report output Rejected: Pull pricing live at runtime from external sources | adds instability and network dependency to a desktop index view Confidence: high Scope-risk: moderate Reversibility: clean Directive: If usage totals drift again, validate raw provider logs first and compare against a trusted external report before changing UI math Tested: TypeScript typecheck, ESLint, cargo test, cargo clippy -D warnings, provider/reference parity checks for Claude totals and single-project Claude totals Not-tested: End-to-end visual verification after local database rebuild in the desktop app
This switches pricing application from hand-maintained local compatibility rules to cached remote model-rate catalogs, and moves catalog refresh control into the usage workflow where users actually verify cost output. The usage surface now exposes pricing freshness directly and offers a single refresh-usage action that clears cached usage rows and immediately rebuilds the index so updated prices take effect without extra navigation. Constraint: Stored usage costs are persisted in the local database, so catalog updates only affect already-indexed sessions after a rebuild Constraint: Remote pricing availability can fail at runtime, so the app must tolerate an empty catalog and mark affected models as unpriced Rejected: Keep refresh-pricing only in Settings | disconnected from the place users validate cost output Rejected: Continue relying on embedded compatibility prices as fallback | user requested direct catalog-driven pricing instead Confidence: high Scope-risk: moderate Reversibility: clean Directive: When changing pricing behavior, verify both catalog fetch/storage and post-rebuild usage totals; do not silently mix stale and fresh price sources Tested: TypeScript typecheck, ESLint, cargo test, cargo clippy -D warnings Not-tested: Manual desktop-app interaction after refreshing catalog and refreshing usage on a live database
This shifts index rebuild and usage refresh flows to background maintenance jobs with event notifications so the desktop UI no longer blocks on long maintenance actions. The app now surfaces running-state banners, toast-based lifecycle feedback, automatic usage refreshes after maintenance completes, and uses the same async path for the manual rebuild shortcut. It also tightens destructive affordances by moving index clearing onto the app's confirm dialog and clarifies index controls with shortcut guidance. Constraint: Maintenance tasks update shared local state, so concurrent rebuild/refresh actions must be serialized Constraint: Usage panels should stay responsive even when full reindex work is expensive on large local histories Rejected: Keep foreground await-based rebuild/refresh interactions | caused visible UI stalls and poor feedback during long-running maintenance Rejected: Introduce a full queued job manager now | unnecessary overhead for the current single-maintenance-lane need Confidence: high Scope-risk: moderate Reversibility: clean Directive: Preserve the maintenance-status event contract if you expand background jobs; UI refresh logic now depends on these phases Tested: TypeScript typecheck, ESLint, cargo test, cargo clippy -D warnings Not-tested: Extended manual UX soak test with multiple rapid maintenance actions in the packaged desktop app
Incremental usage refresh now shares the same cross-file usage dedup path as full reindexing, and pricing lookup uses a unified, provider-agnostic matching flow instead of region-specific prefix heuristics. The pricing status badge now reports upstream model counts rather than alias-expanded cache entries so the dashboard reflects what was actually fetched. Constraint: Preserve existing cached pricing catalog format in SQLite Rejected: Keep DOMESTIC_PREFIX_MAP for targeted fixes | bakes region-specific policy into generic lookup Rejected: Derive displayed model count from alias-expanded catalog | overcounts by design Confidence: high Scope-risk: narrow Reversibility: clean Directive: Keep alias precedence separate from runtime lookup heuristics; they solve different problems Tested: cargo fmt; cargo clippy --all-targets --all-features -- -D warnings; cargo test; npx tsc --noEmit; npm run lint Not-tested: Manual UI verification of usage dashboard refresh flow
Usage aggregation now carries parser-level usage hashes through the shared message model so cross-file dedup has stable metadata across providers. Recent session queries also exclude child sessions so the usage dashboard and editor recents stay focused on top-level threads. Constraint: Parser message structs must stay backward-compatible for serialization Rejected: Add provider-specific dedup state outside Message | duplicates bookkeeping across parsers Confidence: high Scope-risk: narrow Reversibility: clean Directive: When adding a new provider parser, initialize usage_hash explicitly and attach it where provider logs expose stable usage IDs Tested: cargo test Not-tested: Manual verification of recents UI with nested sidechains
The usage dashboard now surfaces pricing freshness and usage refresh state in the toolbar instead of hiding them in a transient banner. Project paths are shortened for readability, timestamps render in a stable numeric format, and the table/card styling is tuned to better match the intended hierarchy without changing the underlying data. Constraint: Keep the panel compatible with existing maintenance event wiring Rejected: Leave refresh status in a dismissive banner | easy to miss once the page gets dense Rejected: Keep locale-dependent datetime formatting | made status chips jitter across environments Confidence: high Scope-risk: narrow Reversibility: clean Directive: Keep status metadata concise in the toolbar; avoid reintroducing long prose that pushes controls below the fold Tested: npx tsc --noEmit; npm run lint Not-tested: Manual visual QA against the usage dashboard mockup
- Remove trivial `provider_to_str()` wrapper, inline `provider.key()` - Remove unused `compute_token_stats_with_catalog()` intermediate layer - Remove unreachable `conditions.is_empty()` branch in `build_usage_where` - Remove unused `clear_usage_stats_for_providers` and its test - Remove unused `PRICING_SOURCE_LABEL` and `source_url` field - Replace anonymous tuple types with named structs for usage query rows - Tighten `pub` to `pub(crate)` on internal query methods
…-only turns Two bugs caused ~7% undercount in Claude usage statistics: 1. Cross-file dedup order: subagent sessions were processed before parents (filesystem order), so overlapping usage entries were attributed to subagents and the parent's stats were zeroed out. Now parents are always processed first. 2. Thinking-only assistant turns: when a turn produced only thinking content (System role) or empty text, there was no non-System message to attach token_usage to, silently dropping ~5.5% of usage entries. Now a minimal Assistant message is inserted to carry the usage.
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
tyql688
added a commit
that referenced
this pull request
Apr 10, 2026
* feat(usage): create session_token_stats table with cascade trigger * feat(usage): add pricing module with LiteLLM-based cost lookup * feat: add replace_token_stats DB method and populate token stats during indexing Add TokenStatRow struct and Database::replace_token_stats() to write per-(date, model) token usage aggregates to session_token_stats table. Compute token stats from parsed messages during both full reindex (Indexer) and single-source sync (SourceSyncService), grouping by date extracted from message timestamps and model name. * feat(usage): add get_usage_stats backend command with SQL aggregation Adds a Tauri command that queries session_token_stats joined with sessions to return aggregated usage statistics including totals, daily usage by provider, per-model costs, per-project costs, and recent session cost breakdowns. Uses two-step queries for project and session costs to ensure accurate per-model pricing. * feat(usage): add frontend types and Tauri binding for usage stats * feat(usage): add Usage icon to ActivityBar * feat(usage): add i18n strings for usage panel * feat(usage): add UsagePanel component with CSS and App wiring Dashboard panel with range selector (7d/30d/90d/All), toggleable provider chips, cost summary hero, CSS-only stacked daily bar chart, and sortable tables for cost-by-model, cost-by-project, and recent sessions. * Align usage accounting with verified log semantics This updates the usage pipeline to match verified provider log behavior more closely, especially for Claude and Codex. The data path now deduplicates Claude usage entries, groups by local-day boundaries, persists provider-scoped usage costs, and derives Codex token deltas from cumulative token_count events. The UI pass also fixes the clear-usage confirmation flow, adds scoped usage clearing, improves chart inspection, and makes usage caveats visible where the pricing table is still best-effort. Constraint: Existing sessions.db data was produced by older aggregation rules and requires a rebuild to reflect corrected usage rows Constraint: Codex reference output separates cached and reasoning token display, while the current app UI still shows a simplified aggregate view Rejected: Keep using UTC date slicing for daily grouping | did not match verified local-day report output Rejected: Pull pricing live at runtime from external sources | adds instability and network dependency to a desktop index view Confidence: high Scope-risk: moderate Reversibility: clean Directive: If usage totals drift again, validate raw provider logs first and compare against a trusted external report before changing UI math Tested: TypeScript typecheck, ESLint, cargo test, cargo clippy -D warnings, provider/reference parity checks for Claude totals and single-project Claude totals Not-tested: End-to-end visual verification after local database rebuild in the desktop app * Use cached remote pricing catalogs for usage costs This switches pricing application from hand-maintained local compatibility rules to cached remote model-rate catalogs, and moves catalog refresh control into the usage workflow where users actually verify cost output. The usage surface now exposes pricing freshness directly and offers a single refresh-usage action that clears cached usage rows and immediately rebuilds the index so updated prices take effect without extra navigation. Constraint: Stored usage costs are persisted in the local database, so catalog updates only affect already-indexed sessions after a rebuild Constraint: Remote pricing availability can fail at runtime, so the app must tolerate an empty catalog and mark affected models as unpriced Rejected: Keep refresh-pricing only in Settings | disconnected from the place users validate cost output Rejected: Continue relying on embedded compatibility prices as fallback | user requested direct catalog-driven pricing instead Confidence: high Scope-risk: moderate Reversibility: clean Directive: When changing pricing behavior, verify both catalog fetch/storage and post-rebuild usage totals; do not silently mix stale and fresh price sources Tested: TypeScript typecheck, ESLint, cargo test, cargo clippy -D warnings Not-tested: Manual desktop-app interaction after refreshing catalog and refreshing usage on a live database * Keep index and usage refresh work off the foreground path This shifts index rebuild and usage refresh flows to background maintenance jobs with event notifications so the desktop UI no longer blocks on long maintenance actions. The app now surfaces running-state banners, toast-based lifecycle feedback, automatic usage refreshes after maintenance completes, and uses the same async path for the manual rebuild shortcut. It also tightens destructive affordances by moving index clearing onto the app's confirm dialog and clarifies index controls with shortcut guidance. Constraint: Maintenance tasks update shared local state, so concurrent rebuild/refresh actions must be serialized Constraint: Usage panels should stay responsive even when full reindex work is expensive on large local histories Rejected: Keep foreground await-based rebuild/refresh interactions | caused visible UI stalls and poor feedback during long-running maintenance Rejected: Introduce a full queued job manager now | unnecessary overhead for the current single-maintenance-lane need Confidence: high Scope-risk: moderate Reversibility: clean Directive: Preserve the maintenance-status event contract if you expand background jobs; UI refresh logic now depends on these phases Tested: TypeScript typecheck, ESLint, cargo test, cargo clippy -D warnings Not-tested: Extended manual UX soak test with multiple rapid maintenance actions in the packaged desktop app * fix: prevent usage pricing mismatches during refresh Incremental usage refresh now shares the same cross-file usage dedup path as full reindexing, and pricing lookup uses a unified, provider-agnostic matching flow instead of region-specific prefix heuristics. The pricing status badge now reports upstream model counts rather than alias-expanded cache entries so the dashboard reflects what was actually fetched. Constraint: Preserve existing cached pricing catalog format in SQLite Rejected: Keep DOMESTIC_PREFIX_MAP for targeted fixes | bakes region-specific policy into generic lookup Rejected: Derive displayed model count from alias-expanded catalog | overcounts by design Confidence: high Scope-risk: narrow Reversibility: clean Directive: Keep alias precedence separate from runtime lookup heuristics; they solve different problems Tested: cargo fmt; cargo clippy --all-targets --all-features -- -D warnings; cargo test; npx tsc --noEmit; npm run lint Not-tested: Manual UI verification of usage dashboard refresh flow * Preserve usage context across parsers and recent-session listings Usage aggregation now carries parser-level usage hashes through the shared message model so cross-file dedup has stable metadata across providers. Recent session queries also exclude child sessions so the usage dashboard and editor recents stay focused on top-level threads. Constraint: Parser message structs must stay backward-compatible for serialization Rejected: Add provider-specific dedup state outside Message | duplicates bookkeeping across parsers Confidence: high Scope-risk: narrow Reversibility: clean Directive: When adding a new provider parser, initialize usage_hash explicitly and attach it where provider logs expose stable usage IDs Tested: cargo test Not-tested: Manual verification of recents UI with nested sidechains * Make usage dashboard status easier to scan The usage dashboard now surfaces pricing freshness and usage refresh state in the toolbar instead of hiding them in a transient banner. Project paths are shortened for readability, timestamps render in a stable numeric format, and the table/card styling is tuned to better match the intended hierarchy without changing the underlying data. Constraint: Keep the panel compatible with existing maintenance event wiring Rejected: Leave refresh status in a dismissive banner | easy to miss once the page gets dense Rejected: Keep locale-dependent datetime formatting | made status chips jitter across environments Confidence: high Scope-risk: narrow Reversibility: clean Directive: Keep status metadata concise in the toolbar; avoid reintroducing long prose that pushes controls below the fold Tested: npx tsc --noEmit; npm run lint Not-tested: Manual visual QA against the usage dashboard mockup * refactor: remove dead code and tighten visibility in usage dashboard - Remove trivial `provider_to_str()` wrapper, inline `provider.key()` - Remove unused `compute_token_stats_with_catalog()` intermediate layer - Remove unreachable `conditions.is_empty()` branch in `build_usage_where` - Remove unused `clear_usage_stats_for_providers` and its test - Remove unused `PRICING_SOURCE_LABEL` and `source_url` field - Replace anonymous tuple types with named structs for usage query rows - Tighten `pub` to `pub(crate)` on internal query methods * fix: exclude child sessions from recent session queries * fix: correct usage token accounting for cross-file dedup and thinking-only turns Two bugs caused ~7% undercount in Claude usage statistics: 1. Cross-file dedup order: subagent sessions were processed before parents (filesystem order), so overlapping usage entries were attributed to subagents and the parent's stats were zeroed out. Now parents are always processed first. 2. Thinking-only assistant turns: when a turn produced only thinking content (System role) or empty text, there was no non-System message to attach token_usage to, silently dropping ~5.5% of usage entries. Now a minimal Assistant message is inserted to carry the usage. * style: fix prettier formatting in usage helpers * fix: make timezone test CI-agnostic by computing expected date dynamically
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
session_token_statstable with per-(session, date, model) token aggregates and costusage.tswith vitest coveragemessageId:requestIdhashKey changes
Backend (Rust)
pricing.rs— remote catalog parsing, fuzzy model matching, tiered cost calculationdb/queries.rs— usage SQL queries (daily, by-model, by-project, by-session) with dynamic provider/date filteringcommands/usage.rs—get_usage_statscommand aggregating all usage dimensionsindexer.rs— token stats computation with parent-first cross-file dedupproviders/claude/parser.rs— usage_hash dedup, thinking-only turn usage preservationproviders/codex/parser.rs— Codex-specific usage event extraction fromtoken_counteventsFrontend (Solid.js)
UsagePanel.tsx— full dashboard: hero cost card, KPI grid, stacked daily bar chart, sortable model/project/session tables with paginationusage.ts+usage.test.ts— extracted chart data builders, sort helpers, formattersusage.css(840 lines)Test plan
cargo test— 78 Rust tests pass (pricing, indexer dedup, DB sync, parsers)npm test— 71 frontend tests pass (usage helpers, formatters)cargo clippy+tsc --noEmitclean