feat(memory): record real LLM cost in sync audit (#3110) #3150

Merged
graycyrus merged 9 commits into tinyhumansai:main from oxoxDev:feat/3110-real-llm-cost-audit
Jun 1, 2026

Conversation

Contributor

@oxoxDev oxoxDev commented Jun 1, 2026

Summary

  • Record the real, provider-reported LLM charge in the sync audit log instead of only a token-÷-4 estimate.
  • New ChatProvider::chat_for_text_with_usage surfaces provider UsageInfo (additive; default impl returns (text, None) — no ripple for other providers).
  • SummaryOutput now carries input_tokens / output_tokens / charged_amount_usd; github sync + tree rebuild thread the real charge into the audit entry.
  • SyncAuditEntry.actual_charged_usd: Option<f64> (#[serde(default)]) with effective_cost_usd() / cost_is_actual() helpers; the existing estimate stays as a fallback.
  • Fully back-compatible: pre-existing audit lines (no actual_charged_usd key) still deserialize and render their estimate.

Problem

sync_audit.jsonl only stored estimated_cost_usd, computed from a token_count / 4 heuristic. There was no record of what the provider actually billed for a memory sync, so real spend could not be reconciled against the estimate — and the estimate drifts from reality whenever provider pricing or tokenization differs from the heuristic.

Solution

  1. src/openhuman/memory/chat.rs — add ChatProvider::chat_for_text_with_usage (default returns (text, None)); InferenceChatProvider routes through Provider::chat and parses the returned UsageInfo.
  2. src/openhuman/memory_tree/summarise.rs — thread real usage into SummaryOutput (input_tokens, output_tokens, charged_amount_usd); treat a zero charge as absent.
  3. src/openhuman/memory_sync/sources/audit.rs — add actual_charged_usd (#[serde(default)]) + effective_cost_usd() (prefers actual, falls back to estimate) and cost_is_actual().
  4. src/openhuman/memory_sync/sources/github.rs + rebuild.rs (+ memory_sources/sync.rs) — record the real charge on the github-sync and tree-rebuild paths; log cost_is_actual.

Design note: ingest.rs is intentionally left untouched — cost is read off SummaryOutput before ingest_summary, so threading it through SummaryIngestInput.token_count would be dead plumbing.
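
The additive trait method in step 1 can be sketched as below. This is a minimal, synchronous illustration only: the real trait is presumably async, and the `UsageInfo` field names, the error type, and the two example providers (`LegacyProvider`, `BillingProvider`) are assumptions made for the sketch, not the PR's code.

```rust
/// Simplified stand-in for the provider usage record (field names assumed).
#[derive(Debug, Clone, PartialEq)]
pub struct UsageInfo {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub charged_amount_usd: Option<f64>,
}

/// Synchronous sketch of the trait; String is used as a placeholder error type.
pub trait ChatProvider {
    fn chat_for_text(&self, prompt: &str) -> Result<String, String>;

    /// Additive method: the default delegates to `chat_for_text` and
    /// reports no usage, so existing implementors need no changes.
    fn chat_for_text_with_usage(
        &self,
        prompt: &str,
    ) -> Result<(String, Option<UsageInfo>), String> {
        self.chat_for_text(prompt).map(|text| (text, None))
    }
}

/// Hypothetical provider written before the new method existed.
pub struct LegacyProvider;

impl ChatProvider for LegacyProvider {
    fn chat_for_text(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("summary of: {prompt}"))
    }
}

/// Hypothetical provider that overrides the method to surface usage.
pub struct BillingProvider;

impl ChatProvider for BillingProvider {
    fn chat_for_text(&self, prompt: &str) -> Result<String, String> {
        // Text-only callers reuse the usage-aware path and drop the usage.
        self.chat_for_text_with_usage(prompt).map(|(text, _)| text)
    }

    fn chat_for_text_with_usage(
        &self,
        prompt: &str,
    ) -> Result<(String, Option<UsageInfo>), String> {
        // Made-up numbers standing in for a parsed provider response.
        let usage = UsageInfo {
            input_tokens: 12,
            output_tokens: 4,
            charged_amount_usd: Some(0.0003),
        };
        Ok((format!("summary of: {prompt}"), Some(usage)))
    }
}
```

Because the default body lives on the trait, test doubles and external implementors that only define `chat_for_text` keep compiling and simply report `None`.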

Submission Checklist

If a section does not apply to this change, mark the item as N/A with a one-line reason. Do not delete items.

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy: 20 unit tests across audit / summarise / chat incl. failure/edge cases (provider reports no usage → None fallback, zero-charge treated as absent, pre-#3110 entry deserializes).
  • Diff coverage ≥ 80% — changed lines (the shared RealCostAccumulator, the two source-pipeline call sites, and the chat.rs fail-fast path) are exercised by the new accumulator_* unit tests plus existing audit/chat suites; cargo-llvm-cov not run locally (heavy), the CI Coverage Gate enforces the threshold.
  • Coverage matrix updated — N/A: additive cost field + helpers, no user-facing feature added/removed/renamed.
  • All affected feature IDs from the matrix are listed under ## Related — N/A: no feature behaviour added or moved.
  • No new external network dependencies introduced — uses the existing inference provider; no new endpoints.
  • Manual smoke checklist updated if this touches release-cut surfaces — N/A: observability/audit-logging change, not a release-cut UI surface.
  • Linked issue closed via Closes #NNN in the ## Related section.

Impact

  • Desktop/CLI core: sync audit entries now show the real billed cost when the provider reports it; otherwise the estimate is shown exactly as before.
  • No migration: #[serde(default)] makes the new field optional — old sync_audit.jsonl lines load unchanged.
  • No behaviour change for providers that don't return usage (the None branch is the prior behaviour).
  • Staging verification: the None/estimate-fallback branch + back-compat were confirmed live on staging (real audit entry written, deserializes clean, full sync→seal pipeline ran without regression). The Some/real-charge branch is unit-covered; observing it live requires a billing-enabled summarizer backend (staging returns no charge).
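
The fallback behaviour described in these bullets can be sketched as follows. The struct is trimmed to the two cost fields and omits the serde derives; the `f64` type of `estimated_cost_usd` and the method bodies are assumptions consistent with the description (prefer the actual charge, otherwise the estimate), not a copy of the merged code.

```rust
/// Trimmed-down audit entry; the real struct carries more fields and
/// deserializes the optional field with `#[serde(default)]`.
#[derive(Debug, Clone, Default)]
pub struct SyncAuditEntry {
    /// Heuristic estimate, always present (type assumed).
    pub estimated_cost_usd: f64,
    /// `None` for entries written before this change, or when the
    /// provider reported no charge.
    pub actual_charged_usd: Option<f64>,
}

impl SyncAuditEntry {
    /// Cost to display: the real charge when known, else the estimate.
    pub fn effective_cost_usd(&self) -> f64 {
        self.actual_charged_usd.unwrap_or(self.estimated_cost_usd)
    }

    /// True only when the displayed cost came from the provider.
    pub fn cost_is_actual(&self) -> bool {
        self.actual_charged_usd.is_some()
    }
}
```

An entry loaded from an old `sync_audit.jsonl` line would have `actual_charged_usd == None`, so `effective_cost_usd()` returns the stored estimate and `cost_is_actual()` is `false`.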

Related


AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

  • Key: N/A
  • URL: N/A

Commit & Branch

  • Branch: feat/3110-real-llm-cost-audit
  • Commit SHA: 447767743014bc1249a1186b0617f627f2b6c654

Validation Run

  • pnpm --filter openhuman-app format:check — N/A: no frontend files changed; cargo fmt --all --check clean.
  • pnpm typecheck — N/A: Rust-only change, no TS touched.
  • Focused tests: cargo test --lib openhuman::memory_sync::sources::audit | memory_tree::summarise | memory::chat → 20 passed.
  • Rust fmt/check (if changed): cargo check --lib clean, cargo clippy --lib --no-deps 0 new warnings on changed files.
  • Tauri fmt/check (if changed): N/A — app/src-tauri not touched.

Validation Blocked

  • command: N/A
  • error: N/A
  • impact: N/A

Behavior Changes

  • Intended behavior change: sync audit records the real provider charge when available; estimate retained as fallback.
  • User-visible effect: actual_charged_usd now present in sync_audit.jsonl / sync-audit panel for billed syncs; estimate-only behaviour unchanged otherwise.

Summary by CodeRabbit

  • New Features
    • Added capture of provider-reported token usage and actual billing charges in data synchronization operations.
    • Audit logs and sync outcomes now display both estimated and actual costs from data providers, with fallback to estimates when actual charges are unavailable.

oxoxDev added 5 commits June 1, 2026 17:00
…sai#3110)

Add ChatProvider::chat_for_text_with_usage returning (String, Option<UsageInfo>) with a default impl reporting None so test doubles and external impls keep compiling. InferenceChatProvider routes through Provider::chat (which parses usage via compatible::extract_usage) instead of chat_with_history, which discarded it.
…nsai#3110)

Add input_tokens, output_tokens, charged_amount_usd to SummaryOutput and populate them from the provider response in summarise(). Zero charge is treated as absent (None) so callers fall back to the estimate. fallback_summary reports no usage.
…3110)

New Option<f64> field with #[serde(default)] so pre-tinyhumansai#3110 audit lines still deserialize. Adds effective_cost_usd()/cost_is_actual() helpers; keeps estimated_cost_usd + estimate_cost_usd() as the fallback.
…3110)

github sync and rebuild now accumulate real provider token counts + charged_amount_usd across summarise() batches, recording them in the audit entry when any batch reported usage and otherwise falling back to the len/4 estimate. RebuildOutcome carries actual_charged_usd through to callers.
…3110)

Set actual_charged_usd: None on the zero-cost audit entries written when no LLM call happens, and log the real charge (falling back to the estimate) on rebuild completion.
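
The "zero charge is treated as absent" rule from the summarise commit above can be illustrated with a small sketch. The `text` field, the integer widths, and the helper name `normalise_charge` are hypothetical; only the three usage fields and `fallback_summary` are named in the PR, and the real `fallback_summary` signature may differ.

```rust
/// Reduced `SummaryOutput`; the `text` field and integer types are assumed.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct SummaryOutput {
    pub text: String,
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub charged_amount_usd: Option<f64>,
}

/// Hypothetical helper: a zero charge is mapped to `None` so callers fall
/// back to the estimate. As an extra assumption of this sketch, negative
/// and NaN values are also dropped by the `> 0.0` comparison.
pub fn normalise_charge(reported: Option<f64>) -> Option<f64> {
    reported.filter(|usd| *usd > 0.0)
}

/// Fallback path: no provider call happened, so no usage is reported.
pub fn fallback_summary(text: &str) -> SummaryOutput {
    SummaryOutput {
        text: text.to_string(),
        ..Default::default()
    }
}
```

Deriving `Default` keeps the fallback constructor short: every usage field starts at zero or `None` unless a provider response fills it in.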
@oxoxDev oxoxDev requested a review from a team June 1, 2026 12:39
Contributor

coderabbitai Bot commented Jun 1, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR threads actual LLM token counts and billing amounts from the inference provider through the memory chat layer into sync audit logs. It introduces a new ChatProvider::chat_for_text_with_usage method that surfaces usage data, extends SummaryOutput to carry provider usage fields, and updates GitHub sync and rebuild operations to track both estimated and real token counts, conditionally recording actual backend charges in audit entries.

Changes

Provider usage threading and audit instrumentation

| Layer / File(s) | Summary |
|---|---|
| Chat provider usage interface<br>src/openhuman/memory/chat.rs | The ChatProvider trait gains chat_for_text_with_usage method with default implementation returning None usage. InferenceChatProvider overrides it to call the provider's chat API and extract both response text and optional UsageInfo; logging includes usage presence and token details. |
| Audit entry schema and cost accessors<br>src/openhuman/memory_sync/sources/audit.rs | SyncAuditEntry adds optional actual_charged_usd field with backward-compatible deserialization. New effective_cost_usd() and cost_is_actual() methods provide displayable cost and presence indicators, preferring actual charges when present. |
| Summary output usage and billing fields<br>src/openhuman/memory_tree/summarise.rs | SummaryOutput extends with input_tokens, output_tokens, charged_amount_usd fields and derives Default. summarise switches to chat_for_text_with_usage to thread provider usage. fallback_summary populates new fields as "no provider usage" (zero/None). |
| GitHub sync token and cost accounting<br>src/openhuman/memory_sync/sources/github.rs | run_github_sync maintains separate estimated and real token accumulators. Post-processing selects real totals when any provider usage exists, computes estimated cost from estimates, and conditionally sets actual_charged_usd when real charges observed. Audit and messages use the chosen cost. |
| Rebuild token and cost accounting<br>src/openhuman/memory_sync/sources/rebuild.rs | RebuildOutcome gains optional actual_charged_usd and derives Default. Rebuild function parallels GitHub sync: tracks estimated tokens, folds real usage when present, selects appropriate counts for audit, and records actual charges when observed. |
| Sync dispatcher audit entry recording<br>src/openhuman/memory_sources/sync.rs | Manual sync and rebuild audit entries explicitly set actual_charged_usd: None (no inference call). Rebuild logging uses effective_cost_usd() and cost_is_actual() for display. |

Sequence Diagram(s)

sequenceDiagram
  participant ChatProvider
  participant InferenceChatProvider
  participant InferenceProvider
  participant summarise
  participant SyncAudit
  ChatProvider->>InferenceChatProvider: chat_for_text_with_usage()
  InferenceChatProvider->>InferenceProvider: chat(ChatRequest)
  InferenceProvider-->>InferenceChatProvider: ChatResponse{text, usage}
  InferenceChatProvider-->>ChatProvider: (String, Option<UsageInfo>)
  ChatProvider->>summarise: with usage data
  summarise-->>SyncAudit: SummaryOutput{tokens, charge}
  SyncAudit->>SyncAudit: effective_cost_usd()
  SyncAudit->>SyncAudit: cost_is_actual()

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

feature, memory, rust-core

Suggested reviewers

  • graycyrus
  • M3gA-Mind

🐇 Through chat and cost we now thread,
Real tokens dance where heuristics fled,
Provider usage blooms at every stage,
Audit logs write truth upon the page.
Estimated fades when real costs show,
Backward compat—legacy logs still glow!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely summarizes the primary change: recording real LLM cost in sync audit instead of estimates. |
| Linked Issues check | ✅ Passed | All acceptance criteria from #3110 are met: sync audit entries now show provider-reported token counts, actual charged amounts, and backward compatibility is preserved for old entries. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with the objectives: threading usage through ChatProvider, extending SummaryOutput with usage fields, and updating audit logging across sync paths. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

@coderabbitai coderabbitai Bot added feature Net-new user-facing capability or product behavior. rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. memory Memory store, memory tree, recall, summarization, and embeddings in src/openhuman/memory/. labels Jun 1, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/memory_sync/sources/github.rs`:
- Around line 254-265: The current aggregation adds per-batch real token/charge
values into real_input_tokens/real_output_tokens/real_charged_usd whenever any
batch reports non-zero, causing mixed batches to produce partial "real" rollups;
change the logic to only use the real_* totals when every batch in the run
provided real usage/charge. Concretely, inside the loop that inspects
output.input_tokens, output.output_tokens and output.charged_amount_usd, set
boolean flags like saw_any_real_token, saw_all_real_token (and analogously for
charge) by initializing all-real flags true and clearing them if you encounter a
fallback/no-usage batch (input_tokens==0 && output_tokens==0 or
charged_amount_usd.is_none()), sum real_* as you do but at the end only
replace/interpret estimated totals with
real_input_tokens/real_output_tokens/real_charged_usd if the corresponding
saw_all_real_* flags remain true; apply the same change for the similar block
referenced around the 305-325 region so rollups are only labeled "real" when all
batches reported real values (use identifiers output.input_tokens,
output.output_tokens, output.charged_amount_usd, real_input_tokens,
real_output_tokens, real_charged_usd, saw_real_charge as anchors).

In `@src/openhuman/memory_sync/sources/rebuild.rs`:
- Around line 219-227: The current logic promotes per-batch provider
usage/charge into run-level "real" totals even when only some batches report
provider values, causing underreporting; change it to only accept
provider-reported usage/charge when coverage is complete across all batches:
introduce counters (e.g., total_batches and batches_with_usage and
batches_with_charge) and update them when
output.input_tokens/output.output_tokens and output.charged_amount_usd are
present, accumulate into real_input_tokens/real_output_tokens/real_charged_usd
as now, but after iterating all batches replace the run-level totals only if
batches_with_usage == total_batches (for tokens) and batches_with_charge ==
total_batches (for charges); apply the same guarded logic for the other
identical block later (the section referencing real_* at lines ~264-283) so
partial coverage never promotes partial provider values to "actual."

In `@src/openhuman/memory/chat.rs`:
- Around line 111-112: The code currently masks missing provider text by using
response.text.unwrap_or_default(); instead, fail fast when response.text is
None: replace the unwrap_or_default usage with an explicit check on
response.text (e.g., match or if let Some) and return/propagate an error (or
trigger the existing fallback) when it's None so callers don’t receive an empty
string; update the handling around the variable names response, text (and keep
usage = response.usage) to ensure the function returns a Result/Error path when
text is missing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 42eb39b6-e740-4869-85ef-62e7ac170426

📥 Commits

Reviewing files that changed from the base of the PR and between 4b26267 and 4477677.

📒 Files selected for processing (6)
  • src/openhuman/memory/chat.rs
  • src/openhuman/memory_sources/sync.rs
  • src/openhuman/memory_sync/sources/audit.rs
  • src/openhuman/memory_sync/sources/github.rs
  • src/openhuman/memory_sync/sources/rebuild.rs
  • src/openhuman/memory_tree/summarise.rs

Comment thread src/openhuman/memory_sync/sources/github.rs Outdated
Comment thread src/openhuman/memory_sync/sources/rebuild.rs Outdated
Comment thread src/openhuman/memory/chat.rs Outdated
Contributor

@graycyrus graycyrus left a comment

@oxoxDev heads up — CI is failing on this PR (Rust Core Coverage, Rust E2E mock backend, and PR Submission Checklist), so I'll hold off on a full approval until those are sorted out. I did spot a couple of things while going through the diff:

  1. Missing tests for the multi-batch accumulation logic — the loops in github.rs and rebuild.rs that fold SummaryOutput into the real_* accumulators have no unit tests. That's the most complex new code in this PR and the exact site where the partial-rollup issue was flagged. Even with the partial-rollup fix applied, those code paths need regression coverage — a test with two batches where one returns usage and one doesn't would catch the undercount.

  2. Duplicated accumulation logic — the est_*/real_*/saw_real_charge accumulation block is copy-pasted verbatim between github.rs and rebuild.rs. Any fix (including the partial-rollup one) has to land in both files. Worth pulling into a shared struct or helper so they can't drift apart.

Fix the CI and address these two, and I'll come back for a proper pass.

Comment thread src/openhuman/memory_sync/sources/github.rs
Comment thread src/openhuman/memory_sync/sources/rebuild.rs
oxoxDev added 3 commits June 1, 2026 19:34
…sai#3110)

Add RealCostAccumulator: provider-reported tokens/charges are promoted
to the run-level audit figure only when every batch carried that signal.
A mixed run (some batches report usage, some fall back) kept a partial
real total that undercounts the run versus the estimate. Centralises the
accounting so github.rs and rebuild.rs can't drift apart.
…ator (tinyhumansai#3110)

Replace the duplicated per-batch accumulation in both source pipelines
with the shared accumulator. Fixes the partial-rollup undercount and
removes the copy-pasted est_*/real_*/saw_real_charge block so a future
fix can't land in one file and miss the other.
…ai#3110)

Replace response.text.unwrap_or_default() with an explicit None check
that returns an error. An empty summary would otherwise be ingested and
counted against the run's real charge as if it were valid output; the
caller's fallback_summary path is the correct recovery and only runs on
Err.
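
The fail-fast change in the last commit message can be sketched as follows. `ChatResponse` is reduced to the one field that matters here, and `require_text`, `summarise_or_fallback`, and the `String` error type are hypothetical names chosen for the illustration rather than identifiers from the PR.

```rust
/// Simplified stand-in for the provider response (shape assumed).
pub struct ChatResponse {
    pub text: Option<String>,
}

/// Hypothetical helper: a missing text becomes an error instead of an
/// empty string, so an empty summary is never ingested as valid output.
pub fn require_text(response: ChatResponse) -> Result<String, String> {
    match response.text {
        Some(text) => Ok(text),
        None => Err("provider returned no text".to_string()),
    }
}

/// Hypothetical caller: the fallback only runs on `Err`, which is why
/// `unwrap_or_default()` upstream would have silently skipped it.
pub fn summarise_or_fallback(response: ChatResponse, fallback: &str) -> String {
    require_text(response).unwrap_or_else(|_| fallback.to_string())
}
```

With `unwrap_or_default()` the missing-text case would have produced `Ok("")`, and the caller's recovery path would never have been reached.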
Contributor Author

oxoxDev commented Jun 1, 2026

@graycyrus thanks — both points addressed in 139278a:

  1. Missing tests for the accumulation logic — extracted the loop body into a shared RealCostAccumulator (memory_sync/sources/audit.rs) and added unit tests, including the exact two-batch case you called out (one batch reports usage, one doesn't): accumulator_mixed_usage_falls_back_to_estimate asserts the run keeps the full-coverage estimate rather than the partial real total. Also covers usage-complete-but-charge-partial and the empty-run case.

  2. Duplicated accumulation block — the copy-pasted est_*/real_*/saw_real_charge block is gone from both github.rs and rebuild.rs; both now call add_batch and read the accumulator's getters, so they can't drift.

This also fixes the partial-rollup correctness bug both you and CodeRabbit flagged: provider tokens/charge are only promoted to the run-level figure when every batch reported them.
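
The all-or-nothing promotion described above can be sketched roughly as follows. Only the names `RealCostAccumulator` and `add_batch` come from this thread; the fields, the `add_batch` parameter list, the getter names, and the rule that zero real tokens means "no usage reported" are assumptions made for the illustration.

```rust
/// Sketch of a run-level accumulator with all-or-nothing promotion.
#[derive(Debug, Default)]
pub struct RealCostAccumulator {
    batches: u32,
    batches_with_usage: u32,
    batches_with_charge: u32,
    est_tokens: u64,
    real_tokens: u64,
    real_charged_usd: f64,
}

impl RealCostAccumulator {
    /// Fold one summarise() batch into the run totals. `est_tokens` is the
    /// heuristic count for the batch (parameter list assumed).
    pub fn add_batch(
        &mut self,
        est_tokens: u64,
        input_tokens: u64,
        output_tokens: u64,
        charged_usd: Option<f64>,
    ) {
        self.batches += 1;
        self.est_tokens += est_tokens;
        // Assumption: zero real tokens means the provider reported no usage.
        if input_tokens > 0 || output_tokens > 0 {
            self.batches_with_usage += 1;
            self.real_tokens += input_tokens + output_tokens;
        }
        if let Some(usd) = charged_usd {
            self.batches_with_charge += 1;
            self.real_charged_usd += usd;
        }
    }

    /// Real tokens only when every batch reported usage, else the estimate.
    pub fn token_count(&self) -> u64 {
        if self.batches > 0 && self.batches_with_usage == self.batches {
            self.real_tokens
        } else {
            self.est_tokens
        }
    }

    /// Real charge only when every batch reported one; otherwise `None`,
    /// so the audit entry falls back to the estimate.
    pub fn actual_charged_usd(&self) -> Option<f64> {
        if self.batches > 0 && self.batches_with_charge == self.batches {
            Some(self.real_charged_usd)
        } else {
            None
        }
    }
}
```

Counting batches rather than keeping a single "saw any real value" flag is what prevents a mixed run from reporting a partial real total that undercounts the run.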

On CI — the failing Rust Core Coverage job is a pre-existing flake on main (the credentials_profile_store_recovers_dropped_entries... e2e test in config_auth_app_state_connectivity_e2e.rs, unrelated to this PR's files; main's own PR CI run is red on the same job). The PR Submission Checklist red was an unchecked coverage item, now resolved. Pushing the fix re-runs CI.

coderabbitai[bot]
coderabbitai Bot previously approved these changes Jun 1, 2026
Contributor

@graycyrus graycyrus left a comment

@oxoxDev all prior changes addressed. the two things i flagged — missing multi-batch tests and the duplicated accumulation block — are both resolved cleanly.

the RealCostAccumulator is the right abstraction: all-or-nothing promotion logic is in one place, both source pipelines route through it, and the four unit tests directly exercise the mixed-batch and partial-charge edge cases i was worried about. the fail-fast on empty provider text is also in.

code is clean. holding off on approval until CI is fully green (Rust Core Coverage, Rust E2E, Playwright, and Frontend Coverage are still pending). no new issues found in the follow-up commits.

memory_sources_validation_and_sync_classification_edges asserted
toolkit_from_slug(" MICROSOFT_TEAMS_SEND ") == Some("microsoft"), but the
slug map deliberately yields "microsoft_teams" (tool_scope.rs:98) and the
function's own unit test (tool_scope.rs) already asserts "microsoft_teams".
Align the e2e assertion with documented behavior; pre-existing failure on
main, unrelated to this PR's cost-audit changes.
Contributor

@graycyrus graycyrus left a comment

@oxoxDev the toolkit_from_slug assertion fix in the latest commit looks correct — "microsoft_teams" is the right expected value per the slug map. That should unblock the Rust Core Coverage gate.

At the time of this check, CI is still showing failures on Rust Core Coverage and Playwright lane 1/4 — they may still be in-flight against the new HEAD. Everything else is green. Once the full suite is green I'll approve.

@graycyrus graycyrus merged commit 4187561 into tinyhumansai:main Jun 1, 2026
18 of 22 checks passed

Labels

feature Net-new user-facing capability or product behavior. memory Memory store, memory tree, recall, summarization, and embeddings in src/openhuman/memory/. rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(memory): surface actual LLM usage/cost from inference API in sync audit

2 participants