feat(memory): record real LLM cost in sync audit (#3110) by oxoxDev · Pull Request #3150 · tinyhumansai/openhuman

oxoxDev · 2026-06-01T12:39:48Z

Summary

Record the real, provider-reported LLM charge in the sync audit log instead of only a token-÷-4 estimate.
New ChatProvider::chat_for_text_with_usage surfaces provider UsageInfo (additive; default impl returns (text, None) — no ripple for other providers).
SummaryOutput now carries input_tokens / output_tokens / charged_amount_usd; github sync + tree rebuild thread the real charge into the audit entry.
SyncAuditEntry.actual_charged_usd: Option<f64> (#[serde(default)]) with effective_cost_usd() / cost_is_actual() helpers; the existing estimate stays as a fallback.
Fully back-compatible: pre-existing audit lines (no actual_charged_usd key) still deserialize and render their estimate.
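
As a rough illustration of the fallback behaviour described above, the two helpers could look like the sketch below. The struct is a trimmed-down stand-in (only the two cost fields, no serde derives), so everything except the names `SyncAuditEntry`, `estimated_cost_usd`, `actual_charged_usd`, `effective_cost_usd` and `cost_is_actual` is an assumption rather than the PR's actual code.

```rust
/// Hypothetical, trimmed-down stand-in for the PR's `SyncAuditEntry`.
/// Only the two cost fields are modelled; the real struct carries more
/// fields and serde derives, which are omitted here.
#[derive(Debug, Clone, Default)]
pub struct SyncAuditEntry {
    /// Heuristic estimate, always present.
    pub estimated_cost_usd: f64,
    /// Provider-reported charge; `None` for older audit lines and for
    /// providers that report no usage.
    pub actual_charged_usd: Option<f64>,
}

impl SyncAuditEntry {
    /// Prefer the provider-reported charge, fall back to the estimate.
    pub fn effective_cost_usd(&self) -> f64 {
        self.actual_charged_usd.unwrap_or(self.estimated_cost_usd)
    }

    /// True when the effective cost came from the provider, not the heuristic.
    pub fn cost_is_actual(&self) -> bool {
        self.actual_charged_usd.is_some()
    }
}
```

An entry without `actual_charged_usd` keeps reporting its estimate, which matches the back-compat claim for older audit lines.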

Problem

sync_audit.jsonl only stored estimated_cost_usd, computed from a token_count / 4 heuristic. There was no record of what the provider actually billed for a memory sync, so real spend could not be reconciled against the estimate — and the estimate drifts from reality whenever provider pricing or tokenization differs from the heuristic.

Solution

src/openhuman/memory/chat.rs — add ChatProvider::chat_for_text_with_usage (default returns (text, None)); InferenceChatProvider routes through Provider::chat and parses the returned UsageInfo.
src/openhuman/memory_tree/summarise.rs — thread real usage into SummaryOutput (input_tokens, output_tokens, charged_amount_usd); treat a zero charge as absent.
src/openhuman/memory_sync/sources/audit.rs — add actual_charged_usd (#[serde(default)]) + effective_cost_usd() (prefers actual, falls back to estimate) and cost_is_actual().
src/openhuman/memory_sync/sources/github.rs + rebuild.rs (+ memory_sources/sync.rs) — record the real charge on the github-sync and tree-rebuild paths; log cost_is_actual.
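
The additive trait method from the first bullet can be sketched as follows. This is a simplified, synchronous approximation: the `chat_for_text` method, the `String` error type, the `UsageInfo` fields, and both example providers are assumptions made for illustration; only the name `chat_for_text_with_usage` and its `(String, Option<UsageInfo>)` return shape come from the PR text.

```rust
/// Hypothetical usage payload; the real `UsageInfo` fields are not shown in this PR text.
#[derive(Debug, Clone, PartialEq)]
pub struct UsageInfo {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub charged_amount_usd: Option<f64>,
}

pub trait ChatProvider {
    /// Assumed pre-existing text-only method (sync and `String` errors for brevity).
    fn chat_for_text(&self, prompt: &str) -> Result<String, String>;

    /// Additive method: the default delegates to `chat_for_text` and reports
    /// no usage, so existing implementors compile unchanged.
    fn chat_for_text_with_usage(
        &self,
        prompt: &str,
    ) -> Result<(String, Option<UsageInfo>), String> {
        Ok((self.chat_for_text(prompt)?, None))
    }
}

/// Hypothetical test double that only implements the old method.
pub struct EchoProvider;

impl ChatProvider for EchoProvider {
    fn chat_for_text(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("echo: {prompt}"))
    }
}

/// Hypothetical provider that overrides the new method to surface usage.
pub struct MeteredProvider;

impl ChatProvider for MeteredProvider {
    fn chat_for_text(&self, prompt: &str) -> Result<String, String> {
        self.chat_for_text_with_usage(prompt).map(|(text, _)| text)
    }

    fn chat_for_text_with_usage(
        &self,
        prompt: &str,
    ) -> Result<(String, Option<UsageInfo>), String> {
        let usage = UsageInfo {
            input_tokens: 12,
            output_tokens: 3,
            charged_amount_usd: Some(0.001),
        };
        Ok((format!("summary of {prompt}"), Some(usage)))
    }
}
```

Because the new method has a default body, a provider such as `EchoProvider` picks up the `None` usage branch without any code change, which is the "no ripple" property the summary describes.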

Design note: ingest.rs is intentionally left untouched — cost is read off SummaryOutput before ingest_summary, so threading it through SummaryIngestInput.token_count would be dead plumbing.

Submission Checklist

If a section does not apply to this change, mark the item as N/A with a one-line reason. Do not delete items.

Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy: 20 unit tests across audit / summarise / chat, including failure/edge cases (provider reports no usage → None fallback, zero charge treated as absent, pre-#3110 entry deserializes).
Diff coverage ≥ 80% — changed lines (the shared RealCostAccumulator, the two source-pipeline call sites, and the chat.rs fail-fast path) are exercised by the new accumulator_* unit tests plus existing audit/chat suites; cargo-llvm-cov not run locally (heavy), the CI Coverage Gate enforces the threshold.
Coverage matrix updated — N/A: additive cost field + helpers, no user-facing feature added/removed/renamed.
All affected feature IDs from the matrix are listed under ## Related — N/A: no feature behaviour added or moved.
No new external network dependencies introduced — uses the existing inference provider; no new endpoints.
Manual smoke checklist updated if this touches release-cut surfaces — N/A: observability/audit-logging change, not a release-cut UI surface.
Linked issue closed via Closes #NNN in the ## Related section.

Impact

Desktop/CLI core: sync audit entries now show the real billed cost when the provider reports it; otherwise the estimate is shown exactly as before.
No migration: #[serde(default)] makes the new field optional — old sync_audit.jsonl lines load unchanged.
No behaviour change for providers that don't return usage (the None branch is the prior behaviour).
Staging verification: the None/estimate-fallback branch + back-compat were confirmed live on staging (real audit entry written, deserializes clean, full sync→seal pipeline ran without regression). The Some/real-charge branch is unit-covered; observing it live requires a billing-enabled summarizer backend (staging returns no charge).

Related

Closes #3110 (feat(memory): surface actual LLM usage/cost from inference API in sync audit)
Follow-up PR(s)/TODOs: #3117 (feat: sync budget controls + agent autonomy level picker with cost estimates) stacks on this branch.
Unrelated prep (commit 06cd47e3): fixed a stale assertion in tests/memory_raw_coverage_e2e.rs (toolkit_from_slug(" MICROSOFT_TEAMS_SEND ") expected "microsoft" but the slug map and the function's own unit test yield "microsoft_teams"). This was failing the Rust Core Coverage gate on main for every PR, unrelated to this PR's cost-audit changes; included here to unblock CI.

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

Key: N/A
URL: N/A

Commit & Branch

Branch: feat/3110-real-llm-cost-audit
Commit SHA: 447767743014bc1249a1186b0617f627f2b6c654

Validation Run

pnpm --filter openhuman-app format:check — N/A: no frontend files changed; cargo fmt --all --check clean.
pnpm typecheck — N/A: Rust-only change, no TS touched.
Focused tests: cargo test --lib openhuman::memory_sync::sources::audit | memory_tree::summarise | memory::chat → 20 passed.
Rust fmt/check (if changed): cargo check --lib clean, cargo clippy --lib --no-deps 0 new warnings on changed files.
Tauri fmt/check (if changed): N/A — app/src-tauri not touched.

Validation Blocked

command: N/A
error: N/A
impact: N/A

Behavior Changes

Intended behavior change: sync audit records the real provider charge when available; estimate retained as fallback.
User-visible effect: actual_charged_usd now present in sync_audit.jsonl / sync-audit panel for billed syncs; estimate-only behaviour unchanged otherwise.

Summary by CodeRabbit

New Features
- Added capture of provider-reported token usage and actual billing charges in data synchronization operations.
- Audit logs and sync outcomes now display both estimated and actual costs from data providers, with fallback to estimates when actual charges are unavailable.

…sai#3110) Add ChatProvider::chat_for_text_with_usage returning (String, Option<UsageInfo>) with a default impl reporting None so test doubles and external impls keep compiling. InferenceChatProvider routes through Provider::chat (which parses usage via compatible::extract_usage) instead of chat_with_history, which discarded it.

…nsai#3110) Add input_tokens, output_tokens, charged_amount_usd to SummaryOutput and populate them from the provider response in summarise(). Zero charge is treated as absent (None) so callers fall back to the estimate. fallback_summary reports no usage.

…3110) New Option<f64> field with #[serde(default)] so pre-tinyhumansai#3110 audit lines still deserialize. Adds effective_cost_usd()/cost_is_actual() helpers; keeps estimated_cost_usd + estimate_cost_usd() as the fallback.

…3110) github sync and rebuild now accumulate real provider token counts + charged_amount_usd across summarise() batches, recording them in the audit entry when any batch reported usage and otherwise falling back to the len/4 estimate. RebuildOutcome carries actual_charged_usd through to callers.

…3110) Set actual_charged_usd: None on the zero-cost audit entries written when no LLM call happens, and log the real charge (falling back to the estimate) on rebuild completion.
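
The "zero charge is treated as absent" rule from the commit messages above can be illustrated with a small sketch. The field names `input_tokens`, `output_tokens` and `charged_amount_usd` come from the PR; the `text` field and the helpers `normalise_charge`, `summary_from_provider` and `fallback_summary_sketch` are hypothetical names introduced here, not the PR's functions.

```rust
/// Hypothetical stand-in for the PR's `SummaryOutput`; `text` is an assumed field.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct SummaryOutput {
    pub text: String,
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub charged_amount_usd: Option<f64>,
}

/// Hypothetical helper: a missing, zero, negative or NaN charge becomes `None`,
/// so callers fall back to the estimate instead of recording "$0 billed".
pub fn normalise_charge(raw: Option<f64>) -> Option<f64> {
    raw.filter(|charge| *charge > 0.0)
}

/// Hypothetical constructor for the provider-backed path.
pub fn summary_from_provider(
    text: String,
    input_tokens: u64,
    output_tokens: u64,
    raw_charge: Option<f64>,
) -> SummaryOutput {
    SummaryOutput {
        text,
        input_tokens,
        output_tokens,
        charged_amount_usd: normalise_charge(raw_charge),
    }
}

/// Hypothetical fallback path: no provider call, so no usage is reported.
pub fn fallback_summary_sketch(text: String) -> SummaryOutput {
    SummaryOutput { text, ..Default::default() }
}
```

Normalising at the point where `SummaryOutput` is built means downstream code only has to distinguish `Some` from `None`, rather than re-checking for zero at every call site.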

coderabbitai · 2026-06-01T12:40:06Z

Walkthrough

This PR threads actual LLM token counts and billing amounts from the inference provider through the memory chat layer into sync audit logs. It introduces a new ChatProvider::chat_for_text_with_usage method that surfaces usage data, extends SummaryOutput to carry provider usage fields, and updates GitHub sync and rebuild operations to track both estimated and real token counts, conditionally recording actual backend charges in audit entries.

Changes

Provider usage threading and audit instrumentation

| Layer / File(s) | Summary |
| --- | --- |
| Chat provider usage interface `src/openhuman/memory/chat.rs` | The `ChatProvider` trait gains `chat_for_text_with_usage` method with default implementation returning `None` usage. `InferenceChatProvider` overrides it to call the provider's `chat` API and extract both response text and optional `UsageInfo`; logging includes usage presence and token details. |
| Audit entry schema and cost accessors `src/openhuman/memory_sync/sources/audit.rs` | `SyncAuditEntry` adds optional `actual_charged_usd` field with backward-compatible deserialization. New `effective_cost_usd()` and `cost_is_actual()` methods provide displayable cost and presence indicators, preferring actual charges when present. |
| Summary output usage and billing fields `src/openhuman/memory_tree/summarise.rs` | `SummaryOutput` extends with `input_tokens`, `output_tokens`, `charged_amount_usd` fields and derives `Default`. `summarise` switches to `chat_for_text_with_usage` to thread provider usage. `fallback_summary` populates new fields as "no provider usage" (zero/`None`). |
| GitHub sync token and cost accounting `src/openhuman/memory_sync/sources/github.rs` | `run_github_sync` maintains separate estimated and real token accumulators. Post-processing selects real totals when any provider usage exists, computes estimated cost from estimates, and conditionally sets `actual_charged_usd` when real charges observed. Audit and messages use the chosen cost. |
| Rebuild token and cost accounting `src/openhuman/memory_sync/sources/rebuild.rs` | `RebuildOutcome` gains optional `actual_charged_usd` and derives `Default`. Rebuild function parallels GitHub sync: tracks estimated tokens, folds real usage when present, selects appropriate counts for audit, and records actual charges when observed. |
| Sync dispatcher audit entry recording `src/openhuman/memory_sources/sync.rs` | Manual sync and rebuild audit entries explicitly set `actual_charged_usd: None` (no inference call). Rebuild logging uses `effective_cost_usd()` and `cost_is_actual()` for display. |

Sequence Diagram(s)

sequenceDiagram
  participant ChatProvider
  participant InferenceChatProvider
  participant InferenceProvider
  participant summarise
  participant SyncAudit
  ChatProvider->>InferenceChatProvider: chat_for_text_with_usage()
  InferenceChatProvider->>InferenceProvider: chat(ChatRequest)
  InferenceProvider-->>InferenceChatProvider: ChatResponse{text, usage}
  InferenceChatProvider-->>ChatProvider: (String, Option<UsageInfo>)
  ChatProvider->>summarise: with usage data
  summarise-->>SyncAudit: SummaryOutput{tokens, charge}
  SyncAudit->>SyncAudit: effective_cost_usd()
  SyncAudit->>SyncAudit: cost_is_actual()

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

feature, memory, rust-core

Suggested reviewers

graycyrus
M3gA-Mind

🐇 Through chat and cost we now thread,
Real tokens dance where heuristics fled,
Provider usage blooms at every stage,
Audit logs write truth upon the page.
Estimated fades when real costs show,
Backward compat—legacy logs still glow!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely summarizes the primary change: recording real LLM cost in sync audit instead of estimates. |
| Linked Issues check | ✅ Passed | All acceptance criteria from `#3110` are met: sync audit entries now show provider-reported token counts, actual charged amounts, and backward compatibility is preserved for old entries. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with the objectives: threading usage through ChatProvider, extending SummaryOutput with usage fields, and updating audit logging across sync paths. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/memory_sync/sources/github.rs`:
- Around line 254-265: The current aggregation adds per-batch real token/charge
values into real_input_tokens/real_output_tokens/real_charged_usd whenever any
batch reports non-zero, causing mixed batches to produce partial "real" rollups;
change the logic to only use the real_* totals when every batch in the run
provided real usage/charge. Concretely, inside the loop that inspects
output.input_tokens, output.output_tokens and output.charged_amount_usd, set
boolean flags like saw_any_real_token, saw_all_real_token (and analogously for
charge) by initializing all-real flags true and clearing them if you encounter a
fallback/no-usage batch (input_tokens==0 && output_tokens==0 or
charged_amount_usd.is_none()), sum real_* as you do but at the end only
replace/interpret estimated totals with
real_input_tokens/real_output_tokens/real_charged_usd if the corresponding
saw_all_real_* flags remain true; apply the same change for the similar block
referenced around the 305-325 region so rollups are only labeled "real" when all
batches reported real values (use identifiers output.input_tokens,
output.output_tokens, output.charged_amount_usd, real_input_tokens,
real_output_tokens, real_charged_usd, saw_real_charge as anchors).

In `@src/openhuman/memory_sync/sources/rebuild.rs`:
- Around line 219-227: The current logic promotes per-batch provider
usage/charge into run-level "real" totals even when only some batches report
provider values, causing underreporting; change it to only accept
provider-reported usage/charge when coverage is complete across all batches:
introduce counters (e.g., total_batches and batches_with_usage and
batches_with_charge) and update them when
output.input_tokens/output.output_tokens and output.charged_amount_usd are
present, accumulate into real_input_tokens/real_output_tokens/real_charged_usd
as now, but after iterating all batches replace the run-level totals only if
batches_with_usage == total_batches (for tokens) and batches_with_charge ==
total_batches (for charges); apply the same guarded logic for the other
identical block later (the section referencing real_* at lines ~264-283) so
partial coverage never promotes partial provider values to "actual."

In `@src/openhuman/memory/chat.rs`:
- Around line 111-112: The code currently masks missing provider text by using
response.text.unwrap_or_default(); instead, fail fast when response.text is
None: replace the unwrap_or_default usage with an explicit check on
response.text (e.g., match or if let Some) and return/propagate an error (or
trigger the existing fallback) when it's None so callers don’t receive an empty
string; update the handling around the variable names response, text (and keep
usage = response.usage) to ensure the function returns a Result/Error path when
text is missing.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 42eb39b6-e740-4869-85ef-62e7ac170426

📥 Commits

Reviewing files that changed from the base of the PR and between 4b26267 and 4477677.

📒 Files selected for processing (6)

src/openhuman/memory/chat.rs
src/openhuman/memory_sources/sync.rs
src/openhuman/memory_sync/sources/audit.rs
src/openhuman/memory_sync/sources/github.rs
src/openhuman/memory_sync/sources/rebuild.rs
src/openhuman/memory_tree/summarise.rs

graycyrus

@oxoxDev heads up — CI is failing on this PR (Rust Core Coverage, Rust E2E mock backend, and PR Submission Checklist), so I'll hold off on a full approval until those are sorted out. I did spot a couple of things while going through the diff:

Missing tests for the multi-batch accumulation logic — the loops in github.rs and rebuild.rs that fold SummaryOutput into the real_* accumulators have no unit tests. That's the most complex new code in this PR and the exact site where the partial-rollup issue was flagged. Even with the partial-rollup fix applied, those code paths need regression coverage — a test with two batches where one returns usage and one doesn't would catch the undercount.
Duplicated accumulation logic — the est_*/real_*/saw_real_charge accumulation block is copy-pasted verbatim between github.rs and rebuild.rs. Any fix (including the partial-rollup one) has to land in both files. Worth pulling into a shared struct or helper so they can't drift apart.

Fix the CI and address these two, and I'll come back for a proper pass.

…sai#3110) Add RealCostAccumulator: provider-reported tokens/charges are promoted to the run-level audit figure only when every batch carried that signal. A mixed run (some batches report usage, some fall back) kept a partial real total that undercounts the run versus the estimate. Centralises the accounting so github.rs and rebuild.rs can't drift apart.

…ator (tinyhumansai#3110) Replace the duplicated per-batch accumulation in both source pipelines with the shared accumulator. Fixes the partial-rollup undercount and removes the copy-pasted est_*/real_*/saw_real_charge block so a future fix can't land in one file and miss the other.

…ai#3110) Replace response.text.unwrap_or_default() with an explicit None check that returns an error. An empty summary would otherwise be ingested and counted against the run's real charge as if it were valid output; the caller's fallback_summary path is the correct recovery and only runs on Err.
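
The fail-fast change in the last commit message above can be sketched as follows. The `ChatResponse` and `UsageInfo` shapes, the `String` error type, and the function name `text_and_usage` are assumptions for illustration; the PR only states that `response.text.unwrap_or_default()` was replaced with an explicit `None` check that returns an error.

```rust
/// Hypothetical usage payload, reduced to two fields for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct UsageInfo {
    pub input_tokens: u64,
    pub output_tokens: u64,
}

/// Hypothetical minimal provider response.
pub struct ChatResponse {
    pub text: Option<String>,
    pub usage: Option<UsageInfo>,
}

/// Returns an error when the provider sent no text, instead of silently
/// substituting an empty string as `unwrap_or_default()` would.
pub fn text_and_usage(
    response: ChatResponse,
) -> Result<(String, Option<UsageInfo>), String> {
    let text = response
        .text
        .ok_or_else(|| "provider returned no text".to_string())?;
    Ok((text, response.usage))
}
```

Returning `Err` here is what lets the caller's fallback path run; with an empty string the call would look successful and the empty summary would be ingested.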

oxoxDev · 2026-06-01T14:09:35Z

@graycyrus thanks — both points addressed in 139278a:

Missing tests for the accumulation logic — extracted the loop body into a shared RealCostAccumulator (memory_sync/sources/audit.rs) and added unit tests, including the exact two-batch case you called out (one batch reports usage, one doesn't): accumulator_mixed_usage_falls_back_to_estimate asserts the run keeps the full-coverage estimate rather than the partial real total. Also covers usage-complete-but-charge-partial and the empty-run case.
Duplicated accumulation block — the copy-pasted est_*/real_*/saw_real_charge block is gone from both github.rs and rebuild.rs; both now call add_batch and read the accumulator's getters, so they can't drift.

This also fixes the partial-rollup correctness bug both you and CodeRabbit flagged: provider tokens/charge are only promoted to the run-level figure when every batch reported them.
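
A minimal sketch of the all-or-nothing promotion described above is given below. Only the names `RealCostAccumulator` and `add_batch` appear in the PR discussion; the fields, the parameter list, and the getters `tokens` and `actual_charged_usd` are assumptions chosen to keep the example small.

```rust
/// Sketch of an accumulator that promotes provider-reported figures to the
/// run-level total only when every batch reported them.
#[derive(Debug, Default)]
pub struct RealCostAccumulator {
    batches: u32,
    batches_with_usage: u32,
    batches_with_charge: u32,
    est_tokens: u64,
    real_tokens: u64,
    real_charged_usd: f64,
}

impl RealCostAccumulator {
    /// Fold one summarise batch into the run totals.
    /// `usage` is `(input_tokens, output_tokens)` when the provider reported it.
    pub fn add_batch(
        &mut self,
        est_tokens: u64,
        usage: Option<(u64, u64)>,
        charged_usd: Option<f64>,
    ) {
        self.batches += 1;
        self.est_tokens += est_tokens;
        if let Some((input, output)) = usage {
            self.batches_with_usage += 1;
            self.real_tokens += input + output;
        }
        if let Some(charge) = charged_usd {
            self.batches_with_charge += 1;
            self.real_charged_usd += charge;
        }
    }

    /// Real token total only with full coverage; otherwise the estimate.
    pub fn tokens(&self) -> u64 {
        if self.batches > 0 && self.batches_with_usage == self.batches {
            self.real_tokens
        } else {
            self.est_tokens
        }
    }

    /// Real charge only when every batch reported one; `None` otherwise.
    pub fn actual_charged_usd(&self) -> Option<f64> {
        (self.batches > 0 && self.batches_with_charge == self.batches)
            .then_some(self.real_charged_usd)
    }
}
```

In this sketch tokens and charges are tracked independently, so a run whose batches all report usage but only some report a charge keeps the real token count while leaving the charge as `None`, mirroring the usage-complete-but-charge-partial case mentioned above.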

On CI — the failing Rust Core Coverage job is a pre-existing flake on main (the credentials_profile_store_recovers_dropped_entries... e2e test in config_auth_app_state_connectivity_e2e.rs, unrelated to this PR's files; main's own PR CI run is red on the same job). The PR Submission Checklist red was an unchecked coverage item, now resolved. Pushing the fix re-runs CI.

graycyrus

@oxoxDev all prior changes addressed. the two things i flagged — missing multi-batch tests and the duplicated accumulation block — are both resolved cleanly.

the RealCostAccumulator is the right abstraction: all-or-nothing promotion logic is in one place, both source pipelines route through it, and the four unit tests directly exercise the mixed-batch and partial-charge edge cases i was worried about. the fail-fast on empty provider text is also in.

code is clean. holding off on approval until CI is fully green (Rust Core Coverage, Rust E2E, Playwright, and Frontend Coverage are still pending). no new issues found in the follow-up commits.

memory_sources_validation_and_sync_classification_edges asserted toolkit_from_slug(" MICROSOFT_TEAMS_SEND ") == Some("microsoft"), but the slug map deliberately yields "microsoft_teams" (tool_scope.rs:98) and the function's own unit test (tool_scope.rs) already asserts "microsoft_teams". Align the e2e assertion with documented behavior; pre-existing failure on main, unrelated to this PR's cost-audit changes.

graycyrus

@oxoxDev the toolkit_from_slug assertion fix in the latest commit looks correct — "microsoft_teams" is the right expected value per the slug map. That should unblock the Rust Core Coverage gate.

At the time of this check, CI is still showing failures on Rust Core Coverage and Playwright lane 1/4 — they may still be in-flight against the new HEAD. Everything else is green. Once the full suite is green I'll approve.

oxoxDev added 5 commits June 1, 2026 17:00

feat(memory): surface real rebuild charge in sync logs (tinyhumansai#…

4477677

oxoxDev requested a review from a team June 1, 2026 12:39

coderabbitai Bot added the feature, rust-core, and memory labels Jun 1, 2026

coderabbitai Bot requested changes Jun 1, 2026

Comment thread src/openhuman/memory_sync/sources/github.rs Outdated

Comment thread src/openhuman/memory_sync/sources/rebuild.rs Outdated

Comment thread src/openhuman/memory/chat.rs Outdated

graycyrus reviewed Jun 1, 2026

Comment thread src/openhuman/memory_sync/sources/github.rs

Comment thread src/openhuman/memory_sync/sources/rebuild.rs

oxoxDev added 3 commits June 1, 2026 19:34

coderabbitai Bot previously approved these changes Jun 1, 2026

graycyrus reviewed Jun 1, 2026

oxoxDev dismissed coderabbitai[bot]’s stale review via 06cd47e June 1, 2026 14:35

This was referenced Jun 1, 2026

fix(agent): refresh orchestrator integration context on mid-session OAuth (#3044) #3153

Merged

observability(sentry): attach user id to Rust-core events (#3135) #3136

Open

graycyrus reviewed Jun 1, 2026

graycyrus approved these changes Jun 1, 2026

graycyrus merged commit 4187561 into tinyhumansai:main Jun 1, 2026
18 of 22 checks passed

oxoxDev mentioned this pull request Jun 1, 2026

feat(chat): ArtifactCard + Tauri download + backend artifact events (#2779) #3017

Open

12 tasks

graycyrus merged 9 commits into tinyhumansai:main from oxoxDev:feat/3110-real-llm-cost-audit