feat(cache): wire-level prompt/response dumps + JSONL into session_raw/#640
Conversation
…n usage tracking - Updated the `write_transcript` function to accept an optional `last_assistant_turn_usage` parameter, allowing for the inclusion of per-message token and cost figures in the JSONL output. - Modified the `persist_session_transcript` method to capture and pass the last turn's usage data to the transcript writer, ensuring accurate tracking of resource usage. - Improved documentation to clarify the new functionality and its impact on transcript persistence. This change enhances the fidelity of session transcripts by associating usage metrics with the last assistant message, improving overall transparency in resource consumption.
…ed on file extension - Renamed `read_transcript` function to clarify its purpose and updated its logic to first check the file extension. - Added handling for legacy `.md` files, ensuring they are processed with the appropriate parser. - Enhanced fallback mechanism to check for the existence of a `.md` sibling file if the primary JSONL file is not found. This change improves the robustness of transcript reading by accommodating legacy formats while maintaining compatibility with new JSONL files.
- Added functionality to capture the final assistant reply in the transcript snapshot, ensuring that both the prompt and response are persisted in the JSONL output. - This enhancement improves the completeness of session transcripts by including the assistant's final message alongside the user prompts. This change enhances the fidelity of session transcripts, providing a more comprehensive record of interactions.
…ion_raw/
Two related changes for KV-cache debugging and on-disk clarity:
1. Wire-level dump (`providers/compatible.rs`)
When OPENHUMAN_PROMPT_DUMP_DIR is set, every chat completion writes
the outgoing NativeChatRequest body and the aggregated ApiChatResponse
to paired timestamped JSON files. Covers both the non-streaming and
SSE streaming paths. Unset → zero overhead, no behavior change.
Why: the prompt-level transcript captures what the dispatcher
assembled, but the actual wire bytes (including
prompt_tokens_details.cached_tokens and the openhuman meta block)
only surface at the HTTP layer. Pairing request + response per turn
lets us diff consecutive turns to confirm cache-hit accounting and
diagnose cache-miss causes.
2. JSONL source-of-truth moves to session_raw/ (`agent/harness/session/transcript.rs`)
JSONL transcript files now live under workspace/session_raw/DDMMYYYY/
instead of workspace/sessions/. The human-readable .md companion
stays in workspace/sessions/DDMMYYYY/ so the structured source-of-
truth and the readable debug view are cleanly separated on disk.
- resolve_new_transcript_path returns session_raw/ paths and
advances the index past both session_raw/*.jsonl and legacy
sessions/*.md so indices stay unique across the migration.
- find_latest_transcript prefers session_raw/ and falls back to the
legacy sessions/ dir for pre-migration .md files (one-release compat).
- md_companion_path derives the sessions/ md path from the
session_raw/ jsonl path via path-component rewrite, with a
sibling-file fallback for tests that use flat tempdirs.
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 47 minutes and 50 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughThis PR refactors session transcript persistence from Markdown format to JSONL, adds per-turn provider usage tracking with timestamps, and introduces optional prompt/response dump diagnostics. It updates transcript read/write signatures, adds Changes
Sequence DiagramsequenceDiagram
participant Agent
participant Turn
participant Transcript as Transcript Handler
participant Persist as File System
Note over Agent,Persist: Turn Completion & Persistence
Agent->>Turn: execute_turn(..., messages)
Turn->>Turn: capture usage from<br/>provider response
Turn->>Turn: create TurnUsage<br/>(model, tokens, cost, ts)
Turn->>Agent: return with final<br/>assistant message
Agent->>Agent: persist_session_transcript<br/>(..., turn_usage)
Note over Agent,Persist: JSONL Write with Metadata
Agent->>Transcript: write_transcript<br/>(jsonl_path, messages,<br/>last_assistant_turn_usage)
Transcript->>Persist: write _meta line<br/>(session config)
Transcript->>Persist: write message lines<br/>(role, content JSON)
Transcript->>Persist: attach TurnUsage<br/>to last assistant<br/>message
Transcript->>Persist: re-render companion<br/>.md file
Transcript->>Agent: success
Note over Agent,Persist: Legacy Fallback Path
alt JSONL exists
Transcript->>Persist: read from .jsonl
else JSONL missing
Transcript->>Persist: read from legacy .md
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/openhuman/agent/harness/session/turn.rs (1)
406-425:⚠️ Potential issue | 🟠 MajorClear
last_turn_usagewhen a response omits usage.If iteration N-1 had
resp.usage = Some(...)and the final iteration returnsusage = None, this variable keeps the previous iteration’s numbers andpersist_session_transcript()will attach them to the final assistant message. That misattributes token/cost data in the JSONL source of truth.Proposed fix
- if let Some(ref usage) = resp.usage { + last_turn_usage = None; + if let Some(ref usage) = resp.usage { self.context.record_usage(usage); cumulative_input_tokens += usage.input_tokens; cumulative_output_tokens += usage.output_tokens; cumulative_cached_input_tokens += usage.cached_input_tokens; cumulative_charged_usd += usage.charged_amount_usd; // Snapshot this turn's usage so the transcript // writer can attribute it to the last assistant // message. last_turn_usage = Some(transcript::TurnUsage { model: effective_model.clone(), usage: transcript::MessageUsage { input: usage.input_tokens, output: usage.output_tokens, cached_input: usage.cached_input_tokens, cost_usd: usage.charged_amount_usd, }, ts: chrono::Utc::now().to_rfc3339(), }); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/harness/session/turn.rs` around lines 406 - 425, The problem is that last_turn_usage retains the previous iteration's values when resp.usage is None, causing misattribution in persist_session_transcript(); update the handling around the usage check so that when resp.usage is Some(...) you perform context.record_usage(...) and set last_turn_usage as now, and when resp.usage is None you explicitly clear last_turn_usage = None (and avoid touching cumulative_* only when usage exists); modify the block that currently matches resp.usage (the if let Some(ref usage) { ... } handling around record_usage and last_turn_usage) to add an else branch that clears last_turn_usage to ensure the final assistant message has no stale usage attached.
🧹 Nitpick comments (3)
src/openhuman/providers/compatible.rs (1)
71-75: Demote prompt-dump success logs todebugortrace.These are opt-in diagnostics behind
OPENHUMAN_PROMPT_DUMP_DIR, so emitting oneinfolog per request and per response will get noisy fast during dump sessions.As per coding guidelines
src/**/*.rs: "Use log/tracing at debug or trace level for development-oriented diagnostics in Rust."Also applies to: 117-121
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/providers/compatible.rs` around lines 71 - 75, The prompt-dump success logs currently use log::info! and should be lowered to a development-level log; replace the log::info! calls that print "[prompt_dump] wrote response {} bytes -> {}" (the call that passes payload.len() and path.display()) with log::debug! (or log::trace! if you prefer more verbosity), and do the same for the other occurrence mentioned (the similar info log around lines 117-121) so both dump success messages are emitted at debug/trace level instead of info.src/openhuman/agent/harness/session/transcript.rs (2)
177-183: Nit:rpositionreads cleaner here.♻️ Proposed refactor
- let last_assistant_idx = messages.iter().enumerate().rev().find_map(|(i, m)| { - if m.role == "assistant" { - Some(i) - } else { - None - } - }); + let last_assistant_idx = messages.iter().rposition(|m| m.role == "assistant");While you're in here, the
last_assistant_turn_usage.unwrap()on line 190 can also be collapsed by pattern-matching on bothOptions together (e.g. the sameif let (Some(idx), Some(tu)) = ...shape you already use on line 228), which removes the guard + unwrap duplication.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/harness/session/transcript.rs` around lines 177 - 183, Replace the manual reverse-enumeration using messages.iter().enumerate().rev().find_map(...) with the clearer rposition call to compute last_assistant_idx (e.g. messages.rposition(|m| m.role == "assistant") conceptually) and then collapse the separate guard + unwrap by pattern-matching both Option values together (e.g. if let (Some(last_assistant_idx), Some(last_assistant_turn_usage)) = (...)) so you remove the explicit unwrap and duplicate guard; update references to the variables last_assistant_idx and last_assistant_turn_usage accordingly in the surrounding logic.
301-336: Prefer a positional rule over substring matching for the_metaline.Using
line.contains("\"_meta\"")as a classifier works today because JSON escapes any"_meta"insidecontent(it becomes\"_meta\"), but it is still a fragile heuristic:
- If a future schema bump ever puts a
_meta-named key into a message line (e.g. via the_extraflatten map populated by someone hand-editing or by a forward-compat writer), that line will be mis-routed to the meta parser, the parse will fail, and a real message will be silently dropped with only a warning.- It couples the parser to an incidental string pattern instead of the documented invariant ("meta is always line 1") stated in the module doc.
Since the writer always emits the meta line as the first non-empty line, consider enforcing that positionally: parse the first non-empty line strictly as
MetaLine(error on failure), then parse every subsequent non-empty line strictly asMessageLine. That also lets you drop theif meta.is_none() { return Err(...) }re-entry guard.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/harness/session/transcript.rs` around lines 301 - 336, The current code uses line.contains("\"_meta\"") to detect meta lines; change this to a positional rule: treat the first non-empty line encountered as the MetaLine (attempt serde_json::from_str::<MetaLine>(line) and return an error if it fails) and only parse all subsequent non-empty lines as MessageLine; remove the substring check, the meta.is_none() re-entry guard and the special-case warning that drops the first message, and update handling around the meta variable (where MetaLine and TranscriptMeta are constructed) to occur only once when meta is still None and for the first non-empty line, using path.display() and line_no for contextual errors as currently done.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/openhuman/agent/harness/session/transcript.rs`:
- Around line 232-247: The companion .md write currently runs
fs::create_dir_all(...) and fs::write(...) in write_transcript (uses
md_companion_path and render_markdown) and returns Err on failure; change that
so any error from creating the companion directory or writing the markdown is
caught and logged with log::warn! (including error details and
md_path.display()) and does NOT propagate the error — after logging continue to
return Ok(()) because the JSONL flush is the source of truth; keep the existing
debug log on success.
In `@src/openhuman/providers/compatible.rs`:
- Around line 79-128: The sequence reservation must be atomic: replace the
non-atomic peek_dump_seq() with a reservation that increments PROMPT_DUMP_SEQ in
one step and use that reserved value when writing dumps. Concretely, change
peek_dump_seq() to perform PROMPT_DUMP_SEQ.fetch_add(1, Ordering::Relaxed)
(rename to reserve_dump_seq() or similar) and update call sites that currently
call peek_dump_seq() before dump_prompt_if_enabled() so they call the new
reservation function and then pass or correlate that reserved seq with
dump_prompt_if_enabled(); ensure PROMPT_DUMP_SEQ is the single source of truth
for sequence allocation.
---
Outside diff comments:
In `@src/openhuman/agent/harness/session/turn.rs`:
- Around line 406-425: The problem is that last_turn_usage retains the previous
iteration's values when resp.usage is None, causing misattribution in
persist_session_transcript(); update the handling around the usage check so that
when resp.usage is Some(...) you perform context.record_usage(...) and set
last_turn_usage as now, and when resp.usage is None you explicitly clear
last_turn_usage = None (and avoid touching cumulative_* only when usage exists);
modify the block that currently matches resp.usage (the if let Some(ref usage) {
... } handling around record_usage and last_turn_usage) to add an else branch
that clears last_turn_usage to ensure the final assistant message has no stale
usage attached.
---
Nitpick comments:
In `@src/openhuman/agent/harness/session/transcript.rs`:
- Around line 177-183: Replace the manual reverse-enumeration using
messages.iter().enumerate().rev().find_map(...) with the clearer rposition call
to compute last_assistant_idx (e.g. messages.rposition(|m| m.role ==
"assistant") conceptually) and then collapse the separate guard + unwrap by
pattern-matching both Option values together (e.g. if let
(Some(last_assistant_idx), Some(last_assistant_turn_usage)) = (...)) so you
remove the explicit unwrap and duplicate guard; update references to the
variables last_assistant_idx and last_assistant_turn_usage accordingly in the
surrounding logic.
- Around line 301-336: The current code uses line.contains("\"_meta\"") to
detect meta lines; change this to a positional rule: treat the first non-empty
line encountered as the MetaLine (attempt serde_json::from_str::<MetaLine>(line)
and return an error if it fails) and only parse all subsequent non-empty lines
as MessageLine; remove the substring check, the meta.is_none() re-entry guard
and the special-case warning that drops the first message, and update handling
around the meta variable (where MetaLine and TranscriptMeta are constructed) to
occur only once when meta is still None and for the first non-empty line, using
path.display() and line_no for contextual errors as currently done.
In `@src/openhuman/providers/compatible.rs`:
- Around line 71-75: The prompt-dump success logs currently use log::info! and
should be lowered to a development-level log; replace the log::info! calls that
print "[prompt_dump] wrote response {} bytes -> {}" (the call that passes
payload.len() and path.display()) with log::debug! (or log::trace! if you prefer
more verbosity), and do the same for the other occurrence mentioned (the similar
info log around lines 117-121) so both dump success messages are emitted at
debug/trace level instead of info.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2d53fd2d-4d2a-4817-8da9-17013b1078c6
📒 Files selected for processing (4)
src/openhuman/agent/harness/session/transcript.rssrc/openhuman/agent/harness/session/turn.rssrc/openhuman/agent/harness/subagent_runner.rssrc/openhuman/providers/compatible.rs
- transcript: .md companion write is now best-effort — failures are logged with log::warn! and Ok(()) is returned. The JSONL above is the source of truth; a readable-log hiccup must not take down state persistence. - compatible: replace peek_dump_seq() (non-atomic) with reserve_dump_seq() that fetch_adds PROMPT_DUMP_SEQ in one step and returns the reserved value. dump_prompt_if_enabled now takes seq as a parameter, so request/response files cannot be misaligned under concurrent requests. - turn: explicitly clear last_turn_usage when resp.usage is None so the final assistant message can't inherit a prior iteration's stale numbers during a tool-using turn. - transcript (nit): last-assistant lookup uses rposition; the emit loop pattern-matches (Option, Option) together so there's no separate unwrap. - transcript (nit): JSONL meta parsing is positional — first non-empty line must be _meta or we error out. Replaces the substring-based heuristic that could false-positive on message content. - compatible (nit): lower prompt/response dump success logs from info to debug — they fire on every turn when the env var is set.
…w/ (tinyhumansai#640) * feat(transcript): enhance session transcript persistence with per-turn usage tracking - Updated the `write_transcript` function to accept an optional `last_assistant_turn_usage` parameter, allowing for the inclusion of per-message token and cost figures in the JSONL output. - Modified the `persist_session_transcript` method to capture and pass the last turn's usage data to the transcript writer, ensuring accurate tracking of resource usage. - Improved documentation to clarify the new functionality and its impact on transcript persistence. This change enhances the fidelity of session transcripts by associating usage metrics with the last assistant message, improving overall transparency in resource consumption. * refactor(transcript): improve transcript reading logic by routing based on file extension - Renamed `read_transcript` function to clarify its purpose and updated its logic to first check the file extension. - Added handling for legacy `.md` files, ensuring they are processed with the appropriate parser. - Enhanced fallback mechanism to check for the existence of a `.md` sibling file if the primary JSONL file is not found. This change improves the robustness of transcript reading by accommodating legacy formats while maintaining compatibility with new JSONL files. * feat(transcript): mirror final assistant reply in session transcript - Added functionality to capture the final assistant reply in the transcript snapshot, ensuring that both the prompt and response are persisted in the JSONL output. - This enhancement improves the completeness of session transcripts by including the assistant's final message alongside the user prompts. This change enhances the fidelity of session transcripts, providing a more comprehensive record of interactions. * feat(cache): wire-level prompt/response dumps + split JSONL into session_raw/ Two related changes for KV-cache debugging and on-disk clarity: 1. Wire-level dump (`providers/compatible.rs`) When OPENHUMAN_PROMPT_DUMP_DIR is set, every chat completion writes the outgoing NativeChatRequest body and the aggregated ApiChatResponse to paired timestamped JSON files. Covers both the non-streaming and SSE streaming paths. Unset → zero overhead, no behavior change. Why: the prompt-level transcript captures what the dispatcher assembled, but the actual wire bytes (including prompt_tokens_details.cached_tokens and the openhuman meta block) only surface at the HTTP layer. Pairing request + response per turn lets us diff consecutive turns to confirm cache-hit accounting and diagnose cache-miss causes. 2. JSONL source-of-truth moves to session_raw/ (`agent/harness/session/transcript.rs`) JSONL transcript files now live under workspace/session_raw/DDMMYYYY/ instead of workspace/sessions/. The human-readable .md companion stays in workspace/sessions/DDMMYYYY/ so the structured source-of- truth and the readable debug view are cleanly separated on disk. - resolve_new_transcript_path returns session_raw/ paths and advances the index past both session_raw/*.jsonl and legacy sessions/*.md so indices stay unique across the migration. - find_latest_transcript prefers session_raw/ and falls back to the legacy sessions/ dir for pre-migration .md files (one-release compat). - md_companion_path derives the sessions/ md path from the session_raw/ jsonl path via path-component rewrite, with a sibling-file fallback for tests that use flat tempdirs. * fix(cache,transcript): address PR review findings - transcript: .md companion write is now best-effort — failures are logged with log::warn! and Ok(()) is returned. The JSONL above is the source of truth; a readable-log hiccup must not take down state persistence. - compatible: replace peek_dump_seq() (non-atomic) with reserve_dump_seq() that fetch_adds PROMPT_DUMP_SEQ in one step and returns the reserved value. dump_prompt_if_enabled now takes seq as a parameter, so request/response files cannot be misaligned under concurrent requests. - turn: explicitly clear last_turn_usage when resp.usage is None so the final assistant message can't inherit a prior iteration's stale numbers during a tool-using turn. - transcript (nit): last-assistant lookup uses rposition; the emit loop pattern-matches (Option, Option) together so there's no separate unwrap. - transcript (nit): JSONL meta parsing is positional — first non-empty line must be _meta or we error out. Replaces the substring-based heuristic that could false-positive on message content. - compatible (nit): lower prompt/response dump success logs from info to debug — they fire on every turn when the env var is set.
Summary
Vec<ChatMessage>as{workspace}/session_raw/DDMMYYYY/{agent}_{index}.jsonl, with a_metaheader line carrying cumulative tokens, cache boundary, and charged USD..mdcompanion re-rendered on every write under{workspace}/sessions/DDMMYYYY/{agent}_{index}.md(humans only — never read back).model,usage {input, output, cached_input, cost_usd}, andtspulled from the provider'sUsageInfo.OPENHUMAN_PROMPT_DUMP_DIR. When set, every chat completion writes the outgoingNativeChatRequestand the aggregatedApiChatResponse(usage + cached_tokens + openhuman meta) to paired timestamped JSON files. Covers streaming and non-streaming paths..mdfallback infind_latest_transcriptandread_transcriptso pre-migration sessions resume cleanly for one release.Problem
Session transcripts were stored as HTML-comment-wrapped
.mdfiles, mixing structured state (messages, metadata, usage) with human-readable rendering. Diffing across turns to debug KV-cache instability was painful because escaping edge cases and prose formatting sat next to the actual prompt bytes. Separately, cache-hit accounting coming back from the backend (prompt_tokens_details.cached_tokens,openhumanmeta) was only visible in logs — nothing persisted the wire payload for post-hoc diffing.Solution
session_raw/, markdown undersessions/. Structured diffs againstsession_raw/*.jsonl; readable review againstsessions/*.md.turn.rs). Usage attributes to the last assistant message in the persisted snapshot.seqprefix so pairs sort together.Verified end-to-end against a real workspace: 5 turns across two threads, cache hits confirmed (
prompt_cached_tokens: 3696consistently from the shared global prefix). Diff between consecutive JSONL transcripts shows byte-identical prefixes with only the assistant-reply and new-user-message appended — the KV-cache-stable shape.Submission Checklist
cargo test -p openhuman --lib transcript), covering JSONL round-trip with usage, md-companion rendering, legacy-md fallback,session_raw/sessionspath split, index-collision avoidance across dirs, and forward-compat on unknown JSONL fieldsmd_companion_pathcomponent rewrite, whylast_provider_messagesgets mirrored at the terminal branch)//!doc ontranscript.rsdocuments the JSONL schema, storage layout, and the.mdcompanion contractImpact
sessions/*.mdfiles remain readable via the legacy-.mdfallback for one release; new sessions write JSONL undersession_raw/.next_indexscans both locations so indices don't collide.OPENHUMAN_PROMPT_DUMP_DIRis unset — the dump helper reads the env var once per call and short-circuits.write_transcriptsignature gained alast_assistant_turn_usage: Option<&TurnUsage>parameter; in-tree call sites updated (turn.rs,subagent_runner.rs).Related
openhumancustom meta block in responses is stillnullon the wire — if cache-hit accounting should also surface via the custom block (not justprompt_tokens_details.cached_tokens), that's a separate backend change.Summary by CodeRabbit
New Features
Improvements