feat(cache): wire-level prompt/response dumps + JSONL into session_raw/ by senamakel · Pull Request #640 · tinyhumansai/openhuman

senamakel · 2026-04-17T19:50:24Z

Summary

JSONL as structured source-of-truth for session transcripts — each turn persists the exact provider-facing Vec<ChatMessage> as {workspace}/session_raw/DDMMYYYY/{agent}_{index}.jsonl, with a _meta header line carrying cumulative tokens, cache boundary, and charged USD.
Human-readable .md companion re-rendered on every write under {workspace}/sessions/DDMMYYYY/{agent}_{index}.md (humans only — never read back).
Per-message usage attribution: the last assistant line on each turn carries model, usage {input, output, cached_input, cost_usd}, and ts pulled from the provider's UsageInfo.
Final assistant reply is now captured in the transcript snapshot — previously the no-tool-call terminal branch persisted the pre-call prompt, dropping the reply.
Wire-level prompt + response dumps gated by OPENHUMAN_PROMPT_DUMP_DIR. When set, every chat completion writes the outgoing NativeChatRequest and the aggregated ApiChatResponse (usage + cached_tokens + openhuman meta) to paired timestamped JSON files. Covers streaming and non-streaming paths.
Legacy .md fallback in find_latest_transcript and read_transcript so pre-migration sessions resume cleanly for one release.

Problem

Session transcripts were stored as HTML-comment-wrapped .md files, mixing structured state (messages, metadata, usage) with human-readable rendering. Diffing across turns to debug KV-cache instability was painful because escaping edge cases and prose formatting sat next to the actual prompt bytes. Separately, cache-hit accounting coming back from the backend (prompt_tokens_details.cached_tokens, openhuman meta) was only visible in logs — nothing persisted the wire payload for post-hoc diffing.

Solution

Split source-of-truth from readable view. JSONL under session_raw/, markdown under sessions/. Structured diffs against session_raw/*.jsonl; readable review against sessions/*.md.
Attach per-message usage at the exact site the response arrives (turn.rs). Usage attributes to the last assistant message in the persisted snapshot.
Dump at the HTTP layer, not the dispatcher layer — catches the exact bytes the backend sees. Request and response share a seq prefix so pairs sort together.

Verified end-to-end against a real workspace: 5 turns across two threads, cache hits confirmed (prompt_cached_tokens: 3696 consistently from the shared global prefix). Diff between consecutive JSONL transcripts shows byte-identical prefixes with only the assistant-reply and new-user-message appended — the KV-cache-stable shape.

Submission Checklist

Unit tests — 17 transcript tests pass (cargo test -p openhuman --lib transcript), covering JSONL round-trip with usage, md-companion rendering, legacy-md fallback, session_raw/sessions path split, index-collision avoidance across dirs, and forward-compat on unknown JSONL fields
Inline comments — on the non-obvious bits (escape/legacy fallback rationale, md_companion_path component rewrite, why last_provider_messages gets mirrored at the terminal branch)
Doc comments — module-level //! doc on transcript.rs documents the JSONL schema, storage layout, and the .md companion contract
E2E / integration — not added; KV-cache stability is hard to assert deterministically in E2E. Manual validation done against the real backend (see Summary).

Impact

Storage migration: existing sessions/*.md files remain readable via the legacy-.md fallback for one release; new sessions write JSONL under session_raw/. next_index scans both locations so indices don't collide.
No runtime overhead when OPENHUMAN_PROMPT_DUMP_DIR is unset — the dump helper reads the env var once per call and short-circuits.
Platform: desktop/CLI (core sidecar). No UI or Tauri changes.
Compatibility: public write_transcript signature gained a last_assistant_turn_usage: Option<&TurnUsage> parameter; in-tree call sites updated (turn.rs, subagent_runner.rs).

Issue(s): n/a
Follow-up PR(s)/TODOs:
- The openhuman custom meta block in responses is still null on the wire — if cache-hit accounting should also surface via the custom block (not just prompt_tokens_details.cached_tokens), that's a separate backend change.
- The cached-token count observed in validation was pinned at the global prefix size across turns; extending the cache boundary to cover the per-conversation prefix is a follow-up.

Summary by CodeRabbit

New Features
- Optional debug logging support to capture provider prompts and responses via environment configuration.
- Enhanced per-turn usage tracking with model identification, token counts, and cost details with timestamps.
Improvements
- Upgraded transcript persistence to JSONL format for improved reliability and data integrity.

…n usage tracking - Updated the `write_transcript` function to accept an optional `last_assistant_turn_usage` parameter, allowing for the inclusion of per-message token and cost figures in the JSONL output. - Modified the `persist_session_transcript` method to capture and pass the last turn's usage data to the transcript writer, ensuring accurate tracking of resource usage. - Improved documentation to clarify the new functionality and its impact on transcript persistence. This change enhances the fidelity of session transcripts by associating usage metrics with the last assistant message, improving overall transparency in resource consumption.

…ed on file extension - Renamed `read_transcript` function to clarify its purpose and updated its logic to first check the file extension. - Added handling for legacy `.md` files, ensuring they are processed with the appropriate parser. - Enhanced fallback mechanism to check for the existence of a `.md` sibling file if the primary JSONL file is not found. This change improves the robustness of transcript reading by accommodating legacy formats while maintaining compatibility with new JSONL files.

- Added functionality to capture the final assistant reply in the transcript snapshot, ensuring that both the prompt and response are persisted in the JSONL output. - This enhancement improves the completeness of session transcripts by including the assistant's final message alongside the user prompts. This change enhances the fidelity of session transcripts, providing a more comprehensive record of interactions.

…ion_raw/ Two related changes for KV-cache debugging and on-disk clarity: 1. Wire-level dump (`providers/compatible.rs`) When OPENHUMAN_PROMPT_DUMP_DIR is set, every chat completion writes the outgoing NativeChatRequest body and the aggregated ApiChatResponse to paired timestamped JSON files. Covers both the non-streaming and SSE streaming paths. Unset → zero overhead, no behavior change. Why: the prompt-level transcript captures what the dispatcher assembled, but the actual wire bytes (including prompt_tokens_details.cached_tokens and the openhuman meta block) only surface at the HTTP layer. Pairing request + response per turn lets us diff consecutive turns to confirm cache-hit accounting and diagnose cache-miss causes. 2. JSONL source-of-truth moves to session_raw/ (`agent/harness/session/transcript.rs`) JSONL transcript files now live under workspace/session_raw/DDMMYYYY/ instead of workspace/sessions/. The human-readable .md companion stays in workspace/sessions/DDMMYYYY/ so the structured source-of- truth and the readable debug view are cleanly separated on disk. - resolve_new_transcript_path returns session_raw/ paths and advances the index past both session_raw/*.jsonl and legacy sessions/*.md so indices stay unique across the migration. - find_latest_transcript prefers session_raw/ and falls back to the legacy sessions/ dir for pre-migration .md files (one-release compat). - md_companion_path derives the sessions/ md path from the session_raw/ jsonl path via path-component rewrite, with a sibling-file fallback for tests that use flat tempdirs.

coderabbitai · 2026-04-17T19:50:42Z

Warning

Rate limit exceeded

@senamakel has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 47 minutes and 50 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 47 minutes and 50 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e95de773-fa00-4025-b716-777eab00ce59

📥 Commits

Reviewing files that changed from the base of the PR and between 24a9a60 and 5850391.

📒 Files selected for processing (3)

src/openhuman/agent/harness/session/transcript.rs
src/openhuman/agent/harness/session/turn.rs
src/openhuman/providers/compatible.rs

📝 Walkthrough

Walkthrough

This PR refactors session transcript persistence from Markdown format to JSONL, adds per-turn provider usage tracking with timestamps, and introduces optional prompt/response dump diagnostics. It updates transcript read/write signatures, adds TurnUsage and MessageUsage types, maintains legacy migration support, and extends the provider to optionally capture and log raw API exchanges.

Changes

Cohort / File(s)	Summary
Transcript format migration `src/openhuman/agent/harness/session/transcript.rs`, `src/openhuman/agent/harness/session/turn.rs`, `src/openhuman/agent/harness/subagent_runner.rs`	Replaced `.md` HTML-comment format with JSONL as the persistence source of truth. Added `MessageUsage` and `TurnUsage` types for token/cost tracking. Updated `write_transcript` signature (`path` → `jsonl_path`, added `last_assistant_turn_usage` parameter). Implemented dual-format read logic (JSONL primary, fallback to legacy `.md`). Path discovery/resolution logic updated to handle new `session_raw/{DDMMYYYY}/` layout. Turn-level usage capture integrated into transcript write calls.
Provider dump diagnostics `src/openhuman/providers/compatible.rs`	Added optional prompt/response logging when `OPENHUMAN_PROMPT_DUMP_DIR` is set, with global `PROMPT_DUMP_SEQ` sequencing counter. Updated `stream_native_chat` with `dump_seq` parameter and modified both streaming and non-streaming paths to capture raw response bytes, dump them as pretty-printed JSON (or raw bytes if unparseable), then deserialize. Dump functions emit timestamped `.json` and `.response.json` files.

Sequence Diagram

sequenceDiagram
    participant Agent
    participant Turn
    participant Transcript as Transcript Handler
    participant Persist as File System
    
    Note over Agent,Persist: Turn Completion & Persistence
    Agent->>Turn: execute_turn(..., messages)
    Turn->>Turn: capture usage from<br/>provider response
    Turn->>Turn: create TurnUsage<br/>(model, tokens, cost, ts)
    
    Turn->>Agent: return with final<br/>assistant message
    Agent->>Agent: persist_session_transcript<br/>(..., turn_usage)
    
    Note over Agent,Persist: JSONL Write with Metadata
    Agent->>Transcript: write_transcript<br/>(jsonl_path, messages,<br/>last_assistant_turn_usage)
    Transcript->>Persist: write _meta line<br/>(session config)
    Transcript->>Persist: write message lines<br/>(role, content JSON)
    Transcript->>Persist: attach TurnUsage<br/>to last assistant<br/>message
    Transcript->>Persist: re-render companion<br/>.md file
    Transcript->>Agent: success
    
    Note over Agent,Persist: Legacy Fallback Path
    alt JSONL exists
        Transcript->>Persist: read from .jsonl
    else JSONL missing
        Transcript->>Persist: read from legacy .md
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

feat: session transcripts for LLM KV cache stability #512: Introduced the original markdown-based transcript persistence functions and path logic that this PR directly replaces with JSONL format while maintaining migration compatibility.

Poem

🐰 From markdowns old to JSONL's grace,
Each message finds its rightful place,
With turn usage and timestamps true,
Transcripts migrate—the old, the new!
Diagnostics logged for all to see,
What finer code could ever be? 📝

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main changes: JSONL migration for transcripts and wire-level prompt/response dump feature. Both are significant architectural changes reflected in the changeset.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/openhuman/agent/harness/session/turn.rs (1)

406-425: ⚠️ Potential issue | 🟠 Major

Clear last_turn_usage when a response omits usage.

If iteration N-1 had resp.usage = Some(...) and the final iteration returns usage = None, this variable keeps the previous iteration’s numbers and persist_session_transcript() will attach them to the final assistant message. That misattributes token/cost data in the JSONL source of truth.

Proposed fix

-                        if let Some(ref usage) = resp.usage {
+                        last_turn_usage = None;
+                        if let Some(ref usage) = resp.usage {
                             self.context.record_usage(usage);
                             cumulative_input_tokens += usage.input_tokens;
                             cumulative_output_tokens += usage.output_tokens;
                             cumulative_cached_input_tokens += usage.cached_input_tokens;
                             cumulative_charged_usd += usage.charged_amount_usd;
                             // Snapshot this turn's usage so the transcript
                             // writer can attribute it to the last assistant
                             // message.
                             last_turn_usage = Some(transcript::TurnUsage {
                                 model: effective_model.clone(),
                                 usage: transcript::MessageUsage {
                                     input: usage.input_tokens,
                                     output: usage.output_tokens,
                                     cached_input: usage.cached_input_tokens,
                                     cost_usd: usage.charged_amount_usd,
                                 },
                                 ts: chrono::Utc::now().to_rfc3339(),
                             });
                         }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/harness/session/turn.rs` around lines 406 - 425, The
problem is that last_turn_usage retains the previous iteration's values when
resp.usage is None, causing misattribution in persist_session_transcript();
update the handling around the usage check so that when resp.usage is Some(...)
you perform context.record_usage(...) and set last_turn_usage as now, and when
resp.usage is None you explicitly clear last_turn_usage = None (and avoid
touching cumulative_* only when usage exists); modify the block that currently
matches resp.usage (the if let Some(ref usage) { ... } handling around
record_usage and last_turn_usage) to add an else branch that clears
last_turn_usage to ensure the final assistant message has no stale usage
attached.

🧹 Nitpick comments (3)

src/openhuman/providers/compatible.rs (1)
71-75: Demote prompt-dump success logs to debug or trace.

These are opt-in diagnostics behind OPENHUMAN_PROMPT_DUMP_DIR, so emitting one info log per request and per response will get noisy fast during dump sessions.

As per coding guidelines src/**/*.rs: "Use log/tracing at debug or trace level for development-oriented diagnostics in Rust."

Also applies to: 117-121
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/providers/compatible.rs` around lines 71 - 75, The prompt-dump
success logs currently use log::info! and should be lowered to a
development-level log; replace the log::info! calls that print "[prompt_dump]
wrote response {} bytes -> {}" (the call that passes payload.len() and
path.display()) with log::debug! (or log::trace! if you prefer more verbosity),
and do the same for the other occurrence mentioned (the similar info log around
lines 117-121) so both dump success messages are emitted at debug/trace level
instead of info.
src/openhuman/agent/harness/session/transcript.rs (2)
177-183: Nit: rposition reads cleaner here.
♻️ Proposed refactor
-    let last_assistant_idx = messages.iter().enumerate().rev().find_map(|(i, m)| {
-        if m.role == "assistant" {
-            Some(i)
-        } else {
-            None
-        }
-    });
+    let last_assistant_idx = messages.iter().rposition(|m| m.role == "assistant");
While you're in here, the last_assistant_turn_usage.unwrap() on line 190 can also be collapsed by pattern-matching on both Options together (e.g. the same if let (Some(idx), Some(tu)) = ... shape you already use on line 228), which removes the guard + unwrap duplication.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/harness/session/transcript.rs` around lines 177 - 183,
Replace the manual reverse-enumeration using
messages.iter().enumerate().rev().find_map(...) with the clearer rposition call
to compute last_assistant_idx (e.g. messages.rposition(|m| m.role ==
"assistant") conceptually) and then collapse the separate guard + unwrap by
pattern-matching both Option values together (e.g. if let
(Some(last_assistant_idx), Some(last_assistant_turn_usage)) = (...)) so you
remove the explicit unwrap and duplicate guard; update references to the
variables last_assistant_idx and last_assistant_turn_usage accordingly in the
surrounding logic.
301-336: Prefer a positional rule over substring matching for the _meta line.

Using line.contains("\"_meta\"") as a classifier works today because JSON escapes any "_meta" inside content (it becomes \"_meta\"), but it is still a fragile heuristic:

If a future schema bump ever puts a _meta-named key into a message line (e.g. via the _extra flatten map populated by someone hand-editing or by a forward-compat writer), that line will be mis-routed to the meta parser, the parse will fail, and a real message will be silently dropped with only a warning.

It couples the parser to an incidental string pattern instead of the documented invariant ("meta is always line 1") stated in the module doc.

Since the writer always emits the meta line as the first non-empty line, consider enforcing that positionally: parse the first non-empty line strictly as MetaLine (error on failure), then parse every subsequent non-empty line strictly as MessageLine. That also lets you drop the if meta.is_none() { return Err(...) } re-entry guard.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/harness/session/transcript.rs` around lines 301 - 336,
The current code uses line.contains("\"_meta\"") to detect meta lines; change
this to a positional rule: treat the first non-empty line encountered as the
MetaLine (attempt serde_json::from_str::<MetaLine>(line) and return an error if
it fails) and only parse all subsequent non-empty lines as MessageLine; remove
the substring check, the meta.is_none() re-entry guard and the special-case
warning that drops the first message, and update handling around the meta
variable (where MetaLine and TranscriptMeta are constructed) to occur only once
when meta is still None and for the first non-empty line, using path.display()
and line_no for contextual errors as currently done.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/openhuman/agent/harness/session/transcript.rs`:
- Around line 232-247: The companion .md write currently runs
fs::create_dir_all(...) and fs::write(...) in write_transcript (uses
md_companion_path and render_markdown) and returns Err on failure; change that
so any error from creating the companion directory or writing the markdown is
caught and logged with log::warn! (including error details and
md_path.display()) and does NOT propagate the error — after logging continue to
return Ok(()) because the JSONL flush is the source of truth; keep the existing
debug log on success.

In `@src/openhuman/providers/compatible.rs`:
- Around line 79-128: The sequence reservation must be atomic: replace the
non-atomic peek_dump_seq() with a reservation that increments PROMPT_DUMP_SEQ in
one step and use that reserved value when writing dumps. Concretely, change
peek_dump_seq() to perform PROMPT_DUMP_SEQ.fetch_add(1, Ordering::Relaxed)
(rename to reserve_dump_seq() or similar) and update call sites that currently
call peek_dump_seq() before dump_prompt_if_enabled() so they call the new
reservation function and then pass or correlate that reserved seq with
dump_prompt_if_enabled(); ensure PROMPT_DUMP_SEQ is the single source of truth
for sequence allocation.

---

Outside diff comments:
In `@src/openhuman/agent/harness/session/turn.rs`:
- Around line 406-425: The problem is that last_turn_usage retains the previous
iteration's values when resp.usage is None, causing misattribution in
persist_session_transcript(); update the handling around the usage check so that
when resp.usage is Some(...) you perform context.record_usage(...) and set
last_turn_usage as now, and when resp.usage is None you explicitly clear
last_turn_usage = None (and avoid touching cumulative_* only when usage exists);
modify the block that currently matches resp.usage (the if let Some(ref usage) {
... } handling around record_usage and last_turn_usage) to add an else branch
that clears last_turn_usage to ensure the final assistant message has no stale
usage attached.

---

Nitpick comments:
In `@src/openhuman/agent/harness/session/transcript.rs`:
- Around line 177-183: Replace the manual reverse-enumeration using
messages.iter().enumerate().rev().find_map(...) with the clearer rposition call
to compute last_assistant_idx (e.g. messages.rposition(|m| m.role ==
"assistant") conceptually) and then collapse the separate guard + unwrap by
pattern-matching both Option values together (e.g. if let
(Some(last_assistant_idx), Some(last_assistant_turn_usage)) = (...)) so you
remove the explicit unwrap and duplicate guard; update references to the
variables last_assistant_idx and last_assistant_turn_usage accordingly in the
surrounding logic.
- Around line 301-336: The current code uses line.contains("\"_meta\"") to
detect meta lines; change this to a positional rule: treat the first non-empty
line encountered as the MetaLine (attempt serde_json::from_str::<MetaLine>(line)
and return an error if it fails) and only parse all subsequent non-empty lines
as MessageLine; remove the substring check, the meta.is_none() re-entry guard
and the special-case warning that drops the first message, and update handling
around the meta variable (where MetaLine and TranscriptMeta are constructed) to
occur only once when meta is still None and for the first non-empty line, using
path.display() and line_no for contextual errors as currently done.

In `@src/openhuman/providers/compatible.rs`:
- Around line 71-75: The prompt-dump success logs currently use log::info! and
should be lowered to a development-level log; replace the log::info! calls that
print "[prompt_dump] wrote response {} bytes -> {}" (the call that passes
payload.len() and path.display()) with log::debug! (or log::trace! if you prefer
more verbosity), and do the same for the other occurrence mentioned (the similar
info log around lines 117-121) so both dump success messages are emitted at
debug/trace level instead of info.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2d53fd2d-4d2a-4817-8da9-17013b1078c6

📥 Commits

Reviewing files that changed from the base of the PR and between c4ef143 and 24a9a60.

📒 Files selected for processing (4)

src/openhuman/agent/harness/session/transcript.rs
src/openhuman/agent/harness/session/turn.rs
src/openhuman/agent/harness/subagent_runner.rs
src/openhuman/providers/compatible.rs

- transcript: .md companion write is now best-effort — failures are logged with log::warn! and Ok(()) is returned. The JSONL above is the source of truth; a readable-log hiccup must not take down state persistence. - compatible: replace peek_dump_seq() (non-atomic) with reserve_dump_seq() that fetch_adds PROMPT_DUMP_SEQ in one step and returns the reserved value. dump_prompt_if_enabled now takes seq as a parameter, so request/response files cannot be misaligned under concurrent requests. - turn: explicitly clear last_turn_usage when resp.usage is None so the final assistant message can't inherit a prior iteration's stale numbers during a tool-using turn. - transcript (nit): last-assistant lookup uses rposition; the emit loop pattern-matches (Option, Option) together so there's no separate unwrap. - transcript (nit): JSONL meta parsing is positional — first non-empty line must be _meta or we error out. Replaces the substring-based heuristic that could false-positive on message content. - compatible (nit): lower prompt/response dump success logs from info to debug — they fire on every turn when the env var is set.

…w/ (tinyhumansai#640) * feat(transcript): enhance session transcript persistence with per-turn usage tracking - Updated the `write_transcript` function to accept an optional `last_assistant_turn_usage` parameter, allowing for the inclusion of per-message token and cost figures in the JSONL output. - Modified the `persist_session_transcript` method to capture and pass the last turn's usage data to the transcript writer, ensuring accurate tracking of resource usage. - Improved documentation to clarify the new functionality and its impact on transcript persistence. This change enhances the fidelity of session transcripts by associating usage metrics with the last assistant message, improving overall transparency in resource consumption. * refactor(transcript): improve transcript reading logic by routing based on file extension - Renamed `read_transcript` function to clarify its purpose and updated its logic to first check the file extension. - Added handling for legacy `.md` files, ensuring they are processed with the appropriate parser. - Enhanced fallback mechanism to check for the existence of a `.md` sibling file if the primary JSONL file is not found. This change improves the robustness of transcript reading by accommodating legacy formats while maintaining compatibility with new JSONL files. * feat(transcript): mirror final assistant reply in session transcript - Added functionality to capture the final assistant reply in the transcript snapshot, ensuring that both the prompt and response are persisted in the JSONL output. - This enhancement improves the completeness of session transcripts by including the assistant's final message alongside the user prompts. This change enhances the fidelity of session transcripts, providing a more comprehensive record of interactions. * feat(cache): wire-level prompt/response dumps + split JSONL into session_raw/ Two related changes for KV-cache debugging and on-disk clarity: 1. Wire-level dump (`providers/compatible.rs`) When OPENHUMAN_PROMPT_DUMP_DIR is set, every chat completion writes the outgoing NativeChatRequest body and the aggregated ApiChatResponse to paired timestamped JSON files. Covers both the non-streaming and SSE streaming paths. Unset → zero overhead, no behavior change. Why: the prompt-level transcript captures what the dispatcher assembled, but the actual wire bytes (including prompt_tokens_details.cached_tokens and the openhuman meta block) only surface at the HTTP layer. Pairing request + response per turn lets us diff consecutive turns to confirm cache-hit accounting and diagnose cache-miss causes. 2. JSONL source-of-truth moves to session_raw/ (`agent/harness/session/transcript.rs`) JSONL transcript files now live under workspace/session_raw/DDMMYYYY/ instead of workspace/sessions/. The human-readable .md companion stays in workspace/sessions/DDMMYYYY/ so the structured source-of- truth and the readable debug view are cleanly separated on disk. - resolve_new_transcript_path returns session_raw/ paths and advances the index past both session_raw/*.jsonl and legacy sessions/*.md so indices stay unique across the migration. - find_latest_transcript prefers session_raw/ and falls back to the legacy sessions/ dir for pre-migration .md files (one-release compat). - md_companion_path derives the sessions/ md path from the session_raw/ jsonl path via path-component rewrite, with a sibling-file fallback for tests that use flat tempdirs. * fix(cache,transcript): address PR review findings - transcript: .md companion write is now best-effort — failures are logged with log::warn! and Ok(()) is returned. The JSONL above is the source of truth; a readable-log hiccup must not take down state persistence. - compatible: replace peek_dump_seq() (non-atomic) with reserve_dump_seq() that fetch_adds PROMPT_DUMP_SEQ in one step and returns the reserved value. dump_prompt_if_enabled now takes seq as a parameter, so request/response files cannot be misaligned under concurrent requests. - turn: explicitly clear last_turn_usage when resp.usage is None so the final assistant message can't inherit a prior iteration's stale numbers during a tool-using turn. - transcript (nit): last-assistant lookup uses rposition; the emit loop pattern-matches (Option, Option) together so there's no separate unwrap. - transcript (nit): JSONL meta parsing is positional — first non-empty line must be _meta or we error out. Replaces the substring-based heuristic that could false-positive on message content. - compatible (nit): lower prompt/response dump success logs from info to debug — they fire on every turn when the env var is set.

senamakel added 4 commits April 17, 2026 10:52

coderabbitai Bot reviewed Apr 17, 2026

View reviewed changes

Comment thread src/openhuman/agent/harness/session/transcript.rs

Comment thread src/openhuman/providers/compatible.rs Outdated

senamakel merged commit 1569444 into tinyhumansai:main Apr 17, 2026
8 checks passed

coderabbitai Bot mentioned this pull request Apr 18, 2026

fix(prompts): v2 prompt pipeline, integrations_agent, and subagent session plumbing #642

Merged

5 tasks

This was referenced May 2, 2026

refactor(session): flatten session_raw/, switch md to YYYY_MM_DD #1098

Merged

feat(transcript): emit thread_id + fix orchestrator missing cost #1169

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(cache): wire-level prompt/response dumps + JSONL into session_raw/#640

feat(cache): wire-level prompt/response dumps + JSONL into session_raw/#640
senamakel merged 5 commits into
tinyhumansai:mainfrom
senamakel:feat/caching-2

senamakel commented Apr 17, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 17, 2026 •

edited

Loading

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

senamakel commented Apr 17, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Solution

Submission Checklist

Impact

Related

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

senamakel commented Apr 17, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 17, 2026 •

edited

Loading