feat(cache): wire-level prompt/response dumps + JSONL into session_raw/#640

Merged
senamakel merged 5 commits into
tinyhumansai:mainfrom
senamakel:feat/caching-2
Apr 17, 2026

Conversation

@senamakel
Member

@senamakel senamakel commented Apr 17, 2026

Summary

  • JSONL as structured source-of-truth for session transcripts — each turn persists the exact provider-facing Vec<ChatMessage> as {workspace}/session_raw/DDMMYYYY/{agent}_{index}.jsonl, with a _meta header line carrying cumulative tokens, cache boundary, and charged USD.
  • Human-readable .md companion re-rendered on every write under {workspace}/sessions/DDMMYYYY/{agent}_{index}.md (humans only — never read back).
  • Per-message usage attribution: the last assistant line on each turn carries model, usage {input, output, cached_input, cost_usd}, and ts pulled from the provider's UsageInfo.
  • Final assistant reply is now captured in the transcript snapshot — previously the no-tool-call terminal branch persisted the pre-call prompt, dropping the reply.
  • Wire-level prompt + response dumps gated by OPENHUMAN_PROMPT_DUMP_DIR. When set, every chat completion writes the outgoing NativeChatRequest and the aggregated ApiChatResponse (usage + cached_tokens + openhuman meta) to paired timestamped JSON files. Covers streaming and non-streaming paths.
  • Legacy .md fallback in find_latest_transcript and read_transcript so pre-migration sessions resume cleanly for one release.
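
The storage layout in the bullets above can be sketched as a small path helper. This is a minimal illustration, not the code from `transcript.rs`: the function name `transcript_paths` and its parameters are hypothetical, and the date is assumed to arrive already formatted as `DDMMYYYY`.

```rust
use std::path::{Path, PathBuf};

/// Returns `(jsonl_path, md_path)` for one transcript.
/// Hypothetical helper: `date` is the already-formatted DDMMYYYY directory name.
fn transcript_paths(workspace: &Path, date: &str, agent: &str, index: u32) -> (PathBuf, PathBuf) {
    let stem = format!("{agent}_{index}");
    // Structured source of truth.
    let jsonl = workspace
        .join("session_raw")
        .join(date)
        .join(format!("{stem}.jsonl"));
    // Human-readable companion, never read back.
    let md = workspace
        .join("sessions")
        .join(date)
        .join(format!("{stem}.md"));
    (jsonl, md)
}
```

Both paths share the same `{agent}_{index}` stem, so a reader can move between the structured file and its readable view by swapping the top-level directory and the extension.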

Problem

Session transcripts were stored as HTML-comment-wrapped .md files, mixing structured state (messages, metadata, usage) with human-readable rendering. Diffing across turns to debug KV-cache instability was painful because escaping edge cases and prose formatting sat next to the actual prompt bytes. Separately, cache-hit accounting coming back from the backend (prompt_tokens_details.cached_tokens, openhuman meta) was only visible in logs — nothing persisted the wire payload for post-hoc diffing.

Solution

  1. Split source-of-truth from readable view. JSONL under session_raw/, markdown under sessions/. Structured diffs against session_raw/*.jsonl; readable review against sessions/*.md.
  2. Attach per-message usage at the exact site the response arrives (turn.rs). Usage attributes to the last assistant message in the persisted snapshot.
  3. Dump at the HTTP layer, not the dispatcher layer — catches the exact bytes the backend sees. Request and response share a seq prefix so pairs sort together.
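
A rough sketch of the env-gated dump naming from step 3, under stated assumptions: the helper names `dump_dir` and `dump_paths` are hypothetical, the `{seq:06}_{ts}` prefix layout is a guess, and treating an empty variable as unset is also an assumption. Only the shared seq prefix and the `.json` / `.response.json` suffixes come from this PR's text.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Resolves the dump directory from the raw env value, e.g.
/// `std::env::var_os("OPENHUMAN_PROMPT_DUMP_DIR")`.
/// Assumption: an empty value is treated like an unset variable.
fn dump_dir(raw: Option<OsString>) -> Option<PathBuf> {
    raw.filter(|v| !v.is_empty()).map(PathBuf::from)
}

/// Builds the request/response dump paths for one chat completion.
/// The zero-padded seq prefix (hypothetical layout) keeps each pair adjacent
/// when the directory is sorted by name.
fn dump_paths(dir: &Path, seq: u64, ts: &str) -> (PathBuf, PathBuf) {
    let prefix = format!("{seq:06}_{ts}");
    (
        dir.join(format!("{prefix}.json")),
        dir.join(format!("{prefix}.response.json")),
    )
}
```

Zero-padding the sequence number is what makes a lexicographic directory listing match the order in which requests were issued.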

Verified end-to-end against a real workspace: 5 turns across two threads, cache hits confirmed (prompt_cached_tokens: 3696 consistently from the shared global prefix). Diff between consecutive JSONL transcripts shows byte-identical prefixes with only the assistant-reply and new-user-message appended — the KV-cache-stable shape.
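
The "byte-identical prefix" property described above can be checked mechanically. The sketch below is an illustration only: it assumes the caller has already extracted the prompt messages (role plus content) from two consecutive transcripts as strings, and the names `shared_prefix_len` and `is_append_only` are hypothetical.

```rust
/// Counts how many leading messages are identical in both transcripts.
fn shared_prefix_len(prev: &[&str], next: &[&str]) -> usize {
    prev.iter()
        .zip(next.iter())
        .take_while(|(a, b)| a == b)
        .count()
}

/// True when `next` keeps every message of `prev` unchanged and only appends,
/// which is the shape a prefix-based KV cache can reuse.
fn is_append_only(prev: &[&str], next: &[&str]) -> bool {
    shared_prefix_len(prev, next) == prev.len()
}
```

If `is_append_only` returns false, `shared_prefix_len` gives the index of the first message that changed, which is usually where a cache miss starts.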

Submission Checklist

  • Unit tests — 17 transcript tests pass (cargo test -p openhuman --lib transcript), covering JSONL round-trip with usage, md-companion rendering, legacy-md fallback, session_raw/sessions path split, index-collision avoidance across dirs, and forward-compat on unknown JSONL fields
  • Inline comments — on the non-obvious bits (escape/legacy fallback rationale, md_companion_path component rewrite, why last_provider_messages gets mirrored at the terminal branch)
  • Doc comments — module-level //! doc on transcript.rs documents the JSONL schema, storage layout, and the .md companion contract
  • E2E / integration — not added; KV-cache stability is hard to assert deterministically in E2E. Manual validation done against the real backend (see Summary).

Impact

  • Storage migration: existing sessions/*.md files remain readable via the legacy-.md fallback for one release; new sessions write JSONL under session_raw/. next_index scans both locations so indices don't collide.
  • Negligible runtime overhead when OPENHUMAN_PROMPT_DUMP_DIR is unset: the dump helper reads the env var once per call and short-circuits.
  • Platform: desktop/CLI (core sidecar). No UI or Tauri changes.
  • Compatibility: public write_transcript signature gained a last_assistant_turn_usage: Option<&TurnUsage> parameter; in-tree call sites updated (turn.rs, subagent_runner.rs).
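
The index-collision rule from the storage-migration bullet can be sketched as a pure function over file names gathered from both directories. This is a simplified illustration: the real `next_index` scans the filesystem, the signature here is hypothetical, and starting at index 0 is an assumption.

```rust
/// Returns the next free index for `agent`, given file names collected from
/// both `session_raw/<date>/` (*.jsonl) and the legacy `sessions/<date>/` (*.md).
/// Assumption: indices start at 0 when no transcript exists yet.
fn next_index<'a>(agent: &str, names: impl IntoIterator<Item = &'a str>) -> u32 {
    let prefix = format!("{agent}_");
    names
        .into_iter()
        .filter_map(|name| {
            // Accept either the new or the legacy extension.
            let stem = name
                .strip_suffix(".jsonl")
                .or_else(|| name.strip_suffix(".md"))?;
            // Keep only files of this agent whose suffix is a plain number.
            stem.strip_prefix(prefix.as_str())?.parse::<u32>().ok()
        })
        .max()
        .map_or(0, |highest| highest + 1)
}
```

Taking the maximum across both locations is what keeps a new JSONL transcript from reusing the index of a pre-migration `.md` session.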

Related

  • Issue(s): n/a
  • Follow-up PR(s)/TODOs:
    • The openhuman custom meta block in responses is still null on the wire — if cache-hit accounting should also surface via the custom block (not just prompt_tokens_details.cached_tokens), that's a separate backend change.
    • The cached-token count observed in validation was pinned at the global prefix size across turns; extending the cache boundary to cover the per-conversation prefix is a follow-up.

Summary by CodeRabbit

  • New Features

    • Optional debug logging support to capture provider prompts and responses via environment configuration.
    • Enhanced per-turn usage tracking with model identification, token counts, and cost details with timestamps.
  • Improvements

    • Upgraded transcript persistence to JSONL format for improved reliability and data integrity.

…n usage tracking

- Updated the `write_transcript` function to accept an optional `last_assistant_turn_usage` parameter, allowing for the inclusion of per-message token and cost figures in the JSONL output.
- Modified the `persist_session_transcript` method to capture and pass the last turn's usage data to the transcript writer, ensuring accurate tracking of resource usage.
- Improved documentation to clarify the new functionality and its impact on transcript persistence.

This change enhances the fidelity of session transcripts by associating usage metrics with the last assistant message, improving overall transparency in resource consumption.
…ed on file extension

- Renamed `read_transcript` function to clarify its purpose and updated its logic to first check the file extension.
- Added handling for legacy `.md` files, ensuring they are processed with the appropriate parser.
- Enhanced fallback mechanism to check for the existence of a `.md` sibling file if the primary JSONL file is not found.

This change improves the robustness of transcript reading by accommodating legacy formats while maintaining compatibility with new JSONL files.
- Added functionality to capture the final assistant reply in the transcript snapshot, ensuring that both the prompt and response are persisted in the JSONL output.
- This enhancement improves the completeness of session transcripts by including the assistant's final message alongside the user prompts.

This change enhances the fidelity of session transcripts, providing a more comprehensive record of interactions.
…ion_raw/

Two related changes for KV-cache debugging and on-disk clarity:

1. Wire-level dump (`providers/compatible.rs`)
   When OPENHUMAN_PROMPT_DUMP_DIR is set, every chat completion writes
   the outgoing NativeChatRequest body and the aggregated ApiChatResponse
   to paired timestamped JSON files. Covers both the non-streaming and
   SSE streaming paths. Unset → zero overhead, no behavior change.

   Why: the prompt-level transcript captures what the dispatcher
   assembled, but the actual wire bytes (including
   prompt_tokens_details.cached_tokens and the openhuman meta block)
   only surface at the HTTP layer. Pairing request + response per turn
   lets us diff consecutive turns to confirm cache-hit accounting and
   diagnose cache-miss causes.

2. JSONL source-of-truth moves to session_raw/ (`agent/harness/session/transcript.rs`)
   JSONL transcript files now live under workspace/session_raw/DDMMYYYY/
   instead of workspace/sessions/. The human-readable .md companion
   stays in workspace/sessions/DDMMYYYY/ so the structured source-of-
   truth and the readable debug view are cleanly separated on disk.

   - resolve_new_transcript_path returns session_raw/ paths and
     advances the index past both session_raw/*.jsonl and legacy
     sessions/*.md so indices stay unique across the migration.
   - find_latest_transcript prefers session_raw/ and falls back to the
     legacy sessions/ dir for pre-migration .md files (one-release compat).
   - md_companion_path derives the sessions/ md path from the
     session_raw/ jsonl path via path-component rewrite, with a
     sibling-file fallback for tests that use flat tempdirs.
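
A minimal sketch of the path-component rewrite described for md_companion_path, written from the description above rather than from the source. As a simplification it rewrites every component named `session_raw`, which would misbehave if the workspace directory itself had that name.

```rust
use std::path::{Component, Path, PathBuf};

/// Maps `<ws>/session_raw/<date>/<stem>.jsonl` to `<ws>/sessions/<date>/<stem>.md`.
/// A path without a `session_raw` component falls back to a sibling `.md` file,
/// which covers flat temp directories.
fn md_companion_path(jsonl_path: &Path) -> PathBuf {
    let mut rewritten = PathBuf::new();
    for component in jsonl_path.components() {
        match component {
            Component::Normal(name) if name == "session_raw" => rewritten.push("sessions"),
            other => rewritten.push(other.as_os_str()),
        }
    }
    rewritten.set_extension("md");
    rewritten
}
```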
@coderabbitai
Contributor

coderabbitai Bot commented Apr 17, 2026


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e95de773-fa00-4025-b716-777eab00ce59

📥 Commits

Reviewing files that changed from the base of the PR and between 24a9a60 and 5850391.

📒 Files selected for processing (3)
  • src/openhuman/agent/harness/session/transcript.rs
  • src/openhuman/agent/harness/session/turn.rs
  • src/openhuman/providers/compatible.rs
📝 Walkthrough

Walkthrough

This PR refactors session transcript persistence from Markdown format to JSONL, adds per-turn provider usage tracking with timestamps, and introduces optional prompt/response dump diagnostics. It updates transcript read/write signatures, adds TurnUsage and MessageUsage types, maintains legacy migration support, and extends the provider to optionally capture and log raw API exchanges.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Transcript format migration: src/openhuman/agent/harness/session/transcript.rs, src/openhuman/agent/harness/session/turn.rs, src/openhuman/agent/harness/subagent_runner.rs | Replaced .md HTML-comment format with JSONL as the persistence source of truth. Added MessageUsage and TurnUsage types for token/cost tracking. Updated write_transcript signature (path → jsonl_path, added last_assistant_turn_usage parameter). Implemented dual-format read logic (JSONL primary, fallback to legacy .md). Path discovery/resolution logic updated to handle new session_raw/{DDMMYYYY}/ layout. Turn-level usage capture integrated into transcript write calls. |
| Provider dump diagnostics: src/openhuman/providers/compatible.rs | Added optional prompt/response logging when OPENHUMAN_PROMPT_DUMP_DIR is set, with global PROMPT_DUMP_SEQ sequencing counter. Updated stream_native_chat with dump_seq parameter and modified both streaming and non-streaming paths to capture raw response bytes, dump them as pretty-printed JSON (or raw bytes if unparseable), then deserialize. Dump functions emit timestamped .json and .response.json files. |

Sequence Diagram

sequenceDiagram
    participant Agent
    participant Turn
    participant Transcript as Transcript Handler
    participant Persist as File System
    
    Note over Agent,Persist: Turn Completion & Persistence
    Agent->>Turn: execute_turn(..., messages)
    Turn->>Turn: capture usage from<br/>provider response
    Turn->>Turn: create TurnUsage<br/>(model, tokens, cost, ts)
    
    Turn->>Agent: return with final<br/>assistant message
    Agent->>Agent: persist_session_transcript<br/>(..., turn_usage)
    
    Note over Agent,Persist: JSONL Write with Metadata
    Agent->>Transcript: write_transcript<br/>(jsonl_path, messages,<br/>last_assistant_turn_usage)
    Transcript->>Persist: write _meta line<br/>(session config)
    Transcript->>Persist: write message lines<br/>(role, content JSON)
    Transcript->>Persist: attach TurnUsage<br/>to last assistant<br/>message
    Transcript->>Persist: re-render companion<br/>.md file
    Transcript->>Agent: success
    
    Note over Agent,Persist: Legacy Fallback Path
    alt JSONL exists
        Transcript->>Persist: read from .jsonl
    else JSONL missing
        Transcript->>Persist: read from legacy .md
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Poem

🐰 From markdowns old to JSONL's grace,
Each message finds its rightful place,
With turn usage and timestamps true,
Transcripts migrate—the old, the new!
Diagnostics logged for all to see,
What finer code could ever be? 📝

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: JSONL migration for transcripts and wire-level prompt/response dump feature. Both are significant architectural changes reflected in the changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/openhuman/agent/harness/session/turn.rs (1)

406-425: ⚠️ Potential issue | 🟠 Major

Clear last_turn_usage when a response omits usage.

If iteration N-1 had resp.usage = Some(...) and the final iteration returns usage = None, this variable keeps the previous iteration’s numbers and persist_session_transcript() will attach them to the final assistant message. That misattributes token/cost data in the JSONL source of truth.

Proposed fix
-                        if let Some(ref usage) = resp.usage {
+                        last_turn_usage = None;
+                        if let Some(ref usage) = resp.usage {
                             self.context.record_usage(usage);
                             cumulative_input_tokens += usage.input_tokens;
                             cumulative_output_tokens += usage.output_tokens;
                             cumulative_cached_input_tokens += usage.cached_input_tokens;
                             cumulative_charged_usd += usage.charged_amount_usd;
                             // Snapshot this turn's usage so the transcript
                             // writer can attribute it to the last assistant
                             // message.
                             last_turn_usage = Some(transcript::TurnUsage {
                                 model: effective_model.clone(),
                                 usage: transcript::MessageUsage {
                                     input: usage.input_tokens,
                                     output: usage.output_tokens,
                                     cached_input: usage.cached_input_tokens,
                                     cost_usd: usage.charged_amount_usd,
                                 },
                                 ts: chrono::Utc::now().to_rfc3339(),
                             });
                         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/harness/session/turn.rs` around lines 406 - 425, The
problem is that last_turn_usage retains the previous iteration's values when
resp.usage is None, causing misattribution in persist_session_transcript();
update the handling around the usage check so that when resp.usage is Some(...)
you perform context.record_usage(...) and set last_turn_usage as now, and when
resp.usage is None you explicitly clear last_turn_usage = None (updating the
cumulative_* counters only when usage exists); modify the block that currently
matches resp.usage (the if let Some(ref usage) { ... } handling around
record_usage and last_turn_usage) so that last_turn_usage is reset before the
check or cleared in an else branch, ensuring the final assistant message has no
stale usage attached.
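
The stale-usage problem above can be reproduced in isolation. The sketch below is a simplified stand-in for the loop in turn.rs: the `Usage` struct and `fold_usage` function are hypothetical, and only two token fields are modelled.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Usage {
    input_tokens: u64,
    output_tokens: u64,
}

/// Walks the per-iteration responses of one turn and returns
/// `(cumulative tokens, usage to attach to the final assistant message)`.
fn fold_usage(responses: &[Option<Usage>]) -> (u64, Option<Usage>) {
    let mut cumulative_tokens = 0;
    let mut last_turn_usage = None;
    for usage in responses {
        // Reset first, so a response without usage cannot inherit the
        // numbers of an earlier iteration.
        last_turn_usage = None;
        if let Some(u) = usage {
            cumulative_tokens += u.input_tokens + u.output_tokens;
            last_turn_usage = Some(u.clone());
        }
    }
    (cumulative_tokens, last_turn_usage)
}
```

The cumulative counter still includes every iteration that reported usage; only the per-message snapshot is cleared, so session totals stay correct while the final message carries no misattributed figures.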
🧹 Nitpick comments (3)
src/openhuman/providers/compatible.rs (1)

71-75: Demote prompt-dump success logs to debug or trace.

These are opt-in diagnostics behind OPENHUMAN_PROMPT_DUMP_DIR, so emitting one info log per request and per response will get noisy fast during dump sessions.

As per coding guidelines src/**/*.rs: "Use log/tracing at debug or trace level for development-oriented diagnostics in Rust."

Also applies to: 117-121

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/providers/compatible.rs` around lines 71 - 75, The prompt-dump
success logs currently use log::info! and should be lowered to a
development-level log; replace the log::info! calls that print "[prompt_dump]
wrote response {} bytes -> {}" (the call that passes payload.len() and
path.display()) with log::debug! (or log::trace! if you prefer more verbosity),
and do the same for the other occurrence mentioned (the similar info log around
lines 117-121) so both dump success messages are emitted at debug/trace level
instead of info.
src/openhuman/agent/harness/session/transcript.rs (2)

177-183: Nit: rposition reads cleaner here.

♻️ Proposed refactor
-    let last_assistant_idx = messages.iter().enumerate().rev().find_map(|(i, m)| {
-        if m.role == "assistant" {
-            Some(i)
-        } else {
-            None
-        }
-    });
+    let last_assistant_idx = messages.iter().rposition(|m| m.role == "assistant");

While you're in here, the last_assistant_turn_usage.unwrap() on line 190 can also be collapsed by pattern-matching on both Options together (e.g. the same if let (Some(idx), Some(tu)) = ... shape you already use on line 228), which removes the guard + unwrap duplication.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/harness/session/transcript.rs` around lines 177 - 183,
Replace the manual reverse-enumeration using
messages.iter().enumerate().rev().find_map(...) with the clearer rposition call
to compute last_assistant_idx (e.g. messages.iter().rposition(|m| m.role ==
"assistant")) and then collapse the separate guard + unwrap by
pattern-matching both Option values together (e.g. if let
(Some(last_assistant_idx), Some(last_assistant_turn_usage)) = (...)) so you
remove the explicit unwrap and duplicate guard; update references to the
variables last_assistant_idx and last_assistant_turn_usage accordingly in the
surrounding logic.
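
A small sketch of the two suggestions above combined, using a cut-down message type. `ChatMessage` here is trimmed to the one field the lookup needs, and `usage_target` is a hypothetical helper, not a function from the PR.

```rust
/// Trimmed stand-in for the provider-facing message type.
struct ChatMessage {
    role: String,
}

/// Returns the index of the last assistant message together with the usage
/// to attach to it, or `None` if either one is missing.
fn usage_target<'a, U>(messages: &[ChatMessage], usage: Option<&'a U>) -> Option<(usize, &'a U)> {
    // `rposition` searches from the back and yields a front-based index.
    let last_assistant_idx = messages.iter().rposition(|m| m.role == "assistant");
    // Matching both Options together removes the separate guard + unwrap.
    if let (Some(idx), Some(tu)) = (last_assistant_idx, usage) {
        Some((idx, tu))
    } else {
        None
    }
}
```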

301-336: Prefer a positional rule over substring matching for the _meta line.

Using line.contains("\"_meta\"") as a classifier works today because JSON escapes any "_meta" inside content (it becomes \"_meta\"), but it is still a fragile heuristic:

  • If a future schema bump ever puts a _meta-named key into a message line (e.g. via the _extra flatten map populated by someone hand-editing or by a forward-compat writer), that line will be mis-routed to the meta parser, the parse will fail, and a real message will be silently dropped with only a warning.
  • It couples the parser to an incidental string pattern instead of the documented invariant ("meta is always line 1") stated in the module doc.

Since the writer always emits the meta line as the first non-empty line, consider enforcing that positionally: parse the first non-empty line strictly as MetaLine (error on failure), then parse every subsequent non-empty line strictly as MessageLine. That also lets you drop the if meta.is_none() { return Err(...) } re-entry guard.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/harness/session/transcript.rs` around lines 301 - 336,
The current code uses line.contains("\"_meta\"") to detect meta lines; change
this to a positional rule: treat the first non-empty line encountered as the
MetaLine (attempt serde_json::from_str::<MetaLine>(line) and return an error if
it fails) and only parse all subsequent non-empty lines as MessageLine; remove
the substring check, the meta.is_none() re-entry guard and the special-case
warning that drops the first message, and update handling around the meta
variable (where MetaLine and TranscriptMeta are constructed) to occur only once
when meta is still None and for the first non-empty line, using path.display()
and line_no for contextual errors as currently done.
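
The positional rule can be illustrated without a JSON library. In this sketch the `starts_with` check is only a stand-in for a strict `MetaLine` deserialization, the `{"_meta": ...}` header shape is an assumption, and `Line` and `classify` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum Line<'a> {
    Meta(&'a str),
    Message(&'a str),
}

/// Classifies JSONL lines by position: the first non-empty line must be the
/// meta header, every later non-empty line is treated as a message.
fn classify(text: &str) -> Result<Vec<Line<'_>>, String> {
    let mut out = Vec::new();
    for (line_no, line) in text.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        if out.is_empty() {
            // Stand-in for parsing the line strictly as the meta header.
            if !line.starts_with(r#"{"_meta""#) {
                return Err(format!("line {}: expected _meta header", line_no + 1));
            }
            out.push(Line::Meta(line));
        } else {
            out.push(Line::Message(line));
        }
    }
    if out.is_empty() {
        return Err("empty transcript".to_string());
    }
    Ok(out)
}
```

Because classification depends only on position, a later message that happens to contain a `_meta` key is still routed to the message parser instead of being dropped.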
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/openhuman/agent/harness/session/transcript.rs`:
- Around line 232-247: The companion .md write currently runs
fs::create_dir_all(...) and fs::write(...) in write_transcript (uses
md_companion_path and render_markdown) and returns Err on failure; change that
so any error from creating the companion directory or writing the markdown is
caught and logged with log::warn! (including error details and
md_path.display()) and does NOT propagate the error — after logging continue to
return Ok(()) because the JSONL flush is the source of truth; keep the existing
debug log on success.

In `@src/openhuman/providers/compatible.rs`:
- Around line 79-128: The sequence reservation must be atomic: replace the
non-atomic peek_dump_seq() with a reservation that increments PROMPT_DUMP_SEQ in
one step and use that reserved value when writing dumps. Concretely, change
peek_dump_seq() to perform PROMPT_DUMP_SEQ.fetch_add(1, Ordering::Relaxed)
(rename to reserve_dump_seq() or similar) and update call sites that currently
call peek_dump_seq() before dump_prompt_if_enabled() so they call the new
reservation function and then pass or correlate that reserved seq with
dump_prompt_if_enabled(); ensure PROMPT_DUMP_SEQ is the single source of truth
for sequence allocation.
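
A minimal sketch of the atomic reservation described above. The counter and function names follow the review text; the starting value of 0 and the use of `AtomicU64` are assumptions.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Single source of truth for dump sequence numbers.
static PROMPT_DUMP_SEQ: AtomicU64 = AtomicU64::new(0);

/// Reserves a sequence number in one atomic step. `fetch_add` returns the
/// previous value, so two concurrent callers can never observe the same seq.
fn reserve_dump_seq() -> u64 {
    PROMPT_DUMP_SEQ.fetch_add(1, Ordering::Relaxed)
}
```

`Ordering::Relaxed` is enough here because the counter only needs unique values, not ordering relative to other memory operations. The caller would pass the reserved value to both the request and the response dump so the pair stays aligned.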

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2d53fd2d-4d2a-4817-8da9-17013b1078c6

📥 Commits

Reviewing files that changed from the base of the PR and between c4ef143 and 24a9a60.

📒 Files selected for processing (4)
  • src/openhuman/agent/harness/session/transcript.rs
  • src/openhuman/agent/harness/session/turn.rs
  • src/openhuman/agent/harness/subagent_runner.rs
  • src/openhuman/providers/compatible.rs

Comment thread src/openhuman/agent/harness/session/transcript.rs
Comment thread src/openhuman/providers/compatible.rs Outdated
- transcript: .md companion write is now best-effort — failures are
  logged with log::warn! and Ok(()) is returned. The JSONL above is
  the source of truth; a readable-log hiccup must not take down state
  persistence.
- compatible: replace peek_dump_seq() (non-atomic) with
  reserve_dump_seq() that fetch_adds PROMPT_DUMP_SEQ in one step and
  returns the reserved value. dump_prompt_if_enabled now takes seq as
  a parameter, so request/response files cannot be misaligned under
  concurrent requests.
- turn: explicitly clear last_turn_usage when resp.usage is None so
  the final assistant message can't inherit a prior iteration's stale
  numbers during a tool-using turn.
- transcript (nit): last-assistant lookup uses rposition; the emit
  loop pattern-matches (Option, Option) together so there's no
  separate unwrap.
- transcript (nit): JSONL meta parsing is positional — first non-empty
  line must be _meta or we error out. Replaces the substring-based
  heuristic that could false-positive on message content.
- compatible (nit): lower prompt/response dump success logs from info
  to debug — they fire on every turn when the env var is set.
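
The best-effort companion write from the first bullet can be sketched as follows. This is an illustration under assumptions: `write_with_companion` and its parameters are hypothetical, and `eprintln!` stands in for the `log::warn!` call so the example needs only the standard library.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Writes the JSONL transcript (errors propagate), then the `.md` companion
/// (errors are logged and swallowed).
fn write_with_companion(jsonl_path: &Path, jsonl: &str, md_path: &Path, md: &str) -> io::Result<()> {
    if let Some(dir) = jsonl_path.parent() {
        fs::create_dir_all(dir)?;
    }
    // The source of truth: a failure here must reach the caller.
    fs::write(jsonl_path, jsonl)?;

    // The readable view: run both steps, but only report a failure.
    let companion = (|| -> io::Result<()> {
        if let Some(dir) = md_path.parent() {
            fs::create_dir_all(dir)?;
        }
        fs::write(md_path, md)
    })();
    if let Err(err) = companion {
        eprintln!("[transcript] companion write failed for {}: {err}", md_path.display());
    }
    Ok(())
}
```

Ordering matters in this shape: the JSONL is fully written before the companion is attempted, so a companion failure can never leave the structured state missing.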
@senamakel senamakel merged commit 1569444 into tinyhumansai:main Apr 17, 2026
8 checks passed
AusAgentSmith pushed a commit to AusAgentSmith/openhuman that referenced this pull request May 23, 2026
…w/ (tinyhumansai#640)

* feat(transcript): enhance session transcript persistence with per-turn usage tracking

* refactor(transcript): improve transcript reading logic by routing based on file extension

* feat(transcript): mirror final assistant reply in session transcript

* feat(cache): wire-level prompt/response dumps + split JSONL into session_raw/

* fix(cache,transcript): address PR review findings
