feat(memory): batch agent experience consolidation#2201

Open
huangruiteng wants to merge 20 commits into
volcengine:mainfrom
huangruiteng:feat/batch-experience-consolidation

Contributor

@huangruiteng huangruiteng commented May 22, 2026

Summary

Adds an opt-in batch mode for agent experience consolidation after trajectory extraction, plus low-cardinality phase telemetry for agent-memory extraction.

Today agent memory extraction writes trajectories first, then runs one experience-consolidation LLM pass per newly written trajectory. That keeps the flow simple, but corpus preparation becomes slow when a session produces several trajectories. This PR keeps the existing per-trajectory behavior as the default and adds a bounded batch mode:

  • memory.agent_experience_consolidation_mode = "batch"
  • memory.agent_experience_batch_max_trajectories = 5
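A minimal sketch of how a bounded batch plan could be derived from these two settings. The class `ConsolidationSettings`, the function `plan_consolidation_calls`, and the default of 5 for the batch bound are illustrative assumptions, not names or defaults taken from this PR; only the two mode values and the idea of a per-call upper bound come from the description above.

```python
from dataclasses import dataclass
from typing import List, Sequence

VALID_MODES = ("per_trajectory", "batch")


@dataclass(frozen=True)
class ConsolidationSettings:
    """Hypothetical stand-in for the two memory settings named in the PR."""

    mode: str = "per_trajectory"      # documented default
    batch_max_trajectories: int = 5   # assumed default, taken from the example value

    def __post_init__(self) -> None:
        if self.mode not in VALID_MODES:
            raise ValueError(f"unknown consolidation mode: {self.mode!r}")
        if self.batch_max_trajectories < 1:
            raise ValueError("batch_max_trajectories must be >= 1")


def plan_consolidation_calls(
    trajectory_uris: Sequence[str], settings: ConsolidationSettings
) -> List[List[str]]:
    """Group newly written trajectories into one list per LLM consolidation call."""
    uris = list(trajectory_uris)
    # Per-trajectory mode is just the degenerate case of a batch of size 1.
    size = settings.batch_max_trajectories if settings.mode == "batch" else 1
    return [uris[i : i + size] for i in range(0, len(uris), size)]
```

With seven new trajectories and a bound of 5, this plan yields two calls (5 + 2) instead of seven. The real implementation may group differently, for example by routing leftover single trajectories through the existing per-trajectory path.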

The quality target is parity with the existing experience granularity. Batch mode is not intended to force one experience per source trajectory.

Source Attribution Safety

Batch mode must not reuse the single-trajectory fallback, because that can attach an entire mixed batch to one experience card. This PR adds a temporary source_trajectory_ids field only for the batch extraction schema. The provider resolves it into concrete trajectory URIs before apply and strips the field so it is not persisted.

If a batch output omits source attribution for a written/edited experience, the system skips appending source trajectories instead of attaching the whole batch. A single experience may still cite multiple source trajectories; the attribution field is lineage, not a split instruction.
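The attribution rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the provider's code: the function name `resolve_batch_sources`, the dict-shaped experience record, and the `id_to_uri` lookup are hypothetical. Only the field name `source_trajectory_ids`, the strip-before-persist step, and the skip-on-missing rule come from the description. Dropping unknown IDs silently is also an assumption.

```python
from typing import Dict, List, Optional


def resolve_batch_sources(
    experience: dict, id_to_uri: Dict[str, str]
) -> Optional[List[str]]:
    """Resolve the temporary attribution field into trajectory URIs.

    The field is always removed from `experience` so it is never persisted.
    Returns None when there is no usable attribution, signalling the caller
    to skip the source append rather than attach the whole batch.
    """
    raw_ids = experience.pop("source_trajectory_ids", None)
    if not raw_ids:
        return None

    uris: List[str] = []
    for trajectory_id in raw_ids:
        uri = id_to_uri.get(trajectory_id)
        # Assumption: IDs the batch never contained are ignored, and
        # duplicates are collapsed while preserving order.
        if uri is not None and uri not in uris:
            uris.append(uri)
    return uris or None
```

An experience that cites two trajectories gets exactly those two URIs, and one that cites none gets no source append at all, which keeps a mixed batch from collapsing onto a single card.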

Defaults and Compatibility

Default behavior remains unchanged:

  • agent_experience_consolidation_mode defaults to per_trajectory
  • existing per-trajectory source fallback remains intact
  • batch mode is opt-in and bounded by agent_experience_batch_max_trajectories
  • telemetry is additive and uses fixed phase buckets rather than URI/task-specific metric keys
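To illustrate what fixed phase buckets mean in practice, here is a small sketch of a timer whose metric keys come from a closed set rather than from URIs or task names. The class `PhaseTimer`, the phase names, and the `agent_memory.` key prefix are hypothetical and chosen for illustration; the PR's actual metric names are not shown in this description.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical closed set of phases; a fixed set keeps metric cardinality low.
PHASES = frozenset({"trajectory_extraction", "experience_consolidation", "apply"})


class PhaseTimer:
    """Accumulates call counts and elapsed time per fixed phase bucket."""

    def __init__(self) -> None:
        self._total_ms = defaultdict(float)
        self._count = defaultdict(int)

    @contextmanager
    def phase(self, name: str):
        # Reject anything outside the closed set so a URI or task ID can
        # never leak into a metric key.
        if name not in PHASES:
            raise ValueError(f"unknown phase bucket: {name!r}")
        start = time.perf_counter()
        try:
            yield
        finally:
            self._total_ms[name] += (time.perf_counter() - start) * 1000.0
            self._count[name] += 1

    def summary(self) -> dict:
        """Return a flat dict with the same keys on every call."""
        out = {}
        for name in sorted(PHASES):
            out[f"agent_memory.{name}.count"] = self._count.get(name, 0)
            out[f"agent_memory.{name}.total_ms"] = self._total_ms.get(name, 0.0)
        return out
```

Because the summary always emits the same keys, the number of metric series stays constant no matter how many sessions or trajectories are processed.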

Validation Signal

A small TAU-2 airline corpus-preparation remeasurement on the same cached train transcripts. This is not a benchmark-score claim; it only validates write-time consolidation behavior on realistic multi-step sessions.

Setup: 8 airline train tasks, success-only commit policy; 7 successful sessions committed and 1 failed session skipped. Counts exclude .abstract.md / .overview.md.

| mode | wall | trajectories | experiences | read |
| --- | --- | --- | --- | --- |
| per-trajectory | 744s | 12 | 11 | 10 single experience consolidation calls, ~165s |
| batch max5 | 661s | 14 | 10 | 4 batch + 3 single consolidation calls, ~135s |
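For a quick sense of scale, the relative change implied by the measurements above can be worked out directly. The helper name `relative_reduction` is illustrative. The consolidation figures are the approximate (~) values reported above, and treating them as total consolidation time is an assumption.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which `after` is smaller than `before`."""
    return (before - after) / before


# Wall time: 744s (per-trajectory) vs 661s (batch max5).
wall = relative_reduction(744, 661)

# Consolidation: ~165s vs ~135s, both approximate in the source.
consolidation = relative_reduction(165, 135)

print(f"wall time reduction: {wall:.1%}")
print(f"consolidation reduction: {consolidation:.1%}")
```

On this sample that is roughly an 11% drop in wall time and an 18% drop in consolidation time. The two runs wrote different trajectory counts (12 vs 14), so the comparison is indicative only.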

Quality read: the batch output stayed close in durable experience count on this sample and did not show obvious whole-batch-to-one-card source misattribution. The remaining quality risk is mild granularity drift between related cards, so the feature stays opt-in and bounded while telemetry makes the write bottleneck observable.

TAU-2 benchmark config in this branch defaults corpus preparation to batch mode and records the expected server memory config in run artifacts. --strict-preflight checks OPENVIKING_CONFIG_FILE / ~/.openviking/ov.conf so evidence runs fail fast if the server is still using per-trajectory consolidation.
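A rough sketch of what such a fail-fast preflight might look like. Several things here are assumptions and not confirmed by this PR: that the config file is JSON, that the setting lives under a top-level `memory` object, that the environment variable takes precedence over the home-directory path, and the function names `resolve_config_path` and `check_batch_mode`.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Pick the config file: env override first (assumed), then ~/.openviking/ov.conf."""
    env = os.environ if env is None else env
    override = env.get("OPENVIKING_CONFIG_FILE")
    if override:
        return Path(override)
    return Path.home() / ".openviking" / "ov.conf"


def check_batch_mode(config_path: Path) -> None:
    """Raise if the server config would still use per-trajectory consolidation."""
    # Assumption: the file is JSON with a top-level "memory" object.
    with open(config_path, encoding="utf-8") as fh:
        config = json.load(fh)
    mode = config.get("memory", {}).get(
        "agent_experience_consolidation_mode", "per_trajectory"
    )
    if mode != "batch":
        raise RuntimeError(
            f"{config_path}: expected consolidation mode 'batch', found {mode!r}"
        )
```

Failing before any corpus preparation starts avoids producing evidence runs whose artifacts claim batch mode while the server actually ran the default path.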

Tests

  • uv run ruff format --check openviking/session/compressor_v2.py openviking/session/memory/agent_experience_context_provider.py openviking/session/memory/batch_agent_experience_context_provider.py openviking/telemetry/operation.py openviking_cli/utils/config/memory_config.py tests/session/memory/test_agent_experience_context_provider.py tests/session/memory/test_compressor_v2.py tests/session/test_session_commit.py tests/test_telemetry_runtime.py
  • uv run ruff check openviking/session/compressor_v2.py openviking/session/memory/agent_experience_context_provider.py openviking/session/memory/batch_agent_experience_context_provider.py openviking/telemetry/operation.py openviking_cli/utils/config/memory_config.py tests/session/memory/test_agent_experience_context_provider.py tests/session/memory/test_compressor_v2.py tests/session/test_session_commit.py tests/test_telemetry_runtime.py
  • uv run --group dev --with pytest-asyncio --with pytest-cov pytest tests/session/memory/test_agent_experience_context_provider.py tests/session/memory/test_compressor_v2.py::TestCompressorV2::test_extract_phase_runs_post_apply_before_lock_release tests/session/memory/test_compressor_v2.py::TestCompressorV2::test_agent_memory_batch_experience_respects_batch_size tests/session/memory/test_compressor_v2.py::TestCompressorV2::test_agent_memory_batch_experience_keeps_sources_separate_per_experience tests/session/memory/test_compressor_v2.py::TestCompressorV2::test_agent_memory_batch_experience_skips_source_append_without_attribution tests/session/test_session_commit.py::TestCommit::test_commit_extracts_memories tests/test_telemetry_runtime.py::test_telemetry_summary_includes_agent_memory_phase_metrics -q


github-actions Bot commented May 22, 2026

PR Reviewer Guide 🔍

(Review updated until commit 1ae63ba)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🏅 Score: 85
🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

@github-actions

PR Code Suggestions ✨

No code suggestions found for the PR.

@huangruiteng huangruiteng marked this pull request as ready for review May 23, 2026 17:20
@github-actions

Persistent review updated to latest commit 1ae63ba

@qin-ctx qin-ctx requested a review from chenjw May 25, 2026 03:32