feat(memory): batch agent experience consolidation #2201
Open
huangruiteng wants to merge 20 commits into
Summary
Adds an opt-in batch mode for agent experience consolidation after trajectory extraction, plus low-cardinality phase telemetry for agent-memory extraction.
Today agent memory extraction writes trajectories first, then runs one experience-consolidation LLM pass per newly written trajectory. That keeps the flow simple, but corpus preparation becomes slow when a session produces several trajectories. This PR keeps the existing per-trajectory behavior as the default and adds a bounded batch mode:
- `memory.agent_experience_consolidation_mode = "batch"`
- `memory.agent_experience_batch_max_trajectories = 5`

The quality target is parity with the existing experience granularity. Batch mode is not intended to force one experience per source trajectory.
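As a rough illustration of what "bounded batch mode" means for the number of consolidation passes, here is a minimal sketch. The function names (`chunk_trajectories`, `plan_consolidation_passes`) and the `traj-N` identifiers are hypothetical and not taken from this PR; only the two mode values and the batch size of 5 come from the description above.

```python
from typing import Iterator, List, Sequence


def chunk_trajectories(
    trajectory_uris: Sequence[str], max_per_batch: int = 5
) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `max_per_batch` trajectories."""
    if max_per_batch < 1:
        raise ValueError("max_per_batch must be at least 1")
    for start in range(0, len(trajectory_uris), max_per_batch):
        yield list(trajectory_uris[start : start + max_per_batch])


def plan_consolidation_passes(
    trajectory_uris: Sequence[str],
    mode: str = "per_trajectory",
    max_per_batch: int = 5,
) -> List[List[str]]:
    """Return one list of trajectories per consolidation LLM pass.

    Hypothetical helper: "per_trajectory" gives one pass per trajectory,
    "batch" gives one pass per bounded group.
    """
    if mode == "batch":
        return list(chunk_trajectories(trajectory_uris, max_per_batch))
    return [[uri] for uri in trajectory_uris]


# Hypothetical session that produced 7 new trajectories.
uris = [f"traj-{i}" for i in range(1, 8)]
per_trajectory_passes = plan_consolidation_passes(uris)
batch_passes = plan_consolidation_passes(uris, mode="batch", max_per_batch=5)
```

Under these assumptions, the same 7 trajectories need 7 passes in the default mode and 2 passes (5 + 2) in batch mode, which is where the corpus-preparation speedup would come from.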
Source Attribution Safety
Batch mode must not reuse the single-trajectory fallback, because that can attach an entire mixed batch to one experience card. This PR adds a temporary `source_trajectory_ids` field only for the batch extraction schema. The provider resolves it into concrete trajectory URIs before apply and strips the field so it is not persisted.

If a batch output omits source attribution for a written/edited experience, the system skips appending source trajectories instead of attaching the whole batch. A single experience may still cite multiple source trajectories; the attribution field is lineage, not a split instruction.
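The attribution rule above can be sketched roughly as follows. This is an illustrative sketch, not the provider code in this PR: `resolve_sources`, the dict-shaped experience, the `t1`/`t2`/`t3` ids, and the example URIs are all assumptions. Dropping unknown ids instead of guessing is also an assumption of the sketch.

```python
from typing import Any, Dict, List


def resolve_sources(
    experience: Dict[str, Any], batch_uris: Dict[str, str]
) -> List[str]:
    """Strip the temporary field and return the trajectory URIs it cites.

    An empty result means "skip appending sources"; the whole batch is
    never used as a fallback.
    """
    # pop() removes the temporary field so it cannot be persisted.
    ids = experience.pop("source_trajectory_ids", None) or []
    # Assumption: ids that are not part of this batch are ignored.
    return [batch_uris[i] for i in ids if i in batch_uris]


# Hypothetical batch of three trajectories, keyed by a short id.
batch = {
    "t1": "example://trajectories/t1",
    "t2": "example://trajectories/t2",
    "t3": "example://trajectories/t3",
}

attributed = {"title": "Rebooking flow", "source_trajectory_ids": ["t1", "t3"]}
unattributed = {"title": "Refund policy"}

attributed_sources = resolve_sources(attributed, batch)
unattributed_sources = resolve_sources(unattributed, batch)
```

Here the first experience keeps two sources, which matches the note that one experience may cite multiple trajectories, while the second gets no sources appended at all.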
Defaults and Compatibility
Default behavior remains unchanged:
- `agent_experience_consolidation_mode` defaults to `per_trajectory`
- `agent_experience_batch_max_trajectories`

Validation Signal
Small TAU-2 airline corpus-prep remeasure on the same cached train transcripts. This is not a benchmark-score claim; it only validates write-time consolidation behavior on realistic multi-step sessions.
Setup: 8 airline train tasks, success-only commit policy; 7 successful sessions committed and 1 failed session skipped. Counts exclude `.abstract.md`/`.overview.md`.

Quality read: the batch output stayed close in durable experience count on this sample and did not show obvious whole-batch-to-one-card source misattribution. The remaining quality risk is mild granularity drift between related cards, so the feature stays opt-in and bounded while telemetry makes the write bottleneck observable.
TAU-2 benchmark config in this branch defaults corpus preparation to batch mode and records the expected server memory config in run artifacts. `--strict-preflight` checks `OPENVIKING_CONFIG_FILE` / `~/.openviking/ov.conf` so evidence runs fail fast if the server is still using per-trajectory consolidation.

Tests
uv run ruff format --check openviking/session/compressor_v2.py openviking/session/memory/agent_experience_context_provider.py openviking/session/memory/batch_agent_experience_context_provider.py openviking/telemetry/operation.py openviking_cli/utils/config/memory_config.py tests/session/memory/test_agent_experience_context_provider.py tests/session/memory/test_compressor_v2.py tests/session/test_session_commit.py tests/test_telemetry_runtime.py

uv run ruff check openviking/session/compressor_v2.py openviking/session/memory/agent_experience_context_provider.py openviking/session/memory/batch_agent_experience_context_provider.py openviking/telemetry/operation.py openviking_cli/utils/config/memory_config.py tests/session/memory/test_agent_experience_context_provider.py tests/session/memory/test_compressor_v2.py tests/session/test_session_commit.py tests/test_telemetry_runtime.py

uv run --group dev --with pytest-asyncio --with pytest-cov pytest tests/session/memory/test_agent_experience_context_provider.py tests/session/memory/test_compressor_v2.py::TestCompressorV2::test_extract_phase_runs_post_apply_before_lock_release tests/session/memory/test_compressor_v2.py::TestCompressorV2::test_agent_memory_batch_experience_respects_batch_size tests/session/memory/test_compressor_v2.py::TestCompressorV2::test_agent_memory_batch_experience_keeps_sources_separate_per_experience tests/session/memory/test_compressor_v2.py::TestCompressorV2::test_agent_memory_batch_experience_skips_source_append_without_attribution tests/session/test_session_commit.py::TestCommit::test_commit_extracts_memories tests/test_telemetry_runtime.py::test_telemetry_summary_includes_agent_memory_phase_metrics -q