feat(memory): agent-scope two-phase trajectory/experience memory pipeline by yangxinxin-7 · Pull Request #1880 · volcengine/OpenViking

yangxinxin-7 · 2026-05-07T03:58:15Z

Overview

Introduces an agent-scope memory pipeline that runs alongside the existing user memory extraction. After each session is committed, two sequential phases extract and consolidate execution knowledge specific to the agent identity.

Architecture

Two-phase pipeline

Phase 1 — Trajectory extraction
Reads the full conversation and writes one (or rarely more) trajectory memory per business domain. Each trajectory is an immutable execution trace: goal, step-by-step actions with tool calls, result, and failure analysis. Filenames are timestamped to guarantee uniqueness (<name>_YYYYMMDDHHMMSS.md).
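The `<name>_YYYYMMDDHHMMSS.md` naming rule can be sketched as follows. This assumes a UTC clock and second resolution. `stamp_trajectory_name` is a hypothetical helper written for illustration; it is not the PR's `_stamp_trajectory_names`.

```python
from datetime import datetime, timezone
from typing import Optional


def stamp_trajectory_name(name: str, now: Optional[datetime] = None) -> str:
    """Return '<name>_YYYYMMDDHHMMSS.md' for a trajectory base name (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)  # assumption: UTC clock
    base = name[:-3] if name.endswith(".md") else name  # tolerate a trailing .md
    return f"{base}_{now:%Y%m%d%H%M%S}.md"


print(stamp_trajectory_name("booking_conflict", datetime(2026, 5, 7, 3, 58, 15)))
# booking_conflict_20260507035815.md
```

The suffix has one-second resolution. In this sketch, two trajectories that share a base name and are written within the same second would still collide, so a tiebreaker may be needed if that can happen.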

Phase 2 — Experience consolidation
For each newly written trajectory, retrieves the top-K most relevant existing experience memories (via vector search with directory-listing fallback), then lets the LLM choose one of four strategies:

Update — trajectory fits an existing experience; rewrite it in place.
Replace — related experience exists but its name no longer captures the broader pattern; create a new one and delete the old.
Create — no related experience exists; create a new one.
Skip — no transferable lesson; do nothing.
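The four strategies can be sketched as a dispatch over an in-memory store. `Strategy`, `Decision`, `Experience`, and `apply_decision` are hypothetical names. The real pipeline expresses these strategies as LLM-emitted write/edit/delete operations against files. Following the commit notes, Replace in this sketch carries the deleted experience's `source_trajectories` over to the new one.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Strategy(Enum):
    UPDATE = "update"
    REPLACE = "replace"
    CREATE = "create"
    SKIP = "skip"


@dataclass
class Experience:
    content: str
    source_trajectories: List[str] = field(default_factory=list)


@dataclass
class Decision:
    strategy: Strategy
    name: Optional[str] = None      # UPDATE: target; REPLACE/CREATE: new experience name
    old_name: Optional[str] = None  # REPLACE only: the experience to delete
    content: str = ""


def apply_decision(store: Dict[str, Experience], d: Decision) -> Optional[str]:
    """Apply one Phase 2 decision; return the experience name written, if any."""
    if d.strategy is Strategy.SKIP:
        return None  # no transferable lesson
    if d.strategy is Strategy.UPDATE:
        store[d.name].content = d.content  # rewrite in place; existing links are kept
        return d.name
    inherited: List[str] = []
    if d.strategy is Strategy.REPLACE:
        old = store.pop(d.old_name)  # delete the narrower experience
        inherited = list(old.source_trajectories)  # carry its grounding forward
    store[d.name] = Experience(d.content, inherited)  # CREATE starts with no links
    return d.name
```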

Each experience file stores a source_trajectories metadata list (capped at 5 entries) so Phase 2 can ground its decision in concrete past executions.
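The capped `source_trajectories` bookkeeping can be sketched as below. The sketch is modeled on the `_append_trajectories_to_experiences` logic quoted in the reviewer comment on this PR: skip empty and already-linked URIs, append new ones in order, and keep only the 5 most recent. `append_source_trajectories` is a hypothetical standalone helper.

```python
from typing import Iterable, List

MAX_SOURCE_TRAJECTORIES = 5


def append_source_trajectories(
    existing: Iterable[str],
    new: Iterable[str],
    cap: int = MAX_SOURCE_TRAJECTORIES,
) -> List[str]:
    """Append unseen trajectory URIs in order, keeping only the `cap` most recent."""
    uris = list(existing)
    for uri in new:
        if uri and uri not in uris:  # skip empty and already-linked URIs
            uris.append(uri)
    return uris[-cap:] if cap > 0 else []


print(append_source_trajectories(["t1", "t2", "t3", "t4", "t5"], ["t5", "t6"]))
# ['t2', 't3', 't4', 't5', 't6']
```

Because the list keeps the newest entries, the oldest grounding drops out first once the cap is reached.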

Isolation from user memory

A new agent_only: true flag is added to MemoryTypeSchema. Schemas marked agent_only (trajectory, experience) are filtered out of SessionExtractContextProvider, so they never appear in user-scope extraction. User memory extraction and agent memory extraction run concurrently via asyncio.gather.
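The `agent_only` filter can be sketched in stripped-down form. The dataclass here is a stand-in with hypothetical field names; the real `MemoryTypeSchema` has more fields. `user_scope_schemas` is a hypothetical helper that shows the kind of filtering `SessionExtractContextProvider` applies. `preferences` is a made-up user memory type.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class MemoryTypeSchema:
    """Hypothetical stand-in; the real schema carries more fields."""
    name: str
    agent_only: bool = False  # set from `agent_only: true` in the YAML; defaults to False


def user_scope_schemas(schemas: Iterable[MemoryTypeSchema]) -> List[MemoryTypeSchema]:
    # Drop agent-only types (trajectories, experiences) before user-scope extraction.
    return [s for s in schemas if not s.agent_only]


schemas = [
    MemoryTypeSchema("preferences"),
    MemoryTypeSchema("trajectories", agent_only=True),
    MemoryTypeSchema("experiences", agent_only=True),
]
print([s.name for s in user_scope_schemas(schemas)])  # ['preferences']
```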

Changes

| Area | Change |
|---|---|
| `memory/experience.yaml` | New schema replacing `cases.yaml` |
| `memory/trajectory.yaml` | New schema replacing `patterns.yaml` |
| `core/directories.py` | Rename preset dirs: `cases→trajectories`, `patterns→experiences` |
| `compressor_v2.py` | `extract_agent_memories()`, `_run_extract_phase()`, `_append_trajectories_to_experiences()` |
| `agent_trajectory_context_provider.py` | Phase 1 provider (no prefetch, add_only) |
| `agent_experience_context_provider.py` | Phase 2 provider (search + prefetch candidates + source trajectories) |
| `session.py` | Concurrent gather with independent error handling per task |
| `memory_config.py` | `agent_memory_enabled` flag (default `false`) |
| `client/local.py` | Accept explicit `UserIdentifier` for embedded/test use |
| `extract_loop.py` | Track prefetched URIs in `_read_files`; guard empty tool schema |

Configuration

Enable via ~/.openviking/ov.conf:

```json
{
  "memory": {
    "version": "v2",
    "agent_memory_enabled": true
  }
}
```
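Reading the flag with its documented default of `false` can be sketched as follows. The sketch assumes `ov.conf` is plain JSON, as in the configuration example. `load_conf` and `agent_memory_enabled` are hypothetical helpers; they are not OpenViking's config loader.

```python
import json
import os
from typing import Any, Dict


def load_conf(path: str = "~/.openviking/ov.conf") -> Dict[str, Any]:
    """Load the config file as JSON, or return {} if it does not exist (hypothetical)."""
    full = os.path.expanduser(path)
    if not os.path.exists(full):
        return {}
    with open(full, encoding="utf-8") as f:
        return json.load(f)


def agent_memory_enabled(conf: Dict[str, Any]) -> bool:
    # A missing "memory" section or key falls back to the documented default (false).
    return bool(conf.get("memory", {}).get("agent_memory_enabled", False))


print(agent_memory_enabled(json.loads('{"memory": {"version": "v2", "agent_memory_enabled": true}}')))
# True
```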

Testing

End-to-end integration test in tests/integration/test_agent_memory_e2e.py:

Two sessions in the same booking-conflict domain
Round 1: asserts trajectory written + experience created
Round 2: asserts trajectory count grows + experience count stays at 1 (edit path, not duplicate create)
Verifies source_trajectories metadata links back to extracted trajectories

Run:

```bash
RUN_AGENT_MEMORY_TESTS=1 .venv/bin/pytest tests/integration/test_agent_memory_e2e.py -v -s -m integration
```

Add a two-phase agent memory pipeline with schema-driven trajectory and experience extraction, plus system-managed source trajectory tracking. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Enable the agent memory pipeline behind config and invoke trajectory/experience extraction during session memory processing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ce merge, concurrent extraction

- Trajectory filenames now include a timestamp suffix (via _stamp_trajectory_names in compressor_v2 before apply_operations), so trajectory_name in both the filename and MEMORY_FIELDS carries the full timestamped name
- Experience extraction: add merge operation (write generalized + delete_uris), fix delete lock conflict (pass lock_handle to viking_fs.rm), and inherit source_trajectories from deleted experiences before merge
- Near-duplicate trajectory dedup removed from memory_updater; delete moved before write to avoid AGFS sibling lock contention
- session.py: restore user memory extraction and run user + agent memory concurrently via asyncio.gather (agent memory gated by agent_memory_enabled)
- directories.py: trajectories and experiences directories added to agent memory preset with abstract/overview; cases and patterns removed
- Simplify trajectory/experience YAML descriptions and instructions
- extract_loop: skip refetch for add_only schemas; add logging for URI resolution and operation dispatch to aid diagnosis of duplicate experience writes
- demo_agent_memory.py: replace three-round demo with two same-domain rounds to specifically test the experience edit path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ing trajectory/experience

- Add `agent_only: true` to trajectory.yaml and experience.yaml schemas
- Add `agent_only` field to `MemoryTypeSchema` dataclass
- Parse `agent_only` from YAML in `MemoryTypeRegistry._parse_memory_type`
- Filter out agent_only schemas in both `prefetch` and `get_memory_schemas` in `SessionExtractContextProvider`, so trajectory/experience are only processed by the agent memory extraction pipeline

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…eline

- test_trajectory_and_experience_extraction: runs two same-domain sessions, asserts Round 1 creates the experience and Round 2 edits it (no duplicate), and verifies all trajectory filenames carry a timestamp suffix
- test_no_agent_only_schemas_in_user_memory: unit-level check that trajectory/experience schemas are filtered out of SessionExtractContextProvider

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
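The timestamp-suffix check that the test describes can be sketched as follows. The sketch assumes the `<name>_YYYYMMDDHHMMSS.md` format from the overview, and assumes that directory listings may include an `.abstract.md` summary file to skip (as the later e2e cleanup commit mentions). Both helpers are hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable, List

_TS_NAME = re.compile(r"^(?P<base>.+)_(?P<ts>\d{14})\.md$")


def has_timestamp_suffix(filename: str) -> bool:
    """True if the name ends in _YYYYMMDDHHMMSS.md with a real calendar timestamp."""
    m = _TS_NAME.match(filename)
    if m is None:
        return False
    try:
        datetime.strptime(m.group("ts"), "%Y%m%d%H%M%S")  # rejects e.g. month 13
    except ValueError:
        return False
    return True


def trajectory_entries(names: Iterable[str]) -> List[str]:
    # Skip generated summary files before checking trajectory names.
    return [n for n in names if not n.endswith(".abstract.md")]
```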

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ce extraction

Phase 1 (trajectory): extract execution summaries from conversation, one per business domain.
Phase 2 (experience): prefetch top-5 candidate experiences + source trajectories, single no-tool LLM call to Update/Replace/Create/Skip.

Key changes:

- AgentExperienceContextProvider: rewrite as prefetch-all + single no-tool call; top-3 candidates include source_trajectories for grounding; prefetched_uris tracked to skip refetch check
- AgentTrajectoryContextProvider: remove read tool (was causing hallucination); tighten instruction
- ExtractLoop: fix prefetch URI tracking (old format was broken); guard tool_choice on empty tools
- compressor_v2: deserialize trajectory content before passing to experience phase; restore user/agent memory concurrent execution in session.py
- memory_updater: downgrade diff_match_patch ImportError from tracer.error to tracer.info
- volcengine_vlm: trace tool calls and response content separately
- experience/trajectory yaml: refine field descriptions and Reflect section wording
- e2e test: add skipif guard, tracer init, two-iteration loop, persistent demo dir

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
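The Phase 2 prefetch can be sketched as follows. The sketch takes the top-K candidate experiences from vector search and falls back to a directory listing. It attaches source trajectories only to the top few candidates, and the result is meant to be placed into one no-tool LLM prompt. `build_phase2_context` and its callable parameters are hypothetical. Two details are assumptions: when the fallback triggers (here, on an error or an empty result) and how listed entries are ordered (here, sorted by name).

```python
from typing import Callable, Dict, List


def build_phase2_context(
    query: str,
    vector_search: Callable[[str, int], List[str]],
    list_dir: Callable[[], List[str]],
    read_experience: Callable[[str], Dict],
    read_trajectory: Callable[[str], str],
    k: int = 5,
    grounded: int = 3,
) -> List[Dict]:
    """Collect candidate experiences for a single no-tool Phase 2 prompt (hypothetical)."""
    try:
        uris = vector_search(query, k)
    except Exception:
        uris = []
    if not uris:
        # Assumed fallback: no relevance ranking is available, so order by name.
        uris = sorted(list_dir())
    context = []
    for rank, uri in enumerate(uris[:k]):
        exp = read_experience(uri)
        entry = {"uri": uri, "content": exp["content"]}
        if rank < grounded:
            # Only the top `grounded` candidates carry their source trajectories.
            entry["source_trajectories"] = [
                read_trajectory(t) for t in exp.get("source_trajectories", [])
            ]
        context.append(entry)
    return context
```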

Resolve extract_loop prefetch tracking conflict by keeping both provider-declared prefetched_uris support and upstream's legacy tool_call_name JSON compatibility. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Drop the unused get_source_trajectories memory tool after phase-2 moved to prefetch-only context, and replace source_trajectory debug prints with tracer logs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Allow LocalClient to accept an explicit UserIdentifier and add an integration test covering user+agent agent-memory isolation in embedded mode. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Resolved conflicts in 8 files, preserving agent-memory features (extract_agent_memories, two-phase pipeline, agent_memory_enabled config) while integrating upstream changes (overview_template, extraction_enabled, eager_prefetch, isolation_handler refactor, _build_memory_diff, bind_telemetry_stage, get_event_content). Key merge decisions:

- dataclass.py: kept both agent_only and overview_template fields
- memory_config.py: kept all fields from both branches
- extract_loop.py: kept prefetch URI tracking; adopted upstream bind_telemetry_stage
- memory_updater.py: adopted upstream apply_operations/apply_upsert refactor; kept get_timestamp_from_ranges alongside upstream get_event_content
- uri.py: adopted upstream supplement_operation_uris with isolation_handler
- session.py: merged concurrent user+agent extraction into upstream structure with extraction_enabled check and archive_uri param
- compressor_v2.py: kept extract_agent_memories + helpers; added _build_memory_diff; updated _run_extract_phase to use new data model and isolation_handler

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… + cap source_trajectories

- AgentExperienceContextProvider.prefetch now populates _read_file_contents for each candidate experience, fixing two issues on the Replace path:
  1. resolve_operations could never find delete_file_contents → old file was never deleted
  2. inherited_traj_uris was always empty → source_trajectories not inherited
  On the Update path this also eliminates the extra _check_unread_existing_files LLM round-trip that was previously triggered for every edit.
- Move deserialize_content/deserialize_metadata imports from inline to module top.
- AgentTrajectoryContextProvider.prefetch signature simplified (no unused args).
- _append_trajectories_to_experiences: cap source_trajectories at 5 most recent URIs to prevent unbounded growth over many sessions (MAX_SOURCE_TRAJECTORIES = 5).
- e2e test cleaned up: single focused test, remove redundant Replace-path tests, filter .abstract.md in _list_non_overview_entries.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ompts

- experience.yaml: restructure content format from 4-section to 3-section (Situation / Approach / Reflect), rewrite rules to emphasize machine readability, mutual exclusivity between Approach and Reflect, and abstraction mandate for generalization.
- trajectory.yaml: extend content format with explicit Trajectory steps (intent + actions + progress) and Fail reason field; add exhaustive tracking and tool-call formatting rules.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- session.py: use gather(return_exceptions=True) so user and agent memory tasks fail independently; each side logs its own error and falls back to [] instead of losing the other side's results
- compressor_v2: remove redundant rm before write_file in _append_trajectories_to_experiences, since agfs PUT is atomic overwrite and the prior delete only added a data-loss window; also drop the duplicate ExtractContext/MemoryIsolationHandler construction in _run_extract_phase and fix its outdated docstring
- extract_loop: remove stray blank line after prefetch tracking block
- memory_updater: remove extra blank line inside class body
- experience.yaml / trajectory.yaml: add missing trailing newlines

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
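The independent-failure pattern described for session.py can be sketched with `asyncio.gather(..., return_exceptions=True)`. Each side's exception is logged and replaced with `[]`, so one failure does not discard the other side's results. `run_memory_extraction` and its callable arguments are hypothetical. The sketch takes callables rather than coroutines so that no agent coroutine is created, and then left un-awaited, when the flag is off.

```python
import asyncio
import logging
from typing import Awaitable, Callable, List, Tuple

logger = logging.getLogger(__name__)

Extractor = Callable[[], Awaitable[List[str]]]


async def run_memory_extraction(
    extract_user: Extractor,
    extract_agent: Extractor,
    agent_memory_enabled: bool,
) -> Tuple[List[str], List[str]]:
    async def disabled() -> List[str]:
        return []

    user_res, agent_res = await asyncio.gather(
        extract_user(),
        extract_agent() if agent_memory_enabled else disabled(),
        return_exceptions=True,  # exceptions come back as results instead of propagating
    )
    # BaseException also covers a cancelled child, which gather returns as CancelledError.
    if isinstance(user_res, BaseException):
        logger.error("user memory extraction failed: %r", user_res)
        user_res = []
    if isinstance(agent_res, BaseException):
        logger.error("agent memory extraction failed: %r", agent_res)
        agent_res = []
    return user_res, agent_res
```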

github-actions · 2026-05-07T04:00:38Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🏅 Score: 75
🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Race condition in _append_trajectories_to_experiences

_append_trajectories_to_experiences reads and writes experience files after the lock on the experience directory has been released. This can lead to lost updates if multiple agent memory extraction runs occur concurrently.

```python
async def _append_trajectories_to_experiences(
    self,
    exp_uris: List[str],
    traj_uris: List[str],
    ctx,
    viking_fs,
) -> None:
    """Append traj_uris to the source_trajectories list of each experience file.

    This is the system-side management of source_trajectories: the LLM never
    outputs this field; the pipeline appends the batch after a write or edit.
    """
    from openviking.session.memory.utils.content import deserialize_full, serialize_with_metadata

    normalized_traj_uris = [uri for uri in traj_uris if uri]
    if not normalized_traj_uris:
        return

    for exp_uri in exp_uris:
        try:
            raw = await viking_fs.read_file(exp_uri, ctx=ctx) or ""
            file_content = deserialize_full(raw)
            plain_content = file_content.plain_content
            metadata = file_content.memory_fields or {}

            existing = metadata.get("source_trajectories", [])
            if isinstance(existing, list):
                uris = list(existing)
            elif isinstance(existing, str) and existing.strip():
                uris = [line.strip() for line in existing.splitlines() if line.strip()]
            else:
                uris = []

            changed = False
            for traj_uri in normalized_traj_uris:
                if traj_uri not in uris:
                    uris.append(traj_uri)
                    changed = True

            # Trim to the most recent N entries so the list doesn't grow unboundedly.
            if len(uris) > MAX_SOURCE_TRAJECTORIES:
                uris = uris[-MAX_SOURCE_TRAJECTORIES:]
                changed = True

            if changed:
                metadata["source_trajectories"] = uris
                metadata["content"] = plain_content
                new_raw = serialize_with_metadata(metadata)
                await viking_fs.write_file(exp_uri, new_raw, ctx=ctx)
                tracer.info(
                    f"[source_traj] appended {len(normalized_traj_uris)} trajectories -> {exp_uri}"
                )
            else:
                tracer.info(f"[source_traj] already present, skip: {exp_uri}")
        except Exception as e:
            logger.warning(f"Failed to append source trajectories to {exp_uri}: {e}")
```

Malformed experience files due to missing content parameter in serialize_with_metadata

In _append_trajectories_to_experiences, serialize_with_metadata is called without passing the content parameter, and content is instead set in metadata. This leads to content being stored in the YAML frontmatter instead of the file body, and an empty body.

```python
metadata["source_trajectories"] = uris
metadata["content"] = plain_content
new_raw = serialize_with_metadata(metadata)
await viking_fs.write_file(exp_uri, new_raw, ctx=ctx)
```
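One possible way to address both review points is sketched below under stated assumptions. The sketch holds an `asyncio.Lock` per experience URI across the whole read-modify-write, and it keeps the content in the file body rather than in the metadata header. `InMemoryFS`, `ExperienceLinker`, and the JSON-between-`---` header format are hypothetical stand-ins; they are not `viking_fs` or `serialize_with_metadata`, and this is not the change that was merged.

```python
import asyncio
import json
from typing import Dict, List, Tuple

MAX_SOURCE_TRAJECTORIES = 5


class InMemoryFS:
    """Hypothetical async file store standing in for viking_fs."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {}

    async def read_file(self, uri: str) -> str:
        await asyncio.sleep(0)  # yield as real I/O would, so callers can interleave
        return self.files.get(uri, "")

    async def write_file(self, uri: str, data: str) -> None:
        await asyncio.sleep(0)
        self.files[uri] = data


def serialize(metadata: Dict, content: str) -> str:
    # Metadata goes in the header; the content stays in the body.
    return "---\n" + json.dumps(metadata, sort_keys=True) + "\n---\n" + content


def deserialize(raw: str) -> Tuple[Dict, str]:
    if not raw.startswith("---\n"):
        return {}, raw
    header, _, body = raw[4:].partition("\n---\n")
    return json.loads(header), body


class ExperienceLinker:
    def __init__(self, fs: InMemoryFS) -> None:
        self.fs = fs
        self._locks: Dict[str, asyncio.Lock] = {}

    def _lock_for(self, uri: str) -> asyncio.Lock:
        lock = self._locks.get(uri)
        if lock is None:
            lock = self._locks[uri] = asyncio.Lock()
        return lock

    async def append(self, exp_uri: str, traj_uris: List[str]) -> None:
        # Hold the lock across read and write so concurrent appends cannot overwrite each other.
        async with self._lock_for(exp_uri):
            metadata, content = deserialize(await self.fs.read_file(exp_uri))
            uris = list(metadata.get("source_trajectories", []))
            for uri in traj_uris:
                if uri and uri not in uris:
                    uris.append(uri)
            metadata["source_trajectories"] = uris[-MAX_SOURCE_TRAJECTORIES:]
            await self.fs.write_file(exp_uri, serialize(metadata, content))
```

An in-process lock only serializes coroutines within one process. Extraction runs in separate processes would still need a store-level lock, such as the experience-directory lock the review mentions, held for the whole update.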

github-actions · 2026-05-07T04:02:49Z

PR Code Suggestions ✨

No code suggestions found for the PR.


yangxinxin-7 and others added 16 commits April 13, 2026 20:53

feat(memory): add agent trajectory and experience extraction

6055d46

feat(memory): wire agent memory extraction into session flow

4dd74ac

chore: remove demo_agent_memory.py, replaced by integration test

2ccea5f

Merge upstream/main into feature/agent-memory

dd880d7

chore(memory): remove unused source trajectory tool and noisy prints

b02377c

feat(client): support explicit embedded user for agent memory tests

05758ee

feat: test

bc90867

feat: fix agent memory test

6ea8922

github-project-automation Bot added this to OpenViking project May 7, 2026

github-project-automation Bot moved this to Backlog in OpenViking project May 7, 2026

yangxinxin-7 requested a review from chenjw May 7, 2026 04:00

yangxinxin-7 added 3 commits May 7, 2026 14:06

chore(memory): rename experience.yaml→experiences.yaml, trajectory.ya…

26068a2

…ml→trajectories.yaml

chore(memory): rename memory_type experience→experiences, trajectory→…

9f87d3e

…trajectories

chore(memory): remove dead _read_files tracking in extract_loop

27b88de

qin-ctx assigned chenjw May 7, 2026

chenjw approved these changes May 8, 2026


chenjw merged commit 5de357d into volcengine:main May 8, 2026
5 of 6 checks passed

github-project-automation Bot moved this from Backlog to Done in OpenViking project May 8, 2026

chenjw merged 19 commits into volcengine:main from yangxinxin-7:feature/agent-memory
