
fix(reflect): scope delta mental model recall to new memories only#1192

Merged
nicoloboschi merged 1 commit into main from fix/delta-mode-time-scoped-recall
Apr 22, 2026

Conversation

@nicoloboschi
Collaborator

Summary

  • Delta mode mental model refresh was running a full recall across ALL memories (identical to full mode), then passing all facts to a second LLM call for delta ops. This caused content bloat, duplication (paragraphs repeated 4-5x), and made delta more expensive than full mode.
  • Now delta recall is scoped to memories created/updated since last_refreshed_at, using updated_at to also catch consolidation updates.
  • Delta prompt rewritten to preserve existing content, merge overlapping topics, and preserve concrete examples over abstract rules.
  • Reflect agent receives context during MM refresh with document name, stay-on-topic guidance, and example preservation instructions.

Changes

Recall pipeline (retrieval.py, link_expansion_retrieval.py, graph_retrieval.py):

  • created_after/created_before time range filter on updated_at threaded through all retrieval strategies (semantic, BM25, temporal, graph seeds)
  • Graph expansion intentionally unfiltered — seeds are the filter gate, expansion follows links to older related content for context
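The seed-gate design above can be sketched roughly as follows. This is an illustrative model, not the real `retrieval.py`/`graph_retrieval.py` code: the `Memory` shape, `filter_seeds`, and `expand_graph` names are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Memory:
    id: str
    updated_at: datetime
    links: list = field(default_factory=list)  # ids of related memories

def filter_seeds(memories, created_after=None, created_before=None):
    """Apply the time-range gate to retrieval seeds. The filter runs on
    updated_at (not created_at) so consolidation updates are caught too."""
    out = []
    for m in memories:
        if created_after is not None and m.updated_at <= created_after:
            continue
        if created_before is not None and m.updated_at >= created_before:
            continue
        out.append(m)
    return out

def expand_graph(seeds, store):
    """Graph expansion is intentionally unfiltered: seeds are the filter
    gate, and expansion follows links to older related content for context."""
    seen = {m.id: m for m in seeds}
    frontier = list(seeds)
    while frontier:
        m = frontier.pop()
        for linked_id in m.links:
            if linked_id not in seen and linked_id in store:
                seen[linked_id] = store[linked_id]
                frontier.append(store[linked_id])
    return list(seen.values())
```

Note the asymmetry: an old memory never enters the result set on its own, but it can still be pulled in when a new (in-range) seed links to it.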

Reflect + tools (memory_engine.py, tools.py):

  • Time range params threaded through recall_async/search_with_retries → reflect_async → tool closures → tool_recall/tool_search_observations
  • _is_mental_model_stale now uses updated_at (catches consolidation updates)
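A minimal sketch of the staleness check, assuming a simplified signature (the real `_is_mental_model_stale` lives in `memory_engine.py` and takes different arguments):

```python
from datetime import datetime, timezone

def is_mental_model_stale(last_refreshed_at, memory_timestamps):
    """A mental model is stale when any memory's updated_at is newer than
    the last refresh. Using updated_at rather than created_at means that
    consolidation rewrites of older memories also trigger a delta refresh."""
    if last_refreshed_at is None:
        return True  # never refreshed
    return any(ts > last_refreshed_at for ts in memory_timestamps)
```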

Mental model refresh (memory_engine.py):

  • Delta refresh passes created_after=last_refreshed_at to reflect
  • No-new-facts short-circuit: skip delta LLM call when nothing new, preserve existing content
  • based_on accumulates across refreshes (merge previous + new, deduped by ID)
  • Context passed to reflect agent: document name, topic-focus guidance, example preservation
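The refresh bullets above combine into one control flow, sketched here under assumptions: `recall`, `run_delta_llm`, and the dict-shaped document are hypothetical stand-ins for the real `memory_engine.py` internals.

```python
def delta_refresh(doc, recall, run_delta_llm):
    """Delta refresh: recall only facts newer than last_refreshed_at,
    short-circuit when there is nothing new, and accumulate based_on."""
    new_facts = recall(created_after=doc["last_refreshed_at"])
    if not new_facts:
        return doc  # no-new-facts short-circuit: skip the LLM call entirely

    updated_content = run_delta_llm(doc["content"], new_facts)

    # based_on accumulates across refreshes: merge previous + new, dedupe by id
    merged, seen = [], set()
    for fact in doc["based_on"] + new_facts:
        if fact["id"] not in seen:
            seen.add(fact["id"])
            merged.append(fact)

    return {**doc, "content": updated_content, "based_on": merged}
```

The short-circuit is what makes delta strictly cheaper than full mode in the common case where nothing changed between refreshes.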

Delta prompt (prompts.py):

  • Rewritten: preserve existing content from prior refreshes, merge overlapping topics into existing sections, preserve concrete examples/samples over abstract rules
  • Remove ops gated on explicit contradiction/supersession by new facts (not "off-topic" judgment)
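The gating rule can be illustrated with a toy op applier. The op shapes (`append`/`remove`, `reason` field) are hypothetical, not the actual delta-op schema:

```python
def apply_delta_ops(sections, ops):
    """Apply delta ops to a section map. Remove ops only take effect when a
    new fact explicitly contradicts or supersedes the section; an
    "off-topic" judgment is never sufficient, so existing content is kept."""
    result = dict(sections)
    for op in ops:
        if op["op"] == "append":
            result[op["section"]] = result.get(op["section"], "") + op["text"]
        elif op["op"] == "remove":
            if op.get("reason") in ("contradicted", "superseded"):
                result.pop(op["section"], None)
            # any other reason (e.g. "off-topic") is rejected
    return result
```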

Test plan

  • All 58 existing unit tests pass (7 delta plumbing + 39 structured doc + 12 recall config)
  • New integration test (test_delta_editorial_fusion.py) with real SEO specialist + brand voice documents — verifies brand voice fuses organically with SEO guidance, no duplication, based_on accumulates
  • 3/3 stable passes on editorial fusion test
  • Lint clean (ruff check)
  • Manual test on running API with control plane

@nicoloboschi nicoloboschi force-pushed the fix/delta-mode-time-scoped-recall branch from 55c1d1f to 2f1a3e3 on April 22, 2026 at 10:18
Delta mode mental model refresh was running a full recall across ALL
memories (identical to full mode), then passing all facts to a second
LLM call for delta ops. This caused content bloat, duplication, and
made delta strictly more expensive than full mode.

Changes:
- Add created_after/created_before time range filter to the recall
  pipeline (retrieval.py, link_expansion_retrieval.py, graph_retrieval.py)
  threaded through recall_async -> reflect_async -> tool closures
- Delta refresh passes last_refreshed_at as created_after so the
  agentic loop only retrieves memories created/updated since the last
  refresh (uses updated_at to catch consolidation updates)
- Short-circuit delta when no new facts found (skip LLM call, preserve
  existing content)
- Accumulate based_on across delta refreshes (merge previous + new,
  deduped by ID)
- Pass context to reflect agent during MM refresh with document name,
  stay-on-topic guidance, and example preservation instructions
- Rewrite delta prompt: preserve existing content from prior refreshes,
  merge overlapping topics, preserve concrete examples over abstract
  rules
- Add recall time-range unit tests (8 tests)
- Add integration test verifying delta fusion quality
@nicoloboschi nicoloboschi force-pushed the fix/delta-mode-time-scoped-recall branch from 2f1a3e3 to 92b6a4a on April 22, 2026 at 10:30
@nicoloboschi nicoloboschi merged commit e90cfa4 into main Apr 22, 2026
52 of 54 checks passed
@canuysal

Hi, do we need to set anything regarding mental models, retain options etc. to activate delta retain/mental models, or is it automatic? Yesterday I unleashed the hindsight beast and had a $30 bill across 500 conversation retains (gemini-3-flash model). I was probably doing the whole thing wrong (rebuilding the mental model on every new chat message, I guess?), so I'm trying to keep it as optimized as possible now; I only need a per-user profile to pass to the agent. Any hints are appreciated! P.S.: Loving the memory structure on this, spot on.

@nicoloboschi
Collaborator Author

> Hi, do we need to set anything regarding mental models, retain options etc to activate delta retain/mental models or is it automatic? Yesterday I unleashed the hindsight beast, had a $30 bill across 500 conversation retains. (gemini-3-flash model)

Yes, you have to set mode=delta:
https://hindsight.vectorize.io/developer/api/mental-models#trigger-settings
With auto refresh, the mental model will still refresh, but it will only search and append the new content.

If you don't need real time and prefer batching the updates (e.g. hourly), you can set it to manual and trigger the refresh via the API on your favorite schedule.

@canuysal

Thanks! Heads up: in the Node.js SDK it seems the trigger mode is not present; only the refresh_after_consolidation option is available. For now I'll update mental models with an API shim.
