fix(recall): bypass tag filter on expansion candidates (#142)#146

Merged
jack-arturo merged 5 commits into
mainfrom
fix/142-expansion-tag-filter
Apr 23, 2026
@jack-arturo jack-arturo commented Apr 23, 2026

Summary

Repro

curl -sS \
  -H 'Authorization: Bearer test-token' \
  --get 'http://localhost:8001/recall' \
  --data-urlencode 'query=rate limiter redis scan' \
  --data-urlencode 'tags=issue-142-scope' \
  --data-urlencode 'tag_match=exact' \
  --data-urlencode 'expand_relations=true' \
  --data-urlencode 'relation_limit=5' \
  --data-urlencode 'expansion_limit=10'

After the fix, the local repro returns:

  • expansion.expanded_count: 1
  • expansion.respect_tags: false

And the same request with expand_respect_tags=true returns expanded_count: 0.
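For scripted checks, the same two requests can be built in Python. This is a minimal sketch using only the standard library: the parameter names and values are copied from the curl command above, while the helper name `build_recall_request` is hypothetical and not part of the repository. It only constructs the request objects; sending them (for example with `urllib.request.urlopen`) assumes a local API on port 8001.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request


def build_recall_request(
    base_url: str,
    api_token: str,
    expand_respect_tags: Optional[bool] = None,
) -> Request:
    """Build (but do not send) the scoped /recall request from the repro.

    Hypothetical helper: only the query parameters come from the PR.
    """
    params = {
        "query": "rate limiter redis scan",
        "tags": "issue-142-scope",
        "tag_match": "exact",
        "expand_relations": "true",
        "relation_limit": "5",
        "expansion_limit": "10",
    }
    # Leave the flag out entirely to exercise the new default (tag bypass).
    if expand_respect_tags is not None:
        params["expand_respect_tags"] = "true" if expand_respect_tags else "false"
    url = f"{base_url.rstrip('/')}/recall?{urlencode(params)}"
    return Request(url, headers={"Authorization": f"Bearer {api_token}"})


default_request = build_recall_request("http://localhost:8001", "test-token")
strict_request = build_recall_request(
    "http://localhost:8001", "test-token", expand_respect_tags=True
)
```

With the fix applied, the response to `default_request` should report `expansion.expanded_count: 1`, and the response to `strict_request` should report `expanded_count: 0`, matching the results above.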

Test plan

  • PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 .venv/bin/pytest tests/test_api_endpoints.py -q -k "expand_related_memories or relation_taxonomy"
  • PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 .venv/bin/pytest tests/test_benchmark_backends.py -q
  • full LoCoMo regression via .venv/bin/python tests/benchmarks/test_locomo.py --base-url http://localhost:8001 --api-token test-token --recall-limit 10
  • full LongMemEval regression via .venv/bin/python tests/benchmarks/longmemeval/test_longmemeval.py --base-url http://localhost:8001 --api-token test-token --config baseline
  • live scoped-expansion repro against local API after restarting flask-api
  • make test is currently blocked locally because it bootstraps a fresh venv/ under Python 3.14 and hits unrelated spacy==3.8.7 / FastEmbed environment failures

Closes #142.

Copilot AI review requested due to automatic review settings April 23, 2026 00:57

Copilot AI left a comment

Pull request overview

This PR fixes scoped recall expansion so that relation/entity expansion candidates bypass inclusive tag filters by default (while still honoring time windows and exclude_tags), with an expand_respect_tags=true opt-in to restore the old behavior. It also refactors benchmark harness plumbing into a backend adapter layer, adds expansion/retrieval telemetry, and introduces a read-only production DB browser + documentation clarifying benchmark/evals repo boundaries.

Changes:

  • Update recall expansion to bypass tag filters for expansion candidates by default; add expand_respect_tags parameter and telemetry.
  • Add targeted regression tests for expansion behavior; add retrieval Recall@5 metrics to LongMemEval scoring/comparison.
  • Introduce benchmark backend adapters and a read-only DB browsing script; update docs around benchmark ownership/evals contract.
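The filtering rule described in the overview can be illustrated with a small sketch. This is not the code in automem/api/recall.py: the `Candidate` type, the function name, and the "any tag overlaps" matching rule are assumptions made for illustration. Only the behavior is taken from the PR: expansion candidates always honor the time window and exclude_tags, and are checked against the inclusive tag filter only when expand_respect_tags is set.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Set


@dataclass
class Candidate:
    """Hypothetical stand-in for a memory reached via relation/entity expansion."""
    memory_id: str
    tags: Set[str]
    timestamp: datetime


def keep_expansion_candidate(
    candidate: Candidate,
    *,
    tags: Set[str],
    exclude_tags: Set[str],
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    expand_respect_tags: bool = False,
) -> bool:
    """Return True if an expansion candidate should stay in the result set."""
    # Time windows always apply to expansion candidates.
    if start is not None and candidate.timestamp < start:
        return False
    if end is not None and candidate.timestamp > end:
        return False
    # exclude_tags always apply as well.
    if candidate.tags & exclude_tags:
        return False
    # Inclusive tags are enforced only on opt-in (the pre-fix behavior).
    # Assumption: "match" here means at least one shared tag.
    if expand_respect_tags and tags and not (candidate.tags & tags):
        return False
    return True


neighbor = Candidate("m2", {"other-scope"}, datetime(2026, 4, 1))
scope = {"issue-142-scope"}
bypassed = keep_expansion_candidate(neighbor, tags=scope, exclude_tags=set())
respected = keep_expansion_candidate(
    neighbor, tags=scope, exclude_tags=set(), expand_respect_tags=True
)
```

Here `bypassed` is True and `respected` is False: the same cross-scope neighbor is kept by default and dropped once the caller opts back into tag-respecting expansion.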

Reviewed changes

Copilot reviewed 16 out of 17 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| automem/api/recall.py | Adds expand_respect_tags and changes expansion filtering + response telemetry. |
| tests/test_api_endpoints.py | Adds regression tests covering tag-bypass, opt-in tag-respect, exclude_tags, and time windows. |
| tests/benchmarks/backends.py | Introduces benchmark backend abstraction + AutoMem adapter (scope tagging, ingest/search/cleanup). |
| tests/test_benchmark_backends.py | Adds unit tests for the new benchmark backend adapter behavior + scorer metrics. |
| tests/benchmarks/test_locomo.py | Refactors LoCoMo harness to use backend adapters, adds scope prefixing and record normalization. |
| tests/benchmarks/longmemeval/test_longmemeval.py | Refactors LongMemEval harness to use backend adapters and adds Recall@5 hit tracking. |
| tests/benchmarks/longmemeval/evaluator.py | Adds retrieval Recall@5 aggregation and reporting. |
| tests/benchmarks/longmemeval/configs.py | Adds backend/work_dir configuration fields. |
| scripts/browse_memories.py | Adds a CLI to browse/diagnose production FalkorDB + Qdrant (read-only). |
| scripts/bench/compare_results.py | Extends LongMemEval comparison to include Recall@5. |
| docs/TESTING.md | Documents benchmark ownership boundary (automem vs automem-evals). |
| docs/EVALS_CONTRACT.md | Adds an explicit eval contract for external benchmark repos. |
| benchmarks/EXPERIMENT_LOG.md | Records experiment notes for #142 scoped expansion fix. |
| README.md | Notes benchmark ownership boundary in the project overview. |
| CLAUDE.md | Documents benchmark ownership and the new DB browser script. |
| AGENTS.md | Adds benchmark ownership boundary note. |
| .gitignore | Ignores new benchmark comparison outputs and sweep artifacts. |
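The Recall@5 metric mentioned in the file summary can be sketched as follows. This is an illustration under a stated assumption, not the code in evaluator.py: a question counts as a hit if at least one relevant memory ID appears among the top five retrieved IDs, and the reported score is the fraction of questions that hit. The function names are hypothetical.

```python
from typing import Iterable, Sequence, Set


def recall_hit_at_k(retrieved_ids: Sequence[str], relevant_ids: Set[str], k: int = 5) -> bool:
    """True if any relevant ID appears in the first k retrieved IDs."""
    return any(memory_id in relevant_ids for memory_id in retrieved_ids[:k])


def aggregate_recall(hits: Iterable[bool]) -> float:
    """Fraction of questions with a hit; 0.0 when there are no questions."""
    hit_list = list(hits)
    if not hit_list:
        return 0.0
    return sum(hit_list) / len(hit_list)


# Two toy questions: the first retrieves its evidence in the top 5, the second does not.
per_question_hits = [
    recall_hit_at_k(["m1", "m7", "m3"], {"m3"}),
    recall_hit_at_k(["m2", "m4", "m5", "m6", "m8", "m9"], {"m9"}),
]
score = aggregate_recall(per_question_hits)
```

Tracking this separately from answer accuracy lets a retrieval change show up in the score before it affects answer quality.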

Comment thread tests/benchmarks/longmemeval/test_longmemeval.py Outdated
Comment thread tests/test_api_endpoints.py Outdated
Comment thread tests/benchmarks/backends.py Outdated
Comment thread tests/benchmarks/longmemeval/test_longmemeval.py Outdated
Comment thread tests/benchmarks/test_locomo.py Outdated
Comment thread tests/benchmarks/test_locomo.py Outdated
Comment thread tests/test_api_endpoints.py
Comment thread tests/test_api_endpoints.py Outdated
Comment thread tests/test_api_endpoints.py Outdated
Comment thread scripts/browse_memories.py
Clarify that canonical benchmark harnesses and published baselines stay in automem while exploratory cross-backend work lives in automem-evals.
Refactor the canonical benchmark harnesses around a shared backend adapter and surface LongMemEval Recall@5 so retrieval changes can be measured before answer quality shifts.
Expose a local CLI for inspecting FalkorDB and Qdrant state so benchmark and recall regressions can be debugged without mutating production data.
Let scoped recall keep tag gating for seed selection while allowing relation and entity expansion to traverse cross-boundary memories unless callers explicitly opt back into tag-respecting expansion.
@jack-arturo jack-arturo force-pushed the fix/142-expansion-tag-filter branch from 4ebbbd5 to 3578a71 Compare April 23, 2026 10:56
- honor scoped tag_mode semantics in benchmark search
- persist and reuse deterministic LoCoMo scope prefixes
- align benchmark cleanup and ingest pacing with backend batching
@jack-arturo jack-arturo merged commit 4f0fcf8 into main Apr 23, 2026
6 checks passed
@jack-arturo jack-arturo deleted the fix/142-expansion-tag-filter branch April 23, 2026 12:55
@jack-arturo jack-arturo restored the fix/142-expansion-tag-filter branch April 23, 2026 13:24
@jack-arturo jack-arturo deleted the fix/142-expansion-tag-filter branch April 23, 2026 13:24
jack-arturo added a commit that referenced this pull request Apr 23, 2026
🤖 I have created a release *beep* *boop*
---

## [0.15.2](v0.15.1...v0.15.2) (2026-04-23)

### Bug Fixes

* **benchmarks:** make LoCoMo judge runs reliable ([#149](#149)) ([c22f2c9](c22f2c9))
* **recall:** bypass tag filter on expansion candidates ([#142](#142)) ([#146](#146)) ([4f0fcf8](4f0fcf8))
* **recall:** keyword scoring for vector results + softer adaptive floor ([#150](#150)) ([591b2c7](591b2c7))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Development

Successfully merging this pull request may close these issues.

recall: tag hard-filter is applied to expansion candidates, making expand_relations a no-op under scoped queries