fix(recall): bypass tag filter on expansion candidates (#142) by jack-arturo · Pull Request #146 · verygoodplugins/automem

jack-arturo · 2026-04-23T00:57:38Z

Summary

move the benchmark prep work into clean commits on this branch so the repo boundary, benchmark adapter cleanup, and read-only browser land with the #142 fix ("recall: tag hard-filter is applied to expansion candidates, making expand_relations a no-op under scoped queries")
bypass inclusive tag filters for relation and entity expansion candidates by default, while preserving time windows and exclude_tags
add the expand_respect_tags=true escape hatch plus focused regression tests and expansion telemetry
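
The filtering rule described in the summary can be sketched roughly as follows. This is an illustration only, not the code in automem/api/recall.py: the `Candidate` class, the `keep_expansion_candidate` function, and the handling of candidates without a timestamp are all assumptions made for the example. Only the parameter names `tags`, `exclude_tags`, and `expand_respect_tags` come from the PR.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical stand-in for a relation/entity expansion candidate."""
    memory_id: str
    tags: set[str] = field(default_factory=set)
    timestamp: Optional[datetime] = None


def keep_expansion_candidate(
    candidate: Candidate,
    *,
    tags: set[str],
    exclude_tags: set[str],
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    expand_respect_tags: bool = False,
) -> bool:
    """Return True if an expansion candidate should survive filtering."""
    # exclude_tags is always enforced, even when inclusive tags are bypassed.
    if candidate.tags & exclude_tags:
        return False

    # The time window is always enforced. Candidates without a timestamp
    # are kept here; that choice is an assumption of this sketch.
    if candidate.timestamp is not None:
        if start is not None and candidate.timestamp < start:
            return False
        if end is not None and candidate.timestamp > end:
            return False

    # Inclusive tags only gate expansion when the caller opts back in.
    if expand_respect_tags and tags and not (candidate.tags & tags):
        return False

    return True
```

With the default `expand_respect_tags=False`, a candidate tagged outside the query scope is kept, which is what lets expansion cross the tag boundary; passing `expand_respect_tags=True` drops it again.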

Repro

curl -sS \
  -H 'Authorization: Bearer test-token' \
  --get 'http://localhost:8001/recall' \
  --data-urlencode 'query=rate limiter redis scan' \
  --data-urlencode 'tags=issue-142-scope' \
  --data-urlencode 'tag_match=exact' \
  --data-urlencode 'expand_relations=true' \
  --data-urlencode 'relation_limit=5' \
  --data-urlencode 'expansion_limit=10'

After the fix, the local repro returns:

expansion.expanded_count: 1
expansion.respect_tags: false

And the same request with expand_respect_tags=true returns expanded_count: 0.
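
For scripting the same check, the request can be built in Python. This is a minimal sketch using only the standard library. The helper names `build_recall_url` and `expansion_summary` are hypothetical, and it assumes the response JSON nests the telemetry under an `expansion` key, as the dotted field names above suggest.

```python
from urllib.parse import urlencode


def build_recall_url(base_url: str, *, expand_respect_tags: bool = False) -> str:
    """Build the repro URL with the same query parameters as the curl call."""
    params = {
        "query": "rate limiter redis scan",
        "tags": "issue-142-scope",
        "tag_match": "exact",
        "expand_relations": "true",
        "relation_limit": "5",
        "expansion_limit": "10",
    }
    if expand_respect_tags:
        # Opt back into tag-respecting expansion (the pre-fix behavior).
        params["expand_respect_tags"] = "true"
    return f"{base_url.rstrip('/')}/recall?{urlencode(params)}"


def expansion_summary(payload: dict) -> tuple:
    """Extract (expanded_count, respect_tags) from a decoded response body.

    Assumes the telemetry lives under an "expansion" object.
    """
    expansion = payload.get("expansion", {})
    return expansion.get("expanded_count"), expansion.get("respect_tags")
```

The URL can then be fetched with any HTTP client, sending the `Authorization: Bearer test-token` header used in the curl repro.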

Test plan

PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 .venv/bin/pytest tests/test_api_endpoints.py -q -k "expand_related_memories or relation_taxonomy"
PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 .venv/bin/pytest tests/test_benchmark_backends.py -q
full LoCoMo regression via .venv/bin/python tests/benchmarks/test_locomo.py --base-url http://localhost:8001 --api-token test-token --recall-limit 10
full LongMemEval regression via .venv/bin/python tests/benchmarks/longmemeval/test_longmemeval.py --base-url http://localhost:8001 --api-token test-token --config baseline
live scoped-expansion repro against local API after restarting flask-api
make test is currently blocked locally because it bootstraps a fresh venv/ under Python 3.14 and hits unrelated spacy==3.8.7 / FastEmbed environment failures

Closes #142.

Copilot

Pull request overview

This PR fixes scoped recall expansion so that relation/entity expansion candidates bypass inclusive tag filters by default (while still honoring time windows and exclude_tags), with an expand_respect_tags=true opt-in to restore the old behavior. It also refactors benchmark harness plumbing into a backend adapter layer, adds expansion/retrieval telemetry, and introduces a read-only production DB browser + documentation clarifying benchmark/evals repo boundaries.

Changes:

Update recall expansion to bypass tag filters for expansion candidates by default; add expand_respect_tags parameter and telemetry.
Add targeted regression tests for expansion behavior; add retrieval Recall@5 metrics to LongMemEval scoring/comparison.
Introduce benchmark backend adapters and a read-only DB browsing script; update docs around benchmark ownership/evals contract.
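
As a rough picture of what a backend adapter layer with scope tagging, ingest, search, and cleanup could look like, here is a small sketch. The `BenchmarkBackend` protocol, its method signatures, and the `InMemoryBackend` class are hypothetical and are not taken from tests/benchmarks/backends.py.

```python
from typing import Protocol


class BenchmarkBackend(Protocol):
    """Hypothetical interface a benchmark harness could program against."""

    def ingest(self, scope: str, records: list[dict]) -> None: ...
    def search(self, scope: str, query: str, limit: int) -> list[dict]: ...
    def cleanup(self, scope: str) -> None: ...


class InMemoryBackend:
    """Toy backend that tags every record with its scope, for illustration."""

    def __init__(self) -> None:
        self._store: dict[str, list[dict]] = {}

    def ingest(self, scope: str, records: list[dict]) -> None:
        # Scope tagging: append the scope to each record's tag list.
        tagged = [{**r, "tags": [*r.get("tags", []), scope]} for r in records]
        self._store.setdefault(scope, []).extend(tagged)

    def search(self, scope: str, query: str, limit: int) -> list[dict]:
        # Naive substring match, restricted to records in the given scope.
        words = query.lower().split()
        hits = [
            r for r in self._store.get(scope, [])
            if any(w in r.get("content", "").lower() for w in words)
        ]
        return hits[:limit]

    def cleanup(self, scope: str) -> None:
        # Remove everything ingested under this scope.
        self._store.pop(scope, None)
```

A harness written against the protocol can swap the toy backend for a real adapter without changing its ingest, search, and cleanup calls.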

Reviewed changes

Copilot reviewed 16 out of 17 changed files in this pull request and generated 10 comments.

Show a summary per file

File	Description
automem/api/recall.py	Adds `expand_respect_tags` and changes expansion filtering + response telemetry.
tests/test_api_endpoints.py	Adds regression tests covering tag-bypass, opt-in tag-respect, exclude_tags, and time windows.
tests/benchmarks/backends.py	Introduces benchmark backend abstraction + AutoMem adapter (scope tagging, ingest/search/cleanup).
tests/test_benchmark_backends.py	Adds unit tests for the new benchmark backend adapter behavior + scorer metrics.
tests/benchmarks/test_locomo.py	Refactors LoCoMo harness to use backend adapters, adds scope prefixing and record normalization.
tests/benchmarks/longmemeval/test_longmemeval.py	Refactors LongMemEval harness to use backend adapters and adds Recall@5 hit tracking.
tests/benchmarks/longmemeval/evaluator.py	Adds retrieval Recall@5 aggregation and reporting.
tests/benchmarks/longmemeval/configs.py	Adds backend/work_dir configuration fields.
scripts/browse_memories.py	Adds a CLI to browse/diagnose production FalkorDB + Qdrant (read-only).
scripts/bench/compare_results.py	Extends LongMemEval comparison to include Recall@5.
docs/TESTING.md	Documents benchmark ownership boundary (automem vs automem-evals).
docs/EVALS_CONTRACT.md	Adds an explicit eval contract for external benchmark repos.
benchmarks/EXPERIMENT_LOG.md	Records experiment notes for #142 scoped expansion fix.
README.md	Notes benchmark ownership boundary in the project overview.
CLAUDE.md	Documents benchmark ownership and the new DB browser script.
AGENTS.md	Adds benchmark ownership boundary note.
.gitignore	Ignores new benchmark comparison outputs and sweep artifacts.

🤖 I have created a release *beep* *boop*

---

## [0.15.2](v0.15.1...v0.15.2) (2026-04-23)

### Bug Fixes

* **benchmarks:** make LoCoMo judge runs reliable ([#149](#149)) ([c22f2c9](c22f2c9))
* **recall:** bypass tag filter on expansion candidates ([#142](#142)) ([#146](#146)) ([4f0fcf8](4f0fcf8))
* **recall:** keyword scoring for vector results + softer adaptive floor ([#150](#150)) ([591b2c7](591b2c7))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Copilot AI review requested due to automatic review settings April 23, 2026 00:57

Copilot started reviewing on behalf of jack-arturo April 23, 2026 00:58

Copilot AI reviewed Apr 23, 2026

jack-arturo added 4 commits April 23, 2026 11:52

docs(bench): establish automem and evals ownership boundary

c78dc1f

Clarify that canonical benchmark harnesses and published baselines stay in automem while exploratory cross-backend work lives in automem-evals.

test(bench): add backend abstraction and retrieval telemetry

84ea162

Refactor the canonical benchmark harnesses around a shared backend adapter and surface LongMemEval Recall@5 so retrieval changes can be measured before answer quality shifts.
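
One common way to compute a retrieval Recall@5 hit rate is sketched below. The function names are hypothetical, and the hit definition used here (at least one gold item among the top k retrieved IDs) is an assumption; the evaluator in this PR may define it differently.

```python
from typing import Iterable, Sequence


def recall_hit_at_k(retrieved_ids: Sequence[str], gold_ids: Iterable[str], k: int = 5) -> bool:
    """True if any gold ID appears in the first k retrieved IDs."""
    gold = set(gold_ids)
    return any(item in gold for item in retrieved_ids[:k])


def recall_at_k(results: Sequence[tuple], k: int = 5) -> float:
    """Fraction of questions with a hit, given (retrieved_ids, gold_ids) pairs."""
    if not results:
        return 0.0
    hits = sum(recall_hit_at_k(retrieved, gold, k) for retrieved, gold in results)
    return hits / len(results)
```

Tracking this number separately from answer accuracy makes it possible to tell whether a change improved what was retrieved, even when the final answers have not moved yet.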

feat(scripts): add read-only memory browser

a6abc98

Expose a local CLI for inspecting FalkorDB and Qdrant state so benchmark and recall regressions can be debugged without mutating production data.

fix(recall): bypass tag filters for expansion candidates (#142)

3578a71

Let scoped recall keep tag gating for seed selection while allowing relation and entity expansion to traverse cross-boundary memories unless callers explicitly opt back into tag-respecting expansion.

jack-arturo force-pushed the fix/142-expansion-tag-filter branch from 4ebbbd5 to 3578a71 Compare April 23, 2026 10:56

fix(benchmarks): address copilot review on PR #146

c932921

- honor scoped tag_mode semantics in benchmark search
- persist and reuse deterministic LoCoMo scope prefixes
- align benchmark cleanup and ingest pacing with backend batching
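
To illustrate what persisting and reusing a deterministic scope prefix might involve, here is a small sketch. The helper names, the hashing scheme, and the JSON state file layout are all assumptions for the example and do not describe how the LoCoMo harness does it.

```python
import hashlib
import json
from pathlib import Path


def scope_prefix(benchmark: str, run_key: str) -> str:
    """Derive a stable prefix from the benchmark name and a run key."""
    digest = hashlib.sha256(f"{benchmark}:{run_key}".encode("utf-8")).hexdigest()[:12]
    return f"{benchmark}-{digest}"


def load_or_create_prefix(state_file: Path, benchmark: str, run_key: str) -> str:
    """Reuse a previously persisted prefix, or create and persist a new one."""
    if state_file.exists():
        # A persisted prefix wins, so reruns ingest and clean up the same scope.
        return json.loads(state_file.read_text())["scope_prefix"]
    prefix = scope_prefix(benchmark, run_key)
    state_file.write_text(json.dumps({"scope_prefix": prefix}))
    return prefix
```

Because the prefix is derived from its inputs rather than from a random value, two runs with the same key target the same tagged scope, which keeps cleanup predictable.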

jack-arturo merged commit 4f0fcf8 into main Apr 23, 2026
6 checks passed

jack-arturo deleted the fix/142-expansion-tag-filter branch April 23, 2026 12:55

jack-arturo mentioned this pull request Apr 23, 2026

chore(main): release 0.15.2 #148

Merged

jack-arturo restored the fix/142-expansion-tag-filter branch April 23, 2026 13:24

jack-arturo deleted the fix/142-expansion-tag-filter branch April 23, 2026 13:24

jack-arturo merged 5 commits into main from fix/142-expansion-tag-filter
