fix(recall): bypass tag filter on expansion candidates (#142) #146
Merged
Pull request overview
This PR fixes scoped recall expansion so that relation/entity expansion candidates bypass inclusive tag filters by default (while still honoring time windows and exclude_tags), with an expand_respect_tags=true opt-in to restore the old behavior. It also refactors benchmark harness plumbing into a backend adapter layer, adds expansion/retrieval telemetry, and introduces a read-only production DB browser + documentation clarifying benchmark/evals repo boundaries.
Changes:
- Update recall expansion to bypass tag filters for expansion candidates by default; add the `expand_respect_tags` parameter and telemetry.
- Add targeted regression tests for expansion behavior; add retrieval Recall@5 metrics to LongMemEval scoring/comparison.
- Introduce benchmark backend adapters and a read-only DB browsing script; update docs around benchmark ownership/evals contract.
Reviewed changes
Copilot reviewed 16 out of 17 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| automem/api/recall.py | Adds expand_respect_tags and changes expansion filtering + response telemetry. |
| tests/test_api_endpoints.py | Adds regression tests covering tag-bypass, opt-in tag-respect, exclude_tags, and time windows. |
| tests/benchmarks/backends.py | Introduces benchmark backend abstraction + AutoMem adapter (scope tagging, ingest/search/cleanup). |
| tests/test_benchmark_backends.py | Adds unit tests for the new benchmark backend adapter behavior + scorer metrics. |
| tests/benchmarks/test_locomo.py | Refactors LoCoMo harness to use backend adapters, adds scope prefixing and record normalization. |
| tests/benchmarks/longmemeval/test_longmemeval.py | Refactors LongMemEval harness to use backend adapters and adds Recall@5 hit tracking. |
| tests/benchmarks/longmemeval/evaluator.py | Adds retrieval Recall@5 aggregation and reporting. |
| tests/benchmarks/longmemeval/configs.py | Adds backend/work_dir configuration fields. |
| scripts/browse_memories.py | Adds a CLI to browse/diagnose production FalkorDB + Qdrant (read-only). |
| scripts/bench/compare_results.py | Extends LongMemEval comparison to include Recall@5. |
| docs/TESTING.md | Documents benchmark ownership boundary (automem vs automem-evals). |
| docs/EVALS_CONTRACT.md | Adds an explicit eval contract for external benchmark repos. |
| benchmarks/EXPERIMENT_LOG.md | Records experiment notes for #142 scoped expansion fix. |
| README.md | Notes benchmark ownership boundary in the project overview. |
| CLAUDE.md | Documents benchmark ownership and the new DB browser script. |
| AGENTS.md | Adds benchmark ownership boundary note. |
| .gitignore | Ignores new benchmark comparison outputs and sweep artifacts. |
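The adapter layer listed for `tests/benchmarks/backends.py` can be pictured as a small protocol that the harnesses code against. The sketch below is only an illustration under stated assumptions, not that module's actual API: `Record`, `BenchmarkBackend`, `InMemoryBackend`, and `run_case` are hypothetical names, and the in-memory store stands in for a real AutoMem instance. It shows the three operations the table mentions (scope tagging on ingest, scoped search, cleanup).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Record:
    """Hypothetical normalized benchmark record (not the harness's real type)."""
    content: str
    tags: list[str] = field(default_factory=list)


class BenchmarkBackend(Protocol):
    """Hypothetical adapter surface: ingest, search, cleanup."""

    def ingest(self, records: list[Record], scope: str) -> int: ...
    def search(self, query: str, scope: str, limit: int) -> list[Record]: ...
    def cleanup(self, scope: str) -> int: ...


class InMemoryBackend:
    """Toy backend that tags every record with its scope so runs stay isolated."""

    def __init__(self) -> None:
        self._store: list[Record] = []

    def ingest(self, records: list[Record], scope: str) -> int:
        for r in records:
            # Scope tagging: each ingested memory carries the run's scope tag.
            self._store.append(Record(r.content, r.tags + [scope]))
        return len(records)

    def search(self, query: str, scope: str, limit: int) -> list[Record]:
        # Naive case-insensitive substring match restricted to the scope tag.
        hits = [
            r for r in self._store
            if scope in r.tags and query.lower() in r.content.lower()
        ]
        return hits[:limit]

    def cleanup(self, scope: str) -> int:
        # Remove everything tagged with this scope; return how many were removed.
        before = len(self._store)
        self._store = [r for r in self._store if scope not in r.tags]
        return before - len(self._store)


def run_case(backend: BenchmarkBackend, scope: str) -> list[str]:
    """Harness code depends only on the protocol, not on a concrete backend."""
    backend.ingest([Record("Alice moved to Berlin"), Record("Bob likes tea")], scope)
    try:
        return [r.content for r in backend.search("berlin", scope, limit=5)]
    finally:
        backend.cleanup(scope)
```

Because a harness written this way only calls `ingest`, `search`, and `cleanup`, swapping in a different backend would not require touching the LoCoMo or LongMemEval scoring code.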
Clarify that canonical benchmark harnesses and published baselines stay in automem while exploratory cross-backend work lives in automem-evals.
Refactor the canonical benchmark harnesses around a shared backend adapter and surface LongMemEval Recall@5 so retrieval changes can be measured before answer quality shifts.
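One plausible reading of "Recall@5 hit tracking" is an any-hit metric: a question counts as a hit when at least one gold evidence item appears among the top five retrieved results, and Recall@5 is the hit rate over questions. The sketch below assumes that definition and skips questions that have no gold evidence; `recall_hit_at_k` and `recall_at_k` are hypothetical names, and `evaluator.py` may define or aggregate the metric differently.

```python
from __future__ import annotations

from collections.abc import Sequence


def recall_hit_at_k(retrieved_ids: Sequence[str], gold_ids: set[str], k: int = 5) -> bool:
    """True if any gold evidence id appears among the top-k retrieved ids."""
    return any(rid in gold_ids for rid in retrieved_ids[:k])


def recall_at_k(cases: list[tuple[Sequence[str], set[str]]], k: int = 5) -> float:
    """Fraction of questions with at least one evidence hit in the top k."""
    # Assumption: questions without gold evidence (e.g. abstention) are not scored.
    scored = [(retrieved, gold) for retrieved, gold in cases if gold]
    if not scored:
        return 0.0
    hits = sum(recall_hit_at_k(retrieved, gold, k) for retrieved, gold in scored)
    return hits / len(scored)
```

Tracking this separately from answer accuracy means a retrieval regression shows up as a drop in hit rate even when the judge's scores have not moved yet.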
Expose a local CLI for inspecting FalkorDB and Qdrant state so benchmark and recall regressions can be debugged without mutating production data.
Let scoped recall keep tag gating for seed selection while allowing relation and entity expansion to traverse cross-boundary memories unless callers explicitly opt back into tag-respecting expansion.
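The seed/expansion split described above can be sketched as follows. This is a simplified, hypothetical model, not the code in `automem/api/recall.py`: `Memory`, `allowed`, and `scoped_recall` are illustrative names, inclusive tags use any-match semantics, and graph neighbors are passed in as a plain dict. Seeds must pass every filter; expansion candidates always honor the time window and `exclude_tags`, and honor inclusive tags only when `expand_respect_tags` is true.

```python
from __future__ import annotations

from collections.abc import Set
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Memory:
    """Hypothetical stand-in for a stored memory."""
    id: str
    tags: set[str] = field(default_factory=set)
    timestamp: datetime = datetime(2026, 1, 1)


def allowed(
    m: Memory,
    *,
    tags: Set[str],
    exclude_tags: Set[str],
    start: datetime | None,
    end: datetime | None,
    enforce_inclusive_tags: bool,
) -> bool:
    """Apply recall filters; the inclusive tag gate is optional."""
    # The time window is always honored.
    if start is not None and m.timestamp < start:
        return False
    if end is not None and m.timestamp > end:
        return False
    # exclude_tags is always honored.
    if m.tags & exclude_tags:
        return False
    # Inclusive tag gate (any-match), applied only when enforced.
    if enforce_inclusive_tags and tags and not (m.tags & tags):
        return False
    return True


def scoped_recall(
    pool: list[Memory],
    neighbors: dict[str, list[Memory]],
    *,
    tags: Set[str] = frozenset(),
    exclude_tags: Set[str] = frozenset(),
    start: datetime | None = None,
    end: datetime | None = None,
    expand_respect_tags: bool = False,
) -> dict:
    """Seeds pass every filter; expansion skips the tag gate by default."""
    filters = dict(tags=tags, exclude_tags=exclude_tags, start=start, end=end)
    seeds = [m for m in pool if allowed(m, **filters, enforce_inclusive_tags=True)]
    seen = {m.id for m in seeds}
    expanded: list[Memory] = []
    for seed in seeds:
        for cand in neighbors.get(seed.id, []):
            if cand.id in seen:
                continue
            if allowed(cand, **filters, enforce_inclusive_tags=expand_respect_tags):
                seen.add(cand.id)
                expanded.append(cand)
    return {
        "results": seeds + expanded,
        # Telemetry fields mirror the repro output (expanded_count, respect_tags).
        "expansion": {
            "expanded_count": len(expanded),
            "respect_tags": expand_respect_tags,
        },
    }
```

With a seed tagged `project:a` whose neighbors are a `project:b` memory, an excluded memory, and one outside the time window, the default call expands to exactly the `project:b` memory (`expanded_count` 1), while `expand_respect_tags=True` expands to nothing (`expanded_count` 0).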
Branch updated from 4ebbbd5 to 3578a71.
- honor scoped tag_mode semantics in benchmark search
- persist and reuse deterministic LoCoMo scope prefixes
- align benchmark cleanup and ingest pacing with backend batching
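Persisting and reusing deterministic scope prefixes could look like the minimal sketch below. It assumes the prefix is derived from a hash of the dataset name and conversation id and saved to a JSON file between runs; `scope_prefix`, `load_or_create_prefixes`, and the `bench-` naming scheme are hypothetical and not taken from `test_locomo.py`.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def scope_prefix(dataset: str, conversation_id: str) -> str:
    """Derive a stable prefix from dataset and conversation id (hypothetical scheme)."""
    key = f"{dataset}:{conversation_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()[:12]
    return f"bench-{dataset}-{digest}"


def load_or_create_prefixes(
    path: Path, dataset: str, conversation_ids: list[str]
) -> dict[str, str]:
    """Reuse prefixes saved by an earlier run; derive and save any that are missing."""
    prefixes: dict[str, str] = {}
    if path.exists():
        prefixes = json.loads(path.read_text(encoding="utf-8"))
    for cid in conversation_ids:
        # setdefault leaves a previously persisted prefix untouched.
        prefixes.setdefault(cid, scope_prefix(dataset, cid))
    path.write_text(json.dumps(prefixes, indent=2, sort_keys=True), encoding="utf-8")
    return prefixes
```

Hashing keeps the prefix reproducible even if the file is lost, while persisting it lets a rerun locate and clean up memories tagged by an earlier, interrupted run.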
jack-arturo added a commit that referenced this pull request on Apr 23, 2026:
🤖 I have created a release *beep* *boop*

---

## [0.15.2](v0.15.1...v0.15.2) (2026-04-23)

### Bug Fixes

* **benchmarks:** make LoCoMo judge runs reliable ([#149](#149)) ([c22f2c9](c22f2c9))
* **recall:** bypass tag filter on expansion candidates ([#142](#142)) ([#146](#146)) ([4f0fcf8](4f0fcf8))
* **recall:** keyword scoring for vector results + softer adaptive floor ([#150](#150)) ([591b2c7](591b2c7))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Summary
`exclude_tags` and an `expand_respect_tags=true` escape hatch, plus focused regression tests and expansion telemetry

Repro
After the fix, the local repro returns `expansion.expanded_count: 1` and `expansion.respect_tags: false`. The same request with `expand_respect_tags=true` returns `expanded_count: 0`.

Test plan
- `PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 .venv/bin/pytest tests/test_api_endpoints.py -q -k "expand_related_memories or relation_taxonomy"`
- `PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 .venv/bin/pytest tests/test_benchmark_backends.py -q`
- `.venv/bin/python tests/benchmarks/test_locomo.py --base-url http://localhost:8001 --api-token test-token --recall-limit 10`
- `.venv/bin/python tests/benchmarks/longmemeval/test_longmemeval.py --base-url http://localhost:8001 --api-token test-token --config baseline`
- `flask-api`
- `make test` is currently blocked locally because it bootstraps a fresh `venv/` under Python 3.14 and hits unrelated `spacy==3.8.7` / FastEmbed environment failures.

Closes #142.