test: stabilize flaky retain and load batch tests#1053
Merged
nicoloboschi merged 1 commit intomainfrom Apr 14, 2026
Merged
Conversation
- test_retain.py: pin fact_type_override="world" on retains that later filter recall by fact_type=["world"]; the LLM was classifying facts as "experience" non-deterministically, returning 0 recall results. - test_load_large_batch.py: add disable_observations fixture so inline consolidation (SyncTaskBackend) doesn't run during load tests — the pool-under-load mock wasn't handling scope="consolidation" and was timing out under 10 concurrent retains. - test_load_large_batch.py: mark the file with xdist_group so the heavy load tests don't contend for CPU/memory with other parallel workers.
4c52f9d to
2ae8e74
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes four tests that were consistently failing/flaking on CI.
test_retain.py::test_mentioned_at_vs_occurredandtest_chunk_fact_mappingBoth retained content and then recalled with
fact_type=["world"]. The LLM was free to classify facts asexperiencevsworld, so recall returned 0 results whenever classification landed onexperience. Fix: passfact_type_override="world"on retain so the classification is deterministic.test_load_large_batch.py::test_db_connection_pool_under_loadWas timing out at 60s (3/3 runs locally). Root cause:
enable_observations=True(default) triggers inline consolidation viaSyncTaskBackendafter each retain. The mockLLMProvider.callin this test wasn't special-casingscope="consolidation", so consolidation received a fact-extraction-shaped response and stalled. With 10 concurrent retains, this compounded into timeouts. Fix: add adisable_observationsfixture that setsconfig.enable_observations = Falsefor the test's duration. Post-fix: 3/3 runs pass in <1s call time.test_load_large_batch.py::test_large_batch_500k_chars_20_itemsApplied the same
disable_observationsfixture (it was also spending time in inline consolidation, and the fixture is the cleaner pattern now that it exists for the file).test_load_large_batch.pyxdist groupingThe file is also marked with
pytest.mark.xdist_group("load_batch_tests")so the heavy load tests don't run simultaneously with each other or with the rest of the suite under-n 8. One of the runs was crashing under worker contention; grouping serializes them and eliminates the crash.Test plan
pytest tests/test_retain.py::test_mentioned_at_vs_occurred tests/test_retain.py::test_chunk_fact_mapping— 3/3 greenpytest tests/test_load_large_batch.py::TestLargeBatchRetain::test_db_connection_pool_under_load— 3/3 green (was 3/3 failing before)pytest tests/test_load_large_batch.py— all 3 tests pass together in ~68s