mnemostack 0.2.0 — first stable release
First stable release of mnemostack with all proven P1 answer-generation features enabled by default.
Defaults flipped (Variant B)
AnswerGenerator(...) now defaults to:
- category_aware_prompts=True: routes list, temporal, multi-hop, inference, and adversarial questions through category-specific prompts
- specificity_resolver=True: rewrites placeholder answers ("her colleague", "that book") with concrete names when evidence is present
- inference_retry=True: retries low-confidence open-domain inference answers via query decomposition + RRF merge
- list_extract_mode=False: kept off (wash on LoCoMo, may be useful elsewhere)
Passing False explicitly for any of these arguments preserves the 0.2.0b6 behavior, so the change is backward compatible.
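A minimal sketch of pinning the pre-0.2.0 behavior: the four flags named above are collected into one keyword dictionary and passed explicitly. The flag names come from this release. The helper `with_legacy_flags` is hypothetical, and the import path and remaining constructor arguments are not given in these notes, so the constructor call is left as a comment and should be read as an assumption.

```python
# Flags named in this release, set to the values that reproduce 0.2.0b6 behavior.
LEGACY_FLAGS = {
    "category_aware_prompts": False,
    "specificity_resolver": False,
    "inference_retry": False,
    "list_extract_mode": False,  # already off by default in 0.2.0
}


def with_legacy_flags(**overrides):
    """Return constructor kwargs: legacy flags first, caller overrides last."""
    kwargs = dict(LEGACY_FLAGS)
    kwargs.update(overrides)
    return kwargs


# Assumed usage; the import path and other arguments are not given in these notes:
# from mnemostack import AnswerGenerator
# generator = AnswerGenerator(..., **with_legacy_flags())
```

Keeping the flags in one place makes it easy to re-enable a single feature, for example `with_legacy_flags(inference_retry=True)`, when comparing against the old defaults.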
Benchmarks (LoCoMo, 9/9 valid conversations)
Clean comparison on 1782 QA pairs, using the same Gemini Flash judge and the same conversations for both versions. conv-50 is excluded pending the Gemini RPD reset on 2026-04-27; its results will be added to the README in the next patch.
| Metric | 0.2.0b1 baseline (no P1) | 0.2.0 (all P1) | Δ |
|---|---|---|---|
| Strict correct | 66.6% (1186/1782) | 67.7% (1207/1782) | +1.1pp |
| Combined (correct+partial) | 80.4% | 81.0% | +0.6pp |
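The strict-correct row can be reproduced from the raw counts. The sketch below recomputes both percentages and the delta from the numbers in the table; the helper name `pct` is only illustrative.

```python
TOTAL_QA = 1782


def pct(correct, total=TOTAL_QA):
    """Accuracy as a percentage rounded to one decimal place."""
    return round(100 * correct / total, 1)


baseline = pct(1186)  # 0.2.0b1, no P1
release = pct(1207)   # 0.2.0, all P1
delta_pp = round(release - baseline, 1)

print(f"{baseline}% -> {release}% ({delta_pp:+}pp, {1207 - 1186} more strict-correct answers)")
# prints: 66.6% -> 67.7% (+1.1pp, 21 more strict-correct answers)
```

The Δ column matches the difference of the rounded percentages; computed from the unrounded counts, 21 of 1782 answers is closer to 1.2pp.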
By category:
| Category | 0.2.0b1 | 0.2.0 | Δ |
|---|---|---|---|
| cat_1 single-hop | 32.4% | 32.8% | +0.4pp |
| cat_2 multi-hop temporal | 67.1% | 70.6% | +3.5pp ✅ |
| cat_3 open-domain inference | 29.2% | 39.3% | +10.1pp ✅✅ |
| cat_4 multi-hop relations | 69.2% | 69.6% | +0.4pp |
| cat_5 adversarial | 90.8% | 90.2% | -0.6pp (judge noise) |
Mechanism: inference_retry reduced the number of cat_3 not-in-memory answers from 42 to 27, and 9 of those 15 recovered answers became correct. This is the dominant lift.
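The retry path is described as query decomposition followed by an RRF merge. As a generic illustration of Reciprocal Rank Fusion (not mnemostack's actual code), the sketch below fuses the ranked results of two sub-queries. The function name, the chunk ids, and the constant `k=60` are assumptions chosen for the example.

```python
from collections import defaultdict


def rrf_merge(ranked_lists, k=60):
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a common default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for two sub-queries of one decomposed question.
sub_query_hits = [
    ["chunk_a", "chunk_b", "chunk_c"],
    ["chunk_b", "chunk_d", "chunk_a"],
]
merged = rrf_merge(sub_query_hits)
print(merged)  # ['chunk_b', 'chunk_a', 'chunk_d', 'chunk_c']
```

Chunks that appear in both sub-query rankings rise to the top, which is how a decomposed question can surface evidence that the original single query ranked too low.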
Full 10/10 numbers will land in 0.2.0.x after conv-50 rerun on 2026-04-27.
What's in 0.2.0 (cumulative since 0.1.x)
- BM25 retriever from Qdrant payloads (BM25Retriever.from_qdrant(...)): exact-token recall over the same canonical chunks as vector search
- Metadata/boilerplate stripping (mnemostack.utils.strip_metadata_blocks, is_heartbeat_poll)
- Unbounded BM25 corpus by default (no 40K cap)
- Category-aware prompts + multi-hop protection
- list_extract_mode (opt-in)
- specificity resolver (default on, with placeholder detector)
- inference retry with query decomposition (default on, with cat_5 adversarial guard)
- prompt_template explicit override now wins over category prompt
- inference_retry recaller wired in MCP server entrypoints
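To make the placeholder detector behind the specificity resolver more concrete, here is one way such a check could look. This is a hypothetical heuristic, not the library's implementation: the regular expression, the determiner list, and the function name are all assumptions for illustration.

```python
import re

# Assumed heuristic: a possessive or demonstrative determiner followed by one
# or two lowercase words, so any capitalized proper name prevents a match.
_PLACEHOLDER = re.compile(
    r"^(?:his|her|their|my|our|that|this|those|these)\s+[a-z]+(?:\s+[a-z]+)?$"
)


def looks_like_placeholder(answer: str) -> bool:
    """Return True for answers such as "her colleague" or "that book"."""
    text = answer.strip().rstrip(".")
    # Lowercase only the first character so a sentence-initial "Her" still matches.
    text = text[:1].lower() + text[1:]
    return bool(_PLACEHOLDER.match(text))


print(looks_like_placeholder("her colleague"))        # True
print(looks_like_placeholder("her colleague Maria"))  # False
```

An answer flagged this way would be a candidate for rewriting with a concrete name, but only when the retrieved evidence actually contains one.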
Install
pip install -U mnemostack
or
pip install mnemostack==0.2.0
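After installing, the version that is actually active in the environment can be checked with the standard library. This is a small sketch using `importlib.metadata`; the helper name `installed_version` is illustrative, and the distribution name is taken from the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="mnemostack"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version())
```

A result of `None` means the package is not installed in the interpreter that ran the check, which is a common symptom of installing into a different virtual environment.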