mnemostack 0.2.0 — first stable release

First stable release of mnemostack. All P1 answer-generation features that showed a gain on the benchmark are now enabled by default.

Defaults flipped (Variant B)

AnswerGenerator(...) now defaults to:

category_aware_prompts=True — routes list, temporal, multi-hop, inference, and adversarial questions through category-specific prompts
specificity_resolver=True — rewrites placeholder answers ("her colleague", "that book") with concrete names when evidence is present
inference_retry=True — retries low-confidence open-domain inference answers via query decomposition + RRF merge
list_extract_mode=False — kept off (wash on LoCoMo, may be useful elsewhere)
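
The "RRF merge" in inference_retry refers to reciprocal rank fusion, a standard way to combine several ranked result lists. The sketch below is only an illustration of that technique, not mnemostack's internal code: the function name rrf_merge, the constant k=60, and the chunk IDs are all assumptions made for this example.

```python
from collections import defaultdict


def rrf_merge(rankings, k=60):
    """Fuse several ranked lists of IDs with reciprocal rank fusion.

    Each item scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used value, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical results for two sub-queries of one decomposed question.
sub_query_a = ["chunk-3", "chunk-1", "chunk-7"]
sub_query_b = ["chunk-1", "chunk-9", "chunk-3"]
merged = rrf_merge([sub_query_a, sub_query_b])
print(merged)
```

Items that rank well in more than one sub-query rise to the top, which is why this kind of merge suits decomposed questions where each sub-query retrieves part of the evidence.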

Passing False explicitly for these flags preserves 0.2.0b6 behavior, so the change is backward compatible for callers who opt out.
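
To make the opt-out concrete, the sketch below lists the four flags with their 0.2.0 defaults and the explicit overrides that pin the earlier behavior. The dict names and the helper resolve_flags are hypothetical and exist only for illustration; the flag names and defaults come from the list above. In real code the overrides would be passed as keyword arguments to AnswerGenerator(...), whose other arguments are not shown in these notes.

```python
# Flag defaults as described in these notes (0.2.0).
DEFAULTS_0_2_0 = {
    "category_aware_prompts": True,
    "specificity_resolver": True,
    "inference_retry": True,
    "list_extract_mode": False,
}

# Explicit opt-out: pass False for every flipped flag (hypothetical name).
LEGACY_OVERRIDES = {
    "category_aware_prompts": False,
    "specificity_resolver": False,
    "inference_retry": False,
}


def resolve_flags(overrides=None):
    """Return the effective flags: defaults updated with explicit overrides."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULTS_0_2_0)
    if unknown:
        raise TypeError(f"unknown flags: {sorted(unknown)}")
    return {**DEFAULTS_0_2_0, **overrides}


print(resolve_flags())                  # 0.2.0 defaults
print(resolve_flags(LEGACY_OVERRIDES))  # every P1 feature switched off
```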

Benchmarks (LoCoMo, 9/9 valid conversations)

Like-for-like comparison on 1782 QA pairs, using the same Gemini Flash judge and the same conversations for both versions. conv-50 is excluded pending the Gemini RPD reset on 2026-04-27; its results will be added to the README in the next patch.

Metric	0.2.0b1 baseline (no P1)	0.2.0 (all P1)	Δ
Strict correct	66.6% (1186/1782)	67.7% (1207/1782)	+1.1pp
Combined (correct+partial)	80.4%	81.0%	+0.6pp
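
The strict-correct percentages can be re-derived from the raw counts in the table. The check below is only a sanity calculation with illustrative variable names. It also shows that the +1.1pp figure is the difference of the two rounded percentages, while the unrounded gain of 21 answers out of 1782 is closer to 1.2pp.

```python
TOTAL = 1782
baseline_correct = 1186  # 0.2.0b1, strict correct
release_correct = 1207   # 0.2.0, strict correct

baseline_pct = round(100 * baseline_correct / TOTAL, 1)
release_pct = round(100 * release_correct / TOTAL, 1)

# Delta as reported: difference of the rounded percentages.
reported_delta = round(release_pct - baseline_pct, 1)
# Delta from raw counts, before rounding the individual percentages.
raw_delta = round(100 * (release_correct - baseline_correct) / TOTAL, 2)

print(baseline_pct, release_pct, reported_delta, raw_delta)
```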

By category:

Category	0.2.0b1	0.2.0	Δ
cat_1 single-hop	32.4%	32.8%	+0.4pp
cat_2 multi-hop temporal	67.1%	70.6%	+3.5pp ✅
cat_3 open-domain inference	29.2%	39.3%	+10.1pp ✅✅
cat_4 multi-hop relations	69.2%	69.6%	+0.4pp
cat_5 adversarial	90.8%	90.2%	-0.6pp (judge noise)

Mechanism: inference_retry reduced cat_3 not-in-memory answers from 42 to 27. Of the 15 answers recovered this way, 9 became correct. This is the dominant source of the lift.

Full 10/10 numbers will follow in a 0.2.0.x patch release after the conv-50 rerun on 2026-04-27.

What's in 0.2.0 (cumulative since 0.1.x)

BM25 retriever from Qdrant payloads (BM25Retriever.from_qdrant(...)) — exact-token recall over the same canonical chunks as vector search
Metadata/boilerplate stripping (mnemostack.utils.strip_metadata_blocks, is_heartbeat_poll)
Unbounded BM25 corpus by default (no 40K cap)
Category-aware prompts + multi-hop protection
list_extract_mode (opt-in)
specificity resolver (default on, with placeholder detector)
inference retry with query decomposition (default on, with cat_5 adversarial guard)
An explicit prompt_template override now takes precedence over the category prompt
inference_retry recaller wired into the MCP server entrypoints
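
For readers unfamiliar with why BM25 helps with exact-token recall, here is a minimal Okapi BM25 scorer. It is a sketch of the general technique only, not the implementation behind BM25Retriever: the whitespace tokenizer, the parameters k1=1.5 and b=0.75, the function name bm25_scores, and the sample chunks are all assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical chunks; only the second contains the exact token "INV-2041".
chunks = [
    "she mentioned an invoice last week",
    "invoice INV-2041 was paid on friday",
    "they talked about the weekend trip",
]
scores = bm25_scores("INV-2041", chunks)
best = max(range(len(chunks)), key=scores.__getitem__)
print(best, scores)
```

A rare identifier such as the one in this example scores only on the chunk that literally contains it, which is the kind of match a vector search can miss.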

Install

pip install -U mnemostack

pip install mnemostack==0.2.0
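
After installing, the installed version can be confirmed from Python using only the standard library. This is a small sketch: the helper name installed_version is an assumption, and the distribution name mnemostack is taken from the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the installed version, or None if the package is missing.
print(installed_version("mnemostack"))
```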
