
mnemostack 0.2.0


Udjin79 released this on 26 Apr 19:14
Commit: 2a55b0f

mnemostack 0.2.0 — first stable release

First stable release of mnemostack, with every P1 answer-generation feature that proved out on LoCoMo enabled by default.

Defaults flipped (Variant B)

AnswerGenerator(...) now defaults to:

  • category_aware_prompts=True: routes list, temporal, multi-hop, inference, and adversarial questions through category-specific prompts
  • specificity_resolver=True: replaces placeholder answers ("her colleague", "that book") with concrete names when the evidence contains them
  • inference_retry=True: retries low-confidence open-domain inference answers using query decomposition and a Reciprocal Rank Fusion (RRF) merge
  • list_extract_mode=False: stays off (no net gain on LoCoMo, but may help on other workloads)
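The notes describe inference_retry only as "query decomposition + RRF merge". A minimal sketch of the merge step, assuming each decomposed sub-query has already returned a ranked list of chunk IDs. The decomposition, the low-confidence trigger, and mnemostack's actual fusion constant are not described here; `rrf_merge` is a hypothetical name, and k=60 is the constant from the original RRF paper, not a confirmed mnemostack setting.

```python
from collections import defaultdict


def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank_of_d).

    `rankings` holds one ranked list of chunk IDs per sub-query (hypothetical
    input shape). Chunks ranked high by several sub-queries float to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Two sub-queries from a decomposed question, each with its own ranked hits.
print(rrf_merge([["c3", "c1", "c7"], ["c1", "c9", "c3"]]))  # ['c1', 'c3', 'c9', 'c7']
```

RRF only uses ranks, not raw scores, so lists from different sub-queries (or retrievers) can be fused without calibrating their score scales.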

Passing False explicitly for these flags restores 0.2.0b6 behavior, so the change is backward compatible.
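To opt out with the real class, the call would look like `AnswerGenerator(..., category_aware_prompts=False, specificity_resolver=False, inference_retry=False)`, with the other constructor arguments elided as in these notes. The self-contained sketch below models only the flags: `GeneratorFlags`, `B6_COMPAT`, `overridden`, and `select_prompt` are hypothetical names, not mnemostack APIs. The prompt-selection order (explicit `prompt_template` first, as stated for 0.2.0, then the category prompt, then a generic prompt) is an assumption about how routing falls back.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class GeneratorFlags:
    """Hypothetical mirror of the AnswerGenerator flags named in these notes."""
    category_aware_prompts: bool = True  # default flipped to True in 0.2.0
    specificity_resolver: bool = True    # default flipped to True in 0.2.0
    inference_retry: bool = True         # default flipped to True in 0.2.0
    list_extract_mode: bool = False      # still opt-in


# Explicit False on the three flipped flags reproduces 0.2.0b6 behavior.
B6_COMPAT = GeneratorFlags(
    category_aware_prompts=False,
    specificity_resolver=False,
    inference_retry=False,
)


def overridden(flags: GeneratorFlags) -> list[str]:
    """Names of flags whose value differs from the 0.2.0 defaults."""
    defaults = asdict(GeneratorFlags())
    return [name for name, value in asdict(flags).items() if value != defaults[name]]


def select_prompt(category: str, flags: GeneratorFlags,
                  category_prompts: dict[str, str], generic_prompt: str,
                  prompt_template: Optional[str] = None) -> str:
    """Assumed routing: explicit template wins, then category prompt, then generic."""
    if prompt_template is not None:
        return prompt_template
    if flags.category_aware_prompts and category in category_prompts:
        return category_prompts[category]
    return generic_prompt


print(overridden(B6_COMPAT))
# ['category_aware_prompts', 'specificity_resolver', 'inference_retry']
```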

Benchmarks (LoCoMo, 9 of 10 conversations)

Clean comparison on 1,782 QA pairs, using the same Gemini Flash judge and the same conversations for both versions. conv-50 is excluded until the Gemini requests-per-day (RPD) quota resets on 2026-04-27; its results will be added to the README in the next patch.

| Metric | 0.2.0b1 baseline (no P1) | 0.2.0 (all P1) | Δ |
|---|---|---|---|
| Strict correct | 66.6% (1186/1782) | 67.7% (1207/1782) | +1.1pp |
| Combined (correct + partial) | 80.4% | 81.0% | +0.6pp |
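The strict-correct figures can be re-derived from the raw counts in the table. A small worked check, where `pct` and `TOTAL` are names introduced only for this illustration:

```python
TOTAL = 1782  # LoCoMo QA pairs in the 9-conversation run


def pct(correct: int, total: int = TOTAL) -> float:
    """Accuracy in percent, rounded to one decimal as in the table."""
    return round(100 * correct / total, 1)


baseline = pct(1186)  # 0.2.0b1 strict correct -> 66.6
release = pct(1207)   # 0.2.0 strict correct   -> 67.7

# Difference of the already-rounded percentages (what the table reports).
table_delta = round(release - baseline, 1)            # 1.1
# Difference computed from the raw counts: 21 more correct answers.
exact_delta = round(100 * (1207 - 1186) / TOTAL, 1)   # 1.2

print(baseline, release, table_delta, exact_delta)  # 66.6 67.7 1.1 1.2
```

The 21 additional correct answers amount to about 1.18pp; the +1.1pp in the table is the difference between the two rounded percentages.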

By category:

| Category | 0.2.0b1 | 0.2.0 | Δ |
|---|---|---|---|
| cat_1 single-hop | 32.4% | 32.8% | +0.4pp |
| cat_2 multi-hop temporal | 67.1% | 70.6% | +3.5pp |
| cat_3 open-domain inference | 29.2% | 39.3% | +10.1pp ✅✅ |
| cat_4 multi-hop relations | 69.2% | 69.6% | +0.4pp |
| cat_5 adversarial | 90.8% | 90.2% | -0.6pp (within judge noise) |

Mechanism: inference_retry cut cat_3 "not in memory" answers from 42 to 27, so 15 previously unanswered questions now get an answer, and 9 of those 15 were judged correct. This is the dominant source of the cat_3 lift.

Full 10-conversation numbers will land in a 0.2.0.x patch after conv-50 is rerun on 2026-04-27.

What's in 0.2.0 (cumulative since 0.1.x)

  • BM25 retriever built from Qdrant payloads (BM25Retriever.from_qdrant(...)), giving exact-token recall over the same canonical chunks that vector search uses
  • Metadata/boilerplate stripping (mnemostack.utils.strip_metadata_blocks, is_heartbeat_poll)
  • Unbounded BM25 corpus by default (no 40K cap)
  • Category-aware prompts + multi-hop protection
  • list_extract_mode (opt-in)
  • specificity resolver (default on, with placeholder detector)
  • inference retry with query decomposition (default on, with cat_5 adversarial guard)
  • An explicit prompt_template override now takes precedence over the category prompt
  • The inference_retry recaller is now wired into the MCP server entry points
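The notes don't describe the BM25 retriever's internals (tokenizer, k1, b, IDF variant). As a rough illustration of what "exact-token recall" means, here is a standard Okapi BM25 scorer over an in-memory dict of chunk texts. `TinyBM25`, `tokenize`, and the k1=1.5, b=0.75 defaults are assumptions for this sketch; it is not mnemostack's implementation and does not talk to Qdrant.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Hypothetical tokenizer: lowercase word characters, no stemming."""
    return re.findall(r"\w+", text.lower())


class TinyBM25:
    """Okapi BM25 over a non-empty dict of chunk_id -> text."""

    def __init__(self, chunks: dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tfs = {cid: Counter(tokenize(text)) for cid, text in chunks.items()}
        self.lengths = {cid: sum(tf.values()) for cid, tf in self.tfs.items()}
        self.avgdl = sum(self.lengths.values()) / len(self.lengths)
        df: Counter = Counter()
        for tf in self.tfs.values():
            df.update(tf.keys())  # document frequency: each term once per chunk
        n = len(self.tfs)
        # Non-negative IDF variant (the one Lucene uses).
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, cid: str) -> float:
        tf = self.tfs[cid]
        # Length normalization: chunks longer than average are penalized.
        norm = self.k1 * (1 - self.b + self.b * self.lengths[cid] / self.avgdl)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def search(self, query: str, top_k: int = 3) -> list[tuple[str, float]]:
        hits = [(cid, self.score(query, cid)) for cid in self.tfs]
        hits = [h for h in hits if h[1] > 0]  # keep only chunks with an exact-token match
        hits.sort(key=lambda h: (-h[1], h[0]))
        return hits[:top_k]


index = TinyBM25({
    "c1": "Alice painted a sunrise in 2022",
    "c2": "Bob joined a hiking club",
    "c3": "The sunrise painting hangs in the kitchen",
})
print([cid for cid, _ in index.search("sunrise painting")])  # ['c3', 'c1']
```

"painted" in c1 does not match "painting" in the query: without stemming, BM25 rewards only literal token overlap. That precision on names, IDs, and rare terms is what lexical retrieval adds alongside vector search over the same chunks.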

Install

pip install -U mnemostack

or

pip install mnemostack==0.2.0
