v0.3.0
The workflow + measurement release. Ships a new public-API surface
that closes the templated-workload retention gap end-to-end, in all three
bindings (Rust, Python, Node): analyze_query_set, the QueryRewrite
trait with two built-in implementations (Stripper and Vocabulary),
Document::context_with_rewrites(...) to compose them with an audit
trail, Vocabulary::enrich(...) as the chunk-side mirror, and evaluate
for deterministic A/B with no LLM judge. On the CUAD framework comparison
the full detect → compile → context_with_rewrites → A/B workflow takes
≥0.8 retention from 81.3% → 90.7% — a 9.4-point lift over raw BM25,
beating LlamaIndex's 86% by 4 points, at native BM25 latency (~2.5ms/query)
on default lexical retrieval. Worked example, hand-curated CUAD clause-name
dictionary, and a 6-arm probe contrasting the workflow vs hybrid+cross-encoder
live in docs/findings/CUAD_CLAUSE_EXPANSION.md and
docs/findings/CUAD_HYBRID_RERANK.md.
Vocabulary.enrich(...) ships with bidirectional measured evidence on the
regime rule it follows. Positive side: docs/findings/SPIDER_ENRICH.md
measured +0.19 mean column recall on Spider-shape schema retrieval (curated
workload synonyms; n=30, candidate_k=10). Negative side:
docs/findings/CUAD_ENRICH_DEFINITIONS_NULL.md measured −2.0 pts on
CUAD prose chunks. The two findings together complete the four-corner
rule with measured evidence on all four corners: workload-pervasive
signal manipulation fails on either side of the pipeline; only
workload-curated semantics work. See docs/findings/VOCABULARY_ENRICH.md
for the regime rule, use-case ranking, and failure modes.
Breaking on the manual-chunks path (Python + Node): the typed
redhop.Chunk(text, *, source=None, id=None, metadata=None, ...) constructor
becomes the only accepted input shape for Document.from_chunks and the
low-level build_context / filter_context / analyze_context /
context_economics entry points. Bare strings and dicts both raise
ValueError with a migration hint pointing at the new constructor. The
trade-off is intentional: the dict path didn't expose chunk metadata at all,
so manually-constructed chunks couldn't carry page / heading / line
into citations — a real functional gap, not just ergonomics. The typed
Chunk closes that gap and surfaces source (provenance) and id
(identity) as the two distinct concepts they already are in the Rust core
(see Breaking below for the migration).
Added
Templated-workload helpers (Rust + Python + Node)
- analyze_query_set(queries) → QuerySetReport: diagnostic that takes
  a representative sample of your queries and reports whether they share
  enough boilerplate to be templated, which terms are doing the dilution,
  and a coarse estimated_dilution_cost band. Cross-workload probe
  (docs/findings/QUERY_SET_ANALYZER.md): CUAD fires (share 0.66, cost
  high); HotpotQA + MuSiQue both stay quiet (0.00 and 0.12, both
  is_templated=False). Conservative by design: false positives push
  users toward a workaround that won't help, which is worse than staying
  quiet.
- QueryRewrite trait + Stripper + Vocabulary: compiled,
  observable, token-level-correct replacement for the function-form
  rewrites originally drafted for this release. Each QueryRewrite
  implementation returns a RewriteResult { query, record } so every
  stage's {stage, from, to, matched, added, removed} lands on
  ContextReport::query_rewrites automatically when called through
  the chain.
  - Stripper::new(boilerplate): compiled boilerplate-removal
    rewrite. Matches at token granularity through the analyzer (with a
    surface-form fallback for tokens like "of"/"the" that stem to
    empty), so a single-token strip cannot accidentally erase a
    substring inside a longer word (an "of" strip does not erase
    the "of" inside "office"). Replaces the substring-based
    drop_template_terms function originally drafted for 0.3.0.
  - Vocabulary::new(entries) / Vocabulary::bidirectional(entries):
    compiled workload-curated equivalence classes. Tokenizes keys,
    synonyms, and the query through the same analyzer the BM25 index
    uses, so a vocabulary key "ip" cannot fire on the "ip" inside
    "recipient". Bidirectional mode treats every class member as a
    trigger (PTO ↔ "paid time off" ↔ "vacation"). The CUAD probe
    (docs/findings/CUAD_CLAUSE_EXPANSION.md) shows +3.0 points on top
    of the template-stripped baseline (the new token-level matching
    re-validates at 90.7% vs the substring-based predecessor's 90.3%;
    same workload, +0.4 from analyzer alignment).
  - Document::context_with_rewrites(query, &[&stripper, &vocab]):
    runs the chain left-to-right through retrieval. Each stage sees
    the previous stage's output; the per-stage RewriteRecords land on
    ctx.report.query_rewrites automatically.
  - Future-extensible. Both Stripper and Vocabulary are
    QueryRewrite implementations; user code can ship its own (e.g. a
    workload-specific normalizer) and chain it alongside the built-ins.
    The trait is exported on the public API surface.
- Vocabulary::enrich(chunk) → RewriteResult: chunk-side
  mirror of apply shipped as a primitive on mechanism reasoning
  with asymmetric measured evidence. The mechanism (a chunk-side
  doc2query variant) and the regime hypothesis
  (expected value ∝ shortness × opacity × dictionary-exists) are
  well-grounded; the positive prediction (short opaque coded
  units: schema columns, API symbols, error codes) is not yet
  measured by RedHop. Spider/BIRD as the schema-regime probe is
  queued, not run. The negative prediction (long prose chunks
  + workload-pervasive vocabulary will dilute, not help) has been
  measured directly: CUAD_ENRICH_DEFINITIONS_NULL
  regressed retention −2.0pt vs the 90.7% workflow baseline
  (~24-point loss on the 17/50 affected contracts). This completes
  the four-corner rule from CUAD_PRF_NULL + SUB_IDF_AUTO_DROP_NULL
  onto the chunk side: workload-pervasive signal manipulation fails
  on either side of the pipeline. Users adopting enrich should
  A/B on their own corpus with redhop::evaluate(...);
  the regime rule is a hypothesis, not a guarantee. Audit trail
  (per-chunk RewriteRecord with stage: "enrich") returned to
  the caller so the A/B is auditable. Synthetic demo (not a
  benchmark): crates/examples/examples/enrich_code_search.rs.
  Full asymmetric-evidence framing + use case predictions + failure
  modes in docs/findings/VOCABULARY_ENRICH.md.
- evaluate(query, ctx, gold) → EvalReport: in-process retrieval-eval
  scorer, no LLM judge. Self-eval (mean_grounding, evidence_density,
  retained_evidence_ratio, second_hop_rescues, low_confidence,
  estimated_waste_tokens) is always populated; gold-relative metrics
  (context_recall, context_precision, answer_token_recall) are
  optionally unlocked by passing gold_chunks and/or gold_answer.
  Composite overall blends whichever fields are available. Designed as
  a refraction of the same primitives the runtime uses to make its
  Decision Report: a low overall and report.low_confidence_retrieval
  are the same signal viewed twice, not independent measurements, so eval
  and runtime can never disagree. Rationale, contract details, and the
  10 / 11 / 9 Rust / Python / Node tests that pin the contract are in
  docs/findings/EVALUATE_API.md.
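
The token-level matching and per-stage audit trail described for the rewrite chain can be illustrated with a small standalone sketch. This is not RedHop's implementation or API: ToyStripper, ToyVocabulary, RewriteRecord (as defined here), tokenize, and run_chain are hypothetical names, the analyzer is a plain lowercase word splitter with no stemming, and vocabulary keys are limited to single tokens.

```python
import re
from dataclasses import dataclass, field


def tokenize(text: str) -> list[str]:
    """Toy analyzer (assumption): lowercase alphanumeric tokens, no stemming."""
    return re.findall(r"[a-z0-9]+", text.lower())


@dataclass
class RewriteRecord:
    """Hypothetical audit record: which stage ran and what it changed."""
    stage: str
    before: str
    after: str
    removed: list[str] = field(default_factory=list)
    added: list[str] = field(default_factory=list)


class ToyStripper:
    """Drops boilerplate tokens; compares whole tokens, never substrings."""

    def __init__(self, boilerplate: list[str]):
        self.drop = {tok for phrase in boilerplate for tok in tokenize(phrase)}

    def apply(self, query: str) -> tuple[str, RewriteRecord]:
        tokens = tokenize(query)
        kept = [t for t in tokens if t not in self.drop]
        removed = [t for t in tokens if t in self.drop]
        out = " ".join(kept)
        return out, RewriteRecord("strip", query, out, removed=removed)


class ToyVocabulary:
    """Appends curated synonyms when a whole token equals a (single-token) key."""

    def __init__(self, entries: dict[str, list[str]]):
        self.entries = {key.lower(): syns for key, syns in entries.items()}

    def apply(self, query: str) -> tuple[str, RewriteRecord]:
        tokens = tokenize(query)
        added: list[str] = []
        for tok in tokens:
            for syn in self.entries.get(tok, []):
                if syn not in added:
                    added.append(syn)
        out = " ".join(tokens + added)
        return out, RewriteRecord("vocabulary", query, out, added=added)


def run_chain(query: str, rewrites: list) -> tuple[str, list[RewriteRecord]]:
    """Left-to-right chain: each stage sees the previous stage's output."""
    trail = []
    for rewrite in rewrites:
        query, record = rewrite.apply(query)
        trail.append(record)
    return query, trail


stripper = ToyStripper(["of", "the"])
vocab = ToyVocabulary({"ip": ["intellectual property"]})
final, trail = run_chain(
    "Ownership of the IP at the office recipient", [stripper, vocab]
)
```

In this toy run "office" and "recipient" survive untouched because "of" and "ip" are compared against whole tokens, and trail holds one record per stage in execution order.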
Findings (the evidence layer)
New findings document what was tried, what worked, and what was
falsified across this release:
- Confirmed: QUERY_SET_ANALYZER, CUAD_RECALL_GAP,
  CUAD_CLAUSE_EXPANSION, MULTILINGUAL_ANALYZER, EVALUATE_API,
  CUAD_HYBRID_RERANK (substitute-not-stack rule), VOCABULARY_ENRICH
  (confirmed on both sides of the regime rule), SPIDER_ENRICH
  (the positive-side validation for Vocabulary.enrich(...): curated
  chunk-side enrichment on a Spider-shape sample lifted mean column
  recall +0.19 from 0.77 → 0.97, ≥0.8 retention 63% → 93%).
- Null result / falsified: CUAD_PRF_NULL (unweighted PRF on
  boilerplate-heavy corpora), CUAD_CHUNK_FRAGMENTATION_NULL (chunker
  isn't the CUAD lever), SUB_IDF_AUTO_DROP_NULL (corpus-only IDF
  manipulation fails in both directions),
  CUAD_ENRICH_DEFINITIONS_NULL (chunk-side enrich on per-contract
  Definitions regressed −2.0 pts vs the 90.7% workflow baseline;
  ~24-point loss on the 17/50 contracts where Definitions were
  extractable; chunk-side parallel to CUAD_PRF_NULL's failure mode,
  measured directly).
- The four-corner rule is now measured on all four corners.
  Workload-pervasive signal manipulation fails on either side of the
  pipeline; only workload-curated semantics work:
  query-side curated wins (CUAD_CLAUSE_EXPANSION +3.0pt) ·
  query-side auto fails (CUAD_PRF_NULL −3.7pt) ·
  chunk-side curated wins (SPIDER_ENRICH +0.19 mean recall) ·
  chunk-side auto fails (CUAD_ENRICH_DEFINITIONS_NULL −2.0pt).
Examples
Eleven new harnesses under crates/examples/examples/:
cuad_query_preprocessing, cuad_chunk_strategy_sweep,
cuad_chunk_fragmentation, cuad_clause_expansion, cuad_hybrid_rerank,
cuad_perf, cuad_prf, cuad_rust_vs_python_path,
multilingual_query_set_probe, query_set_analyzer_probe,
sub_idf_reweighting_probe.
Documentation
- New workflow-lift chart .github/workflow_lift.svg embedded in the
  root README + binding READMEs; surfaces the 81 → 88 → 90.7% story
  visually.
- Root README, python/README.md, nodejs/README.md "Templated
  workloads" section rewritten to detect → strip → (optional) vocabulary →
  A/B with Stripper / Vocabulary / context_with_rewrites tabled.
- docs/CHOOSING_A_CONFIG.md step 3 leads with the new "two paths up
  the same hill" decision table contrasting retrieval="hybrid" (the
  one-knob alternative) vs BM25 + the helpers (best-quality).
Chat-RAG and chronology preservation
- ContextConfig::preserve_order: bool is a new field (default false,
  no behavior change for existing callers). When set, the assembled
  context emits selected chunks in source-document order instead of
  the strategy's relevance-emitted order. The selection step is
  untouched; only the final ordering changes. Designed for chat
  histories, narrative transcripts, and sequential logs where
  chronology / causality matters and a relevance-ranked emission would
  destroy the meaning ("after the refund came in" reads strangely if
  presented before "ordered the laptop").
- The sort key is (source, chunk_position) where chunk_position
  prefers a chunk_index metadata field (stamped automatically by
  Document::from_chunks_with based on input order, so caller-supplied
  chunks via from_chunks get a stable chronology key for free) and
  falls back to the chunker's existing sentence_range.start for
  text-loaded paths.
- Exposed across all three bindings:
  - Rust: ContextConfig { preserve_order: true, .. }; flows
    through LoadOptions::preserve_order for the text() /
    chunks() paths.
  - Python: redhop.Document.from_text(text, preserve_order=True)
    and from_chunks / from_file / from_bytes; also exposed on the
    low-level redhop.build_context(query, chunks, preserve_order=True)
    and redhop.filter_context(...).
  - Node: Document.fromText(text, { preserveOrder: true }) and
    siblings; also a preserveOrder?: boolean field on the
    ContextOptions shape consumed by buildContext and filterContext.
- Worked example: crates/examples/examples/chat_rag.rs
  shows a 12-turn chat where, on the query "shipping refund label
  return", the strategy picks four turns by relevance: preserve_order
  off emits them in [turn-08, turn-03, turn-05, turn-06] (relevance);
  preserve_order on emits them in [turn-03, turn-05, turn-06, turn-08]
  (chronological), so the LLM reads what was said in the order it was
  said. 3 new Rust unit tests pin the contract
  (preserve_order_off_emits_relevance_order,
  preserve_order_on_emits_document_order,
  preserve_order_groups_by_source).
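
The ordering rule above (selection untouched, final emission re-sorted by (source, chunk_position)) can be sketched in a few lines of standalone Python. This is an illustration only, not RedHop's code: SelectedChunk, chunk_position, and emit are hypothetical names, and the relevance scores are invented so that the relevance order matches the chat_rag.rs example.

```python
from dataclasses import dataclass, field


@dataclass
class SelectedChunk:
    """Hypothetical stand-in for a chunk that already survived selection."""
    id: str
    source: str
    score: float                      # illustrative relevance score
    sentence_start: int = 0           # fallback position for text-loaded paths
    metadata: dict = field(default_factory=dict)


def chunk_position(chunk: SelectedChunk) -> int:
    """Prefer a chunk_index metadata field; fall back to the sentence start."""
    index = chunk.metadata.get("chunk_index")
    return index if index is not None else chunk.sentence_start


def emit(selected: list[SelectedChunk], preserve_order: bool = False) -> list[SelectedChunk]:
    """Only the final ordering changes; the set of chunks is the same either way."""
    if preserve_order:
        return sorted(selected, key=lambda c: (c.source, chunk_position(c)))
    return sorted(selected, key=lambda c: c.score, reverse=True)


selected = [
    SelectedChunk("turn-03", "chat", 0.71, metadata={"chunk_index": 3}),
    SelectedChunk("turn-05", "chat", 0.64, metadata={"chunk_index": 5}),
    SelectedChunk("turn-06", "chat", 0.58, metadata={"chunk_index": 6}),
    SelectedChunk("turn-08", "chat", 0.93, metadata={"chunk_index": 8}),
]

relevance_order = [c.id for c in emit(selected)]
document_order = [c.id for c in emit(selected, preserve_order=True)]
```

Sorting on the source first keeps chunks from different documents grouped together, which is the behavior the preserve_order_groups_by_source test name suggests.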
Changed
- Package registry URLs now point at https://www.redhopai.com as
  the canonical Homepage, with the GitHub repo kept as Repository
  (PyPI) / repository (npm) / repository (crates.io). Before this,
  PyPI displayed two identical "Homepage" and "Repository" links both
  pointing at GitHub; npm displayed neither. PyPI also gains
  Documentation, Changelog, Issues, and Evidence layer link
  entries; npm gains homepage, repository, bugs, and an
  expanded keywords array (reasoning, embeddings added).
- Findings master table refreshed with new rows on
  /docs/benchmarks/ (website) and docs/findings/README.md (repo).
  Framework comparison row updated: the CUAD headline is now
  90.7% via Stripper + Vocabulary (was 88% via strip alone),
  beating LlamaIndex by 4 points. VOCABULARY_ENRICH row promoted from
  asymmetric measured evidence to confirmed on both sides of the
  regime rule after the SPIDER_ENRICH probe landed.
- RewriteResult.query field renamed to RewriteResult.text
  (Rust). The same struct is the output of both query-side
  QueryRewrite::apply and chunk-side Vocabulary::enrich. The old
  query field name read awkwardly on the enrich path
  (vocab.enrich(chunk_text).query describes a chunk, not a query);
  text is neutral and accurate for both directions. The audit-record
  stage field is the signal of which side of the pipeline emitted
  the result ("strip" / "vocabulary" / "enrich"). Pre-publish
  rename: no callers exist outside the repo yet, but flagging for
  anyone building from source on a pre-release commit.
- User-facing docs (README.md, python/README.md, nodejs/README.md,
  website) elevate the rewrite chain + audit trail + evaluate to a
  dedicated "Show your work" section. The 0.3.0 differentiator
  versus other RAG frameworks is that every transform is observable on
  the same Decision Report and every change is A/B-scoreable without
  an LLM judge; the previous docs surfaced the 3-call surface plus
  citations but understated the rewrite/audit/evaluate combo. The new
  section appears on every binding's README and as both a homepage
  card and a section on the website.
Fixed
- Document.from_folder was constructing LoadOptions without
  preserve_order under --features files,semantic. Caught
  locally while writing examples/python/07_retrieval_tiers.py (a
  full-feature build). The bug was hidden in the lean (no-features)
  default build because the missing-field code path was behind
  #[cfg(feature = "files")]. The default published wheel ships with
  features = ["files", "semantic"], so end users would have hit it.
  Fixed; all 4 feature configurations (--no-default-features,
  --features files, --features semantic, --features files,semantic)
  now compile cleanly.
Breaking — redhop.Chunk is now the only accepted manual-chunks shape
- Document.from_chunks + build_context + filter_context +
  analyze_context + context_economics now require typed
  redhop.Chunk(...) instances. Bare strings and plain dicts both
  raise ValueError with a migration hint:

      chunk 0: expected redhop.Chunk(text, source=..., ...); got str. As of 0.3.0, strings and dicts are no longer accepted — wrap your input as `redhop.Chunk(text, source='myfile.txt')`.

- What the new constructor looks like:

      redhop.Chunk(
          text,
          source=None,       # provenance: file path / URL / logical handle
          id=None,           # identity: stable id, defaults to c0, c1, …
          metadata=None,     # open dict; citations read page/heading/line
          token_count=None,  # auto from whitespace if omitted
          embedding=None,    # for pre-computed dense vectors
      )

  Node mirrors with
  new redhop.Chunk(text, { source, id, metadata, tokenCount, embedding }).
- Why this is a breaking change instead of a backward-compatible addition:
  the dict path didn't accept metadata at all, so manually-supplied
  chunks couldn't carry page/heading/line into citations. The
  two-ways-to-do-it cleanup is incidental; closing the metadata gap is
  the real reason. Strict typing also surfaces source (provenance) and id
  (identity) as distinct concepts the way the Rust core has always
  treated them; the dict path conflated them in practice.
- Migration:

  | Before | After |
  | --- | --- |
  | from_chunks(["a", "b"]) | from_chunks([redhop.Chunk("a"), redhop.Chunk("b")]) |
  | from_chunks([{"text": "a", "source": "x.md"}]) | from_chunks([redhop.Chunk("a", source="x.md")]) |
  | from_chunks([{"text": "a", "id": "x", "source": "y.md"}]) | from_chunks([redhop.Chunk("a", id="x", source="y.md")]) |
  | buildContext(q, [{ id, text }, ...]) (Node) | buildContext(q, [new Chunk(text, { id }), ...]) |

- What's new on the typed-chunks path: citations now pick up page,
  heading, and line from metadata={...} on chunks the user built
  themselves. Before 0.3.0 those fields were always None on the
  manual-chunks path; only the file loaders populated them.
- Rust callers unaffected. The redhop::core::Chunk struct hasn't
  changed shape. Document::from_chunks(Vec<Chunk>) still takes
  Vec<redhop::core::Chunk> exactly as it did. A new public facade
  redhop::chunks_typed(Vec<Chunk>, &LoadOptions) was added so the
  bindings can route pre-formed chunks through the indexing pipeline
  without going through the chunker (preserving 1-to-1 chunk identity).
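
For codebases with many call sites still passing strings or dicts, the migration table can be applied mechanically with a small shim. The sketch below is an assumption-laden illustration rather than part of RedHop: to_chunk is a hypothetical helper, and the Chunk dataclass is a local stand-in that mirrors only the documented text / source / id / metadata parameters so the example runs without the package installed.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Chunk:
    """Local stand-in for redhop.Chunk (assumption: same keyword names)."""
    text: str
    source: Optional[str] = None
    id: Optional[str] = None
    metadata: Optional[dict] = None


def to_chunk(item: Any, chunk_cls=Chunk):
    """Convert a legacy str / dict chunk into a typed chunk.

    Pass chunk_cls=redhop.Chunk in real code; the stand-in is the default
    here only to keep the sketch self-contained.
    """
    if isinstance(item, chunk_cls):
        return item                                   # already migrated
    if isinstance(item, str):
        return chunk_cls(item)                        # "a" -> Chunk("a")
    if isinstance(item, dict):
        return chunk_cls(
            item["text"],
            source=item.get("source"),
            id=item.get("id"),
        )
    raise TypeError(f"cannot convert {type(item).__name__} to a chunk")


legacy = [
    "a",
    {"text": "b", "source": "x.md"},
    {"text": "c", "id": "x", "source": "y.md"},
]
migrated = [to_chunk(item) for item in legacy]
```

A shim like this is best treated as a one-time migration aid: constructing the typed chunks directly is what lets callers attach metadata such as page, heading, and line.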
Breaking (Node only — Python and Rust callers unaffected)
- Node BuiltContext is now a #[napi] class (was a plain
  #[napi(object)]). The four exposed properties (text, chunks,
  citations, report) remain readable as JS properties via getters, so
  existing user code that does ctx.text, ctx.chunks, etc., continues
  to work unchanged. The TypeScript type changes from
  interface BuiltContext { … } to class BuiltContext { … }. The
  reason for the change is that redhop.evaluate(query, ctx, …) needs
  access to the underlying Rust struct (chunk IDs, the full report shape)
  which a plain object can't carry.
  - What breaks: if you were calling JSON.stringify(ctx), class getters
    aren't enumerable by default and the output will be {} instead of
    the four-field object. Project to a plain object explicitly:
    JSON.stringify({ text: ctx.text, chunks: ctx.chunks, citations: ctx.citations, report: ctx.report }).
    No other behavior changes.