Release v0.3.0 · vysakh0/redhop

The workflow + measurement release. Ships a new public-API surface
that closes the templated-workload retention gap end-to-end, in all three
bindings (Rust, Python, Node): analyze_query_set, the QueryRewrite
trait with two built-in implementations (Stripper and Vocabulary),
Document::context_with_rewrites(...) to compose them with an audit
trail, Vocabulary::enrich(...) as the chunk-side mirror, and evaluate
for deterministic A/B with no LLM judge. On the CUAD framework comparison
the full detect → compile → context_with_rewrites → A/B workflow takes
≥0.8 retention from 81.3% → 90.7% — a 9.4-point lift over raw BM25,
beating LlamaIndex's 86% by 4 points, at native BM25 latency (~2.5ms/query)
on default lexical retrieval. Worked example, hand-curated CUAD clause-name
dictionary, and a 6-arm probe contrasting the workflow vs hybrid+cross-encoder
live in docs/findings/CUAD_CLAUSE_EXPANSION.md and
docs/findings/CUAD_HYBRID_RERANK.md.

Vocabulary.enrich(...) ships with bidirectional measured evidence on the
regime rule it follows. Positive side: docs/findings/SPIDER_ENRICH.md
measured +0.19 mean column recall on Spider-shape schema retrieval (curated
workload synonyms; n=30, candidate_k=10). Negative side:
docs/findings/CUAD_ENRICH_DEFINITIONS_NULL.md measured −2.0 pts on
CUAD prose chunks. The two findings together complete the four-corner
rule with measured evidence on all four corners: workload-pervasive
signal manipulation fails on either side of the pipeline; only
workload-curated semantics work. See docs/findings/VOCABULARY_ENRICH.md
for the regime rule, use-case ranking, and failure modes.

Breaking on the manual-chunks path (Python + Node): the typed
redhop.Chunk(text, *, source=None, id=None, metadata=None, ...) constructor
becomes the only accepted input shape for Document.from_chunks and the
low-level build_context / filter_context / analyze_context /
context_economics entry points. Bare strings and dicts both raise
ValueError with a migration hint pointing at the new constructor. The
trade-off is intentional: the dict path didn't expose chunk metadata at all,
so manually-constructed chunks couldn't carry page / heading / line
into citations — a real functional gap, not just ergonomics. The typed
Chunk closes that gap and surfaces source (provenance) and id
(identity) as the two distinct concepts they already are in the Rust core
(see Breaking below for the migration).

Added

Templated-workload helpers (Rust + Python + Node)

analyze_query_set(queries) → QuerySetReport — diagnostic that takes
a representative sample of your queries and reports whether they share
enough boilerplate to be templated, which terms are doing the dilution,
and a coarse estimated_dilution_cost band. Cross-workload probe
(docs/findings/QUERY_SET_ANALYZER.md): CUAD fires (share 0.66, cost
high); HotpotQA + MuSiQue both stay quiet (0.00 and 0.12, both
is_templated=False). Conservative by design — false positives push
users toward a workaround that won't help, which is worse than staying
quiet.
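As a rough illustration of what "share enough boilerplate to be templated" can mean, the sketch below measures what fraction of all query tokens come from terms that recur in most queries. This is a hypothetical heuristic, not RedHop's actual analyzer: the function name `boilerplate_share`, the `min_doc_freq` threshold, the whitespace tokenizer, and the sample queries are all assumptions made for the example.

```python
from collections import Counter


def boilerplate_share(queries, min_doc_freq=0.8):
    """Return (share, terms) for a sample of queries.

    `terms` are tokens that occur in at least `min_doc_freq` of the queries;
    `share` is the fraction of all query tokens that are such terms.
    Hypothetical heuristic for illustration only.
    """
    tokenized = [q.lower().split() for q in queries]
    n = len(tokenized)
    # Document frequency: in how many queries does each token occur?
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    terms = {tok for tok, df in doc_freq.items() if df / n >= min_doc_freq}
    total = sum(len(toks) for toks in tokenized)
    shared = sum(1 for toks in tokenized for tok in toks if tok in terms)
    return (shared / total if total else 0.0), sorted(terms)


# Made-up samples: one templated workload, one free-form workload.
templated = [
    "highlight the parts related to governing law",
    "highlight the parts related to audit rights",
    "highlight the parts related to renewal term",
]
varied = [
    "who directed the film",
    "capital of france",
    "when was the bridge built",
]

share_t, terms_t = boilerplate_share(templated)
share_v, terms_v = boilerplate_share(varied)
print(round(share_t, 2), terms_t)
print(share_v, terms_v)
```

In the templated sample, five of every seven tokens are shared scaffolding, so most of each query's lexical weight goes to terms that do not discriminate between clauses. The free-form sample has no term above the threshold and stays quiet.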
QueryRewrite trait + Stripper + Vocabulary — compiled,
observable, token-level-correct replacement for the function-form
rewrites originally drafted for this release. Each QueryRewrite
implementation returns a RewriteResult { query, record } so every
stage's {stage, from, to, matched, added, removed} lands on
ContextReport::query_rewrites automatically when called through
the chain.
- Stripper::new(boilerplate) — compiled boilerplate-removal
  rewrite. Matches at token granularity through the analyzer (with a
  surface-form fallback for tokens like "of"/"the" that stem to
  empty), so a single-token strip cannot accidentally erase a
  substring inside a longer word (an "of" strip does not erase
  the "of" inside "office"). Replaces the substring-based
  drop_template_terms function originally drafted for 0.3.0.
- Vocabulary::new(entries) / Vocabulary::bidirectional(entries)
  — compiled workload-curated equivalence classes. Tokenizes keys,
  synonyms, and the query through the same analyzer the BM25 index
  uses, so a vocabulary key "ip" cannot fire on the "ip" inside
  "recipient". Bidirectional mode treats every class member as a
  trigger (PTO ↔ "paid time off" ↔ "vacation"). The CUAD probe
  (docs/findings/CUAD_CLAUSE_EXPANSION.md) shows +3.0 points on top
  of the template-stripped baseline (the new token-level matching
  re-validates at 90.7% vs the substring-based predecessor's 90.3%
  — same workload, +0.4 from analyzer alignment).
- Document::context_with_rewrites(query, &[&stripper, &vocab])
  — runs the chain left-to-right through retrieval. Each stage sees
  the previous stage's output; the per-stage RewriteRecords land on
  ctx.report.query_rewrites automatically.
- Future-extensible. Both Stripper and Vocabulary are
  QueryRewrite implementations; user code can ship its own (e.g. a
  workload-specific normalizer) and chain it alongside the built-ins.
  The trait is exported on the public API surface.
- Vocabulary::enrich(chunk) → RewriteResult: chunk-side
  mirror of apply, shipped as a primitive with measured evidence on
  both sides of its regime rule. The mechanism (a chunk-side
  doc2query variant) and the regime hypothesis
  (expected value ∝ shortness × opacity × dictionary-exists) are
  well-grounded. The positive prediction (short opaque coded
  units: schema columns, API symbols, error codes) has been
  measured on the schema-column case by SPIDER_ENRICH
  (+0.19 mean column recall on a Spider-shape sample; n=30,
  candidate_k=10). The negative prediction (long prose chunks
  + workload-pervasive vocabulary will dilute, not help) has also
  been measured directly: CUAD_ENRICH_DEFINITIONS_NULL
  regressed retention −2.0pt vs the 90.7% workflow baseline
  (~24-point loss on the 17/50 affected contracts). This completes
  the four-corner rule from CUAD_PRF_NULL + SUB_IDF_AUTO_DROP_NULL
  onto the chunk side: workload-pervasive signal manipulation fails
  on either side of the pipeline. Users adopting enrich should
  A/B on their own corpus with redhop::evaluate(...), because
  the regime rule is a hypothesis, not a guarantee. The audit trail
  (per-chunk RewriteRecord with stage: "enrich") is returned to
  the caller so the A/B is auditable. Synthetic demo (not a
  benchmark): crates/examples/examples/enrich_code_search.rs.
  Regime rule, use-case predictions, and failure modes are in
  docs/findings/VOCABULARY_ENRICH.md.
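The difference between substring and token-level matching, and the way a chain accumulates one audit record per stage, can be sketched in plain Python. This is not RedHop's implementation or API: `tokens`, `strip`, `expand`, and `run_chain` are hypothetical names, the stand-in analyzer only lowercases and splits (no stemming), and multi-word vocabulary keys such as "paid time off" are not handled here.

```python
import re


def tokens(text):
    # Stand-in analyzer (assumption): lowercase alphanumeric tokens, no stemming.
    return re.findall(r"[a-z0-9]+", text.lower())


def strip(query, boilerplate):
    """Drop whole tokens listed in `boilerplate`; return (text, record)."""
    drop = set(boilerplate)
    toks = tokens(query)
    kept = [t for t in toks if t not in drop]
    removed = [t for t in toks if t in drop]
    out = " ".join(kept)
    return out, {"stage": "strip", "from": query, "to": out, "removed": removed}


def expand(query, vocabulary):
    """Append synonyms for keys that match a whole token; return (text, record)."""
    toks = tokens(query)
    added = []
    for key, synonyms in vocabulary.items():
        if key in toks:  # list membership: matches whole tokens only
            added.extend(s for s in synonyms if s not in toks and s not in added)
    out = " ".join(toks + added)
    return out, {"stage": "vocabulary", "from": query, "to": out, "added": added}


def run_chain(query, stages):
    """Apply stages left to right, collecting one audit record per stage."""
    records = []
    for stage in stages:
        query, record = stage(query)
        records.append(record)
    return query, records


boilerplate = ["of", "the"]
vocabulary = {"ip": ["intellectual", "property"]}
query = "the ip recipient of the office"

final, records = run_chain(
    query,
    [lambda q: strip(q, boilerplate), lambda q: expand(q, vocabulary)],
)

# Substring removal, for contrast: it also eats the "of" inside "office".
naive = query.replace("of", "")

print(final)
print(naive)
for record in records:
    print(record["stage"], record.get("removed") or record.get("added"))
```

The token-level strip leaves "office" intact, and the "ip" key fires on the standalone token but not on the "ip" inside "recipient". Each stage sees the previous stage's output, which is why the vocabulary record's `from` field equals the strip record's `to` field.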
evaluate(query, ctx, gold) → EvalReport — in-process retrieval-eval
scorer, no LLM judge. Self-eval (mean_grounding, evidence_density,
retained_evidence_ratio, second_hop_rescues, low_confidence,
estimated_waste_tokens) is always populated; gold-relative metrics
(context_recall, context_precision, answer_token_recall) are
optionally unlocked by passing gold_chunks and/or gold_answer.
Composite overall blends whichever fields are available. Designed as
a refraction of the same primitives the runtime uses to make its
Decision Report — a low overall and report.low_confidence_retrieval
are the same signal viewed twice, not independent measurements, so eval
and runtime can never disagree. Rationale, contract details, and the
10 / 11 / 9 Rust / Python / Node test pins are in
docs/findings/EVALUATE_API.md.
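For readers new to gold-relative retrieval metrics, the sketch below shows the textbook set-based definitions of recall and precision and how they support a deterministic A/B with no LLM judge. The release notes do not state RedHop's exact formulas, so treat these as assumptions; the function bodies, chunk ids, and gold labels are made up for the example and are not the library's API.

```python
def context_recall(retrieved_ids, gold_ids):
    """Fraction of gold chunks that appear in the retrieved context."""
    gold = set(gold_ids)
    if not gold:
        return None  # undefined without gold labels
    return len(gold & set(retrieved_ids)) / len(gold)


def context_precision(retrieved_ids, gold_ids):
    """Fraction of retrieved chunks that are gold chunks."""
    retrieved = set(retrieved_ids)
    if not retrieved:
        return 0.0
    return len(retrieved & set(gold_ids)) / len(retrieved)


# Hypothetical A/B: the same query retrieved with and without a rewrite chain.
gold = ["c1", "c2"]
arm_a = ["c1", "c4", "c7"]  # baseline retrieval
arm_b = ["c1", "c2", "c7"]  # retrieval after rewrites

for name, retrieved in (("A", arm_a), ("B", arm_b)):
    recall = context_recall(retrieved, gold)
    precision = context_precision(retrieved, gold)
    print(name, recall, round(precision, 3))
```

Because both arms are scored by the same pure function over the same gold set, rerunning the comparison gives identical numbers, which is the property an in-process scorer buys over a sampled LLM judge.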

Findings (the evidence layer)

New findings document what was tried, what worked, and what was
falsified across this release:

Confirmed — QUERY_SET_ANALYZER, CUAD_RECALL_GAP,
CUAD_CLAUSE_EXPANSION, MULTILINGUAL_ANALYZER, EVALUATE_API,
CUAD_HYBRID_RERANK (substitute-not-stack rule), VOCABULARY_ENRICH
(confirmed on both sides of the regime rule), SPIDER_ENRICH
(the positive-side validation for Vocabulary.enrich(...): curated
chunk-side enrichment on a Spider-shape sample lifted mean column
recall +0.19 from 0.77 → 0.97, ≥0.8 retention 63% → 93%).
Null result / falsified — CUAD_PRF_NULL (unweighted PRF on
boilerplate-heavy corpora), CUAD_CHUNK_FRAGMENTATION_NULL (chunker
isn't the CUAD lever), SUB_IDF_AUTO_DROP_NULL (corpus-only IDF
manipulation fails in both directions),
CUAD_ENRICH_DEFINITIONS_NULL (chunk-side enrich on per-contract
Definitions regressed −2.0 pts vs the 90.7% workflow baseline;
~24-point loss on the 17/50 contracts where Definitions were
extractable — chunk-side parallel to CUAD_PRF_NULL's failure mode,
measured directly).
The four-corner rule is now measured on all four corners.
Workload-pervasive signal manipulation fails on either side of the
pipeline; only workload-curated semantics work:
query-side curated wins (CUAD_CLAUSE_EXPANSION +3.0pt) ·
query-side auto fails (CUAD_PRF_NULL −3.7pt) ·
chunk-side curated wins (SPIDER_ENRICH +0.19 mean recall) ·
chunk-side auto fails (CUAD_ENRICH_DEFINITIONS_NULL −2.0pt).

Examples

Eleven new harnesses under crates/examples/examples/:
cuad_query_preprocessing, cuad_chunk_strategy_sweep,
cuad_chunk_fragmentation, cuad_clause_expansion, cuad_hybrid_rerank,
cuad_perf, cuad_prf, cuad_rust_vs_python_path,
multilingual_query_set_probe, query_set_analyzer_probe,
sub_idf_reweighting_probe.

Documentation

New workflow-lift chart .github/workflow_lift.svg embedded in the
root README + binding READMEs — surfaces the 81 → 88 → 90.7% story
visually.
Root README, python/README.md, nodejs/README.md "Templated
workloads" section rewritten to detect → strip → (optional) vocabulary →
A/B with Stripper / Vocabulary / context_with_rewrites tabled.
docs/CHOOSING_A_CONFIG.md step 3 leads with the new "two paths up
the same hill" decision table contrasting retrieval="hybrid" (the
one-knob alternative) vs BM25 + the helpers (best-quality).

Chat-RAG and chronology preservation

ContextConfig::preserve_order: bool — new field (default false,
no behavior change for existing callers). When set, the assembled
context emits selected chunks in source-document order instead of
the strategy's relevance-emitted order. The selection step is
untouched; only the final ordering changes. Designed for chat
histories, narrative transcripts, and sequential logs where
chronology / causality matters and a relevance-ranked emission would
destroy the meaning ("after the refund came in" reads strangely if
presented before "ordered the laptop").
The sort key is (source, chunk_position) where chunk_position
prefers a chunk_index metadata field (stamped automatically by
Document::from_chunks_with based on input order, so caller-supplied
chunks via from_chunks get a stable chronology key for free) and
falls back to the chunker's existing sentence_range.start for
text-loaded paths.
Exposed across all three bindings:
- Rust — ContextConfig { preserve_order: true, .. }; flows
  through LoadOptions::preserve_order for the text() /
  chunks() paths.
- Python — redhop.Document.from_text(text, preserve_order=True)
  and from_chunks / from_file / from_bytes; also exposed on the
  low-level redhop.build_context(query, chunks, preserve_order=True)
  and redhop.filter_context(...).
- Node — Document.fromText(text, { preserveOrder: true }) and
  siblings; also a preserveOrder?: boolean field on the
  ContextOptions shape consumed by buildContext and filterContext.
Worked example:
crates/examples/examples/chat_rag.rs
shows a 12-turn chat where, on the query "shipping refund label return", the strategy picks four turns by relevance — preserve_order
off emits them in [turn-08, turn-03, turn-05, turn-06] (relevance);
preserve_order on emits them in [turn-03, turn-05, turn-06, turn-08]
(chronological), so the LLM reads what was said in the order it was
said. 3 new Rust unit tests pin the contract
(preserve_order_off_emits_relevance_order,
preserve_order_on_emits_document_order,
preserve_order_groups_by_source).
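The ordering rule described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the library's code: chunks are modelled as plain dicts, `chunk_position` and `emit` are hypothetical helper names, and the source label "chat" and the index values are invented to mirror the turn numbers in the worked example.

```python
def chunk_position(chunk):
    """Prefer the chunk_index metadata field; fall back to the sentence range start."""
    meta = chunk.get("metadata") or {}
    if "chunk_index" in meta:
        return meta["chunk_index"]
    return chunk["sentence_range"][0]


def emit(selected, preserve_order=False):
    """Selection is untouched; only the final ordering changes."""
    if not preserve_order:
        return list(selected)  # keep the strategy's relevance order
    return sorted(selected, key=lambda c: (c["source"], chunk_position(c)))


def turn(n):
    # Made-up chat turn, indexed by its position in the conversation.
    return {"id": f"turn-{n:02d}", "source": "chat", "metadata": {"chunk_index": n}}


# Relevance order, as picked by the strategy in the worked example.
selected = [turn(8), turn(3), turn(5), turn(6)]

relevance = [c["id"] for c in emit(selected)]
chronological = [c["id"] for c in emit(selected, preserve_order=True)]
print(relevance)
print(chronological)
```

Sorting on `(source, position)` rather than position alone keeps chunks from the same source together, which matches the intent of the `preserve_order_groups_by_source` test name.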

Changed

Package registry URLs now point at https://www.redhopai.com as
the canonical Homepage, with the GitHub repo kept as Repository
(PyPI) / repository (npm) / repository (crates.io). Before this,
PyPI displayed two identical "Homepage" and "Repository" links both
pointing at GitHub; npm displayed neither. PyPI also gains
Documentation, Changelog, Issues, and Evidence layer link
entries; npm gains homepage, repository, bugs, and an
expanded keywords array (reasoning, embeddings added).
Findings master table refreshed with new rows on
/docs/benchmarks/ (website) and docs/findings/README.md (repo).
Framework comparison row updated: the CUAD headline is now
90.7% via Stripper + Vocabulary (was 88% via strip alone),
beating LlamaIndex by 4 points. VOCABULARY_ENRICH row promoted from
asymmetric measured evidence to confirmed on both sides of the
regime rule after the SPIDER_ENRICH probe landed.
RewriteResult.query field renamed to RewriteResult.text
(Rust). The same struct is the output of both query-side
QueryRewrite::apply and chunk-side Vocabulary::enrich. The old
query field name read awkwardly on the enrich path
(vocab.enrich(chunk_text).query describes a chunk, not a query);
text is neutral and accurate for both directions. The audit-record
stage field is the signal of which side of the pipeline emitted
the result ("strip" / "vocabulary" / "enrich"). Pre-publish
rename — no callers exist outside the repo yet, but flagging for
anyone building from source on a pre-release commit.
User-facing docs (README.md, python/README.md, nodejs/README.md,
website) elevate the rewrite chain + audit trail + evaluate to a
dedicated "Show your work" section. The 0.3.0 differentiator
versus other RAG frameworks is that every transform is observable on
the same Decision Report and that every change is A/B-scoreable without
an LLM judge; the previous docs surfaced the 3-call surface plus
citations but understated the rewrite/audit/evaluate combo. The new
section appears on every binding's README and as both a homepage
card and a section on the website.

Fixed

Document.from_folder was constructing LoadOptions without
preserve_order under --features files,semantic. Caught
locally while writing examples/python/07_retrieval_tiers.py (a
full-feature build). The bug was hidden in the lean (no-features)
default build because the missing-field code path was behind
#[cfg(feature = "files")]. The default published wheel ships with
features = ["files", "semantic"], so end users would have hit it.
Fixed; all 4 feature configurations (--no-default-features,
--features files, --features semantic, --features files,semantic)
now compile cleanly.

Breaking — `redhop.Chunk` is now the only accepted manual-chunks shape

Document.from_chunks + build_context + filter_context +
analyze_context + context_economics now require typed
redhop.Chunk(...) instances. Bare strings and plain dicts both
raise ValueError with a migration hint:
```
chunk 0: expected redhop.Chunk(text, source=..., ...); got str. As of
0.3.0, strings and dicts are no longer accepted — wrap your input as
`redhop.Chunk(text, source='myfile.txt')`.
```

What the new constructor looks like:

redhop.Chunk(
    text,
    source=None,       # provenance: file path / URL / logical handle
    id=None,            # identity: stable id, defaults to c0, c1, …
    metadata=None,      # open dict; citations read page/heading/line
    token_count=None,   # auto from whitespace if omitted
    embedding=None,     # for pre-computed dense vectors
)

Node mirrors with new redhop.Chunk(text, { source, id, metadata, tokenCount, embedding }).

Why this is a breaking change rather than a backward-compatible addition:
the dict path didn't accept metadata at all, so manually-supplied
chunks couldn't carry page/heading/line into citations. The two-ways-
to-do-it cleanup is incidental; closing the metadata gap is the real
reason. Strict typing also surfaces source (provenance) and id
(identity) as distinct concepts the way the Rust core has always
treated them — the dict path conflated them in practice.

Migration:

| Before | After |
| --- | --- |
| `from_chunks(["a", "b"])` | `from_chunks([redhop.Chunk("a"), redhop.Chunk("b")])` |
| `from_chunks([{"text": "a", "source": "x.md"}])` | `from_chunks([redhop.Chunk("a", source="x.md")])` |
| `from_chunks([{"text": "a", "id": "x", "source": "y.md"}])` | `from_chunks([redhop.Chunk("a", id="x", source="y.md")])` |
| `buildContext(q, [{ id, text }, ...])` (Node) | `buildContext(q, [new Chunk(text, { id }), ...])` |
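For codebases with many call sites, a small adapter can apply the migration in one place. The sketch below is a suggestion, not something RedHop ships: `to_chunk` is a hypothetical helper, and the `Chunk` dataclass is a local stand-in that mirrors a subset of the constructor fields listed above so the example runs without the library installed. In real code you would pass `chunk_cls=redhop.Chunk` instead.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Chunk:
    """Local stand-in for redhop.Chunk (assumption: same field names)."""
    text: str
    source: Optional[str] = None
    id: Optional[str] = None
    metadata: Optional[dict] = None


def to_chunk(item: Any, chunk_cls=Chunk):
    """Convert a legacy string or dict into a typed chunk; pass typed chunks through."""
    if isinstance(item, chunk_cls):
        return item
    if isinstance(item, str):
        return chunk_cls(item)
    if isinstance(item, dict):
        # The legacy dict shapes in the migration table carry text, source, and id.
        fields = {k: item[k] for k in ("source", "id") if k in item}
        return chunk_cls(item["text"], **fields)
    raise TypeError(f"cannot convert {type(item).__name__} to a chunk")


legacy = [
    "a",
    {"text": "b", "source": "x.md"},
    {"text": "c", "id": "x", "source": "y.md"},
]
migrated = [to_chunk(item) for item in legacy]
print(migrated)
```

An adapter like this is best treated as a transitional shim: constructing typed chunks directly at the call site is what lets you attach `metadata` such as page, heading, and line, which the legacy shapes never carried.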

What's new on the typed-chunks path: citations now pick up page,
heading, and line from metadata={...} on chunks the user built
themselves. Before 0.3.0 those fields were always None on the
manual-chunks path — only the file loaders populated them.
Rust callers unaffected. The redhop::core::Chunk struct hasn't
changed shape. Document::from_chunks(Vec<Chunk>) still takes
Vec<redhop::core::Chunk> exactly as it did. A new public facade
redhop::chunks_typed(Vec<Chunk>, &LoadOptions) was added so the
bindings can route pre-formed chunks through the indexing pipeline
without going through the chunker (preserving 1-to-1 chunk identity).

Breaking (Node only — Python and Rust callers unaffected)

Node BuiltContext is now a #[napi] class (was a plain
#[napi(object)]). The four exposed properties (text, chunks,
citations, report) remain readable as JS properties via getters, so
existing user code that does ctx.text, ctx.chunks, etc., continues
to work unchanged. The TypeScript type changes from
interface BuiltContext { … } to class BuiltContext { … }. The
reason for the change is that redhop.evaluate(query, ctx, …) needs
access to the underlying Rust struct (chunk IDs, the full report shape)
which a plain object can't carry.
- What breaks: if you call JSON.stringify(ctx), the output will be {}
  instead of the four-field object, because class getters aren't
  enumerable by default. Project to a plain object explicitly:
  JSON.stringify({ text: ctx.text, chunks: ctx.chunks, citations: ctx.citations, report: ctx.report }).
  No other behavior changes.
