v0.2.1
The robustness + bugfix patch release. Two real bugs fixed (one BM25
edge case, one cross-binding serde-compat break), one new Python helper,
and ~30 new tests pinning load-bearing contracts across the codebase.
Fixed
- BM25: silent wildcard fallback on no-signal queries. Queries whose
every term was filtered out (stopwords only, or all-out-of-vocab)
silently fell back to a match-all wildcard, returning the corpus's
top-BM25 chunks as if the query had matched something. Now returns
an empty result set with a clear signal.
- `ContextReport.removed` and `.economics` missing `#[serde(default)]`.
A binding payload from an older RedHop binary missing these fields
would error on deserialize, a silent cross-version compatibility break
for Python/Node callers shuttling `ContextReport` across the FFI as
JSON. Both target types already derive `Default`; the fix is a no-op
for fresh payloads and gracefully fills in zeros for old ones.
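
The BM25 contract above can be illustrated with a minimal sketch. This is not RedHop's implementation: `STOPWORDS`, `filter_terms`, and `search` are hypothetical names, and the score is a placeholder term count rather than real BM25. It only shows the intended behavior, where a query with no surviving terms yields an empty result instead of a match-all.

```python
STOPWORDS = {"the", "a", "of", "and", "is"}  # hypothetical stopword list


def filter_terms(query, vocab):
    """Keep query terms that are not stopwords and exist in the index vocabulary."""
    return [t for t in query.lower().split() if t not in STOPWORDS and t in vocab]


def search(query, chunks, top_k=3):
    """Rank chunks by a placeholder term-count score (a stand-in for BM25)."""
    vocab = {w for chunk in chunks for w in chunk.lower().split()}
    terms = filter_terms(query, vocab)
    if not terms:
        # No-signal query: return nothing instead of falling back to match-all.
        return []
    scored = []
    for chunk in chunks:
        words = chunk.lower().split()
        score = sum(words.count(t) for t in terms)
        if score > 0:
            scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]


chunks = ["rust bindings for python", "the cache is persisted"]
print(search("the of and", chunks))       # stopwords only -> []
print(search("kubernetes", chunks))       # out-of-vocabulary -> []
print(search("python bindings", chunks))  # -> ['rust bindings for python']
```

Callers that previously relied on always getting some chunks back would need to handle the empty list explicitly.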
Added
- `redhop.context_with_timeout` (Python). Thin `ThreadPoolExecutor`
watchdog around `Document.context()` for agent integrations that need
to bail on slow queries: wrap
`ctx = redhop.context_with_timeout(doc, q, timeout_ms=5000)` in
`try` / `except TimeoutError`. Forwards
`budget` / `neighbors` / `include_heading`. Scope is
deliberately Python-only: true Rust-side cancellation needs hooks in
Tantivy/ONNX that don't exist yet, and the docstring + `TimeoutError`
message document the limitation.
- `docs/DEFAULT_PROVENANCE.md`: every tuned default in
`ContextConfig` / `DocumentConfig` linked back to the finding that
justifies it (so callers can audit which numbers are calibrated vs
arbitrary).
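
To make the Python-only limitation concrete, here is a rough sketch of how a `ThreadPoolExecutor` watchdog generally behaves. It is not the code shipped in RedHop: `call_with_timeout` and `slow_query` are hypothetical names, and the real helper's internals may differ. The point is that the caller stops waiting, while the worker thread itself keeps running.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_timeout(fn, *args, timeout_ms=5000, **kwargs):
    """Run fn in a worker thread and stop waiting after timeout_ms (hypothetical helper)."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout_ms / 1000.0)
    except FutureTimeout:
        # The caller is released, but the worker thread is NOT cancelled:
        # it keeps running until fn returns on its own.
        raise TimeoutError(f"call exceeded {timeout_ms} ms") from None
    finally:
        # Do not block on a worker that may still be running.
        executor.shutdown(wait=False)


def slow_query():
    time.sleep(0.3)  # stand-in for a slow Document.context() call
    return "context"


try:
    call_with_timeout(slow_query, timeout_ms=50)
except TimeoutError as exc:
    print("timed out:", exc)
```

Because the underlying work is only abandoned, not stopped, a burst of timed-out calls can still consume CPU and memory until each one finishes.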
Internal — robustness tests
Eight new test passes (~30 tests) pinning load-bearing contracts that
were previously informal:
- Determinism: same input → same output, Rust + cross-binding parity.
- Internal invariants: 7+ consistency invariants across the strategy
matrix (selected ⊆ input, `removed.total` matches drop count, etc.).
- Concurrency: `Send + Sync` audit + 1024-call parallel stress.
- Adversarial loaders: 9 tests covering corrupt PDFs, symlink loops,
deep recursion, malformed DOCX/PPTX/XLSX.
- Auto-gate boundary: pins the inclusive `<=` semantics at
1499/1500/1501 input tokens + the custom-gate path.
- Serde round-trip: every cross-FFI type (`Chunk`, `Score`,
`ContextReport`, ...) survives JSON round-trip; forward-compat
exercised via a minimal pre-0.1.3 payload.
- Strategy semantics: 7 differential tests pinning the contrasts
between all 5 `ContextStrategy` variants on a shared corpus
(catches accidental strategy convergence).
- Persisted cache: incremental cache hit/miss contract for
`read_folder_with(persist=true)`: per-file `(mtime, size)` skip,
no-op reload doesn't rewrite, fingerprint invalidation on config
change, deleted-file cleanup.
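
As an illustration of the per-file `(mtime, size)` skip and deleted-file cleanup named in the persisted-cache item, the sketch below shows one way such a check could work. It is an assumption-laden example, not RedHop's cache code: `file_signature`, `plan_reload`, and the dict-based cache layout are hypothetical, and config-fingerprint invalidation is not modeled.

```python
import os
import tempfile


def file_signature(path):
    """Return the (mtime, size) pair used to decide whether a file changed."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def plan_reload(paths, cache):
    """Split paths into (to_parse, skipped) and drop cache entries for deleted files.

    `cache` maps path -> (mtime, size) from the previous load (hypothetical layout).
    """
    to_parse, skipped = [], []
    for path in paths:
        sig = file_signature(path)
        if cache.get(path) == sig:
            skipped.append(path)      # unchanged since last load: cache hit
        else:
            to_parse.append(path)     # new or modified: cache miss
            cache[path] = sig
    for stale in set(cache) - set(paths):
        del cache[stale]              # deleted-file cleanup
    return to_parse, skipped


with tempfile.TemporaryDirectory() as folder:
    doc = os.path.join(folder, "a.txt")
    with open(doc, "w") as fh:
        fh.write("hello")
    cache = {}
    print(plan_reload([doc], cache))  # first load: file is parsed
    print(plan_reload([doc], cache))  # unchanged: file is skipped
    with open(doc, "w") as fh:
        fh.write("hello, world")
    print(plan_reload([doc], cache))  # size changed: file is parsed again
```

A signature based on `(mtime, size)` is cheap but can miss an edit that preserves both values, which is a known trade-off of this kind of check.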
No public API changes. Python and Node callers are unaffected aside
from the new context_with_timeout helper.