Skip to content

chunkshop 0.4.2 — main is the v4 truth (main's 5 Rust PRs re-applied)

Choose a tag to compare

@TheYonk TheYonk released this 16 May 00:03
· 387 commits to main since this release

First release cut from main. With PR #6, the v0.4.x modular-backend line was merged back onto main — the default branch is now the source of truth (no more "released code lives on experimental/"). This release brings main's five parallel Rust feature PRs onto the modular-backend codebase.

Python users: 0.4.2 is functionally identical to 0.4.1 — no Python API or schema changes. The deltas below are Rust features + repo/CI health. If you pin the Python package, 0.4.1 and 0.4.2 behave the same.

What's new since 0.4.1

Rust library features (re-applied from main PRs #1#5)

  • chunker-only Cargo feature gate — depend on chunkshop-rs for just the chunker structs without the embedder/source/sink/ML stack. default = ["full"] keeps full backward compatibility. The modular sink/backend layer sits under the sink feature; embedder splits into embedder-core (BYO) and embedder-hub (hf-hub-backed).
  • HierarchyChunker custom heading regex — new heading_pattern config field overrides the default markdown heading detector.
  • embedder-hub feature splithf-hub is opt-in separately from the core embedder.
  • Custom BoundaryEmbedder injection into SemanticChunker — supply a boundary embedder without the fastembed/hf-hub path.
  • fastembed pinned >= 5.13.1 — aligns ort on =2.0.0-rc.12, preventing dep-resolution regressions under restrictive lockfiles.

Repo / CI health

  • CI now exercises all four backends. The workflow predated the modular backends (Postgres-only, no backend extras, never run against v4). It now brings up Postgres + MariaDB + ClickHouse from the canonical compose file and installs all-backends, so the full cross-backend matrix runs in CI.
  • Scenario fixtures migrated to the v4 target schema — the 18 tests/sub + tests/use-cases configs were on the legacy 0.3.x schema: shape; migrated to type: + database:.

Install

# Source (modular backends; sample corpora + dev tooling):
git clone https://github.com/yonk-labs/chunkshop && cd chunkshop/python
uv sync --extra dev --extra all-backends

# Once this release's PyPI/crates publish is approved:
pip install 'chunkshop[all-backends]==0.4.2'
cargo install chunkshop-rs --version 0.4.2

Compare