chunkshop 0.8.3
The local files source learns incremental ingest: point it at a directory
and re-runs reprocess only new and changed files, pruning the chunks of deleted
ones, instead of re-embedding the whole corpus every run. Works for prose and
local source code alike (same source, content-agnostic cursor). Python-only this
release; the Rust crate is a lockstep version bump with no functional change.
Added
- Incremental `files` source. `FilesSource` now implements the
  `IncrementalSource` and `PrunableSource` protocols (joining `s3` / `http` /
  `pg_table`). An opt-in `source.incremental` block lets `chunkshop ingest`
  itself skip-and-prune via a JSON cursor sidecar, with no external consumer loop:

      source:
        type: files
        glob: ./corpus/**/*.md
        id_from: path   # path or sha1 (not stem) with incremental
        incremental:
          cursor_path: ./.chunkshop/files-cursor.json
          detect: hash   # sha256 of bytes (survives git checkout); or `mtime`

  - Change detection. `detect: hash` (default) compares a sha256 of each
    file's bytes, which is reliable across `git clone` / `checkout`.
    `detect: mtime` skips unchanged files by `(mtime, size)` without reading
    them (faster, but unreliable on git work-trees where checkout rewrites
    mtimes).
  - Deletions. Files removed from disk have their chunks pruned, scoped to
    the cell's `source_tag` (`PrunableSource.iter_deleted_since`).
  - Crash-safe. The cursor is written atomically (temp file + rename) and
    only after a fully successful run; a crash leaves the prior cursor intact
    and the next run re-upserts idempotently. A `doc_limit`-truncated run does
    not advance the cursor.
  - Stdlib only: no new runtime dependency. Library API + worked consumer
    loop in `docs/cookbook/incremental-sources.md`; CLI setup, a full pattern
    write-up, and a no-database quickstart in `docs/incremental.md` (Pattern G)
    and `docs/samples/incremental-files/`.
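
The two detection modes and the deletion check can be illustrated with a small stdlib-only sketch. This is not chunkshop's implementation: `fingerprint`, `diff_against_cursor`, and the cursor layout (a plain mapping of relative path to token) are hypothetical names chosen for illustration.

```python
import hashlib
from pathlib import Path


def fingerprint(path: Path, detect: str = "hash") -> str:
    """Return a change-detection token for one file (hypothetical helper)."""
    if detect == "hash":
        # Content-based: stable across git clone/checkout, but reads every byte.
        return hashlib.sha256(path.read_bytes()).hexdigest()
    if detect == "mtime":
        # Metadata-based: no file read, but a checkout can rewrite mtimes.
        st = path.stat()
        return f"{st.st_mtime_ns}:{st.st_size}"
    raise ValueError(f"unknown detect mode: {detect!r}")


def diff_against_cursor(root: Path, pattern: str, cursor: dict, detect: str = "hash"):
    """Compare files on disk with the previous cursor (assumed layout: path -> token).

    Returns (changed, deleted, new_cursor):
      changed    - new or modified files, to be re-ingested
      deleted    - paths in the old cursor that no longer exist, to be pruned
      new_cursor - the mapping to persist once the run succeeds
    """
    current = {
        p.relative_to(root).as_posix(): fingerprint(p, detect)
        for p in sorted(root.glob(pattern))
        if p.is_file()
    }
    changed = [key for key, token in current.items() if cursor.get(key) != token]
    deleted = sorted(set(cursor) - set(current))
    return changed, deleted, current
```

On a first run with an empty cursor every file counts as changed; on later runs only files whose token differs are returned, and paths missing from disk show up in `deleted`.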
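
The crash-safety behaviour described above (temp file + rename, cursor advanced only after a complete run) follows a common stdlib pattern. The sketch below shows that pattern under stated assumptions; `write_cursor_atomically` and `finish_run` are hypothetical helpers, not chunkshop's API, and the cursor is assumed to be a JSON-serialisable dict.

```python
import json
import os
import tempfile
from pathlib import Path


def write_cursor_atomically(cursor_path: Path, cursor: dict) -> None:
    """Write the cursor so readers see either the old file or the new one."""
    cursor_path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=str(cursor_path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(cursor, fh, sort_keys=True)
            fh.flush()
            os.fsync(fh.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, cursor_path)  # atomic swap of old cursor for new
    except BaseException:
        # On any failure, remove the temp file and leave the prior cursor untouched.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


def finish_run(cursor_path: Path, new_cursor: dict, *, succeeded: bool, truncated: bool) -> bool:
    """Advance the cursor only after a complete, successful run."""
    if not succeeded or truncated:
        return False  # the next run re-processes the same files
    write_cursor_atomically(cursor_path, new_cursor)
    return True
```

Because a failed or truncated run never touches the cursor file, repeating the run simply re-upserts the same documents, which is safe as long as upserts are idempotent.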
Notes
- The incremental feature is Python-only this release; Rust parity is a
  separate follow-up. `chunkshop-rs` is version-bumped to 0.8.3 for a lockstep
  release only.
- Remote sources (`s3` / `http` / `pg_table`) already had incremental sync, and
  the `github` connector already declares cursor sync; both are unchanged here.