Skip to content

v0.3.0-rc.84

Choose a tag to compare

@Goldziher Goldziher released this 20 Jun 19:24
· 31 commits to main since this release

Added

  • CLI: batch-scrape, batch-crawl, download, citations, and version subcommands. Wire the existing core entry points into the CLI so the CLI, MCP server, and core surfaces are 1:1. download mirrors the MCP download tool (scrape with document download enabled, emitting the downloaded document's metadata); citations converts markdown links to numbered citations; version prints the crate version as JSON. (crates/kreuzcrawl-cli)
  • MCP: batch_crawl and generate_citations tools. batch_crawl crawls multiple seed URLs concurrently (mirroring batch_scrape); generate_citations converts markdown links into numbered citations. (crates/kreuzcrawl/src/mcp)

Changed

  • chore(precommit): drop the conflicting kotlin-android ktlint hook; ktfmt is the sole formatter. ktlint's always-format mode fought ktfmt (blank-line-after-brace) and rewrote alef's /// doc comments to // /, breaking alef verify. detekt remains for static analysis. Also excluded the vendored Gradle wrapper from shellcheck. (.pre-commit-config.yaml)

Removed

  • MCP: dropped the unimplemented screenshot, research, and crawl_status tools. They had no backing core capability and only ever returned "not yet implemented", advertising tools that always failed. Every remaining MCP tool is now 1:1 with a CLI subcommand and a real core function. (crates/kreuzcrawl/src/mcp)

Fixed

  • Release: CLI binaries are now reliably attached to a published release. The GitHub release is created as a draft and was only un-drafted by release-finalize, which is skipped whenever any unrelated publish leg fails — stranding the whole release (CLI binaries included) as an invisible draft. upload-cli-release now publishes the release directly via publish-github-release@v1 with draft: false (un-drafting the existing draft) and runs even on partial CLI-build failures, mirroring the sibling repos. (.github/workflows/publish.yaml)
  • Memory-bounded streaming crawl. crawl_stream / batch_crawl_stream now move each page into its CrawlEvent::Page and drop it instead of accumulating every page, bounding peak memory on large crawls (≈2.5 GB → ≈20 MB working set). crawl()'s batch result is unchanged; Complete.pages_crawled reports the exact post-filter count, and a terminal Complete is emitted on the seed-error path. (crates/kreuzcrawl/src/engine)
  • Dart package ships native libraries. The pub.dev package previously bundled no native libs, so the flutter_rust_bridge loader fell back to a relative framework path that macOS hardened-runtime rejects. The publish pipeline now builds the native on a 5-platform matrix and stages each into lib/src/native/<rid>/ before publishing. (.github/workflows/publish-pubdev.yaml)
  • Swift package links Apple system frameworks. The published root Package.swift now links Security, CoreFoundation, and SystemConfiguration on the RustBridge target so the pre-built static library's SC* symbols (reqwest proxy detection) resolve for remote SwiftPM consumers. (regenerated via alef 0.25.55)

Build

  • Regenerated all bindings against alef 0.25.55 — generic Go FFI provisioning (test-app runner delegates to the binding's own download_ffi; no project-specific download in the generic generator), swift framework linking, zig stale-hash strip, and test-app [crates.e2e.env] export.

Zig

Add to your build.zig.zon:

.dependencies = .{
    .kreuzcrawl-zig = .{\n        .url = \"https://github.com/kreuzberg-dev/kreuzcrawl/releases/download/v0.3.0-rc.84/kreuzcrawl-zig-v0.3.0-rc.84.tar.gz\",\n        .hash = \"kreuzcrawl-0.3.0-rc.84-l-oqNnXKeSCF_EYtxYkhF8AiaMo2OHBXJdTmZGqYDwDl\",\n    },\n},\n```\n