PDF page rendering API (#583): New render_pdf_page function and PdfPageIterator for rendering individual PDF pages as PNG images. Available across all 11 language bindings with idiomatic patterns (Python context manager, Go Close(), Java AutoCloseable, C# IDisposable, Elixir Stream, etc.). Default 150 DPI, configurable per call.
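To give a sense of what the DPI setting means for output size, here is a small arithmetic sketch. It does not call the rendering API itself: `rendered_size_px` is a hypothetical helper, and it assumes only the standard PDF unit of 1 point = 1/72 inch. The actual renderer may round differently.

```python
def rendered_size_px(width_pt: float, height_pt: float, dpi: int = 150) -> tuple[int, int]:
    """Approximate pixel size of a page rendered at `dpi`.

    Hypothetical helper for illustration only; page size is given in
    PDF points (1 pt = 1/72 inch).
    """
    return round(width_pt * dpi / 72), round(height_pt * dpi / 72)


# US Letter is 612 x 792 pt.
print(rendered_size_px(612, 792))       # (1275, 1650) at the default 150 DPI
print(rendered_size_px(612, 792, 300))  # (2550, 3300) at 300 DPI
```

Doubling the DPI doubles each dimension, so the PNG holds roughly four times as many pixels.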
Fixed
Table recognition coordinate mismatch on scanned PDFs (#582): Layout detection bboxes (640x640 model space) are now scaled to the OCR render resolution before TATR table recognition. Previously, the coordinate-space mismatch caused no tables to be found.
OCR elements report page_number: 1 for all pages (#582): Each page is rendered and passed to Tesseract on its own, so Tesseract restarts page numbering at 1 every time. Page numbers are now stamped after OCR in the batch loop.
Rust E2E tests missing PDF feature: Added pdf feature to the e2e-generator Rust template, fixing 41 UnsupportedFormat("application/pdf") failures.
HWP styled extraction empty on ARM: Added skip_on_platform support to Python and Java e2e generators, skipping the hwp_styled fixture on aarch64-unknown-linux-gnu.
WASM CI build failure: Made the kreuzberg-node prepare script resilient to a missing native addon, preventing the ENOENT: dist/cli.js error during pnpm workspace install.
Go C header stale at 4.5.0: Synced header and DefaultVersion constant to match current version.
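The two #582 fixes above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: `scale_bbox`, `OcrElement`, and `stamp_page_numbers` are hypothetical names, and the scaling assumes the page image was stretched directly to the 640x640 model input with no letterbox padding.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)

MODEL_SIZE = (640, 640)  # layout model input size, per the changelog entry


def scale_bbox(bbox: BBox, dst_size: Tuple[int, int],
               src_size: Tuple[int, int] = MODEL_SIZE) -> BBox:
    """Map a bbox from model space to the OCR render resolution.

    Assumes a plain stretch (independent x/y scale factors, no padding).
    """
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    x0, y0, x1, y1 = bbox
    return (x0 * sx, y0 * sy, x1 * sx, y1 * sy)


@dataclass(frozen=True)
class OcrElement:
    """Hypothetical stand-in for an OCR result element."""
    text: str
    page_number: int


def stamp_page_numbers(pages: List[List[OcrElement]]) -> List[OcrElement]:
    """Overwrite per-page OCR page numbers with the real 1-based page index."""
    stamped = []
    for page_index, elements in enumerate(pages, start=1):
        for element in elements:
            stamped.append(replace(element, page_number=page_index))
    return stamped


# A bbox in 640x640 model space mapped onto a 1275x1650 render.
print(scale_bbox((64, 128, 320, 480), (1275, 1650)))
# (127.5, 330.0, 637.5, 1237.5)

# Every single-page OCR run reports page 1; stamping restores the real numbers.
pages = [[OcrElement("a", 1)], [OcrElement("b", 1), OcrElement("c", 1)]]
print([e.page_number for e in stamp_page_numbers(pages)])  # [1, 2, 2]
```

If the layout model letterboxes its input instead of stretching it, the padding offset would have to be subtracted before scaling.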