v0.3.59 | Community contributions — Type 4 PostScript calculator functions, optional-content (OCG/OCMD) render + extraction filtering, document-order ToUnicode parsing, per-variant standard-font width tables, subset-font cache isolation, and inline-image NUL-whitespace handling
Added
- Type 4 (PostScript calculator) function evaluator: a complete, standalone evaluator for PDF Type 4 functions (ISO 32000-1:2008 §7.10.5). All Table 42 operators are implemented with spec-faithful semantics, including trigonometric operators (sin/cos/atan) that operate in degrees with atan mapped to [0, 360), round/truncate/floor/ceiling tie behaviour, strict int-vs-real typing, and i64-overflow handling for idiv/mod/mul/add/sub/bitshift. A dedicated Error::Type4Runtime distinguishes stack underflow, typecheck, sqrt-of-negative, and divide-by-zero from invalid input. Note: this lands as a tested capability that is not yet wired into the Separation/DeviceN tint-transform colour path; that integration is a tracked follow-up, so rendering behaviour is unchanged for now. (#603) Thanks @RayVR.
- Optional-content (OCG / OCMD) filtering for rendering and extraction: render_page and the text extractors now resolve optional-content visibility through a shared optional_content resolver, honouring OCMD /P policy (AnyOn/AllOn/AnyOff/AllOff) and /VE visibility expressions (§8.11.2.2), the default configuration /OCProperties/D with /BaseState/ON/OFF (§8.11.4), and the hidden-content text-advance rule (§8.11.3). Marked-content (BDC /OC … EMC) on both the extraction and rendering paths is filtered consistently, fixing a prior duplication where the renderer mis-decoded UTF-16LE/PDFDocEncoding layer names. Output for PDFs without optional content is byte-for-byte unchanged. By design, render_page honours the PDF's own default configuration while extract_text filters only caller-supplied layers (§8.11.3 NOTE 4). (#604) Thanks @RayVR.
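The operator semantics listed for the Type 4 evaluator above can be illustrated with a minimal sketch. The names `CalcError`, `ps_atan`, and `ps_idiv` are hypothetical and are not the crate's API. Reporting overflow as an error is also an assumption of this sketch, since the notes do not say how the evaluator handles it.

```rust
/// Errors this sketch can report (hypothetical; not the crate's Error::Type4Runtime).
#[derive(Debug, PartialEq)]
enum CalcError {
    DivideByZero,
    Overflow,
    Undefined,
}

/// atan-style operator: angle of num/den in degrees, normalised to [0, 360).
fn ps_atan(num: f64, den: f64) -> Result<f64, CalcError> {
    if num == 0.0 && den == 0.0 {
        // The angle is undefined when both operands are zero.
        return Err(CalcError::Undefined);
    }
    let mut deg = num.atan2(den).to_degrees(); // in [-180, 180]
    if deg < 0.0 {
        deg += 360.0;
    }
    // Guard against a tiny negative angle rounding up to exactly 360.0.
    Ok(if deg >= 360.0 { 0.0 } else { deg })
}

/// idiv-style operator on 64-bit integers (assumption: overflow is an error here).
fn ps_idiv(a: i64, b: i64) -> Result<i64, CalcError> {
    if b == 0 {
        return Err(CalcError::DivideByZero);
    }
    // With b != 0, checked_div returns None only for i64::MIN / -1.
    a.checked_div(b).ok_or(CalcError::Overflow)
}
```

In this sketch, `ps_atan(-1.0, 0.0)` yields an angle of about 270 degrees rather than -90, and `ps_idiv(i64::MIN, -1)` returns an error instead of panicking.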
Fixed
- ToUnicode CMaps process bfchar and bfrange sections in document order (#619): a ToUnicode CMap is a single combined mapping space where a later definition overrides an earlier one for the same code (ISO 32000-1:2008 §9.10.3); sections are now applied in the order they appear so the last definition wins, matching Adobe/pdf.js/MuPDF/Poppler. Thanks @haberman.
- Null byte (0x00) is treated as PDF white-space when locating the inline-image EI operator (#618): NUL is one of the six PDF white-space characters (ISO 32000-1:2008 §7.2, Table 1) but was previously omitted, so an EI delimited by a null byte in inline-image (BI/ID/EI) binary data could be missed. Thanks @haberman.
- Bold/italic variants of the standard Times and Helvetica fonts use correct per-variant width tables (#615): the Bold and Italic variants of Times and the Bold variant of Helvetica were falling back to the Regular-weight width table, drifting character positions and word-break detection in documents using these common standard-14 fonts without a /Widths array (ISO 32000-1:2008 §9.6.2.2). Per-variant widths are now sourced from the Adobe Core 14 AFM metrics. Thanks @haberman.
- Subset fonts with colliding BaseFont names no longer poison the cross-document font cache (#595): two PDFs that reuse the same subset BaseFont name (a six-uppercase-letter + tag such as AAAAAA+TestFont, ISO 32000-1:2008 §9.6.4) but embed different document-specific ToUnicode CMaps could be served the first document's cached FontInfo, decoding the second document's text to the wrong characters. Subset fonts, whose glyph subset and ToUnicode are inherently document-specific, are now excluded from the cross-document global font cache (the per-document caches are unaffected), so each document decodes with its own mapping. Thanks @RayVR for the root-cause analysis and the fix.
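The document-order rule from the ToUnicode fix above can be sketched as follows. This is a simplified illustration, not the crate's parser: the `Section` enum and `build_map` function are hypothetical, and bfrange entries are reduced to the form that maps a code range onto consecutive Unicode scalars.

```rust
use std::collections::HashMap;

/// One ToUnicode section, simplified for illustration (hypothetical type).
enum Section {
    /// bfchar entries: (source code, Unicode text)
    BfChar(Vec<(u32, String)>),
    /// bfrange entries: (first code, last code, Unicode scalar for the first code)
    BfRange(Vec<(u32, u32, u32)>),
}

/// Apply sections in the order they appear; a later entry for the same
/// code overwrites an earlier one, so the last definition wins.
fn build_map(sections: &[Section]) -> HashMap<u32, String> {
    let mut map = HashMap::new();
    for section in sections {
        match section {
            Section::BfChar(entries) => {
                for (code, text) in entries {
                    map.insert(*code, text.clone());
                }
            }
            Section::BfRange(ranges) => {
                for &(first, last, base) in ranges {
                    for code in first..=last {
                        // Skip offsets that overflow or are not valid scalars.
                        let scalar = base.checked_add(code - first);
                        if let Some(ch) = scalar.and_then(char::from_u32) {
                            map.insert(code, ch.to_string());
                        }
                    }
                }
            }
        }
    }
    map
}
```

Because the map is filled in a single pass over the sections, swapping the order of a bfrange and an overlapping bfchar changes which mapping survives, which is the behaviour the fix describes.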
Dependencies
- CI actions: taiki-e/install-action 2.79.12 → 2.81.0 (#580).
Installation
Rust (crates.io)
cargo add pdf_oxide
Python (PyPI)
pip install pdf_oxide
JavaScript/WASM (npm)
npm install pdf-oxide-wasm
CLI (Homebrew)
brew install yfedoseev/tap/pdf-oxide
CLI (Scoop, Windows)
scoop bucket add pdf-oxide https://github.com/yfedoseev/scoop-pdf-oxide
scoop install pdf-oxide
CLI (Shell installer)
curl -fsSL https://raw.githubusercontent.com/yfedoseev/pdf_oxide/main/install.sh | sh
CLI (cargo-binstall)
cargo binstall pdf_oxide_cli
MCP Server (for AI assistants)
cargo install pdf_oxide_mcp
Pre-built Binaries
Download archives for Linux, macOS, and Windows from the assets below. Each archive includes both pdf-oxide (CLI) and pdf-oxide-mcp (MCP server).
Platform Support
| Platform | Architecture | Archive |
|---|---|---|
| Linux | x86_64 (glibc) | pdf_oxide-linux-x86_64-*.tar.gz |
| Linux | x86_64 (musl) | pdf_oxide-linux-x86_64-musl-*.tar.gz |
| Linux | ARM64 | pdf_oxide-linux-aarch64-*.tar.gz |
| macOS | x86_64 (Intel) | pdf_oxide-macos-x86_64-*.tar.gz |
| macOS | ARM64 (Apple Silicon) | pdf_oxide-macos-aarch64-*.tar.gz |
| Windows | x86_64 | pdf_oxide-windows-x86_64-*.zip |
Changelog
See CHANGELOG.md for full details.