v0.3.59 | Community contributions — Type 4 PostScript calculator functions, optional-content (OCG/OCMD) render + extraction filtering, document-order ToUnicode parsing, per-variant standard-font width tables, subset-font cache isolation, and inline-image NUL-whitespace handling

github-actions released this 01 Jun 02:48 (commit e8b5709)

Added

  • Type 4 (PostScript calculator) function evaluator — a complete, standalone evaluator for PDF Type 4 functions (ISO 32000-1:2008 §7.10.5). All Table 42 operators are implemented with spec-faithful semantics: the trigonometric operators (sin/cos/atan) operate in degrees with atan mapped to [0, 360), round/truncate/floor/ceiling tie behaviour, strict int-vs-real typing, and i64-overflow handling for idiv/mod/mul/add/sub/bitshift. A dedicated Error::Type4Runtime distinguishes stack underflow, typecheck, sqrt-of-negative, and divide-by-zero from invalid input. Note: this lands as a tested capability not yet wired into the Separation/DeviceN tint-transform colour path — that integration is a tracked follow-up, so rendering behaviour is unchanged for now. (#603) Thanks @RayVR.
  • Optional-content (OCG / OCMD) filtering for rendering and extraction: render_page and the text extractors now resolve optional-content visibility through a shared optional_content resolver, honouring OCMD /P policy (AnyOn/AllOn/AnyOff/AllOff) and /VE visibility expressions (§8.11.2.2), the default configuration /OCProperties/D with /BaseState/ON/OFF (§8.11.4), and the hidden-content text-advance rule (§8.11.3). Marked-content (BDC /OC … EMC) on both the extraction and rendering paths is filtered consistently, fixing a prior duplication where the renderer mis-decoded UTF-16LE/PDFDocEncoding layer names. PDFs without optional content are byte-for-byte unchanged. By design, render_page honours the PDF's own default configuration while extract_text filters only caller-supplied layers (§8.11.3 NOTE 4). (#604) Thanks @RayVR.
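
As a rough illustration of the OCMD /P policies named above, the following Rust sketch evaluates a policy against the ON/OFF states of the groups an OCMD references. The names OcmdPolicy and ocmd_visible are hypothetical and are not taken from pdf_oxide's API. Treating an empty group list as "visible" is this sketch's reading of §8.11.2.2, and /VE visibility expressions are not covered.

```rust
/// Visibility policy named by an OCMD's /P entry (ISO 32000-1:2008 §8.11.2.2).
/// Hypothetical type for illustration only, not pdf_oxide's own.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OcmdPolicy {
    AllOn,
    AnyOn,
    AnyOff,
    AllOff,
}

/// Decides whether content guarded by an OCMD is visible.
/// `group_states` holds the state of each referenced OCG: true = ON, false = OFF.
pub fn ocmd_visible(policy: OcmdPolicy, group_states: &[bool]) -> bool {
    // Assumption: an OCMD with no usable groups has no effect on visibility.
    if group_states.is_empty() {
        return true;
    }
    match policy {
        // Visible only if every group is ON.
        OcmdPolicy::AllOn => group_states.iter().all(|&on| on),
        // Visible if at least one group is ON.
        OcmdPolicy::AnyOn => group_states.iter().any(|&on| on),
        // Visible if at least one group is OFF.
        OcmdPolicy::AnyOff => group_states.iter().any(|&on| !on),
        // Visible only if every group is OFF.
        OcmdPolicy::AllOff => group_states.iter().all(|&on| !on),
    }
}
```

With one group OFF and one ON, ocmd_visible returns true for AnyOn and AnyOff, and false for AllOn and AllOff.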

Fixed

  • ToUnicode CMaps process bfchar and bfrange sections in document order (#619) — a ToUnicode CMap is a single combined mapping space where a later definition overrides an earlier one for the same code (ISO 32000-1:2008 §9.10.3); sections are now applied in the order they appear so the last definition wins, matching Adobe/pdf.js/MuPDF/Poppler. Thanks @haberman.
  • Null byte (0x00) is treated as PDF white-space when locating the inline-image EI operator (#618): NUL is one of the six PDF white-space characters (ISO 32000-1:2008 §7.2, Table 1) but was previously omitted, so an EI delimited by a null byte in inline-image (BI/ID/EI) binary data could be missed. Thanks @haberman.
  • Bold/italic variants of the standard Times and Helvetica fonts use correct per-variant width tables (#615) — the Bold and Italic variants of Times and the Bold variant of Helvetica were falling back to the Regular-weight width table, drifting character positions and word-break detection in documents using these common standard-14 fonts without a /Widths array (ISO 32000-1:2008 §9.6.2.2). Per-variant widths are now sourced from the Adobe Core 14 AFM metrics. Thanks @haberman.
  • Subset fonts with colliding BaseFont names no longer poison the cross-document font cache (#595) — two PDFs that reuse the same subset BaseFont name (a six-uppercase-letter + tag such as AAAAAA+TestFont, ISO 32000-1:2008 §9.6.4) but embed different document-specific ToUnicode CMaps could be served the first document's cached FontInfo, decoding the second document's text to the wrong characters. Subset fonts — whose glyph subset and ToUnicode are inherently document-specific — are now excluded from the cross-document global font cache (the per-document caches are unaffected), so each document decodes with its own mapping. Thanks @RayVR for the root-cause analysis and the fix.
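
The subset-font rule in the last item can be sketched in Rust as a name check that gates the shared cache. Both function names below are hypothetical and only illustrate the idea; the actual check and cache layout inside pdf_oxide may differ.

```rust
/// Hypothetical helper: true when `base_font` begins with a subset tag,
/// i.e. exactly six uppercase ASCII letters followed by '+'
/// (for example "AAAAAA+TestFont", ISO 32000-1:2008 §9.6.4).
pub fn has_subset_tag(base_font: &str) -> bool {
    let bytes = base_font.as_bytes();
    // Require the six-letter tag, the '+', and at least one byte of font name.
    bytes.len() > 7
        && bytes[6] == b'+'
        && bytes[..6].iter().all(|b| b.is_ascii_uppercase())
}

/// Hypothetical gate: subset fonts carry document-specific glyph subsets and
/// ToUnicode CMaps, so this sketch keeps them out of a cross-document cache.
pub fn may_use_global_font_cache(base_font: &str) -> bool {
    !has_subset_tag(base_font)
}
```

Under this sketch, "AAAAAA+TestFont" is kept out of the shared cache, while a plain name such as "Helvetica-Bold" remains eligible.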

Dependencies

  • CI actions: taiki-e/install-action 2.79.12 → 2.81.0 (#580).

Installation

Rust (crates.io)

cargo add pdf_oxide

Python (PyPI)

pip install pdf_oxide

JavaScript/WASM (npm)

npm install pdf-oxide-wasm

CLI (Homebrew)

brew install yfedoseev/tap/pdf-oxide

CLI (Scoop — Windows)

scoop bucket add pdf-oxide https://github.com/yfedoseev/scoop-pdf-oxide
scoop install pdf-oxide

CLI (Shell installer)

curl -fsSL https://raw.githubusercontent.com/yfedoseev/pdf_oxide/main/install.sh | sh

CLI (cargo-binstall)

cargo binstall pdf_oxide_cli

MCP Server (for AI assistants)

cargo install pdf_oxide_mcp

Pre-built Binaries

Download archives for Linux, macOS, and Windows from the assets below. Each archive includes both pdf-oxide (CLI) and pdf-oxide-mcp (MCP server).

Platform Support

Platform   Architecture            Archive
Linux      x86_64 (glibc)          pdf_oxide-linux-x86_64-*.tar.gz
Linux      x86_64 (musl)           pdf_oxide-linux-x86_64-musl-*.tar.gz
Linux      ARM64                   pdf_oxide-linux-aarch64-*.tar.gz
macOS      x86_64 (Intel)          pdf_oxide-macos-x86_64-*.tar.gz
macOS      ARM64 (Apple Silicon)   pdf_oxide-macos-aarch64-*.tar.gz
Windows    x86_64                  pdf_oxide-windows-x86_64-*.zip

Changelog

See CHANGELOG.md for full details.