v0.3.50 | True destructive PDF redaction, PAdES-B-T/B-LT long-term-validation signatures, a runtime cryptographic algorithm-governance policy, and split-PDF-by-bookmarks across all seven bindings, plus a signature-date correctness fix.
Added
- True destructive redaction
(#231) — the
prior "redaction" only drew a filled rectangle over content whose
bytes survived (recoverable by copy-paste /pdftotext/ a hex
editor). Redaction is now destructive: the text under each region
is physically removed from the content stream — every glyph whose
ISO 32000-1:2008 §9.4.4 text-rendering box intersects the
(edge-padded) region is deleted, survivors are re-emitted with a
fresh absoluteTmand noTJdeltas so neither the glyphs nor
a width/shift side channel (Bland et al., PETS 2023) remain; the page
is rewritten so the original content object is dropped by the
garbage-collected full rewrite (no residual recoverable bytes); an
opaque overlay marks the area (ISO 32000-1:2008 §12.5.6.23, "remove
all traces … clipping shall not be used"). Composite/Type0/unknown
fonts are refused rather than risk a silent under-redaction
(fail-closed). NewDocumentEditor::add_redaction/
redaction_count/apply_redactions_destructiveplus the
pdf_redaction_add/count/apply/scrub_metadataC ABI and Python,
WASM, Node, C#, Go bindings and apdf-oxide redact INPUT --rect PAGE:x0,y0,x1,y1 [--from-annotations] [--fill R,G,B] [--no-scrub-metadata]CLI. The legacy
apply_page_redactions/apply_all_redactionskeep their signatures.
Standalone document sanitization (DocumentEditor::sanitize_document,
the livepdf_redaction_scrub_metadataC ABI, Python
sanitize_document, WASMsanitizeDocument, and the already-wired
Node/C#/Go scrub paths) strips the/Infodictionary, the catalog
XMP/Metadatastream, document JavaScript (/OpenAction,/AA,
/Names/JavaScript) and/Names/EmbeddedFiles; the removed object
subtrees are hard-excluded from the rewritten file so a secret cannot
survive even as a GC-missed orphan (G6). Geometric image/path/XObject
pruning remains roadmap; composite-font text and encrypted documents
are refused (not under-redacted). - PAdES long-term-validation signatures
(#235) —
signing now produces ETSI EN 319 142-1 PAdES baseline signatures, not
just bareadbe.pkcs7.detached: B-B embeds the RFC 5035 ESS
signing-certificate-v2signed attribute; B-T adds an RFC 3161
signature-time-stampunsigned attribute over the signature value;
B-LT appends a Document Security Store (ISO 32000-2:2020
§12.8.4.3 — certs/CRLs/OCSPs + a per-signature/VRIkeyed by the
uppercase-hex SHA-1 of the signature's/Contents) as an
append-only second incremental update, so the original
signature's byte range is untouched and staysValid. Read side:
read_dssparses a/DSSandclassify_pades_levelreports a
signature's level (B-B/B-T/B-LT). New
sign_pdf_bytes_pades/PadesLevel/RevocationMaterial/
DocumentSecurityStorein core, thepdf_sign_bytes_pades/
pdf_signature_get_pades_level/pdf_document_get_dss/
pdf_dss_*C ABI, and Python, WASM, Node, C#, Go bindings. B-LTA
is also produced: a/Type /DocTimeStamp(/SubFilter /ETSI.RFC3161) RFC 3161 timestamp over the whole file including
the DSS, appended as a third incremental update so the archival
timestamp covers the signature and its validation material;
has_document_timestampis the document-scoped reader signal
(classify_pades_levelstays signature-scoped and tops out at B-LT
by design — the frozenpdf_signature_get_pades_levelC ABI has no
document handle). The legacysign_pdf_bytesadbe.pkcs7.detached
path is byte-for-byte unchanged. Final ETSI conformance is gated on
the EU DSS demonstration-validator release check (online TSA fetch is
CGo/native-only — WASM takes a pre-fetched RFC 3161 token). - Runtime crypto-governance policy
(#230) — a
process-widecrypto::SecurityPolicy(modescompat/strict/
fips-strict, plus anallow:/deny:<alg>@<read|write>override
grammar) layered as an orthogonal, set-once decorator over the
existingCryptoProvider. Read/write asymmetry lets a deployment
read legacy RC4/MD5 PDFs while forbidding weak crypto on write or
new signatures; fail-closed throughout (unknown algorithm /
unparseable spec ⇒ deny). Includes a content-keyedinventory()
governance report and a pluggableAuditSink. Exposed across all
seven surfaces (Rust, Python, C ABI, Go, C#, WASM, Node) as
set_crypto_policy/crypto_policy/crypto_inventory. Default
(compat) behaviour is byte-for-byte unchanged. The residual
password-key-derivation MD5 (ISO 32000-1 §7.6.3 Algorithm 1/2/3/5/7)
is now also routed through the governed provider, so a
strict/fips-strictpolicy denies legacy R≤4 at the primitive
level, not only the operation gate — closing the gap noted in the
v0.3.50 slice. The hashing is byte-identical undercompat
(existing encrypted PDFs still decrypt; newly written ones are
bit-for-bit unchanged). Non-security opaque MD5 (file identifier,
embedded-file/CheckSum) is deliberately left direct so a strict
policy still permits AES-256 writes. A machine-readable CycloneDX
1.6 Cryptographic Bill of Materials of the algorithms a run
actually exercised is exported viacrypto_cbom(corecbom_json+
C ABI / Python / WASM / Go / Node / C# bindings) — the structured
complement tocrypto_inventoryfor CBOM/SPDX-crypto governance.
The policy now also recognises and governs post-quantum
algorithms:PolicyMode::Cnsa2(CNSA 2.0 — new crypto must be
FIPS-approved and 192-bit-class or stronger; 128-bit classical and
L1/L2 PQC denied for write) andPolicyMode::PqcReady(Strict
semantics that additionally recognise/permit ML-DSA/ML-KEM for
classical+PQC dual-stacking during migration), plus ML-DSA-44/65/87
(FIPS 204)
and ML-KEM-512/768/1024 (FIPS 203)AlgorithmIds in
inventory()/CBOM/the policy grammar. This is governance vocabulary
(the policy decides; the actual ML-DSA/ML-KEM primitives are a
separate provider concern — a sign attempt fails closed until they
land). Set via the string grammar (crypto_policy("cnsa2")), so all
seven bindings get it with no API change; frozenAlgorithmIdbit
indices are preserved (PQC ids appended). A governed RSA
modulus-size floor is also enforced for signing:
SecurityPolicy::min_rsa_modulus_bits(per-mode default — Compat 0,
Strict/PqcReady 2048, FipsStrict/Cnsa2 3072 per NIST SP 800-131A /
CNSA 2.0) makessign_pdf_bytes/sign_pdf_bytes_padesfail closed
with a weak RSA key — the key-strength gate the algorithm-level
min_security_bitscannot see. Defaultcompatkeeps no floor
(byte-for-byte unchanged). (Finer X.509 cert-policy governance —
keyUsage / extendedKeyUsage / validity-window enforcement for the
signing certificate — is the remaining #230 roadmap item, tracked as
a focused follow-up. Per-document policy override (Phase G) was
design-assessed and deliberately deferred: the active policy is
set-once specifically because a mid-flight downgrade is an attack
vector, so a runtime widening override (e.g. relax-for-one-document)
cannot be added safely; the only sound shape is an explicit
per-document policy threaded through every crypto call site — a large
cross-cutting change, tracked as a separate follow-up, not a set-once
relaxation.) - Split a PDF by bookmarks
(#482) — new
pdf-oxide split --by-bookmarks [--bookmark-prefix P] [--bookmark-level N] [--ignore-case] [--no-front-matter]CLI, plus
plan_split_by_bookmarks/split_by_bookmarks*in core and every
binding (Python, WASM, C ABI, Go, C#, Node). Splits at outline
boundaries into one PDF per (optionally prefix-filtered) bookmark,
with collision-free, filesystem-safe filenames. Outline parsing now
resolves named destinations (catalog/Destsdictionary and the
/Names→/Destsname tree, ISO 32000-1 §12.3.2.3 / §7.9.6),
bounded against malformed/cyclic name trees. Plain per-pagesplit
is unchanged (backward compatible). - Full idiomatic cross-binding parity for #230/#231/#235/#482 —
every feature is now exposed idiomatically in all supported
bindings (Rust, Python, C ABI, WASM, C#, Go-cgo, Go-purego, Node/TS):- A new additive C ABI
pdf_document_has_timestamp(doc)exposes the
document-scoped PAdES-B-LTA reader signal that
pdf_signature_get_pades_level(signature-scoped, ≤B-LT by design)
cannot report; surfaced as Pythonhas_document_timestamp, WASM
hasDocumentTimestamp, C#PdfDocument.HasDocumentTimestamp, Go
(*PdfDocument).HasDocumentTimestamp, and Node
PdfDocument.hasDocumentTimestamp/SignatureManager. - Python now re-exports the entire signing/PAdES surface
(sign_pdf_bytes,sign_pdf_bytes_pades,Certificate,
Signature,PadesLevel,RevocationMaterial,Dss) plus
crypto_cbomfrom the top-levelpdf_oxidepackage under idiomatic
names (the functions were previously reachable only as
py_-prefixed symbols on the private extension module). - The standalone document sanitization entrypoint (#231) is now a
first-classSanitizeDocument()on the C# and Go (cgo + purego)
DocumentEditor(previously the livepdf_redaction_scrub_metadata
C ABI had no managed/Go wrapper). - The Go purego (CGO-free) backend, previously read-side only,
now covers crypto-governance (#230), destructive redaction +
sanitize (#231), PAdES signing + DSS read + B-LTA (#235), and
split-by-bookmarks (#482) with signatures identical to the cgo
backend. - Node/TS gains idiomatic
signPdfBytesPades,PadesLevel,
PdfDocument.getDocumentSecurityStore/hasDocumentTimestamp/ planSplitByBookmarks,setCryptoPolicy/cryptoPolicy/ cryptoInventory/cryptoCbom, andSecurityManager/
SignatureManager/OutlineManagermethods, all with generated
TypeScript declarations. Behaviour and the frozenPadesLevel
integer mapping are unchanged.
- A new additive C ABI
Fixed
- Wrong dates in digital-signature timestamps —
format_pdf_date
hard-coded the month/day to0101and approximated the year as
1970 + days/365, so every signature/Mvalue (and document
timestamps) was an incorrect ≈Jan-1-of-leap-drifted-year
(ISO 32000-1 §7.9.4). Replaced with one leap-year-correct,
de-duplicated implementation (the two divergent copies are gone).
Security
- Redaction now actually removes content
(#231) — the
Nodeediting-managerredaction methods previously called native
pdf_redaction_*symbols that did not exist (silently no-op'ing — a
security-critical operation pretending to succeed while removing
nothing). Those C ABI symbols now exist and perform true
destructive redaction (see Added); the binding gap is closed across
all surfaces. A[BLOCK]integration test builds a real PDF
containing a secret, redacts it through the public API, and asserts
the secret is absent from both re-extracted text and the raw
saved bytes (idempotent). - PAdES long-term-validation signatures
(#235) — PDF
signatures can now carry the ESSsigning-certificate-v2binding
(RFC 5035, defeats certificate-substitution), an RFC 3161 timestamp
(B-T), and a Document Security Store for offline long-term validation
(B-LT). The DSS is added as an append-only incremental update so
pre-existing signatures provably remainValid(asserted by the
I1–I7 integrity-invariant suite intests/pades_ltv.rs); a tampered
signed region still fails verification (negative test). See Added for
scope and the EU-DSS conformance gate.
Thanks
- @Suleman-Elahi for requesting
split-by-bookmarks (#482). - @jedzill4 for volunteering on
destructive redaction (#231).
Installation
Rust (crates.io)
cargo add pdf_oxidePython (PyPI)
pip install pdf_oxideJavaScript/WASM (npm)
npm install pdf-oxide-wasmCLI (Homebrew)
brew install yfedoseev/tap/pdf-oxideCLI (Scoop — Windows)
scoop bucket add pdf-oxide https://github.com/yfedoseev/scoop-pdf-oxide
scoop install pdf-oxideCLI (Shell installer)
curl -fsSL https://raw.githubusercontent.com/yfedoseev/pdf_oxide/main/install.sh | shCLI (cargo-binstall)
cargo binstall pdf_oxide_cliMCP Server (for AI assistants)
cargo install pdf_oxide_mcpPre-built Binaries
Download archives for Linux, macOS, and Windows from the assets below. Each archive includes both pdf-oxide (CLI) and pdf-oxide-mcp (MCP server).
Platform Support
| Platform | Architecture | Archive |
|---|---|---|
| Linux | x86_64 (glibc) | pdf_oxide-linux-x86_64-*.tar.gz |
| Linux | x86_64 (musl) | pdf_oxide-linux-x86_64-musl-*.tar.gz |
| Linux | ARM64 | pdf_oxide-linux-aarch64-*.tar.gz |
| macOS | x86_64 (Intel) | pdf_oxide-macos-x86_64-*.tar.gz |
| macOS | ARM64 (Apple Silicon) | pdf_oxide-macos-aarch64-*.tar.gz |
| Windows | x86_64 | pdf_oxide-windows-x86_64-*.zip |
Changelog
See CHANGELOG.md for full details.