v0.1.0 | First release — a stateless single-shot PDF REST API over the `pdf_oxide` engine: extract text/markdown/html, fill AcroForm fields (any UTF-8 script — CJK, Arabic, Hebrew), merge/split, and chain ops in one request. PDF in → result out, nothing persisted. Ships as a hardened, signed, ~14.5 MB distroless image.
Added
- HTTP service on axum 0.8 + tokio, with a bounded rayon CPU pool + semaphore admission control for all `pdf_oxide` work (no `spawn_blocking`); panics in a worker are isolated via `catch_unwind`.
- Extraction: `POST /v1/extract/text`, `/v1/extract/markdown` (heading detection), `/v1/extract/html`, each with an optional `pages` selection (`"1-3,5"`).
- Forms (the issue #611 hero feature): `POST /v1/forms/fields` (introspect AcroForm fields) and `POST /v1/forms/fill` (fill from a JSON map, optional flatten). Field values are passed to `pdf_oxide` verbatim as UTF-8 and written as UTF-16BE, so CJK, Arabic, Hebrew, and any Unicode round-trip with no mojibake; covered by a gating acceptance test against `pdf_oxide` 0.3.59.
- Document ops: `POST /v1/docs/merge`, `/v1/docs/split` (one PDF per page, returned as a ZIP), `/v1/docs/metadata`, `/v1/docs/page-info`.
- `POST /v1/pipeline`: chain ops over one in-memory parse (e.g. fill → extract); a data-producing op must be last. `max_pipeline_steps` enforced.
- Dual request encoding on every data endpoint: `multipart/form-data` (file parts), `application/json` (`pdf_base64` / `pdfs_base64`), and raw body.
- Operational endpoints: `GET /healthz`, `GET /readyz` (503 while draining), `GET /version` (reports the embedded `pdf_oxide` version), `GET /metrics`.
- `healthcheck` subcommand for the no-shell container `HEALTHCHECK`.
- RFC 9457 `application/problem+json` error envelope via a single `ApiError` with a variant-aware `pdf_oxide::Error` mapping that never leaks document content (regression-tested).
- Hardening: env-configurable limits (max body 32 MiB, request timeout 30 s, max pages 2000, max in-flight 8, max pipeline steps 16); `Cache-Control: no-store` on results; optional bearer auth; a loud startup warning on a non-loopback bind without an API key (opt into hard fail-closed with `PDF_OXIDE_API_REQUIRE_AUTH=true`); graceful-drain readiness.
- Hardened multi-stage `Dockerfile` (static musl on Chainguard `static`, cargo-chef caching, mimalloc) and a hardened `docker-compose.yml`.
- CI (fmt, clippy `-D warnings`, test, cargo-deny, cargo-audit, MSRV, Docker build + Trivy + smoke test), release workflow (multi-arch buildx, cosign keyless sign, SBOM + SLSA provenance attest), and the cross-repo `pdf-oxide-released` rebuild trigger with a crates.io poll fallback.
- SEO/GEO docs assets: `README.md`, `llms.txt`, `.devin/wiki.json`, `openapi.yaml` (OpenAPI 3.1) + served `/openapi.json`, and an mdBook docs site.
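
The Unicode claim in the forms entry comes down to one encoding step: a UTF-8 value is stored in the PDF as a UTF-16BE text string, which the PDF format marks with a leading byte-order mark (`FE FF`). The Python sketch below illustrates that step in isolation. It is not the `pdf_oxide` implementation, and the helper names `to_pdf_text_string` and `from_pdf_text_string` are hypothetical.

```python
# Illustrative only: mimics the UTF-8 -> UTF-16BE step described in the notes.
# These helpers are hypothetical and are not part of pdf_oxide or its REST API.
import codecs


def to_pdf_text_string(value: str) -> bytes:
    """Encode a field value as a PDF text string: BOM (FE FF) + UTF-16BE."""
    return codecs.BOM_UTF16_BE + value.encode("utf-16-be")


def from_pdf_text_string(raw: bytes) -> str:
    """Decode a BOM-prefixed UTF-16BE PDF text string back to a str."""
    if not raw.startswith(codecs.BOM_UTF16_BE):
        raise ValueError("missing UTF-16BE byte-order mark")
    return raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")


if __name__ == "__main__":
    # CJK, Arabic, Hebrew, and a character outside the Basic Multilingual Plane.
    for sample in ["日本語", "مرحبا", "שלום", "naïve 😀"]:
        encoded = to_pdf_text_string(sample)
        assert from_pdf_text_string(encoded) == sample
        print(sample, "->", encoded.hex(" "))
```

Characters outside the Basic Multilingual Plane, such as emoji, become surrogate pairs in UTF-16BE, so they survive the same round-trip.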
Run it

    docker run --rm -p 8080:8080 ghcr.io/yfedoseev/pdf_oxide:latest
    curl -s -F file=@doc.pdf http://localhost:8080/v1/extract/text

Pin a digest for reproducibility:

    docker pull ghcr.io/yfedoseev/pdf_oxide@sha256:<digest-from-assets>

The image is multi-arch (linux/amd64 + linux/arm64), cosign-signed (keyless), and ships an attached CycloneDX SBOM + SLSA build provenance.
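
The `curl` call above uses the multipart encoding. As a rough sketch, the same extraction request in the `application/json` encoding could be built from Python's standard library as follows. The `pdf_base64` key and the port come from these notes; the JSON key `pages` for the page selection and the helper name `build_extract_text_request` are assumptions, so check the served OpenAPI document for the actual schema.

```python
# Sketch of the JSON request encoding. Building the request sends nothing.
import base64
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8080"  # matches the docker run port mapping


def build_extract_text_request(
    pdf_bytes: bytes, pages: Optional[str] = None
) -> urllib.request.Request:
    """Build a POST /v1/extract/text request with a base64-encoded PDF."""
    payload = {"pdf_base64": base64.b64encode(pdf_bytes).decode("ascii")}
    if pages is not None:
        # ASSUMPTION: the page selection is passed under a "pages" key.
        payload["pages"] = pages
    return urllib.request.Request(
        f"{BASE_URL}/v1/extract/text",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it to a running container. Base64 inflates the payload by about a third, which matters against the 32 MiB body limit, so multipart or a raw body is likely the better fit for large files.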
Verify the image

    cosign verify ghcr.io/yfedoseev/pdf_oxide:VERSION_TAG \
      --certificate-identity-regexp 'https://github.com/yfedoseev/pdf_oxide_api/.*' \
      --certificate-oidc-issuer https://token.actions.githubusercontent.com

API contract
- OpenAPI 3.1: `GET /openapi.json` · interactive docs: `GET /docs`
- Versions: `GET /version` (reports the embedded `pdf_oxide` engine version)
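
Because the contract is served as OpenAPI 3.1, a client can enumerate the available operations instead of hard-coding them. The sketch below walks the standard `paths` object of an already-parsed OpenAPI document. The helper name `list_operations` is hypothetical, and the sample document in the demo is a made-up fragment, not the service's real specification.

```python
# Sketch: list "METHOD /path" pairs from a parsed OpenAPI 3.1 document,
# such as the JSON returned by GET /openapi.json.
from typing import Any, Dict, List

# Path items may also hold non-operation keys (e.g. "parameters"), so filter.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(spec: Dict[str, Any]) -> List[str]:
    """Return sorted 'METHOD /path' strings for every operation in the spec."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            if key.lower() in HTTP_METHODS:
                operations.append(f"{key.upper()} {path}")
    return sorted(operations)


if __name__ == "__main__":
    # Made-up fragment for illustration; the real document has more detail.
    sample = {
        "openapi": "3.1.0",
        "paths": {
            "/v1/extract/text": {"post": {}},
            "/version": {"get": {}},
        },
    }
    for line in list_operations(sample):
        print(line)
```

Against a running container, the input would be the parsed body of `GET /openapi.json`.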
Changelog
See CHANGELOG.md for full history.