Release v0.1.0 | First release — a stateless single-shot PDF REST API over the `pdf_oxide` engine: extract text/markdown/html, fill AcroForm fields (any UTF-8 script — CJK, Arabic, Hebrew), merge/split, and chain ops in one request. PDF in → result out, nothing persisted. Ships as a hardened, signed, ~14.5 MB distroless image. · yfedoseev/pdf_oxide_api

Added

HTTP service on axum 0.8 + tokio, with a bounded rayon CPU pool + semaphore
admission control for all pdf_oxide work (no spawn_blocking); panics in a
worker are isolated via catch_unwind.
Extraction: POST /v1/extract/text, /v1/extract/markdown (heading
detection), /v1/extract/html — with an optional pages selection
("1-3,5").
Forms (the issue #611 hero feature): POST /v1/forms/fields (introspect
AcroForm fields) and POST /v1/forms/fill (fill from a JSON map, optional
flatten). Field values are passed to pdf_oxide verbatim as UTF-8 and
written as UTF-16BE, so CJK, Arabic, Hebrew, and any Unicode round-trip with
no mojibake — covered by a gating acceptance test against pdf_oxide 0.3.59.
Document ops: POST /v1/docs/merge, /v1/docs/split (one PDF per page,
returned as a ZIP), /v1/docs/metadata, /v1/docs/page-info.
POST /v1/pipeline — chain ops over one in-memory parse (e.g. fill →
extract); a data-producing op must be last. max_pipeline_steps enforced.
Multiple request encodings on every data endpoint: multipart/form-data (file
parts), application/json (pdf_base64 / pdfs_base64), and raw body.
Operational endpoints: GET /healthz, GET /readyz (503 while draining),
GET /version (reports the embedded pdf_oxide version), GET /metrics.
healthcheck subcommand for the no-shell container HEALTHCHECK.
RFC 9457 application/problem+json error envelope via a single ApiError
with a variant-aware pdf_oxide::Error mapping that never leaks document
content (regression-tested).
Hardening: env-configurable limits (max body 32 MiB, request timeout 30 s,
max pages 2000, max in-flight 8, max pipeline steps 16); Cache-Control:
no-store on results; optional bearer auth; a loud startup warning on a
non-loopback bind without an API key (opt into hard fail-closed with
PDF_OXIDE_API_REQUIRE_AUTH=true); graceful-drain readiness.
Hardened multi-stage Dockerfile (static musl on Chainguard static,
cargo-chef caching, mimalloc) and a hardened docker-compose.yml.
CI (fmt, clippy -D warnings, test, cargo-deny, cargo-audit, MSRV, Docker
build + Trivy + smoke test), release workflow (multi-arch buildx, cosign
keyless sign, SBOM + SLSA provenance attest), and the cross-repo
pdf-oxide-released rebuild trigger with a crates.io poll fallback.
SEO/GEO docs assets: README.md, llms.txt, .devin/wiki.json,
openapi.yaml (OpenAPI 3.1) + served /openapi.json, and an mdBook docs
site.
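
To make the pages selection syntax ("1-3,5") from the extraction endpoints concrete, here is a minimal sketch of how such a selection could expand into individual page numbers. It assumes 1-based, inclusive ranges separated by commas; the helper name `expand_pages` is hypothetical, and the grammar the service actually accepts may be stricter or looser than this.

```shell
# Illustration only: expand a page selection such as "1-3,5" into one page
# number per line. Assumes 1-based inclusive ranges; not the service's parser.
expand_pages() {
  old_ifs="$IFS"
  IFS=','
  for part in $1; do
    case "$part" in
      *-*)
        # "a-b" is treated as every page from a through b.
        start="${part%-*}"
        end="${part#*-}"
        i="$start"
        while [ "$i" -le "$end" ]; do
          echo "$i"
          i=$((i + 1))
        done
        ;;
      *)
        # A bare number selects a single page.
        echo "$part"
        ;;
    esac
  done
  IFS="$old_ifs"
}

# Usage: expand_pages "1-3,5" prints 1, 2, 3 and 5 on separate lines.
```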

Run it

docker run --rm -p 8080:8080 ghcr.io/yfedoseev/pdf_oxide:latest
curl -s -F file=@doc.pdf http://localhost:8080/v1/extract/text
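
The multipart call above can also be expressed with the application/json encoding. The sketch below builds such a body from a local file. Only the `pdf_base64` key comes from these notes; the helper name `json_body` is hypothetical, and any additional keys the endpoint may accept are not shown.

```shell
# Build an application/json request body that carries the PDF as base64.
# `pdf_base64` is the key named in the release notes.
json_body() {
  # tr strips the line wraps that some base64 implementations insert.
  printf '{"pdf_base64":"%s"}' "$(base64 < "$1" | tr -d '\n')"
}

# Usage (assumes the container from the docker run line listens on :8080):
#   json_body doc.pdf | curl -s -H 'Content-Type: application/json' \
#     --data-binary @- http://localhost:8080/v1/extract/text
```

The base64 alphabet contains no characters that need JSON escaping, so plain `printf` is enough here; a JSON tool would be the safer choice once other fields are added.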

Pin a digest for reproducibility:

docker pull ghcr.io/yfedoseev/pdf_oxide@sha256:<digest-from-assets>

The image is multi-arch (linux/amd64 + linux/arm64), cosign-signed
(keyless), and ships an attached CycloneDX SBOM + SLSA build provenance.

Verify the image

cosign verify ghcr.io/yfedoseev/pdf_oxide:VERSION_TAG \
  --certificate-identity-regexp 'https://github.com/yfedoseev/pdf_oxide_api/.*' \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com

API contract

OpenAPI 3.1: GET /openapi.json · interactive docs: GET /docs
Versions: GET /version (reports the embedded pdf_oxide engine version)
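
Because GET /readyz returns 503 while the service is draining, a deployment script may want to wait for readiness before querying /version or sending work. A minimal sketch, assuming the service is reachable at `http://localhost:8080` as in the run example; `BASE_URL` and `wait_ready` are hypothetical names, not part of the API.

```shell
# Base URL is an assumption that matches the `docker run -p 8080:8080` example.
BASE_URL="${BASE_URL:-http://localhost:8080}"

# Poll GET /readyz until it answers HTTP 200 or the attempts run out.
# Returns 0 when ready, 1 otherwise.
wait_ready() {
  attempts="${1:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    code="$(curl -s -o /dev/null -w '%{http_code}' "$BASE_URL/readyz" || true)"
    [ "$code" = "200" ] && return 0
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage:
#   wait_ready 30 && curl -s "$BASE_URL/version"
```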

Changelog

See CHANGELOG.md for full history.
