pdf-preprocessing

Here is 1 public repository matching this topic...

Morphic is a high-fidelity OCR and pre-RAG pipeline processor featuring: (1) Tesseract OCR; (2) built-in cross-line dehyphenation with real-word verification; (3) support for TIFF series and JPEG 2000 (JPX) for high-fidelity PDF sources, with significant file-size savings. It assists in pre-RAG PDF preparation for large-scale ingest and agentic analysis.

  • Updated Dec 8, 2025
  • Python
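
The listing does not show how the repository implements cross-line dehyphenation with real-word verification, so the following is only a minimal sketch of the general idea, not the project's code. It assumes OCR output arrives as plain text with newline-separated lines. The names `dehyphenate`, `KNOWN_WORDS`, and `_LINE_BREAK_HYPHEN` are hypothetical, and the tiny vocabulary stands in for a real dictionary.

```python
import re

# Hypothetical stand-in vocabulary; a real pipeline would load a full word list.
KNOWN_WORDS = {"information", "processing", "retrieval"}

# A word fragment ending in a hyphen at the end of a line, followed by a
# lowercase fragment at the start of the next line.
_LINE_BREAK_HYPHEN = re.compile(r"([A-Za-z]+)-\n([a-z]+)")


def dehyphenate(text, vocabulary=KNOWN_WORDS):
    """Join words split across lines, but only if the joined form is a known word."""

    def join(match):
        joined = match.group(1) + match.group(2)
        if joined.lower() in vocabulary:
            # Verified as a real word: drop the hyphen and the line break.
            return joined
        # Not verified (possibly a genuine compound): leave the text untouched.
        return match.group(0)

    return _LINE_BREAK_HYPHEN.sub(join, text)


if __name__ == "__main__":
    print(dehyphenate("infor-\nmation retrieval"))
    print(dehyphenate("a well-\nknown result"))
```

The vocabulary check is what keeps genuine hyphenated compounds such as "well-known" from being merged: "wellknown" is not in the word list, so the original hyphen and line break are preserved.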
