
pd3f

PDF text extraction pipeline: self-hosted, local-first and Docker-based

Pinned repositories

  1. pd3f (public)

    🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based

    HTML · 322 stars · 38 forks

  2. pd3f-core (public)

    📑 Python package to reconstruct the original continuous text from PDFs with language models

    Jupyter Notebook · 32 stars · 8 forks

  3. dehyphen (public)

    📜 Dehyphenation of broken text (mainly German), e.g., text extracted from a PDF

    Python · 39 stars · 4 forks

Repositories

7 repositories in total.