Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Updated Jun 7, 2024 · HTML
🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based
Aspose.PDF for JavaScript via C++
Extract structured text and data from documents such as invoices, book pages, and tables using OpenCV and Tesseract OCR.