pdf-to-text

Here is 1 public repository matching this topic...

Directorman9 / Optical-character-recognition

The notebook in this repository uses pytesseract to extract text from a PDF document. It can be used to automate text acquisition from a large body of printed resources such as books. The acquired text can then be used for downstream tasks such as training language models or topic models, and document summarization.

ocr pdf-to-text pytesseract

Updated Apr 30, 2022
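
The notebook itself is not reproduced on this page. As a rough illustration of the approach it describes, here is a minimal Python sketch, not the repository's code. It assumes the third-party pdf2image package (a wrapper around Poppler) to render each PDF page to an image, because Tesseract reads images rather than PDFs, and pytesseract with a local Tesseract install for the OCR step. The helper names `pdf_to_text` and `ocr_directory` are hypothetical, and the `rasterize`/`ocr` parameters exist only so the two engines can be swapped out or stubbed.

```python
# Minimal sketch (not the repository's notebook): OCR a scanned PDF to text.
# Assumes the third-party packages pdf2image and pytesseract are installed,
# plus the Poppler and Tesseract binaries they call out to.
from pathlib import Path
from typing import Callable, Iterable, List, Optional


def _default_rasterize(pdf_path: str, dpi: int) -> Iterable:
    # Lazy import so this module loads even where pdf2image is missing.
    from pdf2image import convert_from_path
    # Returns one PIL.Image per page of the PDF.
    return convert_from_path(pdf_path, dpi=dpi)


def _default_ocr(image, lang: str) -> str:
    # Tesseract reads images, not PDFs, hence the rasterization step above.
    import pytesseract
    return pytesseract.image_to_string(image, lang=lang)


def pdf_to_text(
    pdf_path: str,
    dpi: int = 300,
    lang: str = "eng",
    rasterize: Optional[Callable[[str, int], Iterable]] = None,
    ocr: Optional[Callable[[object, str], str]] = None,
) -> str:
    """OCR every page of pdf_path; pages are separated by form feeds."""
    rasterize = rasterize or _default_rasterize  # hypothetical injection hook
    ocr = ocr or _default_ocr
    pages = [ocr(image, lang).strip() for image in rasterize(pdf_path, dpi)]
    # A form feed keeps page boundaries recoverable for downstream processing.
    return "\f".join(pages)


def ocr_directory(src_dir: str, out_dir: str, **kwargs) -> List[Path]:
    """Write one UTF-8 .txt file per *.pdf in src_dir; return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        target = out / (pdf.stem + ".txt")
        target.write_text(pdf_to_text(str(pdf), **kwargs), encoding="utf-8")
        written.append(target)
    return written
```

For a large collection of scanned books, a call such as `ocr_directory("scans/", "text/")` (hypothetical folder names) would write one text file per PDF. 300 DPI is a common starting point for OCR, but the best resolution and `lang` setting depend on the source material.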
