Image text detection with Jupyter Notebook Python
Updated Dec 31, 2022 - Jupyter Notebook
The notebook in this repository uses pytesseract to extract text from a PDF document. The script can be used to automate text acquisition from a large body of printed resources, such as books. The acquired text can then be used for downstream tasks such as training language models, topic modeling, and document summarization.
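The extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes the PDF pages are first rendered to images with pdf2image (which needs Poppler installed) and then passed to pytesseract (which needs the Tesseract binary on the PATH). The helper names `pdf_to_images`, `ocr_image`, and `extract_text` are hypothetical.

```python
def pdf_to_images(pdf_path, dpi=300):
    """Render each PDF page to a PIL image.

    Assumption: pdf2image and Poppler are installed; the repository
    may use a different rendering approach.
    """
    from pdf2image import convert_from_path  # optional dependency, imported lazily
    return convert_from_path(str(pdf_path), dpi=dpi)


def ocr_image(image, lang="eng"):
    """Run Tesseract OCR on a single page image via pytesseract."""
    import pytesseract  # requires the Tesseract binary to be installed
    return pytesseract.image_to_string(image, lang=lang)


def extract_text(pages, ocr=ocr_image):
    """OCR every page and join the stripped results with blank lines.

    `ocr` is any callable mapping a page to a string, so the OCR engine
    can be swapped out or stubbed without changing this function.
    """
    return "\n\n".join(ocr(page).strip() for page in pages)
```

With both dependencies installed, calling `extract_text(pdf_to_images("book.pdf"))` (where `book.pdf` is a placeholder path) would return the recognized text as one string, ready to be written to disk for downstream tasks. Keeping the OCR call behind a parameter is a design choice for this sketch only; it makes the page loop easy to check without a scanned document at hand.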