Converting PDF or Images into text file from PyQt with Tesseract and PyPDF2

yjg30737/pyqt-pdf2text

pyqt-pdf2text

A PyQt GUI that converts PDFs or images into text files, using Tesseract and PyPDF2.

Requirements

  • PyPDF2
  • pytesseract
  • pdf2image
  • PyQt5>=5.14

Poppler is bundled with this repository. (The bundled build was the latest version as of September 14, 2020.)

Note

The current GUI uses Tesseract only for image-to-text conversion, not for PDF-to-text conversion. Tesseract-based PDF conversion is implemented in script.py, so feel free to use it from there if you'd like.
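The split described in the note can be sketched roughly as follows. This is an illustration only, not the code in main.py or script.py: the function names, the `IMAGE_SUFFIXES` set, and the choice of `PdfReader` (the class name in newer PyPDF2 releases) are all assumptions. Third-party imports are kept inside the functions so the dispatch logic can be tried without every dependency installed.

```python
from pathlib import Path

# Hypothetical list of image types handed to Tesseract; adjust as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def pick_backend(path):
    """Return 'pypdf2' for PDFs and 'tesseract' for images."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return "pypdf2"
    if suffix in IMAGE_SUFFIXES:
        return "tesseract"
    raise ValueError(f"unsupported file type: {suffix or path}")


def pdf_to_text_pypdf2(pdf_path):
    """Extract the embedded text layer of a PDF (no OCR involved)."""
    from PyPDF2 import PdfReader  # assumes a PyPDF2 release that provides PdfReader

    reader = PdfReader(str(pdf_path))
    # extract_text() may return None or "" for pages without a text layer.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def image_to_text(image_path):
    """Run Tesseract OCR on a single image file."""
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img)


def pdf_to_text_ocr(pdf_path, poppler_path=None):
    """OCR a scanned PDF: render pages with pdf2image, then run Tesseract."""
    import pytesseract
    from pdf2image import convert_from_path

    # poppler_path can point at a bundled Poppler "bin" directory.
    pages = convert_from_path(str(pdf_path), poppler_path=poppler_path)
    return "\n".join(pytesseract.image_to_string(page) for page in pages)


def convert(path):
    """Mirror the GUI behaviour in the note: PyPDF2 for PDFs, Tesseract for images."""
    if pick_backend(path) == "pypdf2":
        return pdf_to_text_pypdf2(path)
    return image_to_text(path)
```

`pdf_to_text_ocr` is the path the GUI does not take. It is mainly useful for scanned PDFs that have no text layer, where PyPDF2 returns little or nothing.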

How to install

  1. Install Tesseract OCR.
  2. Add the Tesseract installation directory to your PATH environment variable.
  3. Clone this repository (yjg30737/pyqt-pdf2text) with git clone.
  4. Run pip install -r requirements.txt.
  5. Run python main.py.
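Before launching the GUI, it can help to confirm that step 2 worked. The snippet below is a small sketch, not part of this repository: `find_tesseract` and `configure_pytesseract` are hypothetical helper names. It relies on the standard library's `shutil.which` and on pytesseract's `tesseract_cmd` setting.

```python
import shutil


def find_tesseract(exe="tesseract"):
    """Return the full path of the Tesseract executable on PATH, or None."""
    return shutil.which(exe)


def configure_pytesseract(fallback_path=None):
    """Point pytesseract at Tesseract, using fallback_path if PATH lookup fails."""
    found = find_tesseract() or fallback_path
    if found is None:
        raise RuntimeError(
            "Tesseract was not found on PATH; install it or pass fallback_path"
        )
    import pytesseract  # imported lazily so find_tesseract works without it

    # pytesseract runs whatever executable this attribute names.
    pytesseract.pytesseract.tesseract_cmd = found
    return found


if __name__ == "__main__":
    print(find_tesseract() or "tesseract is not on PATH")
```

If the lookup prints nothing useful, re-check the environment variable from step 2 and open a new terminal so the change takes effect.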
