Image To Text

Convert image to text using Python (Tesseract OCR)

A. Image to Text:

Python must be installed on your system. If it is not, you can install it with Anaconda.
Install Tesseract OCR.
Run the command: pip install pytesseract pillow
If the packages are already installed, pip reports each requirement as already satisfied; otherwise the command installs them.
Command to list the language codes that Tesseract OCR recognizes: tesseract --list-langs
Syntax to run image2Text.py:
python image2Text.py path-to-input-image-or-folder path-of-output-folder --languages language-code
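
For orientation, here is a minimal sketch of the kind of processing a script like image2Text.py could perform. It is not the actual script from this repository: the function names, the list of image extensions, and the one-text-file-per-image naming are assumptions made for illustration. It only relies on pytesseract.image_to_string with its lang parameter, which accepts codes such as "eng" or a combination such as "eng+hin".

```python
from pathlib import Path
from typing import List

# Assumed set of extensions; the real script may accept others.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def collect_images(source: Path) -> List[Path]:
    """Return the image itself, or every image directly inside a folder."""
    if source.is_file():
        return [source]
    return sorted(p for p in source.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)


def ocr_to_folder(source: Path, out_dir: Path, languages: str = "eng") -> List[Path]:
    """Run OCR on each image and write one .txt file per image into out_dir."""
    # Imported here so the path helper above works without the OCR stack installed.
    import pytesseract
    from PIL import Image

    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image_path in collect_images(source):
        with Image.open(image_path) as img:
            text = pytesseract.image_to_string(img, lang=languages)
        # Hypothetical naming scheme: same base name as the image, .txt extension.
        target = out_dir / (image_path.stem + ".txt")
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written
```

With these assumed names, a call such as ocr_to_folder(Path("scans"), Path("output"), "eng") would process every image in the scans folder. The language code passed in must be one of those listed by tesseract --list-langs.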

B. To merge all the text files into a single text file, use merge-text.bat
Syntax:
merge-text.bat path-to-folder-containing-text-files
The merged text file is created in the same folder that contains the text files.
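
On systems where a .bat file cannot be run, the same merge can be approximated in Python. This is a sketch under stated assumptions, not a port of merge-text.bat, whose contents are not shown here: the function name and the output file name merged.txt are hypothetical, and the files are joined in alphabetical order of their names.

```python
from pathlib import Path


def merge_text_files(folder: Path, merged_name: str = "merged.txt") -> Path:
    """Concatenate every .txt file in folder into one file in that same folder."""
    merged_path = folder / merged_name
    # Skip the merged file itself so that reruns do not fold old output back in.
    parts = sorted(p for p in folder.glob("*.txt") if p.name != merged_name)
    with merged_path.open("w", encoding="utf-8") as out:
        for part in parts:
            out.write(part.read_text(encoding="utf-8"))
            out.write("\n")  # keep a line break between consecutive files
    return merged_path
```

Because the order is alphabetical, a file named page10.txt sorts before page2.txt. Zero-padded names such as page02.txt keep the pages in reading order.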

C. To convert the pages of one or more PDF files to images, use pdf2image.py

Run the command: pip install pymupdf
It takes a single argument: the path to a PDF file or to a folder containing multiple PDF files.
No output path argument is needed. For each PDF, the script creates a folder named after the PDF file, in the same location as the PDF, and stores the converted pages there as images.
Syntax:
pdf2image.py path-to-pdf-or-folder-containing-pdf
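
The behaviour described above can be sketched with PyMuPDF as follows. This is an illustration, not the repository's pdf2image.py: the function names, the 200 dpi default, the PNG format, and the page-001.png naming are assumptions. The dpi argument of get_pixmap requires a reasonably recent PyMuPDF release.

```python
from pathlib import Path
from typing import List


def find_pdfs(source: Path) -> List[Path]:
    """Return the PDF itself, or every PDF directly inside a folder."""
    if source.is_file():
        return [source]
    return sorted(p for p in source.iterdir() if p.suffix.lower() == ".pdf")


def output_dir_for(pdf_path: Path) -> Path:
    """Folder named after the PDF, located next to the PDF."""
    return pdf_path.with_suffix("")


def pdf_to_images(pdf_path: Path, dpi: int = 200) -> List[Path]:
    """Render each page of one PDF to a PNG inside its output folder."""
    import fitz  # PyMuPDF; imported here so the helpers above work without it

    out_dir = output_dir_for(pdf_path)
    out_dir.mkdir(exist_ok=True)
    written = []
    with fitz.open(str(pdf_path)) as doc:
        for number, page in enumerate(doc, start=1):
            pix = page.get_pixmap(dpi=dpi)
            # Hypothetical naming scheme: page-001.png, page-002.png, ...
            target = out_dir / f"page-{number:03d}.png"
            pix.save(str(target))
            written.append(target)
    return written
```

To cover both input forms, call pdf_to_images on each path returned by find_pdfs. The resulting image folders can then be passed to the OCR step in section A.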

