tinytips4u/image2text

Image To Text

Convert image to text using Python (Tesseract OCR)

A. Image to Text:

  1. Python must be installed on your system. If it is not, you can install Anaconda from here.
  2. Install Tesseract OCR from here.
  3. Run the command: pip install pytesseract pillow
    If the packages are already installed, pip reports the requirement as satisfied. Otherwise the command installs them.
  4. List the language codes that Tesseract OCR recognizes: tesseract --list-langs
  5. Syntax to run image2Text.py:
    python image2Text.py path-to-input-image-or-folder path-of-output-folder --languages language-code
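
The steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the contents of image2Text.py: the helper names `find_images` and `ocr_to_text_files`, the list of image extensions, and the one-text-file-per-image layout are all hypothetical. The OCR call itself is `pytesseract.image_to_string` with its `lang` parameter, and it needs Tesseract OCR installed and on the PATH.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your inputs.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def find_images(input_path):
    """Return image files at input_path (a single file or a folder), sorted."""
    path = Path(input_path)
    if path.is_file():
        return [path] if path.suffix.lower() in IMAGE_EXTENSIONS else []
    return sorted(
        p for p in path.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def ocr_to_text_files(input_path, output_dir, languages="eng"):
    """OCR every image and write one .txt file per image into output_dir."""
    # Imported here so find_images works even without the OCR packages.
    import pytesseract
    from PIL import Image

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for image_path in find_images(input_path):
        with Image.open(image_path) as img:
            # lang takes Tesseract language codes, e.g. "eng" or "eng+hin".
            text = pytesseract.image_to_string(img, lang=languages)
        target = out / (image_path.stem + ".txt")
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written
```

A call such as `ocr_to_text_files("scans", "out", languages="eng")` would process every image in a hypothetical `scans` folder. Several languages can be combined with `+`, provided each code appears in the output of tesseract --list-langs.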

B. To merge all the text files into a single text file, use merge-text.bat
Syntax:
merge-text.bat path-to-folder-containing-text-files
It creates the merged text file in the same folder as the text files.
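
For systems where a .bat file cannot run, the same idea can be sketched in Python. This is a hedged illustration, not a port of merge-text.bat: the function name `merge_text_files`, the default output name `merged.txt`, and the choice to concatenate files in sorted filename order are assumptions.

```python
from pathlib import Path


def merge_text_files(folder, merged_name="merged.txt"):
    """Concatenate every .txt file in folder into folder/merged_name."""
    folder = Path(folder)
    target = folder / merged_name
    # Collect sources first and skip the output file so it is never merged into itself.
    sources = sorted(p for p in folder.glob("*.txt") if p.name != merged_name)
    with target.open("w", encoding="utf-8") as out:
        for src in sources:
            out.write(src.read_text(encoding="utf-8"))
            out.write("\n")  # separate consecutive files with a newline
    return target
```

Sorting is lexical, so `page10.txt` comes before `page2.txt`. Zero-padded names such as `page002.txt` keep the pages in reading order.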

C. To convert the pages of one or more pdf files to images, use pdf2image.py

  1. Run the command: pip install pymupdf
  2. It takes the input argument as a path to a single pdf file or a path to a folder containing multiple pdf files.
  3. It does not take an output path argument. For each pdf file it creates a folder named after the pdf, in the same location as the pdf, and stores the converted pages there as images.
    Syntax:
    pdf2image.py path-to-pdf-or-folder-containing-pdf
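
The behavior described above can be sketched with PyMuPDF as follows. This is a minimal sketch, not the actual pdf2image.py: the helper names, the `page_001.png` naming scheme, the PNG format, and the 200 dpi default are assumptions. It relies on `fitz.open`, `Page.get_pixmap(dpi=...)` (available in recent PyMuPDF releases), and `Pixmap.save`.

```python
from pathlib import Path


def find_pdfs(input_path):
    """Return the pdf files at input_path (a single file or a folder), sorted."""
    path = Path(input_path)
    if path.is_file():
        return [path] if path.suffix.lower() == ".pdf" else []
    return sorted(
        p for p in path.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def output_folder_for(pdf_path):
    """Folder named after the pdf, located next to the pdf."""
    pdf_path = Path(pdf_path)
    return pdf_path.parent / pdf_path.stem


def pdf_to_images(input_path, dpi=200):
    """Render every page of every pdf to a PNG inside its own folder."""
    import fitz  # PyMuPDF; imported here so the path helpers work without it

    saved = []
    for pdf_path in find_pdfs(input_path):
        out_dir = output_folder_for(pdf_path)
        out_dir.mkdir(exist_ok=True)
        with fitz.open(str(pdf_path)) as doc:
            for number, page in enumerate(doc, start=1):
                pix = page.get_pixmap(dpi=dpi)  # assumed resolution
                image_path = out_dir / f"page_{number:03d}.png"  # assumed naming
                pix.save(str(image_path))
                saved.append(image_path)
    return saved
```

The resulting image folder can then be passed to image2Text.py as the input path.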
