This project is a Python-based Optical Character Recognition (OCR) system that extracts text from images using OpenCV and Tesseract OCR. The program fetches images from provided URLs, preprocesses them, and then applies OCR to extract text.
The project is also available as an .ipynb file, which you can download and run directly in a Jupyter notebook environment.
- Download Images: Fetches images from given URLs and ensures they are in a valid format.
- Preprocessing: Converts images to grayscale and applies thresholding for improved text recognition.
- OCR Implementation: Uses Tesseract OCR to extract text from images.
- Visualization: Displays the original and processed images using Matplotlib.
- Programming Language: Python
- Libraries: OpenCV, Pillow (PIL), Requests, NumPy, Matplotlib, pytesseract (Python wrapper for Tesseract OCR)
Make sure you have Python installed (preferably Python 3.8 or higher). Then, install the required dependencies:
pip install opencv-python numpy requests pillow matplotlib pytesseract
Download and install Tesseract OCR. After installation, add it to your system's PATH.
sudo apt install tesseract-ocr # For Ubuntu/Debian
brew install tesseract # For macOS
Run the script and provide image URLs when prompted:
python ocr_image_extractor.py
Enter image URLs (comma-separated): https://example.com/image1.jpg, https://example.com/image2.png
Extracted Text from Image 1:
Hello, this is an example!
--------------------------------------------------
Extracted Text from Image 2:
Sample OCR output text.
--------------------------------------------------
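As a rough illustration of how the comma-separated input shown above might be turned into a list of URLs, here is a small sketch. The helper name `parse_image_urls` is hypothetical, and the rule of accepting only `http` and `https` URLs is an assumption, not necessarily what the script does.

```python
from typing import List
from urllib.parse import urlparse


def parse_image_urls(raw: str) -> List[str]:
    """Split comma-separated input and keep only well-formed HTTP(S) URLs."""
    urls = []
    for part in raw.split(","):
        candidate = part.strip()
        if not candidate:
            continue  # skip empty entries such as a trailing comma
        parsed = urlparse(candidate)
        # Assumed rule: require an http/https scheme and a host name.
        if parsed.scheme in ("http", "https") and parsed.netloc:
            urls.append(candidate)
    return urls


if __name__ == "__main__":
    raw_input_line = "https://example.com/image1.jpg, https://example.com/image2.png"
    for index, url in enumerate(parse_image_urls(raw_input_line), start=1):
        print(f"Image {index}: {url}")
```

Filtering at input time means a malformed entry is skipped before any download is attempted, rather than failing partway through the run.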
- ocr_image_extractor.py: Main script for downloading images, preprocessing, applying OCR, and displaying results.
- requirements.txt: List of required Python dependencies.
- Support for batch processing multiple images simultaneously.
- Integration with a graphical user interface (GUI).
- Support for more advanced image processing techniques.
This project is licensed under the MIT License.
Pull requests are welcome. For major changes, please open an issue first to discuss what you’d like to change.
For any inquiries, contact Vigneshwaran at chokkalingamvigneshwaran@gmail.com.