A lightweight, end-to-end document scanner built with OpenCV and Tesseract OCR. It detects a document (such as a receipt, sheet of paper, or form) in a photo, warps it to a flat top-down view, enhances the result, and then uses OCR to extract its text.
| Tool/Library | Purpose |
|---|---|
| Python | Main programming language |
| OpenCV | Image processing & computer vision |
| NumPy | Numerical operations |
| Matplotlib | Debugging and image visualization |
| PIL (Pillow) | Exporting scanned images to PDF |
| pytesseract | Python wrapper for the Tesseract OCR engine |
- Image Preprocessing: Grayscale, Gaussian Blur, Thresholding
- Contour Detection: Finding document edges
- Perspective Transformation: Warping the document into a flat scan
- Morphological Operations: Dilation to enhance edge detection
- Canny Edge Detection: Identifying boundaries
- OCR: Tesseract recognizes text in the scanned, binarized image
```mermaid
graph TD
    A[Original Image] --> B[Grayscale + Blur]
    B --> C[Adaptive Threshold + Dilation]
    C --> D[Canny Edge Detection]
    D --> E[Find Contours]
    E --> F[Detect 4-Point Document]
    F --> G[Perspective Transform]
    G --> H[Enhance / Binarize]
    H --> I[OCR with Tesseract]
```
The image is loaded into the system for processing.
The image is first converted to grayscale to simplify processing. A Gaussian blur is then applied to suppress noise that would otherwise produce spurious edges.
We apply adaptive thresholding to convert the image into a binary (black and white) format and use dilation to emphasize edges.
Canny edge detection is applied to identify the boundaries within the image.
Contours are then extracted from the edge map, and the largest contour that simplifies to four corners is taken as the document outline.
If a clean 4-point contour is found, its corners are used as the document boundary. Otherwise, the user can select the four corners manually.
Once the four corners are known, the document is warped into a top-down view to simulate a scan.
After the transformation, the image is enhanced and binarized to increase contrast for OCR.
Finally, we use Tesseract OCR to extract text from the binarized document.
Ensure you have Python installed (Python 3.6 or higher).
You can install all the necessary dependencies by running the following command:
```bash
pip install opencv-python numpy pytesseract Pillow matplotlib
```
Windows: Download and install Tesseract OCR from here, then add the installation directory to your `PATH` environment variable.
Linux (Ubuntu): Use the following command to install Tesseract:
```bash
sudo apt install tesseract-ocr
```
Clone this repository to your local machine and enter the project directory:
```bash
git clone https://github.com/krushangptl/Doc-Scanner-Project
cd Doc-Scanner-Project
```
- Edge Detection: Detects the edges of the document using Canny edge detection.
- Perspective Transformation: Warps the document to create a top-down scan.
- OCR Integration: Extracts text from scanned documents using Tesseract OCR.
- Output Options: Supports exporting the result as a JPG, PNG, PDF, or TXT file.
- Manual Mode: Allows users to manually select the corners of the document if automatic detection fails.
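The export options could be sketched with Pillow, which the project uses for PDF output. The helper `save_outputs` and its file-naming scheme are hypothetical, and the 300 DPI PDF resolution is an assumed value.

```python
from pathlib import Path
from typing import List

import numpy as np
from PIL import Image


def save_outputs(scan: np.ndarray, text: str, stem: str) -> List[Path]:
    """Save `scan` as PNG, JPG and PDF and `text` as TXT, all named after `stem`."""
    if scan.ndim == 3:
        scan = scan[:, :, ::-1]  # OpenCV BGR -> RGB for Pillow
    # 2-D uint8 arrays become mode "L"; (h, w, 3) uint8 arrays become "RGB"
    img = Image.fromarray(np.ascontiguousarray(scan))
    base = Path(stem)
    paths = []
    for ext in ("png", "jpg"):
        path = base.parent / f"{base.name}.{ext}"
        img.save(path)  # Pillow infers the format from the extension
        paths.append(path)
    pdf = base.parent / f"{base.name}.pdf"
    img.save(pdf, "PDF", resolution=300.0)
    txt = base.parent / f"{base.name}.txt"
    txt.write_text(text, encoding="utf-8")
    return paths + [pdf, txt]
```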
This project shows how OpenCV and Tesseract OCR can be combined into a practical document scanner that covers every step from preprocessing to text extraction, and it serves as a base that can be extended with further features.