OCR Web Application

A Streamlit web application that uses PyOCR to extract text from images and PDFs.

Prerequisites

Before running this application, you need to install Tesseract-OCR and Poppler on your system:

Ubuntu/Debian

sudo apt-get update
sudo apt-get install tesseract-ocr poppler-utils

Windows

Download and install Tesseract-OCR from: https://github.com/UB-Mannheim/tesseract/wiki
Download and install Poppler from: http://blog.alivate.com.au/poppler-windows/
Add the Tesseract install directory and the Poppler bin directory to your system PATH

macOS

brew install tesseract poppler
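Whichever platform you are on, it can help to confirm that both tools are reachable before starting the app. The sketch below is not part of this repository; it assumes the Tesseract executable is named `tesseract` and uses `pdftoppm` (one of the Poppler command-line tools) as a stand-in for "Poppler is installed". The function name `missing_binaries` is hypothetical.

```python
import shutil

# Executables the prerequisites are expected to put on PATH.
# Assumption: "pdftoppm" is used here as a representative Poppler tool.
REQUIRED_BINARIES = ["tesseract", "pdftoppm"]


def missing_binaries(names=REQUIRED_BINARIES):
    """Return the executables from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Tesseract and Poppler look reachable.")
```

If either name is reported as missing on Windows, the PATH entries are the first thing to check.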

Setup

Create a virtual environment (recommended):

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install the required Python packages:

pip install -r requirements.txt
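To check that the install succeeded, a short script can test whether the expected modules are importable. This is a sketch under assumptions: `streamlit` and `pyocr` are named in this README, while `pdf2image` and `PIL` (Pillow) are guesses at what requirements.txt pulls in for the PDF and image handling. Adjust the list to match the actual file.

```python
import importlib.util

# Import names, not pip names. streamlit and pyocr come from this README;
# pdf2image and PIL (Pillow) are assumptions about requirements.txt.
EXPECTED_MODULES = ["streamlit", "pyocr", "pdf2image", "PIL"]


def missing_modules(names=EXPECTED_MODULES):
    """Return the top-level modules from `names` that are not importable."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Missing modules:", missing_modules() or "none")
```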

Running the Application

Activate your virtual environment if not already activated:

source venv/bin/activate  # On Windows: venv\Scripts\activate

Run the Streamlit app:

streamlit run app.py

Open your web browser and navigate to the URL shown in the terminal (typically http://localhost:8501)

Usage

For Images:

Upload an image file (PNG, JPG, or JPEG)
Click "Extract Text" to process the image
View the extracted text
Download the text if needed
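For readers curious what the image path might look like in code, here is a minimal sketch using PyOCR and Pillow. It is not taken from app.py; the function names `is_supported_image` and `extract_text_from_image` are hypothetical, and the use of Pillow to open the image is an assumption.

```python
from pathlib import Path

# Image extensions accepted by the uploader, per the usage steps above.
SUPPORTED_IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def is_supported_image(filename):
    """True if the file name ends in one of the supported image extensions."""
    return Path(filename).suffix.lower() in SUPPORTED_IMAGE_SUFFIXES


def extract_text_from_image(path, lang="eng"):
    """Run OCR on one image file and return the recognized text."""
    # Imported here so the extension check above works without OCR dependencies.
    import pyocr
    import pyocr.builders
    from PIL import Image

    tools = pyocr.get_available_tools()
    if not tools:
        raise RuntimeError("No OCR tool found; is Tesseract installed and on PATH?")
    tool = tools[0]  # first available backend, typically Tesseract

    with Image.open(path) as image:
        return tool.image_to_string(
            image, lang=lang, builder=pyocr.builders.TextBuilder()
        )
```

Calling `extract_text_from_image("scan.png")` would return the text as a plain string, which is what a download step can then write to a .txt file.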

For PDFs:

Upload a PDF file
Select a specific page to process or choose to process all pages
Click "Extract Text" to process the selected page, or "Process All Pages" to extract text from every page
View the extracted text
Download the text (individual page or all pages)
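The PDF steps above can be sketched roughly as follows: render each requested page to an image with Poppler, then run OCR on it. This assumes the `pdf2image` package as the Poppler wrapper, which this README does not confirm; `pages_to_process` and `extract_text_from_pdf` are hypothetical names, not functions from app.py.

```python
def pages_to_process(total_pages, selected_page=None):
    """Return 1-based page numbers: the selected page, or all pages if none is given."""
    if total_pages < 1:
        raise ValueError("PDF has no pages")
    if selected_page is None:
        return list(range(1, total_pages + 1))
    if not 1 <= selected_page <= total_pages:
        raise ValueError(f"page {selected_page} is outside 1..{total_pages}")
    return [selected_page]


def extract_text_from_pdf(pdf_path, selected_page=None, lang="eng"):
    """OCR the selected page (or every page) and return {page_number: text}."""
    # Imported here so pages_to_process() works without OCR dependencies.
    import pyocr
    import pyocr.builders
    from pdf2image import convert_from_path, pdfinfo_from_path

    tools = pyocr.get_available_tools()
    if not tools:
        raise RuntimeError("No OCR tool found; is Tesseract installed and on PATH?")
    tool = tools[0]

    total_pages = pdfinfo_from_path(pdf_path)["Pages"]
    results = {}
    for number in pages_to_process(total_pages, selected_page):
        # Render one page at a time to keep memory use bounded on large PDFs.
        image = convert_from_path(pdf_path, first_page=number, last_page=number)[0]
        results[number] = tool.image_to_string(
            image, lang=lang, builder=pyocr.builders.TextBuilder()
        )
    return results
```

Rendering page by page is slower than converting the whole file at once, but it avoids holding every page image in memory for long documents.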

Features

Image upload support (PNG, JPG, JPEG)
PDF upload support with multi-page processing
Page selection for PDFs
Real-time text extraction
Download extracted text
User-friendly interface
Support for multiple file formats

Notes

For best results, use clear, well-lit images with good contrast
The application uses English by default for OCR
Processing time may vary depending on file size and complexity
PDF processing may take longer than image processing due to the conversion step
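On the language note: if you adapt the code for other languages, one cautious approach is to check what Tesseract has installed and fall back to English otherwise. The sketch below is an illustration, not the app's behavior; `pick_language` and `installed_languages` are hypothetical helpers, and language codes such as `deu` follow Tesseract's three-letter naming.

```python
def pick_language(requested, available, default="eng"):
    """Return `requested` if it is installed, otherwise fall back to `default`."""
    if requested in available:
        return requested
    if default in available:
        return default
    raise ValueError(f"neither {requested!r} nor {default!r} is installed")


def installed_languages():
    """Ask the first available PyOCR backend which languages it can use."""
    import pyocr  # imported here so pick_language() works without PyOCR

    tools = pyocr.get_available_tools()
    if not tools:
        raise RuntimeError("No OCR tool found; is Tesseract installed and on PATH?")
    return tools[0].get_available_languages()
```

For example, `pick_language("deu", installed_languages())` yields `"deu"` only when the German language data is present, and `"eng"` otherwise.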
