gpt4o_batch_image_text_extractor

Automated pipeline for batch text extraction from images using GPT-4o Vision.

This project provides a robust Python script to batch process images, leveraging OpenAI's GPT-4o multimodal capabilities to accurately transcribe visible text. It handles error management, retries, and outputs results in both individual Markdown files and a consolidated JSON database.

Features

Batch Processing: Automatically processes all images in the images/ folder.
AI-Powered Extraction: Uses GPT-4o Vision to accurately transcribe text from images.
Error Handling: Automatically moves failed images to a failed/ folder for review.
Retry Mechanism: Options to retry failed images.
Duplicate Prevention: Skips images that have already been processed.
Structured Output: Generates individual .md files and a consolidated extracted_text.json.

Prerequisites

Python 3.8 or higher
An OpenAI API Key

Installation

Clone the repository (or download the files):

git clone <repository-url>
cd camilo_images_to_text

Create a virtual environment (recommended):

python3 -m venv venv
source venv/bin/activate  # On Windows use: venv\Scripts\activate

Install dependencies:
```
pip install requests tqdm python-dotenv
```

Configuration

Create a .env file in the root directory (you can copy the structure from the example below).
Add your OpenAI API Key and the absolute path to the project folder.

Example .env file:

OPENAI_API_KEY=your-openai-api-key-here
BASE_FOLDER=/absolute/path/to/camilo_images_to_text

Note: Ensure BASE_FOLDER points to the exact location where you have this project on your machine.

Usage

Place your images in the images/ folder. Supported formats: .jpg, .jpeg, .png.
Run the script:
```
python3 script.py
```
Check the results:
- md_results/: Contains one Markdown file per image with the extracted text.
- extracted_text.json: A single JSON file containing all extracted text mapped to filenames.
- failed/: Contains images that could not be processed (e.g., no text found or API errors).

Project Structure

script.py: Main script that handles the logic.
images/: Input folder for images to be processed.
md_results/: Output folder for individual text files.
failed/: Folder for images that failed processing.
extracted_text.json: Database of all extracted text.
.env: Configuration file for secrets and paths.

Troubleshooting

"No module named...": Make sure you activated the virtual environment and installed the requirements.
API Errors: Check your API Key in .env and ensure you have credits in your OpenAI account.
Path Errors: Verify that BASE_FOLDER in .env is the correct absolute path.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
images		images
md_results		md_results
.gitignore		.gitignore
README.md		README.md
extracted_text.json		extracted_text.json
script.py		script.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

gpt4o_batch_image_text_extractor

Features

Prerequisites

Installation

Configuration

Usage

Project Structure

Troubleshooting

About

Uh oh!

Releases

Packages

Languages

wizsebastian/gpt4o_batch_image_text_extractor

Folders and files

Latest commit

History

Repository files navigation

gpt4o_batch_image_text_extractor

Features

Prerequisites

Installation

Configuration

Usage

Project Structure

Troubleshooting

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages