Skip to content

Automated pipeline for batch text extraction from images using GPT-4o Vision. This tool processes image collections, handles API retries, and outputs structured data in Markdown and JSON formats.

Notifications You must be signed in to change notification settings

wizsebastian/gpt4o_batch_image_text_extractor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

gpt4o_batch_image_text_extractor

Automated pipeline for batch text extraction from images using GPT-4o Vision.

This project provides a robust Python script to batch process images, leveraging OpenAI's GPT-4o multimodal capabilities to accurately transcribe visible text. It handles error management, retries, and outputs results in both individual Markdown files and a consolidated JSON database.

Features

  • Batch Processing: Automatically processes all images in the images/ folder.
  • AI-Powered Extraction: Uses GPT-4o Vision to accurately transcribe text from images.
  • Error Handling: Automatically moves failed images to a failed/ folder for review.
  • Retry Mechanism: Options to retry failed images.
  • Duplicate Prevention: Skips images that have already been processed.
  • Structured Output: Generates individual .md files and a consolidated extracted_text.json.

Prerequisites

  • Python 3.8 or higher
  • An OpenAI API Key

Installation

  1. Clone the repository (or download the files):

    git clone <repository-url>
    cd camilo_images_to_text
  2. Create a virtual environment (recommended):

    python3 -m venv venv
    source venv/bin/activate  # On Windows use: venv\Scripts\activate
  3. Install dependencies:

    pip install requests tqdm python-dotenv

Configuration

  1. Create a .env file in the root directory (you can copy the structure from the example below).
  2. Add your OpenAI API Key and the absolute path to the project folder.

Example .env file:

OPENAI_API_KEY=your-openai-api-key-here
BASE_FOLDER=/absolute/path/to/camilo_images_to_text

Note: Ensure BASE_FOLDER points to the exact location where you have this project on your machine.

Usage

  1. Place your images in the images/ folder. Supported formats: .jpg, .jpeg, .png.
  2. Run the script:
    python3 script.py
  3. Check the results:
    • md_results/: Contains one Markdown file per image with the extracted text.
    • extracted_text.json: A single JSON file containing all extracted text mapped to filenames.
    • failed/: Contains images that could not be processed (e.g., no text found or API errors).

Project Structure

  • script.py: Main script that handles the logic.
  • images/: Input folder for images to be processed.
  • md_results/: Output folder for individual text files.
  • failed/: Folder for images that failed processing.
  • extracted_text.json: Database of all extracted text.
  • .env: Configuration file for secrets and paths.

Troubleshooting

  • "No module named...": Make sure you activated the virtual environment and installed the requirements.
  • API Errors: Check your API Key in .env and ensure you have credits in your OpenAI account.
  • Path Errors: Verify that BASE_FOLDER in .env is the correct absolute path.

About

Automated pipeline for batch text extraction from images using GPT-4o Vision. This tool processes image collections, handles API retries, and outputs structured data in Markdown and JSON formats.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages