Automated pipeline for batch text extraction from images using GPT-4o Vision.
This project provides a robust Python script to batch process images, leveraging OpenAI's GPT-4o multimodal capabilities to accurately transcribe visible text. It handles error management, retries, and outputs results in both individual Markdown files and a consolidated JSON database.
- Batch Processing: Automatically processes all images in the
images/folder. - AI-Powered Extraction: Uses GPT-4o Vision to accurately transcribe text from images.
- Error Handling: Automatically moves failed images to a
failed/folder for review. - Retry Mechanism: Options to retry failed images.
- Duplicate Prevention: Skips images that have already been processed.
- Structured Output: Generates individual
.mdfiles and a consolidatedextracted_text.json.
- Python 3.8 or higher
- An OpenAI API Key
-
Clone the repository (or download the files):
git clone <repository-url> cd camilo_images_to_text
-
Create a virtual environment (recommended):
python3 -m venv venv source venv/bin/activate # On Windows use: venv\Scripts\activate
-
Install dependencies:
pip install requests tqdm python-dotenv
- Create a
.envfile in the root directory (you can copy the structure from the example below). - Add your OpenAI API Key and the absolute path to the project folder.
Example .env file:
OPENAI_API_KEY=your-openai-api-key-here
BASE_FOLDER=/absolute/path/to/camilo_images_to_textNote: Ensure
BASE_FOLDERpoints to the exact location where you have this project on your machine.
- Place your images in the
images/folder. Supported formats:.jpg,.jpeg,.png. - Run the script:
python3 script.py
- Check the results:
md_results/: Contains one Markdown file per image with the extracted text.extracted_text.json: A single JSON file containing all extracted text mapped to filenames.failed/: Contains images that could not be processed (e.g., no text found or API errors).
script.py: Main script that handles the logic.images/: Input folder for images to be processed.md_results/: Output folder for individual text files.failed/: Folder for images that failed processing.extracted_text.json: Database of all extracted text..env: Configuration file for secrets and paths.
- "No module named...": Make sure you activated the virtual environment and installed the requirements.
- API Errors: Check your API Key in
.envand ensure you have credits in your OpenAI account. - Path Errors: Verify that
BASE_FOLDERin.envis the correct absolute path.