A streamlined PowerShell toolkit for converting images to markdown using Azure AI Foundry's vision models. This tool performs OCR (Optical Character Recognition) on images and generates properly formatted markdown files with optional YAML front matter.
- Azure AI Foundry Integration: Leverages Azure AI Foundry's GPT-4o vision models for accurate text extraction
- Batch Processing: Process entire directories of images automatically
- Flexible Output: Optional YAML front matter for static site generators
- Multiple Image Formats: Supports PNG, JPG, JPEG, BMP, GIF, and WebP
- Recursive Processing: Handle nested directory structures
- Environment Auto-Detection: Automatically loads configuration from multiple locations
- Error Handling: Robust error handling with detailed logging
- PowerShell 5.1 or later
- Azure AI Foundry or Azure OpenAI service with vision-enabled model deployment
- Valid Azure credentials
Create an ai-foundry.env file in one of these locations:
- ./ai-foundry.env (same directory as scripts)
- ../ai-foundry.env (parent directory)
- ../../config/ai-foundry.env (config directory)
# Required: Azure AI Foundry/OpenAI Configuration
AZURE_AI_FOUNDRY_ENDPOINT=https://your-foundry-endpoint.openai.azure.com
AZURE_AI_FOUNDRY_KEY=your-api-key
# Alternative: Azure OpenAI Configuration (fallback)
AZURE_OPENAI_ENDPOINT=https://your-openai-endpoint.openai.azure.com
AZURE_OPENAI_API_KEY=your-api-key
# Optional: API Version (defaults to 2025-01-01-preview)
AZURE_OPENAI_API_VERSION=2025-01-01-preview
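The lookup order and fallback variables described above could be implemented roughly as follows. This is a minimal sketch, not the scripts' actual code: the function names `Find-EnvFile`, `Read-EnvFile`, and `Resolve-FoundrySettings` are hypothetical, and the parsing rules (skip `#` comments, split on the first `=`) are assumptions about the file format.

```powershell
# Hypothetical helper: return the first existing ai-foundry.env among the
# three documented locations, or $null if none exists.
function Find-EnvFile {
    param([Parameter(Mandatory)][string]$BaseDirectory)

    $parent = Join-Path $BaseDirectory '..'
    $config = Join-Path (Join-Path $parent '..') 'config'
    $candidates = @(
        (Join-Path $BaseDirectory 'ai-foundry.env'),
        (Join-Path $parent 'ai-foundry.env'),
        (Join-Path $config 'ai-foundry.env')
    )
    foreach ($candidate in $candidates) {
        if (Test-Path -LiteralPath $candidate -PathType Leaf) { return $candidate }
    }
    return $null
}

# Hypothetical helper: parse KEY=VALUE lines into a hashtable.
function Read-EnvFile {
    param([Parameter(Mandatory)][string]$Path)

    $settings = @{}
    foreach ($line in Get-Content -LiteralPath $Path) {
        $trimmed = $line.Trim()
        # Skip blank lines and comments
        if ($trimmed -eq '' -or $trimmed.StartsWith('#')) { continue }
        # Split on the first '=' only, so values may themselves contain '='
        $parts = $trimmed -split '=', 2
        if ($parts.Count -eq 2) {
            $settings[$parts[0].Trim()] = $parts[1].Trim().Trim('"')
        }
    }
    return $settings
}

# Hypothetical helper: prefer the AZURE_AI_FOUNDRY_* values and fall back to
# the AZURE_OPENAI_* ones; the API version default mirrors the sample above.
function Resolve-FoundrySettings {
    param([Parameter(Mandatory)][hashtable]$Settings)

    $endpoint = if ($Settings['AZURE_AI_FOUNDRY_ENDPOINT']) { $Settings['AZURE_AI_FOUNDRY_ENDPOINT'] } else { $Settings['AZURE_OPENAI_ENDPOINT'] }
    $key      = if ($Settings['AZURE_AI_FOUNDRY_KEY']) { $Settings['AZURE_AI_FOUNDRY_KEY'] } else { $Settings['AZURE_OPENAI_API_KEY'] }
    $version  = if ($Settings['AZURE_OPENAI_API_VERSION']) { $Settings['AZURE_OPENAI_API_VERSION'] } else { '2025-01-01-preview' }

    if (-not $endpoint -or -not $key) {
        throw 'Azure AI Foundry credentials not found'
    }
    [pscustomobject]@{ Endpoint = $endpoint.TrimEnd('/'); ApiKey = $key; ApiVersion = $version }
}
```

A caller would typically pass `$PSScriptRoot` as `-BaseDirectory` so that the relative locations are resolved against the script's own folder rather than the current working directory.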
Ensure you have a vision-capable model deployed in Azure AI Foundry:
- GPT-4o (recommended, default)
- GPT-4 Vision
- GPT-4 Turbo with Vision
- Clone Repository: Download the PowerShell scripts to your local machine
- Configure Environment: Set up your Azure AI Foundry credentials
- Verify Model Access: Ensure you have a vision-capable model deployed
- Prepare Images: Organize your image files in dedicated directories
- Choose Processing Mode: Single directory or batch processing
- Execute Scripts: Run conversion with your desired parameters
# Basic usage - process images in a folder
.\image-to-markdown-foundry.ps1 -ImageFolderPath "C:\path\to\images"
# Custom output directory
.\image-to-markdown-foundry.ps1 -ImageFolderPath "C:\path\to\images" -OutputFolderPath "C:\output\folder"
# Include YAML front matter
.\image-to-markdown-foundry.ps1 -ImageFolderPath "C:\path\to\images" -IncludeYamlFrontMatter
# Process subdirectories recursively
.\image-to-markdown-foundry.ps1 -ImageFolderPath "C:\path\to\images" -Recursive
# Use specific deployment and custom prompt
.\image-to-markdown-foundry.ps1 -ImageFolderPath "C:\path\to\images" -DeploymentName "gpt-4-vision" -SystemPrompt "Extract text with emphasis on preserving table structures"
# Process all image directories under a root path
.\batch-image-to-markdown-foundry.ps1 -RootDirectory "C:\screenshots" -OutputBaseDirectory "C:\markdown-output"
# Include YAML front matter for all processed files
.\batch-image-to-markdown-foundry.ps1 -RootDirectory "C:\screenshots" -IncludeYamlFrontMatter
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| ImageFolderPath | String | Yes | - | Path to folder containing images |
| OutputFolderPath | String | No | {ImageFolderPath}\markdown-output | Output directory for markdown files |
| DeploymentName | String | No | gpt-4o | Azure AI model deployment name |
| MaxTokens | Int | No | 4000 | Maximum tokens for vision analysis |
| SystemPrompt | String | No | Default OCR prompt | Custom system prompt for text extraction |
| IncludeYamlFrontMatter | Switch | No | false | Include YAML front matter in output |
| Recursive | Switch | No | false | Process subdirectories recursively |
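To make the role of DeploymentName, MaxTokens, and SystemPrompt more concrete, the sketch below shows how they might map onto an Azure OpenAI chat completions request that carries the image as a base64 data URL. The function name `New-VisionRequestBody`, the default prompt text, and the user message wording are assumptions for illustration; the scripts' real prompt and request construction may differ.

```powershell
# Hypothetical helper: build the JSON body for a vision chat completion.
function New-VisionRequestBody {
    param(
        [Parameter(Mandatory)][string]$ImagePath,
        # Placeholder prompt; the scripts' default OCR prompt is not shown in this README
        [string]$SystemPrompt = 'Extract all text from the image and format it as markdown.',
        [int]$MaxTokens = 4000
    )

    # Resolve to a full path because .NET file APIs ignore the PowerShell location
    $fullPath = (Resolve-Path -LiteralPath $ImagePath).ProviderPath
    $base64 = [System.Convert]::ToBase64String([System.IO.File]::ReadAllBytes($fullPath))

    $mime = switch ([System.IO.Path]::GetExtension($fullPath).ToLowerInvariant()) {
        '.png'  { 'image/png' }
        '.jpg'  { 'image/jpeg' }
        '.jpeg' { 'image/jpeg' }
        '.bmp'  { 'image/bmp' }
        '.gif'  { 'image/gif' }
        '.webp' { 'image/webp' }
        default { 'application/octet-stream' }
    }

    $body = @{
        messages = @(
            @{ role = 'system'; content = $SystemPrompt },
            @{ role = 'user'; content = @(
                @{ type = 'text'; text = 'Convert this image to markdown.' },
                @{ type = 'image_url'; image_url = @{ url = "data:${mime};base64,${base64}" } }
            ) }
        )
        max_tokens = $MaxTokens
    }
    # -Depth is needed because the default depth would truncate the nested content
    return ($body | ConvertTo-Json -Depth 10)
}

# Sending the request (not executed here; needs a real endpoint, key, and deployment):
# $uri = "$endpoint/openai/deployments/$DeploymentName/chat/completions?api-version=$apiVersion"
# $response = Invoke-RestMethod -Uri $uri -Method Post -Headers @{ 'api-key' = $apiKey } `
#     -ContentType 'application/json' -Body $json
# $markdown = $response.choices[0].message.content
```

The deployment name appears only in the URL, which is why the body itself does not mention a model.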
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| RootDirectory | String | No | ../../data/screenshots | Root directory to search for images |
| OutputBaseDirectory | String | No | ../../data/markdown-output | Output base directory |
| IncludeYamlFrontMatter | Switch | No | false | Include YAML front matter in all outputs |
| DeploymentName | String | No | gpt-4o | Azure AI model deployment name |
Standard output (without YAML front matter):
# Image Title
> Extracted from image: screenshot.png
[Extracted text content here]
Output with -IncludeYamlFrontMatter:
---
title: "Image Title"
date: "2025-06-05"
type: "image_extraction"
source_image: "screenshot.png"
extraction_method: "azure_ai_foundry"
---
# Image Title
> Extracted from image: screenshot.png
[Extracted text content here]
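The two output shapes above differ only in the leading front matter block, so both can come from one function with a switch. The following is a minimal sketch under that assumption; `New-MarkdownDocument` is a hypothetical name, and the field values simply mirror the sample output rather than the scripts' actual code.

```powershell
# Hypothetical helper: assemble the markdown text for one extracted image.
function New-MarkdownDocument {
    param(
        [Parameter(Mandatory)][string]$Title,
        [Parameter(Mandatory)][string]$SourceImage,
        [Parameter(Mandatory)][string]$ExtractedText,
        [switch]$IncludeYamlFrontMatter,
        [datetime]$Date = (Get-Date)
    )

    $lines = New-Object System.Collections.Generic.List[string]

    if ($IncludeYamlFrontMatter) {
        # Escape embedded double quotes so the YAML string stays valid
        $safeTitle = $Title.Replace('"', '\"')
        $lines.Add('---')
        $lines.Add(('title: "{0}"' -f $safeTitle))
        $lines.Add(('date: "{0}"' -f $Date.ToString('yyyy-MM-dd')))
        $lines.Add('type: "image_extraction"')
        $lines.Add(('source_image: "{0}"' -f $SourceImage))
        $lines.Add('extraction_method: "azure_ai_foundry"')
        $lines.Add('---')
        $lines.Add('')
    }

    $lines.Add("# $Title")
    $lines.Add('')
    $lines.Add("> Extracted from image: $SourceImage")
    $lines.Add('')
    $lines.Add($ExtractedText)

    return ($lines -join "`n")
}
```

The result could then be written with `Set-Content -LiteralPath $outputFile -Value $text -Encoding UTF8`, where `$outputFile` is whatever path the caller derives from the image name.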
- PNG (.png)
- JPEG (.jpg, .jpeg)
- Bitmap (.bmp)
- GIF (.gif)
- WebP (.webp)
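Selecting only the supported formats, with or without recursion, might look like the sketch below. `Get-ImageFile` is a hypothetical helper name; the extension list is taken from the formats listed above, and the case-insensitive matching is an assumption.

```powershell
# Hypothetical helper: list supported image files in a folder.
function Get-ImageFile {
    param(
        [Parameter(Mandatory)][string]$ImageFolderPath,
        [switch]$Recursive
    )

    $extensions = @('.png', '.jpg', '.jpeg', '.bmp', '.gif', '.webp')

    # -Recurse:$Recursive forwards the switch value, so one call covers both modes
    Get-ChildItem -LiteralPath $ImageFolderPath -File -Recurse:$Recursive |
        Where-Object { $extensions -contains $_.Extension.ToLowerInvariant() } |
        Sort-Object FullName
}
```

Filtering on the `Extension` property avoids relying on wildcard behavior of `-Include`, which only applies in some combinations of parameters.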
The scripts include comprehensive error handling for:
- Missing environment configuration
- Invalid image paths
- Azure API errors
- File system permissions
- Network connectivity issues
- Token Usage: Each image consumes tokens based on size and complexity
- Rate Limits: Azure AI services have rate limits; large batches may need throttling
- Image Size: Larger images may require higher MaxTokens values
- Cost: Monitor usage in Azure portal to track costs
- "Azure AI Foundry credentials not found"
  - Verify ai-foundry.env file exists and is properly formatted
  - Check environment variable names match exactly
- "No text extracted from image"
  - Verify image contains readable text
  - Try increasing MaxTokens parameter
  - Check image quality and resolution
- API Rate Limit Errors
  - Reduce batch size
  - Add delays between API calls
  - Verify your Azure service tier and limits
- PowerShell Execution Policy
  Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
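For the rate limit advice above, one common way to add delays between API calls is a retry wrapper with exponential backoff. The sketch below is an illustration only: `Invoke-WithRetry` is a hypothetical helper, it retries on any terminating error rather than specifically on HTTP 429, and the attempt count and delays are arbitrary defaults.

```powershell
# Hypothetical helper: run a script block, retrying with a doubling delay on failure.
function Invoke-WithRetry {
    param(
        [Parameter(Mandatory)][scriptblock]$Action,
        [int]$MaxAttempts = 5,
        [double]$InitialDelaySeconds = 2
    )

    $delay = $InitialDelaySeconds
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            return (& $Action)
        }
        catch {
            # Give up and rethrow the original error after the last attempt
            if ($attempt -eq $MaxAttempts) { throw }
            Write-Warning "Attempt $attempt failed: $($_.Exception.Message). Retrying in $delay s."
            Start-Sleep -Milliseconds ([int]($delay * 1000))
            $delay *= 2
        }
    }
}

# Example (not executed here): wrap the API call so transient failures are retried.
# $response = Invoke-WithRetry -Action {
#     Invoke-RestMethod -Uri $uri -Method Post -Headers $headers -Body $json -ContentType 'application/json'
# }
```

Only terminating errors reach the `catch` block, so cmdlets that emit non-terminating errors would need `-ErrorAction Stop` inside the script block.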
- Image Quality: Use high-resolution, clear images for best OCR results
- Batch Size: Process images in smaller batches to avoid rate limits
- Cost Management: Monitor token usage and set up Azure budgets
- Security: Store API keys securely and never commit them to version control
- Backup: Keep original images as backup before processing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- VTT to Markdown Converter - Convert VTT transcript files to markdown