# Image Extraction and Text Recognition Notebook

This notebook demonstrates how to use a trained YOLO model to detect and crop regions of interest from a set of images, and then perform text recognition (OCR) on the cropped images.

## Setup

First, we'll import the necessary libraries. Don't worry if you don't understand all of these - they're tools we'll use throughout the notebook.

In [None]:
import glob
import os
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd
import pytesseract
import torch
from PIL import Image, ImageEnhance
from ultralytics import YOLO
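
`pytesseract` is only a wrapper: it calls the Tesseract OCR program, which has to be installed separately from the Python package. The following is a minimal pre-flight sketch, assuming the program is named `tesseract` and is expected to be on your PATH. `tesseract_available` is a hypothetical helper name, not part of any library.

```python
import shutil


def tesseract_available(executable="tesseract"):
    """Return True if the given executable can be found on the PATH."""
    return shutil.which(executable) is not None


if not tesseract_available():
    print("Tesseract executable not found on PATH; the OCR cell will fail.")
```

If the check fails, install Tesseract for your operating system before running the text recognition cell.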

## Loading the Model and Setting Up Directories

Now, we'll load our trained YOLO model and set up the directories for our input and output images.

In [None]:
# Load the trained model (adjust the path to where your model is stored).
# This is the model you trained in the step 4 notebook.
model = YOLO('finetunedmodel.pt')

# Directory containing your original images
source = 'images'
# Directory where extracted images will be saved
output_folder = 'extractedimages'
os.makedirs(output_folder, exist_ok=True)

## Gathering Image Paths

This step collects the paths of all images in our source directory, including any subfolders. It looks for common image file types: .png, .jpg, and .jpeg.

In [None]:
# Gather all images using a glob pattern
image_extensions = ('*.png', '*.jpg', '*.jpeg')
image_paths = []
for ext in image_extensions:
    image_paths.extend(glob.glob(f'{source}/**/{ext}', recursive=True))
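
On case-sensitive filesystems, which are typical on Linux, the glob patterns above do not match a file named `scan.JPG`. As an alternative sketch (not the method used in the cell above), the same recursive search can be written with `pathlib` and a case-insensitive suffix check. `find_images` and `IMAGE_SUFFIXES` are hypothetical names chosen for illustration.

```python
from pathlib import Path

# Suffixes are compared in lower case, so .JPG and .jpg both match
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def find_images(root):
    """Recursively collect image files under root, ignoring suffix case."""
    return sorted(
        str(p)
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

With this helper, `image_paths = find_images(source)` would replace the loop. Sorting the result also makes the processing order reproducible between runs.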

## Extracting Images

This is the main part of our script. It processes the images in batches, uses our YOLO model to detect regions of interest, and extracts these regions as separate images.

In [None]:
# Collect one metadata record per cropped image (converted to a DataFrame below)
data = []

# Time tracking
start_time = time.time()

# Process images in batches
batch_size = 8  # Adjust based on your system's memory capacity
for i in range(0, len(image_paths), batch_size):
    batch = image_paths[i:i + batch_size]
    results = model(batch)
    for img_path, result in zip(batch, results):
        with Image.open(img_path) as img:
            width, height = img.size
            for bbox_info in result.boxes:
                # Keep only detections of class 0 (the class we want to extract)
                if int(bbox_info.cls) == 0:
                    bbox = bbox_info.xyxy.cpu().detach().numpy().flatten()
                    x1, y1, x2, y2 = map(int, bbox)
                    cropped_img = img.crop((x1, y1, x2, y2))
                    cropped_name = f"{os.path.splitext(os.path.basename(img_path))[0]}_{x1}_{y1}_{x2}_{y2}.png"
                    cropped_img_path = os.path.join(output_folder, cropped_name)
                    cropped_img.save(cropped_img_path)
                    data.append({
                        'original_filename': os.path.basename(img_path),
                        'cropped_filename': cropped_name,
                        'bbox': (x1, y1, x2, y2),
                        'confidence': float(bbox_info.conf)
                    })

# Save metadata to a CSV file for further processing
df = pd.DataFrame(data)
df.to_csv('cropped_images_metadata.csv', index=False)
end_time = time.time()
print(f"Extraction process completed in {end_time - start_time:.2f} seconds.")
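
Two details of the loop above can be looked at in isolation: the box coordinates are truncated to integers before cropping, and the output filename is built from the source filename plus the box. The sketch below is one possible way to guard against boxes that end up with no area after truncation and clamping. `clamp_box` and `crop_name` are hypothetical helpers, not part of Ultralytics or Pillow.

```python
import os


def clamp_box(box, width, height):
    """Truncate a float (x1, y1, x2, y2) box to ints and clamp it to the image.

    Returns None when the resulting box has zero width or height.
    """
    x1, y1, x2, y2 = (int(v) for v in box)
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    if x2 <= x1 or y2 <= y1:
        return None
    return x1, y1, x2, y2


def crop_name(img_path, box):
    """Build the crop filename from the source basename and the box."""
    stem = os.path.splitext(os.path.basename(img_path))[0]
    return f"{stem}_{box[0]}_{box[1]}_{box[2]}_{box[3]}.png"
```

In the extraction loop, a `None` result would mean skipping that detection. Note that the filename uses only the basename, so two source images with the same name in different subfolders could produce crops with identical names.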

## Text Recognition

Now that we've extracted our images, we'll perform text recognition on them. This process involves checking the readability of the text, enhancing the image quality, and then using OCR (Optical Character Recognition) to extract the text.
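
The OCR code in this section reads only a strip at the bottom of each cropped image, where it expects the caption to be. The sketch below shows that geometry on its own. `caption_box` is a hypothetical helper, and its default `start=0.95` simply mirrors the value used in the cell.

```python
def caption_box(width, height, start=0.95):
    """Return the (left, upper, right, lower) box for the bottom strip.

    The strip begins at `start` times the image height and runs to the bottom.
    """
    upper = int(height * start)
    return (0, upper, width, height)


print(caption_box(800, 400))  # (0, 380, 800, 400): a strip 20 pixels tall
```

A strip this short may be too small for reliable OCR on low-resolution crops, so the fraction is worth tuning to your own images.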

In [None]:
df = pd.read_csv('cropped_images_metadata.csv')

# Function to check text readability and correct orientation for a specific area of the image
def check_text_readability(img):
    img_width, img_height = img.size
    caption_area = img.crop((0, int(img_height * 0.95), img_width, img_height))

    # Try OCR at different rotations: 0, 90, 180, 270 degrees
    for angle in [0, 90, 180, 270]:
        test_img = caption_area.rotate(angle, expand=True)
        test_text = pytesseract.image_to_string(test_img, config='--psm 7')
        if any(char.isalpha() for char in test_text):
            if angle != 0:
                return img.rotate(angle, expand=True)
            return img
    return img

# Function to process each image and measure processing time
def process_image(row):
    cropped_img_path = os.path.join(output_folder, row['cropped_filename'])
    start_time = time.time()  # Start time measurement

    try:
        with Image.open(cropped_img_path) as img:
            img = check_text_readability(img)
            img_width, img_height = img.size
            caption_img = img.crop((0, int(img_height * 0.95), img_width, img_height))
            gray_img = caption_img.convert('L')
            contrast_enhancer = ImageEnhance.Contrast(gray_img)
            enhanced_img = contrast_enhancer.enhance(2)
            threshold = 128
            # Binarize: pixels above the threshold become white, the rest black
            binarized_img = enhanced_img.point(lambda p: 255 if p > threshold else 0)
            text = pytesseract.image_to_string(binarized_img, config='--psm 6')
    except Exception as e:
        print(f"Failed to process {cropped_img_path}: {e}")
        text = ""

    end_time = time.time()  # End time measurement
    processing_time = end_time - start_time
    return text.strip(), processing_time

# Parallel processing and collect times
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(process_image, df.to_dict('records')))

# Unpack results and processing times (guard against an empty result list)
if results:
    text_data, times = zip(*results)
else:
    text_data, times = (), ()

# Add extracted text to the DataFrame
df['extracted_text'] = list(text_data)
df.to_csv('final_output_with_text.csv', index=False)

# Per-image times are summed across worker threads, so this total is
# cumulative processing time, not elapsed wall-clock time
total_time = sum(times)
average_time = total_time / len(times) if times else 0.0
print(f"Cumulative processing time: {total_time:.2f} seconds")
print(f"Average processing time per image: {average_time:.2f} seconds")
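
Because the images are processed on four worker threads, the sum of the per-image times is larger than the time you actually wait for the cell to finish. The sketch below illustrates the difference with a stand-in task. `fake_ocr` is hypothetical and only sleeps instead of calling Tesseract.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_ocr(_):
    """Stand-in for process_image: sleep briefly and return the time taken."""
    start = time.perf_counter()
    time.sleep(0.05)
    return time.perf_counter() - start


wall_start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as executor:
    per_task = list(executor.map(fake_ocr, range(8)))
wall_clock = time.perf_counter() - wall_start
cumulative = sum(per_task)

print(f"Wall-clock: {wall_clock:.2f}s, cumulative: {cumulative:.2f}s")
```

To report how long the OCR step really took, measure the time around the `ThreadPoolExecutor` block in the same way, in addition to the per-image times.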

## Conclusion

This notebook has demonstrated how to:
1. Load a trained YOLO model
2. Process a batch of images to extract regions of interest
3. Perform text recognition on the extracted images
4. Save the results in a CSV file

The final output is saved in 'final_output_with_text.csv', which contains information about each extracted image and its recognized text.
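
One detail to keep in mind when reading the file back: CSV has no tuple type, so the `bbox` column is stored as text such as `"(10, 20, 110, 220)"`. Below is a minimal sketch of parsing it with the standard library. The sample row is made up for illustration and stands in for the real file.

```python
import ast
import csv
import io

# Made-up sample with the same columns as final_output_with_text.csv
sample = io.StringIO(
    "original_filename,cropped_filename,bbox,confidence,extracted_text\n"
    'page.jpg,page_10_20_110_220.png,"(10, 20, 110, 220)",0.91,Fig. 1\n'
)

rows = []
for row in csv.DictReader(sample):
    # literal_eval safely turns the text "(10, 20, 110, 220)" into a tuple
    row["bbox"] = ast.literal_eval(row["bbox"])
    row["confidence"] = float(row["confidence"])
    rows.append(row)

print(rows[0]["bbox"])
```

For the real file, replace `sample` with `open('final_output_with_text.csv', newline='')`.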

Remember to adjust file paths and model names as necessary for your specific setup. Happy image processing!