# OCR Model Training and Prediction Pipeline

## ⚠️ URGENT FIX FOR "ValueError: numpy.dtype size changed" ⚠️

This error indicates a binary incompatibility: **NumPy 2.x** is installed, but **TensorFlow** (or one of its compiled dependencies) was built against **NumPy 1.x**.

**INSTRUCTIONS:**
1.  **Run the cell below** (it uses `%pip` to fix your specific kernel).
2.  **Wait** for it to finish uninstalling and installing.
3.  **Click 'Kernel' -> 'Restart Kernel'** in the top menu.
4.  **Run the verification cell** (the cell right after the install cell) to confirm NumPy is version 1.x (e.g., 1.26.4).
5.  Then run the rest of the notebook.

*Note: If you get 'Permission denied' errors, make sure you are running Jupyter as Administrator, or run `pip install "numpy<2"` in your Anaconda Prompt.*

In [10]:
# CRITICAL FIX: Use %pip magic to ensure we install into the CURRENT kernel
# We force uninstall numpy and reinstall a compatible version (<2.0)
%pip uninstall -y numpy h5py tensorflow
%pip install "numpy<2.0" tensorflow h5py emnist scikit-image opencv-python matplotlib

Found existing installation: numpy 1.26.4
Uninstalling numpy-1.26.4:
  Successfully uninstalled numpy-1.26.4
Found existing installation: h5py 3.15.1
Uninstalling h5py-3.15.1:
  Successfully uninstalled h5py-3.15.1
Found existing installation: tensorflow 2.20.0
Uninstalling tensorflow-2.20.0:
  Successfully uninstalled tensorflow-2.20.0
Note: you may need to restart the kernel to use updated packages.
Collecting numpy<2.0
  Using cached numpy-1.26.4-cp312-cp312-win_amd64.whl.metadata (61 kB)
Collecting tensorflow
  Using cached tensorflow-2.20.0-cp312-cp312-win_amd64.whl.metadata (4.6 kB)
Collecting h5py
  Using cached h5py-3.15.1-cp312-cp312-win_amd64.whl.metadata (3.1 kB)
Using cached numpy-1.26.4-cp312-cp312-win_amd64.whl (15.5 MB)
Using cached tensorflow-2.20.0-cp312-cp312-win_amd64.whl (331.9 MB)
Using cached h5py-3.15.1-cp312-cp312-win_amd64.whl (2.9 MB)
Installing collected packages: numpy, h5py, tensorflow
Successfully installed h5py-3.15.1 numpy-1.26.4 tensorflow-2.20.0
Note: 

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
streamlit 1.32.0 requires protobuf<5,>=3.20, but you have protobuf 6.33.2 which is incompatible.


In [1]:
# VERIFICATION CELL
# Run this AFTER restarting the kernel. 
# It must print a version starting with '1.' (e.g., 1.26.4). 
# If it starts with '2.' (e.g., 2.0.0), the fix didn't work and you need to run the pip command in your terminal.
import numpy as np
print(f"Current Numpy Version: {np.__version__}")

if np.__version__.startswith("2"):
    raise RuntimeError("STOP! Numpy 2.0 is still active. Please restart the kernel again. If that fails, run 'pip install \"numpy<2\"' in your command prompt.")
else:
    print("SUCCESS: Numpy version is compatible. Proceeding to import TensorFlow.")

Current Numpy Version: 1.26.4
SUCCESS: Numpy version is compatible. Proceeding to import TensorFlow.


In [2]:
import numpy as np  # also imported in the verification cell; repeated so this cell runs on its own
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Flatten, Dropout
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt
import cv2
import os
import glob
from skimage.measure import label, regionprops
from emnist import extract_training_samples, extract_test_samples

# Check for GPU
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

Num GPUs Available:  0


## 1. Load and Prepare EMNIST Data
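We use the **Balanced** split (47 classes: digits 0-9, uppercase A-Z, and the 11 lowercase letters a, b, d, e, f, g, h, n, q, r, t; the remaining lowercase letters are merged with their uppercase forms because they look alike).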
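
As a quick sanity check on the class count, the sketch below rebuilds the 47-character label string from its three parts and looks up a few class indices. The names `LABEL_MAP`, `DISTINCT_LOWERCASE`, and `index_to_char` are illustrative only; the resulting string is the same as the `label_map` defined later in this notebook.

```python
import string

# Balanced split: 10 digits + 26 uppercase letters + 11 lowercase letters
# that stay distinct from their uppercase forms.
DISTINCT_LOWERCASE = "abdefghnqrt"
LABEL_MAP = string.digits + string.ascii_uppercase + DISTINCT_LOWERCASE


def index_to_char(class_index: int) -> str:
    """Translate a predicted class index into its character."""
    return LABEL_MAP[class_index]


print(len(LABEL_MAP))     # 47
print(index_to_char(10))  # A
print(index_to_char(36))  # a
```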

In [4]:
print("Loading EMNIST data... (This might take a moment to download)")
X_train, y_train = extract_training_samples('balanced')
X_test, y_test = extract_test_samples('balanced')

# IMPORTANT: EMNIST images are rotated 90 degrees and flipped by default.
# We need to transpose them to look like normal characters.
X_train = X_train.transpose(0, 2, 1)  # swap the row and column axes of every image
X_test = X_test.transpose(0, 2, 1)

# Normalize to [0, 1]
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0

# Reshape for CNN (28, 28, 1)
X_train = np.expand_dims(X_train, -1)
X_test = np.expand_dims(X_test, -1)

# One-hot encode labels
num_classes = len(np.unique(y_train))
y_train_cat = to_categorical(y_train, num_classes)
y_test_cat = to_categorical(y_test, num_classes)

print(f"Data Loaded. Train: {X_train.shape}, Test: {X_test.shape}")

# Mapping for EMNIST Balanced
label_map = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabdefghnqrt"

Loading EMNIST data... (This might take a moment to download)


Downloading emnist.zip: 32.0kB [00:00, 5.97MB/s]


BadZipFile: File is not a zip file
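
A `BadZipFile` error right after a 32 kB download suggests that the file saved to disk is not the dataset archive (the real archive is hundreds of megabytes). One possible recovery, sketched below, is to validate the cached file and delete it when it is not a zip so that the next run downloads it again. The cache location `~/.cache/emnist/emnist.zip` is an assumption about where the `emnist` package stores its download, and `remove_if_corrupt` is a hypothetical helper, not part of that package.

```python
import zipfile
from pathlib import Path

# Assumed cache location; adjust it if your installation stores the archive elsewhere.
CACHE_FILE = Path.home() / ".cache" / "emnist" / "emnist.zip"


def remove_if_corrupt(archive: Path) -> bool:
    """Delete `archive` if it exists but is not a valid zip file.

    Returns True when a corrupt file was removed, False otherwise.
    """
    if archive.exists() and not zipfile.is_zipfile(archive):
        archive.unlink()
        return True
    return False


if remove_if_corrupt(CACHE_FILE):
    print(f"Removed corrupt archive: {CACHE_FILE}")
else:
    print("No corrupt archive found at the assumed location.")
```

After a corrupt file has been removed, rerun the loading cell. If the same error comes back, the download source itself may be returning something other than the archive, in which case the dataset has to be obtained another way and placed at that path.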

## 2. Train the CNN Model
We will train a Convolutional Neural Network and save it as `ocr_model.h5`.
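
To see what the network in the next cell does to a 28x28 input, the sketch below traces the feature-map size through two 3x3 convolutions (no padding, stride 1, which are the Keras defaults) and two 2x2 max-pooling layers. The helpers `conv_out` and `pool_out` are illustrative and are not Keras functions.

```python
def conv_out(size: int, kernel: int) -> int:
    """Output size of an unpadded ('valid') convolution with stride 1."""
    return size - kernel + 1


def pool_out(size: int, pool: int) -> int:
    """Output size of max pooling whose stride equals the pool size."""
    return size // pool


size = 28
size = conv_out(size, 3)  # 26
size = pool_out(size, 2)  # 13
size = conv_out(size, 3)  # 11
size = pool_out(size, 2)  # 5

flat_features = size * size * 64  # 64 filters in the second Conv2D layer
print(size, flat_features)        # 5 1600
```

So the `Flatten` layer hands 1600 values to the 128-unit dense layer.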

In [None]:
model = Sequential([
    Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(num_classes, activation='softmax')
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

print("Starting training...")
history = model.fit(X_train, y_train_cat, epochs=10, batch_size=128, validation_data=(X_test, y_test_cat), verbose=1)

print("Training complete.")

In [None]:
# Save the model
model_filename = 'ocr_model.h5'
model.save(model_filename)
print(f"Model saved as {model_filename}")

## 3. Predict on Handwriting Images
Now we use the trained model to find and recognize text in your images.

**Optimizations included:**
- **Noise Removal:** Gaussian blur + adaptive thresholding + morphological opening.
- **Sorting:** Characters are sorted top-to-bottom, then left-to-right to handle multiple lines correctly.
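
The sorting idea can be tried on plain bounding-box tuples before running it on real regions. The sketch below is a simplified stand-in for the notebook's sorting function: it assumes boxes are `(min_row, min_col, max_row, max_col)` tuples, and `reading_order` is a hypothetical name used only here.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (min_row, min_col, max_row, max_col)


def reading_order(boxes: List[Box], line_threshold: int = 15) -> List[Box]:
    """Group boxes into lines by their top edge, then sort each line left to right."""
    ordered: List[Box] = []
    line: List[Box] = []
    line_top: Optional[int] = None
    for box in sorted(boxes, key=lambda b: b[0]):
        # A box that starts well below the current line opens a new line.
        if line_top is not None and box[0] > line_top + line_threshold:
            ordered.extend(sorted(line, key=lambda b: b[1]))
            line = []
            line_top = None
        if line_top is None:
            line_top = box[0]
        line.append(box)
    ordered.extend(sorted(line, key=lambda b: b[1]))
    return ordered


boxes = [(52, 10, 80, 30), (5, 40, 30, 60), (8, 5, 32, 25), (50, 45, 78, 70)]
print(reading_order(boxes))
# [(8, 5, 32, 25), (5, 40, 30, 60), (52, 10, 80, 30), (50, 45, 78, 70)]
```

Each line is anchored at the top edge of its first box, so a strongly slanted line of handwriting may need a larger `line_threshold`.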

In [None]:
def sort_regions_reading_order(regions, line_threshold=15):
    """
    Sorts regions in reading order (Top -> Bottom, Left -> Right).
    'line_threshold' is the pixel tolerance to consider regions on the same line.
    """
    # Nothing to sort: avoid indexing into an empty list below
    if not regions:
        return []

    # 1. Sort all by Y-coordinate (top to bottom)
    regions = sorted(regions, key=lambda r: r.bbox[0])
    
    lines = []
    current_line = []
    current_y = regions[0].bbox[0]
    
    for r in regions:
        # If this region is significantly lower, start a new line
        if r.bbox[0] > current_y + line_threshold:
            # Sort the completed line by X-coordinate (left to right)
            current_line.sort(key=lambda r: r.bbox[1])
            lines.extend(current_line)
            
            # Start new line
            current_line = [r]
            current_y = r.bbox[0]
        else:
            current_line.append(r)
            
    # Append the last line
    current_line.sort(key=lambda r: r.bbox[1])
    lines.extend(current_line)
    
    return lines

def prepare_segment(segment, target_size=28):
    """
    Resizes and pads a character segment to fit the 28x28 input of the CNN.
    Keeps aspect ratio and centers the image.
    """
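    h, w = segment.shape
    if h == 0 or w == 0:
        return None
    
    # Scale the longer side to fit the canvas, leaving a 4-pixel margin on each side
    padding = 4
    scale = (target_size - 2*padding) / max(h, w)
    # Clamp to at least 1 pixel so very thin segments do not break cv2.resize
    new_h, new_w = max(1, int(h * scale)), max(1, int(w * scale))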
    
    resized = cv2.resize(segment, (new_w, new_h), interpolation=cv2.INTER_AREA)
    canvas = np.zeros((target_size, target_size), dtype='uint8')
    
    # Calculate center offset
    y_off = (target_size - new_h) // 2
    x_off = (target_size - new_w) // 2
    
    canvas[y_off:y_off+new_h, x_off:x_off+new_w] = resized
    return canvas

def predict_receipt(img_path, model):
    if not os.path.exists(img_path):
        print(f"File not found: {img_path}")
        return

    # 1. Read Image
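    original = cv2.imread(img_path)
    if original is None:  # cv2.imread returns None for unreadable or non-image files
        print(f"Could not read image: {img_path}")
        return
    gray = cv2.cvtColor(original, cv2.COLOR_BGR2GRAY)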
    
    # 2. Preprocess (Noise Cleaning)
    # Gaussian Blur to smooth edges
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    # Adaptive Threshold to binarize
    binary = cv2.adaptiveThreshold(blurred, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 2)
    # Morphological clean up
    kernel = np.ones((2,2), np.uint8)
    opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    
    # 3. Find Char Candidates
    lbl = label(opened)
    regions = regionprops(lbl)
    
    # Filter noise by size
    valid_regions = []
    img_area = gray.shape[0] * gray.shape[1]
    for r in regions:
        h = r.bbox[2] - r.bbox[0]
        w = r.bbox[3] - r.bbox[1]
        if h > 10 and w > 5 and h*w < img_area * 0.1: # Min size 10x5, Max 10% of image
            valid_regions.append(r)
            
    if not valid_regions:
        print(f"No text found in {os.path.basename(img_path)}")
        return

    # 4. Sort Candidates (Reading Order)
    sorted_regions = sort_regions_reading_order(valid_regions)
    
    # 5. Predict
    predicted_text = ""
    annotated_img = original.copy()
    
    # To detect spaces, we check horizontal distance between characters
    last_max_col = 0
    
    for i, r in enumerate(sorted_regions):
        minr, minc, maxr, maxc = r.bbox
        
        # Check for space (heuristic: if the horizontal gap is > 20 pixels, add a space)
        # Simplified logic: works well within a single line, but no line breaks are inserted
        if i > 0 and (minc - last_max_col) > 20:
            predicted_text += " "
        last_max_col = maxc

        segment = opened[minr:maxr, minc:maxc]
        nn_input = prepare_segment(segment)
        
        if nn_input is not None:
            # Normalize and Reshape
            nn_input = nn_input.astype('float32') / 255.0
            nn_input = np.expand_dims(nn_input, -1)
            nn_input = np.expand_dims(nn_input, 0)
            
            prediction = model.predict(nn_input, verbose=0)
            char_idx = np.argmax(prediction)
            char = label_map[char_idx]
            predicted_text += char
            
            # Visuals
            cv2.rectangle(annotated_img, (minc, minr), (maxc, maxr), (0, 255, 0), 2)
            cv2.putText(annotated_img, char, (minc, minr-5), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)

    # Show Result
    plt.figure(figsize=(10, 10))
    plt.imshow(cv2.cvtColor(annotated_img, cv2.COLOR_BGR2RGB))
    plt.title(f"Prediction: {predicted_text}")
    plt.axis('off')
    plt.show()
    print(f"File: {os.path.basename(img_path)} | Text: {predicted_text}")
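
The space heuristic used in `predict_receipt` can be checked in isolation on a single line of characters. The sketch below assumes each character is given as `(char, min_col, max_col)` in left-to-right order; `join_with_spaces` is a hypothetical helper, and the 20-pixel gap matches the threshold used above.

```python
from typing import List, Optional, Tuple


def join_with_spaces(chars: List[Tuple[str, int, int]], gap: int = 20) -> str:
    """Join characters of one line, inserting a space where the horizontal gap exceeds `gap`."""
    text = ""
    last_max_col: Optional[int] = None
    for ch, min_col, max_col in chars:
        if last_max_col is not None and min_col - last_max_col > gap:
            text += " "
        text += ch
        last_max_col = max_col
    return text


line = [("H", 0, 18), ("I", 22, 30), ("Y", 60, 80), ("O", 84, 100), ("U", 103, 120)]
print(join_with_spaces(line))  # HI YOU
```

A fixed pixel gap depends on image resolution, so scaling it by the typical character width may be more robust for photos of different sizes.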

In [None]:
# Run prediction on all images in folder
input_folder = "handwritten-receipts"
images = glob.glob(os.path.join(input_folder, "*.jpg")) + glob.glob(os.path.join(input_folder, "*.png"))

print(f"Found {len(images)} images to process.")
for img in images:
    with tf.device('/CPU:0'): # Force CPU if CUDA OOM issues occur with small models
        predict_receipt(img, model)