# Introduction

This example demonstrates a simple OCR model built with the Keras Functional API. Besides combining a CNN with an RNN, it shows how to define a new layer and use it as an "endpoint layer" that implements the CTC loss.

Based on the Keras example: https://keras.io/examples/vision/captcha_ocr/

# Setup

In [3]:
import os
import numpy as np
import matplotlib.pyplot as plt

from pathlib import Path
from collections import Counter

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

In [4]:
from auto_everything.disk import Disk
from auto_everything.terminal import Terminal
disk = Disk()
t = Terminal()

In [5]:
root_folder = disk.create_a_new_folder_under_home("Keras_Lab/ocr_for_captchas")
root_folder

'/home/yingshaoxo/Keras_Lab/ocr_for_captchas'

# Load the data

Download the dataset and extract it to `~/Keras_Lab/ocr_for_captchas/captcha_images_v2/`.

In [6]:
download_file = os.path.join(root_folder, "captcha_images_v2.zip")
images_folder = os.path.join(root_folder, "captcha_images_v2")

if not disk.exists(download_file):
    t.run_command(
        f"""
    wget https://github.com/AakashKumarNain/CaptchaCracker/raw/master/captcha_images_v2.zip -P {root_folder}
        """
    )
    
if not disk.exists(images_folder):
    disk.uncompress(download_file, images_folder)
    
print(f"> The folder now contains {len(os.listdir(images_folder))} entries.")

> The folder now contains 1041 entries.
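
The cell above depends on the third-party `auto_everything` helpers and on the `wget` command. As a rough alternative, the same download-and-extract step can be sketched with only the Python standard library. The function name `fetch_and_unpack` and its parameters are hypothetical, and depending on how the archive is laid out, the images may end up in a subfolder of `target_dir`.

```python
import urllib.request
import zipfile
from pathlib import Path


def fetch_and_unpack(url, zip_path, target_dir):
    """Download `url` to `zip_path` if missing, then extract into `target_dir` if missing."""
    zip_path = Path(zip_path)
    target_dir = Path(target_dir)

    # Skip the download when the archive is already on disk.
    if not zip_path.exists():
        zip_path.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(zip_path))

    # Skip the extraction when the target folder already exists.
    if not target_dir.exists():
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(target_dir)

    return target_dir
```

Calling `fetch_and_unpack(url, download_file, images_folder)` with the dataset URL from the cell above would mirror its two `if not disk.exists(...)` checks.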


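The dataset contains 1040 captcha images stored as PNG files. The directory listing above reports 1041 entries, so one entry in the folder is not a PNG image. The label for each sample is a string: the name of the file without its extension (for example, `226md.png` has the label `226md`).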

We will map each character in the string to an integer for training the model. Similarly, we will need to map the predictions of the model back to strings. For this purpose we will maintain two dictionaries: one mapping characters to integers and one mapping integers back to characters.
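
The two dictionaries described above can be sketched in plain Python as follows. This is a minimal illustration on toy labels, not necessarily how the rest of the notebook builds its mapping; the names `toy_labels`, `vocab`, `encode`, and `decode` are hypothetical.

```python
# Toy labels standing in for the real file names (assumption for illustration).
toy_labels = ["226md", "22d5n"]

# Sort the characters so the mapping is the same on every run;
# iterating over a plain set of strings has no guaranteed order.
vocab = sorted(set(char for label in toy_labels for char in label))

# Characters -> integers, and the inverse mapping integers -> characters.
char_to_num = {char: idx for idx, char in enumerate(vocab)}
num_to_char = {idx: char for char, idx in char_to_num.items()}


def encode(label):
    """Turn a label string into a list of integer ids."""
    return [char_to_num[char] for char in label]


def decode(indices):
    """Turn a list of integer ids back into a label string."""
    return "".join(num_to_char[idx] for idx in indices)


print(encode("226md"))          # [0, 0, 2, 4, 3]
print(decode(encode("22d5n")))  # 22d5n
```

Sorting the character set is a design choice: it keeps the integer assigned to each character stable across runs, which matters if a trained model is reloaded later.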

In [7]:
# Path to the images directory
data_dir = Path(images_folder)

# Get a sorted list of all image paths
images = sorted(str(path) for path in data_dir.glob("*.png"))
images[:2]

['/home/yingshaoxo/Keras_Lab/ocr_for_captchas/captcha_images_v2/226md.png',
 '/home/yingshaoxo/Keras_Lab/ocr_for_captchas/captcha_images_v2/22d5n.png']

In [8]:
# The label is the file name without its extension
labels = [Path(img).stem for img in images]
labels[:2]

['226md', '22d5n']

In [19]:
# Example: collect the unique characters from a list of strings
set(char for string in ["abc", "def"] for char in string)

{'a', 'b', 'c', 'd', 'e', 'f'}

In [20]:
# get char set from the list of labels
characters = set(char for label in labels for char in label)
characters

{'2',
 '3',
 '4',
 '5',
 '6',
 '7',
 '8',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'm',
 'n',
 'p',
 'w',
 'x',
 'y'}

In [21]:
print("Number of images found: ", len(images))
print("Number of labels found: ", len(labels))
print("Number of unique characters: ", len(characters))
print("Characters present: ", characters)

Number of images found:  1040
Number of labels found:  1040
Number of unique characters:  19
Characters present:  {'n', 'x', 'e', 'm', 'f', '6', '3', 'b', '8', 'd', 'p', '7', 'c', '4', '2', 'w', 'y', '5', 'g'}
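
`Counter` is imported in the setup cell but is not used in the cells shown so far. One possible use, sketched below on toy labels (hypothetical stand-ins for the real `labels` list), is to check how label lengths and characters are distributed before fixing the maximum label length.

```python
from collections import Counter

# Toy labels standing in for the real list (assumption for illustration).
toy_labels = ["226md", "22d5n", "2356g"]

# How many labels have each length; a single key means all labels are equally long.
length_counts = Counter(len(label) for label in toy_labels)

# How often each character occurs across all labels.
char_counts = Counter(char for label in toy_labels for char in label)

print(length_counts)
print(char_counts.most_common(3))
```

If `length_counts` has more than one key, shorter labels would need padding up to the maximum length.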


In [22]:
# Batch size for training and validation
batch_size = 16

# Desired image dimensions
img_width = 200
img_height = 50

# Factor by which the image is going to be downsampled
# by the convolutional blocks. We will be using two
# convolution blocks, and each block will have
# a pooling layer which downsamples the features by a factor of 2.
# Hence the total downsampling factor will be 4.
downsample_factor = 4

# Maximum length of any captcha in the dataset
max_length = max(len(label) for label in labels)
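
The comments above say that two pooling layers, each halving the features, give a total downsampling factor of 4. The sketch below works out the resulting feature map size. It assumes that the width axis later serves as the time axis of the recurrent layers, which is an assumption because the model is not shown in this part. It also checks the general CTC requirement that the number of time steps covers every label character plus one blank step between adjacent repeated characters. The helper `min_ctc_steps` is hypothetical.

```python
img_width = 200
img_height = 50
downsample_factor = 4

# Size after two 2x pooling steps (floor division: 50 -> 25 -> 12).
time_steps = img_width // downsample_factor     # 50
feature_rows = img_height // downsample_factor  # 12


def min_ctc_steps(label):
    """Minimum number of time steps CTC needs to emit `label`."""
    # Adjacent identical characters must be separated by a blank step.
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


print(time_steps, feature_rows)  # 50 12
print(min_ctc_steps("226md"))    # 6
assert time_steps >= min_ctc_steps("226md")
```

With 50 time steps and labels of about five characters, there is ample room for the alignment, so the downsampling factor of 4 is not a constraint here.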