---
# Writer Identification System Workbook

## Introduction

In this workbook, you will develop a simple Writer Identification System using classical machine learning and image processing techniques. The system identifies the writer of a given handwritten document by analyzing the text's handwriting style.

### Project Pipeline Overview

1. **Preprocessing Module:** Crop the handwritten region from the image and split it into separate lines.
2. **Feature Extraction Module:** Use Local Binary Patterns (LBP) to extract textural features from each line.
3. **Model Training Module:** Train a k-Nearest Neighbors (KNN) classifier using the extracted features.
4. **Performance Analysis Module:** Analyze the system's performance by comparing predicted results with actual results and calculating the processing time.
5. **Test Generation:** Use test cases generated from the IAM dataset for evaluation.

## 1. Preprocessing Module

In the preprocessing stage, we focus on isolating the handwritten part of the image and then splitting this part into individual lines of text.

### First Step: Crop the Handwritten Region

The handwritten part is between the second and third black lines in the image. Here’s how you can detect and crop this region:

#### Steps:

1. Convert the image to a binary image using Otsu's thresholding.
2. Find all contours in the image.
3. Identify contours that likely represent lines based on their geometry.
4. Sort these lines and use the second and third lines to determine the crop region.
5. Crop and optionally erode the image to reduce noise.

Below is the function to crop the handwritten region. Some parts are missing, which you need to fill in.

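```python
import cv2
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
import os
import time
import statistics

def crop_handwritten_region(img_path):
    img = cv2.imread(img_path)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {img_path}")
    imgray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Otsu's method picks the threshold automatically: ink becomes 0, paper becomes 255
    _, bin_img = cv2.threshold(imgray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    contours, _ = cv2.findContours(bin_img, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

    # Keep the top edge of every wide, flat contour: these are candidates for the ruled lines
    y_array = []
    for cnt in contours:
        x, y, w, h = cv2.boundingRect(cnt)
        if w > 1000 and h < 500:
            y_array.append(y)

    y_array = sorted(y_array)
    if len(y_array) < 3:
        raise ValueError("Expected at least three horizontal lines in the image")
    # The handwriting lies between the second and third lines; +4 steps past the line's top edge
    cropped_image_bin = bin_img[y_array[1]+4:y_array[2], :]

    # Erosion takes a local minimum, so it removes small white specks and thickens dark strokes
    kernel = np.ones((5,5), np.uint8)
    cropped_image_bin = cv2.erode(cropped_image_bin, kernel, iterations=2)

    return cropped_image_bin
```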
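
To make step 1 less of a black box, here is a minimal NumPy sketch of the idea behind Otsu's thresholding: try every threshold and keep the one that maximizes the between-class variance of the two resulting pixel groups. The function name `otsu_threshold` is hypothetical, and this is an illustration rather than OpenCV's implementation, so its result may differ slightly from `cv2.threshold` in edge cases such as ties.

```python
import numpy as np

def otsu_threshold(gray):
    """Return a threshold t for a uint8 image; pixels > t form the bright class."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                     # probability of the dark class for each t
    mu = np.cumsum(prob * np.arange(256))       # cumulative mean up to each t
    mu_total = mu[-1]

    # Between-class variance; left at zero where one of the classes is empty
    denom = omega * (1.0 - omega)
    sigma_b = np.zeros(256)
    valid = denom > 0
    sigma_b[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))

# Toy image: dark "ink" rows (50) above bright "paper" rows (200)
toy = np.zeros((4, 4), dtype=np.uint8)
toy[:2] = 50
toy[2:] = 200

t = otsu_threshold(toy)
binary = np.where(toy > t, 255, 0).astype(np.uint8)
```

On this toy image every threshold from 50 to 199 separates the two groups equally well, and `np.argmax` returns the first of them.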

### Second Step: Split Cropped Image to Separated Written Lines

After cropping the handwritten region, we split this region into individual lines of text.

#### Steps:

1. Calculate the sum of black pixels for each row in the binary image.
2. Identify rows that mark the beginning and end of each line of text.
3. Use these row indices to split the image into lines.

Here is the function to split the image into lines. Fill in the missing parts.

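```python
def split_lines(cropped_img):
    # Number of non-white (ink) pixels in each row
    sum_black_in_row = np.sum(cropped_img < 255, axis=1)
    lines = []
    i = 0

    while i < len(sum_black_in_row):
        # A row with more than 15 ink pixels starts a text line
        if sum_black_in_row[i] > 15:
            up = max(0, i - 6)  # keep 6 rows of margin above the line
            while i < len(sum_black_in_row) and sum_black_in_row[i] > 15:
                i += 1
            down = min(len(sum_black_in_row) - 1, i + 6)  # and 6 rows below

            # Discard bands too thin to be a real line of handwriting
            if down - up > 20:
                lines.append(cropped_img[up:down, :])
        i += 1

    return lines
```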
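
The row-projection idea in the steps above can also be tried on a tiny synthetic image. The following is a minimal sketch, assuming a white (255) background and dark ink; the helper name `find_text_row_runs` and its `min_ink` parameter are hypothetical, and it uses `np.diff` to find where ink rows start and stop instead of an explicit loop.

```python
import numpy as np

def find_text_row_runs(binary_img, min_ink=1):
    """Return (start, end) row pairs, end exclusive, for consecutive rows that contain ink."""
    ink_per_row = np.sum(binary_img < 255, axis=1)
    has_ink = (ink_per_row >= min_ink).astype(np.int8)

    # Pad with zeros so that runs touching the top or bottom edge are still closed
    padded = np.concatenate(([0], has_ink, [0]))
    changes = np.diff(padded)
    starts = np.flatnonzero(changes == 1)    # first row of each run
    ends = np.flatnonzero(changes == -1)     # first row after each run
    return list(zip(starts.tolist(), ends.tolist()))

# Synthetic page: two dark bands on a white background
page = np.full((12, 10), 255, dtype=np.uint8)
page[2:4, 1:9] = 0
page[7:10, 2:8] = 0

runs = find_text_row_runs(page)
line_images = [page[start:end, :] for start, end in runs]
```

Unlike `split_lines`, this sketch adds no margin around each line and does not discard thin bands; both would be easy to add once the row runs are known.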

## 2. Feature Extraction Module

Here, you will use the Local Binary Pattern (LBP) method to extract features from the handwritten lines.

### Local Binary Pattern (LBP)

LBP is a simple yet efficient texture operator. It labels each pixel of an image by thresholding the pixel's neighborhood against the pixel's own value and treating the result as a binary number.

#### Steps:

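1. For each pixel, compare its value with its 8 neighbors, visiting them in a fixed order around a circle. In the classic 3×3 formulation, the first pixel compared is the top-left and the last is the middle-left; any fixed order works as long as it is used consistently for every pixel.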
2. Threshold the neighborhood with the center value and consider the result as a binary number.
3. Convert this binary number to a decimal number and use it as a new value for the center pixel.

Below is the function to calculate LBP for a single pixel. You need to complete the function by calculating the `lbp_val`.

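```python
def lbp_calculate_pixels(img, x, y, radius=3, neighbors=8):
    # x is the row index and y the column index of the center pixel
    threshold = img[x, y]
    binary_string = []

    # Sample `neighbors` points evenly spaced on a circle of the given radius
    for i in range(neighbors):
        # int() guarantees a valid array index regardless of what round() returns
        dx = int(round(radius * np.cos(2 * np.pi * i / neighbors)))
        dy = int(round(radius * np.sin(2 * np.pi * i / neighbors)))
        neighbor_value = img[x + dx, y + dy]
        binary_string.append(int(neighbor_value >= threshold))

    # The i-th neighbor contributes bit i (weight 2**i) of the LBP code
    lbp_val = sum(val * (2 ** idx) for idx, val in enumerate(binary_string))
    return lbp_val

def lbp_get_result(img):
    height, width = img.shape
    result_img = np.zeros((height, width), dtype=np.uint8)

    # Skip a 3-pixel border so every sampled neighbor (default radius=3) stays inside the image
    for i in range(3, height - 3):
        for j in range(3, width - 3):
            result_img[i, j] = lbp_calculate_pixels(img, i, j)

    return result_img
```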
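
As a worked example of the three steps, the sketch below computes the LBP code of a single 3×3 patch, visiting the neighbors clockwise from the top-left to the middle-left. The names `OFFSETS` and `lbp_3x3` are hypothetical. The first neighbor is treated as the least significant bit, which mirrors the `enumerate`-based weighting in `lbp_calculate_pixels`; other bit orders are equally valid as long as they are applied consistently.

```python
import numpy as np

# Clockwise (row, column) offsets: top-left first, middle-left last
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_3x3(patch):
    """Return the LBP code of the center pixel of a 3x3 patch."""
    center = patch[1, 1]
    # Step 2: threshold each neighbor against the center value
    bits = [int(patch[1 + dr, 1 + dc] >= center) for dr, dc in OFFSETS]
    # Step 3: read the bits as a binary number, first neighbor = least significant bit
    return bits, sum(bit << idx for idx, bit in enumerate(bits))

patch = np.array([[6, 5, 2],
                  [7, 6, 1],
                  [9, 8, 7]], dtype=np.uint8)

bits, code = lbp_3x3(patch)
# bits is [1, 0, 0, 0, 1, 1, 1, 1], so code = 1 + 16 + 32 + 64 + 128 = 241
```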

### Feature Vector from LBP

After computing the LBP for each pixel, the next step is to calculate the histogram of these values. This histogram serves as a feature vector for the classifier.

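```python
def lbp_hist(lbp_img):
    # One bin per possible LBP code (0..255), giving a 256-element vector
    histogram, _ = np.histogram(lbp_img.flatten(), bins=np.arange(257))
    return histogram

def lbp_normalize(histogram):
    # Dividing by the mean makes the vector independent of the number of pixels in the line
    return histogram / np.mean(histogram)
```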
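
To see what the feature vector looks like, here is a small self-contained sketch that applies the same histogram and normalization to a made-up 3×3 array of LBP codes. The array values are invented purely for illustration.

```python
import numpy as np

# A tiny stand-in for an LBP image; every value is an LBP code in 0..255
lbp_img = np.array([[0,   0,  255],
                    [17,  17, 17],
                    [241, 0,  255]], dtype=np.uint8)

# Same binning as lbp_hist: 256 bins, one per code
hist, _ = np.histogram(lbp_img.ravel(), bins=np.arange(257))

# Same normalization as lbp_normalize: divide by the mean bin count
feature = hist / np.mean(hist)

# Scaling all counts (for example, a line with twice as many pixels) leaves the feature unchanged
feature_doubled = (2 * hist) / np.mean(2 * hist)
```

After normalization the vector always has a mean of 1, so lines of different lengths produce comparable features.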

## 3. Model Training Module

Using the features extracted from the LBP, you can now train a KNN classifier.

```python
def train_knn(features, labels):
    # features: one normalized LBP histogram per line; labels: the writer of each line
    classifier = KNeighborsClassifier(n_neighbors=5)
    classifier.fit(features, labels)
    return classifier
```
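
For intuition about what the classifier does at prediction time, the following is a brute-force NumPy sketch of k-nearest-neighbors with Euclidean distance and an unweighted majority vote, which are the defaults of `KNeighborsClassifier`. The function `knn_predict` and the toy data are hypothetical; scikit-learn's implementation is more efficient and may break ties differently.

```python
import numpy as np
from collections import Counter

def knn_predict(train_features, train_labels, query, k=5):
    """Predict the label of `query` by majority vote among its k nearest training samples."""
    train_features = np.asarray(train_features, dtype=np.float64)
    query = np.asarray(query, dtype=np.float64)

    # Euclidean distance from the query to every training sample
    distances = np.linalg.norm(train_features - query, axis=1)
    nearest = np.argsort(distances)[:k]

    # Unweighted vote among the k closest samples
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy 2-D "features" for two writers (real features would be 256-bin LBP histograms)
features = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
labels = ["writer_a", "writer_a", "writer_a", "writer_b", "writer_b", "writer_b"]

pred_near_a = knn_predict(features, labels, [0.2, 0.1], k=3)
pred_near_b = knn_predict(features, labels, [5.5, 5.5], k=3)
```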

## 4. Performance Analysis Module

In this module, you evaluate the performance of your model on unseen data.

```python
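def predict_and_evaluate(test_img_path, classifier):
    # crop_handwritten_region returns a single binary image, which split_lines cuts into lines
    cropped = crop_handwritten_region(test_img_path)
    test_features = [lbp_normalize(lbp_hist(lbp_get_result(line))) for line in split_lines(cropped)]

    # Classify each line, then return the writer predicted most often (works for any label type)
    predictions = classifier.predict(test_features)
    labels, counts = np.unique(predictions, return_counts=True)
    return labels[np.argmax(counts)]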
```
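
The pipeline overview also calls for comparing predicted writers with actual writers and measuring processing time. A minimal sketch of that bookkeeping is shown below; `majority_vote`, `evaluate`, and the stand-in predictor are hypothetical names, and in practice `predict_fn` would wrap a call such as `predict_and_evaluate(path, classifier)`.

```python
import time
import numpy as np

def majority_vote(line_predictions):
    """Return the label that occurs most often among the per-line predictions."""
    labels, counts = np.unique(line_predictions, return_counts=True)
    return labels[np.argmax(counts)]

def evaluate(predict_fn, test_cases):
    """test_cases is a list of (sample, true_writer) pairs.

    Returns (accuracy, mean seconds per sample).
    """
    correct = 0
    durations = []
    for sample, true_writer in test_cases:
        start = time.perf_counter()
        predicted = predict_fn(sample)
        durations.append(time.perf_counter() - start)
        correct += int(predicted == true_writer)
    return correct / len(test_cases), sum(durations) / len(durations)

# Stand-in predictor: each "sample" is already a list of per-line predictions
cases = [
    (["w1", "w2", "w1"], "w1"),   # vote says w1, correct
    (["w2", "w2", "w3"], "w2"),   # vote says w2, correct
    (["w3", "w1", "w1"], "w3"),   # vote says w1, wrong
    (["w3", "w3", "w3"], "w3"),   # vote says w3, correct
]
accuracy, mean_seconds = evaluate(majority_vote, cases)
```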

## 5. Test Generation

Utilize the IAM dataset for generating test cases.
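
One possible way to build a test case is sketched below. It assumes that the dataset has already been indexed into a dictionary mapping each writer ID to that writer's form image paths; the workbook does not specify this, so the dictionary layout, the function `generate_test_case`, and the default of three writers with two training forms each are all assumptions chosen for illustration.

```python
import random

def generate_test_case(forms_by_writer, n_writers=3, n_train=2, rng=None):
    """Pick training forms for a few writers and one held-out test form.

    Returns (train, test): train is a list of (path, writer) pairs and
    test is a single (path, writer) pair not present in train.
    """
    rng = rng or random.Random()
    # Only writers with enough forms for training plus one held-out form qualify
    eligible = [w for w, forms in forms_by_writer.items() if len(forms) >= n_train + 1]
    if len(eligible) < n_writers:
        raise ValueError("Not enough writers with enough forms")

    train, candidates = [], []
    for writer in rng.sample(eligible, n_writers):
        picked = rng.sample(forms_by_writer[writer], n_train + 1)
        train.extend((path, writer) for path in picked[:n_train])
        candidates.append((picked[n_train], writer))

    # The test form comes from one of the chosen writers, selected at random
    return train, rng.choice(candidates)

# Hypothetical index: writer ID -> form image paths
forms = {
    "w1": ["a1.png", "a2.png", "a3.png"],
    "w2": ["b1.png", "b2.png", "b3.png"],
    "w3": ["c1.png", "c2.png", "c3.png"],
    "w4": ["d1.png"],               # too few forms, never selected
}
train, test = generate_test_case(forms, rng=random.Random(0))
```

Passing a seeded `random.Random` makes a test case reproducible, which helps when comparing runs.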

### Usage

1. Preprocess your images using `crop_handwritten_region` and `split_lines`.
2. Extract features using `lbp_get_result`, `lbp_hist`, and `lbp_normalize`.
3. Train your model using `train_knn`.
4. Predict and evaluate with `predict_and_evaluate`.

---

## Additional Resources

- [OpenCV Documentation](https://docs.opencv.org/master/)
- [Scikit-learn KNN](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html)
- [Numpy Documentation](https://numpy.org/doc/stable/)
- [Matplotlib Examples](https://matplotlib.org/stable/gallery/index.html)
- [Understanding Image Thresholding](https://learnopencv.com/otsu-thresholding-with-opencv/)

Feel free to search these terms in Google for more information:

- "Local Binary Patterns"
- "Image Contouring in OpenCV"
- "Histograms in Image Processing"
- "K-Nearest Neighbors Algorithm"
- "Image Preprocessing Techniques"

---

This workbook provides a structured approach to building a Writer Identification System using classical machine learning techniques in Python. Be sure to fill in the missing parts and understand each step to get the most out of this exercise!