# Task 2 - CV + NLP Sentiment Analysis of Handwritten Text Using Custom-built OCR and Sentiment Classification Models
The primary objective of this project is to use artificial intelligence to convert handwritten text images into digital text and subsequently perform sentiment analysis on the extracted text. The project will leverage the provided labelled dataset of handwritten alphabets (alphabets_dataset.zip) and the sentiment analysis dataset (sentiment_analysis_dataset.csv). Use of pre-trained models is not allowed. Finally, the performance is to be tested and reported on the images provided in the target_images
folder (whose labels are in target_labels.csv).
## Answer:
The following code snippet is responsible for creating and training a convolutional neural network (CNN) model to recognize handwritten characters from images of letters A-Z
- The models.Sequential function initializes a sequential model, allowing layers to be added in sequence.
- layers.Input(shape=(28, 28, 1)) specifies the input shape for each image, which is 28x28 pixels with 1 channel (grayscale).
- layers.Conv2D(32, (3, 3), activation='relu') adds a 2D convolutional layer with 32 filters of size 3x3, using ReLU activation function.
- layers.MaxPooling2D((2, 2)) adds a max pooling layer with pool size 2x2, which reduces spatial dimensions to capture the most important features.
- layers.Conv2D(64, (3, 3), activation='relu') adds another convolutional layer with 64 filters of size 3x3 and ReLU activation.
- Another layers.MaxPooling2D((2, 2)) follows to further downsample the feature maps.
- layers.Flatten() flattens the 2D feature maps into a 1D vector to prepare for fully connected layers.
- layers.Dense(128, activation='relu') adds a fully connected layer with 128 neurons and ReLU activation.
- layers.Dense(26, activation='softmax') adds the output layer with 26 neurons (one for each letter from A to Z), using softmax activation to output probabilities of each class.

Finally save it as handwritten.keras

In [None]:
import zipfile
import numpy as np
import cv2
import pandas as pd
import tensorflow as tf
from tensorflow.keras import layers, models
import os
import warnings
warnings.filterwarnings("ignore")

def load_handwritten_data_from_zip(zip_path):
    images = []
    labels = []
    
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        # Extract and read the CSV file
        with zip_ref.open('alphabet_labels.csv') as csvfile:
            labels_df = pd.read_csv(csvfile)
        
        # Create a dictionary from the CSV file
        label_dict = dict(zip(labels_df['file'], labels_df['label']))

        for file_name in zip_ref.namelist():
            if file_name.endswith('.png'):  # Adjust if images are in a different format
                image_name = os.path.basename(file_name)
                if image_name in label_dict:
                    label = label_dict[image_name]
                    if len(label) == 1:  # Ensure the label is a single character
                        with zip_ref.open(file_name) as file:
                            # Read image from the file
                            image = np.frombuffer(file.read(), np.uint8)
                            image = cv2.imdecode(image, cv2.IMREAD_GRAYSCALE)
                            image = cv2.resize(image, (28, 28))  # Resize to 28x28 pixels
                            images.append(image)
                            labels.append(ord(label) - ord('A'))  # Convert character to numerical value
                    else:
                        print(f"Skipping file with invalid label: {file_name}")
                else:
                    print(f"Skipping file with no matching label: {file_name}")

    images = np.array(images).reshape(-1, 28, 28, 1) / 255.0
    labels = np.array(labels)
    return images, labels

# Load images and labels from the ZIP file
images, labels = load_handwritten_data_from_zip('alphabets_dataset.zip')

if len(images) == 0:
    print("No valid images found. Please check the dataset and label extraction logic.")
else:
    # Create and train the model
    model = models.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dense(26, activation='softmax')  # 26 classes for A-Z
    ])

    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    model.fit(images, labels, epochs=10, validation_split=0.2)
    model.save('handwritten.keras')

Now, we need to get sentences from the images using the model we created and find the sentiment by training and testing it here.
1. Image Processing:
    - preprocess_image(image_path): Reads an image, converts it to grayscale, applies Gaussian blur, and thresholds it to create a binary image (binary)
    - detect_text_regions(binary_image): Labels connected components in the binary image and identifies bounding boxes around them.
    - recognize_text(image, bounding_boxes, model): Extracts each character from identified bounding boxes, preprocesses them for the Keras model, makes predictions, and converts predictions into characters.
2. Training Sentiment Analysis Models:
   - Splits the dataset into training and testing sets for each sentiment category ('Angry', 'Happy', 'Neutral').
   - Trains three Naive Bayes classifiers (clf_angry, clf_neutral, clf_happy) using the TF-IDF transformed data
3. Processing and Analysis:
   - draw_and_group_boxes(image_paths, output_paths): Processes multiple images (image_paths), detects text regions, recognizes characters, and groups them into meaningful text units (grouped_texts). It also draws bounding boxes around recognized characters on the original images and saves them to specified paths (output_paths).

In [2]:
import os
import pandas as pd
import cv2
import numpy as np
from scipy.ndimage import label
from keras.models import load_model
from keras.preprocessing.image import img_to_array
import nltk
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn import naive_bayes
from sklearn.metrics import roc_auc_score
from sklearn.impute import SimpleImputer
import warnings
import contextlib
warnings.filterwarnings("ignore")

# Suppress TensorFlow/Keras messages
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

# Download NLTK stopwords
nltk.download('stopwords')

# Load dataset
df = pd.read_csv("sentiment_analysis_dataset.csv")

# Mapping sentiments to binary labels
y_angry = df['sentiment'].map({'Angry': 1, 'Happy': 0, 'Neutral': 0})
y_neutral = df['sentiment'].map({'Angry': 0, 'Happy': 0, 'Neutral': 1})
y_happy = df['sentiment'].map({'Angry': 0, 'Happy': 1, 'Neutral': 0})

# Initialize stopwords set and convert it to a list
stopset = set(stopwords.words('english'))
stop_words_list = list(stopset)

# Initialize TF-IDF vectorizer with correct stop_words parameter
vectorizer = TfidfVectorizer(use_idf=True, lowercase=True, strip_accents='ascii', stop_words=stop_words_list)

# Vectorize text data
X = vectorizer.fit_transform(df['line'])

# Split dataset into training and testing sets for each sentiment category
X_train_angry, X_test_angry, y_train_angry, y_test_angry = train_test_split(X, y_angry, random_state=42)
X_train_neutral, X_test_neutral, y_train_neutral, y_test_neutral = train_test_split(X, y_neutral, random_state=42)
X_train_happy, X_test_happy, y_train_happy, y_test_happy = train_test_split(X, y_happy, random_state=42)

# Train Naive Bayes classifiers for each sentiment category
clf_angry = naive_bayes.MultinomialNB()
clf_angry.fit(X_train_angry, y_train_angry)

clf_neutral = naive_bayes.MultinomialNB()
clf_neutral.fit(X_train_neutral, y_train_neutral)

clf_happy = naive_bayes.MultinomialNB()
clf_happy.fit(X_train_happy, y_train_happy)

# Load your trained model
model = load_model('handwritten.keras')

def preprocess_image(image_path):
    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if image is None:
        raise ValueError("Image not found or unable to load.")
    blurred = cv2.GaussianBlur(image, (5, 5), 0)
    _, binary = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary

def detect_text_regions(binary_image):
    labeled_image, num_features = label(binary_image)
    bounding_boxes = []
    for component in range(1, num_features + 1):
        coords = np.column_stack(np.where(labeled_image == component))
        x, y, w, h = cv2.boundingRect(coords)
        bounding_boxes.append((x, y, w, h))
    return bounding_boxes

def recognize_text(image, bounding_boxes, model):
    recognized_text = []
    for (x, y, w, h) in bounding_boxes:
        roi = image[y:y+h, x:x+w]
        if roi.size == 0:
            continue
        roi = cv2.resize(roi, (28, 28))
        roi = roi.astype("float") / 255.0
        roi = img_to_array(roi)
        roi = np.expand_dims(roi, axis=0)
        with open(os.devnull, 'w') as f:
            with contextlib.redirect_stdout(f):
                prediction = model.predict(roi)
        character = chr(np.argmax(prediction) + ord('a'))
        recognized_text.append(character)
    return recognized_text

def draw_and_group_boxes(image_paths, output_paths):
    grouped_texts = []

    for i, image_path in enumerate(image_paths):
        binary_image = preprocess_image(image_path)
        bounding_boxes = detect_text_regions(binary_image)
        bounding_boxes = sorted(bounding_boxes, key=lambda bb: (bb[0], bb[1]))
        recognized_characters = recognize_text(binary_image, bounding_boxes, model)

        grouped_text = []
        current_word = []
        word_distance_threshold = 5
        character_idx = 0

        for idx, (x, y, w, h) in enumerate(bounding_boxes):
            if character_idx >= len(recognized_characters):
                break
            if idx > 0 and (x > bounding_boxes[idx-1][0] + bounding_boxes[idx-1][2] + word_distance_threshold):
                if current_word:
                    grouped_text.append("".join(current_word))
                    current_word = []
            
            current_word.append(recognized_characters[character_idx])
            character_idx += 1

        if current_word:
            grouped_text.append("".join(current_word))
        
        grouped_texts.append(grouped_text)
        # Load the original image to display bounding boxes
        original_image = cv2.imread(image_path)
        # Draw bounding boxes on the original image
        for (x, y, w, h) in bounding_boxes:
            cv2.rectangle(original_image, (x, y), (x + w, y + h), (0, 255, 0), 2)

        # Save the image with bounding boxes
        cv2.imwrite(output_paths[i], original_image)
    
    return grouped_texts

# Example usage
image_paths = ['target_images/line_1.png', 'target_images/line_2.png', 'target_images/line_3.png', 'target_images/line_4.png', 'target_images/line_5.png', 'target_images/line_6.png']
output_paths = ['split_letters_1.png', 'split_letters_2.png', 'split_letters_3.png', 'split_letters_4.png', 'split_letters_5.png', 'split_letters_6.png']
grouped_texts = draw_and_group_boxes(image_paths, output_paths)

# Display grouped text as a sentence with spaces in between
for idx, text in enumerate(grouped_texts):
    sentence = ' '.join(text)
    Sentence = vectorizer.transform([sentence])
    print(f"Line {idx+1}: {sentence}")
    # Print predictions for each sentiment category
    if clf_angry.predict(Sentence.toarray()) > 0.33:
        print("Sentiment Prediction: Angry")
    elif clf_neutral.predict(Sentence.toarray()) > 0.33:
        print("Sentiment Prediction: Neutral")
    elif clf_happy.predict(Sentence.toarray()) > 0.33:
        print("Sentiment Prediction: Happy")
    else:
        print("Can't predict the sentiment")


[nltk_data] Downloading package stopwords to /home/skp123/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


Line 1: lonnaulo xylysxys aatvxaxx nvayxwpsm nxnoouspn ollyryut xxnxcxysl ncourmuav loxulnox lcmpvoym
Can't predict the sentiment
Line 2: lyccsymo oranuytwxym xlxsutpx plxxnyzy aoxuosalcyw xlmulncxl wxxxauxa mnlcxyws ucovuxxo svxsxxar
Can't predict the sentiment
Line 3: zuxuxsyxt xlncxcam pbcumnlma musxlcxp asxxarxol mmxylsbt ylcbsarm xnrynatvc xyvclbwuz clsrarx
Can't predict the sentiment
Line 4: llsxxpxm unpnynxct sxtrwaou yumsmmmcu sxnsollz xnpxupxymu wapsxwxmr rywnnuru auuaxpcm suvxsanyt
Can't predict the sentiment
Line 5: sxmyvsyll ouvolxvt samacuxx rtvrlyuwx ommnorxlx llnuyoxaxxx nxtltxuxnnx mmuryapam lusxrruxx oonuacrx
Can't predict the sentiment
Line 6: vsyclxavx avxoaaawcxsxlx cmxosluryt xulpsllu mnpclnlnx rxuulucao btrmcwyxx clwuyvnxam pxynnaxsm xurlcaxccxx
Can't predict the sentiment


As we can see the output is not what we expect. The split_letters are near to what we expect but still there are lot of errors and I didn't get the sentiment part. I tried my best to get the output but i wasn't successful :(