# Motivation
Now that I've got a preliminary method to detect Boggle tiles (from **`03. Finalizing Board Detection`**), I want to test it on the labeled data I have.

# Setup
The cells below will set up the rest of this notebook. 

First, I'll configure the kernel: 

In [1]:
# Change directories to the root of the project
%cd ..

# Enable autoreload of modules
%load_ext autoreload
%autoreload 2

d:\data\programming\boggle-vision


Next, I'll import some relevant libraries:

In [2]:
# Import statements
import cv2
import os
import pandas as pd
from pathlib import Path
import math
from matplotlib import pyplot as plt
import numpy as np
from statistics import mode
import utils
import pytesseract
from PIL import Image
from tqdm import tqdm
from concurrent.futures import ThreadPoolExecutor
import torch
import traceback

# Importing custom modules
import utils.board_detection as board_detect
from utils.cnn import BoggleCNN, EnhancedBoggleCNN
from utils.settings import allowed_boggle_tiles

# # Set up an EasyOCR reader
# import easyocr
# reader = easyocr.Reader(['en'], gpu=False)

# Set up the model
net = EnhancedBoggleCNN()
net.load_state_dict(torch.load("models/boggle_cnn.pth"))

<All keys matched successfully>

# Loading Data
Here, I'm going to load in all of the pictures, as well as some information about each of them. 

In [3]:
# Open the .csv file containing the labeled boards
board_data_df = pd.read_csv("data/labeled-boards.csv")

# Add a column which is the parsed letter sequence
board_data_df["parsed_letter_sequence"] = board_data_df["letter_sequence"].apply(
    lambda letter_list: letter_list.split(";")
)

# Load all of the images using cv2
file_path_to_image = {}
for row in board_data_df.itertuples():
    file_path_to_image[row.file_path] = cv2.imread(row.file_path)
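
One thing to watch for here: `cv2.imread` returns `None` instead of raising when it can't read a file, so a bad path only shows up later as a parsing error. A minimal sketch of a check, assuming a dictionary shaped like `file_path_to_image` (the helper name `find_unreadable_images` and the example data are hypothetical, not part of the project):

```python
def find_unreadable_images(path_to_image):
    """Return the file paths whose loaded image is None."""
    # Use "is None" rather than truthiness, since NumPy arrays
    # raise an error when evaluated as a boolean
    return [path for path, image in path_to_image.items() if image is None]


# Stand-in for file_path_to_image; in the notebook the values are NumPy arrays
example_images = {"easy-01.png": [[0, 0], [0, 0]], "missing.png": None}
unreadable = find_unreadable_images(example_images)
print(unreadable)
```

In the notebook, this could be called as `find_unreadable_images(file_path_to_image)` right after the loading loop.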

# Parsing Boards
Below, I'm going to run each of the boards labeled `easy` through the parsing method, `board_detect.parse_boggle_board`.

In [45]:
# We'll collect some results about the board data here
all_parsed_boards_df_records = []

# Iterate through all of the rows in the board data
for row in tqdm(list(board_data_df.query("difficulty == 'easy'").itertuples())):
    
    # Try to parse the board; if parsing fails, record the error message instead
    error_msg = None
    try:
        parsed_board_df = board_detect.parse_boggle_board(
            file_path_to_image[row.file_path],
            max_image_height=1200,
            # easyocr_reader=reader,
            model=net
        )

        letter_sequence = list(parsed_board_df["letter"])

    except Exception as e:
        error_msg = str(e)
        letter_sequence = None

    # Add some information to the all_parsed_boards_df_records
    all_parsed_boards_df_records.append(
        {
            "file_path": row.file_path,
            "letter_sequence": letter_sequence,
            "error_msg": error_msg,
        }
    )

# Parse the results into a dataframe
all_parsed_boards_df = pd.DataFrame(all_parsed_boards_df_records)

100%|██████████| 30/30 [00:22<00:00,  1.34it/s]
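
Since the loop above swallows exceptions and only keeps their messages, it helps to look at how many boards failed and why before validating anything. A rough sketch, assuming records shaped like `all_parsed_boards_df_records` (the helper `summarize_failures`, the example records, and the error text are made up for illustration):

```python
from collections import Counter


def summarize_failures(records):
    """Count how often each error message occurs among failed parses."""
    messages = [r["error_msg"] for r in records if r["error_msg"] is not None]
    return Counter(messages).most_common()


# Hypothetical records in the same shape as all_parsed_boards_df_records
example_records = [
    {"file_path": "a.png", "letter_sequence": ["A"], "error_msg": None},
    {"file_path": "b.png", "letter_sequence": None, "error_msg": "no board found"},
    {"file_path": "c.png", "letter_sequence": None, "error_msg": "no board found"},
]
failure_summary = summarize_failures(example_records)
print(failure_summary)
```

An empty list means every board parsed without raising.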


# Validating Results
Now that I've got the boards parsed, I want to spend some time validating the results. 

In [46]:
# Merge the DataFrames together
results_to_validate_df = board_data_df.merge(
    all_parsed_boards_df.rename(
        columns={"letter_sequence": "predicted_letter_sequence"}
    ),
    on="file_path",
    how="inner",
)


def match_letter_sequence(sequence_one, sequence_two):
    """
    Compare two letter sequences position by position.

    Returns a list of booleans, one per index, indicating whether the letters
    at that index match. Returns None if either sequence is empty or None, or
    if the sequences differ in length.
    """

    # If either of the sequences is empty or None, return None
    if not sequence_one or not sequence_two:
        return None

    # If the sequences are not the same length, return None
    if len(sequence_one) != len(sequence_two):
        return None

    # Compare the letters pairwise
    return [
        letter == other_letter
        for letter, other_letter in zip(sequence_one, sequence_two)
    ]


results_to_validate_df["letter_sequence_match"] = results_to_validate_df.apply(
    lambda row: match_letter_sequence(
        row["parsed_letter_sequence"], row["predicted_letter_sequence"]
    ),
    axis=1,
)

# Add a column indicating the percent of letters that match
results_to_validate_df["percent_match"] = results_to_validate_df.apply(
    lambda row: sum(row["letter_sequence_match"]) / len(row["letter_sequence_match"])
    if row["letter_sequence_match"]
    else None,
    axis=1,
)

# Add a column indicating which letters don't match
results_to_validate_df["errors"] = results_to_validate_df.apply(
    lambda row: [
        {
            "actual": row.parsed_letter_sequence[letter_idx],
            "predicted": row.predicted_letter_sequence[letter_idx],
            "idx": letter_idx
        }
        for letter_idx, letter_match in enumerate(row.letter_sequence_match)
        if not letter_match
    ]
    if row.letter_sequence_match
    else None,
    axis=1,
)

# Add a column indicating which letters do match
results_to_validate_df["correct"] = results_to_validate_df.apply(
    lambda row: [
        {
            "actual": row.parsed_letter_sequence[letter_idx],
            "predicted": row.predicted_letter_sequence[letter_idx],
            "idx": letter_idx
        }
        for letter_idx, letter_match in enumerate(row.letter_sequence_match)
        if letter_match
    ] 
    if row.letter_sequence_match
    else None,
    axis=1,
)

# Add a column indicating how many errors there are
# (compare against None explicitly so a perfectly parsed board gets 0, not None)
results_to_validate_df["num_errors"] = results_to_validate_df.apply(
    lambda row: len(row.errors) if row.errors is not None else None,
    axis=1,
)

In [47]:
print(f"Average percent match: {results_to_validate_df.percent_match.mean()}")

Average percent match: 0.8064814814814816
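
Worth keeping in mind: `Series.mean()` skips missing values by default, so any board whose `percent_match` is `None` (a parse error or a length mismatch) doesn't pull this average down. A small sketch of the difference, using a hypothetical helper `mean_percent_match` and made-up numbers rather than the real results:

```python
import math


def mean_percent_match(percent_matches, count_missing_as_zero=False):
    """Average the match rates, either skipping or zero-filling missing values."""

    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    if count_missing_as_zero:
        values = [0.0 if is_missing(v) else v for v in percent_matches]
    else:
        values = [v for v in percent_matches if not is_missing(v)]
    return sum(values) / len(values) if values else None


# Made-up match rates: two parsed boards and one failed parse
example = [1.0, 0.5, None]
skipping = mean_percent_match(example)
zero_filled = mean_percent_match(example, count_missing_as_zero=True)
print(skipping, zero_filled)
```

Which of the two is the right number depends on whether a failed parse should count as a fully wrong board.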


In [48]:
results_to_validate_df.iloc[0]

file_path                                       data\test-pictures\easy-01.png
difficulty                                                                easy
letter_sequence              Y;G;R;L;H;N;E;T;T;N;T;O;Th;F;E;E;E;N;C;L;E;T;H...
parsed_letter_sequence       [Y, G, R, L, H, N, E, T, T, N, T, O, Th, F, E,...
predicted_letter_sequence    [T, G, R, L, H, N, L, Y, T, N, T, O, Th, F, E,...
error_msg                                                                 None
letter_sequence_match        [False, True, True, True, True, True, False, F...
percent_match                                                         0.833333
errors                       [{'actual': 'Y', 'predicted': 'T', 'idx': 0}, ...
correct                      [{'actual': 'G', 'predicted': 'G', 'idx': 1}, ...
num_errors                                                                   6
Name: 0, dtype: object

In [49]:
results_to_validate_df.iloc[0].errors

[{'actual': 'Y', 'predicted': 'T', 'idx': 0},
 {'actual': 'E', 'predicted': 'L', 'idx': 6},
 {'actual': 'T', 'predicted': 'Y', 'idx': 7},
 {'actual': 'C', 'predicted': 'U', 'idx': 18},
 {'actual': 'R', 'predicted': 'H', 'idx': 25},
 {'actual': 'L', 'predicted': 'U', 'idx': 29}]
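
To see whether the model confuses particular letters systematically, the per-board error lists can be tallied into (actual, predicted) pairs. A minimal sketch, using the six errors from the first board above as input (the helper `tally_confusions` is hypothetical):

```python
from collections import Counter


def tally_confusions(error_lists):
    """Count (actual, predicted) pairs across a collection of per-board error lists."""
    pairs = Counter()
    for errors in error_lists:
        if errors is None:  # boards that could not be compared have no error list
            continue
        for error in errors:
            pairs[(error["actual"], error["predicted"])] += 1
    return pairs


# The six errors from the first board, copied from the output above
first_board_errors = [
    {"actual": "Y", "predicted": "T", "idx": 0},
    {"actual": "E", "predicted": "L", "idx": 6},
    {"actual": "T", "predicted": "Y", "idx": 7},
    {"actual": "C", "predicted": "U", "idx": 18},
    {"actual": "R", "predicted": "H", "idx": 25},
    {"actual": "L", "predicted": "U", "idx": 29},
]
confusions = tally_confusions([first_board_errors, None])

# Collapse the pairs to see which predicted letter shows up most among the errors
predicted_counts = Counter()
for (actual, predicted), count in confusions.items():
    predicted_counts[predicted] += count
print(predicted_counts.most_common(1))  # → [('U', 2)]
```

Run over the whole `errors` column (assuming missing entries are still `None`), the same tally would show which pairs recur across boards.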