<h1 style = "background-color:MediumSpringGreen;font-family:newtimeroman;font-size:250%;text-align:center;border-radius:15px 50px;">"ASL Fingerspelling Accuracy with Levenshtein Distance"</h1>


# Introduction

Welcome to the ASL Fingerspelling Translation competition, where AI empowers the Deaf and Hard of Hearing community to enhance communication. By training a specialized model, participants have the opportunity to revolutionize sign language recognition technology.

While voice-enabled assistants and AI solutions have revolutionized modern devices, they often overlook the 70+ million Deaf individuals worldwide and the 1.5+ billion people affected by hearing loss. Fingerspelling, a key aspect of ASL, uses hand shapes to represent letters and is frequently used for text input on mobile devices. Deaf smartphone users can fingerspell words faster than they can type on virtual keyboards. However, sign language recognition AI for text entry has been limited due to the lack of comprehensive datasets.

This competition aligns with Google's mission of universal accessibility and AI principles by exploring scalable solutions for sign language recognition. In collaboration with the Deaf Professional Arts Network, the competition aims to address individual user needs and expand to other sign languages.

The models built in this competition could let Deaf and Hard of Hearing users enter text by fingerspelling instead of typing on a traditional keyboard. Beyond convenient text entry, there is potential for an app that translates fingerspelling into spoken words, facilitating smoother communication between Deaf and non-signing individuals.

Join this competition to contribute to the advancement of sign language technology, bridging the gap between sign language and mainstream AI applications. Together, we can build a more inclusive and accessible future.

## What is ASL Fingerspelling Translation?

ASL Fingerspelling Translation refers to the process of translating American Sign Language (ASL) fingerspelling into written or spoken language using artificial intelligence (AI) technology. Fingerspelling is a fundamental aspect of ASL where hand shapes are used to represent letters of the alphabet. It is commonly used for spelling out words, names, or other specific terms in sign language.

The ASL Fingerspelling Translation competition harnesses the power of AI to improve the recognition and interpretation of fingerspelling gestures. Participants in the competition train AI models on specialized datasets to enhance the accuracy and efficiency of translating fingerspelling into written or spoken words. The goal is to develop scalable AI solutions that can benefit the Deaf and Hard of Hearing community by improving communication and accessibility through sign language recognition technology.

# **Install Dependencies**

In [1]:
%%capture
!pip install python-Levenshtein==0.12.0

Ref: [python-Levenshtein](https://pypi.org/project/python-Levenshtein/0.12.0/)

# **Import Modules**

In [2]:
import cv2
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline
import json
import tensorflow as tf
from tensorflow.keras import layers, optimizers, constraints, regularizers
import plotly.graph_objects as go
import plotly.io as pio
import os
from Levenshtein import distance
from datetime import datetime

plt.rcParams['figure.figsize'] = (12,6)
plt.style.use('fivethirtyeight')

import warnings
warnings.filterwarnings("ignore")



# **Load the Dataset**

In [3]:
train_df = pd.read_csv('/kaggle/input/asl-fingerspelling/train.csv')
train_df.head(4).style.set_properties(**{'background-color':'royalblue','color':'black','border-color':'#8b8c8c'})

Unnamed: 0,path,file_id,sequence_id,participant_id,phrase
0,train_landmarks/5414471.parquet,5414471,1816796431,217,3 creekhouse
1,train_landmarks/5414471.parquet,5414471,1816825349,107,scales/kuhaylah
2,train_landmarks/5414471.parquet,5414471,1816862427,0,hentaihubs.com
3,train_landmarks/5414471.parquet,5414471,1816909464,1,1383 william lanier


In [4]:
# Check the dimensions of the dataset
print(train_df.shape)

# Check the data types of columns
print(train_df.info())

(67287, 5)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 67287 entries, 0 to 67286
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   path            67287 non-null  object
 1   file_id         67287 non-null  int64 
 2   sequence_id     67287 non-null  int64 
 3   participant_id  67287 non-null  int64 
 4   phrase          67287 non-null  object
dtypes: int64(3), object(2)
memory usage: 2.6+ MB
None


In [5]:
# Compute summary statistics for the dataset
styled_data = train_df.describe().style\
.background_gradient(cmap='coolwarm')\
.set_properties(**{'text-align':'center','border':'1px solid black'})

# display styled data
display(styled_data)

Unnamed: 0,file_id,sequence_id,participant_id
count,67287.0,67287.0,67287.0
mean,1094616847.933137,1072691863.586815,119.762346
std,639518479.478188,617732361.801586,74.333078
min,5414471.0,71095.0,0.0
25%,527708222.0,537640270.0,63.0
50%,1099408314.0,1074262620.0,113.0
75%,1662742697.0,1605477865.5,178.0
max,2118949241.0,2147465106.0,254.0


In [6]:
sequence_id = 1817362238
file_id = 5414471

In [7]:
sign_path = f"/kaggle/input/asl-fingerspelling/train_landmarks/{file_id}.parquet"
sign = pd.read_parquet(sign_path)


In [8]:
len(np.unique(sign.index))

1000

In [9]:
sequence = sign[sign.index == sequence_id]

In [10]:
suppl_df = pd.read_csv('/kaggle/input/asl-fingerspelling/supplemental_metadata.csv')
suppl_df.head(4).style.set_properties(**{'background-color':'lightgreen','color':'black','border-color':'#8b8c8c'})

Unnamed: 0,path,file_id,sequence_id,participant_id,phrase
0,supplemental_landmarks/33432165.parquet,33432165,1535467051,251,coming up with killer sound bites
1,supplemental_landmarks/33432165.parquet,33432165,1535499058,239,we better investigate this
2,supplemental_landmarks/33432165.parquet,33432165,1535530550,245,interesting observation was made
3,supplemental_landmarks/33432165.parquet,33432165,1535545499,38,victims deserve more redress


In [11]:
# Check the dimensions of the dataset
print(suppl_df.shape)

# Check the data types of columns
print(suppl_df.info())

(52958, 5)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52958 entries, 0 to 52957
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   path            52958 non-null  object
 1   file_id         52958 non-null  int64 
 2   sequence_id     52958 non-null  int64 
 3   participant_id  52958 non-null  int64 
 4   phrase          52958 non-null  object
dtypes: int64(3), object(2)
memory usage: 2.0+ MB
None


In [12]:
# Compute summary statistics for the dataset
styled_data = suppl_df.describe().style\
.background_gradient(cmap='coolwarm')\
.set_properties(**{'text-align':'center','border':'1px solid black'})

# display styled data
display(styled_data)

Unnamed: 0,file_id,sequence_id,participant_id
count,52958.0,52958.0,52958.0
mean,968039213.216889,1072800152.741191,132.738661
std,577928763.278782,616574821.784176,81.745528
min,33432165.0,28699.0,0.0
25%,471766624.0,541130791.0,53.0
50%,897287709.0,1069840365.0,135.0
75%,1471341722.0,1606032034.5,216.0
max,2100073719.0,2147472980.0,254.0


In [13]:
sequence_id = 1535585216
file_id = 33432165

In [14]:
sign_path = f"/kaggle/input/asl-fingerspelling/supplemental_landmarks/{file_id}.parquet"
sign = pd.read_parquet(sign_path)


# The Evaluation Metric: Levenshtein Distance

## What is Levenshtein distance?

The Levenshtein distance, or edit distance, is a metric that quantifies the dissimilarity between two strings. It calculates the minimum number of single-character operations (insertions, deletions, or substitutions) needed to transform one string into another.

Named after Vladimir Levenshtein, a Soviet mathematician, the concept was introduced in 1965. The Levenshtein distance finds applications in various fields like spell checking, DNA sequence analysis, natural language processing, and computational linguistics.

To compute the Levenshtein distance, an algorithm constructs a matrix where each cell represents the cost of transforming one substring to another. Starting from the top-left cell and moving towards the bottom-right, the algorithm compares characters and determines the minimum cost using insertions, deletions, or substitutions. The value in the bottom-right cell represents the Levenshtein distance between the two strings.
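
The matrix fill described above can be written as a recurrence. The notation is chosen here for illustration: $M$ is the matrix, $a$ and $b$ are the two strings, and $[a_x \neq b_y]$ is 1 when the characters differ and 0 when they match.

$$
M_{x,0} = x, \qquad M_{0,y} = y, \qquad
M_{x,y} = \min
\begin{cases}
M_{x-1,y} + 1 & \text{(deletion)} \\
M_{x,y-1} + 1 & \text{(insertion)} \\
M_{x-1,y-1} + [a_x \neq b_y] & \text{(match or substitution)}
\end{cases}
$$

The distance is the entry $M_{|a|,|b|}$, the bottom-right cell.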

The Levenshtein distance serves as a similarity measure between strings and is used in tasks such as string matching, string clustering, and fuzzy string searching. It also forms the basis for other string distance metrics, like the Damerau-Levenshtein distance, which incorporates transpositions as an additional operation.
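
To make the transposition idea concrete, here is a minimal sketch of the optimal string alignment distance, a restricted form of Damerau-Levenshtein in which no substring is edited more than once. The function name `osa_distance` is made up for this example and is not part of the `python-Levenshtein` package.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: Levenshtein plus adjacent transpositions."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Extra case: the last two characters of each prefix are swapped.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("hte", "the"))         # 1: a single swap of adjacent letters
print(osa_distance("kitten", "sitting"))  # 3: no transposition helps here
```

Plain Levenshtein distance would count the "hte"/"the" swap as two substitutions, which is the difference the transposition operation is meant to capture.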

In this notebook, the Levenshtein distance between two phrases is computed by filling such a matrix and reading off its bottom-right cell. The evaluation metric described next is built on the same distance, so it measures how far a predicted phrase is from the ground-truth phrase.

* **Expression Metric**

The expression metric = (N - D) / N turns the Levenshtein distance into a similarity score between two strings.

Let's break down the components of the expression:

* N: the length of the longer of the two strings. This is also the largest value the Levenshtein distance between them can take.

* D: the Levenshtein distance between the two strings, that is, the minimum number of single-character edits required to transform one string into the other.

Because D can never exceed N, the difference (N - D) is never negative. It can be read as the number of character positions, out of N, that did not need an edit operation to transform one string into the other.

Dividing (N - D) by N normalizes this value by the maximum possible number of character positions, N. This normalization results in a similarity metric ranging between 0 and 1, where 0 represents no similarity and 1 indicates an exact match.

Therefore, the expression metric = (N - D) / N provides a measure of similarity between two strings based on the Levenshtein distance. A value close to 1 suggests a high degree of similarity, while a value closer to 0 indicates a larger difference between the strings.
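
A minimal sketch of this score in plain Python, assuming N is the length of the longer string as defined above. The helper names `levenshtein_distance` and `normalized_similarity` are made up for this example, and treating two empty strings as an exact match is an assumption added to avoid dividing by zero.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance, keeping only two rows of the matrix in memory."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """(N - D) / N with N = length of the longer string."""
    n = max(len(a), len(b))
    if n == 0:
        return 1.0  # assumption: two empty strings count as an exact match
    return (n - levenshtein_distance(a, b)) / n


print(normalized_similarity("kitten", "sitting"))  # (7 - 3) / 7
print(normalized_similarity("abc", "abc"))         # identical strings
print(normalized_similarity("abc", "xyz"))         # every position differs
```

The three calls cover a partial match, an exact match (score 1) and a pair with nothing in common (score 0).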

Ref:[Wikipedia - Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance)

For example, the Levenshtein distance between "kitten" and "sitting" is 3, since the following 3 edits change one into the other, and there is no way to do it with fewer than 3 edits:

* kitten → sitten (substitution of "s" for "k"),
* sitten → sittin (substitution of "i" for "e"),
* sittin → sitting (insertion of "g" at the end).
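
The edit steps listed above can be recovered from the distance matrix by walking back from its bottom-right cell. The sketch below is one way to do that. The function name `edit_script` is made up for this example, and when several minimal edit sequences exist it returns just one of them.

```python
def edit_script(source: str, target: str) -> list:
    """Return one minimal list of edits that turns source into target."""
    rows, cols = len(source) + 1, len(target) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)

    # Walk back from the bottom-right cell, recording the step behind each value.
    ops = []
    i, j = len(source), len(target)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and source[i - 1] == target[j - 1] and d[i][j] == d[i - 1][j - 1]:
            i, j = i - 1, j - 1  # characters match, no edit needed
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + 1:
            ops.append(f"substitute '{source[i - 1]}' -> '{target[j - 1]}'")
            i, j = i - 1, j - 1
        elif j > 0 and d[i][j] == d[i][j - 1] + 1:
            ops.append(f"insert '{target[j - 1]}'")
            j -= 1
        else:
            ops.append(f"delete '{source[i - 1]}'")
            i -= 1
    return ops[::-1]  # the walk runs backwards, so reverse it


for step in edit_script("kitten", "sitting"):
    print(step)
```

For "kitten" and "sitting" this prints two substitutions ('k' to 's', 'e' to 'i') followed by the insertion of 'g', matching the three edits in the list.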

![image](https://upload.wikimedia.org/wikipedia/commons/d/d1/Levenshtein_distance_animation.gif)

In [15]:
from Levenshtein import distance as lev

In [16]:
def levenshtein(seq1, seq2):
    size_x = len(seq1) + 1
    size_y = len(seq2) + 1
    matrix = np.zeros((size_x, size_y))

    for x in range(size_x):
        matrix[x, 0] = x

    for y in range(size_y):
        matrix[0, y] = y

    for x in range(1, size_x):
        for y in range(1, size_y):
            if seq1[x-1] == seq2[y-1]:
                matrix[x, y] = min(
                    matrix[x-1, y] + 1,
                    matrix[x-1, y-1],
                    matrix[x, y-1] + 1
                )
            else:
                matrix[x, y] = min(
                    matrix[x-1, y] + 1,
                    matrix[x-1, y-1] + 1,
                    matrix[x, y-1] + 1
                )

    return matrix[size_x - 1, size_y - 1]


In [17]:
import plotly.graph_objects as go

def plot_levenshtein_matrix(matrix):
    fig = go.Figure(data=go.Heatmap(z=matrix, colorscale='Viridis'))
    fig.update_layout(
        title='Levenshtein Distance Matrix',
        xaxis_title='Sequence 2',
        yaxis_title='Sequence 1'
    )
    fig.show()

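# Example usage
seq1 = '3 creekhouse'
seq2 = 'scales/kuhaylah'

# Build the full matrix: cell (x, y) is the distance between seq1[:x] and seq2[:y].
# (Recomputing each prefix pair is slow, but fine for two short phrases.)
matrix = np.array([[levenshtein(seq1[:x], seq2[:y]) for y in range(len(seq2) + 1)]
                   for x in range(len(seq1) + 1)])

# Named lev_distance so it does not shadow `distance` imported from Levenshtein
lev_distance = matrix[-1, -1]
print("Levenshtein Distance:", lev_distance)

plot_levenshtein_matrix(matrix)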


Levenshtein Distance: 12.0


# Inference Model

In [18]:
basedir = "/kaggle/working/"
NUM_CHARACTERS = 59

SEL_FEATURES = ['x_right_hand_0', 'y_right_hand_0', 'z_right_hand_0',
                # ... rest of the features ...
                'x_left_hand_20', 'y_left_hand_20', 'z_left_hand_20'
                ]
NUM_FEATURES = len(SEL_FEATURES)

d = {"selected_columns": SEL_FEATURES}

with open(f"{basedir}/inference_args.json", "w") as f:
    json.dump(d, f)


def get_dummy_model():
    # shape must be a tuple, hence the trailing comma
    inputs = tf.keras.Input(shape=(NUM_FEATURES,), dtype=tf.float32, name="inputs")
    # Replace missing landmarks (NaN) with zeros
    x = tf.where(tf.math.is_nan(inputs), tf.zeros_like(inputs), inputs)
    # One linear output per character class, for every frame
    x = tf.keras.layers.Dense(NUM_CHARACTERS)(x)
    out = tf.keras.layers.Activation("linear", name="outputs")(x)
    inference_model = tf.keras.Model(inputs=inputs, outputs=out)
    inference_model.compile(loss="sparse_categorical_crossentropy",
                            metrics=["accuracy"])
    return inference_model


dummy_model_test = get_dummy_model()
dummy_model_test.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 inputs (InputLayer)            [(None, 6)]          0           []                               
                                                                                                  
 tf.math.is_nan (TFOpLambda)    (None, 6)            0           ['inputs[0][0]']                 
                                                                                                  
 tf.zeros_like (TFOpLambda)     (None, 6)            0           ['inputs[0][0]']                 
                                                                                                  
 tf.where (TFOpLambda)          (None, 6)            0           ['tf.math.is_nan[0][0]',         
                                                                  'tf.zeros_like[0][0]',      

In [19]:
converter = tf.lite.TFLiteConverter.from_keras_model(dummy_model_test)

tflite_model = converter.convert()
model_path = 'model.tflite'

with open(model_path, 'wb') as f:
    f.write(tflite_model)

!zip submission.zip  '/kaggle/working/model.tflite' '/kaggle/working/inference_args.json'

  adding: kaggle/working/model.tflite (deflated 28%)
  adding: kaggle/working/inference_args.json (deflated 49%)
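
The `zip` output above shows both files stored under `kaggle/working/` inside the archive. If the scorer expects them at the archive's top level (an assumption worth checking against the competition's submission instructions), the archive can be built with Python's standard `zipfile` module instead. The helper name `build_submission` and the placeholder files are made up for this sketch.

```python
import json
import tempfile
import zipfile
from pathlib import Path


def build_submission(model_file: Path, args_file: Path, zip_path: Path) -> list:
    """Zip both files under their bare names and return the archive listing."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file in (model_file, args_file):
            archive.write(file, arcname=file.name)  # arcname drops the directory part
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()


# Demo with placeholder files in a temporary directory (not a real model).
with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    (workdir / "model.tflite").write_bytes(b"placeholder bytes")
    (workdir / "inference_args.json").write_text(json.dumps({"selected_columns": []}))
    names = build_submission(workdir / "model.tflite",
                             workdir / "inference_args.json",
                             workdir / "submission.zip")
    print(names)
```

In the notebook, the two `Path` arguments would point at the real `model.tflite` and `inference_args.json` in `/kaggle/working/`.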


In [20]:
CHECKING = False

if CHECKING:
    !pip install tflite-runtime==2.9.1
    import tflite_runtime.interpreter as tflite

    def load_relevant_data_subset(pq_path):
        return pd.read_parquet(pq_path, columns=SEL_FEATURES)
    
    data_path = "/kaggle/input/asl-fingerspelling/train_landmarks/1021040628.parquet"
    frames = load_relevant_data_subset(data_path).values
    
    interpreter = tflite.Interpreter(model_path)
    found_signatures = list(interpreter.get_signature_list().keys())
    prediction_fn = interpreter.get_signature_runner("serving_default")
    
    with open("/kaggle/input/asl-fingerspelling/character_to_prediction_index.json", "r") as f:
        character_map = json.load(f)
    rev_character_map = {j:i for i,j in character_map.items()}

In [21]:
if CHECKING:
    output = prediction_fn(inputs=frames)
    prediction_str = "".join([rev_character_map.get(s, "") for s in np.argmax(output['outputs'], axis=1)])
    print("\n\n",prediction_str[:100])
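
The decoding step in the cell above maps one predicted class index per frame back to a character. The sketch below shows the same idea on toy data. The three-character map and the index list are made up for illustration, while the real map comes from `character_to_prediction_index.json`.

```python
def decode_predictions(frame_indices, index_to_char):
    """Join the character for each per-frame index, skipping unknown indices."""
    return "".join(index_to_char.get(index, "") for index in frame_indices)


# Hypothetical three-character map standing in for the competition's JSON file.
toy_character_map = {"a": 0, "b": 1, "c": 2}
toy_rev_map = {index: char for char, index in toy_character_map.items()}

# Stand-in for np.argmax(output['outputs'], axis=1): one class index per frame.
toy_frame_indices = [2, 0, 0, 1, 7]
print(decode_predictions(toy_frame_indices, toy_rev_map))  # index 7 is unmapped and dropped
```

Because every frame contributes one character, consecutive frames with the same prediction produce repeated letters. The dummy model makes no attempt to collapse them.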

![image](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTDAD-Z6e0Lad_MWmVJw-crpHqq-SFh9aBOdA&usqp=CAU)

<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#5642C5;
           font-size:110%;
           font-family:Verdana;
           letter-spacing:0.5px">

<p style="padding: 10px;
              color:white;">
Your upvote is a great way to show your support and help others discover this resource.
</p>

<div class="alert alert-block alert-info"> 📌 Note: If you fork my notebook, please don't forget to upvote it. </div>
</div>
