# Sign Language Real Time Detection

## Decomposition

According to World Health Organization (WHO) statistics, 466 million people had a hearing loss disability in 2020 [1,2], and that number is estimated to exceed 900 million by 2050. There are about 300 different sign languages (SLs) in use across the world. SL serves as a bridge for communication between deaf and hearing people through hand gestures, postures, movements, and facial expressions. The most common SLs are American Sign Language (ASL), Spanish Sign Language (SSL), Australian Sign Language (AUSLAN), and Arabic Sign Language (ArSL). Unfortunately, it is difficult to estimate how many people know a given sign language, because census surveys do not include SL as a type of language spoken [3]. However, people who rely on sign language to communicate are often at a disadvantage: sign language is not widely adopted, and few people are fluent in or professionally practice a form of SL [4]. As a result, an interpreter is often needed to translate spoken words and sentences so that deaf people can understand hearing people, and vice versa.

The different sign languages have many linguistic features that are difficult for researchers interested in developing SL technology to understand, so SL experts are needed to help resolve these difficulties. Sign Language Processing (SLP) is the process of understanding sign language through the use of a machine [6]. It can be divided into the domains of Sign Language Recognition (SLR), Sign Language Identification (SLID), Sign Language Synthesis, and Sign Language Translation. This system considers the tasks of SLR and Sign Language Translation. SLR covers the translation of any hand gesture and posture in SL into text that hearing people can understand, and feature extraction is a crucial phase of any SLR system. Today, deaf people need a way to interact with hearing people that does not require a translator. Deaf people who attend online meetings on platforms such as Zoom, Microsoft Teams, and Google Meet need a system that generates text from sign in real time. Since there is a limited number of specialized sign language translators, this system would allow businesses to accommodate and better integrate deaf people at work and in daily life, and would better facilitate communication between SL and non-SL speakers.

## Domain Expertise

About 400,000 U.S. residents are considered profoundly deaf, and an additional 20 million are classified as hard of hearing [7]. A report by the National Deaf Center found that about 53% of deaf people were employed in 2017 and that employment rates for deaf people did not increase from 2008 to 2017 [6]. Deaf people face considerable challenges in finding jobs, and failure to provide an SL interpreter can sink a deaf job seeker's chances at an interview. The shortage of interpreters and the many different SLs mean that researchers developing SLP technology need to consult SL experts. One of the biggest difficulties stems from SL's phonological features, which are expressed through hand gestures, facial expressions, and body movements. Each of these three phonological features has its own shape, which allows one sign to be differentiated from another. For example, "drink" has similar phonological features and is represented similarly in three different SLs: ASL, ArSL, and SSL [8]. It is therefore important to include experts in multiple SLs in order to develop a system that can not only accurately recognize signing in one SL but also help translate the discrepancies across other SLs.

## Data
Important considerations for collecting data for SLP are skin and body detection, image segmentation, feature extraction, gesture recognition, and sign identification. Data collection involves two main preprocessing steps. The first takes the video frames of the signer and converts the color space from BGR to RGB; once processing is done, the frames are converted back from RGB to BGR. The second uses MediaPipe's Holistic package, which tracks human pose, face landmarks, and hands simultaneously in real-time applications. To collect training data, OpenCV captures the frames, the MediaPipe Holistic model detects landmarks in them, and the landmarks are mapped to coordinate points. For each word the model is trained on, 30 videos of 30 frames each are collected and saved to a directory. In developing this system into a deployable application, one of the most important aspects beyond manual training is the ethical collection of data from users themselves, which must also ensure broad representation of all users to avoid discriminatory bias.
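
As a rough illustration of the color space step, the sketch below reverses the channel axis of a tiny synthetic frame with NumPy. For 8-bit, three-channel images this is the same reordering that the BGR to RGB conversion performs. The helper name `swap_bgr_rgb` and the 2 x 2 frame are assumptions made for this example and are not part of the notebook.

```python
import numpy as np

def swap_bgr_rgb(image: np.ndarray) -> np.ndarray:
    """Reverse the channel axis of an H x W x 3 image (BGR <-> RGB)."""
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an H x W x 3 image")
    return image[:, :, ::-1].copy()

# Hypothetical 2 x 2 frame in OpenCV's BGR order: every pixel is pure blue.
frame_bgr = np.zeros((2, 2, 3), dtype=np.uint8)
frame_bgr[:, :, 0] = 255

frame_rgb = swap_bgr_rgb(frame_bgr)    # blue now sits in the last channel
round_trip = swap_bgr_rgb(frame_rgb)   # back to BGR for OpenCV display
```

Because the operation only reorders channels, converting to RGB for MediaPipe and back to BGR for display leaves the pixel values unchanged.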

## Import Modules

In [1]:
import cv2
import numpy as np
from matplotlib import pyplot as plt
import time
import os 
import mediapipe as mp
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.callbacks import TensorBoard
from sklearn.metrics import multilabel_confusion_matrix, accuracy_score
from gtts import gTTS
from io import BytesIO

In [2]:
mp_holistic = mp.solutions.holistic

mp_draw = mp.solutions.drawing_utils

### Defining the model and handling color space conversion

In [3]:
def mp_detection(image, model):
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) # BGR to RGB
    image.flags.writeable = False                  
    results = model.process(image)                 # Make Prediction
    image.flags.writeable = True                   
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR) # RGB to BGR
    return image, results

In [4]:
def draw_landmarks(image, results):
    mp_draw.draw_landmarks(image, results.face_landmarks, mp_holistic.FACEMESH_TESSELATION) # Face mesh (FACEMESH_CONTOURS is an alternative)
    mp_draw.draw_landmarks(image, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS) # Poses
    mp_draw.draw_landmarks(image, results.left_hand_landmarks, mp_holistic.HAND_CONNECTIONS) # Left Hand
    mp_draw.draw_landmarks(image, results.right_hand_landmarks, mp_holistic.HAND_CONNECTIONS) # Right Hand

In [5]:
def draw_styled_landmarks(image, results):
    # Draw face connections
    mp_draw.draw_landmarks(image, results.face_landmarks, mp_holistic.FACEMESH_TESSELATION, 
                             mp_draw.DrawingSpec(color=(80,110,10), thickness=1, circle_radius=1), 
                             mp_draw.DrawingSpec(color=(80,255,121), thickness=1, circle_radius=1)
                             ) 
    # Draw pose connections
    mp_draw.draw_landmarks(image, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS,
                             mp_draw.DrawingSpec(color=(80,22,10), thickness=2, circle_radius=4), 
                             mp_draw.DrawingSpec(color=(80,44,121), thickness=2, circle_radius=2)
                             ) 
    # Draw left hand connections
    mp_draw.draw_landmarks(image, results.left_hand_landmarks, mp_holistic.HAND_CONNECTIONS, 
                             mp_draw.DrawingSpec(color=(121,22,76), thickness=2, circle_radius=4), 
                             mp_draw.DrawingSpec(color=(121,44,250), thickness=2, circle_radius=2)
                             ) 
    # Draw right hand connections  
    mp_draw.draw_landmarks(image, results.right_hand_landmarks, mp_holistic.HAND_CONNECTIONS, 
                             mp_draw.DrawingSpec(color=(245,117,66), thickness=2, circle_radius=4), 
                             mp_draw.DrawingSpec(color=(245,66,230), thickness=2, circle_radius=2)
                             ) 

### Implementing MediaPipe's real-time capture of holistics

In [6]:
vid_capture = cv2.VideoCapture(0)
# mediapipe model
with mp_holistic.Holistic(min_detection_confidence=0.5,
                         min_tracking_confidence=0.5) as holistic:
    while vid_capture.isOpened():

        # read in feed
        ret, frame = vid_capture.read()
        if not ret:  # stop if the camera returns no frame
            break

        # Make detections
        image, results = mp_detection(frame, holistic)
        
        # draw landmarks
        draw_styled_landmarks(image, results)

        # show to screen
        cv2.imshow('OpenCV Feed', image)

        # break when hitting q
        if cv2.waitKey(10) & 0xFF == ord('q'):
            break
    vid_capture.release()
    cv2.destroyAllWindows()



## Sample CV output with MediaPipe Holistic

![Screenshot%20from%202023-01-29%2004-42-47.png](attachment:Screenshot%20from%202023-01-29%2004-42-47.png)

### Determining total count of landmarks

In [7]:
len(results.pose_landmarks.landmark)

33

In [8]:
len(results.face_landmarks.landmark)

468

In [9]:
468*3+33*4+21*3+21*3

1662
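
The arithmetic in the cell above can be spelled out per component. The following sketch assumes the layout used in this notebook: 33 pose landmarks with four values each (x, y, z, visibility), 468 face landmarks, and 21 landmarks per hand with three values each. The names `LANDMARK_LAYOUT` and `feature_length` are illustrative only.

```python
# (landmark count, values per landmark) for each MediaPipe Holistic component
LANDMARK_LAYOUT = {
    "pose": (33, 4),        # x, y, z, visibility
    "face": (468, 3),       # x, y, z
    "left_hand": (21, 3),   # x, y, z
    "right_hand": (21, 3),  # x, y, z
}

def feature_length(layout):
    """Total number of values in one flattened frame."""
    return sum(count * values for count, values in layout.values())

# Size of each component's slice in the flattened vector
sizes = {name: count * values for name, (count, values) in LANDMARK_LAYOUT.items()}
total = feature_length(LANDMARK_LAYOUT)
```

Here `sizes` gives the length of each block (132, 1404, 63, and 63) and `total` is the per-frame feature length of 1662.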

In [10]:
pose = []
for res in results.pose_landmarks.landmark:
    test = np.array([res.x, res.y, res.z, res.visibility])
    pose.append(test)

### Error handling if body part not in frame

In [11]:
pose = np.array([[res.x, res.y, res.z, res.visibility] for res in results.pose_landmarks.landmark]).flatten() if results.pose_landmarks else np.zeros(132)
face = np.array([[res.x, res.y, res.z] for res in results.face_landmarks.landmark]).flatten() if results.face_landmarks else np.zeros(1404)
lh = np.array([[res.x, res.y, res.z] for res in results.left_hand_landmarks.landmark]).flatten() if results.left_hand_landmarks else np.zeros(21*3)
rh = np.array([[res.x, res.y, res.z] for res in results.right_hand_landmarks.landmark]).flatten() if results.right_hand_landmarks else np.zeros(21*3)

### Extracting keypoints of the holistic

In [12]:
def extract_keypoints(results):
    pose = np.array([[res.x, res.y, res.z, res.visibility] for res in results.pose_landmarks.landmark]).flatten() if results.pose_landmarks else np.zeros(132)
    face = np.array([[res.x, res.y, res.z] for res in results.face_landmarks.landmark]).flatten() if results.face_landmarks else np.zeros(1404)
    lh = np.array([[res.x, res.y, res.z] for res in results.left_hand_landmarks.landmark]).flatten() if results.left_hand_landmarks else np.zeros(21*3)
    rh = np.array([[res.x, res.y, res.z] for res in results.right_hand_landmarks.landmark]).flatten() if results.right_hand_landmarks else np.zeros(21*3)
    return np.concatenate([pose, face, lh, rh])

In [13]:
extract_keypoints(results).shape

(1662,)
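
For inspection or debugging it can help to go the other way and split a flat 1662-value vector back into its components. This is a minimal sketch that assumes the concatenation order used by `extract_keypoints` above (pose, face, left hand, right hand); `split_keypoints` and the `np.arange` demo vector are hypothetical and stand in for real landmark data.

```python
import numpy as np

# (name, landmark count, values per landmark), in extract_keypoints order
SEGMENTS = [("pose", 33, 4), ("face", 468, 3), ("lh", 21, 3), ("rh", 21, 3)]

def split_keypoints(flat):
    """Split a flat keypoint vector into per-component (landmarks, values) arrays."""
    flat = np.asarray(flat)
    expected = sum(n * d for _, n, d in SEGMENTS)
    if flat.shape != (expected,):
        raise ValueError(f"expected shape ({expected},), got {flat.shape}")
    parts, start = {}, 0
    for name, n, d in SEGMENTS:
        stop = start + n * d
        parts[name] = flat[start:stop].reshape(n, d)
        start = stop
    return parts

# Synthetic vector whose values equal their own index, to make slices easy to check
demo = np.arange(1662, dtype=float)
parts = split_keypoints(demo)
```

A component that was out of frame shows up as an all-zero block, which follows from the `np.zeros` fallbacks in `extract_keypoints`.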

### Folders for data collection

In [14]:
# Path for exported data i.e.(numpy arrays)
DATA_PATH = os.path.join('mediapipe_data') 

# Actions to detect
actions = np.array(['hello', 'goodbye', 'my', 'name is'])

# Number of videos of data
no_sequences = 30

# Videos have 30 frames
sequence_length = 30


In [15]:
# Create Mediapipe Data Directory
for action in actions: 
    for sequence in range(no_sequences):
        # exist_ok=True skips folders that already exist without hiding other errors
        os.makedirs(os.path.join(DATA_PATH, action, str(sequence)), exist_ok=True)

## Collect keypoint values for training and testing 

In [16]:
cap = cv2.VideoCapture(0)
# Set mediapipe model 
with mp_holistic.Holistic(min_detection_confidence=0.5, min_tracking_confidence=0.5) as holistic:
    
    # loop through actions
    for action in actions:
        # loop through sequences i.e.(videos)
        for sequence in range(no_sequences):
            # loop through video length i.e(sequence length)
            for frame_num in range(sequence_length):

                # read feed
                ret, frame = cap.read()

                # make detections
                image, results = mp_detection(frame, holistic)

                # draw landmarks
                draw_styled_landmarks(image, results)
                
                # wait logic
                if frame_num == 0: 
                    cv2.putText(image, 'STARTING COLLECTION', (120,200), 
                               cv2.FONT_HERSHEY_SIMPLEX, 1, (0,255, 0), 4, cv2.LINE_AA)
                    cv2.putText(image, 'Collecting frames for {} Video Number {}'.format(action, sequence), (15,12), 
                               cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1, cv2.LINE_AA)
                    # shown on screen
                    cv2.imshow('OpenCV Feed', image)
                    cv2.waitKey(1000)
                else: 
                    cv2.putText(image, 'Collecting frames for {} Video Number {}'.format(action, sequence), (15,12), 
                               cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1, cv2.LINE_AA)
                    # shown on screen
                    cv2.imshow('OpenCV Feed', image)
                
                # extract keypoints 
                keypoints = extract_keypoints(results)
                npy_path = os.path.join(DATA_PATH, action, str(sequence), str(frame_num))
                np.save(npy_path, keypoints)

                # break if press q
                if cv2.waitKey(10) & 0xFF == ord('q'):
                    break
                    
    cap.release()
    cv2.destroyAllWindows()
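
Note that the `break` in the collection loop above only leaves the innermost frame loop, so pressing `q` skips the rest of one video and collection then continues with the next sequence. One possible way to stop everything with a single `break` is to flatten the three loops with `itertools.product`. The sketch below shows only the loop structure; `collect` and `should_stop` are hypothetical names, and the stop condition stands in for the key press.

```python
from itertools import product

def collect(actions, no_sequences, sequence_length, should_stop):
    """Visit (action, sequence, frame) triples until should_stop returns True."""
    visited = []
    for action, sequence, frame_num in product(
        actions, range(no_sequences), range(sequence_length)
    ):
        if should_stop(action, sequence, frame_num):
            break  # a single loop, so this ends the whole collection
        visited.append((action, sequence, frame_num))
    return visited

# Hypothetical stop condition standing in for the 'q' key press
stop_at = ("hello", 1, 2)
visited = collect(["hello", "goodbye"], 3, 4, lambda a, s, f: (a, s, f) == stop_at)
full_run = collect(["hello", "goodbye"], 3, 4, lambda a, s, f: False)
```

The iteration order is the same as in the nested version: frames vary fastest, then sequences, then actions.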

## Preprocessing Data 

In [17]:
label_map = {label:num for num, label in enumerate(actions)}

In [18]:
label_map

{'hello': 0, 'goodbye': 1, 'my': 2, 'name is': 3}

In [19]:
sequences, labels = [], []
for action in actions:
    for sequence in range(no_sequences):
        window = []
        for frame_num in range(sequence_length):
            res = np.load(os.path.join(DATA_PATH, action, str(sequence), "{}.npy".format(frame_num)))
            window.append(res)
        sequences.append(window)
        labels.append(label_map[action])
        

In [59]:
np.array(sequences).shape

(120, 30, 1662)

In [60]:
np.array(labels).shape

(120,)

In [61]:
X = np.array(sequences)

In [62]:
y = to_categorical(labels).astype(int)

In [63]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05)
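
The label step above turns integer class indices into one-hot rows. As a minimal sketch of what that encoding looks like, the NumPy version below builds the same kind of matrix by indexing an identity matrix. The `one_hot` helper and the demo label order are assumptions for illustration, not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    """Return an integer one-hot matrix with one row per label."""
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        num_classes = int(labels.max()) + 1
    return np.eye(num_classes, dtype=int)[labels]

label_map = {'hello': 0, 'goodbye': 1, 'my': 2, 'name is': 3}
demo_labels = [label_map[a] for a in ['hello', 'my', 'name is', 'goodbye']]

encoded = one_hot(demo_labels, num_classes=len(label_map))
decoded = np.argmax(encoded, axis=1).tolist()  # back to integer labels
```

Taking `argmax` along axis 1 recovers the integer labels, which is the same step used later when the predictions are evaluated.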

## Design 
The system uses a Long Short-Term Memory (LSTM) network, a type of recurrent neural network capable of learning order dependence in sequence prediction problems. This matters for our system because sign-language-to-text detection operates on sequences to form sentences. Bengio et al. (1994) [9] state three basic requirements for a recurrent neural network: (1) the system can store information for an arbitrary duration, (2) the system is resistant to noise, and (3) the parameters of the system are trainable. An LSTM was chosen for its ability to process sequences of data that would otherwise have been affected by noise in the moving gestures.
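
To make the recurrence concrete, here is a minimal NumPy sketch of a single LSTM time step in its standard textbook form, with sigmoid gates and a tanh candidate. It is not the Keras implementation, and the model in this notebook uses `activation='relu'` rather than tanh. The gate ordering, the toy sizes, and the random weights are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4*units, input_dim), U: (4*units, units), b: (4*units,)
    Gate order in the stacked weights (chosen for this sketch):
    input, forget, candidate, output.
    """
    units = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0 * units:1 * units])   # input gate
    f = sigmoid(z[1 * units:2 * units])   # forget gate
    g = np.tanh(z[2 * units:3 * units])   # candidate cell values
    o = sigmoid(z[3 * units:4 * units])   # output gate
    c = f * c_prev + i * g                # cell state carries long-term memory
    h = o * np.tanh(c)                    # hidden state is the step's output
    return h, c

rng = np.random.default_rng(0)
input_dim, units, steps = 5, 3, 4         # toy sizes, not the 1662-value input
W = rng.normal(scale=0.1, size=(4 * units, input_dim))
U = rng.normal(scale=0.1, size=(4 * units, units))
b = np.zeros(4 * units)

h, c = np.zeros(units), np.zeros(units)
for x in rng.normal(size=(steps, input_dim)):
    h, c = lstm_step(x, h, c, W, U, b)
```

The cell state `c` is what lets the network keep information across many frames, while the gates decide how much of each new frame is written into it.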

## Building and Training the LSTM Neural Network

### Training logs can be viewed by going to the Logs/train directory and running "tensorboard --logdir=." in a terminal

In [80]:
# Setting up TensorBoard Callback 
log_dir = os.path.join('Logs')
tb_callback = TensorBoard(log_dir=log_dir)

In [81]:
model = Sequential()
model.add(LSTM(64, return_sequences=True, activation='relu', input_shape=(30,1662)))
model.add(LSTM(128, return_sequences=True, activation='relu'))
model.add(LSTM(64, return_sequences=False, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(actions.shape[0], activation='softmax'))
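
As a rough size check for the architecture above, the trainable parameter count can be worked out by hand. The sketch assumes the usual LSTM layout of four gates, each with input weights, recurrent weights, and a bias, plus bias terms on every Dense layer. The helper names are illustrative, and `model.summary()` remains the authoritative source.

```python
def lstm_params(input_dim, units):
    # four gates, each with input weights, recurrent weights, and a bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # weight matrix plus one bias per unit
    return input_dim * units + units

num_actions = 4  # 'hello', 'goodbye', 'my', 'name is'

layers = [
    ("lstm_64", lstm_params(1662, 64)),
    ("lstm_128", lstm_params(64, 128)),
    ("lstm_64_out", lstm_params(128, 64)),
    ("dense_64", dense_params(64, 64)),
    ("dense_32", dense_params(64, 32)),
    ("dense_softmax", dense_params(32, num_actions)),
]
total_params = sum(count for _, count in layers)
```

Under these assumptions the first LSTM layer dominates the total because it sees all 1662 input values per frame, which is worth keeping in mind given that only 120 sequences are available for training.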

In [82]:
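# Compile the model with an Adam optimizer at learning rate 0.003
opt = keras.optimizers.Adam(learning_rate=0.003)
# Pass the optimizer object itself; the string 'Adam' would ignore `opt` and use the default learning rate
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['categorical_accuracy'])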

In [83]:
model.fit(X_train, y_train, epochs=100, callbacks=[tb_callback])

Epoch 1/100
...
Epoch 100/100


<keras.callbacks.History at 0x7f61ac35d240>

In [84]:
model.save('model_eta003.h5') # 83.33%

In [39]:
model.save('model_eta005.h5')    # 74.56%

In [85]:
model.load_weights('model_eta003.h5') # loads the weights

## Diagnosis 
This sign language detection system uses a confusion matrix to assess its performance, since the signing gesture is converted to a categorical text output. In this demonstration, the model was trained on four labels: {'hello': 0, 'goodbye': 1, 'my': 2, 'name is': 3}. The training data uses American Sign Language; further training should include other sign languages and implement a way to avoid confusion between similar gestures in different languages.

### Evaluate Model

In [86]:
y_hat = model.predict(X_test)



In [87]:
y_true = np.argmax(y_test, axis=1).tolist()
y_hat = np.argmax(y_hat, axis=1).tolist()

In [88]:
multilabel_confusion_matrix(y_true, y_hat)

array([[[5, 0],
        [0, 1]],

       [[4, 0],
        [0, 2]],

       [[3, 0],
        [0, 3]]])

In [89]:
accuracy_score(y_true, y_hat)

1.0
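
Each 2 x 2 block in the output above is a one-vs-rest matrix laid out as `[[TN, FP], [FN, TP]]`. The sketch below reproduces that layout in NumPy. The demo labels are hypothetical values chosen only to be consistent with the counts shown; the actual contents of the six-sample test split are not recorded in this notebook.

```python
import numpy as np

def one_vs_rest_confusion(y_true, y_pred, labels):
    """Return one [[TN, FP], [FN, TP]] matrix per label."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    matrices = []
    for label in labels:
        is_true = y_true == label
        is_pred = y_pred == label
        tn = int(np.sum(~is_true & ~is_pred))
        fp = int(np.sum(~is_true & is_pred))
        fn = int(np.sum(is_true & ~is_pred))
        tp = int(np.sum(is_true & is_pred))
        matrices.append([[tn, fp], [fn, tp]])
    return np.array(matrices)

# Hypothetical labels: six samples covering three classes, all predicted correctly
y_true_demo = [0, 1, 1, 2, 2, 2]
y_pred_demo = list(y_true_demo)
matrices = one_vs_rest_confusion(y_true_demo, y_pred_demo, labels=[0, 1, 2])
```

Only three matrices appear in the notebook output, which suggests that the test split contained three of the four classes. With six test samples, an accuracy of 1.0 is weak evidence, so a larger held-out set would give a more reliable estimate.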

## Deployment
Currently this system is tested in real time, as shown below, using OpenCV and the MediaPipe model. Further deployment could include building a full-stack app packaged into a Docker image and deployed to AWS or Azure. Further refinement would include a data storage solution for the arrays of MediaPipe landmark points that represent the motion of each gesture. The system could also be developed into an API or plugin that integrates with current video chat services such as Zoom, Microsoft Teams, and Google Meet. A further addition would be text-to-speech output generated from the recognized sign gestures.

## Test in real time

In [96]:
# Detection Variables
sequence = []
sentence = []
predictions = []
threshold = 0.7

vid_capture = cv2.VideoCapture(0)
# mediapipe model
with mp_holistic.Holistic(min_detection_confidence=0.5,
                         min_tracking_confidence=0.5) as holistic:
    while vid_capture.isOpened():

        # read in feed
        ret, frame = vid_capture.read()
        if not ret:  # stop if the camera returns no frame
            break

        # Make detections
        image, results = mp_detection(frame, holistic)
        print(results)
      
        # draw landmarks
        draw_styled_landmarks(image, results)
        
        # predicition method
        keypoints = extract_keypoints(results)
        sequence.append(keypoints)
        sequence = sequence[-30:]  #the 30 frames
        
        if len(sequence) == 30:
            res = model.predict(np.expand_dims(sequence, axis=0))[0]
            print(actions[np.argmax(res)])
            predictions.append(np.argmax(res))
            
            # update the displayed sentence when the prediction is stable and confident
            if np.unique(predictions[-10:])[0] == np.argmax(res):
                if res[np.argmax(res)] > threshold:
                    if len(sentence) > 0:
                        if actions[np.argmax(res)] != sentence[-1]:
                            sentence.append(actions[np.argmax(res)])
                    else:
                        sentence.append(actions[np.argmax(res)])
            if len(sentence) > 5:
                sentence = sentence[-5:]

        cv2.rectangle(image, (0,0), (640,40), (245, 117, 16), -1)
        cv2.putText(image, ' '.join(sentence), (3,30),
                   cv2.FONT_HERSHEY_SIMPLEX, 1, (255,255,255), 2, cv2.LINE_AA)
            
            
        # show to screen
        cv2.imshow('OpenCV Feed', image)

        # break when hitting q
        if cv2.waitKey(10) & 0xFF == ord('q'):
            break
    vid_capture.release()
    cv2.destroyAllWindows()

(Output truncated. Each frame prints `<class 'mediapipe.python.solution_base.SolutionOutputs'>`; once 30 frames are buffered, each frame also prints the predicted label. The retained output shows the labels `my`, `name is`, `hello`, and `goodbye`.)

In [93]:
vid_capture.release()
cv2.destroyAllWindows()

![Screenshot%20from%202023-01-29%2007-55-34.png](attachment:Screenshot%20from%202023-01-29%2007-55-34.png)
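
The sentence-building logic in the real-time loop can be isolated from the camera code and tested on its own. The sketch below is a variant, not the notebook's exact rule: it requires the last N predictions to all agree before a word is accepted, whereas the loop above compares the current prediction with the smallest label among the last 10. The class name `SentenceBuilder`, its parameters, and the probability vectors are assumptions for illustration.

```python
from collections import deque

class SentenceBuilder:
    """Accumulate words from per-frame class probabilities."""

    def __init__(self, labels, threshold=0.7, stable_frames=10, max_words=5):
        self.labels = list(labels)
        self.threshold = threshold
        self.recent = deque(maxlen=stable_frames)  # last predicted class indices
        self.sentence = deque(maxlen=max_words)    # keeps only the newest words

    def update(self, probs):
        best = max(range(len(probs)), key=probs.__getitem__)
        self.recent.append(best)
        # stable: the window is full and every entry is the same class
        stable = len(self.recent) == self.recent.maxlen and len(set(self.recent)) == 1
        if stable and probs[best] > self.threshold:
            word = self.labels[best]
            # skip immediate repeats of the same word
            if not self.sentence or self.sentence[-1] != word:
                self.sentence.append(word)
        return list(self.sentence)

labels = ['hello', 'goodbye', 'my', 'name is']
builder = SentenceBuilder(labels, stable_frames=3)

# Hypothetical model outputs: three frames of 'my', then three of 'name is'
frames = [[0.1, 0.05, 0.8, 0.05]] * 3 + [[0.05, 0.05, 0.1, 0.8]] * 3
for probs in frames:
    sentence = builder.update(probs)
```

Keeping this logic in a small class makes it straightforward to tune `threshold` and `stable_frames` against recorded predictions without a webcam attached.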

## Text to Audio 

In [63]:
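def speak(text, language='en'):
    """Synthesize `text` with gTTS and return the MP3 data in an in-memory buffer."""
    mp3_fo = BytesIO()
    tts = gTTS(text, lang=language)
    tts.write_to_fp(mp3_fo)
    mp3_fo.seek(0)  # rewind so callers can read the audio from the start
    return mp3_fo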

## References

[1] Kushalnagar, R. (2019). Deafness and hearing loss. Web Accessibility: A Foundation for Research, 35-47.

[2] Bragg, Danielle, et al. "Sign language recognition, generation, and translation: An interdisciplinary perspective." Proceedings of the 21st International ACM SIGACCESS Conference on Computers and Accessibility. 2019.

[3] Mitchell, Ross & Young, Travas & Bachleda, Bellamie & Karchmer, Michael. (2006). How Many People Use ASL in the United States? Why Estimates Need Updating. Sign Language Studies. 6. 10.1353/sls.2006.0019. 

[4] Baker-Shenk, C. L., & Cokely, D. (1991). American Sign Language: A teacher's resource text on grammar and culture. Gallaudet University Press.

[5] Shrawankar, U., & Dixit, S. (2016, April). Framing sentences from sign language symbols using NLP. In IEEE conference (pp. 5260-5262).

[6] Garberoglio, C., Palmer, J., Cawthon, S., & Sales, A. (2019, October). Deaf people and employment in the United States: 2019. Nationaldeafcenter.org. Retrieved January 27, 2023, from https://nationaldeafcenter.org/wp-content/uploads/2019/10/Deaf-People-and-Employment-in-the-United-States_-2019-7.26.19ENGLISHWEB.pdf

[7] Heibutzki, R. (2017, November 21). Problems faced by deaf individuals in finding jobs. Problems Faced by Deaf Individuals in Finding Jobs. Retrieved January 27, 2023, from https://work.chron.com/problems-faced-deaf-individuals-finding-jobs-23030.html 

[8] Chou, F. H., & Su, Y. C. (2012, July). An encoding and identification approach for the static sign language recognition. In 2012 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM) (pp. 885-889). IEEE.

[9] Y. Bengio, P. Simard and P. Frasconi, "Learning long-term dependencies with gradient descent is difficult," in IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157-166, March 1994, doi: 10.1109/72.279181.