# Real-Time Prediction

## Reference

The code in this notebook is adapted from the following YouTube tutorial: https://www.youtube.com/watch?v=doDUihpj6ro

## Usage

With this notebook, you can make real-time predictions of the signs you perform in front of your webcam.

### <span style="color:green">Quick User Guide:</span>

<span style="color:green">1. Load your trained TF model.</span>

<span style="color:green">2. Copy and paste the configuration you used in Franziska's pre-processing notebook.</span>

<span style="color:green">3. (Optional) Copy and paste the pre-processing layer you used in Franziska's pre-processing notebook.</span>


## Install and Import Dependencies

### Install Dependencies

In [None]:
%pip install tensorflow-macos opencv-python mediapipe-silicon scikit-learn matplotlib pandas # Apple Silicon (macOS) builds of TensorFlow and MediaPipe; the old "sklearn" PyPI name is deprecated
#!pip install tensorflow==2.4.1 tensorflow-gpu==2.4.1 opencv-python mediapipe sklearn matplotlib # original line from the tutorial (the author used Windows)

### Import Dependencies

In [None]:
# general
import numpy as np
import pandas as pd
import os # easier file path handling

# for camera feed
import cv2 # opencv: reading frames from the webcam and drawing on them
from matplotlib import pyplot as plt # imshow for easy visualization
import time # to insert "sleep" in between frames
import mediapipe as mp # MediaPipe Holistic: face, pose and hand landmark detection

# for model re-building
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# for loading .json dictionary
import json

# <span style="color:green">Load Files</span>

## <span style="color:green">Load Saved Model</span>

### <span style="color:green">Here, you can load your trained TensorFlow model, e.g. an LSTM.</span>

In [None]:
# load model
model = tf.keras.models.load_model('../models/LSTM_model_2_1')

## Load sign_to_prediction_index_map.json file

In [None]:
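# label map dictionary (sign -> prediction index)
with open("../data/asl-signs/sign_to_prediction_index_map.json", "r") as f:
    label_map = json.load(f)

# actions to detect, ordered by prediction index so that actions[i] is the sign for model output i
actions = np.array(sorted(label_map, key=label_map.get))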
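The label array has to be ordered by prediction index, because `actions[np.argmax(pred)]` is what turns a model output into a word. A minimal sketch with a small, made-up sign-to-index map (`toy_label_map` is a hypothetical stand-in for the real JSON file, stored out of index order on purpose):

```python
import numpy as np

# Hypothetical stand-in for sign_to_prediction_index_map.json (sign -> output index),
# deliberately stored out of index order.
toy_label_map = {"cat": 2, "apple": 0, "dog": 1}

# Place each sign at the position given by its index, so toy_actions[i] is the sign for output i.
toy_actions = np.empty(len(toy_label_map), dtype=object)
for sign, idx in toy_label_map.items():
    toy_actions[idx] = sign

# A fake probability vector from a model: index 2 is the most likely class.
toy_pred = np.array([0.1, 0.2, 0.7])
print(toy_actions[np.argmax(toy_pred)])  # cat
```

Relying on `list(label_map.keys())` instead only gives the same result if the file happens to list the signs in index order.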

# Setup

## <span style="color:green">Franziska's Configuration :) </span>

### <span style="color:green">For now, just copy and paste the "Configuration" cell from Franziska's notebook "TF_Load-PreprocessData.ipynb" into the following cell :D </span>

### <span style="color:green">This configuration must be identical to the one you used to preprocess the training data for the TensorFlow model loaded above! </span>

In [None]:
#limit dataset for quick test
QUICK_TEST = False
QUICK_LIMIT = 1000

#Define length of sequences for padding or cutting; 22 is the median length of all sequences
LENGTH = 22

#define min or max length of sequences; sequences too long/too short will be dropped
#max value of 92 was defined by calculating the interquartile range
MIN_LENGTH = 10
MAX_LENGTH = 92

#if True, final data will be flattened; if False, it will be 3-dimensional
FLATTEN = False

#define initialization of numpy array 
ARRAY = False #(True=Zeros, False=empty values)

#Define padding mode 
#1 = padding at start & end; 2 = padding at end; 3 = no padding; 4 = copy first/last frame; 5 = copy last frame
#Note: Mode 3 will give you an error due to different lengths, working on that
PADDING = 2
CONSTANT_VALUE = 0 #only required for mode 1 and 2; enter tf.constant(float('nan')) for NaN

#define if z coordinate will be dropped
DROP_Z = True

#define if csv file should be filtered
CSV_FILTER = True
#define how many participants for test set
TEST_COUNT = 5 #5 participants account for approx. 23% of dataset
#generate test or train dataset (True = Train dataset; False = Test dataset)
TRAIN = True #only works if CSV_FILTER is activated
#TRAIN = False

#define filenames for x and y:
feature_data = 'X_train' #x data
feature_labels = 'y_train' #y data

#use for test dataset
#feature_data = 'X_test' #x data
#feature_labels = 'y_test' #y data


RANDOM_STATE = 42

#Defining Landmarks
#index ranges for each landmark type
#don't change these landmarks
FACE = list(range(0, 468))
LEFT_HAND = list(range(468, 489))
POSE = list(range(489, 522))
POSE_UPPER = list(range(489, 510))
RIGHT_HAND = list(range(522, 543))
LIPS = [61, 185, 40, 39, 37, 0, 267, 269, 270, 409,
        291, 146, 91, 181, 84, 17, 314, 405, 321, 375,
        78, 191, 80, 81, 82, 13, 312, 311, 310, 415,
        95, 88, 178, 87, 14, 317, 402, 318, 324, 308]
#defining landmarks that will be merged
averaging_sets = [FACE]

#generating list with all landmarks selected for preprocessing
#change landmarks you want to use here:
point_landmarks = LEFT_HAND + POSE_UPPER + RIGHT_HAND + LIPS


#calculating sum of total landmarks used
LANDMARKS = len(point_landmarks) + len(averaging_sets)
print(f'Total count of used landmarks: {LANDMARKS}')

#defining input shape for model
if DROP_Z:
    INPUT_SHAPE = (LENGTH,LANDMARKS*2)
else:
    INPUT_SHAPE = (LENGTH,LANDMARKS*3)
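As a sanity check of the arithmetic above for the default selection: 21 left-hand, 21 upper-pose and 21 right-hand landmarks plus the 40 lip points give 103 point landmarks, and `averaging_sets = [FACE]` adds one merged point, so `LANDMARKS = 104` and, with `DROP_Z = True`, `INPUT_SHAPE = (22, 208)`. The helper `expected_input_shape` below is hypothetical and hard-codes these default counts, so it has to be updated whenever the landmark selection changes.

```python
def expected_input_shape(length=22, drop_z=True):
    """Worked example of the LANDMARKS / INPUT_SHAPE arithmetic for the default selection."""
    n_left_hand = len(range(468, 489))    # LEFT_HAND: 21 landmarks
    n_pose_upper = len(range(489, 510))   # POSE_UPPER: 21 landmarks
    n_right_hand = len(range(522, 543))   # RIGHT_HAND: 21 landmarks
    n_lips = 40                           # entries in the LIPS list
    n_averaged = 1                        # averaging_sets = [FACE] contributes one merged point
    landmarks = n_left_hand + n_pose_upper + n_right_hand + n_lips + n_averaged
    coords = 2 if drop_z else 3           # x, y (plus z if it is kept)
    return landmarks, (length, landmarks * coords)

print(expected_input_shape())              # (104, (22, 208))
print(expected_input_shape(drop_z=False))  # (104, (22, 312))
```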


## Setup Objects and Functions for MP Holistic Keypoints

### Initialize MP Holistic Model

In [None]:
mp_holistic = mp.solutions.holistic # holistic model
mp_drawing = mp.solutions.drawing_utils # drawing utilities

### Define Functions (later they all go into a Python module for reuse)

#### mediapipe_detection()

In [None]:
# function to detect MP Holistic landmarks from an image, e.g. frames of your camera feed
def mediapipe_detection(image, model): 
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # color conversion BGR to RGB
    image.flags.writeable = False                   # image no longer writeable
    results = model.process(image)                  # make prediction
    image.flags.writeable = True                    # image is writeable again
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)  # color conversion back to original
    return image, results
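`mediapipe_detection` converts each frame from OpenCV's BGR channel order to the RGB order MediaPipe expects, and temporarily marks it read-only, which the MediaPipe examples describe as a way to pass the image by reference. The NumPy sketch below only illustrates those two steps on a tiny synthetic image; the notebook itself uses `cv2.cvtColor`.

```python
import numpy as np

# A tiny 1x2 "image" in OpenCV's BGR channel order: one pure-blue pixel, one pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# For 3-channel images, BGR -> RGB simply reverses the channel axis,
# which is what cv2.cvtColor(image, cv2.COLOR_BGR2RGB) produces.
rgb = np.ascontiguousarray(bgr[..., ::-1])
print(rgb[0, 0], rgb[0, 1])  # the blue pixel is now (0, 0, 255) in RGB order, the red one (255, 0, 0)

# While the frame is read-only, any accidental write raises an error instead of
# silently modifying it.
rgb.flags.writeable = False
try:
    rgb[0, 0, 0] = 1
except ValueError as err:
    print("read-only:", err)
rgb.flags.writeable = True  # writeable again, as in mediapipe_detection
```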

#### draw_styled_landmarks()

In [None]:
# function to draw landmarks points and connecting lines on top of an image, e.g. on top of your camera feed
def draw_styled_landmarks(image, results): 
    # draw face connections
    mp_drawing.draw_landmarks(image, results.face_landmarks, mp_holistic.FACEMESH_TESSELATION, 
                              mp_drawing.DrawingSpec(color=(80,110,10), thickness=1, circle_radius=1), 
                              mp_drawing.DrawingSpec(color=(80,255,121), thickness=1, circle_radius=1))
    # draw pose connections
    mp_drawing.draw_landmarks(image, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS, 
                              mp_drawing.DrawingSpec(color=(80,22,10), thickness=2, circle_radius=4), 
                              mp_drawing.DrawingSpec(color=(80,255,121), thickness=2, circle_radius=2)) 
    # draw left hand connections
    mp_drawing.draw_landmarks(image, results.left_hand_landmarks, mp_holistic.HAND_CONNECTIONS, 
                              mp_drawing.DrawingSpec(color=(121,22,76), thickness=2, circle_radius=4), 
                              mp_drawing.DrawingSpec(color=(121,44,250), thickness=2, circle_radius=2)) 
    # draw right hand connections
    mp_drawing.draw_landmarks(image, results.right_hand_landmarks, mp_holistic.HAND_CONNECTIONS, 
                              mp_drawing.DrawingSpec(color=(245,117,66), thickness=2, circle_radius=4), 
                              mp_drawing.DrawingSpec(color=(245,66,230), thickness=2, circle_radius=2)) 

#### extract_keypoints()

In [None]:
# function to extract the x, y, z coordinates of all landmarks (keypoints)
# and stack them into one array of shape (543, 3)
def extract_keypoints(results): 
    face = np.array([[r.x, r.y, r.z] for r in results.face_landmarks.landmark]) if results.face_landmarks else np.zeros([468, 3])
    left_hand = np.array([[r.x, r.y, r.z] for r in results.left_hand_landmarks.landmark]) if results.left_hand_landmarks else np.zeros([21, 3])
    pose = np.array([[r.x, r.y, r.z] for r in results.pose_landmarks.landmark]) if results.pose_landmarks else np.zeros([33, 3]) # x, y, z only (pose visibility is not used)
    right_hand = np.array([[r.x, r.y, r.z] for r in results.right_hand_landmarks.landmark]) if results.right_hand_landmarks else np.zeros([21, 3])
    # the order must match the landmark index ranges in the configuration cell:
    # face 0-467, left hand 468-488, pose 489-521, right hand 522-542 (undetected parts are filled with zeros)
    return np.concatenate([face, left_hand, pose, right_hand])
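The order of the concatenation matters: the index ranges in the configuration cell (FACE 0 to 467, LEFT_HAND 468 to 488, POSE 489 to 521, RIGHT_HAND 522 to 542) assume the parts are stacked as face, left hand, pose, right hand. A small check with fake arrays, where each part is filled with its own ID, shows which rows each range selects:

```python
import numpy as np

# Landmark counts per MediaPipe Holistic part.
N_FACE, N_HAND, N_POSE = 468, 21, 33

# Fake per-part arrays filled with a part ID so we can see where each part lands.
face = np.full((N_FACE, 3), 0.0)
left_hand = np.full((N_HAND, 3), 1.0)
pose = np.full((N_POSE, 3), 2.0)
right_hand = np.full((N_HAND, 3), 3.0)

# Stack in the order the configuration's index ranges expect.
frame = np.concatenate([face, left_hand, pose, right_hand])
print(frame.shape)  # (543, 3)

# Each configured range selects exactly one part.
assert (frame[0:468] == 0.0).all()    # FACE
assert (frame[468:489] == 1.0).all()  # LEFT_HAND
assert (frame[489:522] == 2.0).all()  # POSE
assert (frame[522:543] == 3.0).all()  # RIGHT_HAND
```

Stacking pose before the left hand instead would shift both parts, so `LEFT_HAND` would pick up pose points and `POSE` would pick up hand points.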

#### prob_viz()

In [None]:
# function to visualize predicted word probabilities with a dynamic real-time bar chart
def prob_viz(pred, actions, input_frame, color): 
    output_frame = input_frame.copy() 
    for num, prob in enumerate(pred): 
        cv2.rectangle(output_frame, (0,60+num*20), (int(prob*100), 45+num*20), color, -1)
        # cv2.rectangle(image, start_point, end_point, color, thickness)
        cv2.putText(output_frame, actions[num], (0, 45+num*20), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (200, 200, 200), 1, cv2.LINE_AA)
        # cv2.putText(image, 'OpenCV', org, font, fontScale, color, thickness, cv2.LINE_AA)
    return output_frame
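`prob_viz` draws one 20-pixel bar per class, so with a large label map most bars end up below the bottom edge of the frame. One option is to draw only the k most probable signs. `top_k_predictions` below is a hypothetical helper for illustration, not part of the tutorial code.

```python
import numpy as np

def top_k_predictions(pred, actions, k=5):
    """Return the k most probable (label, probability) pairs, highest first."""
    pred = np.asarray(pred)
    top_idx = np.argsort(pred)[::-1][:k]  # indices sorted by descending probability
    return [(str(actions[i]), float(pred[i])) for i in top_idx]

# Usage with made-up labels and probabilities:
demo_actions = np.array(["apple", "bird", "cat", "dog"])
demo_pred = np.array([0.05, 0.6, 0.1, 0.25])
print(top_k_predictions(demo_pred, demo_actions, k=2))  # [('bird', 0.6), ('dog', 0.25)]
```

In the real-time loop, the same selection could be passed to the existing function, e.g. `idx = np.argsort(pred)[::-1][:5]` followed by `prob_viz(pred[idx], actions[idx], image, color)`.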

## <span style="color:green">Pre-processing Layer</span>


### Helper Function

In [None]:
def tf_nan_mean(x, axis=0):
    #calculates the mean of a TensorFlow tensor x along a specified axis while ignoring any NaN values in the tensor.
    return tf.reduce_sum(tf.where(tf.math.is_nan(x), tf.zeros_like(x), x), axis=axis) / tf.reduce_sum(tf.where(tf.math.is_nan(x), tf.zeros_like(x), tf.ones_like(x)), axis=axis)
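`tf_nan_mean` divides the sum of the non-NaN values by their count. The NumPy version below (`nan_mean`, written only for illustration) shows the same arithmetic; a slice that contains only NaNs gives 0/0 = NaN, and `FeatureGen` later replaces any remaining NaNs with zeros.

```python
import numpy as np

def nan_mean(x, axis=0):
    """NumPy version of tf_nan_mean: sum of non-NaN values / count of non-NaN values."""
    x = np.asarray(x, dtype=float)
    is_nan = np.isnan(x)
    total = np.where(is_nan, 0.0, x).sum(axis=axis)    # NaNs contribute 0 to the sum
    count = np.where(is_nan, 0.0, 1.0).sum(axis=axis)  # ...and 0 to the count
    with np.errstate(invalid="ignore"):
        return total / count                           # all-NaN slices give 0/0 = NaN

x = np.array([[1.0, np.nan],
              [3.0, np.nan]])
print(nan_mean(x, axis=0))  # column 0 -> 2.0, column 1 (all NaN) -> NaN
```

Apart from the all-NaN case (where `np.nanmean` also emits a warning), the result agrees with `np.nanmean`.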

### <span style="color:green">In the following cell, copy and paste the pre-processing layer from Franziska's notebook (only needed if you added or changed something)</span>

In [None]:
#generating preprocessing layer that will be added to final model
class FeatureGen(tf.keras.layers.Layer):
    #defines custom tensorflow layer 
    def __init__(self):
        #initializes layer
        super(FeatureGen, self).__init__()
    
    def call(self, x_in):
        #drop z coordinates if required
        if DROP_Z:
            x_in = x_in[:, :, 0:2]
        
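        #generates list with mean values for landmarks that will be merged
        #note: each av_set is read as [start, count], i.e. x_in[:, start:start+count]; with averaging_sets = [FACE]
        #this slices x_in[:, 0:1] (landmark 0 only), so use e.g. [[0, 468]] in both notebooks if the whole face should be averaged
        x_list = [tf.expand_dims(tf_nan_mean(x_in[:, av_set[0]:av_set[0]+av_set[1], :], axis=1), axis=1) for av_set in averaging_sets]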
        #extracts specific columns from input x_in defined by landmarks
        x_list.append(tf.gather(x_in, point_landmarks, axis=1))
        #concatenates the two tensors from above along axis 1/columns
        x = tf.concat(x_list, 1)

        #padding to desired length of sequence (defined by LENGTH)
        x_padded = x
        #get current number of rows (frames)
        current_rows = tf.shape(x_padded)[0]
        #if current number of rows is greater than desired number of rows, truncate excess rows
        if current_rows > LENGTH:
            x_padded = x_padded[:LENGTH, :, :]

        #if current number of rows is less than desired number of rows, add padding
        elif current_rows < LENGTH:
            #calculate amount of padding needed
            pad_rows = LENGTH - current_rows

            if PADDING ==4: #copy first/last frame
                if pad_rows %2 == 0: #if pad_rows is even
                    padding_front = tf.repeat(x_padded[0:1, :], pad_rows//2, axis=0)
                    padding_back = tf.repeat(x_padded[-1:, :], pad_rows//2, axis=0)
                else: #if pad_rows is odd
                    padding_front = tf.repeat(x_padded[0:1, :], (pad_rows//2)+1, axis=0)
                    padding_back = tf.repeat(x_padded[-1:, :], pad_rows//2, axis=0)
                x_padded = tf.concat([padding_front, x_padded, padding_back], axis=0)
            elif PADDING == 5: #copy last frame
                padding_back = tf.repeat(x_padded[-1:, :], pad_rows, axis=0)
                x_padded = tf.concat([x_padded, padding_back], axis=0)
            else:
                if PADDING ==1: #padding at start and end
                    if pad_rows %2 == 0: #if pad_rows is even
                        paddings = [[pad_rows//2, pad_rows//2], [0, 0], [0, 0]]
                    else: #if pad_rows is odd
                        paddings = [[pad_rows//2+1, pad_rows//2], [0, 0], [0, 0]]
                elif PADDING ==2: #padding only at the end of sequence
                    paddings = [[0, pad_rows], [0, 0], [0, 0]]
                elif PADDING ==3: #no padding
                    paddings = [[0, 0], [0, 0], [0, 0]]
                x_padded = tf.pad(x_padded, paddings, mode='CONSTANT', constant_values=CONSTANT_VALUE)

        x = x_padded
        current_rows = tf.shape(x)[0]

        #fill short runs of missing values (up to limit=2) by linear interpolation over the flattened array
        x = pd.DataFrame(np.array(x).flatten()).interpolate(method='linear', limit=2, limit_direction='both').to_numpy()
        #fill missing values with zeros
        x = tf.where(tf.math.is_nan(x), tf.zeros_like(x), x)
        
        #reshape data to 2D or 3D array
        if FLATTEN:
            x = tf.reshape(x, (1, current_rows*INPUT_SHAPE[1]))
        else:
            x = tf.reshape(x, (1, current_rows, INPUT_SHAPE[1]))

        return x

#define converter using generated layer
feature_converter = FeatureGen()
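The truncation and padding branch of `FeatureGen` can be hard to follow inside the layer. The following NumPy sketch reproduces the same `PADDING` modes on a toy sequence so their effect is visible. `pad_sequence` is a hypothetical helper for illustration only; it is not part of the pre-processing notebook and leaves out the landmark selection and NaN handling.

```python
import numpy as np

def pad_sequence(x, length, mode=2, constant_value=0.0):
    """NumPy sketch of FeatureGen's truncation/padding along the frame axis.

    x has shape (frames, landmarks, coords). Modes follow the PADDING setting:
    1 = constant padding at start and end, 2 = constant padding at end,
    3 = no padding, 4 = repeat first/last frame, 5 = repeat last frame.
    """
    x = np.asarray(x, dtype=float)
    if len(x) >= length:
        return x[:length]                   # truncate: keep the first `length` frames
    pad = length - len(x)
    front, back = pad - pad // 2, pad // 2  # an odd extra frame goes to the front
    no_pad = [(0, 0), (0, 0)]               # landmark and coordinate axes stay unchanged
    if mode == 1:
        return np.pad(x, [(front, back)] + no_pad, mode="constant", constant_values=constant_value)
    if mode == 2:
        return np.pad(x, [(0, pad)] + no_pad, mode="constant", constant_values=constant_value)
    if mode == 4:
        return np.concatenate([np.repeat(x[:1], front, axis=0), x, np.repeat(x[-1:], back, axis=0)])
    if mode == 5:
        return np.concatenate([x, np.repeat(x[-1:], pad, axis=0)])
    return x                                # mode 3: sequence stays shorter than `length`

# Toy sequence: 3 frames, 1 landmark, 1 coordinate, with values 0, 1, 2.
seq = np.arange(3, dtype=float).reshape(3, 1, 1)
print(pad_sequence(seq, 6, mode=2).ravel())  # frames 0, 1, 2 followed by three zero frames
print(pad_sequence(seq, 6, mode=4).ravel())  # first frame repeated twice in front, last frame once at the end
```

Mode 3 returns sequences of different lengths, which is why the configuration notes that it currently raises an error downstream.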

## Real Time Prediction / Detection

Press "q" to stop the camera feed.

In [None]:
# 1. New detection variables 
sequence = [] # rolling window of the last LENGTH frames used for one prediction
sentence = [] # history of predicted words
predictions = [] # recent predicted class indices, used for the stability check
threshold = 0.4 # confidence threshold: only render a prediction if its probability is above this value

cap = cv2.VideoCapture(0) # grabbing webcam
# set mediapipe model
with mp_holistic.Holistic(min_detection_confidence=0.5, min_tracking_confidence=0.5) as holistic: 
    while cap.isOpened(): # loop through all frames 

        # read feed
        ret, frame = cap.read()
        if not ret: # stop if no frame could be read, e.g. the camera is unavailable
            break

        # make detections 
        image, results = mediapipe_detection(frame, holistic)
        #print(results)

        #draw_landmarks(image, results)
        draw_styled_landmarks(image, results)

        # 2. Prediction logic
        keypoints = extract_keypoints(results) # extract keypoints x, y, z for face, left_hand, pose, right_hand from mediapipe holistic predictions, keypoints.shape e.g. (543, 3)
        sequence.append(keypoints) # keep appending keypoints (frames) to a sequence, np.array(sequence).shape e.g. (22, 543, 3)
        sequence = sequence[-LENGTH:] # takes last LENGTH frames of the sequence

        # only pre-process and predict once the window holds LENGTH frames
        if len(sequence) == LENGTH: 
            # pre-processing
            model_input = feature_converter(np.array(sequence))
            print(f'OMG! Frenzy Franzi is converting your mediapipe input! See how the shape is changing from {np.array(sequence).shape} to {model_input.shape}! SO AWESOME!!!')
            
            # prediction
            pred = model.predict(model_input, verbose=0)[0] # model.predict() expects shape (num_sequences, LENGTH, features), e.g. (1, 22, 208) for one sequence with the default configuration
            predictions.append(np.argmax(pred))
            predictions = predictions[-15:] # only the last 15 predictions are needed for the stability check
            #print(actions[np.argmax(pred)])

            # 3. Visualization logic
            # only accept the sign if the last 15 predictions all agree (more stable transition from one sign to another)
            if len(predictions) == 15 and np.all(np.array(predictions) == np.argmax(pred)): 
                # if the confidence of the most confident prediction is above threshold
                if pred[np.argmax(pred)] > threshold: 
                    # if there is already a last prediction
                    if len(sentence) > 0: 
                        # only append the predicted word, if it differs from the last prediction (prevent double actions)
                        if actions[np.argmax(pred)] != sentence[-1]: 
                            sentence.append(actions[np.argmax(pred)])
                    # just append if there is no last prediction (first prediction)
                    else: 
                        sentence.append(actions[np.argmax(pred)])

            # limit the history to the last 5 predictions
            if len(sentence) > 5: 
                sentence = sentence[-5:]

            # viz probabilities
            color = (150, 150, 150) # color for bars
            print(f'Example prediction for "{actions[0]}": {pred[0]}')
            image = prob_viz(pred, actions, image, color)

        # some rendering
        #cv2.rectangle(image, (0, 0), (1280, 40), (200, 200, 200), -1)
        #cv2.putText(image, ' '.join(sentence), (323,30), cv2.FONT_HERSHEY_SIMPLEX, 1, (52, 75, 102), 2, cv2.LINE_AA)
        # cv2.putText(image, 'OpenCV', org, font, fontScale, color, thickness, cv2.LINE_AA)

        # show to screen
        cv2.imshow("OpenCV Feed", image)

        # break gracefully 
        if cv2.waitKey(10) & 0xFF == ord('q'): 
            break 
        
# release camera and close feed window 
cap.release()
cv2.destroyAllWindows() 
cv2.waitKey(1) # workaround: process pending window events so the feed window actually closes
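The sliding frame window (`sequence[-LENGTH:]`) and the 15-prediction stability check in the loop can also be expressed with `collections.deque(maxlen=...)`, which drops the oldest entry automatically and keeps the history bounded. A minimal sketch with made-up prediction indices; `stable_predictions` is a hypothetical helper, not part of the original tutorial.

```python
from collections import deque

# deque(maxlen=n) keeps only the newest n items, like sequence[-LENGTH:].
window = deque(maxlen=3)
for frame_id in range(5):
    window.append(frame_id)
print(list(window))  # [2, 3, 4]

def stable_predictions(pred_indices, stable_frames=15):
    """Yield a class index only when the last `stable_frames` predictions all agree."""
    recent = deque(maxlen=stable_frames)
    for idx in pred_indices:
        recent.append(idx)
        if len(recent) == stable_frames and all(p == idx for p in recent):
            yield idx

# Made-up per-frame argmax values: a brief flicker to class 7, then a steady run of class 3.
stream = [7] * 3 + [3] * 5
print(list(stable_predictions(stream, stable_frames=4)))  # [3, 3]
```

Only the last two frames of the stream are accepted, because before that the 4-entry window still contains the flicker to class 7.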
            

## Next steps

Save the recordings and predictions for later use? 

Add a feature that lets the user correct a wrong prediction by typing into a text box, combined with real-time training/updating of the model?