# Real-Time Prediction

## Reference

The code in this notebook is adapted and modified from the following Youtube tutorial: https://www.youtube.com/watch?v=doDUihpj6ro 

## Usage

With this notebook, real-time predictions can be made on the signs you perform in front of your webcam. The model used for predicting the corresponding words needs to be trained beforehand in the notebook "2_Modeling.ipynb". 

Here, the model weights are loaded from the 'first_model_whoop_whoop.h5' file. But you can change this to any other model, that was compiled and trained analogously. 

You can choose between two execution modes using `MODE = 'kaggle'` or `MODE = 'tutorial'`


## 1. Install and Import Dependencies

### Install Dependencies

In [17]:
%pip install tensorflow-macos opencv-python mediapipe-silicon sklearn matplotlib
#!pip install tensorflow==2.4.1 tensorflow-gpu==2.4.1 opencv-python mediapipe sklearn matplotlib # original code line from tutorial (he had a windows system)

Note: you may need to restart the kernel to use updated packages.


### Import Dependencies

In [18]:
# general
import numpy as np
import pandas as pd
import os # easier file path handling

# for camera feed
import cv2 # opencv
from matplotlib import pyplot as plt # imshow for easy visualization
import time # to insert "sleep" in between frames
import mediapipe as mp # for accessing and reading from webcam

# for model re-building
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

## 2. Setup Objects and Functions for MP Holistic Keypoints

### Execution Mode

In [19]:
# model to be used for real-time prediction
#model_path = 'tutorial/Easter_model_10signs.h5'
model_path = 'Modeling_logs/Easter_model_10signs_j_test/cp-0800.ckpt' # load model from checkpoint

### Initialize MP Holistic Model

In [20]:
mp_holistic = mp.solutions.holistic # holistic model
mp_drawing = mp.solutions.drawing_utils # drawing utilities

### Define Functions (later they all go into a python module for multiple use)

In [21]:
# function to detect MP Holistic landmarks from an image, e.g. frames of your camera feed
def mediapipe_detection(image, model): 
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # color conversion BGR to RGB
    image.flags.writeable = False                   # image no longer writeable
    results = model.process(image)                  # make prediction
    image.flags.writeable = True                    # image is writeable again
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)  # color conversion back to original
    return image, results

In [22]:
# function to draw landmarks points and connecting lines on top of an image, e.g. on top of your camera feed
def draw_styled_landmarks(image, results): 
    # draw face connections
    mp_drawing.draw_landmarks(image, results.face_landmarks, mp_holistic.FACEMESH_TESSELATION, 
                              mp_drawing.DrawingSpec(color=(80,110,10), thickness=1, circle_radius=1), 
                              mp_drawing.DrawingSpec(color=(80,256,121), thickness=1, circle_radius=1))
    # draw pose connections
    mp_drawing.draw_landmarks(image, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS, 
                              mp_drawing.DrawingSpec(color=(80,22,10), thickness=2, circle_radius=4), 
                              mp_drawing.DrawingSpec(color=(80,256,121), thickness=2, circle_radius=2)) 
    # draw left hand connections
    mp_drawing.draw_landmarks(image, results.left_hand_landmarks, mp_holistic.HAND_CONNECTIONS, 
                              mp_drawing.DrawingSpec(color=(121,22,76), thickness=2, circle_radius=4), 
                              mp_drawing.DrawingSpec(color=(121,44,250), thickness=2, circle_radius=2)) 
    # draw right hand connections
    mp_drawing.draw_landmarks(image, results.right_hand_landmarks, mp_holistic.HAND_CONNECTIONS, 
                              mp_drawing.DrawingSpec(color=(245,117,66), thickness=2, circle_radius=4), 
                              mp_drawing.DrawingSpec(color=(245,66,230), thickness=2, circle_radius=2)) 

In [23]:
# function to extract coordinates (+visibility) of all landmarks --> keypoints
# and concatenates everything into a flattened list 
def extract_keypoints(mph_results): 
    pose = np.array([[r.x, r.y] for r in mph_results.pose_landmarks.landmark]).flatten() if mph_results.pose_landmarks else np.zeros(33*2) # x, y, z and extra value visibility
    lips = face = np.array([[r.x, r.y] for r in mph_results.face_landmarks.landmark]).flatten() if mph_results.face_landmarks in [61, 185, 40, 39, 37,  0, 267, 269, 270, 409, 291,146, 91,181, 84, 17, 314, 405, 321, 375, 78, 191, 80, 81, 82, 13, 312, 311, 310, 415, 95, 88, 178, 87, 14,317, 402, 318, 324, 308] else np.zeros(468*2)
    #face = np.array([[r.x, r.y] for r in mph_results.face_landmarks.landmark]).flatten() if mph_results.face_landmarks else np.zeros(468*2)
    lh = np.array([[r.x, r.y] for r in mph_results.left_hand_landmarks.landmark]).flatten() if mph_results.left_hand_landmarks else np.zeros(21*2)
    rh = np.array([[r.x, r.y] for r in mph_results.right_hand_landmarks.landmark]).flatten() if mph_results.right_hand_landmarks else np.zeros(21*2)
    return np.concatenate([pose, face, lh, rh])
    # a flattened list with list of all pose, face, lh, rh landmark x, y, z, (+visibility) coordinates

In [24]:
# function to visualize predicted word probabilities with a dynamic real-time bar chart
def prob_viz(pred, actions, input_frame, colors): 
    output_frame = input_frame.copy()
    for num, prob in enumerate(pred): 
        #if prob >=
        cv2.rectangle(output_frame, (0,60+num*40), (int(prob*100), 90+num*40), colors[num], -1)
        cv2.putText(output_frame, actions[num], (0, 85+num*40), cv2.FONT_HERSHEY_SIMPLEX, 1, (255,255,255), 2, cv2.LINE_AA)
    
    return output_frame

## 3. Load Saved Model

### Some Global Stuff (edit to your needs)

In [26]:
# actions to detect
actions = np.array (['alligator', 'radio', 'moon', 'sleep', 'grandpa', 'tiger', 'pencil', 'sleepy', 'grandma', 'chocolate'])
n_keypoints = 104 # kaggle only lips from face: 104
n_sequences = 22 # 22 videos per sequence / word / sign (get it later from input file)

label_map = {label:num for num, label in enumerate(actions)} # create label map (dict, later our .json file)

### Initialize Same Model Architecture and Load Weights

In [27]:
# re-initialize the model
model = Sequential()
model.add(LSTM(64, return_sequences=True, activation='relu', input_shape=(n_sequences, n_keypoints))) # input_shape=(#sequences, #keypoints)
model.add(LSTM(128, return_sequences=True, activation='relu'))
model.add(LSTM(64, return_sequences=False, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(actions.shape[0], activation='softmax'))

# compile the model
model.compile(optimizer='Adam', loss='categorical_crossentropy', metrics=['categorical_accuracy'])

# load model
model.load_weights(model_path)

NameError: name 'MODE' is not defined

## Real Time Prediction / Detection

Press "Q" to interrupt the camera feed. 

In [None]:
# 1. New detection variables 
sequence = [] # to collect all 30 frames for prediction
sentence = [] # history of all predictions (predicted words)
predictions = []
threshold = 0.3 # confidence metrics (only render prediction results, if confidence is above threshold)

cap = cv2.VideoCapture(0) # grabbing webcam
# set mediapipe model
with mp_holistic.Holistic(min_detection_confidence=0.5, min_tracking_confidence=0.5) as holistic: 
    while cap.isOpened(): # loop through all frames 

        # read feed
        ret, frame = cap.read()

        # make detections 
        image, mph_results = mediapipe_detection(frame, holistic)
        #print(results)

        # draw landmarks
        #draw_landmarks(image, results)
        draw_styled_landmarks(image, mph_results)

        # 2. Prediction logic
        keypoints = extract_keypoints(mph_results)
        sequence.append(keypoints)
        sequence = sequence[-30:] # takes last thirty frames

        if len(sequence) == 30: 
            pred = model.predict(np.expand_dims(sequence, axis=0))[0]
            # np.expand_dims to adjust input for a single sequence (word) as it would be of shape (30, 1662)
            # but the model.fit() expects something in shape (num_sequences, 30, 1662), e.g. (1, 30, 1662) for a single sequence
            predictions.append(np.argmax(pred))
            #print(actions[np.argmax(res)])

        # 3. Visualization logic
            # makes sure the last 15 frames had the same prediction (more stable transition from one sign to another) 
            if np.unique(predictions[-15:])[0]==np.argmax(pred): 
                # if the confidence of the most confident prediction is above threshold
                if pred[np.argmax(pred)] > threshold: 
                    # if there is already a last prediction
                    if len(sentence) > 0: 
                        # only append the predicted word, if it differs from the last prediction (prevent double actions)
                        if actions[np.argmax(pred)] != sentence[-1]: 
                            sentence.append(actions[np.argmax(pred)])
                    # just append if there is no last prediction (first prediction)
                    else: 
                        sentence.append(actions[np.argmax(pred)])

            # limit the history to the last 5 predictions
            if len(sentence) > 5: 
                sentence = sentence[-5:]

            # viz probabilities
            if MODE == 'kaggle': 
                colors = [(245, 117, 16), (117, 245, 16), (16, 117, 245),
                        (245, 117, 16), (117, 245, 16), (16, 117, 245),
                        (245, 117, 16), (117, 245, 16), (16, 117, 245),
                        (245, 117, 16)] # colors for each word (bars)
            elif MODE == 'tutorial': 
                colors = [(245, 117, 16), (117, 245, 16), (16, 117, 245)] # colors for each word (bars)

            image = prob_viz(pred, actions, image, colors)

        # some rendering
        cv2.rectangle(image, (0, 0), (640, 40), (245, 11, 16), -1)
        cv2.putText(image, ' '.join(sentence), (3,30), 
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2, cv2.LINE_AA)

        # show to screen
        cv2.imshow("OpenCV Feed", image)

        # break gracefully 
        if cv2.waitKey(10) & 0xFF == ord('q'): 
            # release camera and close feed window 
            cap.release()
            cv2.waitKey(1) # some workaround to fix the bug, that window doesn't close
            cv2.destroyAllWindows() 
            cv2.waitKey(1) # some workaround to fix the bug, that window doesn't close
            break 

    

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
2023-04-19 13:29:51.737028: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz




error: OpenCV(4.7.0) /Users/xperience/GHA-OCV-Python/_work/opencv-python/opencv-python/opencv/modules/imgproc/src/color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cvtColor'


## Next steps

Save the recordings and predictions for later use? 

Add a feature, so that the user can type into a textbox to correct wrong prediction + real-time training / update of the model? 