## Body language decoder and segmentation

In this project I first use the *mediapipe* package, which detects face, body, and hand landmarks. Each landmark is a point described by x, y, and z coordinates. The goal is to train a classification model on these landmarks to recognize, for example, facial expressions, body poses, or gestures. The landmarks are extracted from webcam frames and stored in a CSV file, which is then loaded for training. The trained model can then be used for real-time prediction.

Second, I use the *BodyPix* package for real-time segmentation and to replace the background with an image. Finally, I build a Streamlit app that combines both features with the face mask detection from the previous project.

Both features are based on these tutorials: [YouTube1](https://www.youtube.com/watch?v=We1uB79Ci-w&t=2690s) and [YouTube2](https://www.youtube.com/watch?v=0tB6jG55mig&t=317s).

In [156]:
import mediapipe as mp
import cv2
import os
import numpy as np
import csv
import pandas as pd
import pickle
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

First we load the helper that draws the landmarks, then we open the webcam inside a *holistic* model context. We can set the minimum confidence for detecting landmarks and for tracking them. Each captured frame is passed to the model, which returns the detected landmarks, and for every frame we draw the face, pose, and hand landmarks. Note that `FACE_CONNECTIONS` is available only in older mediapipe releases; newer ones expose `FACEMESH_CONTOURS` and `FACEMESH_TESSELATION` instead.

In [157]:
mp_drawing = mp.solutions.drawing_utils
mp_holistic = mp.solutions.holistic

In [158]:
cap = cv2.VideoCapture(0)
with mp_holistic.Holistic(min_detection_confidence=0.5, min_tracking_confidence=0.5) as holistic:
    
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:  # no frame was returned (camera busy or disconnected)
            break
        
        image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        image.flags.writeable = False        
        
        results = holistic.process(image)
        
        image.flags.writeable = True   
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        
        # 1. Draw face landmarks
        mp_drawing.draw_landmarks(image, results.face_landmarks, mp_holistic.FACE_CONNECTIONS, 
                                 mp_drawing.DrawingSpec(color=(80,110,10), thickness=1, circle_radius=1),
                                 mp_drawing.DrawingSpec(color=(80,255,121), thickness=1, circle_radius=1)
                                 )
        
        # 2. Right hand
        mp_drawing.draw_landmarks(image, results.right_hand_landmarks, mp_holistic.HAND_CONNECTIONS, 
                                 mp_drawing.DrawingSpec(color=(80,22,10), thickness=2, circle_radius=4),
                                 mp_drawing.DrawingSpec(color=(80,44,121), thickness=2, circle_radius=2)
                                 )

        # 3. Left Hand
        mp_drawing.draw_landmarks(image, results.left_hand_landmarks, mp_holistic.HAND_CONNECTIONS, 
                                 mp_drawing.DrawingSpec(color=(121,22,76), thickness=2, circle_radius=4),
                                 mp_drawing.DrawingSpec(color=(121,44,250), thickness=2, circle_radius=2)
                                 )

        # 4. Pose Detections
        mp_drawing.draw_landmarks(image, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS, 
                                 mp_drawing.DrawingSpec(color=(245,117,66), thickness=2, circle_radius=4),
                                 mp_drawing.DrawingSpec(color=(245,66,230), thickness=2, circle_radius=2)
                                 )
                        
        cv2.imshow('Webcam', image)
        
        if cv2.waitKey(10) & 0xFF == ord('q'):
            break
        
cap.release()
cv2.destroyAllWindows()

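To save the landmarks for training, we first check how many there are in total. Then we create feature names for all of them, plus a *class* column that holds the label, e.g. a gesture or pose. Every landmark is described by its *x*, *y*, *z* coordinates and a visibility *v*. Note that the count below only works if the face, the pose, and both hands were all detected in the last processed frame, because an undetected part is returned as `None`.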

In [34]:
num_coords = len(results.pose_landmarks.landmark) + len(results.face_landmarks.landmark) + len(results.left_hand_landmarks.landmark) + len(results.right_hand_landmarks.landmark) 

In [69]:
num_coords

543

In [36]:
landmarks = ['class']
for val in range(num_coords):
    landmarks.extend(['x'+str(val), 'y'+str(val), 'z'+str(val), 'v'+str(val)])

In [37]:
len(landmarks)

2173
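
The two numbers above can be checked by hand. The following is a minimal sketch, assuming the default Holistic output of 33 pose landmarks, 468 face landmarks, and 21 landmarks per hand; the `LANDMARK_COUNTS` dictionary and the `build_header` helper are names introduced here for illustration only.

```python
# Landmark counts assumed for the default MediaPipe Holistic output.
LANDMARK_COUNTS = {"pose": 33, "face": 468, "left_hand": 21, "right_hand": 21}


def build_header(num_coords):
    """Return ['class', 'x0', 'y0', 'z0', 'v0', 'x1', ...] for num_coords landmarks."""
    header = ["class"]
    for i in range(num_coords):
        header.extend([f"x{i}", f"y{i}", f"z{i}", f"v{i}"])
    return header


num_coords = sum(LANDMARK_COUNTS.values())
header = build_header(num_coords)
print(num_coords, len(header))
```

This gives 543 landmarks and 1 + 4 × 543 = 2173 columns, matching the outputs above.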

Now we write the feature names to the CSV file as its first row. Later we reuse the same writer code to append the landmarks from every frame as new rows.

In [235]:
with open('coords.csv', mode='w', newline='') as f:
    csv_writer = csv.writer(f, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    csv_writer.writerow(landmarks)
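
The difference between the two file modes matters here: `'w'` truncates the file, while `'a'` appends to it. The sketch below illustrates this on a small temporary file; the file name `coords_demo.csv` and the sample values are made up for the example.

```python
import csv
import os
import tempfile

# Hypothetical file in a temporary directory, so no real data is overwritten.
path = os.path.join(tempfile.mkdtemp(), "coords_demo.csv")

header = ["class", "x0", "y0", "z0", "v0"]

# mode='w' creates (or truncates) the file: use it once, for the header.
with open(path, mode="w", newline="") as f:
    csv.writer(f).writerow(header)

# mode='a' appends: use it for every captured frame.
samples = [("Smile", [0.1, 0.2, -0.3, 0.0]), ("Sad", [0.4, 0.5, -0.6, 0.0])]
for label, values in samples:
    with open(path, mode="a", newline="") as f:
        csv.writer(f).writerow([label] + values)

# Read everything back; csv.reader returns all fields as strings.
with open(path, newline="") as f:
    rows = list(csv.reader(f))

print(rows)
```

Because `'w'` truncates, re-running the header cell after collecting data erases the rows already stored in the file.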

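Let's first detect the *smile* and *sad* expressions. All landmarks are converted to a flat list so that they can be saved in *coords.csv* together with the class label; this happens in the *try/except* block. Only the face landmarks are recorded here; the pose and hand rows are commented out. The capture has to be run separately for each class, with `class_name` set accordingly.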

In [236]:
class_name = 'Sad'
new_list = []

In [237]:
cap = cv2.VideoCapture(0)
with mp_holistic.Holistic(min_detection_confidence=0.5, min_tracking_confidence=0.5) as holistic:
    
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:  # no frame was returned (camera busy or disconnected)
            break
        
        image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        image.flags.writeable = False        
        
        results = holistic.process(image)

        
        image.flags.writeable = True   
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        
        # 1. Draw face landmarks
        mp_drawing.draw_landmarks(image, results.face_landmarks, mp_holistic.FACE_CONNECTIONS, 
                                 mp_drawing.DrawingSpec(color=(80,110,10), thickness=1, circle_radius=1),
                                 mp_drawing.DrawingSpec(color=(80,255,121), thickness=1, circle_radius=1)
                                 )
        
        # 2. Right hand
        mp_drawing.draw_landmarks(image, results.right_hand_landmarks, mp_holistic.HAND_CONNECTIONS, 
                                 mp_drawing.DrawingSpec(color=(80,22,10), thickness=2, circle_radius=4),
                                 mp_drawing.DrawingSpec(color=(80,44,121), thickness=2, circle_radius=2)
                                 )

        # 3. Left Hand
        mp_drawing.draw_landmarks(image, results.left_hand_landmarks, mp_holistic.HAND_CONNECTIONS, 
                                 mp_drawing.DrawingSpec(color=(121,22,76), thickness=2, circle_radius=4),
                                 mp_drawing.DrawingSpec(color=(121,44,250), thickness=2, circle_radius=2)
                                 )

        # 4. Pose Detections
        mp_drawing.draw_landmarks(image, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS, 
                                 mp_drawing.DrawingSpec(color=(245,117,66), thickness=2, circle_radius=4),
                                 mp_drawing.DrawingSpec(color=(245,66,230), thickness=2, circle_radius=2)
                                 )
        
        try:
#             pose_row = list(np.array([[landmark.x, landmark.y, landmark.z, landmark.visibility] for landmark in results.pose_landmarks.landmark]).flatten())
            face_row = list(np.array([[landmark.x, landmark.y, landmark.z, landmark.visibility] for landmark in results.face_landmarks.landmark]).flatten())
#             left_hand_row = list(np.array([[landmark.x, landmark.y, landmark.z, landmark.visibility] for landmark in results.left_hand_landmarks.landmark]).flatten())
#             right_hand_row = list(np.array([[landmark.x, landmark.y, landmark.z, landmark.visibility] for landmark in results.right_hand_landmarks.landmark]).flatten())
            row = [class_name] + face_row
            new_list.append(row)
            with open('coords.csv', mode='a', newline='') as f:
                csv_writer = csv.writer(f, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
                csv_writer.writerow(row)
            
        except AttributeError:
            # results.face_landmarks is None when no face was detected; skip this frame
            pass
        
        cv2.imshow('Webcam', image)

        if cv2.waitKey(10) & 0xFF == ord('q'):
            break

cap.release()
cv2.destroyAllWindows()
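
The flattening step inside the *try* block can also be written as a small helper that handles the "nothing detected" case explicitly. This is only a sketch: `landmarks_to_row` is a hypothetical name, and `SimpleNamespace` objects stand in for the MediaPipe results so the example runs without a webcam.

```python
from types import SimpleNamespace


def landmarks_to_row(landmark_list):
    """Flatten landmarks into [x0, y0, z0, v0, x1, ...]; return None if nothing was detected."""
    if landmark_list is None:
        return None
    row = []
    for lm in landmark_list.landmark:
        row.extend([lm.x, lm.y, lm.z, lm.visibility])
    return row


# Stand-in for results.face_landmarks with two landmarks (illustrative values).
fake_face = SimpleNamespace(landmark=[
    SimpleNamespace(x=0.1, y=0.2, z=-0.3, visibility=0.0),
    SimpleNamespace(x=0.4, y=0.5, z=-0.6, visibility=0.0),
])

print(landmarks_to_row(fake_face))
print(landmarks_to_row(None))
```

In the capture loop, a frame would then be written only when `landmarks_to_row(results.face_landmarks)` is not `None`.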

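Now let's load the CSV file into a dataframe and drop the empty columns (only face landmarks were recorded, so the remaining landmark columns contain only NaN). Then we split the data into features and labels, and finally into train and test sets.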

In [142]:
df = pd.read_csv('coords.csv')
df.head()

Unnamed: 0,class,x0,y0,z0,v0,x1,y1,z1,v1,x2,...,z540,v540,x541,y541,z541,v541,x542,y542,z542,v542
0,Smile,0.903682,0.192208,-0.042738,0.0,0.927332,0.143924,-0.043661,0.0,0.906092,...,,,,,,,,,,
1,Smile,0.870717,0.262017,-0.045533,0.0,0.904262,0.20193,-0.052269,0.0,0.872757,...,,,,,,,,,,
2,Smile,0.837109,0.280314,-0.048564,0.0,0.856691,0.224545,-0.060735,0.0,0.836927,...,,,,,,,,,,
3,Smile,0.544744,0.391306,-0.053102,0.0,0.527131,0.32664,-0.097666,0.0,0.532537,...,,,,,,,,,,
4,Smile,0.535172,0.40455,-0.052481,0.0,0.519283,0.333279,-0.096503,0.0,0.522814,...,,,,,,,,,,


In [143]:
df.dropna(axis=1, inplace=True)
X = df.drop('class', axis=1)
y = df['class']
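
To see what `dropna(axis=1)` does here, consider the toy frame below, where one landmark was recorded and another was never written. The column names mirror the notebook's naming, but the data and the `df_demo`, `X_demo`, and `y_demo` names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame: landmark 0 was recorded, landmark 1 was never written (all NaN).
df_demo = pd.DataFrame({
    "class": ["Smile", "Sad"],
    "x0": [0.90, 0.54],
    "y0": [0.19, 0.39],
    "x1": [np.nan, np.nan],
    "y1": [np.nan, np.nan],
})

# axis=1 drops every column that contains at least one NaN.
cleaned = df_demo.dropna(axis=1)
X_demo = cleaned.drop("class", axis=1)
y_demo = cleaned["class"]

print(list(X_demo.columns))
```

Since a single missing value is enough to drop a whole column, it is worth checking `X.shape` after this step to confirm that only the unused columns were removed.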

In [174]:
y

0        Sad
1        Sad
2        Sad
3        Sad
4        Sad
       ...  
318    Smile
319    Smile
320    Smile
321    Smile
322    Smile
Name: class, Length: 323, dtype: object

In [175]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1234)
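
The roles of `test_size` and `random_state` can be illustrated with a plain-Python sketch. This is not scikit-learn's actual implementation, only the general idea of a seeded shuffle followed by a cut; `simple_split` is a hypothetical helper and the integers stand in for captured frames.

```python
import math
import random


def simple_split(rows, test_size=0.3, seed=1234):
    """Shuffle rows with a fixed seed and use the first test_size fraction as the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(323))  # one integer per captured frame, as a stand-in
train, test = simple_split(rows)
print(len(train), len(test))
```

With a fixed seed the split is reproducible: calling the function again on the same rows returns the same train and test sets.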

We can try a couple of models to see how well they predict the classes.

In [176]:
pipelines = {
    'lr': make_pipeline(StandardScaler(), LogisticRegression(n_jobs=-1)),
    'rf': make_pipeline(StandardScaler(), RandomForestClassifier(n_jobs=-1)),
    'gb': make_pipeline(StandardScaler(), XGBClassifier(n_jobs=-1))
}

In [177]:
fit_models = {}
for algo, pipeline in pipelines.items():
    model = pipeline.fit(X_train, y_train)
    fit_models[algo] = model





In [178]:
for algo, model in fit_models.items():
    yhat = model.predict(X_test)
    print(algo, accuracy_score(y_test, yhat))

lr 1.0
rf 1.0
gb 1.0
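
Accuracy is simply the fraction of test samples whose predicted label matches the true one. The sketch below computes it by hand and also counts each (true, predicted) pair, which shows where the errors are. The labels are illustrative and not the notebook's data, and both helper names are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


def confusion_counts(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))


# Illustrative labels, not the notebook's data.
y_true = ["Sad", "Sad", "Smile", "Smile", "Smile"]
y_pred = ["Sad", "Smile", "Smile", "Smile", "Smile"]

print(accuracy(y_true, y_pred))
print(confusion_counts(y_true, y_pred))
```

A perfect score on data like this should be read with some caution: consecutive webcam frames are often nearly identical, so a random split may place very similar frames in both the train and the test set.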




All three models reach the same accuracy on the test set, so let's pick one of them and save it. Then we can load it and make predictions in real time.

In [179]:
with open('body_language.pkl', 'wb') as f:
    pickle.dump(fit_models['rf'], f)

In [180]:
with open('body_language.pkl', 'rb') as f:
    model = pickle.load(f)
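
The save and load steps form a round trip: the loaded object is an equal copy of the one that was saved. A minimal sketch with a plain dictionary standing in for the fitted pipeline (the `fake_model` object and the `model_demo.pkl` file name are made up for the example):

```python
import os
import pickle
import tempfile

# Stand-in for the fitted pipeline; any picklable object behaves the same way.
fake_model = {"name": "rf", "classes": ["Sad", "Smile"]}

path = os.path.join(tempfile.mkdtemp(), "model_demo.pkl")

# The with-statement closes the file even if dump() raises.
with open(path, "wb") as f:
    pickle.dump(fake_model, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == fake_model)
```

Pickle files can execute code when loaded, so only load files that come from a trusted source.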

After predicting the class from the captured landmarks, we also show the class and its probability on the screen. This is easy to do with *cv2*.

In [197]:
cap = cv2.VideoCapture(0)
with mp_holistic.Holistic(min_detection_confidence=0.5, min_tracking_confidence=0.5) as holistic:
    
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:  # no frame was returned (camera busy or disconnected)
            break
        
        image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        image.flags.writeable = False        
        
        results = holistic.process(image)
        
        image.flags.writeable = True   
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        
        # 1. Draw face landmarks
        mp_drawing.draw_landmarks(image, results.face_landmarks, mp_holistic.FACE_CONNECTIONS, 
                                 mp_drawing.DrawingSpec(color=(80,110,10), thickness=1, circle_radius=1),
                                 mp_drawing.DrawingSpec(color=(80,255,121), thickness=1, circle_radius=1)
                                 )
        
        # 2. Right hand
        mp_drawing.draw_landmarks(image, results.right_hand_landmarks, mp_holistic.HAND_CONNECTIONS, 
                                 mp_drawing.DrawingSpec(color=(80,22,10), thickness=2, circle_radius=4),
                                 mp_drawing.DrawingSpec(color=(80,44,121), thickness=2, circle_radius=2)
                                 )

        # 3. Left Hand
        mp_drawing.draw_landmarks(image, results.left_hand_landmarks, mp_holistic.HAND_CONNECTIONS, 
                                 mp_drawing.DrawingSpec(color=(121,22,76), thickness=2, circle_radius=4),
                                 mp_drawing.DrawingSpec(color=(121,44,250), thickness=2, circle_radius=2)
                                 )

        # 4. Pose Detections
        mp_drawing.draw_landmarks(image, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS, 
                                 mp_drawing.DrawingSpec(color=(245,117,66), thickness=2, circle_radius=4),
                                 mp_drawing.DrawingSpec(color=(245,66,230), thickness=2, circle_radius=2)
                                 )
        
        try:
            face_row = list(np.array([[landmark.x, landmark.y, landmark.z, landmark.visibility] for landmark in results.face_landmarks.landmark]).flatten())
            row = face_row
            X = pd.DataFrame([row])
            body_language_class = model.predict(X)[0]
            body_language_prob = model.predict_proba(X)[0]
            
            coords = tuple(np.multiply(np.array((
                results.pose_landmarks.landmark[mp_holistic.PoseLandmark.LEFT_EAR].x, 
                results.pose_landmarks.landmark[mp_holistic.PoseLandmark.LEFT_EAR].y)), [640, 480]).astype(int))
            
            cv2.rectangle(image, (coords[0], coords[1] + 5),
                         (coords[0] + len(body_language_class) * 20, coords[1] - 30),
                         (245, 117, 16), -1)
            cv2.putText(image, body_language_class, coords, 
                       cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2, cv2.LINE_AA)
            
            cv2.rectangle(image, (0,0), (250,60), (245,117,16), -1)
            
            cv2.putText(image, 'CLASS', (95,12), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1, cv2.LINE_AA)
            cv2.putText(image, body_language_class, (90,40), 
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2, cv2.LINE_AA)
            
            cv2.putText(image, 'PROB', (15,12), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1, cv2.LINE_AA)
            cv2.putText(image, str(np.max(body_language_prob)), (10,40), 
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2, cv2.LINE_AA)
            
        except Exception:
            # e.g. no face or pose detected in this frame; skip the overlay
            pass
        
        cv2.imshow('Webcam', image)

        if cv2.waitKey(10) & 0xFF == ord('q'):
            break

cap.release()
cv2.destroyAllWindows()
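
The cell above hardcodes a 640×480 frame when it converts the normalized ear position to pixels, and it prints the raw probability. The sketch below shows both steps as small helpers that take the frame size as a parameter and round the probability to two decimals. The names `to_pixel` and `format_prob` are hypothetical, and the values are illustrative.

```python
def to_pixel(x_norm, y_norm, frame_width, frame_height):
    """Convert normalized [0, 1] landmark coordinates to integer pixel coordinates."""
    return int(x_norm * frame_width), int(y_norm * frame_height)


def format_prob(probabilities):
    """Return the highest class probability as a string with two decimals."""
    return f"{max(probabilities):.2f}"


# Illustrative values: a landmark at the frame center and two class probabilities.
print(to_pixel(0.5, 0.5, 640, 480))
print(format_prob([0.17, 0.83]))
```

Inside the loop the frame size could be read from the image itself, for example with `frame_height, frame_width = image.shape[:2]`, so the label stays in place if the camera resolution differs.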