# TECHIN 510
## Milestone 1

**Name**: Saif Mustafa

**Email**: saifm@uw.edu

**Student Number**: 1428039

---

As part of the first milestone for your projects, you will practice what you have learned so far by developing the visual recognition capabilities of the robot you choose to develop. As a first step, read the project topics listed below and decide which topic you would like to work on. You will choose and implement two visual recognition functionalities from the specs listed in the project topic you choose.

Your implementation will be in Python. You can use everything covered in class as well as any new functionality you discover yourself.

You will complete this assignment by submitting a demo video and any materials (code, data, physical prompts) that would allow us to recreate your demo, on Canvas by Oct 29, 2021 (Friday). The video should clearly illustrate two functionalities. Make sure you demonstrate the functionality in varied scenarios (e.g., for face detection make sure you have at least two different people's faces in different poses relative to the camera). At this point, your video does not need to include narrative about the project. It can be as simple as a screenshot video showing the captured images from a camera with annotations that reflect the implemented functionalities (e.g. square around faces for face detection).

---

## Topic: Nike Exercise and Wellness Robot

**Problem:** Many people who know the importance and potential benefits of exercising and meditation have a hard time motivating themselves to actually do them.

**Proposed solution:** Social robots have been demonstrated to have the impact of a social accountability partner in committing to difficult behavioral changes. This project will explore this potential of social robots for exercise, yoga, and/or meditation motivation and guidance.

**Prototype specifications:** The robot will have one user. The robot should interact with the user to introduce itself, meet its user, set user goals, and motivate the user to reach those goals. It should also guide the user through a sample exercise.

**Image processing capabilities for this robot (Milestone 1):**
- determine when a person is in front of the robot, 
- recognize whether the person is the owner of the robot, 
- determine the mood of the person, 
- determine when a person has completed an exercise, 
- determine when a functionality activation card (e.g. to start a specific exercise) is visually shown to the robot.

---

## Installs / Imports

In [2]:
# installing opencv and mediapipe https://google.github.io/mediapipe/
# !pip install mediapipe
# !pip install opencv-python
# !pip install tensorflow
# !pip install deepface 
# !pip install tflearn

import mediapipe as mp
import cv2
import uuid
import time
import numpy as np
import matplotlib.pyplot as plt

import tensorflow as tf
import tflearn
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.estimator import regression

import os
import json

mp_drawing = mp.solutions.drawing_utils # drawing model
mp_pose = mp.solutions.pose # pose estimation model



## Define training set + folders

In [3]:
emotions = ['happy','sad','angry','surprised','neutral']

In [4]:
def generate_dataset():
    face_classifier = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
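    def face_cropped(img):
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        faces = face_classifier.detectMultiScale(gray, 1.3, 5)
        
        # detectMultiScale returns an empty tuple or an (N, 4) array;
        # `is ()` compares identity, so test the length instead
        if len(faces) == 0:
            return None
        # crop the last detected face
        x, y, w, h = faces[-1]
        return img[y:y+h, x:x+w]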
    
    cap = cv2.VideoCapture(0)
    
    
    for emotion in emotions:
        
        img_id = 0
        time.sleep(5)
        
        while True:
            ret, frame = cap.read()
            if not ret:
                break # camera stopped delivering frames
            cropped = face_cropped(frame) # run the detector once per frame
            if cropped is not None:
                img_id +=1
                face = cv2.resize(cropped, (200,200))
                face = cv2.cvtColor(face, cv2.COLOR_BGR2GRAY)
                file_name_path = "data/"+emotion+"."+str(img_id)+".jpg"
                cv2.imwrite(file_name_path, face)
                cv2.putText(face, str(img_id), (50,50), cv2.FONT_HERSHEY_DUPLEX, 
                            1, (0,255,0), 2)
                cv2.imshow("Cropped", face)
                # stop on the Enter key or after 250 images per emotion
                if cv2.waitKey(1) == 13 or img_id == 250:
                    break
        
    cap.release()
    cv2.destroyAllWindows()
    print("All image data collected.")



In [None]:
generate_dataset()
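
When several faces are in view, the cascade returns one box per face, and the capture loop keeps only one of them. A minimal sketch of choosing the largest box instead (usually the person closest to the camera) is shown here. `largest_face` is a hypothetical helper that is not part of the notebook, and the boxes are made-up values in the `(x, y, w, h)` layout that `detectMultiScale` returns.

```python
import numpy as np

def largest_face(faces):
    """Return the (x, y, w, h) box with the largest area, or None if there is none."""
    if len(faces) == 0:
        return None
    faces = np.asarray(faces)
    areas = faces[:, 2] * faces[:, 3]          # width * height per box
    x, y, w, h = faces[int(np.argmax(areas))]
    return int(x), int(y), int(w), int(h)

# made-up detections: a small face and a larger one
boxes = np.array([[10, 20, 50, 50], [200, 80, 120, 120]])
print(largest_face(boxes))
print(largest_face(()))
```

Cropping with the returned box works the same way as in `face_cropped`: `img[y:y+h, x:x+w]`.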

## Label captured images

In [5]:
def my_label(image_name):
    emotion = image_name.split('.')[-3]
    
    #print("Emotion =",emotion)
    
    if emotion == 'happy':
        return np.array([1,0,0,0,0])
    elif emotion == 'sad':
        return np.array([0,1,0,0,0])
    elif emotion == 'angry':
        return np.array([0,0,1,0,0])
    elif emotion == 'surprised':
        return np.array([0,0,0,1,0])
    else:
        return np.array([0,0,0,0,1])
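
The labels above are one-hot vectors whose position matches the order of the `emotions` list. As a sketch of the same idea without the `if`/`elif` chain, the index can be looked up in the list directly. `one_hot_label` is a hypothetical name, and the fallback to the last class mirrors the `else` branch of `my_label`.

```python
import numpy as np

emotions = ['happy', 'sad', 'angry', 'surprised', 'neutral']

def one_hot_label(image_name, classes=emotions):
    """Build a one-hot vector from a file name such as 'happy.12.jpg'."""
    parts = image_name.split('.')
    emotion = parts[-3] if len(parts) >= 3 else ''
    # unknown names fall back to the last class, like the else branch of my_label
    index = classes.index(emotion) if emotion in classes else len(classes) - 1
    label = np.zeros(len(classes), dtype=int)
    label[index] = 1
    return label

print(one_hot_label('sad.7.jpg'))
print(one_hot_label('surprised.250.jpg'))
```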

## Create model data using captured images

In [6]:
from random import shuffle
from tqdm import tqdm

def my_data():
    data = []
    for img in tqdm(os.listdir("data")):
        if not img.endswith(".jpg"):
            continue # skip non-image files such as .DS_Store
        path = os.path.join("data",img)
        img_data=cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        img_data=cv2.resize(img_data, (50,50))
        data.append([np.array(img_data), my_label(img)])
    shuffle(data)
    return data

In [7]:
data = my_data() # remove DS.STORE file from github and rerun

100%|██████████| 1250/1250 [00:01<00:00, 786.23it/s]


## Train Test Split

In [8]:
train = data[:875]
test = data[875:1250]

X_train, y_train, X_test, y_test = (np.array([i[0] for i in train]).reshape(-1,50,50,1), 
                                    [i[1] for i in train],
                                    np.array([i[0] for i in test]).reshape(-1,50,50,1),
                                    [i[1] for i in test])
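
The indices 875 and 1250 correspond to a 70/30 split of the 1250 captured images. A small sketch of deriving the cut point from a fraction, so the split still works if the number of images changes, could look like this. `split_train_test` and `train_fraction` are hypothetical names, and the list of integers stands in for the shuffled `data`.

```python
def split_train_test(samples, train_fraction=0.7):
    """Split an already-shuffled list into train and test parts."""
    cut = round(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]

# stand-in for the shuffled list returned by my_data()
train_demo, test_demo = split_train_test(list(range(1250)))
print(len(train_demo), len(test_demo))
```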

In [10]:
# tf.reset_default_graph() # deprecated function
# https://stackoverflow.com/questions/40782271/attributeerror-module-tensorflow-has-no-attribute-reset-default-graph

tf.compat.v1.reset_default_graph()

convnet = input_data(shape=[50,50,1])
convnet = conv_2d(convnet,32,5,activation='relu') # 32 filters, each 5x5
convnet = max_pool_2d(convnet,5)
convnet = conv_2d(convnet,64,5,activation='relu')
convnet = max_pool_2d(convnet,5)
convnet = conv_2d(convnet,128,5,activation='relu')
convnet = max_pool_2d(convnet,5)
convnet = conv_2d(convnet,64,5,activation='relu')
convnet = max_pool_2d(convnet,5)
convnet = conv_2d(convnet,32,5,activation='relu')
convnet = max_pool_2d(convnet,5)

convnet = fully_connected(convnet, 1024, activation='relu')
convnet = dropout(convnet, 0.8)
convnet = fully_connected(convnet, 5, activation='softmax') # 5 emotions for output layer
convnet = regression(convnet, optimizer='adam', learning_rate = 0.001, loss='categorical_crossentropy')

model = tflearn.DNN(convnet, tensorboard_verbose=3)

model.fit(X_train, 
          y_train,
          n_epoch=20, 
          validation_set=(X_test, y_test),
          show_metric=True,
          run_id="emotion_detector")

Training Step: 279  | total loss: 0.33422 | time: 3.882s
| Adam | epoch: 020 | loss: 0.33422 - acc: 0.9549 -- iter: 832/875
Training Step: 280  | total loss: 0.30310 | time: 5.182s
| Adam | epoch: 020 | loss: 0.30310 - acc: 0.9594 | val_loss: 0.06271 - val_acc: 0.9787 -- iter: 875/875
--
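
It can help to trace how the 50x50 input shrinks through the five convolution and pooling pairs. The sketch below assumes 'same' padding, a convolution stride of 1, and a pooling stride equal to the pooling size of 5; these are believed to be the tflearn defaults but are assumptions here, not something the notebook states.

```python
import math

def same_padding_out(size, stride):
    """Output length of a 'same'-padded layer along one spatial dimension."""
    return math.ceil(size / stride)

size = 50
sizes = [size]
for _ in range(5):
    size = same_padding_out(size, 1)  # conv_2d, assumed stride 1
    size = same_padding_out(size, 5)  # max_pool_2d, assumed stride 5
    sizes.append(size)

print(sizes)  # [50, 10, 2, 1, 1, 1]
```

Under these assumptions the feature map is already 1x1 after the third pooling layer, so the last two pooling layers would not reduce it further.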


In [None]:
'''
Model Created:::::

Training Step: 280  | total loss: 0.11289 | time: 4.963s
| Adam | epoch: 020 | loss: 0.11289 - acc: 0.9788 | val_loss: 0.02673 - val_acc: 0.9947 -- iter: 875/875

'''
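
The softmax output layer yields five probabilities per image, one per entry of `emotions`. A minimal sketch of turning such a vector into a label, with a confidence floor so that weak predictions are not reported as an emotion, is shown below. `top_emotion`, `min_confidence`, and the `"uncertain"` label are hypothetical additions, and the probability vectors are made up.

```python
import numpy as np

emotions = ['happy', 'sad', 'angry', 'surprised', 'neutral']

def top_emotion(probabilities, labels=emotions, min_confidence=0.5):
    """Return (label, probability) for the most likely class."""
    probabilities = np.asarray(probabilities, dtype=float)
    best = int(np.argmax(probabilities))
    if probabilities[best] < min_confidence:
        return "uncertain", float(probabilities[best])
    return labels[best], float(probabilities[best])

print(top_emotion([0.05, 0.80, 0.05, 0.05, 0.05]))
print(top_emotion([0.2, 0.2, 0.2, 0.2, 0.2]))
```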

## Video Code + Basic detection

In [None]:
# find out my mac's webcam dimensions

vcap = cv2.VideoCapture(0) # 0=camera
 
if vcap.isOpened(): 
    # get vcap property 
    width  = vcap.get(cv2.CAP_PROP_FRAME_WIDTH)   # float `width`
    height = vcap.get(cv2.CAP_PROP_FRAME_HEIGHT)  # float `height`

    # it gives me 0.0 :/
    fps = vcap.get(cv2.CAP_PROP_FPS)
    
    print("width =",width)
    print("height =",height)
    
    # 1280 x 720

vcap.release() # free the camera for the cells below

In [None]:
# https://docs.opencv.org/master/dd/d43/tutorial_py_video_display.html

cap = cv2.VideoCapture(0)

if not cap.isOpened():
    print("Cannot open camera")
    exit()
    
# setting detection and tracking confidence    
with mp_pose.Pose(min_detection_confidence=0.5, min_tracking_confidence=0.5) as pose:
    while True:
        
        # Capture frame-by-frame
        ret, frame = cap.read()
        
        # if frame is read correctly ret is True
        if not ret:
            print("Can't receive frame (stream end?). Exiting ...")
            break
        
        # Our operations on the frame come here
        image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        image.flags.writeable = False
        
        # Make detection
        results = pose.process(image)
        
        image.flags.writeable = True
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        
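        # pose_landmarks is None when nobody is in view, so guard the access
        if results.pose_landmarks is not None:
            landmarks = results.pose_landmarks.landmark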
        
        # basic detection using the draw_landmarks utility
        mp_drawing.draw_landmarks(image, # image
                                  results.pose_landmarks, # coordinates
                                  mp_pose.POSE_CONNECTIONS, # pose connections
                                  mp_drawing.DrawingSpec(color=(0,0,255), thickness=2, circle_radius=2), # dots
                                  mp_drawing.DrawingSpec(thickness=2, circle_radius=2)) # connections
        
        # Display the resulting frame
        cv2.imshow('NIKE WELLNESS DETECTOR', image)
        
        if cv2.waitKey(1) == ord('q'):
            break
    
    # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()

In [None]:
# mp_drawing.draw_landmarks??

## Identifying joints

MediaPipe Pose exposes the following body landmarks:

- 33 landmarks in total
- index starting at 0
- represent joints within the pose


![Mediapipe body landmarks](https://google.github.io/mediapipe/images/mobile/pose_tracking_full_body_landmarks.png)

In [11]:
# Mappings
i = 0
for dot in mp_pose.PoseLandmark:
    print(i, "=", dot)
    i+=1

0 = PoseLandmark.NOSE
1 = PoseLandmark.LEFT_EYE_INNER
2 = PoseLandmark.LEFT_EYE
3 = PoseLandmark.LEFT_EYE_OUTER
4 = PoseLandmark.RIGHT_EYE_INNER
5 = PoseLandmark.RIGHT_EYE
6 = PoseLandmark.RIGHT_EYE_OUTER
7 = PoseLandmark.LEFT_EAR
8 = PoseLandmark.RIGHT_EAR
9 = PoseLandmark.MOUTH_LEFT
10 = PoseLandmark.MOUTH_RIGHT
11 = PoseLandmark.LEFT_SHOULDER
12 = PoseLandmark.RIGHT_SHOULDER
13 = PoseLandmark.LEFT_ELBOW
14 = PoseLandmark.RIGHT_ELBOW
15 = PoseLandmark.LEFT_WRIST
16 = PoseLandmark.RIGHT_WRIST
17 = PoseLandmark.LEFT_PINKY
18 = PoseLandmark.RIGHT_PINKY
19 = PoseLandmark.LEFT_INDEX
20 = PoseLandmark.RIGHT_INDEX
21 = PoseLandmark.LEFT_THUMB
22 = PoseLandmark.RIGHT_THUMB
23 = PoseLandmark.LEFT_HIP
24 = PoseLandmark.RIGHT_HIP
25 = PoseLandmark.LEFT_KNEE
26 = PoseLandmark.RIGHT_KNEE
27 = PoseLandmark.LEFT_ANKLE
28 = PoseLandmark.RIGHT_ANKLE
29 = PoseLandmark.LEFT_HEEL
30 = PoseLandmark.RIGHT_HEEL
31 = PoseLandmark.LEFT_FOOT_INDEX
32 = PoseLandmark.RIGHT_FOOT_INDEX


In [None]:
# Getting the specific coordinates to a given landmark:
landmarks[mp_pose.PoseLandmark.NOSE.value] # nose
# or
landmarks[0]

In [None]:
# measure curls -- need 11, 13, 15
landmarks[11] # left shoulder
landmarks[13] # left elbow
landmarks[15] # left wrist

In [None]:
# mouth example
mouth = [landmarks[mp_pose.PoseLandmark.MOUTH_LEFT.value].x,
         landmarks[mp_pose.PoseLandmark.MOUTH_LEFT.value].y,
         landmarks[mp_pose.PoseLandmark.MOUTH_LEFT.value].z]

mouth

In [None]:
# angle needed between shoulder, elbow, and wrist to determine curl
# just doing x and y for now

shoulder = [landmarks[mp_pose.PoseLandmark.LEFT_SHOULDER.value].x,
            landmarks[mp_pose.PoseLandmark.LEFT_SHOULDER.value].y]

elbow = [landmarks[mp_pose.PoseLandmark.LEFT_ELBOW.value].x,
         landmarks[mp_pose.PoseLandmark.LEFT_ELBOW.value].y]

wrist = [landmarks[mp_pose.PoseLandmark.LEFT_WRIST.value].x,
         landmarks[mp_pose.PoseLandmark.LEFT_WRIST.value].y]

shoulder, elbow, wrist

get_angle(shoulder, elbow, wrist)
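
MediaPipe reports landmark `x` and `y` as fractions of the image width and height, so they need scaling before they can be used as pixel positions for drawing. A minimal sketch follows. `to_pixel` is a hypothetical helper, the 1280x720 frame size is the webcam resolution measured earlier, and the clamping is an added precaution for landmarks that are estimated slightly outside the frame.

```python
def to_pixel(point, frame_width, frame_height):
    """Convert a normalized (x, y) landmark to integer pixel coordinates."""
    x, y = point
    px = min(int(x * frame_width), frame_width - 1)
    py = min(int(y * frame_height), frame_height - 1)
    # clamp at zero in case the landmark lies left of or above the frame
    return max(px, 0), max(py, 0)

print(to_pixel((0.5, 0.5), 1280, 720))   # centre of the frame
print(to_pixel((1.2, -0.1), 1280, 720))  # off-frame point, clamped to the edge
```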

## Angle calculations

In [12]:
# get angle between any 3 given points
def get_angle(p1, p2, p3):
    
    p1 = np.array(p1)
    p2 = np.array(p2)
    p3 = np.array(p3)
    
    radians = np.arctan2(p3[1]-p2[1], p3[0]-p2[0]) - np.arctan2(p1[1]-p2[1], p1[0]-p2[0])
    angle = np.abs(radians * 180.0 / np.pi)
    
    if angle > 180.0:
        angle = 360-angle
        
    return angle
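
A quick way to sanity-check the angle function is to feed it points whose angle is known in advance and to compare it with a second formulation. The sketch below repeats `get_angle` so it runs on its own and adds `angle_from_dot`, a hypothetical cross-check based on the dot product; the coordinates are made up.

```python
import numpy as np

def get_angle(p1, p2, p3):
    """Angle at p2 (degrees, 0 to 180) between the segments p2-p1 and p2-p3."""
    p1, p2, p3 = np.array(p1), np.array(p2), np.array(p3)
    radians = (np.arctan2(p3[1] - p2[1], p3[0] - p2[0])
               - np.arctan2(p1[1] - p2[1], p1[0] - p2[0]))
    angle = np.abs(radians * 180.0 / np.pi)
    return 360 - angle if angle > 180.0 else angle

def angle_from_dot(p1, p2, p3):
    """Same angle computed from the dot product of the two segment vectors."""
    a = np.array(p1, dtype=float) - np.array(p2, dtype=float)
    b = np.array(p3, dtype=float) - np.array(p2, dtype=float)
    cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0))))

right_angle = get_angle((0, 0), (0, 1), (1, 1))   # elbow bent at 90 degrees
straight_arm = get_angle((0, 0), (0, 1), (0, 2))  # arm fully extended
print(right_angle, straight_arm)
print(angle_from_dot((0, 0), (0, 1), (1, 1)))
```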

## Building a counter / tracker

In [19]:
##### https://docs.opencv.org/master/dd/d43/tutorial_py_video_display.html

cap = cv2.VideoCapture(0)

faceCascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

print("Saif")

if not cap.isOpened():
    print("Cannot open camera")
    exit()
    
curl_count=0
# True = arm is up; starting as True means a rep only counts after a full extension
curl_flag_left=True
curl_flag_right=True
current_emotion = "neutral"
    
# setting detection and tracking confidence    
with mp_pose.Pose(min_detection_confidence=0.5, min_tracking_confidence=0.5) as pose:
    while True:
        
        # Capture frame-by-frame
        ret, frame = cap.read()
        
        # if frame is read correctly ret is True; check before using the frame
        if not ret:
            print("Can't receive frame (stream end?). Exiting ...")
            break
        
        #result = DeepFace.analyze(frame, actions=['emotion'], enforce_detection=False)
        faces = faceCascade.detectMultiScale(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY),1.1,4)

        for(x,y,w,h) in faces:
            cv2.rectangle(frame, (x,y), (x+w, y+h), (0,255,0),2)
            gray_img = cv2.cvtColor(frame[y:y+h, x:x+w], cv2.COLOR_BGR2GRAY)
            gray_img = cv2.resize(gray_img,(50,50),interpolation=cv2.INTER_AREA)
            gray_img = gray_img.reshape(50,50,1) #  needs to be 50,50,1
            # print(gray_img.shape) needs to be 50,50,1
            emotion_detection = model.predict([gray_img])[0]
            # print(np.argmax(emotion_detection))
            # print(emotions[np.argmax(emotion_detection)])
        
        # Our operations on the frame come here
        image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        image.flags.writeable = False
        
        # Make pose detection
        results = pose.process(image)
        
        # Make emotion detection
#         gray_img = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
#         gray_img = cv2.resize(gray_img,(50,50),interpolation=cv2.INTER_AREA)
#         gray_img = gray_img.reshape(50,50,1) #  needs to be 50,50,1
#         # print(gray_img.shape) needs to be 50,50,1
#         emotion_detection = model.predict([gray_img])[0]
#         # print(np.argmax(emotion_detection))
#         # print(emotions[np.argmax(emotion_detection)])
        
        image.flags.writeable = True
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        
        # Extract all joints
        try:
            landmarks = results.pose_landmarks.landmark
            
            # shoulder coordinates
            shoulder_left = [landmarks[mp_pose.PoseLandmark.LEFT_SHOULDER.value].x,
                             landmarks[mp_pose.PoseLandmark.LEFT_SHOULDER.value].y]
            #print("Shoulder left =",shoulder_left)
            
            shoulder_right = [landmarks[mp_pose.PoseLandmark.RIGHT_SHOULDER.value].x,
                              landmarks[mp_pose.PoseLandmark.RIGHT_SHOULDER.value].y]
            #print("Shoulder right =",shoulder_right)
            
            # elbow coordinates
            elbow_left = [landmarks[mp_pose.PoseLandmark.LEFT_ELBOW.value].x,
                          landmarks[mp_pose.PoseLandmark.LEFT_ELBOW.value].y]
            
            elbow_right = [landmarks[mp_pose.PoseLandmark.RIGHT_ELBOW.value].x,
                           landmarks[mp_pose.PoseLandmark.RIGHT_ELBOW.value].y]

            # wrist coordinates
            wrist_left = [landmarks[mp_pose.PoseLandmark.LEFT_WRIST.value].x,
                          landmarks[mp_pose.PoseLandmark.LEFT_WRIST.value].y]
            
            wrist_right = [landmarks[mp_pose.PoseLandmark.RIGHT_WRIST.value].x,
                           landmarks[mp_pose.PoseLandmark.RIGHT_WRIST.value].y]
            
            # calculate angle
            # logic:
            # - if angle is > 160, it's a curl down
            # - if angle is < 50, it's a curl up
            # - the counter adds 1 each time a curl down is followed by a curl up
            # - visualize on screen
            curl_angle_left = get_angle(shoulder_left, elbow_left, wrist_left)
            #print("Curl angle left =", curl_angle_left)
            curl_angle_right = get_angle(shoulder_right, elbow_right, wrist_right)
            #print("Curl angle right =", curl_angle_right)
            
            # show angle at elbow
            cv2.putText(image, 
                        str(round(curl_angle_right)), 
                        tuple(np.multiply(elbow_right, [1280,720]).astype(int)),
                        cv2.FONT_HERSHEY_DUPLEX, 0.5, (0,255,0), 1, cv2.LINE_AA)

            cv2.putText(image, 
                        str(round(curl_angle_left)), 
                        tuple(np.multiply(elbow_left, [1280,720]).astype(int)),
                        cv2.FONT_HERSHEY_DUPLEX, 0.5, (0,255,0), 1, cv2.LINE_AA)
            
            # print("Count = ",curl_count)
            
            if curl_angle_left > 160:
                curl_flag_left = False # curl down
            if curl_angle_left < 50 and not curl_flag_left:
                curl_flag_left = True # curl up
                curl_count+=1
                #print("Count inside = ", curl_count)
                
            if curl_angle_right > 160:
                curl_flag_right=False # curl down
            if curl_angle_right < 50 and not curl_flag_right:
                curl_flag_right=True # curl up
                curl_count+=1
                #print("Count inside = ", curl_count)
            
        except Exception:
            pass # e.g. no pose detected in this frame
        
        # display count
        #cv2.rectangle(image, start_point, end_point, color, thickness)
        #cv2.rectangle(image,(0,620),(1280,720),(0,0,0),-1)
        cv2.putText(image, "NIKE FITNESS TRACKER", (500,600),
                    cv2.FONT_HERSHEY_DUPLEX, 0.8, (255,255,255), 2, cv2.LINE_AA)
        cv2.rectangle(image, (600,610), (700,710), (69,255,213), -1) # rgba(213,255,69,255)
        cv2.putText(image, "REPS", (635,635),
                    cv2.FONT_HERSHEY_DUPLEX, .4, (120,116,124), 1, cv2.LINE_AA) # rgb(124,116,120)
        cv2.putText(image, str(curl_count), (640,675),
                    cv2.FONT_HERSHEY_DUPLEX, 1, (0,0,0), 2, cv2.LINE_AA) # rgb(124,116,120)
        
        ###
        
        if curl_count == 0:
            cv2.putText(image, 
                    "Let's see you do 30 rep curls!",
                    (570,30),
                    cv2.FONT_HERSHEY_DUPLEX, 0.75, (255,255,255), 2, cv2.LINE_AA)
        
        if 0 < curl_count < 30:
            cv2.putText(image, 
                    "Nice form, keep going! " + str(30-curl_count) + " more reps to go",
                    (500,30),
                    cv2.FONT_HERSHEY_DUPLEX, 0.75, (255,255,255), 2, cv2.LINE_AA)
            
        if curl_count >= 30:
            cv2.putText(image, 
                    "Well done! Take a break!",
                    (570,30),
                    cv2.FONT_HERSHEY_DUPLEX, 0.75, (255,255,255), 2, cv2.LINE_AA)
            
            # hold the completion message on screen for 5 seconds, then reset
            cv2.imshow('NIKE WELLNESS DETECTOR', image)
            cv2.waitKey(5000)
            curl_count=0
        
        # x, y, h and emotion_detection only exist once a face has been detected
        if len(faces) > 0:
            cv2.putText(image, 
                        "Welcome Saif, M25",
                        (x,y),
                        cv2.FONT_HERSHEY_DUPLEX, 0.75, (255,255,255), 2, cv2.LINE_AA)
            
            cv2.putText(image, 
                        "why u " + emotions[np.argmax(emotion_detection)],
                        (x+25,y+h+25),
                        cv2.FONT_HERSHEY_DUPLEX, 1, (0,0,255), 2, cv2.LINE_AA)
        
        ###
        
        
        # basic detection using the draw_landmarks utility
        mp_drawing.draw_landmarks(image, # image
                                  results.pose_landmarks, # coordinates
                                  mp_pose.POSE_CONNECTIONS, # pose connections
                                  mp_drawing.DrawingSpec(color=(0,0,255), thickness=2, circle_radius=2), # dots
                                  mp_drawing.DrawingSpec(thickness=2, circle_radius=2)) # connections
        
                
        # Display the resulting frame
        cv2.imshow('NIKE WELLNESS DETECTOR', image)
        
        if cv2.waitKey(1) == ord('q'):
            break
    
    # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()

Saif
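
The rep counting in the loop above is a two-state machine: an arm becomes "down" once its elbow angle exceeds 160 degrees, and a rep is counted when it then drops below 50. A camera-free sketch of that logic, which can be tested with a plain list of angles, is given here. The `RepCounter` class is a hypothetical refactoring, not the notebook's code, and the angle sequence is made up.

```python
class RepCounter:
    """Counts curls from a stream of elbow angles in degrees."""

    def __init__(self, down_angle=160, up_angle=50):
        self.down_angle = down_angle
        self.up_angle = up_angle
        self.is_up = True  # require a full extension before the first rep counts
        self.count = 0

    def update(self, angle):
        if angle > self.down_angle:
            self.is_up = False            # arm extended: curl down
        elif angle < self.up_angle and not self.is_up:
            self.is_up = True             # arm flexed after an extension: one rep
            self.count += 1
        return self.count

counter = RepCounter()
for angle in [40, 170, 100, 45, 30, 165, 48]:
    counter.update(angle)
print(counter.count)  # 2
```

One counter per arm would reproduce the separate left and right flags used in the loop.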


In [None]:
faces