<a href="https://colab.research.google.com/github/tinsirius/Week09/blob/colab/Week09/Practical09.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ECE4078 2023 Workshop 9: Human Robot Interaction (HRI)

By Dr. Leimin Tian

In this notebook, you will use hand gestures to communicate with a robot: 👍, 👎, ✌️, ☝️, ✊, 👋, 🤟

("Thumb_Up", "Thumb_Down", "Victory", "Pointing_Up", "Closed_Fist", "Open_Palm", "ILoveYou")

## Preparation

Let's start with installing MediaPipe.

### The hand gesture recognizer is developed with the MediaPipe Python API (Copyright 2023 The MediaPipe Authors)

In [None]:
#@title MediaPipe Python API Licensed under the Apache License, Version 2.0
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

In [None]:
!pip install -q mediapipe==0.10.0

Download an off-the-shelf model. See [MediaPipe documentation](https://developers.google.com/mediapipe/solutions/vision/gesture_recognizer#models) for more details about the model.

In [None]:
!wget -q https://storage.googleapis.com/mediapipe-models/gesture_recognizer/gesture_recognizer/float16/1/gesture_recognizer.task

### Visualization Utilities

In [None]:
#@markdown Functions to visualize the gesture recognition results. <br/> Run the following cell to activate the functions.
from matplotlib import pyplot as plt
import mediapipe as mp
from mediapipe.framework.formats import landmark_pb2
import math

plt.rcParams.update({
    'axes.spines.top': False,
    'axes.spines.right': False,
    'axes.spines.left': False,
    'axes.spines.bottom': False,
    'xtick.labelbottom': False,
    'xtick.bottom': False,
    'ytick.labelleft': False,
    'ytick.left': False,
    'xtick.labeltop': False,
    'xtick.top': False,
    'ytick.labelright': False,
    'ytick.right': False
})

mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles


def display_one_image(image, title, subplot, titlesize=16):
    """Displays one image along with the predicted category name and score."""
    plt.subplot(*subplot)
    plt.imshow(image)
    if len(title) > 0:
        plt.title(title, fontsize=int(titlesize), color='black', fontdict={'verticalalignment':'center'}, pad=int(titlesize/1.5))
    return (subplot[0], subplot[1], subplot[2]+1)


def display_batch_of_images_with_gestures_and_hand_landmarks(images, results):
    """Displays a batch of images with the gesture category and its score along with the hand landmarks."""
    # Images and labels.
    images = [image.numpy_view() for image in images]
    gestures = [top_gesture for (top_gesture, _) in results]
    multi_hand_landmarks_list = [multi_hand_landmarks for (_, multi_hand_landmarks) in results]

    # Auto-squaring: this will drop data that does not fit into square or square-ish rectangle.
    rows = int(math.sqrt(len(images)))
    cols = len(images) // rows

    # Size and spacing.
    FIGSIZE = 10.0
    SPACING = 0.1
    subplot=(rows,cols, 1)
    if rows < cols:
        plt.figure(figsize=(FIGSIZE,FIGSIZE/cols*rows))
    else:
        plt.figure(figsize=(FIGSIZE/rows*cols,FIGSIZE))

    # Display gestures and hand landmarks.
    # use a distinct loop variable so the gestures list is not shadowed
    for i, (image, gesture) in enumerate(zip(images[:rows*cols], gestures[:rows*cols])):
        title = f"{gesture.category_name} ({gesture.score:.2f})"
        dynamic_titlesize = FIGSIZE*SPACING/max(rows,cols) * 40 + 3
        annotated_image = image.copy()

        for hand_landmarks in multi_hand_landmarks_list[i]:
          hand_landmarks_proto = landmark_pb2.NormalizedLandmarkList()
          hand_landmarks_proto.landmark.extend([
            landmark_pb2.NormalizedLandmark(x=landmark.x, y=landmark.y, z=landmark.z) for landmark in hand_landmarks
          ])

          mp_drawing.draw_landmarks(
            annotated_image,
            hand_landmarks_proto,
            mp_hands.HAND_CONNECTIONS,
            mp_drawing_styles.get_default_hand_landmarks_style(),
            mp_drawing_styles.get_default_hand_connections_style())

        subplot = display_one_image(annotated_image, title, subplot, titlesize=dynamic_titlesize)

    # Layout.
    plt.tight_layout()
    plt.subplots_adjust(wspace=SPACING, hspace=SPACING)
    plt.show()

### Load the hand gesture recognizer

In [None]:
# import necessary modules
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# create a GestureRecognizer object
base_options = python.BaseOptions(model_asset_path='gesture_recognizer.task')
options = vision.GestureRecognizerOptions(base_options=base_options)
recognizer = vision.GestureRecognizer.create_from_options(options)

## 1. Test the hand gesture recognizer

### Choose which option to use for getting the test image

In [None]:
# specify the input source
img_option = 1 # Option 1: use your webcam
# img_option = 2 # Option 2: upload image from your local machine

#### Option 1: use your webcam

In [None]:
#@markdown (webcam input utility functions based on [this git repo](https://github.com/tugstugi/dl-colab-notebooks))
from IPython.display import display, Javascript
from google.colab.output import eval_js
from base64 import b64decode

def take_photo(filename='photo.jpg', quality=0.8):
  js = Javascript('''
    async function takePhoto(quality) {
      const div = document.createElement('div');
      const capture = document.createElement('button');
      capture.textContent = 'Capture';
      div.appendChild(capture);

      const video = document.createElement('video');
      video.style.display = 'block';
      const stream = await navigator.mediaDevices.getUserMedia({video: true});

      document.body.appendChild(div);
      div.appendChild(video);
      video.srcObject = stream;
      await video.play();

      // Resize the output to fit the video element.
      google.colab.output.setIframeHeight(document.documentElement.scrollHeight, true);

      // Wait for Capture to be clicked.
      await new Promise((resolve) => capture.onclick = resolve);

      const canvas = document.createElement('canvas');
      canvas.width = video.videoWidth;
      canvas.height = video.videoHeight;
      canvas.getContext('2d').drawImage(video, 0, 0);
      stream.getVideoTracks()[0].stop();
      div.remove();
      return canvas.toDataURL('image/jpeg', quality);
    }
    ''')
  display(js)
  data = eval_js('takePhoto({})'.format(quality))
  binary = b64decode(data.split(',')[1])
  with open(filename, 'wb') as f:
    f.write(binary)
  return filename

def take_photo_robot(filename='photo.jpg', quality=0.8):
  js = Javascript('''
    async function takePhoto(quality) {
      const div = document.createElement('div');
      const capture = document.createElement('button');
      capture.textContent = 'Give Instruction (press after posing)';
      div.appendChild(capture);

      const video = document.createElement('video');
      video.style.display = 'block';
      const stream = await navigator.mediaDevices.getUserMedia({video: true});

      document.body.appendChild(div);
      div.appendChild(video);
      video.srcObject = stream;
      await video.play();

      // Resize the output to fit the video element.
      google.colab.output.setIframeHeight(document.documentElement.scrollHeight, true);

      // Wait for Capture to be clicked.
      await new Promise((resolve) => capture.onclick = resolve);

      const canvas = document.createElement('canvas');
      canvas.width = video.videoWidth;
      canvas.height = video.videoHeight;
      canvas.getContext('2d').drawImage(video, 0, 0);
      stream.getVideoTracks()[0].stop();
      div.remove();
      return canvas.toDataURL('image/jpeg', quality);
    }
    ''')
  display(js)
  data = eval_js('takePhoto({})'.format(quality))
  binary = b64decode(data.split(',')[1])
  with open(filename, 'wb') as f:
    f.write(binary)
  return filename

In [None]:
from IPython.display import Image
frame_ID = 0
if img_option == 1:
    try:
        frame_name='frame_' + str(frame_ID) + '.jpg'
        filename = take_photo(frame_name, 0.8)
        print('Photo saved as {} (see left side menu for all files)'.format(filename))

        # Show the image which was just taken.
        display(Image(filename))
        IMAGE_FILENAMES = [filename]
    except Exception as err:
        # Errors will be thrown if the user does not have a webcam or if they do not
        # grant the page permission to access it.
        print(str(err))

#### Option 2: upload from your local machine

In [None]:
from google.colab import files

if img_option == 2:
    uploaded = files.upload()

    # write every uploaded file to disk (the write must happen inside the loop)
    for filename, content in uploaded.items():
        with open(filename, 'wb') as f:
            f.write(content)
    IMAGE_FILENAMES = list(uploaded.keys())

    print('Uploaded files:', IMAGE_FILENAMES)

### Running inference and visualizing the results

See [MediaPipe documentation](https://developers.google.com/mediapipe/solutions/vision/gesture_recognizer/python) for more info.

*Note: Gesture Recognizer also returns the hand landmarks it detects in the image, together with other useful information such as whether each detected hand is a left or a right hand.*
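
As a rough illustration of the note above, the sketch below pairs each detected hand's top gesture with its handedness. It assumes the result object exposes `gestures` and `handedness` as per-hand lists of categories with `category_name` and `score` fields; `summarize_hands` and the `SimpleNamespace` stand-in are hypothetical helpers used here so the example runs without a camera or model.

```python
from types import SimpleNamespace

def summarize_hands(recognition_result):
    """Return one (handedness, gesture, score) tuple per detected hand."""
    summary = []
    for hand_gestures, hand_sides in zip(recognition_result.gestures,
                                         recognition_result.handedness):
        # index 0 is taken as the top category, as the notebook does
        top_gesture = hand_gestures[0]
        top_side = hand_sides[0]
        summary.append((top_side.category_name,
                        top_gesture.category_name,
                        top_gesture.score))
    return summary

# Stand-in for a real recognition result (hypothetical values).
fake_result = SimpleNamespace(
    gestures=[[SimpleNamespace(category_name='Thumb_Up', score=0.91)]],
    handedness=[[SimpleNamespace(category_name='Right', score=0.98)]],
)
print(summarize_hands(fake_result))
```

With a real result, `summarize_hands(recognizer.recognize(image))` should work the same way, and it returns an empty list when no hand is detected.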

In [None]:
images = []
results = []
for image_file_name in IMAGE_FILENAMES:
  # load the input image
  image = mp.Image.create_from_file(image_file_name)

  # recognize gestures in the input image
  recognition_result = recognizer.recognize(image)

  # skip images without a detected hand; gestures is empty in that case
  if not recognition_result.gestures:
    print(f'No hands found in {image_file_name}')
    continue

  # process and visualize the recognition result
  images.append(image)
  top_gesture = recognition_result.gestures[0][0]
  hand_landmarks = recognition_result.hand_landmarks
  results.append((top_gesture, hand_landmarks))

# only plot when at least one image contains a hand
if images:
  display_batch_of_images_with_gestures_and_hand_landmarks(images, results)

## 2. Using gestures to give feedback to a robot

### Define your hand gesture commands
Here the robot receives a reward of +1 for a thumbs up, -1 for a thumbs down, and 0 for any other gesture or for a recognition whose confidence is below the threshold.

In [None]:
# command function for mapping a recognised hand gesture to a robot reward
# available labels: ['Thumb_Up','Thumb_Down','Victory','Pointing_Up','Closed_Fist','Open_Palm','ILoveYou','None']
def commands(top_gesture, conf_threshold=0.5):
    label = top_gesture.category_name
    conf = top_gesture.score
    reward = 0
    if conf >= conf_threshold:
        if label == 'Thumb_Up':
            print(f'Human instruction is {label}: Good Job!')
            reward = 1
        elif label == 'Thumb_Down':
            print(f'Human instruction is {label}: Try harder.')
            reward = -1
        else:
            print(f'Unknown instruction: hand gesture recognised as {label}')
            reward = 0
    else:
        print(f'Human gesture may be {label}, but unsure as my confidence is only {conf:.2f}')
        reward = 0
    return reward
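
If you want to add more gesture commands, a table-driven variant of the function above may be easier to extend. This is only a sketch: `GESTURE_REWARDS` and `commands_from_table` are hypothetical names, and the function takes the label and confidence directly instead of a MediaPipe category object.

```python
# Hypothetical lookup table: gesture label -> (reward, message)
GESTURE_REWARDS = {
    'Thumb_Up': (1, 'Good Job!'),
    'Thumb_Down': (-1, 'Try harder.'),
}

def commands_from_table(label, conf, conf_threshold=0.5, table=GESTURE_REWARDS):
    """Return the reward for a recognised gesture label and its confidence."""
    if conf < conf_threshold:
        # too uncertain to act on, same as the low-confidence branch above
        print(f'Human gesture may be {label}, but confidence is only {conf:.2f}')
        return 0
    # labels missing from the table are treated as unknown instructions
    reward, message = table.get(label, (0, 'Unknown instruction'))
    print(f'Gesture recognised as {label}: {message}')
    return reward

print(commands_from_table('Thumb_Up', 0.9))
```

Adding a new command then only requires a new entry in the table, for example a reward for `'Victory'`.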

### The interaction scenario
The simple scenario here is a robot and a human crossing paths while each walks to their own goal.

We generate a number of episodes by varying the speed at which the robot moves.
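
To get a feel for how the robot speed changes the encounter, the sketch below estimates the smallest human-robot distance in one episode. The start and goal points match the map defined later in this notebook; `min_separation` is a hypothetical helper, and it assumes the robot stops once it reaches its goal, which the animation code itself does not enforce.

```python
import numpy as np

def min_separation(robot_speed, num_frames=100,
                   human_start=(0, 0), human_goal=(10, 9),
                   robot_start=(0, 10), robot_goal=(9, 0)):
    """Smallest human-robot distance over one episode (hypothetical helper)."""
    human_start, human_goal = np.asarray(human_start), np.asarray(human_goal)
    robot_start, robot_goal = np.asarray(robot_start), np.asarray(robot_goal)

    # human progress per frame, from 0 towards 1
    progress = np.arange(num_frames) / num_frames
    # assumption: the robot stops at its goal instead of overshooting
    robot_progress = np.minimum(progress * robot_speed, 1)

    human = human_start + progress[:, None] * (human_goal - human_start)
    robot = robot_start + robot_progress[:, None] * (robot_goal - robot_start)
    return float(np.linalg.norm(human - robot, axis=1).min())

for speed in [1, 3, 5, 7, 10]:
    print(f'speed {speed}: closest approach {min_separation(speed):.2f}')
```

A small closest-approach distance corresponds to a near miss, which is one possible reason a person might dislike a given speed.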

In [None]:
# generate episodes where a robot and a person cross paths at different robot speeds
import numpy as np
np.random.seed(0) # seed for reproducibility

# how many episodes to generate
ep_num = 5
# a default list of speeds to use
speeds = np.array([1,3,5,7,10])

# you can also generate the speeds randomly
#speeds = np.random.randint(1, 11, ep_num)
speeds = np.sort(speeds)

print(f'Speed in generated episodes: {speeds}')

In [None]:
# plotting the scenario
# part of this cell was implemented with the help of ChatGPT
%matplotlib inline

import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from IPython.display import HTML
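import warnings
import matplotlib
# matplotlib.cbook.mplDeprecation is not available in newer Matplotlib releases;
# MatplotlibDeprecationWarning is the public name of the same warning class
warnings.filterwarnings("ignore", category=matplotlib.MatplotlibDeprecationWarning)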

# define the start and target points for the person (dot) and robot (square)
# feel free to change the map and the start and goal of human/robot
dot_start = (0, 0)
dot_target = (10, 9)
square_start = (0, 10)
square_target = (9, 0)

# create map with background grid
fig, ax = plt.subplots()
plt.xlim(-1, 11)
plt.ylim(-1, 11)
ax.set_xlabel('red dot is a person, blue square is a robot')
ax.set_ylabel('a simple near miss map')
ax.grid(True)

# paint the start and finish location
plt.plot(0,0,'>', color = 'red', markersize=15)
plt.plot(10,9,'*', color = 'red', markersize=15)
plt.plot(0,10,'>', color = 'blue', markersize=15)
plt.plot(9,0,'*', color = 'blue', markersize=15)

# create a dot that will move from dot_start to dot_target
dot, = ax.plot([], [], 'ro', markersize=10)

# create a square that will move from square_start to square_target
square, = ax.plot([], [], 'bs', markersize=10)

# initialize the animation
def init(robot_speed):
    dot.set_data([], [])
    square.set_data([], [])
    # robot_speed is accepted to mirror update(); only the artists are returned
    return dot, square

# update the plot per frame
def update(frame, robot_speed):
    dot_progress = frame / num_frames
    x_dot = dot_start[0] + dot_progress * (dot_target[0] - dot_start[0])
    y_dot = dot_start[1] + dot_progress * (dot_target[1] - dot_start[1])
    # set_data expects sequences, so wrap the scalar coordinates in lists
    dot.set_data([x_dot], [y_dot])

    # the robot moves robot_speed times as fast as the human
    square_progress = (frame / num_frames) * robot_speed
    x_square = square_start[0] + square_progress * (square_target[0] - square_start[0])
    y_square = square_start[1] + square_progress * (square_target[1] - square_start[1])
    square.set_data([x_square], [y_square])

    return dot, square

In [None]:
# view an example episode
num_frames = 100 # Total number of frames
interval = 50 # Interval in milliseconds between frames

# show slowest robot setting
ID = 0
robot_speed = speeds[ID]

# create the animation
animation = FuncAnimation(fig, update, frames=num_frames, fargs=(robot_speed,), init_func=lambda: init(robot_speed), blit=False, interval=interval)

# display the animation
HTML(animation.to_html5_video())

In [None]:
# generate and save a few example episodes
num_frames = 100 # Total number of frames
interval = 50 # Interval in milliseconds between frames
ID = 0

while ID < ep_num:
    robot_speed = speeds[ID]

    # Create the animation
    animation = FuncAnimation(fig, update, frames=num_frames, fargs=(robot_speed,), init_func=lambda: init(robot_speed), blit=False, interval=interval)
    video_name = 'interaction_' + str(ID) + '.mp4'
    animation.save(video_name, writer="ffmpeg")

    ID = ID + 1

### Let's give it a try
- webcam required for interactive feedback
- you can use a simulated human instead that provides a feedback list based on the speeds
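
The simulated human used below prefers speeds 5 to 7, dislikes speeds 1 to 2 and 9 to 10, and is indifferent to 3, 4 and 8. As a sketch, the same preferences can be spelled out with `np.select`, which may be easier to read and modify than boolean arithmetic; `simulated_feedback` is a hypothetical helper name.

```python
import numpy as np

def simulated_feedback(speeds):
    """Hypothetical helper: -1 for too slow or too fast, +1 for preferred, 0 otherwise."""
    speeds = np.asarray(speeds)
    conditions = [
        (speeds <= 2) | (speeds >= 9),   # too slow or too fast
        (speeds >= 5) & (speeds <= 7),   # preferred range
    ]
    # speeds matching neither condition (3, 4, 8) fall back to 0
    return np.select(conditions, [-1, 1], default=0)

print(simulated_feedback([1, 3, 5, 7, 10]))
```

To model a different person, change the ranges in `conditions`.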

In [None]:
# change to False if you want to use simulated feedback instead of your webcam
webcam = True

In [None]:
# a simulated human
# prefers speeds 5-7, dislikes too slow (1-2) or too fast (9-10), indifferent to 3, 4 and 8
feedback_sim = -1 + 1 * (speeds>2) - 1 * (speeds>8) + 1 * (speeds>4) - 1 * (speeds>7)
print(f'Human feedback given: {feedback_sim}')

In [None]:
# video display function
from IPython.display import HTML, clear_output
from base64 import b64encode

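def show_video(video_path, video_width = 500):
  # read the video as bytes; the context manager closes the file afterwards
  with open(video_path, "rb") as f:
    video_file = f.read()

  # embed the video in the page as a base64 data URL
  video_url = f"data:video/mp4;base64,{b64encode(video_file).decode()}"
  return HTML(f"""<video width={video_width} controls><source src="{video_url}"></video>""")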

In [None]:
# give responses to each interaction episode
from IPython.display import display
ID = 0
reward = 0
feedback = []

# webcam interactive feedback
if webcam:
    while ID < ep_num:
        # show example episode
        ep_video = 'interaction_' + str(ID) + '.mp4'
        print(f'Showing example episode {ep_video} with robot speed = {speeds[ID]}')
        display(show_video(ep_video))
        print('\nPress the \'Give Instruction\' button to capture your hand gesture as feedback to the robot')

        # capture gesture response from webcam
        frame_name = 'frame_' + str(ID) + '.jpg'
        filename = take_photo_robot(frame_name, 0.8)
        ID = ID + 1
        image = mp.Image.create_from_file(filename)

        # recognize gesture in input image as feedback
        recognition_result = recognizer.recognize(image)
        if recognition_result.gestures == []:
            print('No hands found!')
            reward = 0
        else:
            top_gesture = recognition_result.gestures[0][0]
            reward = commands(top_gesture, 0.5)
        print(f'\nReward given by human = {reward}')
        feedback.append(reward)

        # remove old video before showing new one
        clear_output()

# use simulated feedback if webcam not used
else:
    # use simulated human
    feedback = feedback_sim

print(f'Human feedback given: {feedback}')

Here we design a simple robot policy:


*   If there are episodes rated as preferred by the human, use the fastest speed that the human prefers.
*   If there are no episodes rated as preferred, but there are episodes rated as indifferent, use the fastest speed that the human is indifferent about.
*   If the human rated all episodes as undesirable, show a new set of episodes with speeds different from those already shown.
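
The policy above can be sketched as a small pure function, which makes it easy to try against different feedback lists. `choose_speed` is a hypothetical helper; returning `None` is an assumption made here to signal that new speeds need to be shown.

```python
import numpy as np

def choose_speed(speeds, feedback):
    """Pick the fastest preferred speed, else the fastest indifferent one, else None."""
    speeds = np.asarray(speeds)
    feedback = np.asarray(feedback)
    for rating in (1, 0):  # preferred first, then indifferent
        candidates = speeds[feedback == rating]
        if candidates.size > 0:
            return int(candidates.max())
    # every episode was disliked: the caller should propose new speeds
    return None

print(choose_speed([1, 3, 5, 7, 10], [-1, 0, 1, 1, -1]))
```

The cells below implement the same three cases step by step and also print a message to the user.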






In [None]:
# print human feedback summary
feedback = np.array(feedback)
likes = speeds[feedback == 1]
dislikes = speeds[feedback == -1]
indifferent = speeds[feedback == 0]
print(f'Preferred speeds: {likes}; Undesirable speeds: {dislikes}; Indifferent speeds: {indifferent}')

In [None]:
# decide on speed based on human feedback
chosen_speed = 0

# new speed options that have not yet been shown to the user
all_speeds = np.arange(1, 11, dtype=int)
new_speeds = np.setdiff1d(all_speeds,speeds)
new_speeds = np.sort(new_speeds)

# if there are episodes that the user liked, use the fastest speed that's rated as liked
if len(likes) > 0:
    likes = np.sort(likes)
    chosen_speed = likes[-1]
    print(f'Thank you for the feedback! I will use speed = {chosen_speed} then, since you liked it and it will get me to my goal faster.')
# if there are no episodes the user liked, use the fastest speed that they didn't hate
elif len(indifferent) > 0:
    indifferent = np.sort(indifferent)
    chosen_speed = indifferent[-1]
    print(f'Thank you for the feedback! I will use speed = {chosen_speed} then, since you didn\'t hate it and it will get me to my goal faster.')
# if they dislike all episodes, show some new options
else:
    print(f'Thank you for the feedback! I\'m sorry that you didn\'t like any of the shown speeds, here are some alternatives: {new_speeds}')

In [None]:
new_feedback = []
# show the chosen speed
if chosen_speed > 0:
    robot_speed = chosen_speed
    # create the animation
    animation = FuncAnimation(fig, update, frames=num_frames, fargs=(robot_speed,), init_func=lambda: init(robot_speed), blit=False, interval=interval)
    video_name = 'interaction_chosen_speed_' + str(chosen_speed) + '.mp4'

    # display the animation
    print(f'Here is what we\'ve decided: robot speed = {chosen_speed}')
    animation.save(video_name, writer="ffmpeg")
    display(show_video(video_name))

# show an alternative with the slowest new speed
else:
    robot_speed = new_speeds[0]
    # create the animation
    animation = FuncAnimation(fig, update, frames=num_frames, fargs=(robot_speed,), init_func=lambda: init(robot_speed), blit=False, interval=interval)
    video_name = 'interaction_new_speed_' + str(robot_speed) + '.mp4'
    animation.save(video_name, writer="ffmpeg")

    # display the animation
    print(f'How about this one: robot speed = {robot_speed}')
    display(show_video(video_name))

    # capture gesture response from webcam
    frame_name = 'frame_new_speed_' + str(robot_speed) + '.jpg'
    filename = take_photo_robot(frame_name, 0.8)
    image = mp.Image.create_from_file(filename)

    # recognize gesture in input image as feedback
    recognition_result = recognizer.recognize(image)
    if recognition_result.gestures == []:
        print('No hands found!')
        reward = 0
    else:
        top_gesture = recognition_result.gestures[0][0]
        reward = commands(top_gesture, 0.5)
    print(f'\nReward given by human = {reward}')
    new_feedback.append(reward)

Feel free to modify the gesture commands, the map, the robot's decision policy, etc., and develop your own gesture-controlled robot!

Beyond gestures, there are many ways humans and robots can interact naturally.