<a href="https://colab.research.google.com/github/taylan-sen/CIS490b_computer_vision/blob/main/pose_detection_mediapipe.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Human pose detection using MediaPipe
MediaPipe is a set of tools developed by Google for a range of machine learning tasks on images and video (*e.g.* object detection, gesture recognition), text (*e.g.* text embedding), and audio (*e.g.* audio classification). In this tutorial, I am going to show you how to use MediaPipe to detect pose landmarks in videos, following these steps:


*   Import and install the necessary Python packages
*   Download the MediaPipe pose landmark detection model
*   Load an input video
*   Extract the pose landmarks on each video frame, and visualize them
*   Save each frame, annotated with the extracted pose landmarks, to an output video



## Required packages

To be able to follow this notebook, these packages should be installed:


*   MediaPipe
*   OpenCV
*   NumPy

This tutorial does not cover OpenCV and NumPy installation (see the installation pages for [NumPy](https://numpy.org/install/) and [OpenCV](https://pypi.org/project/opencv-python/)). You can install MediaPipe using:



In [1]:
# Install MediaPipe
!pip install -q mediapipe==0.10.0

Let's begin with importing the above-mentioned packages:

In [2]:
# Import the required packages
import cv2
import numpy as np
import mediapipe as mp

If you are running this notebook on Google Colab, and need to access files in your Google Drive, you will need to mount Google Drive. This can be achieved with:

In [3]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

## Pose landmark detection model

The MediaPipe pose landmarker model extracts the 3D coordinates of 33 body landmarks (shown in the figure below). For each landmark, it also estimates whether it is visible in the input frame (i.e. not hidden by another body part or by an object) and the probability that it is present (i.e. located inside the frame).
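To make these fields concrete, here is a minimal sketch of how one might turn landmark coordinates into pixel positions and drop landmarks that are probably occluded. It assumes the `x` and `y` values are normalized by the frame width and height; the `Landmark` class, the helper names, and the `0.5` threshold are hypothetical stand-ins for illustration, not part of MediaPipe.

```python
from dataclasses import dataclass


@dataclass
class Landmark:
    """Hypothetical stand-in for one MediaPipe pose landmark."""
    x: float           # normalized horizontal position (0 = left edge, 1 = right edge)
    y: float           # normalized vertical position (0 = top edge, 1 = bottom edge)
    z: float           # relative depth
    visibility: float  # likelihood that the landmark is not occluded
    presence: float    # likelihood that the landmark lies inside the frame


def to_pixel(landmark, width, height):
    """Map normalized (x, y) to integer pixel coordinates, clamped to the frame."""
    px = min(max(int(landmark.x * width), 0), width - 1)
    py = min(max(int(landmark.y * height), 0), height - 1)
    return px, py


def visible_pixels(landmarks, width, height, min_visibility=0.5):
    """Return {landmark index: (px, py)} for landmarks above the visibility threshold."""
    return {
        index: to_pixel(lm, width, height)
        for index, lm in enumerate(landmarks)
        if lm.visibility >= min_visibility
    }


pose = [
    Landmark(0.50, 0.25, -0.1, 0.99, 0.99),
    Landmark(0.10, 0.90, 0.0, 0.20, 0.95),  # low visibility: likely occluded
]
print(visible_pixels(pose, width=640, height=480))  # {0: (320, 120)}
```

The threshold is a judgment call: a lower value keeps more landmarks at the cost of noisier positions.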

<p align="center">
  <img src="https://github.com/rmeziatisab/PoseDetection/blob/main/pose_landmarks_index.png?raw=true" alt="MediaPipe pose landmarks" width=250>
  <p style="font-size: 14px;" align="center">
      <i>MediaPipe pose landmarks</i> *.
  </p>
</p>

You can download the pose landmarker model [here](https://developers.google.com/mediapipe/solutions/vision/pose_landmarker#models). For this tutorial, I will use the Full model.<!-- Here is another option to get the pose landmarker model:
!wget -O pose_landmarker.task -q https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_full/float16/1/pose_landmarker_full.task
-->
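If you prefer to fetch the model from Python, a small helper along these lines could work. This is only a sketch using the standard library: `download_model` is a hypothetical name, and the URL is deliberately left as a parameter so that you copy the actual link from the models page.

```python
from pathlib import Path
from urllib.request import urlretrieve


def download_model(url, destination):
    """Download `url` to `destination` unless a non-empty file is already there."""
    destination = Path(destination)
    if destination.exists() and destination.stat().st_size > 0:
        return destination  # reuse the copy downloaded earlier
    destination.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(url, str(destination))
    return destination


# Usage (MODEL_URL is a placeholder: copy the real link from the models page):
# model_file = download_model(MODEL_URL, "models/pose_landmarker.task")
```

Skipping the download when the file exists avoids fetching the model again each time the notebook is rerun.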

<p id="landmarks_img_src" style="font-size: 10px;">
   <i>* <a href="https://developers.google.com/mediapipe/solutions/vision/pose_landmarker#pose_landmarker_model">Source</a></i>
</p>

## Pose landmark visualization
For body landmark visualization, I will be using the following function, developed by the MediaPipe authors. This function will be called to annotate the current frame with the detected body landmarks, as well as draw the connections between them.

In [None]:
# Draw the detected pose landmarks and their connections on an image
from mediapipe.python import solutions
from mediapipe.framework.formats import landmark_pb2

def draw_landmarks_on_image(rgb_image, detection_result):
    pose_landmarks_list = detection_result.pose_landmarks
    annotated_image = np.copy(rgb_image)

    for idx in range(len(pose_landmarks_list)):
        pose_landmarks = pose_landmarks_list[idx]

        pose_landmarks_proto = landmark_pb2.NormalizedLandmarkList()
        pose_landmarks_proto.landmark.extend([
            landmark_pb2.NormalizedLandmark(x=landmark.x, y=landmark.y, z=landmark.z) for landmark in pose_landmarks
        ])
        solutions.drawing_utils.draw_landmarks(
            annotated_image,
            pose_landmarks_proto,
            solutions.pose.POSE_CONNECTIONS,
            solutions.drawing_styles.get_default_pose_landmarks_style()
        )
    return annotated_image

## Pose landmark extraction
The first step is to load the video that the pose detection model will be applied to. The frame rate (in frames per second, fps) and the frame dimensions (width and height) are extracted in order to create an output video containing the input frames annotated with the detected pose landmarks. This new video will have the same frame rate and frame dimensions as the input video.

In [None]:
# Read the input video and extract video information
input_path = 'my_input_path'+'.mp4'
cap = cv2.VideoCapture(input_path)

if not cap.isOpened():
    print('Video not found')
else:
    fps = cap.get(cv2.CAP_PROP_FPS)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
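It can be worth sanity-checking the extracted metadata before going further, since OpenCV may report `0` for a property the backend cannot provide. The sketch below estimates the clip duration from the frame rate and a frame count (which could be read with `cap.get(cv2.CAP_PROP_FRAME_COUNT)`); the helper name `video_duration_seconds` is hypothetical.

```python
def video_duration_seconds(frame_count, fps):
    """Return the clip length in seconds, or None when the metadata is unusable."""
    if fps <= 0 or frame_count <= 0:
        return None
    return frame_count / fps


print(video_duration_seconds(frame_count=450, fps=30.0))  # 15.0
print(video_duration_seconds(frame_count=450, fps=0.0))   # None
```

A `None` result is a hint that the frame rate should be set by hand before creating the output video.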

We need to create a `VideoWriter` object to save the output video. Its constructor takes the output path (including the file extension), the [FourCC](https://en.wikipedia.org/wiki/FourCC) code that identifies the video codec, the frame rate, and the frame size. The `'MP4V'` FourCC selects MPEG-4 video, which suits the MP4 format.

In [None]:
# Create a VideoWriter object
output_path = 'my_output_path'+'.mp4'
output_vid = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*'MP4V'), fps, (width, height))
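A FourCC is simply four ASCII characters packed into one 32-bit integer, with the first character in the lowest byte. The pure-Python sketch below illustrates that packing; `fourcc_code` and `fourcc_tag` are hypothetical helper names, and in practice you would keep calling `cv2.VideoWriter_fourcc`.

```python
def fourcc_code(tag):
    """Pack a four-character tag into a 32-bit integer, first character lowest."""
    if len(tag) != 4:
        raise ValueError("a FourCC tag has exactly four characters")
    code = 0
    for position, char in enumerate(tag):
        code |= (ord(char) & 0xFF) << (8 * position)
    return code


def fourcc_tag(code):
    """Recover the four characters from a packed code."""
    return "".join(chr((code >> (8 * position)) & 0xFF) for position in range(4))


print(hex(fourcc_code("MP4V")))  # 0x5634504d
print(fourcc_tag(0x5634504D))    # MP4V
```

Decoding in the other direction is handy when a video property or a log message reports the codec as a number.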

Next, we set the options needed to create a `PoseLandmarker` object: the path to the downloaded model and the running mode, which is set to video here.

In [None]:
# Set options for the PoseLandmarker object
from mediapipe.tasks import python

model_path = 'my_model_path'+'.task'

BaseOptions = python.BaseOptions
PoseLandmarker = python.vision.PoseLandmarker
PoseLandmarkerOptions = python.vision.PoseLandmarkerOptions
VisionRunningMode = python.vision.RunningMode
options = PoseLandmarkerOptions(base_options=BaseOptions(model_asset_path=model_path),
                                running_mode=VisionRunningMode.VIDEO)

Pose landmarks are detected on each video frame by calling the `detect_for_video` method. It takes the current frame, converted to an `Image` object, and the frame timestamp in milliseconds (ms). The frame timestamp can be obtained as:

<center> $\text{frame timestamp} = 1000 \times \frac{\text{frame index}}{\text{frame rate}}$ </center>

The frame index is an integer between $1$ and the total number of frames, describing the current frame number. It can be obtained either with a counter variable incremented each time a new frame is read, or using the `VideoCapture` `get` method as follows:
`int(cap.get(cv2.CAP_PROP_POS_FRAMES))`.
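As a quick worked example of the formula, the sketch below computes the timestamps of the first five frames of a 30 fps video. The helper name `frame_timestamp_ms` is hypothetical. The point to check is that the truncated values keep increasing from one frame to the next, which video mode expects.

```python
def frame_timestamp_ms(frame_index, fps):
    """Timestamp in whole milliseconds for a frame index and a frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return int(1000 * frame_index / fps)


timestamps = [frame_timestamp_ms(i, fps=30.0) for i in range(1, 6)]
print(timestamps)  # [33, 66, 100, 133, 166]

# Each timestamp is strictly larger than the previous one
print(all(b > a for a, b in zip(timestamps, timestamps[1:])))  # True
```

At very high frame rates (above 1000 fps), truncating to whole milliseconds could produce repeated timestamps, so this check would fail there.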

Then, the `draw_landmarks_on_image` function is called to draw the detected body landmark positions and connections. I use the `cv2_imshow` function to display the annotated frames since I am using Google Colab. If you are using another environment, you can use `cv2.imshow`.

Finally, the annotated frame is saved as an output video frame.


In [None]:
from google.colab.patches import cv2_imshow

# Extract pose landmarks
with PoseLandmarker.create_from_options(options) as landmarker:
    frame_index = 0
    while cap.isOpened():
        hasFrame, image = cap.read()
        if not hasFrame:
            print('No more frames to read!')
            break

        # The read method returns frames in BGR order, so convert them to RGB
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        # Transform the frame to a NumPy ndarray before converting it to an Image
        numpy_frame_from_opencv = np.asarray(image)
        mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=numpy_frame_from_opencv)
        frame_index += 1 # you can use cap.get(cv2.CAP_PROP_POS_FRAMES) instead
        # Compute the frame timestamp and cast it to int as required by the detect_for_video function
        frame_timestamp_ms = int(1000*frame_index / fps)
        result = landmarker.detect_for_video(mp_image, frame_timestamp_ms)
        # Draw the landmarks on the frame as a NumPy ndarray
        annotated_image = draw_landmarks_on_image(mp_image.numpy_view(), result)
        # Order color channels back to BGR
        cv_image = cv2.cvtColor(annotated_image, cv2.COLOR_RGB2BGR)
        # Show the annotated frame
        cv2_imshow(cv_image)

        # Stop reading the video if 'q' key is pressed
        if cv2.waitKey(5) & 0xFF == ord('q'):
            break

        output_vid.write(cv_image)

    cap.release()
    output_vid.release()

cv2.destroyAllWindows()
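The loop above only keeps the annotated video. If you also want the raw coordinates, one option is to flatten each detection result into rows and write them to a CSV file. The sketch below assumes each landmark exposes `x`, `y`, `z` and `visibility` attributes; the helper names, the column layout and the `SimpleNamespace` stand-ins for `result.pose_landmarks` are hypothetical.

```python
import csv
from types import SimpleNamespace


def landmark_rows(frame_index, pose_landmarks_list):
    """Yield one flat row per landmark: frame, pose, landmark, x, y, z, visibility."""
    for pose_index, pose_landmarks in enumerate(pose_landmarks_list):
        for landmark_index, lm in enumerate(pose_landmarks):
            yield [frame_index, pose_index, landmark_index, lm.x, lm.y, lm.z, lm.visibility]


def write_landmarks_csv(path, rows):
    """Write a header line followed by the landmark rows."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["frame", "pose", "landmark", "x", "y", "z", "visibility"])
        writer.writerows(rows)


# Stand-in for result.pose_landmarks: one detected pose with two landmarks
fake_pose_landmarks = [[
    SimpleNamespace(x=0.5, y=0.25, z=-0.1, visibility=0.99),
    SimpleNamespace(x=0.4, y=0.30, z=0.0, visibility=0.80),
]]
rows = list(landmark_rows(1, fake_pose_landmarks))
write_landmarks_csv("pose_landmarks.csv", rows)
```

Inside the detection loop, you would extend a single list with `landmark_rows(frame_index, result.pose_landmarks)` for every frame and write the file once at the end.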

## Conclusion
In this tutorial, we have seen how to use MediaPipe for human pose detection on a video. To this end, we followed these steps: install MediaPipe, download the pose detection model, load the input video, detect pose landmarks on the input frames, and draw them on each frame to build an output video. I hope you have learnt something new through this tutorial, and see you next time!

<p align="center">
  <img src="https://github.com/rmeziatisab/PoseDetection/blob/main/goodbye_img.png?raw=true" alt="Goodbye" width=250>
  <p style="font-size: 14px;" align="center">
      <i>Goodbye :-) </i>
  </p>
</p>