# Video Processing Notebook
##### <strong>Author:</strong> <u>Walter Dych</u> <em>(walterpdych@gmail.com)</em>
##### <strong>Edits/Documentation:</strong> <u>Karee Garvin</u> <em>(kgarving@fas.harvard.edu)</em>

This notebook processes a video file with the MediaPipe Holistic model. It reads the video frame by frame, extracts pose landmarks from each frame, and stores them in a DataFrame along with the corresponding time stamp.

The script opens the video file with OpenCV's `cv2.VideoCapture` class. The `isOpened` method checks whether the file was opened successfully; if so, the script reads the video one frame at a time with the `read` method. The `ret` value returned by `read` indicates whether a frame was read successfully. When it is false, typically because the end of the video has been reached, the script breaks out of the loop.

The `cv2.cvtColor` function converts the color space of each frame from BGR to RGB. The `holistic.process` method then extracts pose landmarks from the frame. If pose landmarks are detected, the x and y coordinates of the shoulders, elbows, wrists, eyes, and nose are extracted and stored as one row, together with the corresponding time stamp. The rows are converted to a DataFrame once the loop has finished.

The `cv2.CAP_PROP_POS_MSEC` property, read with `cap.get`, gives the current position in the video in milliseconds. This time stamp is stored in the `time_ms` variable and added to each row along with the landmark coordinates.

Finally, the `cap.release` method is used to release the video file and free up system resources.
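The time stamps described above can be cross-checked against the frame index. The following is a minimal sketch, assuming a constant frame rate of 30000/1001 (about 29.97 fps). That value is inferred from the 33.366667 ms spacing in the sample output further down, not read from the video file, and `frame_time_ms` is a hypothetical helper name.

```python
from fractions import Fraction

# Assumed frame rate: 30000/1001 (about 29.97 fps), inferred from the
# 33.366667 ms spacing in the sample output, not read from the video.
ASSUMED_FPS = Fraction(30000, 1001)


def frame_time_ms(frame_index, fps=ASSUMED_FPS):
    """Return the nominal time stamp in milliseconds of a zero-based frame index."""
    return float(frame_index * 1000 / fps)


# Print the nominal time stamps of the first four frames.
for i in range(4):
    print(i, round(frame_time_ms(i), 6))
```

For videos with a variable frame rate, the values reported by `cap.get(cv2.CAP_PROP_POS_MSEC)` may not follow this even spacing, so the sketch is only a sanity check.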


## Importing Libraries
Here, we import essential libraries:
- `cv2`: OpenCV for image and video processing
- `mediapipe`: Google's MediaPipe for pose estimation
- `os`: For operating system related tasks
- `pandas`: For DataFrame support

In [1]:
import cv2
import mediapipe as mp
import os
import pandas as pd

## Setting Parameters
In this section, you can modify the following parameters:
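- `MODEL`: Complexity of the pose landmark model, passed to MediaPipe as `model_complexity`. MediaPipe accepts `0`, `1`, or `2` (the Lite, Full, and Heavy pose models); accuracy and latency both increase with the value. `1` is the default.
- `video_path`: Path to the video file.

In [2]:
MODEL = 1  # model_complexity: 0 = Lite, 1 = Full (default), 2 = Heavy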
video_path = "C:/Users/cosmo/Desktop/Random Scripts/Co-Speech Gesture Automation/Co-Speech-Gesture-Automation/VIDEO_FILES/5012_I.MOV"  # Add your video file path here

if os.path.exists(video_path):
    print(f"{video_path} is a valid file. Proceed with processing.")
else:
    raise ValueError(f"{video_path} does not exist. Try adding the entire file path.")

C:/Users/cosmo/Desktop/Random Scripts/Co-Speech Gesture Automation/Co-Speech-Gesture-Automation/VIDEO_FILES/5012_I.MOV is a valid file. Proceed with processing.
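Both parameters can be checked before any processing starts. The sketch below is one possible way to do that, not part of the original notebook: `check_parameters` and `VALID_MODEL_COMPLEXITIES` are hypothetical names, and the example path is deliberately nonexistent.

```python
import os

# Values that MediaPipe accepts for model_complexity.
VALID_MODEL_COMPLEXITIES = (0, 1, 2)


def check_parameters(model, video_path):
    """Raise a descriptive error if either notebook parameter is unusable."""
    if model not in VALID_MODEL_COMPLEXITIES:
        raise ValueError(
            f"MODEL must be one of {VALID_MODEL_COMPLEXITIES}, got {model!r}."
        )
    if not os.path.isfile(video_path):
        raise FileNotFoundError(f"{video_path} is not an existing file.")


# Example with a deliberately missing (hypothetical) path.
try:
    check_parameters(1, "no_such_video.MOV")
except FileNotFoundError as err:
    print(err)
```

Using `os.path.isfile` rather than `os.path.exists` also rejects a path that points to a folder instead of a video file.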


## Initialization
This part initializes MediaPipe components used in the notebook.

In [3]:
# Initialize MediaPipe components
mp_drawing = mp.solutions.drawing_utils
mp_holistic = mp.solutions.holistic

## Processing Loop
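This loop reads the video frame by frame, runs the Holistic model on each frame, and records the time stamp and landmark coordinates whenever a pose is detected.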

In [4]:
print(f"Processing video at {video_path}")
with mp_holistic.Holistic(static_image_mode=False, model_complexity=MODEL) as holistic:
    # Collect one row per processed frame; the list is converted to a DataFrame afterwards
    data = []
    cap = cv2.VideoCapture(video_path)
    while cap.isOpened():
        ret, image = cap.read()
        
        if not ret:
            # read() returns False once the video ends (or a frame cannot be decoded)
            print("Ignoring empty camera frame.")
            break

        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        results = holistic.process(image)

        # Time stamp of the current video position in milliseconds
        time_ms = cap.get(cv2.CAP_PROP_POS_MSEC)
        
        if results.pose_landmarks is not None:
            right_shoulder_x = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.RIGHT_SHOULDER].x
            right_shoulder_y = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.RIGHT_SHOULDER].y
            left_shoulder_x = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.LEFT_SHOULDER].x
            left_shoulder_y = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.LEFT_SHOULDER].y
            right_elbow_x = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.RIGHT_ELBOW].x
            right_elbow_y = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.RIGHT_ELBOW].y
            left_elbow_x = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.LEFT_ELBOW].x
            left_elbow_y = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.LEFT_ELBOW].y
            right_wrist_x = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.RIGHT_WRIST].x
            right_wrist_y = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.RIGHT_WRIST].y
            left_wrist_x = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.LEFT_WRIST].x
            left_wrist_y = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.LEFT_WRIST].y
            right_eye_x = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.RIGHT_EYE].x
            right_eye_y = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.RIGHT_EYE].y
            left_eye_x = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.LEFT_EYE].x
            left_eye_y = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.LEFT_EYE].y
            nose_x = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.NOSE].x
            nose_y = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.NOSE].y
            
            data.append([time_ms, right_shoulder_x, right_shoulder_y, left_shoulder_x, left_shoulder_y, right_elbow_x, right_elbow_y, left_elbow_x, left_elbow_y, right_wrist_x, right_wrist_y, left_wrist_x, left_wrist_y, right_eye_x, right_eye_y, left_eye_x, left_eye_y, nose_x, nose_y])

    cap.release()

Processing video at C:/Users/cosmo/Desktop/Random Scripts/Co-Speech Gesture Automation/Co-Speech-Gesture-Automation/VIDEO_FILES/5012_I.MOV
Ignoring empty camera frame.
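The repeated per-landmark assignments in the loop above could be collapsed into a small helper. The following is a sketch rather than the notebook's own code: `TRACKED_LANDMARKS`, `column_names`, and `build_row` are hypothetical names, and a list of stand-in objects replaces `results.pose_landmarks.landmark` so the example runs without MediaPipe. The indices mirror MediaPipe's `PoseLandmark` enum; inside the notebook, `mp_holistic.PoseLandmark.<NAME>` could be used instead of hard-coded numbers.

```python
from types import SimpleNamespace

# Landmark name -> index, mirroring MediaPipe's PoseLandmark enum.
# The order matches the column order used in the notebook.
TRACKED_LANDMARKS = {
    "right_shoulder": 12,
    "left_shoulder": 11,
    "right_elbow": 14,
    "left_elbow": 13,
    "right_wrist": 16,
    "left_wrist": 15,
    "right_eye": 5,
    "left_eye": 2,
    "nose": 0,
}


def column_names(tracked=TRACKED_LANDMARKS):
    """Return time_ms followed by <name>_x and <name>_y for each landmark."""
    names = ["time_ms"]
    for name in tracked:
        names.extend([f"{name}_x", f"{name}_y"])
    return names


def build_row(time_ms, landmarks, tracked=TRACKED_LANDMARKS):
    """Return [time_ms, x, y, x, y, ...] for the tracked landmarks."""
    row = [time_ms]
    for index in tracked.values():
        point = landmarks[index]
        row.extend([point.x, point.y])
    return row


# Stand-in for results.pose_landmarks.landmark: 33 points with made-up coordinates.
fake_landmarks = [SimpleNamespace(x=i / 100, y=i / 50) for i in range(33)]
row = build_row(0.0, fake_landmarks)
print(dict(zip(column_names(), row)))
```

Keeping the names in a single mapping means the row contents and the DataFrame column names cannot drift apart when a landmark is added or removed.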


In [6]:
# Convert to DataFrame
df = pd.DataFrame(data, columns=[
    "time_ms", 
    "right_shoulder_x", "right_shoulder_y", 
    "left_shoulder_x", "left_shoulder_y", 
    "right_elbow_x", "right_elbow_y", 
    "left_elbow_x", "left_elbow_y", 
    "right_wrist_x", "right_wrist_y", 
    "left_wrist_x", "left_wrist_y", 
    "right_eye_x", "right_eye_y", 
    "left_eye_x", "left_eye_y",
    "nose_x", "nose_y"
    ])
df

Unnamed: 0,time_ms,right_shoulder_x,right_shoulder_y,left_shoulder_x,left_shoulder_y,right_elbow_x,right_elbow_y,left_elbow_x,left_elbow_y,right_wrist_x,right_wrist_y,left_wrist_x,left_wrist_y,right_eye_x,right_eye_y,left_eye_x,left_eye_y,nose_x,nose_y
0,0.000000,0.344286,0.405075,0.397223,0.370856,0.269023,0.595567,0.399092,0.530511,0.243734,0.850244,0.413310,0.686442,0.440987,0.291602,0.448068,0.289044,0.448677,0.316802
1,33.366667,0.344271,0.402060,0.396578,0.370613,0.271608,0.594896,0.388240,0.529965,0.243635,0.849359,0.414341,0.692046,0.441684,0.291520,0.448552,0.289132,0.448726,0.316247
2,66.733333,0.345014,0.398962,0.397147,0.369188,0.273000,0.594947,0.386080,0.525001,0.243705,0.845760,0.406889,0.688886,0.443281,0.291793,0.449803,0.290035,0.449091,0.316252
3,100.100000,0.345261,0.395175,0.397936,0.368232,0.273986,0.593536,0.385965,0.522503,0.243675,0.845083,0.400157,0.682124,0.446239,0.292305,0.451779,0.291149,0.450707,0.316327
4,133.466667,0.345270,0.394534,0.397938,0.366750,0.274239,0.593546,0.385206,0.520537,0.243587,0.846099,0.402443,0.679806,0.447379,0.293865,0.452677,0.292768,0.451655,0.317066
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19494,650449.800000,0.240905,0.373103,0.319563,0.336576,0.241385,0.628577,0.328566,0.528700,0.367020,0.706016,0.372243,0.657512,0.313093,0.205210,0.331602,0.209070,0.328180,0.233482
19495,650483.166667,0.240761,0.370612,0.319837,0.336627,0.235982,0.626117,0.327542,0.528356,0.361741,0.698572,0.373554,0.649609,0.313067,0.205326,0.331730,0.209633,0.328170,0.233772
19496,650516.533333,0.240556,0.369116,0.320789,0.337115,0.229665,0.622187,0.326674,0.526433,0.356640,0.684939,0.373548,0.635563,0.312859,0.204874,0.332066,0.209643,0.328163,0.233597
19497,650549.900000,0.240576,0.365816,0.321137,0.337745,0.223751,0.616461,0.326185,0.524564,0.352522,0.674295,0.373544,0.624797,0.314836,0.204793,0.334420,0.209684,0.330186,0.233430
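The x and y values in this table are normalized by MediaPipe to the frame width and height, so they lie roughly between 0 and 1 rather than being pixel positions. A minimal sketch of the conversion back to pixels follows, assuming a hypothetical 1920x1080 frame; `to_pixels` is an illustrative name, and in the notebook the real dimensions could be read with `cap.get(cv2.CAP_PROP_FRAME_WIDTH)` and `cap.get(cv2.CAP_PROP_FRAME_HEIGHT)`.

```python
def to_pixels(x_norm, y_norm, width, height):
    """Convert normalized landmark coordinates to integer pixel coordinates."""
    # Clamp to the frame so values at or slightly beyond the edges stay drawable.
    px = max(0, min(int(x_norm * width), width - 1))
    py = max(0, min(int(y_norm * height), height - 1))
    return px, py


# Right wrist from the first row of the table, on an assumed 1920x1080 frame.
print(to_pixels(0.243734, 0.850244, 1920, 1080))
```

The clamping is a choice made for drawing on the frame; for analysis it may be preferable to keep the unclamped values.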


## Data Output
Finally, the DataFrame is saved as both a pickle file and a CSV file.

In [7]:
# Preview the first rows of the DataFrame
print(f"DataFrame Head: {df.head()}")

# Output folder and base file name (video file name without its extension)
output_dir = "C:/Users/cosmo/Desktop/Random Scripts/Co-Speech Gesture Automation/Co-Speech-Gesture-Automation/MOTION_TRACKING_FILES/"
base_name = os.path.splitext(os.path.basename(video_path))[0]

# Save DataFrame as pickle file
pickle_file_name = output_dir + base_name + "_keypoints.pkl"
df.to_pickle(pickle_file_name)
print(f"DataFrame saved as {pickle_file_name}")

# Save DataFrame as CSV file
csv_file_name = output_dir + base_name + "_keypoints.csv"
df.to_csv(csv_file_name, index=False)
print(f"DataFrame saved as {csv_file_name}")

DataFrame Head:       time_ms  right_shoulder_x  right_shoulder_y  left_shoulder_x  \
0    0.000000          0.344286          0.405075         0.397223   
1   33.366667          0.344271          0.402060         0.396578   
2   66.733333          0.345014          0.398962         0.397147   
3  100.100000          0.345261          0.395175         0.397936   
4  133.466667          0.345270          0.394534         0.397938   

   left_shoulder_y  right_elbow_x  right_elbow_y  left_elbow_x  left_elbow_y  \
0         0.370856       0.269023       0.595567      0.399092      0.530511   
1         0.370613       0.271608       0.594896      0.388240      0.529965   
2         0.369188       0.273000       0.594947      0.386080      0.525001   
3         0.368232       0.273986       0.593536      0.385965      0.522503   
4         0.366750       0.274239       0.593546      0.385206      0.520537   

   right_wrist_x  right_wrist_y  left_wrist_x  left_wrist_y  right_eye_x  \
0     