# Video Processing Notebook
##### <strong>Author:</strong> <u>Walter Dych</u> <em>(walterpdych@gmail.com)</em>
##### <strong>Edits/Documentation:</strong> <u>Karee Garvin</u> <em>(kgarvin@fas.harvard.edu)</em>

This notebook tracks body movement in a video by extracting pose landmarks with Google's MediaPipe Holistic model.

The Python code below processes a video file with the MediaPipe Holistic model. It reads the video frame by frame, extracts pose landmarks from each frame, and stores them in a DataFrame together with each frame's timestamp.

The script opens the video with OpenCV's `cv2.VideoCapture` class and uses its `isOpened` method to check that the file was opened successfully. It then reads one frame at a time with the `read` method, which returns a success flag (`ret`) and the frame itself. When `ret` is `False`, no further frame could be read (typically because the end of the file was reached), and the script breaks out of the loop.
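
The read loop described above can be packaged as a small generator. This is a minimal sketch, not the notebook's code: `iter_frames` is a hypothetical helper that accepts any object with the `isOpened()`, `read()` and `get()` methods that `cv2.VideoCapture` provides, so it can be exercised here with a stand-in object instead of a real video file.

```python
from typing import Any, Iterator, Tuple


def iter_frames(capture: Any, pos_msec_prop: int) -> Iterator[Tuple[float, Any]]:
    """Yield (timestamp_ms, frame) pairs until the capture runs out of frames.

    Hypothetical helper: `capture` only needs isOpened(), read() and get().
    """
    while capture.isOpened():
        ret, frame = capture.read()
        if not ret:
            # read() reports failure at the end of the file (or on a decode error)
            break
        # Query the timestamp right after reading, as the notebook does
        yield capture.get(pos_msec_prop), frame


class FakeCapture:
    """Stand-in for cv2.VideoCapture that serves a fixed list of frames."""

    def __init__(self, frames, frame_ms):
        self._frames = list(frames)
        self._frame_ms = frame_ms
        self._index = 0

    def isOpened(self):
        return True

    def read(self):
        if self._index >= len(self._frames):
            return False, None
        frame = self._frames[self._index]
        self._index += 1
        return True, frame

    def get(self, prop):
        # Timestamp of the most recently read frame
        return (self._index - 1) * self._frame_ms


frames = list(iter_frames(FakeCapture(["a", "b", "c"], 40.0), pos_msec_prop=0))
print(frames)  # [(0.0, 'a'), (40.0, 'b'), (80.0, 'c')]
```

With OpenCV, the call would look like `iter_frames(cv2.VideoCapture(video_path), cv2.CAP_PROP_POS_MSEC)`.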

The `cv2.cvtColor` function converts each frame from BGR (OpenCV's default channel order) to RGB, which MediaPipe expects, and the `holistic.process` method then estimates the landmarks. If pose landmarks are detected, the normalized x and y coordinates of nine landmarks (both shoulders, elbows, wrists, and eyes, plus the nose) are recorded together with the frame's timestamp. Frames in which no pose is detected are skipped.
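
For a 3-channel image, the `cv2.COLOR_BGR2RGB` conversion amounts to reversing the order of the last (channel) axis. The sketch below shows that equivalence with NumPy alone; `bgr_to_rgb` is a hypothetical helper for illustration, and in the notebook `cv2.cvtColor` remains the clearer choice.

```python
import numpy as np


def bgr_to_rgb(image: np.ndarray) -> np.ndarray:
    """Swap the channel order of an H x W x 3 image from BGR to RGB.

    Reversing the last axis maps (B, G, R) -> (R, G, B). The copy makes the
    result C-contiguous, which some image libraries require.
    """
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an H x W x 3 image")
    return np.ascontiguousarray(image[..., ::-1])


# A 1 x 2 image holding a pure-blue and a pure-red pixel, in BGR order
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)
rgb = bgr_to_rgb(bgr)
print(rgb.tolist())  # [[[0, 0, 255], [255, 0, 0]]]
```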

The timestamp of the current frame, in milliseconds, is read with `cap.get(cv2.CAP_PROP_POS_MSEC)`; `cv2.CAP_PROP_POS_MSEC` is a property identifier passed to `cap.get`, not a method. The value is stored in `time_ms` and appended, together with the landmark coordinates, as one row of a list that is later converted into a DataFrame.
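
Each row is therefore the timestamp followed by one (x, y) pair per landmark. A minimal sketch of that step, assuming MediaPipe's documented pose-landmark indices (nose 0, left eye 2, right eye 5, shoulders 11/12, elbows 13/14, wrists 15/16); `LANDMARK_INDEX` and `landmark_row` are hypothetical names, and `SimpleNamespace` objects stand in for MediaPipe's landmark messages, which expose `.x` and `.y` in the same way.

```python
from types import SimpleNamespace

# MediaPipe pose-landmark indices for the nine recorded points,
# listed in the notebook's column order (right before left, nose last)
LANDMARK_INDEX = {
    "right_shoulder": 12, "left_shoulder": 11,
    "right_elbow": 14, "left_elbow": 13,
    "right_wrist": 16, "left_wrist": 15,
    "right_eye": 5, "left_eye": 2,
    "nose": 0,
}


def landmark_row(time_ms, landmarks):
    """Return [time_ms, x, y, x, y, ...] for the landmarks above.

    `landmarks` is any sequence indexable by landmark number whose items have
    normalized `.x` and `.y` attributes (e.g. results.pose_landmarks.landmark).
    """
    row = [time_ms]
    for index in LANDMARK_INDEX.values():
        point = landmarks[index]
        row.extend([point.x, point.y])
    return row


# 33 fake landmarks where landmark i sits at (i / 100, i / 50)
fake = [SimpleNamespace(x=i / 100, y=i / 50) for i in range(33)]
row = landmark_row(33.366667, fake)
print(len(row))   # 19 = 1 timestamp + 9 landmarks x 2 coordinates
print(row[1:3])   # right shoulder (index 12): [0.12, 0.24]
```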

Finally, the `cap.release` method is used to release the video file and free up system resources.
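
If an exception is raised inside the loop, the `cap.release()` call at the end is never reached. One way to guarantee the release is a small context manager; `opened_capture` is a hypothetical helper built on `contextlib.contextmanager`, shown here with a stand-in object rather than a real `cv2.VideoCapture`.

```python
from contextlib import contextmanager


@contextmanager
def opened_capture(capture):
    """Yield `capture` and always call capture.release() afterwards."""
    try:
        yield capture
    finally:
        capture.release()


class FakeCapture:
    """Stand-in that only records whether release() was called."""

    def __init__(self):
        self.released = False

    def release(self):
        self.released = True


cap = FakeCapture()
try:
    with opened_capture(cap):
        raise RuntimeError("simulated failure while processing a frame")
except RuntimeError:
    pass
print(cap.released)  # True
```

With OpenCV this would read `with opened_capture(cv2.VideoCapture(video_path)) as cap:`, and the capture is released even if `holistic.process` fails partway through the video.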


## Importing Libraries
Here, we import essential libraries:
- `cv2`: OpenCV for image and video processing
- `mediapipe`: Google's MediaPipe for pose estimation
- `os`: For file-path checks and manipulation
- `pandas`: For DataFrame support

```python
import cv2
import mediapipe as mp
import os
import pandas as pd
```

## Setting Parameters
In this section, you can modify the following parameters:
- `MODEL`: The `model_complexity` passed to MediaPipe Holistic: `0` (lite), `1` (full, MediaPipe's default), or `2` (heavy). Higher values are generally more accurate but slower. This notebook uses `2`.
- `video_path`: Full path to the video file.

```python
MODEL = 2  # model_complexity: 0 = lite, 1 = full (default), 2 = heavy
video_path = "C:/Users/cosmo/Desktop/Random Scripts/Co-Speech Gesture Automation/Co-Speech-Gesture-Automation/VIDEO_FILES/5003_I.MOV"  # Add your video file path here

# Stop before processing if the path does not point to an existing file
if os.path.isfile(video_path):
    print(f"{video_path} is a valid file. Proceed with processing.")
else:
    raise FileNotFoundError(f"{video_path} does not exist. Try adding the entire file path.")
```

C:/Users/cosmo/Desktop/Random Scripts/Co-Speech Gesture Automation/Co-Speech-Gesture-Automation/VIDEO_FILES/5003_I.MOV is a valid file. Proceed with processing.
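
The cell above checks only the file path. A sketch that also validates `MODEL` before any processing starts, assuming MediaPipe's documented set of accepted `model_complexity` values (`0`, `1`, `2`); `check_parameters` is a hypothetical helper, and a temporary file stands in for the video.

```python
import os
import tempfile

# Values MediaPipe accepts for model_complexity
VALID_MODEL_COMPLEXITY = {0, 1, 2}


def check_parameters(model: int, video_path: str) -> None:
    """Raise if `model` is not a valid model_complexity or the file is missing."""
    if model not in VALID_MODEL_COMPLEXITY:
        raise ValueError(f"MODEL must be one of {sorted(VALID_MODEL_COMPLEXITY)}, got {model!r}")
    if not os.path.isfile(video_path):
        raise FileNotFoundError(f"{video_path} does not exist. Try adding the entire file path.")


# A temporary empty file stands in for a real video
with tempfile.NamedTemporaryFile(suffix=".MOV", delete=False) as tmp:
    existing_path = tmp.name

check_parameters(2, existing_path)  # valid: passes silently

errors = []
for bad_model, bad_path in [(3, existing_path), (2, existing_path + ".missing")]:
    try:
        check_parameters(bad_model, bad_path)
    except (ValueError, FileNotFoundError) as err:
        errors.append(type(err).__name__)
os.remove(existing_path)
print(errors)  # ['ValueError', 'FileNotFoundError']
```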


## Initialization
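This cell creates short aliases for the MediaPipe modules: `drawing_utils` (helpers for drawing landmarks on frames) and `holistic` (the Holistic model used for landmark estimation).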

```python
# Short aliases for the MediaPipe drawing utilities and the Holistic solution
mp_drawing = mp.solutions.drawing_utils
mp_holistic = mp.solutions.holistic
```

## Processing Loop
This cell runs the Holistic model on every frame and collects one row of landmark coordinates for each frame in which a pose is detected.

```python
# Pose landmarks to record, in the same order as the DataFrame columns below
LANDMARKS = [
    "RIGHT_SHOULDER", "LEFT_SHOULDER",
    "RIGHT_ELBOW", "LEFT_ELBOW",
    "RIGHT_WRIST", "LEFT_WRIST",
    "RIGHT_EYE", "LEFT_EYE",
    "NOSE",
]

print(f"Processing video at {video_path}")
with mp_holistic.Holistic(static_image_mode=False, model_complexity=MODEL) as holistic:
    # List of rows; converted to a DataFrame in the next cell
    data = []
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError(f"OpenCV could not open {video_path}")
    while cap.isOpened():
        ret, image = cap.read()

        # read() returns ret=False once no more frames can be read (end of file)
        if not ret:
            print("Ignoring empty camera frame.")
            break

        # OpenCV decodes frames as BGR; MediaPipe expects RGB
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        results = holistic.process(image)

        # Timestamp of the current frame in milliseconds
        time_ms = cap.get(cv2.CAP_PROP_POS_MSEC)

        # Frames without a detected pose are skipped (no row is appended)
        if results.pose_landmarks is not None:
            landmarks = results.pose_landmarks.landmark
            row = [time_ms]
            for name in LANDMARKS:
                point = landmarks[mp_holistic.PoseLandmark[name]]
                row.extend([point.x, point.y])
            data.append(row)

    cap.release()
```

Processing video at C:/Users/cosmo/Desktop/Random Scripts/Co-Speech Gesture Automation/Co-Speech-Gesture-Automation/VIDEO_FILES/5003_I.MOV
Ignoring empty camera frame.
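
The next cell lists the 19 column names by hand, and their order must match the order in which coordinates are appended to each row. A sketch that derives the same names from a single list of landmark names, so the two cannot drift apart; `LANDMARK_NAMES` and `column_names` are hypothetical.

```python
# Landmarks in the order their coordinates are appended to each row
LANDMARK_NAMES = [
    "right_shoulder", "left_shoulder",
    "right_elbow", "left_elbow",
    "right_wrist", "left_wrist",
    "right_eye", "left_eye",
    "nose",
]


def column_names(landmark_names):
    """Return ["time_ms", "<name>_x", "<name>_y", ...] in row order."""
    columns = ["time_ms"]
    for name in landmark_names:
        columns.extend([f"{name}_x", f"{name}_y"])
    return columns


columns = column_names(LANDMARK_NAMES)
print(len(columns))  # 19
print(columns[:3])   # ['time_ms', 'right_shoulder_x', 'right_shoulder_y']
```

The DataFrame could then be built with `pd.DataFrame(data, columns=column_names(LANDMARK_NAMES))`.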


```python
# Convert the list of rows to a DataFrame; column order matches each row
df = pd.DataFrame(data, columns=[
    "time_ms",
    "right_shoulder_x", "right_shoulder_y",
    "left_shoulder_x", "left_shoulder_y",
    "right_elbow_x", "right_elbow_y",
    "left_elbow_x", "left_elbow_y",
    "right_wrist_x", "right_wrist_y",
    "left_wrist_x", "left_wrist_y",
    "right_eye_x", "right_eye_y",
    "left_eye_x", "left_eye_y",
    "nose_x", "nose_y",
])
df
```

|  | time_ms | right_shoulder_x | right_shoulder_y | left_shoulder_x | left_shoulder_y | right_elbow_x | right_elbow_y | left_elbow_x | left_elbow_y | right_wrist_x | right_wrist_y | left_wrist_x | left_wrist_y | right_eye_x | right_eye_y | left_eye_x | left_eye_y | nose_x | nose_y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.000000 | 0.199463 | 0.340649 | 0.307089 | 0.326427 | 0.209208 | 0.516104 | 0.317789 | 0.469892 | 0.309411 | 0.596308 | 0.333597 | 0.567451 | 0.261056 | 0.208618 | 0.284653 | 0.215582 | 0.278071 | 0.237287 |
| 1 | 33.366667 | 0.199205 | 0.340690 | 0.307200 | 0.324761 | 0.209811 | 0.517547 | 0.317766 | 0.469894 | 0.309902 | 0.598627 | 0.333460 | 0.570136 | 0.267084 | 0.210641 | 0.285232 | 0.215626 | 0.283179 | 0.236994 |
| 2 | 66.733333 | 0.198797 | 0.340773 | 0.307264 | 0.323920 | 0.210236 | 0.518136 | 0.317766 | 0.469977 | 0.310517 | 0.600213 | 0.333448 | 0.572002 | 0.269497 | 0.211850 | 0.285584 | 0.215714 | 0.285020 | 0.236878 |
| 3 | 100.100000 | 0.198196 | 0.341093 | 0.307282 | 0.323479 | 0.210782 | 0.518799 | 0.317788 | 0.470244 | 0.311279 | 0.601455 | 0.333448 | 0.573991 | 0.268893 | 0.213135 | 0.285734 | 0.216393 | 0.284711 | 0.237088 |
| 4 | 133.466667 | 0.197653 | 0.341525 | 0.307296 | 0.323301 | 0.211025 | 0.519491 | 0.317821 | 0.470756 | 0.311851 | 0.602099 | 0.333478 | 0.575677 | 0.268379 | 0.214034 | 0.285785 | 0.217125 | 0.284319 | 0.237508 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 27142 | 905638.066667 | 0.206125 | 0.338583 | 0.302290 | 0.322288 | 0.233486 | 0.508812 | 0.318644 | 0.467695 | 0.320299 | 0.606390 | 0.325333 | 0.594677 | 0.278026 | 0.216274 | 0.294548 | 0.216091 | 0.291911 | 0.234981 |
| 27143 | 905671.433333 | 0.206185 | 0.338427 | 0.302592 | 0.322326 | 0.233556 | 0.508712 | 0.318653 | 0.467706 | 0.320339 | 0.606360 | 0.325574 | 0.594616 | 0.278034 | 0.216282 | 0.294653 | 0.216103 | 0.291932 | 0.234917 |
| 27144 | 905704.800000 | 0.206348 | 0.337833 | 0.302903 | 0.322060 | 0.233571 | 0.507235 | 0.318699 | 0.466909 | 0.320347 | 0.605873 | 0.325613 | 0.594055 | 0.277924 | 0.216243 | 0.294691 | 0.215957 | 0.291850 | 0.234888 |
| 27145 | 905738.166667 | 0.206676 | 0.336745 | 0.303402 | 0.321701 | 0.233593 | 0.505657 | 0.318764 | 0.466658 | 0.320415 | 0.605315 | 0.325464 | 0.593937 | 0.277642 | 0.216103 | 0.294705 | 0.215690 | 0.291659 | 0.234785 |
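
The `time_ms` values above advance by about 33.366667 ms per row, which is consistent with a constant rate of 30000/1001 (about 29.97) frames per second; that rate is inferred from the output, not read from the file. Under this assumption a frame's timestamp is `index * 1000 / fps`, and the last row (index 27145, 905738.166667 ms) agrees with that formula, which suggests no frames were dropped in this run. Because frames without a detected pose are skipped, a gap well above one frame interval would mark missing detections. A minimal sketch with hypothetical helpers `frame_time_ms` and `find_gaps`:

```python
from fractions import Fraction

# Assumed frame rate: 30000/1001 fps (about 29.97), inferred from the
# ~33.366667 ms spacing of time_ms in the notebook output
FPS = Fraction(30000, 1001)


def frame_time_ms(frame_index: int, fps: Fraction = FPS) -> float:
    """Timestamp of a frame in milliseconds, assuming a constant frame rate."""
    return float(frame_index * 1000 / fps)


def find_gaps(times_ms, fps: Fraction = FPS, tolerance: float = 1.5):
    """Return (previous, next) timestamp pairs spaced more than `tolerance`
    frame intervals apart, i.e. where rows (frames) are missing."""
    limit = tolerance * 1000 / float(fps)
    return [
        (prev, curr)
        for prev, curr in zip(times_ms, times_ms[1:])
        if curr - prev > limit
    ]


print(round(frame_time_ms(1), 6))      # 33.366667
print(round(frame_time_ms(27145), 6))  # 905738.166667
print(find_gaps([0.0, 33.366667, 66.733333, 166.833333]))  # [(66.733333, 166.833333)]
```

On the notebook's DataFrame this would be called as `find_gaps(df["time_ms"].tolist())`.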


## Data Output
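Finally, the DataFrame is saved to the `MOTION_TRACKING_FILES` folder in two formats, a pickle file (which restores the DataFrame exactly when loaded with pandas) and a CSV file, both named after the input video with a `_keypoints` suffix.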
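
Both files can be loaded back to confirm that nothing was lost. A sketch assuming pandas' `read_pickle`, `read_csv` and `pd.testing.assert_frame_equal`; it writes a small stand-in DataFrame (`sample_df`, using right-wrist values copied from the first rows of the output above) to a temporary folder instead of the notebook's output folder. The pickle round-trips exactly, while the CSV passes through text, so it is compared with a floating-point tolerance.

```python
import os
import tempfile

import pandas as pd

# Small stand-in for the notebook's keypoint DataFrame
sample_df = pd.DataFrame({
    "time_ms": [0.0, 33.366667, 66.733333],
    "right_wrist_x": [0.309411, 0.309902, 0.310517],
    "right_wrist_y": [0.596308, 0.598627, 0.600213],
})

with tempfile.TemporaryDirectory() as out_dir:
    pkl_path = os.path.join(out_dir, "example_keypoints.pkl")
    csv_path = os.path.join(out_dir, "example_keypoints.csv")
    sample_df.to_pickle(pkl_path)
    sample_df.to_csv(csv_path, index=False)

    from_pickle = pd.read_pickle(pkl_path)
    from_csv = pd.read_csv(csv_path)

# Pickle is exact; CSV is compared with a small relative tolerance
pd.testing.assert_frame_equal(from_pickle, sample_df)
pd.testing.assert_frame_equal(from_csv, sample_df, check_exact=False, rtol=1e-9)
print("round trip OK")
```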

```python
# Show the first five rows
print(f"DataFrame Head: {df.head()}")

# Output folder (created if missing) and base name taken from the video file
output_dir = "C:/Users/cosmo/Desktop/Random Scripts/Co-Speech Gesture Automation/Co-Speech-Gesture-Automation/MOTION_TRACKING_FILES"
os.makedirs(output_dir, exist_ok=True)
base_name = os.path.splitext(os.path.basename(video_path))[0]

# Save DataFrame as pickle file
pickle_file_name = f"{output_dir}/{base_name}_keypoints.pkl"
df.to_pickle(pickle_file_name)
print(f"DataFrame saved as {pickle_file_name}")

# Save DataFrame as CSV file
csv_file_name = f"{output_dir}/{base_name}_keypoints.csv"
df.to_csv(csv_file_name, index=False)
print(f"DataFrame saved as {csv_file_name}")
```

DataFrame Head:       time_ms  right_shoulder_x  right_shoulder_y  left_shoulder_x  \
0    0.000000          0.199463          0.340649         0.307089   
1   33.366667          0.199205          0.340690         0.307200   
2   66.733333          0.198797          0.340773         0.307264   
3  100.100000          0.198196          0.341093         0.307282   
4  133.466667          0.197653          0.341525         0.307296   

   left_shoulder_y  right_elbow_x  right_elbow_y  left_elbow_x  left_elbow_y  \
0         0.326427       0.209208       0.516104      0.317789      0.469892   
1         0.324761       0.209811       0.517547      0.317766      0.469894   
2         0.323920       0.210236       0.518136      0.317766      0.469977   
3         0.323479       0.210782       0.518799      0.317788      0.470244   
4         0.323301       0.211025       0.519491      0.317821      0.470756   

   right_wrist_x  right_wrist_y  left_wrist_x  left_wrist_y  right_eye_x  \
0     