# Multimodal Learning Analytics Assignments

As discussed in class, you will be given the features extracted from the raw recordings.  It is up to you to create any type of report that helps us better understand the collaboration.

Here is the data:

## OpenPose - Skeleton

The video was fed to the <a href="https://github.com/CMU-Perceptual-Computing-Lab/openpose">OpenPose</a> library to extract the skeleton (body landmarks) of the participants.

The data was originally in JSON, but it was converted into a CSV file for your convenience.  It contains only the first 1000 video frames (almost 67 seconds of video).

Let's load it:

In [5]:
import pandas as pd

skeletons = pd.read_csv("openPose.csv")
skeletons.columns

Index(['frame', 'Nose_x', 'Nose_y', 'Nose_c', 'Neck_x', 'Neck_y', 'Neck_c',
       'RShoulder_x', 'Shoulder_y', 'RShoulder_c', 'RElbow_x', 'RElbow_y',
       'RElbow_c', 'RWrist_x', 'RWrist_y', 'RWrist_c', 'LShoulder_x',
       'LShoulder_y', 'LShoulder_c', 'LElbow_x', 'LElbow_y', 'LElbow_c',
       'LWrist_x', 'LWrist_y', 'LWrist_c', 'MidHip_x', 'MidHip_y', 'MidHip_c',
       'RHip_x', 'RHip_y', 'RHip_c', 'RKnee_x', 'RKnee_y', 'RKnee_c',
       'RAnkle_x', 'RAnkle_y', 'RAnkle_c', 'LHip_x', 'LHip_y', 'LHip_c',
       'LKnee_x', 'LKnee_y', 'LKnee_c', 'LAnkle_x', 'LAnkle_y', 'LAnkle_c',
       'REye_x', 'REye_y', 'REye_c', 'LEye_x', 'LEye_y', 'LEye_c', 'REar_x',
       'REar_y', 'REar_c', 'LEar_x', 'LEar_y', 'LEar_c', 'LBigToe_x',
       'LBigToe_y', 'LBigToe_c', 'LSmallToe_x', 'LSmallToe_y', 'LSmallToe_c',
       'LHeel_x', 'LHeel_y', 'LHeel_c', 'RBigToe_x', 'RBigToe_y', 'RBigToe_c',
       'RSmallToe_x', 'RSmallToe_y', 'RSmallToe_c', 'RHeel_x', 'RHeel_y',
       'RHeel_c'],
      dtype

Each row in the dataframe contains the frame number (first column) followed by the position and confidence of each body landmark.  The landmarks can be read from the column names: the 'x' and 'y' suffixes are the coordinates of that landmark and 'c' is the confidence of the detection.
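As a minimal sketch of how the x, y and c columns can be used together, the helper below pulls out one landmark and blanks the coordinates whose confidence is low. The function name `landmark_xy`, the 0.5 threshold and the sample values are all illustrative assumptions, not part of the provided data.

```python
import numpy as np
import pandas as pd

def landmark_xy(df, name, min_conf=0.5):
    """Return frame, x, y for one landmark; low-confidence points become NaN."""
    out = df[["frame", f"{name}_x", f"{name}_y"]].copy()
    out.columns = ["frame", "x", "y"]
    # Rows where the detector was not confident enough (threshold is an assumption)
    low = df[f"{name}_c"] < min_conf
    out.loc[low, ["x", "y"]] = np.nan
    return out

# Made-up rows that follow the same column layout as openPose.csv
sample = pd.DataFrame({
    "frame": [1, 1, 2],
    "Neck_x": [350.0, 900.0, 0.0],
    "Neck_y": [420.0, 410.0, 0.0],
    "Neck_c": [0.91, 0.85, 0.0],
})
neck = landmark_xy(sample, "Neck")
print(neck)
```

With the real data you would call `landmark_xy(skeletons, "Neck")`. Note that the column listing shows `Shoulder_y` rather than `RShoulder_y`, so the right shoulder would need that column renamed before using a helper like this one.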

## OpenFace - Face Landmarks and Action Units

Next, the facial features were extracted from each frame using the <a href="https://github.com/TadasBaltrusaitis/OpenFace">OpenFace library</a>.

The format of the CSV is explained in this <a href="https://github.com/TadasBaltrusaitis/OpenFace/wiki/Output-Format">webpage</a>.  

In [7]:
faces = pd.read_csv('openFace.csv')
faces.head()

Unnamed: 0.1,Unnamed: 0,frame,face_id,timestamp,confidence,success,gaze_0_x,gaze_0_y,gaze_0_z,gaze_1_x,...,AU12_c,AU14_c,AU15_c,AU17_c,AU20_c,AU23_c,AU25_c,AU26_c,AU28_c,AU45_c
0,0,1,0,0.0,0.98,1,0.381562,0.025194,-0.924,0.287852,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0
1,1,1,1,0.0,0.88,1,-0.244583,-0.097918,-0.964672,-0.378948,...,0.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0
2,2,1,2,0.0,0.98,1,-0.278326,0.092222,-0.956049,-0.408839,...,0.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0
3,3,1,3,0.0,0.88,1,-0.509604,-0.072266,-0.857369,-0.600042,...,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0
4,4,2,0,0.067,0.98,1,0.354185,0.053495,-0.933644,0.262728,...,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0


The most important features extracted are:

* gaze_angle_x, gaze_angle_y:  eye gaze direction in radians in world coordinates, averaged over both eyes and converted into a format that is easier to use than gaze vectors.  Looking left-right changes gaze_angle_x (from positive to negative), and looking up-down changes gaze_angle_y (from negative to positive).  If a person is looking straight ahead, both angles will be close to 0 (within measurement error).
* pose_Tx, pose_Ty, pose_Tz:  the location of the head with respect to the camera in millimeters (positive Z is away from the camera).
* pose_Rx, pose_Ry, pose_Rz:  rotation in radians around the X, Y, Z axes with the convention R = Rx * Ry * Rz, left-handed positive sign.  This can be seen as pitch (Rx), yaw (Ry), and roll (Rz).  The rotation is in world coordinates with the camera being the origin.
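Since the gaze angles are given in radians, it may be easier to read them in degrees and to drop frames where the face was not tracked. The sketch below assumes the `success`, `confidence`, `gaze_angle_x` and `gaze_angle_y` columns described above; the function name `gaze_in_degrees`, the 0.8 threshold and the sample values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def gaze_in_degrees(df, min_conf=0.8):
    """Keep successfully tracked faces and express the gaze angles in degrees."""
    ok = df[(df["success"] == 1) & (df["confidence"] >= min_conf)].copy()
    ok["gaze_x_deg"] = np.degrees(ok["gaze_angle_x"])
    ok["gaze_y_deg"] = np.degrees(ok["gaze_angle_y"])
    return ok[["frame", "face_id", "gaze_x_deg", "gaze_y_deg"]]

# Made-up rows using a subset of the openFace.csv columns
sample = pd.DataFrame({
    "frame": [1, 1, 2],
    "face_id": [0, 1, 0],
    "success": [1, 1, 0],
    "confidence": [0.98, 0.88, 0.10],
    "gaze_angle_x": [np.pi / 6, -np.pi / 12, 0.0],
    "gaze_angle_y": [0.0, np.pi / 18, 0.0],
})
gaze = gaze_in_degrees(sample)
print(gaze)
```

With the real data the call would be `gaze_in_degrees(faces)`. The same conversion applies to pose_Rx, pose_Ry and pose_Rz, which are also in radians.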

## Direction of Arrival - DOA

Next, we get the direction of arrival of the sound from the audio.  This is an indicator of who is talking.

Each audio frame corresponds to 3 video frames, so a conversion is needed to synchronize the two signals.

Each line in the data corresponds to the angle from which the audio was coming.

In [11]:
doa = pd.read_csv('doa.csv')
doa.head()

Unnamed: 0,frame,doa
0,0,[62.]
1,1,[61.]
2,2,[59.]
3,3,[58.]
4,4,[58.]
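One possible way to do the conversion is sketched below: the `doa` values are stored as strings such as `[62.]`, so they are parsed into numbers first, and each audio frame is then repeated for the 3 video frames it spans. The helper names, the assumption that audio frame 0 lines up with video frame 1, and the assumption that each `doa` cell holds a single value are mine and should be checked against the recordings.

```python
import numpy as np
import pandas as pd

AUDIO_TO_VIDEO = 3  # from the text: one audio frame spans 3 video frames

def parse_doa(value):
    """Turn a string such as '[62.]' into the float 62.0 (single value assumed)."""
    return float(str(value).strip("[] "))

def doa_per_video_frame(doa_df, first_video_frame=1):
    """Repeat each DOA reading for every video frame that it covers."""
    out = doa_df[["frame", "doa"]].copy()
    out["doa"] = out["doa"].map(parse_doa)
    # Duplicate every audio row AUDIO_TO_VIDEO times, keeping the original order
    out = out.loc[out.index.repeat(AUDIO_TO_VIDEO)].reset_index(drop=True)
    offset = np.arange(len(out)) % AUDIO_TO_VIDEO
    out["video_frame"] = out["frame"] * AUDIO_TO_VIDEO + offset + first_video_frame
    return out[["video_frame", "doa"]]

# Two rows copied from the output above
sample = pd.DataFrame({"frame": [0, 1], "doa": ["[62.]", "[61.]"]})
synced = doa_per_video_frame(sample)
print(synced)
```

The resulting `video_frame` column can then be merged with the `frame` column of the skeleton or face data.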


## Words being said

We also have an automatic transcript of all the words being said.

In the data we have the following columns:

* type:  whether the entry is text or punctuation
* value:  the word or punctuation sign
* start:  time at which the word started (only for type text)
* end:  time at which the word ended (only for type text)
* confidence:  probability that the word is correct


In [10]:
words = pd.read_csv('words.csv')
words.head()

Unnamed: 0,type,value,start,end,confidence
0,text,Physical,0.24,0.72,1.0
1,punct,,0.0,0.0,0.0
2,text,assessments,0.72,1.32,0.95
3,punct,.,0.0,0.0,0.0
4,punct,,0.0,0.0,0.0
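To relate the transcript to the video features, the word times can be turned into video frame numbers. The sketch below keeps only the rows of type text and assumes a rate of 15 frames per second (inferred from 1000 frames lasting almost 67 seconds), times in seconds, and video frames numbered from 1. The function name `spoken_words` is hypothetical.

```python
import pandas as pd

FPS = 15  # assumption: 1000 frames in almost 67 seconds

def spoken_words(words_df):
    """Drop punctuation rows and add the video frames where each word starts and ends."""
    spoken = words_df[words_df["type"] == "text"].copy()
    # Truncate to whole frames; the +1 assumes the first video frame is numbered 1
    spoken["start_frame"] = (spoken["start"] * FPS).astype(int) + 1
    spoken["end_frame"] = (spoken["end"] * FPS).astype(int) + 1
    return spoken[["value", "start_frame", "end_frame", "confidence"]]

# Rows copied from the output above
sample = pd.DataFrame({
    "type": ["text", "punct", "text", "punct"],
    "value": ["Physical", None, "assessments", "."],
    "start": [0.24, 0.0, 0.72, 0.0],
    "end": [0.72, 0.0, 1.32, 0.0],
    "confidence": [1.0, 0.0, 0.95, 0.0],
})
timed = spoken_words(sample)
print(timed)
```

With the real data the call would be `spoken_words(words)`.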


## Useful functions to mix features

### Getting the person out of the skeleton position

In [14]:
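def identifyPerson(position_X):
    """Map the x coordinate of a skeleton to a participant label (A to E)."""
    posX = position_X
    if posX < 400:
        return "A"
    if posX < 800:
        return "B"
    if posX < 1200:
        return "C"
    if posX < 1500:
        return "D"
    if posX >= 1500:
        return "E"
    # Reached only when posX is NaN (every comparison above is False)
    return None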

In [None]:
for _, skeleton in skeletons.iterrows():
    # iterrows() yields (index, row) pairs, so the row is unpacked before indexing it
    #print("Frame:", skeleton["frame"], "- Position:", skeleton["Neck_x"], "- Identity:", identifyPerson(skeleton["Neck_x"]))
    print(skeleton["Neck_x"])
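Looping row by row works, but for a whole column it may be simpler to label all positions at once. The sketch below uses `pd.cut` with the same boundaries as identifyPerson (400, 800, 1200, 1500); the names `BIN_EDGES`, `LABELS` and `identify_people` are illustrative and not part of the assignment code.

```python
import numpy as np
import pandas as pd

# Same boundaries as identifyPerson; each bin includes its left edge only
BIN_EDGES = [-np.inf, 400, 800, 1200, 1500, np.inf]
LABELS = ["A", "B", "C", "D", "E"]

def identify_people(neck_x):
    """Label every x coordinate in a Series with a participant letter."""
    return pd.cut(neck_x, bins=BIN_EDGES, labels=LABELS, right=False)

# Made-up positions, including the bin boundaries and a missing value
sample = pd.Series([350.0, 400.0, 1199.0, 1500.0, np.nan])
people = identify_people(sample)
print(people)
```

With the real data this could be stored as a new column, for example `skeletons["person"] = identify_people(skeletons["Neck_x"])`. Missing positions stay unlabeled rather than being assigned to a participant.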