# Multimodal Learning Analytics Assignments

As discussed in class, you will be given the features extracted from the raw recordings.  It is up to you to create any type of report that helps us better understand the collaboration.

Here is the data:

## OpenPose - Skeleton

The video was fed to the <a href="https://github.com/CMU-Perceptual-Computing-Lab/openpose">OpenPose</a> library to extract the skeleton (body landmarks) of the participants.

The data was originally in JSON, but it was converted into a CSV file for your convenience.  It only contains the first 1000 video frames (almost 67 seconds of the video).

Let's load it:

In [33]:
import pandas as pd

skeletons= pd.read_csv("openPose.csv")
skeletons.columns

Index(['frame', 'Nose_x', 'Nose_y', 'Nose_c', 'Neck_x', 'Neck_y', 'Neck_c',
       'RShoulder_x', 'RShoulder_y', 'RShoulder_c', 'RElbow_x', 'RElbow_y',
       'RElbow_c', 'RWrist_x', 'RWrist_y', 'RWrist_c', 'LShoulder_x',
       'LShoulder_y', 'LShoulder_c', 'LElbow_x', 'LElbow_y', 'LElbow_c',
       'LWrist_x', 'LWrist_y', 'LWrist_c', 'MidHip_x', 'MidHip_y', 'MidHip_c',
       'RHip_x', 'RHip_y', 'RHip_c', 'RKnee_x', 'RKnee_y', 'RKnee_c',
       'RAnkle_x', 'RAnkle_y', 'RAnkle_c', 'LHip_x', 'LHip_y', 'LHip_c',
       'LKnee_x', 'LKnee_y', 'LKnee_c', 'LAnkle_x', 'LAnkle_y', 'LAnkle_c',
       'REye_x', 'REye_y', 'REye_c', 'LEye_x', 'LEye_y', 'LEye_c', 'REar_x',
       'REar_y', 'REar_c', 'LEar_x', 'LEar_y', 'LEar_c', 'LBigToe_x',
       'LBigToe_y', 'LBigToe_c', 'LSmallToe_x', 'LSmallToe_y', 'LSmallToe_c',
       'LHeel_x', 'LHeel_y', 'LHeel_c', 'RBigToe_x', 'RBigToe_y', 'RBigToe_c',
       'RSmallToe_x', 'RSmallToe_y', 'RSmallToe_c', 'RHeel_x', 'RHeel_y',
       'RHeel_c'],
      dtyp

Each row in the dataframe contains the frame number (first column) followed by the position and confidence of each of the body landmarks.  These body landmarks can be seen in the column names.  'x' and 'y' are the coordinates of that landmark and 'c' is the confidence of the detection.
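
As an illustration of how the `_x`, `_y` and `_c` triplets can be used, the sketch below reads one landmark from a row and treats low-confidence detections as missing. The helper name `getLandmark` and the 0.1 threshold are assumptions made for this example, not part of the assignment. In the sample rows shown in this notebook, undetected landmarks appear as 0.0 for all three values.

```python
def getLandmark(row, name, minConfidence=0.1):
    """Return (x, y) for a landmark, or None if its confidence is too low.

    `row` is any mapping from column name to value, for example a row of
    the skeletons dataframe or a plain dict.
    """
    if row[name + "_c"] < minConfidence:
        return None
    return (row[name + "_x"], row[name + "_y"])


# Values copied from the first sample row of the skeleton data.
row = {"Nose_x": 1380.12, "Nose_y": 237.946, "Nose_c": 0.842359,
       "RHeel_x": 0.0, "RHeel_y": 0.0, "RHeel_c": 0.0}

print(getLandmark(row, "Nose"))   # (1380.12, 237.946)
print(getLandmark(row, "RHeel"))  # None
```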

In [6]:
skeletons.columns

Index(['frame', 'Nose_x', 'Nose_y', 'Nose_c', 'Neck_x', 'Neck_y', 'Neck_c',
       'RShoulder_x', 'RShoulder_y', 'RShoulder_c', 'RElbow_x', 'RElbow_y',
       'RElbow_c', 'RWrist_x', 'RWrist_y', 'RWrist_c', 'LShoulder_x',
       'LShoulder_y', 'LShoulder_c', 'LElbow_x', 'LElbow_y', 'LElbow_c',
       'LWrist_x', 'LWrist_y', 'LWrist_c', 'MidHip_x', 'MidHip_y', 'MidHip_c',
       'RHip_x', 'RHip_y', 'RHip_c', 'RKnee_x', 'RKnee_y', 'RKnee_c',
       'RAnkle_x', 'RAnkle_y', 'RAnkle_c', 'LHip_x', 'LHip_y', 'LHip_c',
       'LKnee_x', 'LKnee_y', 'LKnee_c', 'LAnkle_x', 'LAnkle_y', 'LAnkle_c',
       'REye_x', 'REye_y', 'REye_c', 'LEye_x', 'LEye_y', 'LEye_c', 'REar_x',
       'REar_y', 'REar_c', 'LEar_x', 'LEar_y', 'LEar_c', 'LBigToe_x',
       'LBigToe_y', 'LBigToe_c', 'LSmallToe_x', 'LSmallToe_y', 'LSmallToe_c',
       'LHeel_x', 'LHeel_y', 'LHeel_c', 'RBigToe_x', 'RBigToe_y', 'RBigToe_c',
       'RSmallToe_x', 'RSmallToe_y', 'RSmallToe_c', 'RHeel_x', 'RHeel_y',
       'RHeel_c'],
      dtype

## OpenFace - Face Landmarks and Action Units

Then we extract the facial features from each frame using the <a href="https://github.com/TadasBaltrusaitis/OpenFace">OpenFace library</a>. 

The format of the CSV is explained in this <a href="https://github.com/TadasBaltrusaitis/OpenFace/wiki/Output-Format">webpage</a>.  

In [7]:
faces = pd.read_csv('openFace.csv')
faces.head()

Unnamed: 0.1,Unnamed: 0,frame,face_id,timestamp,confidence,success,gaze_0_x,gaze_0_y,gaze_0_z,gaze_1_x,...,AU12_c,AU14_c,AU15_c,AU17_c,AU20_c,AU23_c,AU25_c,AU26_c,AU28_c,AU45_c
0,0,1,0,0.0,0.98,1,0.381562,0.025194,-0.924,0.287852,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0
1,1,1,1,0.0,0.88,1,-0.244583,-0.097918,-0.964672,-0.378948,...,0.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0
2,2,1,2,0.0,0.98,1,-0.278326,0.092222,-0.956049,-0.408839,...,0.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0
3,3,1,3,0.0,0.88,1,-0.509604,-0.072266,-0.857369,-0.600042,...,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0
4,4,2,0,0.067,0.98,1,0.354185,0.053495,-0.933644,0.262728,...,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0


The most important features extracted are:

* gaze_angle_x, gaze_angle_y: eye gaze direction in radians in world coordinates, averaged over both eyes and converted into a format that is easier to use than gaze vectors. If a person looks left-right, gaze_angle_x changes (from positive to negative); if a person looks up-down, gaze_angle_y changes (from negative to positive); if a person looks straight ahead, both angles are close to 0 (within measurement error).
* pose_Tx, pose_Ty, pose_Tz: the location of the head with respect to the camera in millimeters (positive Z is away from the camera).
* pose_Rx, pose_Ry, pose_Rz: rotation in radians around the X, Y and Z axes with the convention R = Rx * Ry * Rz, left-handed positive sign. This can be seen as pitch (Rx), yaw (Ry), and roll (Rz). The rotation is in world coordinates with the camera being the origin.
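
Since the angles are given in radians and the head position in millimeters, a small conversion step often helps before plotting. The following is a minimal sketch; the function names and the sample values are made up for illustration and are not taken from openFace.csv.

```python
import math

def gazeAngleToDegrees(gazeAngle):
    """Convert gaze_angle_x or gaze_angle_y from radians to degrees."""
    return math.degrees(gazeAngle)

def headDistanceMeters(poseTx, poseTy, poseTz):
    """Straight-line distance from the camera to the head, in meters.

    pose_Tx, pose_Ty and pose_Tz are expected in millimeters.
    """
    return math.sqrt(poseTx**2 + poseTy**2 + poseTz**2) / 1000.0

# Hypothetical values, chosen only to make the arithmetic easy to follow.
print(gazeAngleToDegrees(math.pi / 6))            # about 30 degrees
print(headDistanceMeters(300.0, 400.0, 1200.0))   # 1.3
```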

## Direction of Arrival - DOA

Next, we get the direction of arrival of the sound from the audio.  This is an indicator of who is talking.

Each audio frame is equal to 3 video frames.  A conversion needs to be done to synchronize the two signals.

Each line in the data corresponds to the angle from which the audio was coming.
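
One possible way to do the conversion is an integer division of the video frame number by 3. The sketch below assumes that both recordings start at the same instant, and that video frames are numbered from 1 and audio frames from 0, as the sample rows in this notebook suggest. The function name and the constant are hypothetical.

```python
VIDEO_FRAMES_PER_AUDIO_FRAME = 3

def videoFrameToAudioFrame(videoFrame, firstVideoFrame=1):
    """Return the index of the audio frame that covers a given video frame."""
    return (videoFrame - firstVideoFrame) // VIDEO_FRAMES_PER_AUDIO_FRAME

# Video frames 1 to 3 fall into audio frame 0, frames 4 to 6 into audio frame 1.
for videoFrame in [1, 2, 3, 4, 6, 7]:
    print(videoFrame, "->", videoFrameToAudioFrame(videoFrame))
```

The resulting index can be stored in a new column of the video features and used as the key to join them with the `frame` column of the DOA data.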

In [11]:
doa = pd.read_csv('doa.csv')
doa.head()

Unnamed: 0,frame,doa
0,0,[62.]
1,1,[61.]
2,2,[59.]
3,3,[58.]
4,4,[58.]


## Words being said

We also have an automatic transcript of all the words being said.

In the data we have the following columns:

* type:  if it is text or punctuation
* value:  word or punctuation sign
* start: time at which the word started (only for type text)
* end:  time at which the word ended (only for type text)
* confidence: probability that the word is correct
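
Because punctuation rows carry no timing information, most analyses will want to look at the text rows only. The sketch below does this on a list of dicts that mirrors the columns above and adds the duration of each word. The helper name and the confidence threshold are assumptions for illustration.

```python
def spokenWords(rows, minConfidence=0.5):
    """Keep rows of type 'text' whose confidence reaches the threshold
    and add the duration of each word in seconds."""
    result = []
    for row in rows:
        if row["type"] == "text" and row["confidence"] >= minConfidence:
            result.append({**row, "duration": row["end"] - row["start"]})
    return result

# Rows mirroring the first entries of words.csv.
rows = [
    {"type": "text", "value": "Physical", "start": 0.24, "end": 0.72, "confidence": 1.0},
    {"type": "punct", "value": "", "start": 0.0, "end": 0.0, "confidence": 0.0},
    {"type": "text", "value": "assessments", "start": 0.72, "end": 1.32, "confidence": 0.95},
]

for word in spokenWords(rows):
    print(word["value"], round(word["duration"], 2))
```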


In [10]:
words = pd.read_csv('words.csv')
words.head()

Unnamed: 0,type,value,start,end,confidence
0,text,Physical,0.24,0.72,1.0
1,punct,,0.0,0.0,0.0
2,text,assessments,0.72,1.32,0.95
3,punct,.,0.0,0.0,0.0
4,punct,,0.0,0.0,0.0


## Useful functions to mix features

### Getting the person out of the skeleton position

Based on the horizontal position of the center of a person (for example, their neck), we classify each skeleton as person A, B, C, D or E, corresponding to the 5 people present in the video.

In [27]:
def identifyPerson(position_X):
    posX=position_X
    if posX<400:
        return "A"
    if posX>399 and posX<800:
        return "B"
    if posX>799 and posX<1200:
        return "C"
    if posX>1199 and posX<1500:
        return "D"
    if posX>1499:
        return "E"

Example: 

In [34]:
skeletons["Person"]=skeletons.apply(lambda x: identifyPerson(x["Neck_x"]),axis=1)
skeletons.head()

Unnamed: 0,frame,Nose_x,Nose_y,Nose_c,Neck_x,Neck_y,Neck_c,RShoulder_x,RShoulder_y,RShoulder_c,...,RBigToe_x,RBigToe_y,RBigToe_c,RSmallToe_x,RSmallToe_y,RSmallToe_c,RHeel_x,RHeel_y,RHeel_c,Person
0,1,1380.12,237.946,0.842359,1378.95,320.47,0.628099,1324.67,321.617,0.474629,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,D
1,1,636.008,222.131,0.818794,662.023,322.781,0.326987,573.81,328.43,0.131443,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,B
2,1,1658.23,240.215,0.60757,1657.12,288.842,0.491983,1602.81,291.126,0.446046,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,E
3,2,1380.12,237.972,0.868445,1377.8,321.625,0.675298,1324.65,321.663,0.54846,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,D
4,2,1098.49,249.242,0.841802,1134.67,320.484,0.661226,1073.62,325.023,0.497669,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,C


## Eliminate background people

We use the distance between the shoulders to eliminate people who appear small in the background.

In [37]:
import math

def isInterestingPerson(RShoulder_x,RShoulder_y,LShoulder_x,LShoulder_y):
    RSX=RShoulder_x
    RSY=RShoulder_y
    LSX=LShoulder_x
    LSY=LShoulder_y
    distShoulders=math.sqrt(math.pow(RSX - LSX, 2) + math.pow(RSY - LSY, 2))
    if distShoulders>60:
        return True
    else:
        return False

In [38]:
skeletons["Interesting"]=skeletons.apply(lambda x: isInterestingPerson(x["RShoulder_x"],x["RShoulder_y"],x["LShoulder_x"],x["LShoulder_y"]),axis=1)
skeletons.head()

Unnamed: 0,frame,Nose_x,Nose_y,Nose_c,Neck_x,Neck_y,Neck_c,RShoulder_x,RShoulder_y,RShoulder_c,...,RBigToe_y,RBigToe_c,RSmallToe_x,RSmallToe_y,RSmallToe_c,RHeel_x,RHeel_y,RHeel_c,Person,Interesting
0,1,1380.12,237.946,0.842359,1378.95,320.47,0.628099,1324.67,321.617,0.474629,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,D,True
1,1,636.008,222.131,0.818794,662.023,322.781,0.326987,573.81,328.43,0.131443,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,B,True
2,1,1658.23,240.215,0.60757,1657.12,288.842,0.491983,1602.81,291.126,0.446046,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,E,True
3,2,1380.12,237.972,0.868445,1377.8,321.625,0.675298,1324.65,321.663,0.54846,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,D,True
4,2,1098.49,249.242,0.841802,1134.67,320.484,0.661226,1073.62,325.023,0.497669,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,C,True


## Conversion from position to angle 

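To convert the horizontal position of a person into the angle at which they are located, we use the function below. It maps the x coordinate linearly onto 0 to 360 degrees by dividing by 1935, which is presumably the width of the video frame in pixels: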


In [39]:
def anglePerson(posX):
    return math.degrees((posX/1935)*(2*math.pi))

In [40]:
skeletons["Angle"]=skeletons.apply(lambda x: anglePerson(x["Neck_x"]),axis=1)
skeletons.head()

Unnamed: 0,frame,Nose_x,Nose_y,Nose_c,Neck_x,Neck_y,Neck_c,RShoulder_x,RShoulder_y,RShoulder_c,...,RBigToe_c,RSmallToe_x,RSmallToe_y,RSmallToe_c,RHeel_x,RHeel_y,RHeel_c,Person,Interesting,Angle
0,1,1380.12,237.946,0.842359,1378.95,320.47,0.628099,1324.67,321.617,0.474629,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,D,True,256.548837
1,1,636.008,222.131,0.818794,662.023,322.781,0.326987,573.81,328.43,0.131443,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,B,True,123.16707
2,1,1658.23,240.215,0.60757,1657.12,288.842,0.491983,1602.81,291.126,0.446046,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,E,True,308.301395
3,2,1380.12,237.972,0.868445,1377.8,321.625,0.675298,1324.65,321.663,0.54846,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,D,True,256.334884
4,2,1098.49,249.242,0.841802,1134.67,320.484,0.661226,1073.62,325.023,0.497669,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,C,True,211.101395


## Rough gaze direction from skeleton

In [47]:
def gaze(person):
    NoX=person["Nose_x"]
    NoY=person["Nose_y"]
    NeX=person["Neck_x"]
    NeY=person["Neck_y"]
    RSX=person["RShoulder_x"]
    RSY=person["RShoulder_y"]
    LSX=person["LShoulder_x"]
    LSY=person["LShoulder_y"]
    distShoulders=math.sqrt(math.pow(RSX - LSX, 2) + math.pow(RSY - LSY, 2))
    distanceNN=math.sqrt(math.pow(NeX - NoX, 2) + math.pow(NeY - NoY, 2))
    distanceX=NoX-NeX
    if distShoulders>0:
        ratio=distanceX/(distShoulders/2)
        if ratio<-1:
            ratio=-1
        if ratio>1:
            ratio=1
        return math.degrees(math.asin(ratio))
    else:
        return None
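
Read as a formula, the function above estimates the horizontal head turn from how far the nose is displaced from the neck, relative to half the shoulder width. The notation is introduced here only for illustration: $x_{\text{nose}}$ and $x_{\text{neck}}$ are the x coordinates of the two landmarks and $d_{\text{shoulders}}$ is the distance between the shoulders.

$$
\text{gaze} = \arcsin\left(\max\left(-1,\ \min\left(1,\ \frac{x_{\text{nose}} - x_{\text{neck}}}{d_{\text{shoulders}} / 2}\right)\right)\right)
$$

The result is converted to degrees, so it ranges from -90 to 90, and a value of 0 means that the nose is horizontally aligned with the neck. When the shoulder distance is 0, no estimate is returned.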

In [48]:
skeletons["Gaze"]=skeletons.apply(lambda x: gaze(x),axis=1)
skeletons.head()

Unnamed: 0,frame,Nose_x,Nose_y,Nose_c,Neck_x,Neck_y,Neck_c,RShoulder_x,RShoulder_y,RShoulder_c,...,RSmallToe_x,RSmallToe_y,RSmallToe_c,RHeel_x,RHeel_y,RHeel_c,Person,Interesting,Angle,Gaze
0,1,1380.12,237.946,0.842359,1378.95,320.47,0.628099,1324.67,321.617,0.474629,...,0.0,0.0,0.0,0.0,0.0,0.0,D,True,256.548837,1.220548
1,1,636.008,222.131,0.818794,662.023,322.781,0.326987,573.81,328.43,0.131443,...,0.0,0.0,0.0,0.0,0.0,0.0,B,True,123.16707,-17.614787
2,1,1658.23,240.215,0.60757,1657.12,288.842,0.491983,1602.81,291.126,0.446046,...,0.0,0.0,0.0,0.0,0.0,0.0,E,True,308.301395,1.208299
3,2,1380.12,237.972,0.868445,1377.8,321.625,0.675298,1324.65,321.663,0.54846,...,0.0,0.0,0.0,0.0,0.0,0.0,D,True,256.334884,2.448429
4,2,1098.49,249.242,0.841802,1134.67,320.484,0.661226,1073.62,325.023,0.497669,...,0.0,0.0,0.0,0.0,0.0,0.0,C,True,211.101395,-36.121693


## Person that each person is looking at

In [49]:
skeletons["LookingAt"]=(skeletons["Angle"]+skeletons["Gaze"]+180) % 360
skeletons.head()

Unnamed: 0,frame,Nose_x,Nose_y,Nose_c,Neck_x,Neck_y,Neck_c,RShoulder_x,RShoulder_y,RShoulder_c,...,RSmallToe_y,RSmallToe_c,RHeel_x,RHeel_y,RHeel_c,Person,Interesting,Angle,Gaze,LookingAt
0,1,1380.12,237.946,0.842359,1378.95,320.47,0.628099,1324.67,321.617,0.474629,...,0.0,0.0,0.0,0.0,0.0,D,True,256.548837,1.220548,77.769385
1,1,636.008,222.131,0.818794,662.023,322.781,0.326987,573.81,328.43,0.131443,...,0.0,0.0,0.0,0.0,0.0,B,True,123.16707,-17.614787,285.552283
2,1,1658.23,240.215,0.60757,1657.12,288.842,0.491983,1602.81,291.126,0.446046,...,0.0,0.0,0.0,0.0,0.0,E,True,308.301395,1.208299,129.509695
3,2,1380.12,237.972,0.868445,1377.8,321.625,0.675298,1324.65,321.663,0.54846,...,0.0,0.0,0.0,0.0,0.0,D,True,256.334884,2.448429,78.783313
4,2,1098.49,249.242,0.841802,1134.67,320.484,0.661226,1073.62,325.023,0.497669,...,0.0,0.0,0.0,0.0,0.0,C,True,211.101395,-36.121693,354.979702
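
The LookingAt value combines where the person is located (Angle) with where their head points (Gaze), adds 180 degrees and wraps the result into the range 0 to 360. The sketch below reproduces the first row of the table with plain Python; the function name is made up for illustration.

```python
def lookingAtAngle(angle, gaze):
    """Angle (in degrees, 0 to 360) toward which a person is looking."""
    return (angle + gaze + 180) % 360

# Angle and Gaze taken from the first row of the table above (person D).
print(round(lookingAtAngle(256.548837, 1.220548), 6))  # 77.769385
```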


In [64]:
import statistics
print(statistics.mean(skeletons[skeletons["Person"]=="E"]["Angle"]))

304.0404416801875


In [67]:
def degreesToPerson(degrees):
    if degrees<50:
        return "A"
    if degrees>49 and degrees<160:
        return "B"
    if degrees>159 and degrees<230:
        return "C"
    if degrees>229 and degrees<280:
        return "D"
    if degrees>279:
        return "E"
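
The same lookup can also be written as a table of boundaries, which makes the thresholds easier to adjust. This is only an alternative sketch using the standard `bisect` module; the names `BOUNDARIES`, `PERSONS` and `degreesToPersonTable` are hypothetical. Missing values (NaN) are not handled here and should be filtered out beforehand.

```python
import bisect

# Boundaries (in degrees) that separate persons A, B, C, D and E.
BOUNDARIES = [50, 160, 230, 280]
PERSONS = ["A", "B", "C", "D", "E"]

def degreesToPersonTable(degrees):
    """Return the person whose angular sector contains `degrees`."""
    return PERSONS[bisect.bisect_right(BOUNDARIES, degrees)]

print(degreesToPersonTable(77.769385))   # B
print(degreesToPersonTable(354.979702))  # E
```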

In [68]:
skeletons["LookingAtPerson"]=skeletons.apply(lambda x: degreesToPerson(x['LookingAt']),axis=1)
skeletons.head()

Unnamed: 0,frame,Nose_x,Nose_y,Nose_c,Neck_x,Neck_y,Neck_c,RShoulder_x,RShoulder_y,RShoulder_c,...,RSmallToe_c,RHeel_x,RHeel_y,RHeel_c,Person,Interesting,Angle,Gaze,LookingAt,LookingAtPerson
0,1,1380.12,237.946,0.842359,1378.95,320.47,0.628099,1324.67,321.617,0.474629,...,0.0,0.0,0.0,0.0,D,True,256.548837,1.220548,77.769385,B
1,1,636.008,222.131,0.818794,662.023,322.781,0.326987,573.81,328.43,0.131443,...,0.0,0.0,0.0,0.0,B,True,123.16707,-17.614787,285.552283,E
2,1,1658.23,240.215,0.60757,1657.12,288.842,0.491983,1602.81,291.126,0.446046,...,0.0,0.0,0.0,0.0,E,True,308.301395,1.208299,129.509695,B
3,2,1380.12,237.972,0.868445,1377.8,321.625,0.675298,1324.65,321.663,0.54846,...,0.0,0.0,0.0,0.0,D,True,256.334884,2.448429,78.783313,B
4,2,1098.49,249.242,0.841802,1134.67,320.484,0.661226,1073.62,325.023,0.497669,...,0.0,0.0,0.0,0.0,C,True,211.101395,-36.121693,354.979702,E


In [71]:
pd.crosstab(skeletons["Person"],skeletons["LookingAtPerson"])

LookingAtPerson,A,B,C,D,E
Person,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
A,0,60,887,29,6
B,1,0,117,131,744
C,394,7,0,0,598
D,22,980,0,0,0
E,0,910,107,15,0
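
Raw counts are hard to compare when people are detected in a different number of frames, so it can help to turn each row into proportions. The sketch below does this for the counts printed above, using plain Python; the function name is an assumption.

```python
def rowProportions(counts):
    """Divide every count in a row by the row total."""
    total = sum(counts.values())
    return {target: count / total for target, count in counts.items()}

# Counts copied from the row for person A in the table above.
personA = {"A": 0, "B": 60, "C": 887, "D": 29, "E": 6}

for target, share in rowProportions(personA).items():
    print(target, round(share, 3))
```

With pandas, passing `normalize="index"` to `pd.crosstab` produces the same row-wise proportions directly.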


## Words per person

In [72]:
def timeToAudioFrame(time):
    frame=round(time*5)
    return frame

In [75]:
words["startFrame"]=words.apply(lambda x: timeToAudioFrame(x["start"]),axis=1)
words["endFrame"]=words.apply(lambda x: timeToAudioFrame(x["end"]),axis=1)
words.head()

Unnamed: 0,type,value,start,end,confidence,startFrame,endFrame
0,text,Physical,0.24,0.72,1.0,1,4
1,punct,,0.0,0.0,0.0,0,0
2,text,assessments,0.72,1.32,0.95,4,7
3,punct,.,0.0,0.0,0.0,0,0
4,punct,,0.0,0.0,0.0,0,0


In [110]:
import statistics
def framesToAngle(start,end):
    angles=doa[start:end+1]
    addition=0
    count=0
    for angle in angles["doa"]:
        count=count+1
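        # Values are stored as strings such as "[62.]"; strip the brackets before parsing.
        # Parsing as float keeps any decimals instead of merging them into the integer part.
        doaValue=float(angle.strip('[]'))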
        addition=addition+doaValue
    return addition/count
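
One caveat of a plain average is that angles wrap around: the mean of 350 and 10 degrees comes out as 180, although both values point in nearly the same direction. If the DOA values within a word ever straddle that boundary, a circular mean may be more appropriate. The following is a sketch of that alternative, not what this notebook uses; the function name is hypothetical.

```python
import math

def circularMeanDegrees(angles):
    """Mean direction of a list of angles in degrees, in the range 0 to 360."""
    # Average the unit vectors rather than the raw angle values.
    sinSum = sum(math.sin(math.radians(a)) for a in angles)
    cosSum = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(sinSum, cosSum)) % 360

# Away from the wrap-around the result matches the plain mean.
print(round(circularMeanDegrees([62, 61, 59, 58]), 2))
# Near the wrap-around the result stays close to 0 (or, equivalently, 360).
print(round(circularMeanDegrees([350, 10]), 2) % 360)
```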

In [120]:
words["Person"]=words.apply(lambda x: degreesToPerson(framesToAngle(x["startFrame"],x["endFrame"])),axis=1)
words.head(20)

Unnamed: 0,type,value,start,end,confidence,startFrame,endFrame,Person
0,text,Physical,0.24,0.72,1.0,1,4,B
1,punct,,0.0,0.0,0.0,0,0,B
2,text,assessments,0.72,1.32,0.95,4,7,C
3,punct,.,0.0,0.0,0.0,0,0,B
4,punct,,0.0,0.0,0.0,0,0,B
5,text,Okay,2.34,2.73,0.99,12,14,B
6,punct,.,0.0,0.0,0.0,0,0,B
7,punct,,0.0,0.0,0.0,0,0,B
8,text,So,2.82,2.94,1.0,14,15,B
9,punct,,0.0,0.0,0.0,0,0,B


## Sentences per Person

In [125]:
sentence=""
persons=[]
for index, utterance in words.iterrows():
    if utterance["type"]=="text":
        sentence=sentence+utterance["value"]+" "
        persons.append(utterance["Person"])
    else:
        # A full stop or a question mark closes the current sentence.
        if utterance["value"]=="." or utterance["value"]=="?":
            sentence=sentence+utterance["value"]
            # Attribute the sentence to the person assigned to most of its words.
            person=max(set(persons), key=persons.count)
            print("Person: ",person,"said",sentence)
            sentence=""
            persons=[]
    

Person:  B said Physical assessments .
Person:  B said Okay .
Person:  B said So what did we say ?
Person:  B said High school teachers ?
Person:  B said Yeah it sounded like high school cause it was like Oh there's some college thing when there wasn't it college readiness or something .
Person:  D said But that's still K-12 K-12 probably high school kind of thing .
Person:  C said Great .
Person:  B said Was that all good ?
Person:  B said I said credits plus all the basic to complicate it was so bad because all the pages were kind of consistent .
Person:  B said So you would have a tab and then the it within the time .
Person:  E said Yeah .
Person:  D said But it would be very difficult to find something if you want to do that .
Person:  D said How would you actually know where that information was in those times ?
Person:  B said I thought it was very fair .
Person:  C said Yeah .
Person:  C said And then also there's like you can remove columns or add columns but does the teacher 

Person:  B said You have to get the full picture .
Person:  C said It's like an extra layer .
Person:  C said They'd have to get that from some kind of yeah of course .
Person:  E said Feedback I guess really just to analyze it is part of it .
Person:  E said And then if you see the students flipping then I feel that's more of an emotional or you know somebody needing to address this to themselves and I don't think that could really be done digital necessarily .
Person:  D said So like the first step what I put you to see someone in great .
Person:  D said So what would you do ?
Person:  C said Two one in red assuming they would go ahead and talk worth they log in to that .
Person:  C said It depends on what red news .
Person:  C said Yeah .
Person:  C said Like you said there's no yellow like no like there's no yellow .
Person:  C said It's very fine .
Person:  D said who is an interesting um how could you improve it ?
Person:  B said Yeah .
Person:  E said Yeah .
Person:  B said So l