In [91]:
# Data exploration

# MuSe-Stress and MuSe-Physio dataset

In the Multimodal Emotional Stress sub-challenge (MuSe-Stress), valence and arousal are predicted for people in stressed dispositions, motivated by the high level of stress many people face in modern societies.

Given the increasing availability of low-resource equipment (e.g., smart-watches) able to record biological signals to track wellbeing, we propose the Multimodal Physiological-Arousal sub-challenge (MuSe-Physio).

The arousal annotations from humans are fused (using RAAW) with galvanic skin response signals, also known as electrodermal activity (EDA), for predicting physiological arousal.

Both are set up as regression tasks, offering additional biological signals (e.g., heart rate and respiration) for modelling.


## Feature segments

- Acoustic
    - eGeMAPS 
        - The widely used open-source openSMILE toolkit is used to extract the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS).
    - DeepSpectrum
        - DeepSpectrum extracts features by passing spectral representations of speech instances through pre-trained image-recognition convolutional neural networks (CNNs).
    - VGGish
        - In addition, we extract VGGish features from a model pre-trained on an extensive YouTube audio dataset (AudioSet).

- Vision
    - VGGFace
        - VGGFace (version 1) is aimed at extracting general facial features from face images cropped by MTCNN.
        - The Visual Geometry Group at Oxford introduced the deep CNN referred to as VGG16.
    - OpenFace (fau_intensity)
        - OpenFace is an open-source toolkit for facial behavior analysis (facial landmarks, head pose, eye gaze, and facial action units).
        - The OpenFace toolkit is used to extract facial features from images.
        - FAU (Facial Action Unit) intensity measures how strongly each facial action unit is activated; the features are the `AU.._r` columns.
    - Xception
        - Xception is a deep CNN architecture; the model used here is pre-trained on the ImageNet dataset.
        - The Xception model is used to extract features from images.
- Language
    - BERT
        - BERT (Bidirectional Encoder Representations from Transformers) is a transformer-based deep learning model.
        - The BERT model is used to extract features from text.
        - Our features are the sum of the last four BERT layers, resulting in a 768-dimensional feature vector.

- Physiological
    - BPM
        - The BPM (Beats Per Minute) is a measure of the heart rate.
    - ECG
        - The ECG (Electrocardiogram) is a measure of the electrical activity of the heart.
    - resp
        - The respiration rate is a measure of the number of breaths per minute.
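
The BERT bullet above describes summing the last four layers into a 768-dimensional vector. The sketch below shows only that summing step on random stand-in data. The array `hidden_states`, its layer count of 13 (embeddings plus 12 encoder layers, as in BERT-base), and the helper name `sum_last_four_layers` are assumptions for illustration, not the extraction code used for the dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for per-layer BERT outputs: (layers, tokens, hidden size).
# Real values would come from a BERT model; these are random numbers.
num_layers, num_tokens, hidden_size = 13, 5, 768
hidden_states = rng.normal(size=(num_layers, num_tokens, hidden_size))

def sum_last_four_layers(hidden_states):
    """Sum the last four layers along the layer axis, one vector per token."""
    return hidden_states[-4:].sum(axis=0)

token_features = sum_last_four_layers(hidden_states)
print(token_features.shape)  # (5, 768)
```

Each token keeps a 768-dimensional vector because the sum runs over layers, not over the hidden dimension.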


In [92]:
'''
In the first assignment we will use the physiological data (BPM, ECG and resp)
'''

'\nIn the first assignment we will use the physiological data (BPM, ECG and resp)\n'

In [93]:
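import pandas as pd
import plotly.graph_objects as go

subject = 1   # subject Id, also the CSV file name
window = 100  # moving-average window, in samples

# Load the heart-rate (BPM) features for this subject
df_bpm = pd.read_csv(f'c3_muse_stress/feature_segments/BPM/{subject}.csv')

# Timestamps are stored in milliseconds; convert them to datetimes for plotting
df_bpm['timestamp'] = pd.to_datetime(df_bpm['timestamp'], unit='ms')
# Moving average; the first window - 1 values are NaN
df_bpm['BPM_MA'] = df_bpm['BPM'].rolling(window=window).mean()

# Plot the raw signal together with its moving average
fig = go.Figure()
fig.add_trace(go.Scatter(x=df_bpm['timestamp'], y=df_bpm['BPM'], mode='lines', name='BPM'))
fig.add_trace(go.Scatter(x=df_bpm['timestamp'], y=df_bpm['BPM_MA'], mode='lines+markers', name='BPM_MA'))
fig.show()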

In [94]:
df_ecg = pd.read_csv(f'c3_muse_stress/feature_segments/ECG/{subject}.csv')
df_ecg['timestamp'] = pd.to_datetime(df_ecg['timestamp'], unit='ms')

df_ecg['ECG_MA'] = df_ecg['ECG'].rolling(window=window).mean()

fig = go.Figure()
fig.add_trace(go.Scatter(x=df_ecg['timestamp'], y=df_ecg['ECG'], mode='lines', name='ECG'))
fig.add_trace(go.Scatter(x=df_ecg['timestamp'], y=df_ecg['ECG_MA'], mode='lines+markers', name='ECG_MA'))
fig.show()

In [95]:
df_resp = pd.read_csv(f'c3_muse_stress/feature_segments/resp/{subject}.csv')
df_resp['timestamp'] = pd.to_datetime(df_resp['timestamp'], unit='ms')

df_resp['resp_MA'] = df_resp['resp'].rolling(window=window).mean()

fig = go.Figure()
fig.add_trace(go.Scatter(x=df_resp['timestamp'], y=df_resp['resp'], mode='lines', name='resp'))
fig.add_trace(go.Scatter(x=df_resp['timestamp'], y=df_resp['resp_MA'], mode='lines+markers', name='resp_MA'))
fig.show()

In [96]:
# stress ground truth 

df_arousal = pd.read_csv(f'c3_muse_stress/label_segments/arousal/{subject}.csv')
df_valence = pd.read_csv(f'c3_muse_stress/label_segments/valence/{subject}.csv')

fig = go.Figure()
fig.add_trace(go.Scatter(x=df_arousal['timestamp'], y=df_arousal['value'], mode='lines', name='arousal'))
fig.add_trace(go.Scatter(x=df_valence['timestamp'], y=df_valence['value'], mode='lines', name='valence'))
fig.show()

# Arousal and Valence
The relationship between arousal and valence in the context of emotions can be visualized using the Circumplex Model of Affect, which is a widely accepted framework in psychology. This model represents emotions on a two-dimensional plane where arousal and valence are the two axes.

- High Arousal, High Valence: Positive, energetic emotions (e.g., joy, excitement).
- High Arousal, Low Valence: Negative, energetic emotions (e.g., anxiety, stress).
- Low Arousal, High Valence: Positive, calm emotions (e.g., relaxation, contentment).
- Low Arousal, Low Valence: Negative, low-energy emotions (e.g., sadness, boredom).
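
The four quadrants above can be sketched as a small lookup. The function name `affect_quadrant` and the threshold of 0.0 are assumptions for illustration; the label scale of the dataset may be centered differently, in which case the threshold would need adjusting.

```python
def affect_quadrant(arousal, valence, threshold=0.0):
    """Map an (arousal, valence) pair to one of the four circumplex quadrants.

    The threshold separating "high" from "low" is an assumed value.
    """
    high_arousal = arousal >= threshold
    high_valence = valence >= threshold
    if high_arousal and high_valence:
        return "high arousal, high valence (e.g., joy, excitement)"
    if high_arousal:
        return "high arousal, low valence (e.g., anxiety, stress)"
    if high_valence:
        return "low arousal, high valence (e.g., relaxation, contentment)"
    return "low arousal, low valence (e.g., sadness, boredom)"

print(affect_quadrant(0.6, -0.4))  # high arousal, low valence (e.g., anxiety, stress)
```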

In [97]:
df_partitions = pd.read_csv("c3_muse_stress/metadata/partition.csv")

In [98]:
df_ecg

,timestamp,segment_id,ECG,ECG_MA
0,1970-01-01 00:00:00.500,1,-0.066748,
1,1970-01-01 00:00:01.000,1,0.015070,
2,1970-01-01 00:00:01.500,1,-0.071821,
3,1970-01-01 00:00:02.000,1,0.011130,
4,1970-01-01 00:00:02.500,1,0.017750,
...,...,...,...,...
593,1970-01-01 00:04:57.000,47,-0.062233,0.003488
594,1970-01-01 00:04:57.500,47,-0.006452,0.003604
595,1970-01-01 00:04:58.000,47,-0.041463,0.003390
596,1970-01-01 00:04:58.500,47,0.054703,0.003844
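
The empty `ECG_MA` cells in the first rows above are NaN values: with an integer window, pandas `rolling` requires a full window of samples by default, so the first `window - 1` rows have no average. The toy example below illustrates this and one possible alternative, `min_periods=1`, which averages over however many samples are available so far. The frame `df_toy` and the window of 3 are made up for illustration.

```python
import pandas as pd

# Toy signal standing in for one physiological column.
df_toy = pd.DataFrame({"ECG": [1.0, 2.0, 3.0, 4.0, 5.0]})
window = 3

# Default behaviour: the first window - 1 rows are NaN.
df_toy["ECG_MA"] = df_toy["ECG"].rolling(window=window).mean()

# min_periods=1: use a shorter window at the start instead of NaN.
df_toy["ECG_MA_partial"] = df_toy["ECG"].rolling(window=window, min_periods=1).mean()

print(df_toy)
```

Whether the partial averages are acceptable depends on the model; dropping the NaN rows is the other common option.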


Machine learning training description:

The df_partitions dataframe contains the following columns:
- "Id": the unique identifier of the subject
- "Proposal": the partition, one of "train", "devel", or "test"

We will use the train Ids to train the model and the devel Ids to validate it. The test Ids are left for the final submission.
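
A minimal sketch of the split described above, using a toy frame with the same two columns in place of the real `partition.csv`. The helper name `ids_by_partition` and the toy Ids are hypothetical.

```python
import pandas as pd

# Toy stand-in for metadata/partition.csv with the "Id" and "Proposal" columns.
df_partitions_toy = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "Proposal": ["train", "train", "devel", "test", "train"],
})

def ids_by_partition(df_partitions):
    """Return a dict mapping each partition name to its list of subject Ids."""
    return {
        name: group["Id"].tolist()
        for name, group in df_partitions.groupby("Proposal")
    }

splits = ids_by_partition(df_partitions_toy)
train_ids = splits["train"]
devel_ids = splits["devel"]
test_ids = splits["test"]
print(train_ids, devel_ids, test_ids)  # [1, 2, 5] [3] [4]
```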

For each Id, the following dataframes can be loaded from CSV:
- df_bpm: contains the BPM data
    - columns: "timestamp", "BPM"
- df_ecg: contains the ECG data
    - columns: "timestamp", "ECG"
- df_resp: contains the respiration data
    - columns: "timestamp", "resp"
- df_arousal: contains the arousal data
    - columns: "timestamp", "value"
- df_valence: contains the valence data
    - columns: "timestamp", "value"
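
One way to combine these per-subject frames into a single training table is an inner join on `timestamp`, sketched below on toy data. The helper name `merge_subject` is hypothetical, and the sketch assumes all five frames keep the raw millisecond timestamps; if the feature timestamps have been converted with `pd.to_datetime`, the label timestamps would need the same conversion before merging.

```python
import pandas as pd

def merge_subject(df_bpm, df_ecg, df_resp, df_arousal, df_valence):
    """Join features and labels of one subject on their shared timestamps."""
    features = (
        df_bpm[["timestamp", "BPM"]]
        .merge(df_ecg[["timestamp", "ECG"]], on="timestamp")
        .merge(df_resp[["timestamp", "resp"]], on="timestamp")
    )
    # Both label files use a "value" column, so rename before joining.
    arousal = df_arousal[["timestamp", "value"]].rename(columns={"value": "arousal"})
    valence = df_valence[["timestamp", "value"]].rename(columns={"value": "valence"})
    labels = arousal.merge(valence, on="timestamp")
    # Inner join: keep only timestamps present in features and labels.
    return features.merge(labels, on="timestamp")

# Toy data: features at 0, 500, 1000 ms; labels at 500, 1000, 1500 ms.
ts_feat, ts_lab = [0, 500, 1000], [500, 1000, 1500]
toy_bpm = pd.DataFrame({"timestamp": ts_feat, "BPM": [70.0, 71.0, 72.0]})
toy_ecg = pd.DataFrame({"timestamp": ts_feat, "ECG": [0.1, -0.1, 0.0]})
toy_resp = pd.DataFrame({"timestamp": ts_feat, "resp": [0.5, 0.6, 0.4]})
toy_arousal = pd.DataFrame({"timestamp": ts_lab, "value": [0.2, 0.3, 0.1]})
toy_valence = pd.DataFrame({"timestamp": ts_lab, "value": [-0.1, 0.0, 0.1]})

merged = merge_subject(toy_bpm, toy_ecg, toy_resp, toy_arousal, toy_valence)
print(merged)
```

Only the two timestamps shared by features and labels (500 and 1000 ms) survive the join.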

I want to use the physiological data (BPM, ECG, and resp) to predict the arousal and valence values.
I want to build the prediction model with PyTorch.
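
A minimal sketch of such a model, assuming three input channels (BPM, ECG, resp) and two regression targets (arousal, valence) per time step. The class name `PhysioRegressor`, the GRU architecture, the hidden size, and the MSE loss are all placeholder choices for illustration; nothing above fixes the architecture or the training objective. Random tensors stand in for real batches.

```python
import torch
from torch import nn

class PhysioRegressor(nn.Module):
    """GRU over the physiological sequence with a per-step linear head."""

    def __init__(self, num_features=3, hidden_size=32, num_targets=2):
        super().__init__()
        self.gru = nn.GRU(num_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_targets)

    def forward(self, x):
        out, _ = self.gru(x)   # (batch, time, hidden_size)
        return self.head(out)  # (batch, time, num_targets)

torch.manual_seed(0)
model = PhysioRegressor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()  # placeholder loss, an assumption

# Random stand-in batch: 4 sequences, 20 time steps, 3 signals.
x = torch.randn(4, 20, 3)
y = torch.randn(4, 20, 2)  # arousal and valence per time step

# One training step.
pred = model(x)
loss = loss_fn(pred, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(pred.shape, loss.item())
```

In practice the batches would be built from the train Ids, with the devel Ids used only for validation.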

In [99]:
df = pd.read_csv(f'c3_muse_stress/feature_segments/fau_intensity/{subject}.csv')
df

,timestamp,segment_id,AU01_r,AU02_r,AU04_r,AU05_r,AU06_r,AU07_r,AU09_r,AU10_r,AU12_r,AU14_r,AU15_r,AU17_r,AU20_r,AU23_r,AU25_r,AU26_r,AU45_r
0,0,1,0.096,0.000,0.114,0.000,0.000,0.170,0.0,0.000,0.022,0.000,0.000,0.000,0.000,0.0,0.000,0.070,0.102
1,500,1,0.016,0.000,0.020,0.000,0.002,0.000,0.0,0.164,0.108,0.050,0.000,0.000,0.000,0.0,0.232,0.108,0.000
2,1000,1,0.244,0.016,0.000,0.000,0.000,0.118,0.0,0.042,0.078,0.000,0.044,0.000,0.000,0.0,0.000,0.068,0.000
3,1500,1,0.258,0.074,0.000,0.094,0.000,0.230,0.0,0.000,0.106,0.000,0.000,0.000,0.000,0.0,0.000,0.184,0.000
4,2000,1,0.182,0.000,0.000,0.010,0.090,0.042,0.0,0.046,0.128,0.000,0.000,0.000,0.000,0.0,0.110,0.000,0.000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
593,296500,47,0.000,0.000,0.000,0.000,0.000,0.000,0.0,0.000,0.000,0.000,0.040,0.000,0.000,0.0,0.000,0.000,0.008
594,297000,47,0.038,0.000,0.000,0.052,0.000,0.000,0.0,0.000,0.000,0.034,0.000,0.000,0.000,0.0,0.028,0.064,0.000
595,297500,47,0.000,0.000,0.000,0.000,0.000,0.000,0.0,0.000,0.030,0.000,0.000,0.000,0.000,0.0,0.166,0.080,0.240
596,298000,47,0.000,0.000,0.000,0.000,0.000,0.070,0.0,0.000,0.002,0.072,0.064,0.000,0.000,0.0,0.000,0.000,0.168


In [100]:
df.columns

Index(['timestamp', 'segment_id', 'AU01_r', 'AU02_r', 'AU04_r', 'AU05_r',
       'AU06_r', 'AU07_r', 'AU09_r', 'AU10_r', 'AU12_r', 'AU14_r', 'AU15_r',
       'AU17_r', 'AU20_r', 'AU23_r', 'AU25_r', 'AU26_r', 'AU45_r'],
      dtype='object')