# Multi-Modal Data Fusion - Project Work: Multi-Modal Physical Exercise Classification


In this project, real multi-modal data is studied using different techniques presented during the course. In addition, there is an optional task to try different approaches to identifying persons from the same dataset. The open MEx dataset from the UCI Machine Learning Repository is used. The idea is to apply different techniques to recognize physical exercises from wearable sensors and a depth camera in a user-independent manner.

## Author(s)
Add your information here

Name(s):
Aleksander Madajczak,
Jan Fabian

Student number(s):
2207367
2207371


## Description

The goal of this project is to develop user-independent pre-processing and classification models to recognize 7 different physical exercises measured by an accelerometer (attached to the subject's thigh) and a depth camera (above the subject, facing downwards and recording an aerial view). All the exercises were performed with the subject lying down on a mat. The original dataset also has another acceleration sensor and a pressure-sensitive mat, but those two modalities are omitted in this project. There are 30 subjects in total in the original dataset, and in this work a subset of 10 persons is used. A detailed description of the dataset and the original data can be accessed at [MEx dataset @ UCI machine learning repository](https://archive.ics.uci.edu/ml/datasets/MEx#). We are providing the subset of the dataset in Moodle.

The project work is divided into the following phases:

1. Data preparation, exploration, and visualization
2. Feature extraction and unimodal fusion for classification
3. Feature extraction and feature-level fusion for multimodal classification
4. Decision-level fusion for multimodal classification
5. Bonus task: Multimodal biometric identification of persons

where 1-4 are compulsory (max. 10 points each), and 5 is optional and gives bonus points (max. 5+5 points). In each phase, you should visualize and analyse the results and document the work and findings properly with text blocks and figures between the code. A <b> nice-looking </b> and <b> informative </b> notebook presenting your results and analysis will be part of the grading, in addition to the actual implementation.

The results are validated using confusion matrices and F1 scores. The F1 macro score is given as
<br>
<br>
$
\begin{equation}
F1_{macro} = \frac{1}{N} \sum_{i=1}^{N} F1_i,
\end{equation}
$
<br>
<br>
where $F1_i = 2 \cdot \frac{precision_i \cdot recall_i}{precision_i + recall_i}$ is the F1 score of label $i$, and $N$ is the number of labels.
<br>
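
As a minimal sketch of the formula above, the macro F1 score can be computed by hand with NumPy. The helper name `f1_macro` and the toy labels are illustrative assumptions, not part of the course material; a per-class score is set to 0 when precision or recall is undefined, which is one common convention.

```python
import numpy as np

def f1_macro(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for label in labels:
        tp = np.sum((y_pred == label) & (y_true == label))  # true positives
        fp = np.sum((y_pred == label) & (y_true != label))  # false positives
        fn = np.sum((y_pred != label) & (y_true == label))  # false negatives
        precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
        recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
        denom = precision + recall
        # Assumption: an undefined F1 (no predictions and no hits) counts as 0
        scores.append(2 * precision * recall / denom if denom > 0 else 0.0)
    return float(np.mean(scores))

# Toy example with three labels (made-up data)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score = f1_macro(y_true, y_pred, labels=[0, 1, 2])
print(f"F1 macro: {score:.3f}")
```

Because every class contributes equally to the mean, a rarely occurring exercise affects the score as much as a frequent one.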

## Learning goals

After the project work, you should

- be able to study real-world multi-modal data
- be able to apply different data fusion techniques to a real-world problem
- be able to evaluate the results
- be able to analyse the outcome
- be able to document your work properly

## Relevant lectures

Lectures 1-8

## Relevant exercises

Exercises 0-6

## Relevant chapters in course book

Chapters 1-14

## Additional Material

* Original dataset [MEx dataset @ UCI machine learning repository](https://archive.ics.uci.edu/ml/datasets/MEx#)
* Related scientific article [MEx: Multi-modal Exercises Dataset for Human Activity Recognition](https://arxiv.org/pdf/1908.08992.pdf)

<a id='task1'></a>
<div class=" alert alert-warning">
    <b>Assignment.</b> <b>Task 1.</b>

Download the data from Moodle's Project section. Familiarize yourself with the folder structure and data. You can read the data files using the function given below. Each file contains one exercise type performed by a single user. The data are divided into multiple folders. Note that each folder contains one long sequence of a single exercise, except exercise 4, which is performed twice in different ways. Those two sequences belong to the same class. Do the following subtasks to pre-analyse data examples and to prepare the training and testing data for the next tasks:
<br>
<br>
<p> Read raw data from the files. Divide each data file into shorter sequences using a windowing method. As in the related article "MEx: Multi-modal Exercises Dataset for Human Activity Recognition", use a 5-second window with a 3-second overlap between windows, producing several example sequences from one exercise file for classification purposes. Windowing works as follows: starting from the beginning of each long exercise sequence, take 5 seconds of data points (from synchronized acceleration data and depth images) based on the time stamps. Next, move the window 2 seconds forward and take another 5 seconds of data. Continue until you reach the end of the sequence. Each window will consist of a 500x3 matrix of acceleration data and a 5x192 matrix of depth image data.</p>
<br>
<p> <b>1.1</b> Plot a few examples of prepared data for each modality (accelerometer and depth camera). Plot the acceleration sensor data as a multi-dimensional time series and the depth camera data as 2D images. Plot the 5-second acceleration sensor and depth image sequences of persons 1 and 5 performing exercises 2, 5, and 6. Take the first windowed example from the long exercise sequence. </p>
<br>
<p> <b>1.2</b> Split the prepared dataset into training and testing datasets so that data of persons 1-7 are used for training and data of persons 8-10 are used for testing. In the next tasks, the training dataset can be further divided into (multiple) validation folds to tune the model parameters when needed.<br>

<p> Note: The training set should have 1486 windows and the testing set should have 598 windows. In the training set, the acceleration data will have one window without a matching depth camera window; that window should be dropped because it has no pair.</p>




Document your work, calculate the indicator statistics of the training and testing datasets (number of examples, dimensions of each example), and visualize the prepared examples.

</div>
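
The windowing procedure described in the task can be sketched on synthetic data as follows. This is a minimal illustration under stated assumptions: timestamps are in milliseconds, the accelerometer samples at 100 Hz and the depth camera at 1 frame per second (rates inferred from the 500x3 and 5x192 window sizes), and `sliding_windows` is a hypothetical helper, not the function used in this notebook.

```python
import numpy as np
import pandas as pd

def sliding_windows(df, window_ms=5000.0, step_ms=2000.0, time_col="time"):
    """Cut a time-stamped frame into full-length, overlapping windows."""
    t_end = df[time_col].max()
    t0 = df[time_col].min()
    windows = []
    # Keep only windows that fit entirely inside the recording
    while t0 + window_ms <= t_end:
        mask = (df[time_col] >= t0) & (df[time_col] < t0 + window_ms)
        windows.append(df.loc[mask])
        t0 += step_ms
    return windows

# Synthetic 20-second recordings (all values are zeros; only shapes matter here)
acc = pd.DataFrame(np.zeros((2000, 3)), columns=["acc_0", "acc_1", "acc_2"])
acc.insert(0, "time", np.arange(0, 20000, 10, dtype=float))    # assumed 100 Hz

dc = pd.DataFrame(np.zeros((20, 192)), columns=[f"dc_{i}" for i in range(192)])
dc.insert(0, "time", np.arange(0, 20000, 1000, dtype=float))   # assumed 1 Hz

acc_windows = sliding_windows(acc)
dc_windows = sliding_windows(dc)

print(len(acc_windows), acc_windows[0].drop(columns="time").shape)
print(len(dc_windows), dc_windows[0].drop(columns="time").shape)
```

With a 2-second step, consecutive windows share 3 seconds of data, which matches the overlap requested in the task.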

In [None]:
# Import relevant libraries here
from pathlib import Path
from copy import deepcopy
import numpy as np
import pandas as pd

# Enter data folder location
loc = "./MEx"

In [None]:
# Find, read, and compose the measurements
from utilities.fun_one import path_to_meta
paths_record = Path(loc).glob("*/*/*.csv")

records = []

for path_record in paths_record:
    df = pd.read_csv(path_record, delimiter=",", header=None)
    meta = path_to_meta(path_record)

    if meta["sensor"] == "acc":
        col_names = ["time", "acc_0", "acc_1", "acc_2"]
        df.columns = col_names
    else:
        num_cols = df.shape[1]
        col_names = ["time", ] + [f"dc_{i}" for i in range(num_cols-1)]
        df.columns = col_names

    meta["df"] = df

    records.append(meta)

df_records = pd.DataFrame.from_records(records)

print(f"Total records found: {len(df_records)}")
print("Dataframe with all records:")
display(df_records.head())
print("Dataframe with one measurement series:")
display(df_records["df"].iloc[0].head())

In [None]:
# Extract 5-second long windows with 2-second shift (3-second overlap)

records_windowed = []

time_window = 5000.
time_offset = 2000.

for row_idx, row_data in df_records.iterrows():
    df_tmp = row_data["df"]
    time_start = np.min(df_tmp["time"].to_numpy())
    time_end = np.max(df_tmp["time"].to_numpy())

    for window_idx, t0 in enumerate(np.arange(time_start, time_end, time_offset)):
        t1 = t0 + time_window
        # Handle boundary conditions - skip the measurements from the end shorter than window size
        if t1 > time_end:
            continue

        tmp_data = deepcopy(row_data)
        tmp_data["window_idx"] = window_idx
        tmp_data["df"] = df_tmp[(df_tmp["time"] >= t0) &
                                (df_tmp["time"] < t1)].copy()

        records_windowed.append(tmp_data)

df_records_windowed = pd.DataFrame.from_records(records_windowed)

print(f"Total windows extracted: {len(df_records_windowed)}")
print("Dataframe with all windowed records:")
display(df_records_windowed.head())
print("Dataframe with one windowed measurement series:")
display(df_records_windowed["df"].iloc[0].head())
%store df_records_windowed

## 1.1. Visualize selected samples for both modalities


In [None]:
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import importlib, utilities
importlib.reload(utilities)
from utilities.fun_one import visualize, visualize_acceleration, visualize_depth, visualize_depth_series

In [None]:
# Visualisation - Acceleration:
%matplotlib inline
print("Acceleration, Person 1, Exercise 2:")
visualize_acceleration(df_records_windowed,1,2)
print("Acceleration, Person 1, Exercise 5:")
visualize_acceleration(df_records_windowed,1,5)
print("Acceleration, Person 1, Exercise 6:")
visualize_acceleration(df_records_windowed,1,6)
print("Acceleration, Person 5, Exercise 2:")
visualize_acceleration(df_records_windowed,5,2)
print("Acceleration, Person 5, Exercise 5:")
visualize_acceleration(df_records_windowed,5,5)
print("Acceleration, Person 5, Exercise 6:")
visualize_acceleration(df_records_windowed,5,6)

In [None]:
# Visualisation - Depth:
%matplotlib inline
print("Visualisation Depth Series, First Window:")
print("Person 1, Exercise 2:")
visualize_depth_series(df_records_windowed,1,2)
print("Person 1, Exercise 5:")
visualize_depth_series(df_records_windowed,1,5)
print("Person 1, Exercise 6:")
visualize_depth_series(df_records_windowed,1,6)
print("Person 5, Exercise 2:")
visualize_depth_series(df_records_windowed,5,2)
print("Person 5, Exercise 5:")
visualize_depth_series(df_records_windowed,5,5)
print("Person 5, Exercise 6:")
visualize_depth_series(df_records_windowed,5,6)

In [None]:
print("Person 1, Exercise 2:")
%matplotlib notebook
visualize_depth(df_records_windowed,1,2,None)

In [None]:
print("Person 1, Exercise 5:")
%matplotlib notebook
visualize_depth(df_records_windowed,1,5,None)

In [None]:
print("Person 1, Exercise 6:")
%matplotlib notebook
visualize_depth(df_records_windowed,1,6,None)

In [None]:
print("Person 5, Exercise 2:")
%matplotlib notebook
visualize_depth(df_records_windowed,5,2,None)

In [None]:
print("Person 5, Exercise 5:")
%matplotlib notebook
visualize_depth(df_records_windowed,5,5,None)

In [None]:
print("Person 5, Exercise 6:")
%matplotlib notebook
visualize_depth(df_records_windowed,5,6,None)

## 1.2. Split samples based on subject ID into training and testing datasets for further experiments

In [None]:
%matplotlib inline
import importlib
importlib.reload(utilities.fun_one)
from utilities.fun_one import stringify_id
# 1.2. Split samples based on subject ID into training and testing datasets for further experiments
print("Splitting data into train (persons 1-7) and test set (persons 8-10)")

# Create training and testing sets by selecting the chosen subjects:
train_ids = [stringify_id(i) for i in range(1, 8)]   # persons 1-7
test_ids = [stringify_id(i) for i in range(8, 11)]   # persons 8-10
training_records = df_records_windowed[df_records_windowed.subject_id.isin(train_ids)]
testing_records = df_records_windowed[df_records_windowed.subject_id.isin(test_ids)]

# Drop one row from training set which does not have a pair of sensor readings:
training_records = training_records.drop(training_records.index[(training_records.subject_id == stringify_id(2)) &
                                                                (training_records.exercise_id == stringify_id(6)) &
                                                                (training_records.sensor_code == 'act') &
                                                                (training_records.window_idx == 29)])


#Save for use in other notebooks:
%store training_records
%store testing_records
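
The unpaired window is dropped above by its hard-coded index. As a hedged alternative, unpaired windows could be detected automatically by counting how many sensors contribute to each window. The sketch below uses a made-up toy table and assumes that `subject_id`, `exercise_id`, and `window_idx` together identify a window; for exercise 4, which has two sequences, an additional key (such as a trial identifier) would likely be needed.

```python
import pandas as pd

# Toy stand-in for the windowed records (values are illustrative only)
toy = pd.DataFrame({
    "subject_id": ["02"] * 5,
    "exercise_id": ["06"] * 5,
    "sensor_code": ["act", "act", "act", "dc", "dc"],
    "window_idx": [0, 1, 2, 0, 1],
})

# Assumption: these three columns uniquely identify one window
key = ["subject_id", "exercise_id", "window_idx"]

# Number of distinct sensors available for each window, aligned to the rows
n_sensors = toy.groupby(key)["sensor_code"].transform("nunique")

paired = toy[n_sensors == 2]     # windows present in both modalities
unpaired = toy[n_sensors < 2]    # windows missing one modality

print(unpaired)
```

This approach does not depend on knowing in advance which subject or exercise contains the incomplete window.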

In [None]:
training_records

In [None]:
testing_records
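
The task also asks for indicator statistics (number of examples and dimensions of each example). A minimal sketch of such a summary is shown below on toy data; the helper `summarize` and the toy records are assumptions for illustration, and the column names `sensor_code` and `df` are assumed to mirror those used in this notebook.

```python
import numpy as np
import pandas as pd

def make_window(n_rows, n_cols, prefix):
    """Build a toy window: a 'time' column plus n_cols zero-valued channels."""
    df = pd.DataFrame(np.zeros((n_rows, n_cols)),
                      columns=[f"{prefix}_{i}" for i in range(n_cols)])
    df.insert(0, "time", np.arange(n_rows, dtype=float))
    return df

def summarize(records):
    """Count windows and collect the distinct data shapes per sensor."""
    summary = {}
    for sensor, group in records.groupby("sensor_code"):
        shapes = {d.drop(columns="time").shape for d in group["df"]}
        summary[sensor] = {"n_windows": len(group), "shapes": sorted(shapes)}
    return summary

# Toy records: two accelerometer windows and two depth camera windows
toy_records = pd.DataFrame.from_records([
    {"sensor_code": "act", "df": make_window(500, 3, "acc")},
    {"sensor_code": "act", "df": make_window(500, 3, "acc")},
    {"sensor_code": "dc", "df": make_window(5, 192, "dc")},
    {"sensor_code": "dc", "df": make_window(5, 192, "dc")},
])

stats = summarize(toy_records)
for sensor, info in stats.items():
    print(sensor, info)
```

Applied to the real training and testing records, more than one shape per sensor would indicate windows with missing or extra samples.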