# Gait Analysis IMU ML Analysis 

*Application of Data Science Methodology toward classifying the gait activity of subjects via an Inertial Measurement Unit (IMU) motion sensor placed on the top of the foot*  

---

## Table Of Contents (so far)

1. Introduction
    * This is a general introduction to the problem and methodology, and we do some light preparation here. Nothing super interesting.

    
 > ⚠ I suggest moving ahead to the links below to get to the more visual stuff ⚠
    
    

2. [Filtering Accelerometer Data](Accelerometer-median-filtering.ipynb)
 * Interactive Median Filtration! 😀


3. [Smoothing Gyroscope Data](./gyroscope-FFT-filtering.ipynb)
 * Interactive FFT filtration! 😊

---

# 💡 New to Jupyter?

> Run cells with <kbd>Shift + Enter</kbd> and have fun!<br/><br/>
> Insert new cells above or below with <br/> <br/>
>  (command mode) <kbd>Esc</kbd> + <kbd>A</kbd> <br/>
>  (command mode) <kbd>Esc</kbd> + <kbd>B</kbd>

# Introduction

## 1. Question or Goal
---

Using 3D accelerometer and gyroscope data gathered from an Inertial Measurement Unit (IMU) placed on the top of the foot of a subject, can we accurately classify their activity (e.g. walking, running, standing, sitting, lying down, etc.)? Further, could refinements be made to integrate the data to model the gait of the subject and analyze their stride length, velocity or other factors?

We can potentially see this as a business problem, perhaps with the goal of developing cheaper diagnostic tools, or of building applications that track your workout activities and return relevant statistics to the user.

There are different considerations to be made, but this is intended to serve primarily as a tutorial and data visualization demonstration, so we will have to make some hopefully healthy compromises along the way.

## 2. Analytic Approach
---
We have to understand the types of answers we want from this question, and to assess how the data can be used to give those answers.

We need to discuss how the measurements were taken to come up with the correct goals and approaches and models to suss out reasonably accurate and efficient results.

---

### Procedure

- The IMU, after being charged, is secured with tape on top of the foot.


- It begins recording once activated by a button press while on the foot.


- The subject is instructed to perform a prescribed set of activities (similar to):
  - stand
  - stomp (3 times)
  - bend foot
  - stomp 
  - walk
  - rest
  - run
  - rest
  - random stuff - for test data walking


- The data is written to a `.csv` file that is downloaded directly from the device after a period of recording.

### Measurements

- The IMU measures **acceleration** in g's, where 1 g is the acceleration due to gravity (9.81 $m/s^2$).


- It measures **rotational velocity** in degrees per second (deg/s).


- This makes for a total of 6 physical degrees of freedom, and 7 with time. 


- However, the IMU gives us additional data that is not necessary for our analysis, including temperature and 3D magnetometer data. This will be important to clean at the data preparation stage.
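
For any physical modeling later on, the raw units above need converting to SI. Here is a minimal sketch of that conversion; the helper names `g_to_ms2` and `dps_to_rads` are my own and not part of the notebook, and the factor 9.81 is simply the value quoted above.

```python
import numpy as np

G_TO_MS2 = 9.81  # m/s^2 per g, the value quoted in the text (an assumption, not a device constant)

def g_to_ms2(acc_g):
    """Convert acceleration from g to m/s^2."""
    return np.asarray(acc_g, dtype=float) * G_TO_MS2

def dps_to_rads(rate_dps):
    """Convert rotational velocity from deg/s to rad/s."""
    return np.deg2rad(np.asarray(rate_dps, dtype=float))

# A foot at rest should read roughly (0, 0, 1) g
rest_ms2 = g_to_ms2([0.0, 0.0, 1.0])

# Half a turn per second, expressed in rad/s
half_turn = dps_to_rads(180.0)
```

Both helpers accept scalars, lists or whole DataFrame columns, so they could be applied to the accelerometer and gyroscope columns in one call each.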


## Considerations / Precautions
---

> It's important to scrutinize and consider the errors or constraints inherent in y/our data. One of our goals is to mitigate these errors. 
>
>While I'd rather skip these parts and shove data into a model, it's important to address these as best we can. Sometimes we cannot, and other times we choose not to. But it's important to try to acknowledge these aspects of the data.


* The IMU is not exactly state-of-the-art, and gyroscope data is very prone to **drift**. We will use **signal processing** techniques like **FFT analysis** to attempt to remove noise and drift.


* We can use *spectral information* from the signals to design different **filters** to see what provides the clearest and most distinct features.


* Incorrect placement of the IMU.
    
> Naturally, it is not going to sit flat on the top of the foot. We must reorient the device and transform the coordinate system to compensate at times, depending on how we want to model things. There is a question about the efficacy and the effort involved in automating this reorientation since it's also possible to classify the activity well enough with an off-center coordinate system. We might however miss out on the potential to perform some real time modeling of the foot/device in the future.
  
I am choosing to forgo performing 3D scaling and transformations on the data for the time being and will focus on the signal analysis and model training aspects of the analysis.
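
To give a feel for the spectral filtering mentioned in the considerations above, here is a minimal sketch on synthetic data. The function name `fft_lowpass` is hypothetical, the 100 Hz sample rate is an assumption chosen to match the 0.01 s timestamp spacing of this recording, and the 5 Hz cutoff is arbitrary. It only removes high-frequency noise; drift lives at very low frequencies and would need the lowest bins removed instead.

```python
import numpy as np

def fft_lowpass(signal, sample_rate_hz, cutoff_hz):
    """Zero every frequency bin above cutoff_hz and transform back."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate_hz)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

fs = 100.0                                # Hz (assumed sample rate)
t = np.arange(0, 10, 1 / fs)              # 10 seconds of samples
clean = np.sin(2 * np.pi * 1.5 * t)       # synthetic 1.5 Hz "stride" signal
rng = np.random.default_rng(0)
noisy = clean + 0.5 * rng.standard_normal(t.size)

smoothed = fft_lowpass(noisy, fs, cutoff_hz=5.0)
```

A hard cutoff like this is the bluntest possible filter. It is easy to reason about, which is why I use it for the sketch, but it can introduce ringing near sharp events such as stomps.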


## Types of Answers / Goal
---

1. We want to **classify** the activity of the subject (or user).
     * This means we are going to prefer a supervised learning model, a probabilistic model that tells us the most likely outcome or classification from a set of predetermined (supervised) possibilities.
     * We can choose a model and train it to predict the most likely outcomes based off our input training data and the training labels (the correct classifications).  


2. We also want to integrate the accelerometer data to measure average (or real-time) velocity, and perhaps integrate that again to measure position. We will leverage the information in the lab notes to help in developing accurate models. Other factors are stride length and periodicity (is the left stride longer than the right?).  


3. We may also eventually want to visually model the activity of the foot.  
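
The integration in goal 2 can be sketched with a running trapezoidal rule. This is only an illustration on synthetic data: the helper name `cumulative_trapezoid` is my own, and it assumes the acceleration has already been converted to m/s^2 and had gravity removed, which the raw IMU data has not.

```python
import numpy as np

def cumulative_trapezoid(y, t):
    """Running trapezoidal integral of y over t, starting at 0."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    steps = 0.5 * (y[1:] + y[:-1]) * np.diff(t)
    return np.concatenate(([0.0], np.cumsum(steps)))

t = np.arange(0, 2.0, 0.01)      # seconds, 10 ms spacing
acc = np.full_like(t, 2.0)       # synthetic constant 2 m/s^2

vel = cumulative_trapezoid(acc, t)   # velocity in m/s, equals 2*t here
pos = cumulative_trapezoid(vel, t)   # position in m, equals t**2 here
```

On real data any small bias in the acceleration grows linearly in velocity and quadratically in position, so the result is only trustworthy over short spans or with some form of drift correction.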

> ### Okay, that's asking a lot.

That's fine! Let's just make headway toward classifying between three activities: running, walking and standing.

Now we are entering the data requirements phase. We're getting closer to playing with the data, so hang in there.

## 3. Data Requirements
---


* We need to remove the unnecessary data from the dataframe.


* We need to filter noise and drift out of the data. (i.e. smooth)


* We need to manually label the data with help from `bqplot` and classify it as walking, standing or running so we can train a predictive/classification model.


* We need to reorient the data to a body-centered coordinate axis.
    * Or at the very least flatten the coordinate system so that gravity points along (0,0,1) in the (x,y,z) acceleration vector
    * Then we can scale/standardize according to FFT analysis and can look for features
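
The "flattening" requirement above amounts to finding a rotation that takes the measured resting gravity direction onto (0,0,1). Here is a minimal sketch using Rodrigues' rotation formula. The function name `rotation_to_z` and the sample values are hypothetical, and this is not something the notebook currently does.

```python
import numpy as np

def rotation_to_z(gravity):
    """Rotation matrix that maps the direction of `gravity` onto (0, 0, 1)."""
    g = np.asarray(gravity, dtype=float)
    g = g / np.linalg.norm(g)
    z = np.array([0.0, 0.0, 1.0])
    axis = np.cross(g, z)
    s = np.linalg.norm(axis)   # sine of the angle between g and z
    c = float(np.dot(g, z))    # cosine of that angle
    if s < 1e-12:
        # Already aligned, or exactly upside down (then flip about the x-axis)
        return np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
    k = axis / s
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    # Rodrigues' formula: R = I + sin(theta) K + (1 - cos(theta)) K^2
    return np.eye(3) + s * K + (1.0 - c) * (K @ K)

rest_mean = np.array([0.10, -0.05, 0.99])   # synthetic resting average, in g
R = rotation_to_z(rest_mean)

acc = np.array([[0.10, -0.05, 0.99],
                [0.12, -0.04, 1.01]])        # synthetic samples, one row each
acc_aligned = acc @ R.T                      # rotate every row
```

This only corrects tilt. The heading (rotation about the z-axis) is left undetermined, so the x and y axes still would not be guaranteed to line up with the direction of travel.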

### Okay, this is also asking a lot. Well, let's just make progress!

## 4. Data Collection
---

This phase is essentially complete, unless at some point we'd like to get more data to classify different activities. The data only provide enough information for walking and standing and running, and parameters involving those states.

There are physical notes that were taken as well as video, which I don't have access to.


<div id="playtime"></div>


## 5. Data Understanding / Visualization

Understanding the nature of the data, and the heuristic properties of the experiment and procedure itself will help us to engineer features to look for that may help us classify the activities.

Examples might be the maximum acceleration in the x direction, minimum acceleration in the z direction, etc.
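
As a sketch of what such features could look like in code, the snippet below computes the max, min and standard deviation per fixed-length time window. The function name `window_features`, the one-second window and the synthetic signals are all assumptions for illustration; only the column names follow the cleaned-up names used later in this notebook.

```python
import numpy as np
import pandas as pd

def window_features(df, columns, window_s=1.0, time_col="time"):
    """Per-window max, min and standard deviation for each column."""
    window_id = (df[time_col] // window_s).astype(int)
    feats = df.groupby(window_id)[columns].agg(["max", "min", "std"])
    # Flatten the (column, statistic) pairs into names like accelerometer_x_max
    feats.columns = [f"{col}_{stat}" for col, stat in feats.columns]
    feats.index.name = "window"
    return feats

# Synthetic 3-second recording at 100 Hz
t = np.arange(300) / 100.0
demo = pd.DataFrame({
    "time": t,
    "accelerometer_x": np.sin(2 * np.pi * t),
    "accelerometer_z": 1.0 + 0.1 * np.cos(2 * np.pi * t),
})

features = window_features(demo, ["accelerometer_x", "accelerometer_z"])
```

Each row of `features` then describes one window, which is the shape a classifier would expect: one sample per row, one feature per column.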

Alternatively, we can just visualize the accelerometer data in the X,Y,Z components and see if we notice any patterns. Similarly for rotation.


In [1]:
import numpy as np
import pandas as pd
import datetime

import warnings
warnings.filterwarnings("ignore")
file_path = "./data/TAS1F06180329 (2018-10-24)-IMU.csv"

# The header value has to do with how the CSV is formatted.
# header is a zero-based row index, so header=10 takes the column names from the 11th row; earlier rows are discarded.

IMU_data = pd.read_csv(file_path, header=10)
print("File Read")

File Read


Check that the data look right

In [2]:
IMU_data.head()

| | Timestamp | Accelerometer X | Accelerometer Y | Accelerometer Z | Temperature | Gyroscope X | Gyroscope Y | Gyroscope Z | Magnetometer X | Magnetometer Y | Magnetometer Z |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2018-10-24T11:20:00.0000000 | 0.046875 | -0.008301 | 1.015625 | 31.68979 | -1.403809 | 1.403809 | -0.549316 | -11.132812 | 7.03125 | 24.316405 |
| 1 | 2018-10-24T11:20:00.0100000 | 0.049316 | -0.003906 | 1.016602 | 31.698775 | -1.647949 | 1.342774 | -0.549316 | -11.132812 | 7.03125 | 24.316405 |
| 2 | 2018-10-24T11:20:00.0200000 | 0.045898 | -0.010742 | 1.010254 | 31.69578 | -2.258301 | 1.159668 | -0.549316 | -11.132812 | 7.03125 | 24.316405 |
| 3 | 2018-10-24T11:20:00.0300000 | 0.044922 | -0.010254 | 1.01416 | 31.698775 | -2.258301 | 1.098633 | -0.549316 | -11.132812 | 7.03125 | 24.316405 |
| 4 | 2018-10-24T11:20:00.0400000 | 0.050781 | -0.011719 | 1.008301 | 31.677809 | -2.380371 | 1.037598 | -0.549316 | -11.132812 | 7.03125 | 24.316405 |


## 5.1 Light Data Cleaning

### Standardizing/Selecting Columns

Look at the variables/columns provided by the IMU. We will want to *standardize the column names*--we will do this by casting to lowercase and replacing spaces with underscores. You'll also notice there are a few more variables other than time, acceleration and gyroscope data. We will drop these columns using `df.drop()`.

In [3]:
IMU_data.columns # or try IMU_data.keys()

Index(['Timestamp', 'Accelerometer X', 'Accelerometer Y', 'Accelerometer Z',
       'Temperature', 'Gyroscope X', 'Gyroscope Y', 'Gyroscope Z',
       'Magnetometer X', 'Magnetometer Y', 'Magnetometer Z'],
      dtype='object')

In [4]:
# This is how I'm choosing to rename/standardize the columns without the rename function.
new_column_names = []

for val in IMU_data.columns:
    # print(f"before: {val}")
    val = val.lower()
    val = val.replace(" ", "_")
    new_column_names.append(val)
    # print(f"after: {val}")
    
# Assign new values to old columns
IMU_data.columns = new_column_names

In [5]:
IMU_data.columns.values # can also just use '.columns' without looking at values

array(['timestamp', 'accelerometer_x', 'accelerometer_y',
       'accelerometer_z', 'temperature', 'gyroscope_x', 'gyroscope_y',
       'gyroscope_z', 'magnetometer_x', 'magnetometer_y',
       'magnetometer_z'], dtype=object)
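
For comparison, the same renaming can be done without an explicit loop by using the string methods on the column index. This is just an alternative sketch on a small stand-in frame (`demo` is hypothetical), not a replacement for the cell above.

```python
import pandas as pd

# Stand-in frame with a few of the original column names
demo = pd.DataFrame(columns=["Timestamp", "Accelerometer X", "Gyroscope Z"])

# Lowercase every name, then swap spaces for underscores
demo.columns = demo.columns.str.lower().str.replace(" ", "_", regex=False)

print(list(demo.columns))
```

The loop version is easier to step through with print statements, which is why it can be the friendlier choice in a tutorial.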


Now I want to `df.drop()` all the undesirable columns: the magnetometer and the temperature data.

In [6]:
undesirables = ["temperature", "magnetometer_x", "magnetometer_y", "magnetometer_z"]

IMU_data.drop(undesirables, # List of col names to drop
              axis=1,       # Axis = 1 specifies columns. Axis = 0 specifies rows.
              inplace=True) # inplace = True means modify orig, False means return copy

### Reindexing for time, instead of date

If the start time is '2018-10-24T11:20:00.0000000', then we obtain the initial time of 0.0s by subtracting that from every timestamp in the data frame. 

|DATE|--->|DATETIME OBJ|--->|UNIX TS|--->|TIME(s)|
|----|----|------------|----|----|----|------------|
|2018-10-24T11:20:00.0000000|--->|DT obj|--->|1540405200.0|--->|0.0|
|2018-10-24T11:20:20.0000000|--->|DT obj|--->|1540405220.0|--->|20.0|

In [7]:
import dateutil.parser  # import the submodule explicitly; plain `import dateutil` may not expose it

# map applies a function to every element of the Series
# the argument is the function object itself, not the result of calling it
IMU_data['time'] = IMU_data['timestamp'].map(dateutil.parser.isoparse)
IMU_data['time'] = IMU_data['time'].map(datetime.datetime.timestamp)

# Subtract the first timestamp so time starts at 0.0 s (removes the large Unix number)
IMU_data['time'] = IMU_data['time'] - IMU_data['time'].iloc[0]

print(IMU_data.head())

                     timestamp  accelerometer_x  accelerometer_y  \
0  2018-10-24T11:20:00.0000000         0.046875        -0.008301   
1  2018-10-24T11:20:00.0100000         0.049316        -0.003906   
2  2018-10-24T11:20:00.0200000         0.045898        -0.010742   
3  2018-10-24T11:20:00.0300000         0.044922        -0.010254   
4  2018-10-24T11:20:00.0400000         0.050781        -0.011719   

   accelerometer_z  gyroscope_x  gyroscope_y  gyroscope_z  time  
0         1.015625    -1.403809     1.403809    -0.549316  0.00  
1         1.016602    -1.647949     1.342774    -0.549316  0.01  
2         1.010254    -2.258301     1.159668    -0.549316  0.02  
3         1.014160    -2.258301     1.098633    -0.549316  0.03  
4         1.008301    -2.380371     1.037598    -0.549316  0.04  
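
The same elapsed-time column can also be computed in a vectorized way, skipping the Unix timestamp step entirely. This is a sketch on a three-row stand-in frame (`demo` is hypothetical), and it assumes `pd.to_datetime` parses this timestamp format directly.

```python
import pandas as pd

demo = pd.DataFrame({"timestamp": [
    "2018-10-24T11:20:00.0000000",
    "2018-10-24T11:20:00.0100000",
    "2018-10-24T11:20:20.0000000",
]})

# Parse the whole column at once, then subtract the first timestamp
stamps = pd.to_datetime(demo["timestamp"])
demo["time"] = (stamps - stamps.iloc[0]).dt.total_seconds()

print(demo)
```

Because only differences between timestamps are used, the result does not depend on the machine's local time zone, whereas the Unix values in the table above do.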


### Reordering Columns

In [11]:
IMU_data = IMU_data[['timestamp','time','accelerometer_x',
                     'accelerometer_y','accelerometer_z',
                    'gyroscope_x','gyroscope_y','gyroscope_z']]

### Saving the data
Now might be a good time to **save** the groomed csv file to save us trouble/cleaning in the future. We do this using
`df.to_csv()` and specifying a location and file name.

In [12]:
IMU_data.to_csv("./data/Motion_data_stage1.csv",index=False)

Verify this new `.csv` file was saved.

In [13]:
df = pd.read_csv("./data/Motion_data_stage1.csv")
print(df.head(),"\n")


# Or, alternatively, without reading a big file into memory:
#import os
#print(os.path.isfile('./data/Motion_data_stage1.csv'))

                     timestamp  time  accelerometer_x  accelerometer_y  \
0  2018-10-24T11:20:00.0000000  0.00         0.046875        -0.008301   
1  2018-10-24T11:20:00.0100000  0.01         0.049316        -0.003906   
2  2018-10-24T11:20:00.0200000  0.02         0.045898        -0.010742   
3  2018-10-24T11:20:00.0300000  0.03         0.044922        -0.010254   
4  2018-10-24T11:20:00.0400000  0.04         0.050781        -0.011719   

   accelerometer_z  gyroscope_x  gyroscope_y  gyroscope_z  
0         1.015625    -1.403809     1.403809    -0.549316  
1         1.016602    -1.647949     1.342774    -0.549316  
2         1.010254    -2.258301     1.159668    -0.549316  
3         1.014160    -2.258301     1.098633    -0.549316  
4         1.008301    -2.380371     1.037598    -0.549316   



## 5.2 More Preparation
---
We are still within the **data understanding** stage, and we have done some light data cleaning/preparation.


In order to get accurate real-world measurements, we need to attempt to **correct for noise**, positioning of the foot, and other factors.


>There is a trade-off between the most accurate cleaning and the most efficient. If this cleaning and processing were automated and put into production, some steps could take too much time or computation, whether for building a model or just for preparing signals as input to an ML model. Vectorized operations will generally be efficient, but other steps will be more computationally expensive.

Just a foreshadowing, but we plan on implementing a basic K-Nearest Neighbors model after exploring some features of the data.
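
To make the idea concrete before we get there, here is a NumPy-only sketch of how K-Nearest Neighbors classifies a sample: find the k closest training rows and take a majority vote. The function name `knn_predict` and every feature value below are made up for illustration; they are not measurements from this dataset.

```python
import numpy as np

def knn_predict(train_X, train_y, query_X, k=3):
    """Label each query row by majority vote among its k nearest training rows."""
    train_X = np.asarray(train_X, dtype=float)
    query_X = np.asarray(query_X, dtype=float)
    train_y = np.asarray(train_y)
    # Pairwise Euclidean distances, shape (n_query, n_train)
    dists = np.linalg.norm(query_X[:, None, :] - train_X[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    predictions = []
    for row in nearest:
        labels, counts = np.unique(train_y[row], return_counts=True)
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)

# Hypothetical per-window features: [std of accelerometer_z, std of gyroscope_y]
train_X = [[0.01, 0.5], [0.02, 0.8],
           [0.30, 40.0], [0.35, 55.0],
           [0.90, 150.0], [1.10, 170.0]]
train_y = ["standing", "standing", "walking", "walking", "running", "running"]

query = [[0.015, 0.6], [0.95, 155.0]]
predicted = knn_predict(train_X, train_y, query, k=3)
```

Note that the two features here live on very different scales, so the gyroscope column dominates the distance. In practice the features would need standardizing first.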


### Positioning/acceleration - Heuristic stuff

At rest, the only acceleration the IMU registers is gravity: one unit (1 g) along the z-axis. In a 3-dimensional xyz coordinate system, this would give a measurement of (0g, 0g, 1g). Let's average the acceleration of the resting foot to determine if this is the case.
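
A quick version of that check, using only the five accelerometer rows printed earlier in this notebook. I am assuming those first rows fall inside the initial standing phase; a real check would average over a longer, explicitly chosen rest window.

```python
import numpy as np

# Accelerometer (x, y, z) in g for the first five samples shown above
rest = np.array([
    [0.046875, -0.008301, 1.015625],
    [0.049316, -0.003906, 1.016602],
    [0.045898, -0.010742, 1.010254],
    [0.044922, -0.010254, 1.014160],
    [0.050781, -0.011719, 1.008301],
])

mean_acc = rest.mean(axis=0)               # average resting vector
magnitude = np.linalg.norm(mean_acc)       # should be close to 1 g
# Angle between the averaged vector and the z-axis, in degrees
tilt_deg = np.degrees(np.arccos(mean_acc[2] / magnitude))

print(mean_acc, magnitude, tilt_deg)
```

For these rows the average lands near (0.05, -0.01, 1.01) g, so the device is close to level but tilted by a few degrees, consistent with the IMU not sitting perfectly flat on the foot.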

> Note that you would not simply subtract (0,0,1) from every sample. Since the angle of the foot changes, you need to get the foot's angle of rotation, rotate the vector (0,0,1) by that same rotation, and then subtract the resulting vector from the data.
>
>
> > This physical element would be super fun to implement but for now it is a <span style="color: orange">TODO</span>. Again, we will focus on the signal processing aspects of this analysis.

