# Human Performance Monitoring Module (HPMM)
Authors: Jiawei Chen & Ruoxin Xiong, Carnegie Mellon University

Email: ruoxinx@andrew.cmu.edu

# Overview of HPMM Module
This module uses data collected from ASU's Air Traffic Controller Simulation Experiments, in which three 25-minute approach scenarios were simulated: a baseline workload, a high workload under nominal conditions, and a high workload under off-nominal conditions. Information on these data can be found on the ASU ULI website [here](https://uli.asu.edu/wp-content/uploads/2020/08/Presentation-AIAA-talk-2019-Task-3-Human-Systems-Integration.pdf).

The sample data were collected from controller-in-the-loop simulation experiments during air traffic control (ATC) tasks. The performance measure in each scenario was Loss of Separation (LoS), which occurs when aircraft fail to maintain minimum separation distances in controlled airspace. This module uses LoS as an indicator of the air traffic controller's operational performance.


## Installing the required Python packages

The required Python packages for this module are:
- ***[```catboost```](https://catboost.ai/docs/installation/python-installation-method-pip-install.html#python-installation-method-pip-install)***
- ***```pandas```***
- ***```numpy```***
- ***```scikit-learn```*** (imported in Python as ```sklearn```)

In the Ubuntu or Anaconda terminal, execute ```conda install -c conda-forge catboost pandas numpy scikit-learn```.
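
Before running the cells, it can help to confirm that the packages are importable in the active environment. The check below is a minimal sketch and not part of the original module; the helper name `missing_packages` is hypothetical.

```python
from importlib.util import find_spec

def missing_packages(import_names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in import_names if find_spec(name) is None]

# Import names used by this module (scikit-learn is imported as `sklearn`).
required = ["catboost", "pandas", "numpy", "sklearn"]
print("Missing packages:", missing_packages(required) or "none")
```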

## Step 1: Processing and Visualizing ATC Data
### Step 1a: Import ```human_data.csv```

In [None]:
import pandas as pd

df = pd.read_csv('./human_data.csv')

df.head()

### Step 1b: Downselect columns

Drop the columns that are redundant for LoS prediction.

In [None]:
cols_to_drop = ['Ss', 'at_sec', 'condtn', 'ready_latency', 'query_latency', 'response_index',
                'los_dur_over5min', 'query_timed_out', 'ready_timed_out', 'ready_latency_adj',
                'cum_los_dur', 'stimuli', 'response_text', 'condtn_num', 'query']

df.drop(columns=cols_to_drop, inplace=True)

df.head()

### Step 1c: Transform LoS into binary class and fill NaN values

In [None]:
# Binarize LoS: any count above 1 is capped at 1
df.loc[df.los_freq > 1, 'los_freq'] = 1

# Replace NaN values with the mean of their (numeric) column
df.fillna(df.mean(numeric_only=True), inplace=True)

df.head()
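
To make the two transformations concrete, here is a small sketch on a toy DataFrame. Only `los_freq` comes from the dataset; the `workload` column and all values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data; `workload` is a made-up column.
toy = pd.DataFrame({
    "los_freq": [0, 3, 1, 0],
    "workload": [2.0, np.nan, 4.0, 6.0],
})

# Cap counts above 1 so that los_freq becomes a 0/1 label.
toy.loc[toy["los_freq"] > 1, "los_freq"] = 1

# Replace each NaN with the mean of its own numeric column.
toy = toy.fillna(toy.mean(numeric_only=True))

print(toy)
```

The missing `workload` entry is filled with the mean of the three observed values, and the LoS count of 3 becomes 1.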

### Step 1d: Define predictor and criterion
Define the predictors (X) and the criterion (Y).

In [None]:
X = df.drop(columns=['los_freq'])
Y = df['los_freq']

## Step 2: Training and testing LoS prediction models
In this step, we aim to predict the occurrence of LoS with several classification models:
1. ```catboost``` Python package, which provides a gradient-boosted decision tree classifier.
2. Support vector machines (SVM)
3. Decision Tree
4. k-Nearest Neighbors (kNN)
5. Naive Bayes Classifier

### Model Option 1: ```catboost``` model

### Step 2a: Divide data into train and test sets
Split the dataset into 80% for training and 20% for testing.

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=10)

X_train.head()

In [None]:
y_train.head()
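
The split above is purely random. If LoS events turn out to be rare in the data (an assumption; the class balance is not stated here), a stratified split is one way to keep the same LoS rate in both subsets. The sketch below uses synthetic labels, not the ATC data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels: 80 samples of class 0 and 20 of class 1.
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([0] * 80 + [1] * 20)

# stratify=y_demo keeps the 80/20 class ratio in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=10, stratify=y_demo
)

print("train positive rate:", y_tr.mean())
print("test positive rate:", y_te.mean())
```

In the notebook, the equivalent change would be adding `stratify=Y` to the existing `train_test_split` call.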

### Step 2b: Define ```catboost``` model, loss function, and evaluation metric
Here we use the ```CatBoostClassifier``` model with ```Logloss``` as the loss function and ```Accuracy``` as the evaluation metric.

In [None]:
from catboost import CatBoostClassifier

model = CatBoostClassifier(
    loss_function='Logloss',
    eval_metric='Accuracy')
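
The two settings measure different things. Logloss is the mean of -[y log p + (1 - y) log(1 - p)] over the samples, so it rewards well-calibrated probabilities, while Accuracy only counts correct labels after thresholding. The sketch below illustrates the difference with made-up probabilities; the helper names are hypothetical and this is not CatBoost's internal implementation.

```python
import math

def log_loss_binary(y_true, p_pred):
    """Mean binary cross-entropy between 0/1 labels and predicted P(label=1).

    Assumes every probability is strictly between 0 and 1.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def accuracy(y_true, p_pred, threshold=0.5):
    """Fraction of labels matched after thresholding the probabilities."""
    hits = sum(int(p >= threshold) == y for y, p in zip(y_true, p_pred))
    return hits / len(y_true)

# Made-up labels and probabilities, for illustration only.
y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]
hesitant = [0.6, 0.4, 0.6, 0.4]

print(log_loss_binary(y_true, confident), accuracy(y_true, confident))
print(log_loss_binary(y_true, hesitant), accuracy(y_true, hesitant))
```

Both sets of predictions classify every sample correctly, but the more confident set has the lower Logloss.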

### Step 2c: Search for learning rate, depth, and L2 regularization
The optimal model parameters for learning rate, depth, and L2 regularization are determined based on a grid search.

In [None]:
# Define grid options
grid = {'learning_rate': [0.01, 0.05, 0.1],
        'depth': [4, 6, 10],
        'l2_leaf_reg': [1, 3, 5, 7, 9]}

# Search grid
model.grid_search(grid, X=X_train, y=y_train, verbose=2)
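
A grid search tries every combination of the listed values, so the grid above contains 3 x 3 x 5 = 45 candidate settings. The sketch below only enumerates those combinations with the standard library; it does not reproduce how ```catboost``` trains or scores each candidate.

```python
from itertools import product

grid = {'learning_rate': [0.01, 0.05, 0.1],
        'depth': [4, 6, 10],
        'l2_leaf_reg': [1, 3, 5, 7, 9]}

# Every combination of one value per parameter, as a list of dicts.
names = list(grid)
candidates = [dict(zip(names, values)) for values in product(*grid.values())]

print(len(candidates))
print(candidates[0])
```

Because the number of candidates is the product of the list lengths, adding values to any one parameter multiplies the search time.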

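### Step 2d: Determine best model parameters
After the grid search, the model is expected to hold the best parameter combination found, which ```get_params()``` returns.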


In [None]:
model.get_params()

## Step 3: ```catboost``` Model Evaluation

### Step 3a: Prediction with test holdout data
First we will predict the LoS output for ```X_test``` and compare it with ```y_test```.

We will evaluate the model based on three criteria:
- Precision
- Recall
- F1-score

In [None]:
from sklearn.metrics import classification_report

y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
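
For reference, the three criteria can be computed directly from the counts of true positives (TP), false positives (FP), and false negatives (FN) for the LoS class. The sketch below uses hypothetical counts, not results from this dataset; `binary_scores` is an illustrative helper.

```python
def binary_scores(tp, fp, fn):
    """Precision, recall, and F1 for the positive class from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical counts for the LoS class: 8 hits, 2 false alarms, 4 misses.
precision, recall, f1 = binary_scores(tp=8, fp=2, fn=4)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Precision answers "how many predicted LoS events were real", recall answers "how many real LoS events were caught", and F1 is their harmonic mean.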

### Step 3b: Visualize confusion matrix for test holdout data
A confusion matrix gives a visual indication of how accurately the model predicts the two labels:
- label(0): no LoS
- label(1): LoS

In [None]:
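from sklearn.metrics import ConfusionMatrixDisplay

# plot_confusion_matrix was removed in scikit-learn 1.2; plot from the predictions instead
ConfusionMatrixDisplay.from_predictions(y_test, y_pred)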
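
The numbers behind the plot can also be read out directly. The sketch below uses made-up labels to show how the four cells map to true negatives, false positives, false negatives, and true positives.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels: 0 = no LoS, 1 = LoS.
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred_demo = [0, 0, 1, 0, 1, 0, 1, 0]

# Rows are true labels, columns are predicted labels.
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```

For LoS prediction, the false negatives (missed LoS events) are the bottom-left cell of the matrix.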

## Step 4: Assessing feature importance
For each of the 11 features in the data, the feature importance value shows how much, on average, the prediction changes if the feature value changes. The larger the importance, the larger the expected change in the prediction. Feature importance values are normalized so that they sum to 100.

### Step 4a: Get feature importances

In [None]:
fea_ = model.feature_importances_

fea_name = [str(j) for j in X.columns]

for f_name, f in zip(fea_name, fea_):
    print(f_name, ':', f)
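
The printed list is easier to read when sorted. The sketch below ranks importances with a pandas Series; the feature names and values are made up. In the notebook, the same Series could be built as `pd.Series(model.feature_importances_, index=X.columns)`.

```python
import pandas as pd

# Made-up importances for three hypothetical features.
importances = pd.Series(
    [12.5, 60.0, 27.5],
    index=["feature_a", "feature_b", "feature_c"],
)

# Most important feature first.
ranked = importances.sort_values(ascending=False)

print(ranked)
print("sum:", ranked.sum())
```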

## Step 5: Assess Other Modeling Options
### Model 2: Support Vector Machines (SVM)

In [None]:
from sklearn.svm import SVC

svm = SVC(gamma=2, C=1)
svm.fit(X_train, y_train)

pred = svm.predict(X_test)
print(classification_report(y_test, pred))
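
An SVM with the default RBF kernel works on distances between samples, so features with large numeric ranges can dominate. One common remedy is to standardize the features inside a pipeline. The sketch below shows the pattern on synthetic data; whether it changes the results on the ATC data is not evaluated here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the ATC features (not the real data).
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=10
)

# The scaler is fit on the training split only, then applied to the test split.
scaled_svm = make_pipeline(StandardScaler(), SVC(gamma=2, C=1))
scaled_svm.fit(X_tr, y_tr)

print("test accuracy:", scaled_svm.score(X_te, y_te))
```

Putting the scaler in the pipeline avoids leaking test-set statistics into training.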

In [None]:
from sklearn.metrics import ConfusionMatrixDisplay
ConfusionMatrixDisplay.from_estimator(svm, X_test, y_test)

### Model 3: Decision Tree

In [None]:
from sklearn.tree import DecisionTreeClassifier

dt = DecisionTreeClassifier(max_depth=5)
dt.fit(X_train, y_train)

pred = dt.predict(X_test)
print(classification_report(y_test, pred))

In [None]:
from sklearn.metrics import ConfusionMatrixDisplay
ConfusionMatrixDisplay.from_estimator(dt, X_test, y_test)

### Model 4: k-Nearest Neighbors (kNN)

In [None]:
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(3)
knn.fit(X_train, y_train)

pred = knn.predict(X_test)
print(classification_report(y_test, pred))

In [None]:
from sklearn.metrics import ConfusionMatrixDisplay
ConfusionMatrixDisplay.from_estimator(knn, X_test, y_test)

### Model 5: Naive Bayes

In [None]:
from sklearn.naive_bayes import GaussianNB

# Gaussian Naive Bayes classifier
nb = GaussianNB()
nb.fit(X_train, y_train)

pred = nb.predict(X_test)
print(classification_report(y_test, pred))

In [None]:
from sklearn.metrics import ConfusionMatrixDisplay
ConfusionMatrixDisplay.from_estimator(nb, X_test, y_test)
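
To compare the alternative models side by side, the fit-and-score steps can be run in a loop. The sketch below uses synthetic data and the same hyperparameters as the cells above (with a `random_state` added to the decision tree for reproducibility); the scores it prints say nothing about performance on the ATC data.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; in the notebook, use X_train/X_test/y_train/y_test.
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=10
)

models = {
    "SVM": SVC(gamma=2, C=1),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "kNN": KNeighborsClassifier(3),
    "Naive Bayes": GaussianNB(),
}

# Fit each model and record its F1-score for the positive class.
f1_by_model = {}
for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    f1_by_model[name] = f1_score(y_te, clf.predict(X_te))

# Print the models from highest to lowest F1-score.
for name, score in sorted(f1_by_model.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: F1 = {score:.3f}")
```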