# Human Performance Monitoring Module (HPMM)

Authors: Ruoxin Xiong, Carnegie Mellon University; Jiawei Chen, Arizona State University; Pingbo Tang, Carnegie Mellon University

Email: ruoxinx@andrew.cmu.edu

This module predicts loss of separation (LOS), an indicator of an air traffic controller's operational performance, using five machine-learning classifiers.

Loss of separation is a situation in which aircraft fail to maintain the minimum required distance from one another in controlled airspace.
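
To make the definition concrete, the following minimal sketch checks one pair of aircraft against separation minima. The threshold values, the field names, and the function `is_loss_of_separation` are assumptions made for illustration; they are not taken from this module or its dataset.

```python
import math

# Illustrative thresholds only (assumed for this sketch, not taken from the module).
MIN_HORIZONTAL_NM = 5.0
MIN_VERTICAL_FT = 1000.0

def is_loss_of_separation(a, b):
    """Return True if aircraft a and b are closer than both minima at once.

    Each aircraft is a dict with a planar position in nautical miles
    ('x_nm', 'y_nm') and an altitude in feet ('alt_ft').
    """
    horizontal = math.hypot(a["x_nm"] - b["x_nm"], a["y_nm"] - b["y_nm"])
    vertical = abs(a["alt_ft"] - b["alt_ft"])
    return horizontal < MIN_HORIZONTAL_NM and vertical < MIN_VERTICAL_FT

ac1 = {"x_nm": 0.0, "y_nm": 0.0, "alt_ft": 30000.0}
ac2 = {"x_nm": 3.0, "y_nm": 4.0, "alt_ft": 30400.0}  # exactly 5 NM from ac1
ac3 = {"x_nm": 1.0, "y_nm": 1.0, "alt_ft": 30400.0}  # about 1.41 NM from ac1

print(is_loss_of_separation(ac1, ac2))  # False
print(is_loss_of_separation(ac1, ac3))  # True
```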

More information about CatBoost can be found [here](https://arxiv.org/abs/1706.09516).

### Environment Requirements

The required packages are:

- [catboost](https://catboost.ai/docs/installation/python-installation-method-pip-install.html#python-installation-method-pip-install)
- pandas
- numpy
- scikit-learn
- matplotlib

## Importing libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import sklearn.preprocessing
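# Note: plot_confusion_matrix was removed in scikit-learn 1.2; on newer versions,
# ConfusionMatrixDisplay.from_estimator(estimator, X, y) provides the same plot.
from sklearn.metrics import plot_confusion_matrix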
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

from catboost import CatBoostClassifier

## Data Preparation

The sample data were collected during air traffic control tasks in controller-in-the-loop simulation experiments.

In [None]:
df = pd.read_csv('./human_data.csv')
df.info()

## Data Processing
- Drop columns that are redundant for LOS prediction.
    - Here we drop 'Ss', 'at_sec', 'condtn', 'ready_latency', 'query_latency', 'response_index', 'los_dur_over5min', 'query_timed_out', 'ready_timed_out', 'ready_latency_adj', 'cum_los_dur', 'stimuli', 'response_text', 'condtn_num', 'query'.
- Impute missing values with the column means.
- Define the predictors (X) and the criterion (Y).
    - Transform 'los_freq' into binary classes (0: no loss of separation; 1: one or more losses of separation).

In [None]:
df.drop(columns=['Ss', 'at_sec', 'condtn', 'ready_latency', 'query_latency', 'response_index', 'los_dur_over5min','query_timed_out', 'ready_timed_out', 'ready_latency_adj',
                 'cum_los_dur','stimuli', 'response_text', 'condtn_num', 'query'], inplace=True)

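# Binarize the target: cap counts above 1 so 'los_freq' is 0 (no LOS) or 1 (LOS).
# Only the target column is modified.
df.loc[df['los_freq'] > 1, 'los_freq'] = 1

# Impute missing values with the column means (numeric columns only).
df.fillna(df.mean(numeric_only=True), inplace=True)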

df.info()
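
The two transformations applied to the data (capping the target at 1 and mean imputation) can be seen on a tiny example. This is a plain-Python sketch with made-up values, not the notebook's pandas code; only the column names `los_freq` and `wl_rating` come from the dataset.

```python
from statistics import mean

# Made-up rows; None marks a missing measurement.
rows = [
    {"los_freq": 0, "wl_rating": 3.0},
    {"los_freq": 2, "wl_rating": None},
    {"los_freq": 1, "wl_rating": 5.0},
]

# Binarize the target: any count above 1 is capped at 1, so the column holds 0 or 1.
for row in rows:
    if row["los_freq"] > 1:
        row["los_freq"] = 1

# Mean imputation: replace missing values with the mean of the observed ones.
observed = [row["wl_rating"] for row in rows if row["wl_rating"] is not None]
fill_value = mean(observed)
for row in rows:
    if row["wl_rating"] is None:
        row["wl_rating"] = fill_value

print([row["los_freq"] for row in rows])   # [0, 1, 1]
print([row["wl_rating"] for row in rows])  # [3.0, 4.0, 5.0]
```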

In [None]:
X = df.drop(columns=['los_freq'])
Y = df['los_freq']

# Training and testing the LOS prediction models

- Predict the occurrence of loss of separation with five machine-learning classifiers:
    - CatBoost
    - Support Vector Machines (SVM)
    - Decision Tree
    - k-Nearest Neighbors (kNN)
    - Naive Bayes
 
- Split the dataset into 80% and 20% for training and testing, respectively.
- The optimal CatBoost hyperparameters (learning rate, tree depth, and L2 regularization coefficient) are selected by grid search.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=10)
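
As a rough illustration of what an 80/20 split with a fixed seed does, the sketch below shuffles the samples reproducibly and cuts off one fifth for testing. The helper `split_80_20` is hypothetical and does not reproduce the exact shuffling used by `train_test_split`; it only shows the idea.

```python
import math
import random

def split_80_20(samples, seed=10):
    """Shuffle a copy of samples and return (train, test) with a 20% test share."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_test = math.ceil(0.2 * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]

train, test = split_80_20(range(10))
print(len(train), len(test))  # 8 2
```

Fixing the seed matters when comparing several classifiers: every model is then trained and tested on the same rows.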

## Model 1: CatBoost model

In [None]:
cb = CatBoostClassifier(
    loss_function='Logloss',
    eval_metric='Accuracy'
)

grid = {
    'learning_rate': [0.01, 0.05, 0.1],
    'depth': [4, 6, 10],
    'l2_leaf_reg': [1, 3, 5, 7, 9],
}

cb.grid_search(grid, X=X_train, y=y_train, verbose=2)
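
An exhaustive grid search scores every combination of the listed values. The sketch below only enumerates the candidates for the grid used here, to show how many models are compared; it does not train anything, and the variable names are illustrative.

```python
from itertools import product

grid = {
    "learning_rate": [0.01, 0.05, 0.1],
    "depth": [4, 6, 10],
    "l2_leaf_reg": [1, 3, 5, 7, 9],
}

# One candidate per element of the Cartesian product of the value lists.
names = list(grid)
candidates = [dict(zip(names, values)) for values in product(*grid.values())]

print(len(candidates))  # 45
print(candidates[0])    # {'learning_rate': 0.01, 'depth': 4, 'l2_leaf_reg': 1}
```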

Obtain the optimal hyperparameters

In [None]:
cb.get_params()

## CatBoost Model Evaluation

- Precision
- Recall
- F1-score

In [None]:
pred = cb.predict(X_test)
print(classification_report(y_test, pred))
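
For reference, the three metrics in the report can be computed by hand for the positive (LOS) class. The following is a minimal sketch on made-up labels; the function name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1-score for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

# 3 true positives, 1 false positive, 1 false negative
print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)
```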

Visualize the confusion matrix:
- label(0): no LOS
- label(1): LOS

In [None]:
plot_confusion_matrix(cb, X_test, y_test) 
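
The plotted matrix counts test samples by true label (rows) and predicted label (columns). A minimal sketch of that bookkeeping on made-up binary labels, using a hypothetical helper `confusion_matrix_2x2`:

```python
def confusion_matrix_2x2(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for labels 0 (no LOS) and 1 (LOS)."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1  # row = true label, column = predicted label
    return matrix

y_true = [0, 0, 0, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 0]

print(confusion_matrix_2x2(y_true, y_pred))  # [[3, 1], [1, 1]]
```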

## Model 2: Support Vector Machines (SVM)

In [None]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

In [None]:
svm = SVC(gamma=2, C=1)
svm.fit(X_train, y_train)

In [None]:
pred = svm.predict(X_test)
print(classification_report(y_test, pred))

In [None]:
plot_confusion_matrix(svm, X_test, y_test) 

## Model 3: Decision Tree

In [None]:
dt = DecisionTreeClassifier(max_depth=5)
dt.fit(X_train, y_train)

In [None]:
pred = dt.predict(X_test)
print(classification_report(y_test, pred))

In [None]:
plot_confusion_matrix(dt, X_test, y_test) 

## Model 4: k-Nearest Neighbors (kNN)

In [None]:
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
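
With three neighbors, a kNN classifier assigns each sample the majority label among its three closest training points. The sketch below shows that rule in plain Python with Euclidean distance on made-up 2-D points; `knn_predict` is a hypothetical helper, not the scikit-learn implementation.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1)]
train_y = [0, 0, 0, 1, 1]

print(knn_predict(train_X, train_y, (0.15, 0.15)))  # 0
print(knn_predict(train_X, train_y, (0.95, 1.0)))   # 1
```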

In [None]:
pred = knn.predict(X_test)
print(classification_report(y_test, pred))

In [None]:
plot_confusion_matrix(knn, X_test, y_test) 

## Model 5: Naive Bayes

In [None]:
nb = GaussianNB()
nb.fit(X_train, y_train)

In [None]:
pred = nb.predict(X_test)
print(classification_report(y_test, pred))

In [None]:
plot_confusion_matrix(nb, X_test, y_test) 

# Feature importance

- For each feature, the importance value shows how much the prediction changes, on average, when the feature value changes. The larger the importance, the larger the change in the predicted value.

- The input features are:
    - facial expressions: positive, neutral, and negative
    - eyeblink
    - head pose: rx, ry, rz
    - heart beat interval: interbeat_interval
    - wl_rating: workload rating
    - sa_correct: situation awareness
    - traffic density: number of aircraft
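
The idea behind this kind of importance can be illustrated by shifting one feature at a time and measuring how far the model output moves. The sketch below uses a hypothetical hand-written scoring function (`toy_model`) with made-up weights and data. CatBoost derives its values from the trained trees rather than by this perturbation, so the code is only an intuition aid.

```python
def toy_model(features):
    """Hypothetical scoring function standing in for a trained classifier."""
    return 0.8 * features["wl_rating"] + 0.1 * features["eyeblink"]

def prediction_change(model, rows, feature, delta=1.0):
    """Average absolute change in the model output when one feature shifts by delta."""
    total = 0.0
    for row in rows:
        shifted = dict(row)
        shifted[feature] = row[feature] + delta
        total += abs(model(shifted) - model(row))
    return total / len(rows)

# Made-up observations
rows = [
    {"wl_rating": 2.0, "eyeblink": 10.0},
    {"wl_rating": 4.0, "eyeblink": 14.0},
]

importance = {name: prediction_change(toy_model, rows, name) for name in rows[0]}
# wl_rating moves the output about eight times as much as eyeblink
print(importance)
```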

In [None]:
# calculate the feature importance of CatBoost
model = cb

fea_ = model.feature_importances_

fea_name = list(X.columns)
fea_name = [str(j) for j in fea_name]

# plt.figure(figsize=(12, 8))
plt.title('CatBoost feature importance')
plt.xlabel('FEATURE IMPORTANCE')
# plt.ylabel('FEATURE NAMES')
plt.barh(fea_name, fea_, height=0.5)