# Bayesian Neural Networks to Predict Hard Landing with DASHlink Data
Authors: Dr. Yingxiao Kong, Vanderbilt University

Email: yingxiao.kong@vanderbilt.edu

## Overview of Research
In this work, we use an open-source dataset, [NASA's DASHlink data](https://c3.ndc.nasa.gov/dashlink/), to isolate landing data that include both hard landings and normal landings. The objective is to use [this sample data](https://c3.ndc.nasa.gov/dashlink/projects/85/resources/?type=ds) to train a Bayesian Neural Network model that predicts the touchdown vertical speed of a landing aircraft, with the intent of using it as a screening tool to identify hard landing events before they occur.

This series of Jupyter notebook demonstrations is divided into 3 modules. The presented module is in **bold**:
- Module 1 - Download DASHlink Data
- **Module 2 - DASHlink Data Pre-Processing and Feature Selection with Maximum Relevance and Minimum Redundancy (MRMR)**
- Module 3 - Bayesian Neural Network Model Training

## Module 2: DASHlink Data Pre-Processing and Parameter Selection
This is a demonstration of how to filter, standardize, and clean the DASHlink data downloaded and viewed in Module 1. In addition, we select the features most relevant to hard landing events.


## Installing the required Python packages

The required Python packages for this module are:
- ***```pandas```***
- ***```numpy```***
- ***```scipy```***
- ***```matplotlib```***

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats  # used for the Spearman correlation in Step 2c

## Step 1: Isolate Data Landing at MSP

### Step 1a: Get all downloaded ```.mat``` files

In [2]:
import glob
OUTPUT_DIRECTORY = r'../../../dashlink-data'
downloaded_mat_files = glob.glob(OUTPUT_DIRECTORY+'/**/*.mat',recursive=True)
downloaded_mat_files

['../../../dashlink-data/Tail_687_8/687200312261442.mat',
 '../../../dashlink-data/Tail_687_8/687200403210418.mat',
 '../../../dashlink-data/Tail_687_8/687200312300437.mat',
 '../../../dashlink-data/Tail_687_8/687200309090752.mat',
 '../../../dashlink-data/Tail_687_8/687200403162255.mat',
 '../../../dashlink-data/Tail_687_8/687200403191335.mat',
 '../../../dashlink-data/Tail_687_8/687200311301726.mat',
 '../../../dashlink-data/Tail_687_8/687200402021225.mat',
 '../../../dashlink-data/Tail_687_8/687200312241558.mat',
 '../../../dashlink-data/Tail_687_8/687200401281006.mat',
 '../../../dashlink-data/Tail_687_8/687200312180425.mat',
 '../../../dashlink-data/Tail_687_8/687200402081856.mat',
 '../../../dashlink-data/Tail_687_8/687200403211154.mat',
 '../../../dashlink-data/Tail_687_8/687200403130700.mat',
 '../../../dashlink-data/Tail_687_8/687200310291853.mat',
 '../../../dashlink-data/Tail_687_8/687200402010708.mat',
 '../../../dashlink-data/Tail_687_8/687200309111920.mat',
 '../../../das

### Step 1b: Isolate landing data at MSP at specified heights
According to the DASHlink website, the ```PH``` enumerated codes are: 
- 0=Unknown
- 1=Preflight
- 2=Taxi
- 3=Takeoff
- 4=Climb
- 5=Cruise
- 6=Approach
- 7=Rollout

Here we are interested in ```PH = 7``` at heights of 200,100,50,40,30,20,10,8,6,4,2, and 0 feet above landing altitude. 
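
To avoid bare integers when filtering on `PH`, the codes listed above can be wrapped in an enumeration. This is only a minimal sketch: the `FlightPhase` class is a hypothetical helper and is not part of `dataUtils`, and the `ph_trace` values are made up for illustration.

```python
from enum import IntEnum


class FlightPhase(IntEnum):
    """PH codes as listed above (hypothetical helper, not part of dataUtils)."""
    UNKNOWN = 0
    PREFLIGHT = 1
    TAXI = 2
    TAKEOFF = 3
    CLIMB = 4
    CRUISE = 5
    APPROACH = 6
    ROLLOUT = 7


# Toy PH trace for illustration only (not real DASHlink data).
ph_trace = [5, 5, 6, 6, 6, 7, 7]

# IntEnum members compare equal to plain integers, so they can be used directly.
rollout_samples = [i for i, ph in enumerate(ph_trace) if ph == FlightPhase.ROLLOUT]
print(FlightPhase(7).name, rollout_samples)
```

Because `IntEnum` members behave like integers, `FlightPhase.ROLLOUT` could be passed anywhere the notebook currently uses the literal `7`.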

In [4]:
PHASE_NO = 7
MSP_AIRPORT_LAT_LON = [44.88526995556498, -93.2015923365669]
HEIGHT_LIST = np.array([200,100,50,40,30,20,10,8,6,4,2,0])
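
The airport coordinates are used in the next cell by `lands_at_airport`, whose implementation lives in `dataUtils` and is not shown in this notebook. One plausible way to perform such a check is to compare the great-circle distance between the touchdown point and the airport against a threshold. The sketch below assumes that approach; `haversine_km`, `lands_near`, and the 10 km threshold are hypothetical and chosen only for illustration.

```python
import math

MSP_AIRPORT_LAT_LON = [44.88526995556498, -93.2015923365669]


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def lands_near(touchdown_lat_lon, airport_lat_lon, max_km=10.0):
    """Hypothetical check: True if the touchdown point lies within max_km of the airport."""
    return haversine_km(*touchdown_lat_lon, *airport_lat_lon) <= max_km


# Touchdown point taken from the TD_LAT / TD_LON columns of the output in this module.
td_point = (44.886191, -93.228955)
# A point far from MSP (roughly Nashville, TN) for contrast.
far_point = (36.12, -86.68)

print(haversine_km(*td_point, *MSP_AIRPORT_LAT_LON))
print(lands_near(td_point, MSP_AIRPORT_LAT_LON), lands_near(far_point, MSP_AIRPORT_LAT_LON))
```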

In [5]:
from dataUtils import DASHlinkData

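# The 25 recorded parameters kept for each landing
key_list_25=['LATP','LONP','MSQT_1','BAL1','TAS','GS','TH','FLAP','GLS','LOC','N1_1','PTCH','ROLL',
             'TRK','AIL_1','RUDD','ELEV_1','BLAC','CTAC','FPAC','CCPC','CWPC','WS','WD','ALTR']

SUBSET_FOR_DEMO = 40  # only the first 40 files are processed to keep the demo short
landing_mat_files = []
dfs = []

for i,mat_file in enumerate(downloaded_mat_files[:SUBSET_FOR_DEMO]):
    print("Processing {} of {} .mat files.".format(i+1,len(downloaded_mat_files)))
    dl_data = DASHlinkData(mat_file)
    # Keep only flights that contain the phase of interest and land at MSP
    if dl_data.contains_phase_no(phase=PHASE_NO) and dl_data.lands_at_airport(airport_lat_lon=MSP_AIRPORT_LAT_LON):
        # Bring every parameter onto a common time base before sampling
        for key in key_list_25:
            dl_data.temporal_resample_to_4_seconds(key)
        # One row per requested height above the touchdown altitude
        df_new = dl_data.get_data_at_heights_in_ft(HEIGHT_LIST)
        dfs.append(df_new)

# Stack the per-flight tables into a single DataFrame
df_landing = pd.concat(dfs)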
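
The `get_data_at_heights_in_ft` method is defined in `dataUtils` and is not reproduced in this notebook. A minimal sketch of the underlying idea, assuming height is measured as altitude minus the touchdown altitude and that each parameter is linearly interpolated at the requested heights, could look like the following. The function name `sample_at_heights` and the toy approach profile are assumptions made for illustration, not the author's implementation.

```python
import numpy as np

HEIGHT_LIST = np.array([200, 100, 50, 40, 30, 20, 10, 8, 6, 4, 2, 0])


def sample_at_heights(altitude_ft, values, heights_ft):
    """Interpolate `values` at the requested heights above the touchdown altitude.

    Assumes the last altitude sample is the touchdown point and that altitude
    decreases monotonically over the final approach (an idealisation).
    """
    altitude_ft = np.asarray(altitude_ft, dtype=float)
    values = np.asarray(values, dtype=float)
    height = altitude_ft - altitude_ft[-1]
    # np.interp requires increasing x, so reverse the descending approach.
    return np.interp(heights_ft, height[::-1], values[::-1])


# Toy approach: steady descent from 1100 ft to 800 ft while airspeed drops linearly.
alt = np.linspace(1100.0, 800.0, 61)
tas = np.linspace(130.0, 100.0, 61)

tas_at_heights = sample_at_heights(alt, tas, HEIGHT_LIST)
print(dict(zip(HEIGHT_LIST.tolist(), np.round(tas_at_heights, 2).tolist())))
```

Real approaches are not perfectly monotonic, so a production version would need to decide which crossing of each height to use; the sketch ignores that case.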

Processing 1 of 5376 .mat files.
Processing 2 of 5376 .mat files.
Processing 3 of 5376 .mat files.
Processing 4 of 5376 .mat files.
Processing 5 of 5376 .mat files.
Processing 6 of 5376 .mat files.
Processing 7 of 5376 .mat files.
Processing 8 of 5376 .mat files.
Processing 9 of 5376 .mat files.
Processing 10 of 5376 .mat files.
Processing 11 of 5376 .mat files.
Processing 12 of 5376 .mat files.
Processing 13 of 5376 .mat files.
Processing 14 of 5376 .mat files.
Processing 15 of 5376 .mat files.
Processing 16 of 5376 .mat files.
Processing 17 of 5376 .mat files.
Processing 18 of 5376 .mat files.
Processing 19 of 5376 .mat files.
Processing 20 of 5376 .mat files.
Processing 21 of 5376 .mat files.
Processing 22 of 5376 .mat files.
Processing 23 of 5376 .mat files.
Processing 24 of 5376 .mat files.
Processing 25 of 5376 .mat files.
Processing 26 of 5376 .mat files.
Processing 27 of 5376 .mat files.
Processing 28 of 5376 .mat files.
Processing 29 of 5376 .mat files.
Processing 30 of 5376 .

In [6]:
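# Each DataFrame in dfs comes from one processed file that landed at MSP
n_processed = min(SUBSET_FOR_DEMO, len(downloaded_mat_files))
print("Number of Files with Landing Aircraft: {} out of {} processed.".format(len(dfs),n_processed))

Number of Files with Landing Aircraft: 7 out of 40 processed.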


In [10]:
df_landing

Unnamed: 0,LATP,LONP,MSQT_1,BAL1,TAS,GS,TH,FLAP,GLS,LOC,...,CWPC,WS,WD,ALTR,TD_ALTR,TD_LAT,TD_LON,TD_ALT,heights,DIST
0,44.891682,-93.241822,1.0,1006.643045,117.933373,115.007588,125.581422,3645.000000,0.014040,-0.003136,...,1782.000000,10.948235,-159.834434,-536.630359,-104.780302,44.886191,-93.228955,806.643045,200,1185.595409
1,44.888764,-93.234960,1.0,906.643045,115.530863,111.638458,125.322379,3645.000000,0.183690,0.003136,...,1795.861409,9.932823,-165.738392,-709.834437,-104.780302,44.886191,-93.228955,806.643045,100,553.902372
2,44.887734,-93.232386,1.0,856.643045,113.021779,110.749442,123.876531,3645.000000,0.133380,0.003332,...,1672.000000,6.212059,-163.252476,-823.996865,-104.780302,44.886191,-93.228955,806.643045,50,320.767080
3,44.887502,-93.231929,1.0,846.643045,112.537305,110.196048,124.172654,3645.000000,0.136028,0.002001,...,1718.580098,6.975467,-166.607074,-688.934215,-104.780302,44.886191,-93.228955,806.643045,40,276.478561
4,44.887351,-93.231586,1.0,836.643045,111.188473,109.765476,124.137817,3645.000000,0.133199,0.002040,...,1748.036175,6.994513,-160.733711,-773.697501,-104.780302,44.886191,-93.228955,806.643045,30,244.578379
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7,44.886235,-93.230536,1.0,853.846249,117.818836,105.893662,118.910790,3645.371366,0.037429,-0.000711,...,1799.060325,12.607906,101.716209,-721.149992,-104.394851,44.885846,-93.229629,845.846249,8,83.675778
8,44.886018,-93.229997,1.0,851.846249,118.375135,105.495087,119.058555,3646.000000,0.039390,-0.000588,...,1767.000000,13.923377,104.576085,-699.578103,-104.394851,44.885846,-93.229629,845.846249,6,34.788936
9,44.886018,-93.229997,1.0,849.846249,117.607793,105.041968,119.227015,3646.000000,0.039390,-0.000588,...,1848.057368,14.655849,106.734810,-600.032488,-104.394851,44.885846,-93.229629,845.846249,4,34.788936
10,44.885930,-93.229808,1.0,847.846249,113.495485,104.123699,119.553710,3645.487424,0.302065,-0.000688,...,1788.660839,11.736493,107.132666,-391.609928,-104.394851,44.885846,-93.229629,845.846249,2,16.956967


In [11]:
df_landing.to_csv('processed_data_landing_at_msp.csv',index=False)

## Step 2: Feature Selection with Maximum Relevance Minimum Redundancy (MRMR)
The original 186 parameters are cut down to 26 based on a literature review. These 26 parameters are then ranked using Maximum Relevance Minimum Redundancy (MRMR).
The data is first smoothed and then sliced at the selected heights, and the average of each parameter is calculated.
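
For reference, the greedy criterion used by the Step 2c code in this module can be written compactly. The notation is introduced here for illustration: $\rho_s$ is the Spearman rank correlation, $y$ is the target, $S$ is the set of features already selected, and $\alpha$ is the redundancy weight (the values in `coe_list`).

```latex
$$
f_{1} = \arg\max_{f} \left| \rho_s(f, y) \right|,
\qquad
f_{k+1} = \arg\max_{f \notin S}
\left[ \left| \rho_s(f, y) \right|
 - \alpha \, \frac{1}{|S|} \sum_{g \in S} \left| \rho_s(f, g) \right| \right]
$$
```

With $\alpha = 0$ the ranking reduces to sorting by relevance alone; larger $\alpha$ penalizes features that duplicate information already selected.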

### Step 2a: Group data by height with ```pandas.groupby```

In [22]:
grpby = df_landing.groupby(by='heights')
grpby.get_group(0)

Unnamed: 0,LATP,LONP,MSQT_1,BAL1,TAS,GS,TH,FLAP,GLS,LOC,...,CWPC,WS,WD,ALTR,TD_ALTR,TD_LAT,TD_LON,TD_ALT,heights,DIST
11,44.886191,-93.228955,1.0,806.643045,103.976395,103.616195,123.378983,3645.0,0.27144,0.002744,...,1667.0,4.278736,-155.074287,-104.780302,-104.780302,44.886191,-93.228955,806.643045,0,0.0
11,44.885846,-93.226045,1.0,799.291242,117.35854,119.301601,123.410491,3645.0,-0.10959,0.00588,...,2236.0,6.403527,-64.065263,-177.267211,-177.267211,44.885846,-93.226045,799.291242,0,0.0
11,44.883786,-93.199606,1.0,822.106482,105.834217,95.831617,-57.981012,3645.0,-0.02184,0.011368,...,1734.0,8.281502,-54.447144,125.745074,125.745074,44.883786,-93.199606,822.106482,0,0.0
11,44.885333,-93.226382,1.0,793.020025,112.328895,108.958568,121.460417,3645.0,-0.24882,0.004704,...,2080.0,3.900689,147.137838,-270.814762,-270.814762,44.885333,-93.226382,793.020025,0,0.0
11,44.882756,-93.198228,1.0,778.709209,110.752766,104.562259,-57.987705,3645.0,0.5421,-0.0049,...,2210.0,6.96871,-33.283197,-32.735479,-32.735479,44.882756,-93.198228,778.709209,0,0.0
11,44.886704,-93.229629,1.0,807.352945,111.45682,118.017037,123.590224,3645.0,-0.26325,0.001568,...,2370.0,7.072058,-104.166199,40.15815,40.15815,44.886704,-93.229629,807.352945,0,0.0
11,44.885846,-93.229629,1.0,845.846249,113.884462,102.947668,119.889758,3645.0,0.55185,-0.000784,...,1617.0,11.175749,108.730358,-104.394851,-104.394851,44.885846,-93.229629,845.846249,0,0.0


In [35]:
grpby.get_group(200).heights

0    200
0    200
0    200
0    200
0    200
0    200
0    200
Name: heights, dtype: int64
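
The repeated index labels in the output above (`0` at 200 ft, `11` at touchdown) are what `pd.concat` produces when each per-flight table keeps its own row index. If rows need to stay traceable to a flight, one option is to pass `keys` to `pd.concat`. The sketch below uses made-up values and hypothetical flight labels purely to show the mechanism.

```python
import pandas as pd

# Two toy "flights", each sampled at the same three heights (illustrative values only).
flight_a = pd.DataFrame({"heights": [200, 100, 0], "TAS": [127.0, 120.0, 104.0]})
flight_b = pd.DataFrame({"heights": [200, 100, 0], "TAS": [125.0, 118.0, 117.0]})

# Plain concat keeps each flight's own 0..2 index, so labels repeat.
stacked = pd.concat([flight_a, flight_b])

# Passing keys adds a flight-level index, so every row stays traceable to its flight.
labelled = pd.concat(
    [flight_a, flight_b], keys=["flight_a", "flight_b"], names=["flight", "row"]
)

at_touchdown = labelled.groupby("heights").get_group(0)
print(stacked.index.tolist())
print(at_touchdown)
```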

### Step 2b: Compute parameter averages at each height

In [29]:
ave_by_height = {}
# Iterating over a groupby yields (group key, sub-DataFrame) pairs
for height, df in grpby:
    ave_by_height[height] = df.mean()  # Series of column means at this height

In [43]:
ave_by_height[200]

LATP         44.887838
LONP        -93.226149
MSQT_1        1.000000
BAL1       1007.567028
TAS         127.270687
GS          122.500731
TH           73.804465
FLAP       3645.000000
GLS           0.025367
LOC           0.003916
N1_1         58.613045
PTCH         -2.899810
ROLL         -0.018095
TRK          69.612989
AIL_1        86.132075
RUDD        -27.759303
ELEV_1       -7.708447
BLAC         -0.054787
CTAC         -0.000972
FPAC         -0.004573
CCPC       2343.176232
CWPC       1771.031208
WS           11.695184
WD          -69.981189
ALTR       -642.818195
TD_ALTR     -74.869912
TD_LAT       44.885209
TD_LON      -93.219782
TD_ALT      807.567028
heights     200.000000
DIST       1340.071335
dtype: float64
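
The per-height averages can also be obtained in a single call with `groupby(...).mean()`, which returns one row per height instead of a dictionary of Series. The toy table below is illustrative only; it is a sketch of the equivalence, not a replacement for the cell above.

```python
import pandas as pd

toy = pd.DataFrame({
    "heights": [200, 200, 0, 0],
    "TAS": [127.0, 125.0, 104.0, 117.0],
    "GS": [122.0, 121.0, 103.0, 119.0],
})

# Loop form, mirroring the cell above: one Series of column means per height.
ave_loop = {height: df.mean() for height, df in toy.groupby("heights")}

# Vectorised form: one row of column means per height, with heights as the index.
ave_table = toy.groupby("heights").mean()

print(ave_table)
```

One difference to keep in mind: the loop form also averages the `heights` column itself, while the vectorised form moves `heights` into the index.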

### Step 2c: Compute MRMR with Spearman Correlation

In [40]:
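from scipy import stats

def spearman(para1,para2):
    # Absolute Spearman rank correlation; [0] selects the coefficient rather than the p-value
    return np.abs(stats.spearmanr(para1,para2)[0])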
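
A short illustration of why the absolute Spearman coefficient is a reasonable relevance measure: it depends only on rank order, so any strictly monotonic relationship scores 1 in magnitude regardless of its shape or direction. The arrays below are synthetic and exist only for this demonstration.

```python
import numpy as np
from scipy import stats

x = np.arange(1, 11, dtype=float)
y_monotonic = np.exp(x)   # strongly non-linear but strictly increasing in x
y_reversed = -x ** 3      # strictly decreasing in x

rho_up = stats.spearmanr(x, y_monotonic)[0]
rho_down = stats.spearmanr(x, y_reversed)[0]
pearson_up = np.corrcoef(x, y_monotonic)[0, 1]  # linear correlation, for contrast

print(rho_up, rho_down, abs(rho_down), pearson_up)
```

Taking the absolute value treats a strongly decreasing relationship as equally informative as a strongly increasing one, which is what the `spearman` helper above does.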

In [42]:
from scipy import stats  # needed by spearman()

coe_list = [0,.25,.5,.75,1]  # weights applied to the redundancy term
sele_key_list=['CCPC','CTAC','PTCH','ELEV_1','BLAC','N1_1','GS','TAS','GLS','WS','ROLL','FPAC','WD','LONP','TH','LATP','DIST','AIL_1','LOC','TRK','BAL1','RUDD','FLAP','ALTR']
# ASSUMPTION: ave_by_sample_array is never defined in this notebook. It is taken here to be a
# (samples x parameters) array with columns ordered as in sele_key_list, so the target ALTR is last.
ave_by_sample_array = df_landing[sele_key_list].to_numpy()
# Candidate feature names, excluding the target (a Series has no .columns attribute)
unsort_columns = sele_key_list[:-1]
target = ave_by_sample_array[:,-1]
# Relevance of every candidate to the target; it does not depend on the weight
relevance = np.array([spearman(target,ave_by_sample_array[:,i]) for i in range(ave_by_sample_array.shape[1]-1)])
all_sort_columns = []
all_sort_index =[]
for coe in coe_list:
    available_index = list(range(ave_by_sample_array.shape[1]-1))
    # Start with the feature most correlated with the target (NaN entries are ignored)
    tem_best = int(np.nanargmax(relevance))
    select_index = [tem_best]
    available_index.remove(tem_best)
    while len(available_index)!=0:
        best_value = -10000
        best_index = available_index[0]
        for index in available_index:
            with_tar = relevance[index]
            with_other = np.mean([spearman(ave_by_sample_array[:,index],ave_by_sample_array[:,i]) for i in select_index])
            # MRMR score: relevance minus the weighted mean redundancy with selected features
            tem_value = with_tar-with_other*coe
            if tem_value>best_value:
                best_index = index
                best_value = tem_value
        available_index.remove(best_index)
        select_index.append(best_index)
    sort_columns = [unsort_columns[s] for s in select_index ]
    all_sort_columns.append(sort_columns)
    all_sort_index.append(select_index)
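
The greedy MRMR selection in Step 2c can also be sketched as a small self-contained function. This version is an illustration under stated assumptions, not the author's implementation: `mrmr_rank` is a hypothetical helper, the data are synthetic, and the pairwise correlations are computed once with `DataFrame.corr(method="spearman")` instead of calling `stats.spearmanr` inside the loop.

```python
import numpy as np
import pandas as pd


def mrmr_rank(features, target, redundancy_weight=1.0):
    """Greedy MRMR ordering of the columns of `features` (hypothetical helper)."""
    # |Spearman| between every pair of columns, computed once up front.
    table = features.assign(_target=target.to_numpy())
    rho = table.corr(method="spearman").abs()
    relevance = rho["_target"].drop("_target")
    redundancy = rho.drop(index="_target", columns="_target")

    selected = [relevance.idxmax()]  # the most relevant feature goes first
    remaining = [c for c in features.columns if c != selected[0]]
    while remaining:
        # Relevance minus weighted mean redundancy with the already-selected features.
        penalty = redundancy.loc[remaining, selected].mean(axis=1)
        score = relevance[remaining] - redundancy_weight * penalty
        best = score.idxmax()
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic data for illustration only: `b` nearly duplicates `a`, `c` is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
c = rng.normal(size=500)
X = pd.DataFrame({"a": a, "b": a + 0.05 * rng.normal(size=500), "c": c})
y = pd.Series(a + 0.5 * c)

rank_relevance_only = mrmr_rank(X, y, redundancy_weight=0.0)
rank_with_penalty = mrmr_rank(X, y, redundancy_weight=1.0)
print(rank_relevance_only)
print(rank_with_penalty)
```

With no redundancy penalty the near-duplicate pair `a` and `b` occupies the top two places; with the penalty switched on, the independent feature `c` moves up to second place because the remaining duplicate adds little new information.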

In [None]:
all_sort_columns_array = np.transpose(np.array(all_sort_columns))
insert = ['ALTR']*all_sort_columns_array.shape[1]

In [None]:
all_sort_columns_array = np.insert(all_sort_columns_array ,0,values='ALTR',axis =0)

In [None]:
all_all_sele_fea_26_spearman = pd.DataFrame(data = all_sort_columns_array,columns = '0 .25 .5 .75 1'.split() )