# Bayesian Neural Networks to Predict Hard Landing with DASHlink Data
Authors: Dr. Yingxiao Kong, Vanderbilt University

Email: yingxiao.kong@vanderbilt.edu

## Overview of Research
In this work, we use an open-source dataset, [NASA's DASHlink data](https://c3.ndc.nasa.gov/dashlink/), to isolate landing data for aircraft with both hard-landing and normal-landing occurrences. The objective is to use [this sample data](https://c3.ndc.nasa.gov/dashlink/projects/85/resources/?type=ds) to train a Bayesian Neural Network model that predicts the touchdown vertical speed of a landing aircraft, with the intent of using the prediction as a screen for identifying hard landing events before they occur.

This series of Jupyter notebook demonstrations is split into 3 modules. The module presented here is in **bold**:
- Module 1 - Download DASHlink Data
- **Module 2 - DASHlink Data Pre-Processing and Feature Selection with Maximum Relevance and Minimum Redundancy (mRMR)**
- Module 3 - Bayesian Neural Network Model Training

## Module 2: DASHlink Data Pre-Processing and Parameter Selection
This is a demonstration of how to filter, standardize, and clean the DASHlink data downloaded and viewed in Module 1. In addition, we select the features most relevant to hard landing events.


## Installing the required Python packages

The required Python packages for this module are:
- ***```pandas```***
- ***```numpy```***
- ***```scipy```***
- ***```matplotlib```***

In [4]:
import numpy as np
import pandas as pd

## Step 1: Isolate Data Landing at MSP

### Step 1a: Get all downloaded ```.mat``` files

In [5]:
import glob
OUTPUT_DIRECTORY = r'../../../dashlink-data'
downloaded_mat_files = glob.glob(OUTPUT_DIRECTORY+'/**/*.mat',recursive=True)
downloaded_mat_files

['../../../dashlink-data/Tail_687_8/687200312261442.mat',
 '../../../dashlink-data/Tail_687_8/687200403210418.mat',
 '../../../dashlink-data/Tail_687_8/687200312300437.mat',
 '../../../dashlink-data/Tail_687_8/687200309090752.mat',
 '../../../dashlink-data/Tail_687_8/687200403162255.mat',
 '../../../dashlink-data/Tail_687_8/687200403191335.mat',
 '../../../dashlink-data/Tail_687_8/687200311301726.mat',
 '../../../dashlink-data/Tail_687_8/687200402021225.mat',
 '../../../dashlink-data/Tail_687_8/687200312241558.mat',
 '../../../dashlink-data/Tail_687_8/687200401281006.mat',
 '../../../dashlink-data/Tail_687_8/687200312180425.mat',
 '../../../dashlink-data/Tail_687_8/687200402081856.mat',
 '../../../dashlink-data/Tail_687_8/687200403211154.mat',
 '../../../dashlink-data/Tail_687_8/687200403130700.mat',
 '../../../dashlink-data/Tail_687_8/687200310291853.mat',
 '../../../dashlink-data/Tail_687_8/687200402010708.mat',
 '../../../dashlink-data/Tail_687_8/687200309111920.mat',
 '../../../das

### Step 1b: Isolate landing data at MSP at specified heights
According to the DASHlink website, the ```PH``` enumerated codes are: 
- 0=Unknown
- 1=Preflight
- 2=Taxi
- 3=Takeoff
- 4=Climb
- 5=Cruise
- 6=Approach
- 7=Rollout

Here we are interested in ```PH = 7``` at heights of 200,100,50,40,30,20,10,8,6,4,2, and 0 feet above landing altitude. 
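As a small illustration of how the phase codes listed above can be used, the sketch below stores them in a dictionary and finds the samples where a phase signal equals a chosen code. The helper `phase_sample_indices` and the example trace are hypothetical; they are not part of `dataUtils` and do not use real DASHlink data.

```python
import numpy as np

# Flight-phase codes for the PH parameter, as listed above.
PHASE_CODES = {
    0: "Unknown", 1: "Preflight", 2: "Taxi", 3: "Takeoff",
    4: "Climb", 5: "Cruise", 6: "Approach", 7: "Rollout",
}

def phase_sample_indices(ph_signal, phase_no):
    """Return the sample indices at which the PH signal equals phase_no."""
    ph_signal = np.asarray(ph_signal)
    return np.flatnonzero(ph_signal == phase_no)

# Hypothetical PH trace for the end of a flight (illustrative values only).
ph_example = np.array([5, 5, 6, 6, 6, 7, 7])
rollout_idx = phase_sample_indices(ph_example, 7)
print(PHASE_CODES[7], rollout_idx)
```

A file with no samples in the requested phase would return an empty index array, which is one simple way to decide whether to skip it.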

In [6]:
PHASE_NO = 7
MSP_AIRPORT_LAT_LON = [44.88526995556498, -93.2015923365669]
HEIGHT_LIST = np.array([200,100,50,40,30,20,10,8,6,4,2,0])

In [7]:
from dataUtils import DASHlinkData

key_list_25=['LATP','LONP','MSQT_1','BAL1','TAS','GS','TH','FLAP','GLS','LOC','N1_1','PTCH','ROLL','TRK','AIL_1','RUDD','ELEV_1',\
         'BLAC','CTAC','FPAC','CCPC','CWPC','WS','WD','ALTR']

SUBSET_FOR_DEMO = 40
dfs = []

for i,mat_file in enumerate(downloaded_mat_files[:SUBSET_FOR_DEMO]):
    print("Processing {} of {} .mat files.".format(i+1,len(downloaded_mat_files)))
    dl_data = DASHlinkData(mat_file)
    if dl_data.contains_phase_no(phase=PHASE_NO):
        if dl_data.lands_at_airport(airport_lat_lon=MSP_AIRPORT_LAT_LON):
            for key in key_list_25:
                dl_data.temporal_resample_to_4_seconds(key)
            df_new = dl_data.get_data_at_heights_in_ft(HEIGHT_LIST)
            dfs.append(df_new)

df_landing = pd.concat(dfs)

Processing 1 of 5376 .mat files.
Processing 2 of 5376 .mat files.
Processing 3 of 5376 .mat files.
Processing 4 of 5376 .mat files.
Processing 5 of 5376 .mat files.
Processing 6 of 5376 .mat files.
Processing 7 of 5376 .mat files.
Processing 8 of 5376 .mat files.
Processing 9 of 5376 .mat files.
Processing 10 of 5376 .mat files.
Processing 11 of 5376 .mat files.
Processing 12 of 5376 .mat files.
Processing 13 of 5376 .mat files.
Processing 14 of 5376 .mat files.
Processing 15 of 5376 .mat files.
Processing 16 of 5376 .mat files.
Processing 17 of 5376 .mat files.
Processing 18 of 5376 .mat files.
Processing 19 of 5376 .mat files.
Processing 20 of 5376 .mat files.
Processing 21 of 5376 .mat files.
Processing 22 of 5376 .mat files.
Processing 23 of 5376 .mat files.
Processing 24 of 5376 .mat files.
Processing 25 of 5376 .mat files.
Processing 26 of 5376 .mat files.
Processing 27 of 5376 .mat files.
Processing 28 of 5376 .mat files.
Processing 29 of 5376 .mat files.
Processing 30 of 5376 .

In [8]:
print("Number of Files with Landing Aircraft: {} out of {}.".format(len(dfs),len(downloaded_mat_files)))

Number of Files with Landing Aircraft: 7 out of 5376.
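The implementation of `lands_at_airport` lives in `dataUtils` and is not shown here. One plausible way to perform such a check is to compute the great-circle distance between the touchdown position and the airport reference point and compare it to a radius. The sketch below does this with the haversine formula; the function names and the 5 km radius are assumptions made for illustration, not the module's actual logic.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

MSP_AIRPORT_LAT_LON = [44.88526995556498, -93.2015923365669]

def is_near_airport(td_lat, td_lon, airport_lat_lon=MSP_AIRPORT_LAT_LON, radius_m=5000.0):
    """True if the touchdown point is within radius_m of the airport (radius is an assumed value)."""
    return haversine_m(td_lat, td_lon, *airport_lat_lon) <= radius_m

# Touchdown coordinates of the form stored in the TD_LAT and TD_LON columns.
dist_m = haversine_m(44.886191, -93.228955, *MSP_AIRPORT_LAT_LON)
print(round(float(dist_m)), is_near_airport(44.886191, -93.228955))
```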


In [9]:
df_landing

Unnamed: 0,LATP,LONP,MSQT_1,BAL1,TAS,GS,TH,FLAP,GLS,LOC,...,CWPC,WS,WD,ALTR,TD_ALTR,TD_LAT,TD_LON,TD_ALT,heights,DIST
0,44.891682,-93.241822,1.0,1006.643045,117.933373,115.007588,125.581422,3645.000000,0.014040,-0.003136,...,1782.000000,10.948235,-159.834434,-536.630359,-104.780302,44.886191,-93.228955,806.643045,200,1185.595409
1,44.888764,-93.234960,1.0,906.643045,115.530863,111.638458,125.322379,3645.000000,0.183690,0.003136,...,1795.861409,9.932823,-165.738392,-709.834437,-104.780302,44.886191,-93.228955,806.643045,100,553.902372
2,44.887734,-93.232386,1.0,856.643045,113.021779,110.749442,123.876531,3645.000000,0.133380,0.003332,...,1672.000000,6.212059,-163.252476,-823.996865,-104.780302,44.886191,-93.228955,806.643045,50,320.767080
3,44.887502,-93.231929,1.0,846.643045,112.537305,110.196048,124.172654,3645.000000,0.136028,0.002001,...,1718.580098,6.975467,-166.607074,-688.934215,-104.780302,44.886191,-93.228955,806.643045,40,276.478561
4,44.887351,-93.231586,1.0,836.643045,111.188473,109.765476,124.137817,3645.000000,0.133199,0.002040,...,1748.036175,6.994513,-160.733711,-773.697501,-104.780302,44.886191,-93.228955,806.643045,30,244.578379
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7,44.886235,-93.230536,1.0,853.846249,117.818836,105.893662,118.910790,3645.371366,0.037429,-0.000711,...,1799.060325,12.607906,101.716209,-721.149992,-104.394851,44.885846,-93.229629,845.846249,8,83.675778
8,44.886018,-93.229997,1.0,851.846249,118.375135,105.495087,119.058555,3646.000000,0.039390,-0.000588,...,1767.000000,13.923377,104.576085,-699.578103,-104.394851,44.885846,-93.229629,845.846249,6,34.788936
9,44.886018,-93.229997,1.0,849.846249,117.607793,105.041968,119.227015,3646.000000,0.039390,-0.000588,...,1848.057368,14.655849,106.734810,-600.032488,-104.394851,44.885846,-93.229629,845.846249,4,34.788936
10,44.885930,-93.229808,1.0,847.846249,113.495485,104.123699,119.553710,3645.487424,0.302065,-0.000688,...,1788.660839,11.736493,107.132666,-391.609928,-104.394851,44.885846,-93.229629,845.846249,2,16.956967
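In the table above, `BAL1` equals `TD_ALT` plus `heights`, which suggests that each row is a parameter value sampled at a fixed height above the touchdown altitude. The internals of `get_data_at_heights_in_ft` are not shown, so the following is only a sketch of one way to do such sampling with linear interpolation. The function `sample_at_heights` and the descent profile are hypothetical, and the approach assumes that altitude decreases monotonically during the final descent.

```python
import numpy as np

def sample_at_heights(altitude_ft, values, touchdown_alt_ft, heights_ft):
    """Linearly interpolate `values` at the requested heights above touchdown."""
    height_above_td = np.asarray(altitude_ft, dtype=float) - touchdown_alt_ft
    values = np.asarray(values, dtype=float)
    # np.interp expects increasing x, but height decreases while landing,
    # so sort both arrays by height before interpolating.
    order = np.argsort(height_above_td)
    return np.interp(heights_ft, height_above_td[order], values[order])

# Hypothetical descent profile (illustrative values, not DASHlink data).
altitude = np.array([1100.0, 1000.0, 900.0, 850.0, 810.0, 800.0])  # ft
airspeed = np.array([130.0, 125.0, 120.0, 116.0, 112.0, 110.0])    # kt
heights = np.array([200, 100, 50, 0])

sampled = sample_at_heights(altitude, airspeed, touchdown_alt_ft=800.0, heights_ft=heights)
print(sampled)
```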


In [10]:
df_landing.to_csv('processed_data_landing_at_msp.csv',index=False)

## Step 2: Feature Selection with Maximum Relevance Minimum Redundancy (MRMR)
The original 186 parameters are cut down to 26 based on a literature review. The 26 parameters are then ranked using Maximum Relevance Minimum Redundancy (MRMR). 
The data are first smoothed and then sliced at the selected heights. The average of each parameter is then calculated at each height.

### Step 2a: Group data by height with ```pandas.groupby```

In [11]:
grpby = df_landing.groupby(by='heights')
grpby.get_group(0)

Unnamed: 0,LATP,LONP,MSQT_1,BAL1,TAS,GS,TH,FLAP,GLS,LOC,...,CWPC,WS,WD,ALTR,TD_ALTR,TD_LAT,TD_LON,TD_ALT,heights,DIST
11,44.886191,-93.228955,1.0,806.643045,103.976395,103.616195,123.378983,3645.0,0.27144,0.002744,...,1667.0,4.278736,-155.074287,-104.780302,-104.780302,44.886191,-93.228955,806.643045,0,0.0
11,44.885846,-93.226045,1.0,799.291242,117.35854,119.301601,123.410491,3645.0,-0.10959,0.00588,...,2236.0,6.403527,-64.065263,-177.267211,-177.267211,44.885846,-93.226045,799.291242,0,0.0
11,44.883786,-93.199606,1.0,822.106482,105.834217,95.831617,-57.981012,3645.0,-0.02184,0.011368,...,1734.0,8.281502,-54.447144,125.745074,125.745074,44.883786,-93.199606,822.106482,0,0.0
11,44.885333,-93.226382,1.0,793.020025,112.328895,108.958568,121.460417,3645.0,-0.24882,0.004704,...,2080.0,3.900689,147.137838,-270.814762,-270.814762,44.885333,-93.226382,793.020025,0,0.0
11,44.882756,-93.198228,1.0,778.709209,110.752766,104.562259,-57.987705,3645.0,0.5421,-0.0049,...,2210.0,6.96871,-33.283197,-32.735479,-32.735479,44.882756,-93.198228,778.709209,0,0.0
11,44.886704,-93.229629,1.0,807.352945,111.45682,118.017037,123.590224,3645.0,-0.26325,0.001568,...,2370.0,7.072058,-104.166199,40.15815,40.15815,44.886704,-93.229629,807.352945,0,0.0
11,44.885846,-93.229629,1.0,845.846249,113.884462,102.947668,119.889758,3645.0,0.55185,-0.000784,...,1617.0,11.175749,108.730358,-104.394851,-104.394851,44.885846,-93.229629,845.846249,0,0.0


In [12]:
grpby.get_group(200).heights

0    200
0    200
0    200
0    200
0    200
0    200
0    200
Name: heights, dtype: int64

### Step 2b: Compute parameter averages at each height

In [13]:
ave_by_height = pd.DataFrame(columns=df_landing.columns)
for i,g in enumerate(grpby):
    height = g[0]
    df = g[1]
    ave_by_height.loc[i,:] = df.mean()
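For reference, the per-height averaging can also be written with the built-in `groupby(...).mean()` aggregation, which returns numeric columns directly. The toy frame below is made up for illustration and only mimics the `heights`, `TAS`, and `ALTR` columns; it is a sketch of the idea rather than a replacement for the cell above.

```python
import pandas as pd

# Toy stand-in for df_landing: two flights sampled at two heights (made-up values).
toy = pd.DataFrame({
    "heights": [0, 0, 2, 2],
    "TAS": [100.0, 110.0, 112.0, 118.0],
    "ALTR": [-100.0, -50.0, -300.0, -320.0],
})

# One row per height, each column averaged over all flights at that height.
ave_toy = toy.groupby("heights").mean().reset_index()
print(ave_toy)
```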

In [14]:
ave_by_height

Unnamed: 0,LATP,LONP,MSQT_1,BAL1,TAS,GS,TH,FLAP,GLS,LOC,...,CWPC,WS,WD,ALTR,TD_ALTR,TD_LAT,TD_LON,TD_ALT,heights,DIST
0,44.885209,-93.219782,1.0,807.567028,110.798871,107.604992,70.823022,3645.0,0.103127,0.00294,...,1987.714286,6.86871,-22.166842,-74.869912,-74.869912,44.885209,-93.219782,807.567028,0.0,0.0
1,44.885292,-93.220061,1.0,809.567028,115.779035,111.528436,70.888042,3644.826926,0.025663,0.003298,...,1899.885081,6.444132,-49.312812,-309.321345,-74.869912,44.885209,-93.219782,807.567028,2.0,115.374777
2,44.885309,-93.220072,1.0,811.567028,116.364073,111.890037,71.182145,3645.0,-0.068261,0.00327,...,1850.599264,7.374414,-62.798644,-373.288483,-74.869912,44.885209,-93.219782,807.567028,4.0,132.464885
3,44.885322,-93.220155,1.0,813.567028,117.017529,112.963685,71.076422,3644.857143,-0.09524,0.003569,...,1853.207643,7.399301,-64.779795,-359.425022,-74.869912,44.885209,-93.219782,807.567028,6.0,165.398468
4,44.885437,-93.220402,1.0,815.567028,117.763875,113.717152,71.334015,3644.919432,0.00638,0.003621,...,1935.018441,7.876124,-75.843871,-434.334907,-74.869912,44.885209,-93.219782,807.567028,8.0,194.703497
5,44.885468,-93.220483,1.0,817.567028,117.734275,114.115894,71.538738,3644.934614,-0.013318,0.003597,...,1979.017241,8.339849,-75.876257,-454.53002,-74.869912,44.885209,-93.219782,807.567028,10.0,201.892831
6,44.885777,-93.221147,1.0,827.567028,121.211369,116.157141,72.158272,3644.900607,0.071894,0.003677,...,1896.67887,9.59141,-85.153115,-696.856882,-74.869912,44.885209,-93.219782,807.567028,20.0,292.521859
7,44.885816,-93.221287,1.0,837.567028,122.092625,117.010899,72.145026,3644.857143,0.042707,0.003007,...,1743.389546,10.026518,-89.830353,-754.293095,-74.869912,44.885209,-93.219782,807.567028,30.0,355.04954
8,44.885903,-93.221437,1.0,847.567028,123.812086,117.619566,72.506403,3644.969328,0.014328,0.004703,...,1817.595641,11.163714,-78.886639,-623.181355,-74.869912,44.885209,-93.219782,807.567028,40.0,397.286105
9,44.885986,-93.22175,1.0,857.567028,124.286458,118.233039,72.805718,3645.148804,0.06365,0.004078,...,1909.510095,12.206503,-86.356691,-817.612287,-74.869912,44.885209,-93.219782,807.567028,50.0,450.705803


### Step 2c: Compute MRMR with Spearman Correlation Relative to Vertical Velocity ```ALTR```


In [15]:
sele_key_list=['CCPC','CTAC','PTCH','ELEV_1','BLAC','N1_1','GS','TAS','GLS','WS','ROLL',\
               'FPAC','WD','LONP','TH','LATP','DIST','AIL_1','LOC','TRK','BAL1','RUDD','FLAP','ALTR']
ave_by_height = ave_by_height[sele_key_list]

In [16]:
ave_by_height

Unnamed: 0,CCPC,CTAC,PTCH,ELEV_1,BLAC,N1_1,GS,TAS,GLS,WS,...,TH,LATP,DIST,AIL_1,LOC,TRK,BAL1,RUDD,FLAP,ALTR
0,1970.428571,0.004243,2.281482,-2.050741,-0.047516,33.787262,107.604992,110.798871,0.103127,6.86871,...,70.823022,44.885209,0.0,82.921213,0.00294,69.534167,807.567028,-27.965181,3645.0,-74.869912
1,2140.396325,-0.00159,1.567258,-3.57327,-0.059653,38.930895,111.528436,115.779035,0.025663,6.444132,...,70.888042,44.885292,115.374777,82.593554,0.003298,69.618386,809.567028,-28.225854,3644.826926,-309.321345
2,2065.091959,0.003548,1.440699,-3.91238,-0.056496,40.581005,111.890037,116.364073,-0.068261,7.374414,...,71.182145,44.885309,132.464885,83.693023,0.00327,69.566094,811.567028,-27.917843,3645.0,-373.288483
3,2140.421098,0.002564,1.155466,-4.354561,-0.059085,41.915689,112.963685,117.017529,-0.09524,7.399301,...,71.076422,44.885322,165.398468,81.019443,0.003569,69.552471,813.567028,-28.293366,3644.857143,-359.425022
4,2173.052757,-0.003341,1.1584,-5.083556,-0.060386,43.067311,113.717152,117.763875,0.00638,7.876124,...,71.334015,44.885437,194.703497,84.328221,0.003621,69.588316,815.567028,-26.773494,3644.919432,-434.334907
5,2222.241845,-0.008844,0.959857,-5.125546,-0.060494,44.274136,114.115894,117.734275,-0.013318,8.339849,...,71.538738,44.885468,201.892831,83.692836,0.003597,69.609823,817.567028,-28.181664,3644.934614,-454.53002
6,2256.59739,-0.005391,-0.004611,-6.251553,-0.058589,50.524628,116.157141,121.211369,0.071894,9.59141,...,72.158272,44.885777,292.521859,86.82582,0.003677,69.751541,827.567028,-28.353094,3644.900607,-696.856882
7,2288.497079,0.008084,-0.743175,-7.086001,-0.060726,53.577513,117.010899,122.092625,0.042707,10.026518,...,72.145026,44.885816,355.04954,85.816579,0.003007,69.710377,837.567028,-28.087045,3644.857143,-754.293095
8,2229.638238,0.005031,-1.303745,-6.837614,-0.066619,54.10918,117.619566,123.812086,0.014328,11.163714,...,72.506403,44.885903,397.286105,83.845617,0.004703,69.69648,847.567028,-27.735314,3644.969328,-623.181355
9,2315.506488,-0.017145,-2.127818,-7.850059,-0.074512,54.170875,118.233039,124.286458,0.06365,12.206503,...,72.805718,44.885986,450.705803,84.027791,0.004078,69.760014,857.567028,-27.597884,3645.148804,-817.612287


In [17]:
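from scipy import stats
def order_features_mrmr(df,output='ALTR',redundancy_weight=0.5):
    """Greedily order the input columns of df by an MRMR-style score.

    Relevance is |Spearman rho| with `output`; redundancy is the mean |rho|
    with the inputs not yet selected (including the candidate itself).
    The index bookkeeping assumes that `output` is the last column of df.
    """
    selected_order = []
    
    # Absolute Spearman rank-correlation matrix between all columns
    R = np.abs(stats.spearmanr(df.values,axis=0)[0])
    output_idx = df.columns.get_loc(output)
    input_idx = [i for i in range(df.shape[1]) if i != output_idx]
    
    # Relevance of each input to the output, and input-to-input correlations
    corr_with_output = R[input_idx,output_idx]
    corr_with_inputs = R[[[i] for i in input_idx],input_idx]
    
    # The first pick is the input most correlated with the output
    idx_best = np.argmax(corr_with_output)
    selected_order.append(idx_best)
    input_idx.remove(idx_best)
    while len(input_idx)!=0:
        with_output = corr_with_output[input_idx]
        with_others = np.array([np.mean(corr_with_inputs[idx,input_idx]) for idx in input_idx])
            
        # Score = relevance - redundancy_weight * redundancy; pick the maximum
        mrmr_values = with_output-with_others*redundancy_weight
        mrmr_max_idx = np.argmax(mrmr_values)
        best_index=input_idx[mrmr_max_idx]
        
        input_idx.remove(best_index)
        selected_order.append(best_index)
    ordered_features = df.columns[selected_order]
    return ordered_features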
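To make the scoring in `order_features_mrmr` concrete, the sketch below builds the absolute Spearman correlation matrix for a small toy table and evaluates the score for a single candidate: relevance (absolute correlation with the target) minus `redundancy_weight` times the mean absolute correlation with the candidate inputs. The toy columns `x`, `y`, `z`, and `w` are invented for illustration, with `w` playing the role of the target.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Toy data: y is a monotonic function of x, z is x reversed, w is the target.
toy = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [1, 8, 27, 64, 125],
    "z": [5, 4, 3, 2, 1],
    "w": [2, 1, 4, 3, 5],
})

# With more than two columns, spearmanr returns a full correlation matrix.
rho, _ = stats.spearmanr(toy.values, axis=0)
R = np.abs(rho)

target = toy.columns.get_loc("w")
candidates = [0, 1, 2]            # column positions of x, y, z
redundancy_weight = 0.5

relevance_x = R[0, target]                 # |rho(x, w)|
redundancy_x = R[0, candidates].mean()     # mean |rho| of x with x, y, z
score_x = relevance_x - redundancy_weight * redundancy_x
print(relevance_x, redundancy_x, score_x)
```

Because Spearman correlation works on ranks, `x` and `y` are perfectly correlated even though their relationship is nonlinear, so `x` is fully redundant with the other candidates in this toy case and its score drops as the redundancy weight grows.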

In [18]:
min_redundancy_weight = [0,.25,.5,.75,1] #contribution of minimum redundancy
df_features = pd.DataFrame(np.zeros([len(sele_key_list)-1,len(min_redundancy_weight)]),columns=min_redundancy_weight)
for i,weight in enumerate(min_redundancy_weight):
    ordered_features = order_features_mrmr(ave_by_height,redundancy_weight=weight)
    df_features.iloc[:,i]=ordered_features

In [19]:
df_insert = pd.DataFrame({key:'ALTR' for key in df_features.columns},index=[0])
df_features = pd.concat([df_insert,df_features]).reset_index(drop = True)
df_features

Unnamed: 0,0.00,0.25,0.50,0.75,1.00
0,ALTR,ALTR,ALTR,ALTR,ALTR
1,CCPC,CCPC,CCPC,CCPC,CCPC
2,ELEV_1,WD,WD,WD,TRK
3,WD,ELEV_1,ELEV_1,TRK,WD
4,WS,ROLL,AIL_1,AIL_1,AIL_1
5,ROLL,WS,TRK,ELEV_1,ELEV_1
6,N1_1,AIL_1,ROLL,ROLL,ROLL
7,GS,N1_1,WS,WS,WS
8,LONP,GS,N1_1,N1_1,N1_1
9,LATP,LONP,GS,GS,GS


In [20]:
df_features.to_csv('ordered_features.csv',index=False)