# Bayesian Neural Networks to Predict Hard Landing with DASHlink Data
Authors: Dr. Yingxiao Kong, Vanderbilt University

Email: yingxiao.kong@vanderbilt.edu

## Overview of Research
In this work, we use an open-source dataset, [NASA's DASHlink data](https://c3.ndc.nasa.gov/dashlink/), to isolate landing data that cover both hard landings and normal landings. The objective is to use [this sample data](https://c3.ndc.nasa.gov/dashlink/projects/85/resources/?type=ds) to train a Bayesian Neural Network model that predicts the touchdown vertical speed of a landing aircraft, with the intent of using the prediction as a screening tool to identify hard landing events before they occur.

This series of Jupyter notebook demonstrations is divided into 3 modules. The module presented here is in **bold**:
- Module 1 - Download DASHlink Data
- **Module 2 - DASHlink Data Pre-Processing and Feature Selection with Maximum Relevance and Minimum Redundancy (mRMR)**
- Module 3 - Bayesian Neural Network Model Training

## Module 2: DASHlink Data Pre-Processing and Parameter Selection
This is a demonstration of how to filter, standardize, and clean the DASHlink data downloaded and viewed in Module 1. In addition, we select the features most relevant to hard landing events.


## Installing the required Python packages

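The required Python packages for this module are:
- ***```pandas```***
- ***```numpy```***
- ***```scipy```***
- ***```matplotlib```***
- ***```tqdm```*** (progress bar used in Step 1b)

The notebook also imports ```DASHlinkData``` from the ```dataUtils``` module, which is expected to be importable from the notebook's working directory.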
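
Before running the cells below, it can help to confirm that the packages are importable. The following is a minimal sketch, not part of the original notebook; the helper name `missing_packages` is hypothetical, and `tqdm` is included because the Step 1b cell imports it.

```python
import importlib.util

def missing_packages(names):
    """Return the names in `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Packages used by this module (tqdm is imported in Step 1b)
required = ["pandas", "numpy", "scipy", "matplotlib", "tqdm"]
print("Missing packages:", missing_packages(required))
```

Any name that is printed can be installed with `pip` or `conda` before continuing.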

In [1]:
import numpy as np
import pandas as pd

## Step 1: Isolate Data Landing at MSP

### Step 1a: Get all downloaded ```.mat``` files

In [2]:
import glob
OUTPUT_DIRECTORY = r'../../../dashlink-data'
downloaded_mat_files = glob.glob(OUTPUT_DIRECTORY+'/**/*.mat',recursive=True)
downloaded_mat_files

['../../../dashlink-data/Tail_687_8/687200312261442.mat',
 '../../../dashlink-data/Tail_687_8/687200403210418.mat',
 '../../../dashlink-data/Tail_687_8/687200312300437.mat',
 '../../../dashlink-data/Tail_687_8/687200309090752.mat',
 '../../../dashlink-data/Tail_687_8/687200403162255.mat',
 '../../../dashlink-data/Tail_687_8/687200403191335.mat',
 '../../../dashlink-data/Tail_687_8/687200311301726.mat',
 '../../../dashlink-data/Tail_687_8/687200402021225.mat',
 '../../../dashlink-data/Tail_687_8/687200312241558.mat',
 '../../../dashlink-data/Tail_687_8/687200401281006.mat',
 '../../../dashlink-data/Tail_687_8/687200312180425.mat',
 '../../../dashlink-data/Tail_687_8/687200402081856.mat',
 '../../../dashlink-data/Tail_687_8/687200403211154.mat',
 '../../../dashlink-data/Tail_687_8/687200403130700.mat',
 '../../../dashlink-data/Tail_687_8/687200310291853.mat',
 '../../../dashlink-data/Tail_687_8/687200402010708.mat',
 '../../../dashlink-data/Tail_687_8/687200309111920.mat',
 '../../../das

### Step 1b: Isolate landing data at MSP at specified heights
According to the DASHlink website, the ```PH``` enumerated codes are: 
- 0=Unknown
- 1=Preflight
- 2=Taxi
- 3=Takeoff
- 4=Climb
- 5=Cruise
- 6=Approach
- 7=Rollout

Here we are interested in flights that contain ```PH = 7``` (rollout, meaning the aircraft has landed), sampled at heights of 200, 100, 50, 40, 30, 20, 10, 8, 6, 4, 2, and 0 feet above the landing altitude.
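
To make "height above landing altitude" concrete, the sketch below interpolates one parameter at a list of target heights. It is an illustration under stated assumptions, not the implementation used by `DASHlinkData`: the function name `interpolate_at_heights` is hypothetical, the last sample is assumed to be touchdown, and the altitude is assumed to decrease monotonically during the approach.

```python
import numpy as np

# Phase codes as listed above
PHASE_CODES = {0: "Unknown", 1: "Preflight", 2: "Taxi", 3: "Takeoff",
               4: "Climb", 5: "Cruise", 6: "Approach", 7: "Rollout"}

def interpolate_at_heights(altitude_ft, values, heights_ft):
    """Linearly interpolate `values` at the given heights above touchdown.

    Assumes the last sample is the touchdown point and that altitude
    decreases monotonically over the samples (hypothetical simplification).
    """
    altitude_ft = np.asarray(altitude_ft, dtype=float)
    values = np.asarray(values, dtype=float)
    height_above_td = altitude_ft - altitude_ft[-1]
    # np.interp needs increasing x values, so reverse the descending profile
    return np.interp(heights_ft, height_above_td[::-1], values[::-1])

# Toy approach: made-up altitude (ft) and airspeed (kt) samples
altitude = [1100.0, 1000.0, 900.0, 850.0, 800.0]
speed = [130.0, 125.0, 120.0, 115.0, 110.0]
print(interpolate_at_heights(altitude, speed, [200, 100, 50, 0]))
```

Real approaches are not strictly monotonic, so a production version would need to handle level-offs and go-arounds.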

In [3]:
PHASE_NO = 7
MSP_AIRPORT_LAT_LON = [44.88526995556498, -93.2015923365669]
HEIGHT_LIST = np.array([200,100,50,40,30,20,10,8,6,4,2,0])

In [4]:
from dataUtils import DASHlinkData
from tqdm import tqdm # conda install -c conda-forge tqdm

# The 25 DASHlink parameter keys to resample for each flight
key_list_25=['LATP','LONP','MSQT_1','BAL1','TAS','GS','TH','FLAP','GLS','LOC','N1_1','PTCH','ROLL','TRK','AIL_1','RUDD','ELEV_1',\
         'BLAC','CTAC','FPAC','CCPC','CWPC','WS','WD','ALTR']

SUBSET_FOR_DEMO = 5376 # number of downloaded files to process
dfs = []
for mat_file in tqdm(downloaded_mat_files[:SUBSET_FOR_DEMO]):
    dl_data = DASHlinkData(mat_file)
    try:
        # Keep only flights that contain the target phase and land at MSP
        if dl_data.contains_phase_no(phase=PHASE_NO):
            if dl_data.lands_at_airport(airport_lat_lon=MSP_AIRPORT_LAT_LON):
                for key in key_list_25:
                    dl_data.temporal_resample_to_4_seconds(key)
                # One row per target height for this flight
                df_new = dl_data.get_data_at_heights_in_ft(HEIGHT_LIST)
                df_new['filepath'] = mat_file
                dfs.append(df_new)
    except Exception:
        # Skip any file that raises an error during processing
        pass

df_landing = pd.concat(dfs)

100%|██████████| 5376/5376 [20:04<00:00,  4.46it/s]
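
The internals of `DASHlinkData.lands_at_airport` are not shown in this module. As a rough illustration of what such a check might look like, the sketch below compares a touchdown position with the airport coordinates using the haversine great-circle distance. The function names and the 5 km threshold are assumptions made for this example, not values taken from `dataUtils`.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def is_near_airport(td_lat, td_lon, airport_lat_lon, max_distance_m=5000.0):
    """True if the touchdown point lies within max_distance_m of the airport.

    The 5 km default is an arbitrary illustrative threshold.
    """
    return haversine_m(td_lat, td_lon, *airport_lat_lon) <= max_distance_m

MSP_AIRPORT_LAT_LON = [44.88526995556498, -93.2015923365669]
# Touchdown coordinates (TD_LAT, TD_LON) in the style of the table in this module
print(is_near_airport(44.886191, -93.228955, MSP_AIRPORT_LAT_LON))
```

A distance threshold is only one possible criterion; a runway-geometry check would be stricter.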


In [5]:
print("Number of Files with Landing Aircraft: {} out of {}.".format(len(dfs),len(downloaded_mat_files)))

Number of Files with Landing Aircraft: 1047 out of 5376.


In [6]:
df_landing = pd.concat(dfs)
df_landing

Unnamed: 0,LATP,LONP,MSQT_1,BAL1,TAS,GS,TH,FLAP,GLS,LOC,...,WS,WD,ALTR,TD_ALTR,TD_LAT,TD_LON,TD_ALT,heights,DIST,filepath
0,44.891682,-93.241822,1.0,1006.643045,117.933373,115.007588,125.581422,3645.000000,0.014040,-0.003136,...,10.948235,-159.834434,-536.630359,-104.780302,44.886191,-93.228955,806.643045,200,1185.595409,../../../dashlink-data/Tail_687_8/687200312300...
1,44.888764,-93.234960,1.0,906.643045,115.530863,111.638458,125.322379,3645.000000,0.183690,0.003136,...,9.932823,-165.738392,-709.834437,-104.780302,44.886191,-93.228955,806.643045,100,553.902372,../../../dashlink-data/Tail_687_8/687200312300...
2,44.887734,-93.232386,1.0,856.643045,113.021779,110.749442,123.876531,3645.000000,0.133380,0.003332,...,6.212059,-163.252476,-823.996865,-104.780302,44.886191,-93.228955,806.643045,50,320.767080,../../../dashlink-data/Tail_687_8/687200312300...
3,44.887502,-93.231929,1.0,846.643045,112.537305,110.196048,124.172654,3645.000000,0.136028,0.002001,...,6.975467,-166.607074,-688.934215,-104.780302,44.886191,-93.228955,806.643045,40,276.478561,../../../dashlink-data/Tail_687_8/687200312300...
4,44.887351,-93.231586,1.0,836.643045,111.188473,109.765476,124.137817,3645.000000,0.133199,0.002040,...,6.994513,-160.733711,-773.697501,-104.780302,44.886191,-93.228955,806.643045,30,244.578379,../../../dashlink-data/Tail_687_8/687200312300...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7,44.887365,-93.231276,1.0,801.660295,119.045969,122.383108,121.647424,3644.920019,0.080462,0.000180,...,3.032836,-91.452123,-512.653856,-273.900805,44.886876,-93.229997,793.660295,8,114.775389,../../../dashlink-data/Tail_687_9/687200404201...
8,44.887048,-93.230487,1.0,799.660295,118.861990,121.907043,121.667286,3644.000000,0.225420,0.004508,...,2.988509,-93.546465,-519.707422,-273.900805,44.886876,-93.229997,793.660295,6,43.200668,../../../dashlink-data/Tail_687_9/687200404201...
9,44.887048,-93.230487,1.0,797.660295,118.695668,121.564014,121.704144,3644.000000,0.225420,0.004508,...,2.964133,-96.950775,-478.709966,-273.900805,44.886876,-93.229997,793.660295,4,43.200668,../../../dashlink-data/Tail_687_9/687200404201...
10,44.887048,-93.230487,1.0,795.660295,118.034660,121.214700,121.752453,3644.000000,0.225420,0.004508,...,2.938901,-100.353347,-444.224532,-273.900805,44.886876,-93.229997,793.660295,2,43.200668,../../../dashlink-data/Tail_687_9/687200404201...


In [7]:
df_landing.to_csv('processed_data_landing_at_msp.csv',index=False)

## Step 2: Feature Selection with Maximum Relevance Minimum Redundancy (MRMR)
The original 186 parameters are cut down to 26 based on a literature review. The 26 parameters are then ranked using Maximum Relevance Minimum Redundancy (MRMR).
The data are first smoothed and then sliced at the selected heights, and the average of each parameter is calculated at each height.

### Step 2a: Group data by height with ```pandas.groupby```

In [8]:
grpby = df_landing.groupby(by='heights')
grpby.get_group(0)

Unnamed: 0,LATP,LONP,MSQT_1,BAL1,TAS,GS,TH,FLAP,GLS,LOC,...,WS,WD,ALTR,TD_ALTR,TD_LAT,TD_LON,TD_ALT,heights,DIST,filepath
11,44.886191,-93.228955,1.0,806.643045,103.976395,103.616195,123.378983,3645.0,0.27144,0.002744,...,4.278736,-155.074287,-104.780302,-104.780302,44.886191,-93.228955,806.643045,0,0.0,../../../dashlink-data/Tail_687_8/687200312300...
11,44.885846,-93.226045,1.0,799.291242,117.358540,119.301601,123.410491,3645.0,-0.10959,0.005880,...,6.403527,-64.065263,-177.267211,-177.267211,44.885846,-93.226045,799.291242,0,0.0,../../../dashlink-data/Tail_687_8/687200403191...
11,44.883786,-93.199606,1.0,822.106482,105.834217,95.831617,-57.981012,3645.0,-0.02184,0.011368,...,8.281502,-54.447144,125.745074,125.745074,44.883786,-93.199606,822.106482,0,0.0,../../../dashlink-data/Tail_687_8/687200312231...
11,44.885333,-93.226382,1.0,793.020025,112.328895,108.958568,121.460417,3645.0,-0.24882,0.004704,...,3.900689,147.137838,-270.814762,-270.814762,44.885333,-93.226382,793.020025,0,0.0,../../../dashlink-data/Tail_687_8/687200403071...
11,44.882756,-93.198228,1.0,778.709209,110.752766,104.562259,-57.987705,3645.0,0.54210,-0.004900,...,6.968710,-33.283197,-32.735479,-32.735479,44.882756,-93.198228,778.709209,0,0.0,../../../dashlink-data/Tail_687_8/687200401250...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11,44.882587,-93.198381,1.0,789.102402,110.857279,108.672400,-61.429886,3688.0,0.00000,-0.002156,...,5.082876,-140.733781,-59.922984,-59.922984,44.882587,-93.198381,789.102402,0,0.0,../../../dashlink-data/Tail_687_9/687201002271...
11,44.882756,-93.197186,1.0,801.991366,106.920618,100.670515,-59.594299,3645.0,0.33891,-0.005880,...,6.966231,-72.646176,-42.890425,-42.890425,44.882756,-93.197186,801.991366,0,0.0,../../../dashlink-data/Tail_687_9/687200406261...
11,44.882756,-93.196175,1.0,796.115203,107.899646,107.572900,-58.426334,3645.0,0.50973,-0.007448,...,1.529303,-93.841777,-87.431957,-87.431957,44.882756,-93.196175,796.115203,0,0.0,../../../dashlink-data/Tail_687_9/687200406290...
11,44.890483,-93.215230,1.0,809.288373,75.480865,100.145914,122.675024,3645.0,-0.19851,0.000196,...,3.007192,-79.759240,-72.088302,-72.088302,44.890483,-93.215230,809.288373,0,0.0,../../../dashlink-data/Tail_687_9/687200407261...


In [9]:
grpby.get_group(200).heights

0    200
0    200
0    200
0    200
0    200
    ... 
0    200
0    200
0    200
0    200
0    200
Name: heights, Length: 1047, dtype: int64

### Step 2b: Compute parameter averages at each height

In [10]:
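# Average every numeric parameter across flights at each height
ave_by_height = pd.DataFrame(columns=df_landing.columns)
for i,(height,df) in enumerate(grpby):
    # numeric_only=True skips the string 'filepath' column, which cannot be averaged
    ave_by_height.loc[i,:] = df.mean(numeric_only=True)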



In [11]:
ave_by_height

Unnamed: 0,LATP,LONP,MSQT_1,BAL1,TAS,GS,TH,FLAP,GLS,LOC,...,WS,WD,ALTR,TD_ALTR,TD_LAT,TD_LON,TD_ALT,heights,DIST,filepath
0,44.884138,-93.210025,0.99618,803.505965,103.634063,105.792649,25.516873,3659.548233,-0.01321,0.000649,...,5.827737,7.941532,-133.301064,-133.301064,44.884138,-93.210025,803.505965,0.0,0.0,
1,44.884043,-93.209836,0.998515,805.505965,109.95573,108.411837,25.551215,3659.524497,0.008009,0.000867,...,5.992653,4.88286,-285.991069,-133.301064,44.884138,-93.210025,803.505965,2.0,77.406844,
2,44.884031,-93.209815,0.999805,807.505965,111.591412,109.454858,25.556126,3659.535463,0.005539,0.000829,...,6.112013,3.619471,-353.788072,-133.301064,44.884138,-93.210025,803.505965,4.0,112.625779,
3,44.884017,-93.209797,1.0,809.505965,112.777486,110.245807,25.592597,3659.534177,0.000494,0.000784,...,6.258673,4.349413,-413.251065,-133.301064,44.884138,-93.210025,803.505965,6.0,140.454476,
4,44.884013,-93.209792,1.0,811.505965,113.831346,110.892648,25.615992,3659.534046,-0.024793,0.000894,...,6.445805,4.056027,-464.152202,-133.301064,44.884138,-93.210025,803.505965,8.0,163.941846,
5,44.884002,-93.209779,1.0,813.505965,114.502494,111.392057,25.62373,3659.543995,-0.036103,0.000949,...,6.619099,1.795087,-497.733009,-133.301064,44.884138,-93.210025,803.505965,10.0,183.103908,
6,44.883983,-93.209756,1.0,823.505965,117.092829,113.040736,25.664367,3659.54026,-0.077999,0.000706,...,7.211016,2.728487,-603.813134,-133.301064,44.884138,-93.210025,803.505965,20.0,257.764422,
7,44.883947,-93.209701,1.0,833.505965,118.811399,114.064172,25.74026,3659.526937,-0.075537,0.000654,...,7.70343,2.387201,-661.9071,-133.301064,44.884138,-93.210025,803.505965,30.0,319.50128,
8,44.88392,-93.209652,1.0,843.505965,119.778736,114.774955,25.774991,3659.536634,-0.046213,0.000478,...,7.94642,0.881228,-667.464383,-133.301064,44.884138,-93.210025,803.505965,40.0,374.074945,
9,44.883896,-93.209619,1.0,853.505965,120.653442,115.367628,25.794586,3659.531975,-0.018925,0.000514,...,8.174262,-0.546913,-679.22914,-133.301064,44.884138,-93.210025,803.505965,50.0,428.866312,
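
The per-height averages can also be computed with a single `groupby` aggregation instead of a loop. A minimal sketch on a toy frame follows; the values are made up for illustration and only the column names mirror the data above.

```python
import pandas as pd

# Toy stand-in for df_landing: two flights sampled at two heights
toy = pd.DataFrame({
    "heights": [0, 0, 200, 200],
    "ALTR": [-100.0, -200.0, -500.0, -700.0],
    "filepath": ["a.mat", "b.mat", "a.mat", "b.mat"],
})

# numeric_only=True drops the string column instead of trying to average it
ave = toy.groupby("heights").mean(numeric_only=True).reset_index()
print(ave)
```

Unlike the loop, this keeps numeric dtypes for every column, which is convenient for the correlation step that follows.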


### Step 2c: Compute MRMR with Spearman Correlation Relative to Vertical Velocity ```ALTR```


In [12]:
sele_key_list=['CCPC','CTAC','PTCH','ELEV_1','BLAC','N1_1','GS','TAS','GLS','WS','ROLL',\
               'FPAC','WD','LONP','TH','LATP','DIST','AIL_1','LOC','TRK','BAL1','RUDD','FLAP','ALTR']
ave_by_height = ave_by_height[sele_key_list]

In [13]:
ave_by_height

Unnamed: 0,CCPC,CTAC,PTCH,ELEV_1,BLAC,N1_1,GS,TAS,GLS,WS,...,TH,LATP,DIST,AIL_1,LOC,TRK,BAL1,RUDD,FLAP,ALTR
0,1830.453677,0.001307,2.466353,-0.149562,-0.040573,34.472916,105.792649,103.634063,-0.01321,5.827737,...,25.516873,44.884138,0.0,84.686804,0.000649,25.413313,803.505965,2.594683,3659.548233,-133.301064
1,1901.361659,-0.000347,2.012524,-0.982819,-0.049308,37.182835,108.411837,109.95573,0.008009,5.992653,...,25.551215,44.884043,77.406844,84.587999,0.000867,25.403212,805.505965,2.3735,3659.524497,-285.991069
2,1951.419787,-0.000196,1.780898,-1.46037,-0.052142,38.774293,109.454858,111.591412,0.005539,6.112013,...,25.556126,44.884031,112.625779,84.624489,0.000829,25.395244,807.505965,2.334181,3659.535463,-353.788072
3,1981.926164,-0.001285,1.53322,-1.983872,-0.054104,40.362749,110.245807,112.777486,0.000494,6.258673,...,25.592597,44.884017,140.454476,84.583892,0.000784,25.397615,809.505965,2.286638,3659.534177,-413.251065
4,2009.515247,-0.001486,1.292066,-2.397238,-0.055753,41.865004,110.892648,113.831346,-0.024793,6.445805,...,25.615992,44.884013,163.941846,84.646016,0.000894,25.402212,811.505965,2.26882,3659.534046,-464.152202
5,2045.743032,-0.001598,1.091937,-2.63162,-0.056666,43.163824,111.392057,114.502494,-0.036103,6.619099,...,25.62373,44.884002,183.103908,84.632787,0.000949,25.403419,813.505965,2.342698,3659.543995,-497.733009
6,2136.165468,-0.001374,0.135548,-3.904227,-0.059072,48.064439,113.040736,117.092829,-0.077999,7.211016,...,25.664367,44.883983,257.764422,84.49039,0.000706,25.407619,823.505965,2.588198,3659.54026,-603.813134
7,2199.125907,-0.000794,-0.662727,-4.786796,-0.059939,51.254946,114.064172,118.811399,-0.075537,7.70343,...,25.74026,44.883947,319.50128,84.826446,0.000654,25.414948,833.505965,2.5561,3659.526937,-661.9071
8,2233.370556,-0.000116,-1.250887,-5.327039,-0.060419,52.958434,114.774955,119.778736,-0.046213,7.94642,...,25.774991,44.88392,374.074945,84.938107,0.000478,25.407742,843.505965,2.566811,3659.536634,-667.464383
9,2260.941803,-0.000152,-1.685621,-5.625928,-0.06134,53.995256,115.367628,120.653442,-0.018925,8.174262,...,25.794586,44.883896,428.866312,84.609901,0.000514,25.402929,853.505965,2.726534,3659.531975,-679.22914
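
The ranking in this step is built on `scipy.stats.spearmanr`, which returns a matrix of pairwise rank correlations between columns when it is given a 2-D array with more than two columns. The toy example below (synthetic data, not DASHlink values) shows that any monotonic relationship, increasing or decreasing, gives an absolute Spearman correlation of 1.

```python
import numpy as np
from scipy import stats

x = np.arange(1.0, 11.0)
# Columns: x, a monotonically increasing transform of x, and a decreasing one
data = np.column_stack([x, x ** 3, -x])

# First element of the result is the correlation matrix between columns
R = np.abs(stats.spearmanr(data, axis=0)[0])
print(R.shape)
print(R)
```

Because the correlations here are computed on per-height averages, each coefficient rests on only as many points as there are heights, so small differences between features should be read with caution.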


In [14]:
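from scipy import stats
def order_features_mrmr(df,output='ALTR',redundancy_weight=0.5):
    """Rank the input columns of df by a greedy relevance-minus-redundancy score."""
    # Absolute Spearman rank correlation between every pair of columns
    R = np.abs(stats.spearmanr(df.to_numpy(dtype=float),axis=0)[0])
    output_idx = df.columns.get_loc(output)
    input_idx = [i for i in range(df.shape[1]) if i != output_idx]

    # Relevance (input vs. output) and redundancy (input vs. input).
    # Both arrays are indexed by position within input_idx, not by column number,
    # so the ranking also works when the output is not the last column.
    corr_with_output = R[input_idx,output_idx]
    corr_with_inputs = R[np.ix_(input_idx,input_idx)]

    remaining = list(range(len(input_idx)))
    # First pick: the input most correlated with the output
    idx_best = int(np.argmax(corr_with_output))
    selected_order = [idx_best]
    remaining.remove(idx_best)
    while remaining:
        with_output = corr_with_output[remaining]
        # Mean correlation with the features not yet ranked (self-correlation included)
        with_others = np.array([np.mean(corr_with_inputs[idx,remaining]) for idx in remaining])

        mrmr_values = with_output-with_others*redundancy_weight
        best_index = remaining[int(np.argmax(mrmr_values))]

        remaining.remove(best_index)
        selected_order.append(best_index)
    # Map positions back to column labels
    ordered_features = df.columns[input_idx][selected_order]
    return ordered_features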
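
Reading the function above, the score it maximizes at each step can be written as follows. The notation is ours: $y$ is the output `ALTR`, $x_j$ is a candidate input, $\rho_s$ is the Spearman rank correlation, $\lambda$ is `redundancy_weight`, and $U$ is the set of inputs that have not been ranked yet (including $x_j$ itself).

$$
j^{*} = \arg\max_{j \in U} \left[ \left|\rho_s(x_j, y)\right| - \lambda \, \frac{1}{|U|} \sum_{k \in U} \left|\rho_s(x_j, x_k)\right| \right]
$$

The first feature is chosen by relevance alone, that is, by the largest $\left|\rho_s(x_j, y)\right|$. Note that the redundancy term here averages over the features still unranked, whereas many descriptions of mRMR measure redundancy against the features already selected; the two variants can produce different orderings.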

In [15]:
min_redundancy_weight = [0,.25,.5,.75,1] #contribution of minimum redundancy
# One column of ranked feature names per redundancy weight
df_features = pd.DataFrame({weight: list(order_features_mrmr(ave_by_height,redundancy_weight=weight))
                            for weight in min_redundancy_weight})

In [16]:
df_insert = pd.DataFrame({key:'ALTR' for key in df_features.columns},index=[0])
df_features = pd.concat([df_insert,df_features]).reset_index(drop = True)
df_features

Unnamed: 0,0.00,0.25,0.50,0.75,1.00
0,ALTR,ALTR,ALTR,ALTR,ALTR
1,CCPC,CCPC,CCPC,CCPC,CCPC
2,ELEV_1,ELEV_1,BLAC,BLAC,BLAC
3,PTCH,PTCH,ELEV_1,ELEV_1,GLS
4,N1_1,N1_1,PTCH,PTCH,ELEV_1
5,GS,GS,N1_1,N1_1,PTCH
6,TAS,TAS,GS,GS,N1_1
7,WS,WS,TAS,TAS,GS
8,LONP,LONP,WS,WS,TAS
9,TH,TH,LONP,LONP,WS


In [17]:
df_features.to_csv('ordered_features.csv',index=False)
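
A later step will presumably need the first few ranked inputs for a chosen redundancy weight. The sketch below shows one way to extract them; the helper `top_k_inputs` is hypothetical, and the toy table reuses only the first rows of the output above. Note that after a round trip through CSV the column labels are strings rather than floats.

```python
import pandas as pd

def top_k_inputs(df_features, weight_column, k, output="ALTR"):
    """Return the first k ranked input features from one column, skipping the output row."""
    ranked = [name for name in df_features[weight_column] if name != output]
    return ranked[:k]

# Toy table with string column labels, as they would appear after pd.read_csv
toy = pd.DataFrame({
    "0.0": ["ALTR", "CCPC", "ELEV_1", "PTCH"],
    "1.0": ["ALTR", "CCPC", "BLAC", "GLS"],
})
print(top_k_inputs(toy, "1.0", k=2))
```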