<h2><center>Richter's Predictor: Modeling Earthquake Damage</center></h2>
<img style="height:300px" src="earthquake.jpg">
<h4><center>Contributors: Naman Bhargava, Neeraj Alavala, Vidhur Kumar, Jordan Rodrigues</center></h4>

# Introduction

Within the United States alone, earthquakes destroy nearly $4.4B of economic value yearly. 

Our team will be delving into the 2015 Nepal Earthquake Open Data Portal. This data, which was collected using mobile devices following the Gorkha Earthquake of April 2015, details the level of destruction brought upon more than 200,000 buildings in the area. By utilizing various features reported by Nepali citizens (such as building size, purpose, and construction material), we will construct a machine learning classifier capable of determining the extent to which a building would be damaged in a future earthquake. 

This project, combined with technologies such as computer-vision (CV) based scanning of cities following earthquakes (Ji et al., 2018), will ultimately provide a better understanding of susceptibility to earthquake-induced damage, which is valuable information for city planners.
<hr>

# Data Exploration

<center>Our initial approach was to check for class imbalance, search for NaN values, and visualize the distributions of various features as well as the degree to which they correlate with the value we were trying to predict (damage grade).</center>
<br>
<center><img style="height:500px" src="Correlation.png"></center>

<h5>Upon initial visualization, it becomes clear that no single feature is strongly predictive: the correlation of any given variable with damage grade is at most 0.2 in magnitude. In other words, it will likely be difficult to classify the damage grades accurately.</h5>
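
The NaN and correlation checks described above could look roughly like the sketch below. It runs on synthetic data, and the feature names (`age`, `floor_count`, `height_pct`) are hypothetical stand-ins rather than the actual column names of the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the feature/label table; column names are hypothetical.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "age": rng.integers(0, 100, size=n),
    "floor_count": rng.integers(1, 6, size=n),
    "height_pct": rng.integers(1, 20, size=n),
})
# Target loosely tied to one feature so the correlations are not all zero.
df["damage_grade"] = np.clip(
    np.round(df["floor_count"] / 2 + rng.normal(0, 1, size=n)), 1, 3
).astype(int)

# Count missing values per column.
nan_counts = df.isna().sum()

# Pearson correlation of every feature with the target, sorted by magnitude.
corr_with_target = df.corr()["damage_grade"].drop("damage_grade")
order = corr_with_target.abs().sort_values(ascending=False).index
corr_with_target = corr_with_target.reindex(order)

print(nan_counts)
print(corr_with_target)
```

Sorting by absolute value keeps strongly negative correlations next to strongly positive ones, which matches how the correlation plot is read.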

<h5> In the initial data, we were also given a few categorical variables. These included land condition, roof type, floor type, legal status, and more. We used one-hot encoding to allow for models to be trained on this non-integer data. Fortunately, this data did not have any null values, so imputation was not necessary. </h5> 
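
As a minimal sketch of the one-hot encoding step, assuming pandas is used, `pd.get_dummies` expands each categorical column into one indicator column per category. The column names and single-letter category codes below are made up for illustration.

```python
import pandas as pd

# Tiny illustrative frame; column names and category codes are hypothetical.
df = pd.DataFrame({
    "roof_type": ["n", "q", "x", "n"],
    "land_condition": ["t", "t", "o", "n"],
    "age": [10, 25, 0, 40],
})

categorical_cols = ["roof_type", "land_condition"]

# Expand each categorical column into one 0/1 indicator column per category.
encoded = pd.get_dummies(df, columns=categorical_cols, dtype=int)
print(encoded.columns.tolist())
```

Numeric columns such as `age` pass through unchanged, so the encoded frame can be fed directly to a model.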

<h5>In terms of class balance, the data is skewed towards damage grade 2. However, the imbalance is not severe enough to require special sampling techniques for the minority class (in this case, damage grade 1). </h5>
<img style="height:300px" src="piechart_lul.png">
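
One way to quantify the class balance shown in the pie chart is to compute the share of each damage grade. The label counts below are invented purely for illustration and do not reflect the real dataset.

```python
import pandas as pd

# Hypothetical label column; the counts are made up for illustration only.
labels = pd.Series([1] * 10 + [2] * 55 + [3] * 35, name="damage_grade")

# Share of each damage grade, ordered by grade.
shares = labels.value_counts(normalize=True).sort_index()

# Ratio between the most and least common class as a rough imbalance measure.
imbalance_ratio = shares.max() / shares.min()
print(shares)
print(round(imbalance_ratio, 2))
```
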
<hr>

# Methods

The data has been collected for us, but we plan to spend a significant amount of time preparing (cleaning, encoding, etc.) the data for the models. We will also produce visualizations to better understand the relationships between our features. We will run a variety of classifiers (listed below) to predict the damage caused by earthquakes. Hyperparameters will be tuned using GridSearchCV. Model performance will be compared by micro-averaged F1 score, which balances precision and recall and extends to classification into the 3 damage-grade categories.
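
The tuning and scoring setup described above can be sketched as follows, assuming scikit-learn. The data is synthetic, the random forest is just one of the candidate models, and the grid values are illustrative assumptions rather than settings we have tuned.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic 3-class data standing in for the encoded building features.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=6,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Small illustrative grid; the values are assumptions, not tuned settings.
param_grid = {"n_estimators": [50, 100], "max_depth": [5, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="f1_micro",  # micro-averaged F1 as the model-selection metric
    cv=3,
)
search.fit(X_train, y_train)

# Evaluate the refit best model on the held-out split.
y_pred = search.predict(X_test)
micro_f1 = f1_score(y_test, y_pred, average="micro")
print(search.best_params_, round(micro_f1, 3))
```

Note that when every sample has exactly one label, as with damage grades, micro-averaged F1 is numerically equal to plain accuracy, so the two can be compared directly.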

#### Classifiers
 - Multiple logistic regression
 - Support vector regression
 - K-means
 - DBSCAN
 - PCA
 - LDA
 - Cross-validation
 - Hybrid neural network
 - Decision tree
  https://www.datasciencecentral.com/profiles/blogs/decision-tree-vs-random-forest-vs-boosted-trees-explained
 - Random forest
 - XGBoost
 (add plots under section option)

# Expected Results

Given the relatively low correlation between the features and the variable we're trying to predict (damage_grade), as well as the fact that the current leaderboard accuracy for this competition is 0.75, our group hopes to achieve an accuracy of around 0.7. We assume that an ensemble will produce the highest-quality results, while KMeans/DBSCAN will produce the lowest-quality results, as the data is likely neither centroid-clustered nor low-dimensional.

# Results

## PCA

PCA was initially conducted to get a visual feel for how separable the data is in a low-dimensional space. Min-max normalization was applied to the data prior to running PCA. Unfortunately, in two- and three-dimensional space, there were no clear separating boundaries that could be visualized. Given that our data has ~38 dimensions, this meant that dimensionality reduction was not an effective technique for separating the classes.

<center><img style="height:600px" src="2DPca.png"><img style="height:600px" src="3DPCA.png"></center>
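
A minimal sketch of the normalize-then-project step, assuming scikit-learn, is shown below. The input is random data with 38 columns standing in for our encoded features, so the resulting numbers say nothing about the real dataset.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Random stand-in for the ~38-dimensional encoded feature matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 38)) * rng.uniform(1, 50, size=38)

# Scale every feature to [0, 1], then project onto the top two components.
pipeline = make_pipeline(MinMaxScaler(), PCA(n_components=2))
X_2d = pipeline.fit_transform(X)

# Fraction of total variance retained by each of the two components.
explained = pipeline.named_steps["pca"].explained_variance_ratio_
print(X_2d.shape, explained.sum())
```

Checking `explained_variance_ratio_` is a quick way to see how much information a 2D or 3D projection keeps; a small total would be consistent with the lack of visible separation in the plots.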

## LDA

Next, we attempted to use LDA. We hoped that a supervised linear transformation technique would produce greater separability in the data. While it did perform better than PCA, the improvement was only marginal. It is also important to note that while the classes appear to be "stacked" on top of each other, this is not the result of any 3D separability; matplotlib simply places new points on top of old ones, and the points were plotted one class at a time.

<center><img style="height:600px;width:500px" src="LDA.png"></center>
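
The LDA projection can be sketched as below, assuming scikit-learn and synthetic 3-class data. With three classes, LDA yields at most two discriminant components, so the projection is inherently 2D. The shuffled index at the end is one possible way (our assumption, not what was done for the plot above) to avoid the "stacking" effect caused by plotting one class at a time.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic 3-class data standing in for the encoded building features.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=6,
    n_classes=3, random_state=0,
)

# With 3 classes, LDA can produce at most n_classes - 1 = 2 components.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)

# Random drawing order so that no class is systematically plotted on top.
rng = np.random.default_rng(0)
order = rng.permutation(len(X_lda))
X_plot, y_plot = X_lda[order], y[order]
print(X_lda.shape)
```

Passing `X_plot[:, 0]`, `X_plot[:, 1]`, and `c=y_plot` to a single scatter call would then interleave the classes instead of layering them.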

# Discussion

This project can benefit architects, engineers, and city planners, who can use the classification model to predict which types of buildings are likely to suffer earthquake damage. Buildings with attributes similar to those that were more heavily damaged can be reinforced. Both the visualizations and the classification models can be used in conjunction with earthquake prediction research (Rouet-Leduc et al., 2017) to direct humanitarian aid in advance, so that buildings can be reinforced to take significantly less damage.

Future plans, etc.

# References

Asim, K. M., Idris, A., Iqbal, T., & Martínez-Álvarez, F. (2018). Earthquake prediction model using support vector regressor and hybrid neural networks. PLOS ONE, 13(7). https://doi.org/10.1371/journal.pone.0199004

Rouet-Leduc, B., Hulbert, C., Lubbers, N., Barros, K., Humphreys, C. J., & Johnson, P. A. (2017). Machine learning predicts laboratory earthquakes. Geophysical Research Letters, 44, 9276-9282. https://doi.org/10.1002/2017GL074677

Ji, M., Liu, L., & Buchroithner, M. (2018). Identifying Collapsed Buildings Using Post-Earthquake Satellite Imagery and Convolutional Neural Networks: A Case Study of the 2010 Haiti Earthquake. Remote Sensing, 10(11), 1689. https://doi.org/10.3390/rs10111689

DrivenData. (n.d.). Richter's Predictor: Modeling Earthquake Damage. Retrieved September 28, 2019, from https://www.drivendata.org/competitions/57/nepal-earthquake/page/136/