# GeoHackWeek2018 Project Presentation: #coastaledges

##### Machine Learning in the Context of Coastal Habitat Classification

Allison Bailey (Project Lead) - Private Consultant at SoundGIS, WA

Jonathan Batchelor - PhD Student, UW Forestry, WA

Dr. F. Patricia Medina - Post Doc, Worcester Polytechnic Institute, MA

Miya Pavlock McAuliffe - Master's Student, Moss Landing Marine Labs - Geological Oceanography, CA

Dmitra Salmanidou - Post Doc, University College London, UK

Wenwei Xu - Research Scientist, Pacific Northwest National Lab, WA

![IMG_6305.JPG](attachment:IMG_6305.JPG)

# Problem

Explore and compare a variety of machine learning approaches for land cover classification in the coastal realm.

![SubstrateClasses_example.png](figures/SubstrateClasses_example.png)

# Data

Data are digital color-infrared aerial photography and LiDAR digital elevation models for a section of the Oregon coast.

NOAA 2017 (CIR and RGB) imagery, 0.5m resolution
- CIR (color infrared)
![CIR_image.png](figures/CIR_image.png)

- RGB (red, green, blue)
![RGB_image.png](figures/RGB_image.png)

LiDAR DEMs, 1m resolution
- DEM (digital elevation model) hillshade
![DEM_hillshade.png](figures/DEM_hillshade.png)

# Methods

## Part 1: Google Earth Engine
![GEE_scheme.JPG](figures/GEE_scheme.JPG)

### a) Supervised Learning 

#### CART, SVM, Random Forest with points
##### Jonathan: https://code.earthengine.google.com/?accept_repo=users/jbatchelor78/project

Note: GEE automatically tiles and mosaics imagery on import, but it also splits large exports into tiles, so GDAL was needed to stitch the exported tiles back together.
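
A minimal sketch of the stitching step, assuming the exported tiles are GeoTIFFs. The file names and helper functions are hypothetical, and the exact commands the team ran are not recorded here; `gdalbuildvrt` and `gdal_translate` are standard GDAL command-line utilities.

```python
import subprocess
from pathlib import Path


def mosaic_commands(tile_paths, out_tif):
    """Build GDAL commands that mosaic tiles into one GeoTIFF.

    Step 1 builds a lightweight virtual mosaic (VRT) over the tiles;
    step 2 writes that VRT out as a single GeoTIFF.
    """
    tiles = sorted(str(p) for p in tile_paths)
    if not tiles:
        raise ValueError("no tiles to mosaic")
    vrt = str(Path(out_tif).with_suffix(".vrt"))
    return [
        ["gdalbuildvrt", vrt, *tiles],
        ["gdal_translate", "-of", "GTiff", vrt, str(out_tif)],
    ]


def run_mosaic(tile_paths, out_tif):
    """Run the commands; requires the GDAL utilities on PATH."""
    for cmd in mosaic_commands(tile_paths, out_tif):
        subprocess.run(cmd, check=True)
```

Building the commands separately from running them makes it easy to inspect what will be executed before touching large files.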

Results:
- lack of DEM data resulted in wacky classification
- random forest appears to do the best

#### CART, Random Forest, and SVM - with polygons
##### Dmitra: https://code.earthengine.google.com/?accept_repo=users/disalmani/coastaledges
Method: created polygons to train the classifiers and tested with the same polygons

Results:
- CART classification is noisier than the others
- random forest appears to be the best but confuses some trees with marine vegetation
- SVM confuses unconsolidated with rock

##### Generating Training Points (Allison)
![TrainingPts.png](figures/TrainingPts.png)

### b) Unsupervised Learning (Wenwei)
- K-means (clustering)
- script `wx_unsupervised_clustering`: https://code.earthengine.google.com/c9d954a50e981a8d4c7bb59cc4c0d9df

![K-MEANS Clustering results](figures/wx_clusters.png)
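
For readers new to K-means, here is a minimal pure-Python sketch of the underlying idea (Lloyd's algorithm) on pixel band values. It is not the GEE clusterer used in the project; the function name, the deterministic initialization, and the fixed iteration count are illustrative assumptions.

```python
def kmeans(points, k, iterations=10):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples."""
    # Deterministic start (an assumption): first k points are the centers.
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centers
```

Each point here would be one pixel's band values, for example `(red, green, blue, nir)`.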

- generated 50 clusters, which were exported to Python as polygons for post-processing
- massaged the data into a dataframe
- used labels to interpret what the clustering algorithm found
- split the data 80% training / 20% testing
- derived confidence levels for each of the 50 clusters
![confidence score for the 50 segment classes](figures/confidence_score.png)
- ran on the entire dataset to produce confusion matrices for the testing dataset and the entire dataset
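
The post-processing steps above can be sketched roughly as follows. This assumes each labeled sample is a `(cluster_id, label)` pair and that a cluster's "confidence" is the share of its training samples carrying the majority label; that definition and all function names are assumptions, not necessarily what the project used.

```python
import random
from collections import Counter, defaultdict


def split_train_test(samples, train_fraction=0.8, seed=0):
    """Shuffle samples and split them, 80/20 by default."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def cluster_confidence(train_samples):
    """Map each cluster to (majority label, share of samples with it)."""
    counts = defaultdict(Counter)
    for cluster_id, label in train_samples:
        counts[cluster_id][label] += 1
    result = {}
    for cluster_id, label_counts in counts.items():
        label, n = label_counts.most_common(1)[0]
        result[cluster_id] = (label, n / sum(label_counts.values()))
    return result


def predict(cluster_ids, mapping, default=None):
    """Label each cluster id with its majority label, if one was learned."""
    return [mapping[c][0] if c in mapping else default for c in cluster_ids]
```

Comparing `predict` output against the held-out labels gives the counts needed for a confusion matrix.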


#### Results:
![Confusion matrix](figures/wx_confusion%20metrix%20all%20data.png)
- rock and marine vegetation get confused, but accuracy is high overall for each class
- F1 score for the testing set is 0.729
- F1 score for the entire dataset is 0.711
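
As a reminder of how an F1 score relates to a confusion matrix, here is a small sketch. It computes per-class F1 and an unweighted (macro) average; whether the scores reported above are macro, micro, or weighted averages is not stated, so the averaging choice is an assumption.

```python
def f1_scores(confusion):
    """Per-class F1 from a square confusion matrix.

    confusion[i][j] = number of samples of true class i predicted as j.
    """
    n = len(confusion)
    scores = []
    for i in range(n):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp  # true class i, predicted as another
        fp = sum(confusion[r][i] for r in range(n)) - tp  # wrongly predicted i
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(confusion):
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_scores(confusion)
    return sum(scores) / len(scores)
```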





## Part 2: Python
![jupyter.png](attachment:jupyter.png)

### Data preprocessing (Miya)
- combined all data into a 5 band/layer tif
- 1 = red, 2 = green, 3 = blue, 4 = nir, 5 = dem

![5%20band%20breakdown.png](attachment:5%20band%20breakdown.png)

- this resulted in a MEGA file, so we:
    - collectively threw out the DEM (as already done in the prior GEE examples)
    - ran gdal_translate in GNU Parallel for each of the 4 remaining bands to convert them to ASCII files
    - removed invalid pixels (NaNs) in an attempt to shrink the file; it was still huge
    - passed on a resulting .txt file with columns [x, y, red, green, blue, nir]
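
A rough sketch of these preprocessing steps, assuming each band was dumped with GDAL's XYZ driver as whitespace-separated `x y value` rows in the same pixel order. The file names and helper functions are hypothetical; `-of XYZ` and `-b` are standard `gdal_translate` options.

```python
import math


def band_to_xyz_command(src_tif, band, out_xyz):
    """gdal_translate call that dumps one band as 'x y value' text rows."""
    return ["gdal_translate", "-of", "XYZ", "-b", str(band),
            str(src_tif), str(out_xyz)]


def parse_xyz_line(line):
    """Parse one 'x y value' row into floats."""
    x, y, value = line.split()
    return float(x), float(y), float(value)


def merge_band_rows(band_rows):
    """Join per-band (x, y, value) rows into (x, y, v1, v2, ...) rows.

    band_rows holds one list of rows per band, all in the same pixel
    order. Pixels with a NaN in any band are dropped.
    """
    merged = []
    for rows in zip(*band_rows):
        x, y = rows[0][0], rows[0][1]
        values = [r[2] for r in rows]
        if any(math.isnan(v) for v in values):
            continue
        merged.append((x, y, *values))
    return merged
```

With the four bands passed in the order red, green, blue, nir, each merged row matches the `[x, y, red, green, blue, nir]` layout described above.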

#### a) Supervised Learning

##### Patricia: open the Jupyter notebook named "GeoHackWeek Coastal Edges.ipynb"
- Neural Network Classification

![Heading.PNG](figures/Heading.PNG)


![balanced_classes.png](figures/balanced_classes.png)


![F1_score.PNG](figures/F1_score.PNG)

![KNN_class.PNG](figures/KNN.PNG)


![KNN_confusion.png](figures/KNN_confusion.png)

![ffNN_f1.PNG](figures/ff_NN.PNG)

![RF_confusion.png](figures/RF_confusion.png)

![RF_f1.PNG](figures/RF_f1.PNG)

![ff_NN_02.PNG](figures/ff_NN_02.PNG)

![ffNN_confusion.png](figures/ffNN_confusion.png)

![ffNN_f1.PNG](figures/ffNN_f1.PNG)
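
To illustrate the simplest of the classifiers shown in the figures above, here is a minimal pure-Python k-nearest-neighbors sketch on pixel band values. It is not the notebook's code; the function name, the choice of `k`, and the example band ordering are assumptions.

```python
from collections import Counter


def knn_predict(train_pixels, train_labels, pixel, k=3):
    """Label one pixel by majority vote among its k nearest training pixels."""
    # Squared Euclidean distance in band space (e.g. red, green, blue, nir).
    distances = sorted(
        (sum((a - b) ** 2 for a, b in zip(p, pixel)), label)
        for p, label in zip(train_pixels, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

Because every prediction scans all training pixels, this brute-force form is only practical on small samples of an image.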

#### b) Unsupervised Learning: Clustering

##### ran out of time!

# Discussion

## Google Earth Engine vs. Python Machine Learning 
### GEE pros:
- Runs fast!
- Easy-to-use packages for quick prototyping.

### GEE cons:
- Cannot see under the hood; not easy to customize
- JavaScript learning curve
- Good with rasters, but not mature with features

## Supervised vs. Unsupervised Learning
### Supervised pros:
- accurate
### Supervised cons:
- relies heavily on labeled data

### Unsupervised pros:
- does not need labeled data
- allows more clusters (not limited by the labels)
    

# Acknowledgements

Many thanks to all the amazing presenters and the support team/coordinators of Geohackweek 2018. Special thanks to David Shean, Shay Strong, Catherine Kuhn, and James Douglass, who helped us write scripts, debug technical issues, and discuss machine learning and general data science concepts.

# Questions?
![lidar_selfie.png](figures/group_lidar_pic.png)