# Convolutional Neural Network

__Project Description:__ <br>
This notebook contains exploratory data analysis and the convolutional neural network model used to predict whether a location is susceptible to wildfires. The data consist of roughly 20,000 labeled satellite images: 10,000 show locations that have experienced wildfires, and the other 10,000 show locations with no recorded wildfire. <br>Example images appear in the Images section below.

I was unable to collect satellite imagery of each site from a few days before its fire. I believe the available imagery still suffices as a proof of concept, especially since areas that experience wildfires often experience them again.

# Importing Libraries:

In [5]:
# Test

In [None]:
import os
import shutil
import pandas as pd
import numpy as np

# Plots and Graphs
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import image
import plotly.express as px
import scikitplot as skplt
import folium 
%matplotlib inline

import requests
import random
from IPython.display import Image, display

# API and Requests
import urllib.request

# Keras/Tensorflow
import keras
from keras.models import Sequential
from keras.layers import Dense, Conv2D , MaxPool2D , Flatten , Dropout , BatchNormalization
from keras.preprocessing.image import ImageDataGenerator
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report,confusion_matrix
from keras.callbacks import ReduceLROnPlateau

# Shows all columns
pd.set_option('display.max_columns', None)

# Turning off warnings
import warnings
warnings.filterwarnings('ignore')

In [None]:
import PIL
from PIL import Image

In [None]:
# numpy, matplotlib, PIL.Image and ImageDataGenerator are already imported above
import time
import scipy
from scipy import ndimage
from keras.preprocessing.image import array_to_img, img_to_array, load_img

# Fixing the NumPy seed for reproducibility
np.random.seed(123)

# Importing Data:

Source: [here](https://geo.wa.gov/datasets/wadnr::dnr-fire-statistics-2008-present-1/data?geometry=-126.579%2C45.325%2C-111.143%2C47.964&orderBy=FIRE_RGE_WHOLE_NO&orderByAsc=false&selectedAttribute=ACRES_BURNED)

In [None]:
df = pd.read_csv('DNR_Fire_Statistics_2008_-_Present.csv')
df = df.sample(frac=1).reset_index(drop=True) # Shuffling the data
df.head()

## Column Descriptions:

__Column:__

- __X:__                            
- __Y:__                              
- __OBJECTID:__ Unique ID
- __FIREEVENT_ID:__ Unique ID
- __INCIDENT_NO:__ Incident Number
- __INCIDENT_NM:__ Incident Name (trail or forest area)
- __INCIDENT_ID:__ 
- __COUNTY_LABEL_NM:__ County Name (King, Stevens, etc. . .)          
- __FIRE_TWP_WHOLE_NO:__ 
- __FIRE_TWP_FRACT_NO:__
- __FIRE_RGE_WHOLE_NO:__
- __FIRE_RGE_FRACT_NO:__
- __FIRE_RGE_DIR_FLG:__
- __FIRE_SECT_NO:__
- __SITE_ELEV:__ Elevation of site
- __FIREGCAUSE_LABEL_NM:__ Cause
- __FIRESCAUSE_LABEL_NM:__ Secondary cause
- __BURNESCAPE_RSN_LABEL_NM:__
- __ACRES_BURNED:__ Acres Burned
- __START_DT:__ Start Date
- __START_TM:__ Start Time
- __DSCVR_DT:__ Discovery Date
- __DSCVR_TM:__ Discovery Time
- __CONTROL_DT:__ Date brought under control
- __CONTROL_TM:__ Time brought under control
- __FIRE_OUT_DT:__ Date fire was put out
- __FIRE_OUT_TM:__ Time fire was put out
- __BURN_MERCH_AREA:__
- __BURN_REPROD_AREA:__
- __BURN_NONSTOCK_AREA:__
- __FIREEVNT_CLASS_CD:__
- __FIREEVNT_CLASS_LABEL_NM:__ 'Classified' or 'Other Agency'
- __SECTION_SUBDIV_PTS_ID:__ 
- __LAT_COORD:__ Latitude
- __LON_COORD:__ Longitude
- __RES_ORDER_NO:__  
- __NON_DNR_RES_ORDER_NO:__ 
- __START_OWNER_AGENCY_NM:__ Owner of land where fire started (private, government, DNR, etc. . .)
- __START_JURISDICTION_AGENCY_NM:__ Jurisdiction where it started
- __PROTECTION_TYPE:__ Type of Protection of area
- __REGION_NAME:__  Region

## Images:

### Wildfire Area Image Previews:

__Image examples:__<br>
__Areas with wildfires:__
![text](example_images/wf1.jpg)
![text](example_images/wf2.jpg)
![text](example_images/wf3.jpg)

### Non-Wildfire Area Image Previews:

__Areas without wildfires:__
![text](example_images/nwf1.jpg)
![text](example_images/nwf2.jpg)
![text](example_images/nwf3.jpg)

# Cleaning:

In [None]:
# Quick spelling fix: the truncated label 'Miscellaneou' becomes 'Misc'
df['FIREGCAUSE_LABEL_NM'] = df['FIREGCAUSE_LABEL_NM'].replace('Miscellaneou', 'Misc')

In [None]:
# Dealing with dates:
# Parse START_DT, then drop the time of day by round-tripping through a date string
df['date'] = pd.to_datetime(df.START_DT)
df.date = df.date.dt.strftime('%m/%d/%Y')
df['date'] = pd.to_datetime(df.date)
# Extracting month and year
df['month'] = df['date'].dt.month
df['year'] = df['date'].dt.year

# Exploratory Data Analysis:

In [None]:
# Initial Histogram Plot
df.hist(figsize=(15,15));

## Bar Chart by Region: 

In [None]:
# Unique Regions:
print('Regions of Fires:\n\n', df.REGION_NAME.value_counts())
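
The heading refers to a bar chart, while the cell above prints the counts. A minimal sketch of drawing those counts as bars follows. It uses a small made-up frame (`toy`) in place of `df`, and the region names in it are illustrative placeholders, not values confirmed from the dataset.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy stand-in for df; the region names are placeholders (assumption).
toy = pd.DataFrame({'REGION_NAME': ['NORTHEAST', 'NORTHEAST', 'SOUTHEAST',
                                    'NORTHEAST', 'OLYMPIC', 'SOUTHEAST']})

# Same call as in the cell above: number of fires per region, largest first
region_counts = toy['REGION_NAME'].value_counts()

# One bar per region
fig, ax = plt.subplots(figsize=(8, 4))
region_counts.plot.bar(ax=ax, color='firebrick')
ax.set_xlabel('Region')
ax.set_ylabel('Number of fires')
ax.set_title('Fires by Region')
fig.tight_layout()
```

Swapping `toy` for `df` should give the real chart, and the same pattern would apply to `COUNTY_LABEL_NM` after slicing the top 10.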

## Bar Chart by County:

In [None]:
print('Top 10 Counties by Number of Fires:\n\n', df.COUNTY_LABEL_NM.value_counts()[:10])

## Elevation Histogram:

In [None]:
# Series.min()/max() skip missing values, unlike the built-in min()/max()
print('Minimum Elevation: ', df.SITE_ELEV.min())
print('Maximum Elevation: ', df.SITE_ELEV.max())

In [None]:
# Histogram of elevation for wildfires
# Most likely due to being clustered around two main sites. . . 
# I'd be interested to see where the high elevation fires are happening. . . 
# Especially the 8000 ft one!
dftest = df[df['SITE_ELEV'] > 0]
sns.distplot(dftest.SITE_ELEV);

## Map with Pins, Size by Acres:

In [None]:
# Series.min()/max() skip missing values, unlike the built-in min()/max()
print('Minimum Acres Burned: ', df.ACRES_BURNED.min())
print('Maximum Acres Burned: ', df.ACRES_BURNED.max())
# Change to > 0

__Acre Histograms:__

In [None]:
# Histogram of acres burned
# Mainly very small fires
# Heavy skew
dftest = df[df['ACRES_BURNED'] > 0]
sns.distplot(dftest.ACRES_BURNED, bins = 20, kde = False);

In [None]:
# Histogram of acres burned
# Mainly very small fires
dftest = df[df['ACRES_BURNED'] > 0]
dftest = dftest[dftest['ACRES_BURNED'] < 100]
sns.distplot(dftest.ACRES_BURNED);

In [None]:
# Zooming in on fires under 1 acre
dftest = df[df['ACRES_BURNED'] > 0]
dftest = dftest[dftest['ACRES_BURNED'] < 1]
sns.distplot(dftest.ACRES_BURNED);

In [None]:
# Maybe also try pins with small, med, and large as separate pin colors
wa_coord = (47.4, -120.7401)
# Creating an empty map (named fire_map to avoid shadowing the built-in map)
fire_map = folium.Map(location = wa_coord, zoom_start = 7.3, tiles='Cartodb Positron')

# Adding a heat layer weighted by total acres burned at each coordinate:
from folium.plugins import HeatMap
heat_data = (df[['LAT_COORD', 'LON_COORD', 'ACRES_BURNED']]
             .groupby(['LAT_COORD', 'LON_COORD']).sum().reset_index()
             .values.tolist())
HeatMap(data=heat_data, radius=11).add_to(fire_map)

display(fire_map)

In [None]:
df_small = df[df['ACRES_BURNED'] > 0]
df_small = df_small[df_small['ACRES_BURNED'] < 10]
df_med = df[df['ACRES_BURNED'] >= 10]
df_med = df_med[df_med['ACRES_BURNED'] < 500]
df_large = df[df['ACRES_BURNED'] >= 500]

df_small = df_small[:150]
df_med = df_med[:150]
df_large = df_large[:150]
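
The three size buckets above could also be expressed as a single labeled column. The sketch below uses `pd.cut` on a toy series with the same thresholds (0 to 10, 10 to 500, 500 and up). The labels `small`, `medium` and `large` and the toy values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy acreage values standing in for df['ACRES_BURNED'] (assumption).
acres = pd.Series([0.0, 0.5, 9.99, 10.0, 499.0, 500.0, 12000.0])

# Drop zero-acre records, as the cell above does with > 0
burned = acres[acres > 0]

# right=False makes the intervals [0, 10), [10, 500), [500, inf),
# matching the < 10, >= 10, < 500, >= 500 comparisons above.
size_class = pd.cut(burned,
                    bins=[0, 10, 500, np.inf],
                    right=False,
                    labels=['small', 'medium', 'large'])

print(size_class.value_counts().sort_index())
```

A single column like this would allow one `groupby` or one colour lookup instead of three separate frames.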

In [None]:
wa_coord = (47.4, -120.7401)
# Creating an empty map (named fire_map to avoid shadowing the built-in map)
fire_map = folium.Map(location = wa_coord, zoom_start = 7.3, tiles='Cartodb Positron')

# Adding markers: small = yellow, medium = orange, large = red
for subset, color in [(df_small, 'yellow'), (df_med, 'orange'), (df_large, 'red')]:
    for _, row in subset.iterrows():
        folium.CircleMarker([row['LAT_COORD'], row['LON_COORD']],
                            radius = 3,
                            color = color,
                            fill_color = 'white',
                            popup = row['ACRES_BURNED']).add_to(fire_map)
display(fire_map)

## Map with Pins, Color by Date: 

In [None]:
# Using the parsed 'date' column: min()/max() on the raw START_DT strings compare text, not dates
print('Earliest Start Date: ', df['date'].min())
print('Latest Start Date: ', df['date'].max())

In [None]:
# Most fires happening in July/Aug - no surprise
df.month.value_counts()

In [None]:
df_jul = df[df['month'] == 7]
df_aug = df[df['month'] == 8]
df_sept = df[df['month'] ==  9]
#df_other = df[~df['month'].isin([7, 8, 9])]
# Limiting Sample
df_jul = df_jul[:150]
df_aug = df_aug[:150]
df_sept = df_sept[:150]
#df_other = df_other[:150]

In [None]:
wa_coord = (47.4, -120.7401)
# Creating an empty map (named fire_map to avoid shadowing the built-in map)
fire_map = folium.Map(location = wa_coord, zoom_start = 7.3, tiles='Cartodb Positron')

# Adding markers: July = yellow, August = orange, September = red
for subset, color in [(df_jul, 'yellow'), (df_aug, 'orange'), (df_sept, 'red')]:
    for _, row in subset.iterrows():
        folium.CircleMarker([row['LAT_COORD'], row['LON_COORD']],
                            radius = 3,
                            color = color,
                            fill_color = 'white',
                            popup = row['month']).add_to(fire_map)
display(fire_map)

## Map with Pins, Color by Cause:

In [None]:
print('Cause of Fires:\n\n',df.FIREGCAUSE_LABEL_NM.value_counts())

In [None]:
df_lightning = df[df['FIREGCAUSE_LABEL_NM'] == 'Lightning']
df_arson = df[df['FIREGCAUSE_LABEL_NM'] == 'Arson']
df_debris = df[df['FIREGCAUSE_LABEL_NM'] == 'Debris Burn']
# Limiting Sample
df_lightning = df_lightning[:150]
df_arson = df_arson[:150]
df_debris = df_debris[:150]
###
df100 = df[:150]

In [None]:
# Folium with Markers
# Idea: Change to arson vs not arson
# add legend
# Shuffle data!!
# Add Lightning


wa_coord = (47.4, -120.7401)
# Creating an empty map (named fire_map to avoid shadowing the built-in map)
fire_map = folium.Map(location = wa_coord, zoom_start = 7.3, tiles='Cartodb Positron')

# Adding markers: lightning = yellow, arson = red, debris burn = blue
for subset, color in [(df_lightning, 'yellow'), (df_arson, 'red'), (df_debris, 'blue')]:
    for _, row in subset.iterrows():
        folium.CircleMarker([row['LAT_COORD'], row['LON_COORD']],
                            radius = 3,
                            color = color,
                            fill_color = 'white',
                            popup = row['FIREGCAUSE_LABEL_NM']).add_to(fire_map)

display(fire_map)

## Map with Pins, Color by Length of Time of Fire:
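
A possible starting point for this section, as a sketch only: the fire's duration can be derived from `START_DT` and `FIRE_OUT_DT` and then mapped to a pin colour. The date format in the toy rows, the `duration_days` column, the `duration_color` helper and its thresholds are all assumptions, not taken from the dataset.

```python
import pandas as pd

# Toy rows; the date format here is an assumption, not taken from the CSV.
toy = pd.DataFrame({
    'START_DT':    ['2015-07-01', '2015-08-10', '2016-09-05'],
    'FIRE_OUT_DT': ['2015-07-04', '2015-08-10', '2016-09-19'],
})

# Duration = date the fire was put out minus the start date, in whole days
duration = pd.to_datetime(toy['FIRE_OUT_DT']) - pd.to_datetime(toy['START_DT'])
toy['duration_days'] = duration.dt.days

# Hypothetical colour buckets: same day, up to a week, longer than a week.
def duration_color(days):
    if days < 1:
        return 'yellow'
    if days <= 7:
        return 'orange'
    return 'red'

toy['color'] = toy['duration_days'].map(duration_color)
print(toy)
```

The resulting `color` column could then be passed to `folium.CircleMarker` in the same way as the fixed colours in the maps above.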

## Animated Map Over Time by Year:

In [None]:
df.year.value_counts()

In [None]:
# Use group by over year with sum of acres burned and total number of fires
# Use stacked line plot
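
The group-by described in the comments above might look like the following sketch. It runs on a toy frame with the notebook's `year` and `ACRES_BURNED` column names; the output names `acres_burned` and `n_fires` are made up for illustration.

```python
import pandas as pd

# Toy stand-in for df with the two columns the summary needs (assumption).
toy = pd.DataFrame({
    'year':         [2014, 2014, 2015, 2015, 2015],
    'ACRES_BURNED': [1.5, 20.0, 0.25, 300.0, 4.0],
})

# One row per year: total acres burned and number of fire records
yearly = toy.groupby('year').agg(
    acres_burned=('ACRES_BURNED', 'sum'),
    n_fires=('ACRES_BURNED', 'size'),
)
print(yearly)
```

Because acres and fire counts are on very different scales, plotting them on separate axes (for example `yearly.plot(subplots=True)`) may read better than stacking them.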

## Animated Map Over Time by Month:

In [None]:
# i.e. everything flattened to one year. . . 

In [None]:
# See if there are patterns by size, location, cause, etc. . . 

# Neural Network:

## Import Images:

In [None]:
# REMEMBER TO SHUFFLE IMAGES
# Paths:
# /Users/Thomas/Desktop/capstone/images/test_wf
# /Users/Thomas/Desktop/capstone/images/test_nwf
# Images should be 350x350

In [None]:
print ('WF')

for dirname, _, filenames in os.walk('/Users/Thomas/Desktop/capstone/images/test_wf'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
        
print ('\nNWF')
        
for dirname, _, filenames in os.walk('/Users/Thomas/Desktop/capstone/images/test_nwf'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
x = '/Users/Thomas/Desktop/capstone/images/test_wf/45.93476,-121.498236.jpg'
# Named img so it does not shadow the matplotlib `image` module imported above
img = Image.open(x)
print(img.format)
print(img.mode)
print(img.size)
# show the image
img.show()

# TEST ZONE:

In [None]:
# Paths:
# Train - 15 each
train_folder = '/Users/Thomas/Desktop/split/train'
train_wf = '/Users/Thomas/Desktop/split/train/wf'
train_nwf = '/Users/Thomas/Desktop/split/train/nwf'

# Test - 5 each
test_folder = '/Users/Thomas/Desktop/split/test'
test_wf = '/Users/Thomas/Desktop/split/test/wf'
test_nwf = '/Users/Thomas/Desktop/split/test/nwf'

# Val - 5 each
val_folder = '/Users/Thomas/Desktop/split/val'
val_wf = '/Users/Thomas/Desktop/split/val/wf'
val_nwf = '/Users/Thomas/Desktop/split/val/nwf'

In [None]:
imgs_wf = [file for file in os.listdir(val_wf) if file.endswith('.jpg')]
imgs_wf[:10]

In [None]:
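# TODO: verify that these generators load the images as intended
# Pixel values are rescaled to [0, 1] and every image is resized to 64x64x3.
# Each batch_size is meant to cover a whole split so one next() call returns every image.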
train_generator = ImageDataGenerator(rescale=1./255).flow_from_directory(
        train_folder, 
        target_size=(64, 64), batch_size = 2000)

test_generator = ImageDataGenerator(rescale=1./255).flow_from_directory(
        test_folder, 
        target_size=(64, 64), batch_size = 200) 

val_generator = ImageDataGenerator(rescale=1./255).flow_from_directory(
        val_folder, 
        target_size=(64, 64), batch_size = 200)

In [None]:
train_images, train_labels = next(train_generator)
test_images, test_labels = next(test_generator)
val_images, val_labels = next(val_generator)

In [None]:
# Explore your dataset again
m_train = train_images.shape[0]
num_px = train_images.shape[1]
m_test = test_images.shape[0]
m_val = val_images.shape[0]

print ("Number of training samples: " + str(m_train))
print ("Number of testing samples: " + str(m_test))
print ("Number of validation samples: " + str(m_val))
print ("train_images shape: " + str(train_images.shape))
print ("train_labels shape: " + str(train_labels.shape))
print ("test_images shape: " + str(test_images.shape))
print ("test_labels shape: " + str(test_labels.shape))
print ("val_images shape: " + str(val_images.shape))
print ("val_labels shape: " + str(val_labels.shape))

In [None]:
# Reshaping to 1d array:
train_img = train_images.reshape(train_images.shape[0], -1)
test_img = test_images.reshape(test_images.shape[0], -1)
val_img = val_images.reshape(val_images.shape[0], -1)

print(train_img.shape)
print(test_img.shape)
print(val_img.shape)
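
As a quick check of the reshape above: a 64x64 RGB image flattens to 64 * 64 * 3 = 12288 values, which is the input width the dense network below expects. A minimal sketch on a toy batch of zeros (the batch size of 4 is arbitrary):

```python
import numpy as np

# Toy batch of 4 RGB images at the 64x64 target size used above.
batch = np.zeros((4, 64, 64, 3), dtype=np.float32)

# Keep the sample axis and flatten height, width and channels into one axis
flat = batch.reshape(batch.shape[0], -1)
print(flat.shape)  # (4, 12288)
```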

In [None]:
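# Labels for the images:
# Labels are one-hot; column 0 is the first class in alphabetical folder order
# (check train_generator.class_indices to see which class that is).
# reshape(-1, 1) avoids hardcoding the number of images in each split.
train_y = train_labels[:, 0].reshape(-1, 1)
test_y = test_labels[:, 0].reshape(-1, 1)
val_y = val_labels[:, 0].reshape(-1, 1)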
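
The label cell above takes column 0 of the generator's labels. Keras's `flow_from_directory` defaults to `class_mode='categorical'` (one-hot labels) and, to my knowledge, numbers classes in alphanumeric order of the subfolder names, so with folders `nwf` and `wf` column 0 would mark the non-wildfire class. The sketch below shows on toy one-hot labels how the choice of column flips the meaning of the target; the index assignment is an assumption worth confirming with `train_generator.class_indices`.

```python
import numpy as np

# Toy one-hot labels, assuming index 0 = 'nwf' and index 1 = 'wf'.
one_hot = np.array([[1., 0.],   # a non-wildfire image
                    [0., 1.],   # a wildfire image
                    [0., 1.]])  # a wildfire image

y_col0 = one_hot[:, 0].reshape(-1, 1)  # 1 where the image belongs to class 0
y_col1 = one_hot[:, 1].reshape(-1, 1)  # 1 where the image belongs to class 1

print(y_col0.ravel(), y_col1.ravel())
```

If the model's sigmoid output is meant to read as "probability of wildfire", the column for the `wf` class would be the one to use.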

In [None]:
from keras import models
from keras import layers
np.random.seed(123)
model = models.Sequential()
model.add(layers.Dense(20, activation='relu', input_shape=(12288,))) # 3 hidden layers; input is 64*64*3 = 12288
model.add(layers.Dense(7, activation='relu'))
model.add(layers.Dense(5, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(optimizer='sgd',
              loss='binary_crossentropy',
              metrics=['accuracy'])

In [None]:
histoire = model.fit(train_img,
                    train_y,
                    epochs=50,
                    batch_size=32,
                    validation_data=(val_img, val_y))

## TEST CNN:

In [None]:
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu',
                        input_shape=(64 ,64,  3)))
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Conv2D(32, (4, 4), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer="sgd",
              metrics=['accuracy'])
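
To see what reaches the dense layers in the CNN above, the spatial size can be traced by hand. This sketch assumes the Keras defaults for these layers (stride-1 convolutions with no padding, and pooling with stride equal to the pool size); `conv_out` and `pool_out` are helper names made up for illustration.

```python
def conv_out(size, kernel):
    """Spatial size after a stride-1 convolution with no padding."""
    return size - kernel + 1

def pool_out(size, pool):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool

size = 64
for kernel in (3, 4, 3):      # kernel sizes of the three Conv2D layers
    size = pool_out(conv_out(size, kernel), 2)
    print(size)

flattened = size * size * 64  # 64 filters in the last Conv2D layer
print(flattened)
```

Calling `model.summary()` on the real model is the authoritative way to confirm these shapes.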

In [None]:
history = model.fit(train_images,
                    train_y,
                    epochs=30,
                    batch_size=10,
                    validation_data=(val_images, val_y))

# END TEST ZONE:

## Baseline CNN:

## Deeper CNN:

## Neural Network Results:

### Accuracy and Precision:

### Confusion Matrix:

### Other Results:

# Jupyter Deployment Example:

In [None]:
# input lon
# input lat
# api call for url
# run image through neural net
# display image and result, perhaps as a percentage. . . 
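
The steps listed above could be wired together roughly as follows. This is a sketch under stated assumptions: `predict_location`, `fake_fetch` and `FakeModel` are hypothetical names, the imagery API call is replaced by a stub that returns a grey tile, and the stub model simply returns the mean pixel value so the example runs without a trained network.

```python
import numpy as np

def predict_location(lat, lon, fetch_image, model):
    """Fetch a tile for (lat, lon), scale it like the training data, and score it."""
    # fetch_image is a caller-supplied function returning an (H, W, 3) uint8 array
    pixels = np.asarray(fetch_image(lat, lon), dtype=np.float32) / 255.0
    batch = pixels[np.newaxis, ...]  # add the batch axis: (1, H, W, 3)
    score = float(np.ravel(model.predict(batch))[0])
    return pixels, score

# Stand-ins so the sketch runs without an API key or a trained network.
def fake_fetch(lat, lon):
    return np.full((64, 64, 3), 128, dtype=np.uint8)

class FakeModel:
    def predict(self, batch):
        return np.array([[batch.mean()]])

pixels, score = predict_location(45.93476, -121.498236, fake_fetch, FakeModel())
print(f'Model output: {score:.1%}')
```

With the real network, `fetch_image` would wrap the satellite imagery request and resize the tile to 64x64. Whether a high score means wildfire or non-wildfire depends on which label column the model was trained on.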

In [None]:
# add pic of the whole thing

# Conclusion: