# Challenge 7 - Fight Fire with Data
## Random Forest Model to Predict Fire Spread

In this challenge you will use a Jupyter Notebook to run code written in R. The input data has been prepared for you. First, you will check whether wind speed and brightness are correlated with the fire's speed of spread, which is derived from satellite data. Next, you will run code that fits a random forest model, using the features you select (such as wind speed and brightness) as inputs and the speed of spread as the target variable. Finally, you will record the model's Root Mean Squared Error (RMSE) and save the model in a deployable format, the Predictive Model Markup Language (PMML).
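
Since the Root Mean Squared Error is the number recorded for this challenge, it may help to see how it is computed. The following is a minimal sketch in base R; `rmse_manual` is a hypothetical helper name and the toy values are for illustration only, not taken from the fire data.

```r
# RMSE: square root of the mean squared difference between
# predicted and observed values (base R only).
rmse_manual <- function(pred, obs) {
  stopifnot(length(pred) == length(obs))
  sqrt(mean((pred - obs)^2, na.rm = TRUE))
}

# Toy values for illustration only, not taken from the fire data.
rmse_manual(pred = c(1, 2, 3), obs = c(1, 2, 5))
```

A lower RMSE means the predictions are, on average, closer to the observed speeds; the value is in the same units as the target (here, miles per hour).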


## Install and Load Packages

In [1]:
# Install and load packages
library(projectLib)
library(httr)
library(jsonlite)
host <- "https://api.watsonwarriors.ai/workers/validate"

project <- projectLib::Project$new(,"<ProjectId>", "<ProjectToken>")

library(randomForest)
library(caret)
library(data.table)
library(devtools)

# GitHub no longer serves the unauthenticated git:// protocol, so install over HTTPS
install_git("https://github.com/jpmml/r2pmml.git")
library(r2pmml)

randomForest 4.6-14
Type rfNews() to see new features/changes/bug fixes.
Loading required package: lattice
Loading required package: ggplot2
Registered S3 methods overwritten by 'ggplot2':
  method         from 
  [.quosures     rlang
  c.quosures     rlang
  print.quosures rlang

Attaching package: ‘ggplot2’

The following object is masked from ‘package:randomForest’:

    margin


Attaching package: ‘caret’

The following object is masked from ‘package:httr’:

    progress

Downloading git repo git://github.com/jpmml/r2pmml.git


✔  checking for file ‘/home/dsxuser/.tmp/Rtmp3hKPAg/file5377b4f8b14/DESCRIPTION’ (510ms)
─  preparing ‘r2pmml’:
✔  checking DESCRIPTION meta-information ...
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
─  building ‘r2pmml_0.23.2.tar.gz’
   


## Get and View Data

In [5]:
# Get data
# https://github.com/watsonwarriors2020/challenges/raw/master/Challenge_7_Merged_Data_single_fire.csv
fireData = fread('https://raw.githubusercontent.com/watsonwarriors2020/challenges/master/Challenge_7_Merged_Data_single_fire.csv'
                 , data.table=FALSE
                 , header = TRUE
                 , stringsAsFactors = FALSE)

# Attach the data frame so its columns can be referenced directly by name
attach(fireData)

# Show data structure
str(fireData)

“Some columns are type 'integer64' but package bit64 is not installed. Those columns will print as strange looking floating point data. There is no need to reload the data. Simply install.packages('bit64') to obtain the integer64 print method and print the data again.”
The following objects are masked from fireData (pos = 3):

    acq_date, acq_time, bright_t31, brightness, confidence, DateHrGmt,
    DateHrLwt, datetime_start, daynight, distance, duration,
    FortyToOneHundredLiquidSoilMoisturePercent, frp, instrument,
    lat_start, latitude, long_start, longitude,
    RelativeHumidityPercent, satellite, scan, SiteId, speed_mph,
    SurfaceDewpointTemperatureFahrenheit, SurfaceTemperatureFahrenheit,
    SurfaceWetBulbTemperatureFahrenheit, SurfaceWindGustsMph,
    TenToFortyLiquidSoilMoisturePercent, time_stamp, track, type,
    version, WindDirectionDegrees, WindSpeedMph,
    ZeroToTenLiquidSoilMoisturePercent

The following object is masked from package:base:

    version



'data.frame':	13818 obs. of  35 variables:
 $ SiteId                                    : 'integer64' num  1.07e-314 1.07e-314 1.07e-314 1.07e-314 1.07e-314 ...
 $ latitude                                  : num  36.5 36.5 36.5 36.5 36.5 ...
 $ longitude                                 : num  -122 -122 -122 -122 -122 ...
 $ DateHrGmt                                 : chr  "7/23/2016 3:00" "7/23/2016 3:00" "7/23/2016 3:00" "7/23/2016 3:00" ...
 $ DateHrLwt                                 : chr  "7/22/2016 20:00" "7/22/2016 20:00" "7/22/2016 20:00" "7/22/2016 20:00" ...
 $ WindSpeedMph                              : num  6.4 6.4 6.4 6.4 6.4 6.4 6.4 6.4 6.4 6.4 ...
 $ WindDirectionDegrees                      : int  318 318 318 318 318 318 318 318 318 318 ...
 $ SurfaceWindGustsMph                       : num  36.8 36.8 36.8 36.8 36.8 36.8 36.8 36.8 36.8 36.8 ...
 $ ZeroToTenLiquidSoilMoisturePercent        : num  14 14 14 14 14 14 14 14 14 14 ...
 $ TenToFortyLiquidSoilMoisturePercent   
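
The correlation check on wind speed and brightness could be sketched as follows. This assumes the check is a plain Pearson correlation against `speed_mph`; `feature_correlations` is a hypothetical helper that is not part of the original notebook.

```r
# Pearson correlation of each candidate feature with the target column.
# `feature_correlations` is a hypothetical helper, not part of the notebook.
feature_correlations <- function(df, features, target) {
  vapply(features,
         function(f) cor(df[[f]], df[[target]], use = "complete.obs"),
         numeric(1))
}

# Applied to the notebook's data frame when it is loaded in the session.
if (exists("fireData")) {
  print(feature_correlations(fireData,
                             c("WindSpeedMph", "brightness"),
                             "speed_mph"))
}
```

Values near 0 suggest little linear relationship with the speed of spread. A random forest can still pick up non-linear effects, so a weak correlation does not by itself rule a feature out.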

## Train a Random Forest Model and Display RMSE

In [6]:
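# Train model
set.seed(556)
# Note: trControl is an argument of caret::train(), not randomForest(); it is
# ignored here, so no cross-validation is actually performed.
modFit_rf <- randomForest(speed_mph ~ brightness + frp + WindSpeedMph + SurfaceTemperatureFahrenheit,
                          data = fireData,
                          # nodesize = 1,
                          ntree = 20,
                          trControl = trainControl(method = "cv", number = 10)
                          )

# Print the fit summary (out-of-bag error estimates)
print(modFit_rf)

# RMSE on the training data (postResample expects predictions first, observations second)
rmse <- round(postResample(pred = predict(modFit_rf, fireData), obs = fireData$speed_mph)["RMSE"], 4)
answers <- list('0' = rmse)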



Call:
 randomForest(formula = speed_mph ~ brightness + frp + WindSpeedMph +      SurfaceTemperatureFahrenheit, data = fireData, ntree = 20,      trControl = trainControl(method = "cv", number = 10)) 
               Type of random forest: regression
                     Number of trees: 20
No. of variables tried at each split: 1

          Mean of squared residuals: 0.1684863
                    % Var explained: -35.53
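
The "Mean of squared residuals" printed above is an out-of-bag (OOB) estimate, so it is not directly comparable with the `rmse` value computed in the cell, which scores the model on the same rows it was trained on. The negative "% Var explained" indicates that the OOB predictions have a larger mean squared error than always predicting the mean of `speed_mph`. As a rough sketch, the OOB figures can be read back from the fitted object; `oob_summary` is a hypothetical helper that relies only on the `mse` and `rsq` vectors stored in a `randomForest` regression fit.

```r
# `oob_summary` is a hypothetical helper. A randomForest regression fit stores
# `mse` and `rsq` vectors with one entry per tree; the last entry corresponds
# to the full forest.
oob_summary <- function(model) {
  c(oob_rmse = sqrt(tail(model$mse, 1)),
    pct_var_explained = 100 * tail(model$rsq, 1))
}

# Applied to the notebook's model when it exists in the session.
if (exists("modFit_rf")) {
  print(oob_summary(modFit_rf))
}
```

For the run shown above, the OOB RMSE would be the square root of 0.1684863, or about 0.41.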


## Export Predictive Model Markup Language file

In [7]:
# Export the model to PMML
r2pmml(modFit_rf, "modFit_rf.pmml")
# Export the PMML file to project storage
project$save_data('modFit_rf.xml', "modFit_rf.pmml", overwrite=TRUE)
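
Before uploading, it can be worth confirming that the export actually produced a PMML document. The sketch below is a lightweight sanity check, not a schema validation; `looks_like_pmml` is a hypothetical helper, and it assumes the `<PMML>` root element appears within the first few lines of the file.

```r
# `looks_like_pmml` is a hypothetical helper: it only checks that the file
# exists and that a <PMML ...> root element appears near the top.
looks_like_pmml <- function(path, n = 10) {
  if (!file.exists(path)) return(FALSE)
  any(grepl("<PMML", readLines(path, n = n, warn = FALSE), fixed = TRUE))
}

# FALSE if the export did not write the file to the working directory.
looks_like_pmml("modFit_rf.pmml")
```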

## Complete Challenge

In [None]:
## Paste validation code below
