# Challenge 9 - Step it UP!

## Random Forest Model to Predict Fire Spread

The player will use this Jupyter Notebook to rerun some of the R code developed in the “Fight Fire with Data” Challenge. The input data has been prepared in advance. The player will import the dataset, prepare it, and split it into a training set and a test set. Next, the player will fit a random forest model that takes the features they select (such as brightness, wind speed, and frp, the fire radiative power) as inputs and predicts the fire's speed of spread (`speed_mph`) as the target variable. Finally, the player will train the model, record its Root Mean Squared Error (RMSE), and adjust the model's parameters to try to lower that error. Can you?


## Install and Load Packages

In [None]:
# Install packages (they are loaded in the next cell)
install.packages('bit64')
install.packages('ISLR')

In [None]:
# Load libraries
library(projectLib)
library(ISLR)
library(bit64)
library(httr)
library(jsonlite)
host <- "https://api.watsonwarriors.ai/workers/validate"


project <- projectLib::Project$new(,"<ProjectId>", "<ProjectToken>")

library(randomForest)
library(caret)
library(data.table)
library(devtools)

# GitHub no longer serves the unauthenticated git:// protocol, so install from GitHub over HTTPS
install_github("jpmml/r2pmml")
library(r2pmml)
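The install cell above runs `install.packages()` unconditionally, while this cell assumes that `randomForest`, `caret`, `data.table`, and `devtools` are already installed. A common alternative is to install only what is missing. This is a sketch, not part of the original challenge: the helper `missing_packages()` is hypothetical, and it only reports which packages cannot be loaded, leaving the actual `install.packages()` call to you.

```r
# Hypothetical helper: return the packages in `pkgs` that cannot be loaded
missing_packages <- function(pkgs) {
  # requireNamespace() returns FALSE (without an error) when a package is unavailable
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

needed <- c("bit64", "ISLR", "randomForest", "caret", "data.table", "devtools")
to_install <- missing_packages(needed)
print(to_install)

# To install them, you could then run:
# if (length(to_install) > 0) install.packages(to_install)
```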

## Get and View Data

In [None]:
# Get data
# https://github.com/watsonwarriors2020/challenges/raw/master/Challenge_7_Merged_Data_single_fire.csv
fireData <- fread('https://raw.githubusercontent.com/watsonwarriors2020/challenges/master/Challenge_7_Merged_Data_single_fire.csv',
                  data.table = FALSE,        # return a plain data.frame instead of a data.table
                  header = TRUE,
                  stringsAsFactors = FALSE)


# Generate the training and test datasets
smp_siz <- floor(0.75 * nrow(fireData))  # number of training rows: 75% of the rows in the dataset
set.seed(556)  # fix the random seed so the same rows are drawn every time the cell runs
train_ind <- sample(seq_len(nrow(fireData)), size = smp_siz)  # randomly pick smp_siz row numbers from all rows and store them in train_ind
train <- fireData[train_ind, ]   # training set: the rows listed in train_ind
test <- fireData[-train_ind, ]   # test set: every row not listed in train_ind
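The split above can be wrapped in a reusable function and checked on a small data frame. This is a sketch under assumptions: `split_train_test()` and the `toy` data frame are hypothetical, not part of the challenge data. It repeats the same steps (a seeded sample of 75% of the row numbers, with the remaining rows as the test set) and guards against one R pitfall: indexing with `-integer(0)` selects zero rows rather than all of them.

```r
# Hypothetical helper mirroring the notebook's split: a seeded 75/25 row split
split_train_test <- function(df, train_frac = 0.75, seed = 556) {
  set.seed(seed)  # same seed -> same rows on every call (this resets the global RNG)
  smp_siz <- floor(train_frac * nrow(df))
  # Guard: with no sampled rows, df[-integer(0), ] would return zero rows, not all rows
  stopifnot(smp_siz > 0)
  train_ind <- sample(seq_len(nrow(df)), size = smp_siz)
  list(train = df[train_ind, , drop = FALSE],
       test  = df[-train_ind, , drop = FALSE])
}

# Toy data standing in for fireData
toy <- data.frame(id = 1:20, speed_mph = (1:20) / 10)
parts <- split_train_test(toy)

cat("train rows:", nrow(parts$train), " test rows:", nrow(parts$test), "\n")
```

Because the seed is fixed inside the function, calling it twice on the same data returns the same split.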


In [None]:
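# Show the structure of the training data (column names, types, and first values).
# attach() is not needed: the models below read columns through `data = train`,
# and attaching both train and test would let test's columns mask train's.
str(train)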
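Attaching both data frames is risky because `train` and `test` share the same column names: whichever is attached last sits earlier on the search path and masks the other. A minimal base-R demonstration, using two hypothetical toy data frames (`toy_train`, `toy_test`) so the notebook's own `train` and `test` are left untouched:

```r
toy_train <- data.frame(speed_mph = c(1, 2, 3))
toy_test  <- data.frame(speed_mph = c(10, 20))

attach(toy_train)
attach(toy_test)   # attached last, so it comes first on the search path (R reports the masking)

# A bare column name now resolves to toy_test's column, not toy_train's
seen_speed <- speed_mph
print(seen_speed)

# Clean up the search path
detach(toy_test)
detach(toy_train)
```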

## Train a Random Forest Model and Display Its Error (RMSE)

In [None]:
# Train model
set.seed(556)
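modFit_rf <- randomForest(speed_mph ~ brightness + frp + WindSpeedMph,
                          data = train,
                          nodesize = 5000,  # minimum number of observations in a terminal node (leaf)
                          ntree = 20,       # number of trees in the forest
                          # trControl is an argument of caret::train(); randomForest() ignores it,
                          # so this line does not perform 10-fold cross-validation
                          trControl = trainControl(method = "cv", number = 10)
                          )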




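# RMSE on the test set; caret's postResample() expects predictions first, then observed values
rmse <- round(postResample(pred = predict(modFit_rf, test), obs = test$speed_mph)["RMSE"], 4)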
answer_baseline <- as.numeric(rmse)

cat("\n The base RMSE is: " , answer_baseline)


cat("\n\n Here is the summary of the base model. \n")

print(modFit_rf)
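RMSE is the square root of the average squared difference between predicted and observed values, so it is expressed in the same units as `speed_mph` and lower is better. As a sketch (the helper `rmse_manual()` and the small vectors are hypothetical, not challenge data), computing it directly from that definition should agree with the `RMSE` element that caret's `postResample()` reports:

```r
# Hypothetical helper: Root Mean Squared Error computed from its definition
rmse_manual <- function(pred, obs) {
  stopifnot(length(pred) == length(obs))
  sqrt(mean((pred - obs)^2))
}

pred <- c(2, 4, 6)   # made-up predictions
obs  <- c(1, 4, 8)   # made-up observed values
rmse_value <- round(rmse_manual(pred, obs), 4)  # same 4-decimal rounding as the notebook
print(rmse_value)
```

Here the squared errors are 1, 0, and 4, so the RMSE is the square root of 5/3.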

## Change the base model to lower its Root Mean Squared Error (RMSE)

### Adjust the random forest parameters to give the model a lower RMSE.

#### You can adjust:

- `ntree`: the number of trees in the forest.
- `nodesize`: the minimum number of observations in a terminal node (leaf). A large value produces shallower trees; a small value produces deeper, larger trees.
- The features included in the model: remove the `#` in front of a feature to include it, or add a `#` in front of it to exclude it.
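Commenting lines in and out works, but the model formula can also be built from a character vector of column names with base R's `reformulate()`, which turns adding or removing a feature into a one-line change. A sketch, assuming the column names listed in the tuning cell; the `features` vector is illustrative:

```r
# Features currently switched on in the base model
features <- c("brightness", "frp", "WindSpeedMph")

# Add a feature by appending its column name; remove one with setdiff()
features <- c(features, "RelativeHumidityPercent")
features <- setdiff(features, "frp")

# Build speed_mph ~ feature1 + feature2 + ...
model_formula <- reformulate(features, response = "speed_mph")
print(model_formula)

# In the notebook, the formula could then be passed to randomForest(), e.g.
# randomForest(model_formula, data = train, nodesize = 5000, ntree = 20)
```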

#### Remember, not every adjustment is an improvement! You can test your changes and check the RMSE as many times as you like before submitting your answer.

In [None]:
# Train model
set.seed(556)
modFit_rf <- randomForest(speed_mph ~
        brightness   # <-- include a feature by deleting the # in front of it; exclude one by adding a #
#     + bright_t31
      + frp
#     + RelativeHumidityPercent
#     + SurfaceWetBulbTemperatureFahrenheit
      + WindSpeedMph
#     + SurfaceWindGustsMph
#     + ZeroToTenLiquidSoilMoisturePercent
#     + TenToFortyLiquidSoilMoisturePercent
#     + FortyToOneHundredLiquidSoilMoisturePercent
#     + SurfaceTemperatureFahrenheit
#     + SurfaceDewpointTemperatureFahrenheit
                          , data = train,
                          nodesize = 5000,  # <-- change the minimum size of a terminal node (leaf)
                          ntree = 20,       # <-- change the number of trees in the forest
                          # trControl is a caret::train() argument; randomForest() ignores it
                          trControl = trainControl(method = "cv", number = 10)
                          )
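In both training cells, `trControl = trainControl(method = "cv", number = 10)` has no effect: `trControl` is an argument of `caret::train()`, and `randomForest()` silently ignores it, so no cross-validation takes place. If you want a cross-validated error estimate, a k-fold loop can be written in base R. In this sketch, `make_folds()`, `cv_rmse()`, and the simulated `demo` data are hypothetical, and `lm()` stands in for `randomForest()` so the example runs without extra packages; any model with a `predict()` method should fit the same pattern.

```r
# Hypothetical helper: randomly assign each of n rows to one of k folds
make_folds <- function(n, k = 10, seed = 556) {
  set.seed(seed)  # reproducible fold assignment (resets the global RNG)
  sample(rep_len(seq_len(k), n))
}

# Hypothetical helper: mean RMSE over k folds for a user-supplied fitting function
cv_rmse <- function(data, response, fit_fn, k = 10, seed = 556) {
  folds <- make_folds(nrow(data), k, seed)
  fold_rmse <- vapply(seq_len(k), function(i) {
    fit  <- fit_fn(data[folds != i, , drop = FALSE])    # train on the other k - 1 folds
    pred <- predict(fit, data[folds == i, , drop = FALSE])  # predict the held-out fold
    sqrt(mean((pred - data[[response]][folds == i])^2))
  }, numeric(1))
  mean(fold_rmse)
}

# Demonstration on simulated data, with lm() standing in for randomForest()
set.seed(1)
demo <- data.frame(x = runif(200))
demo$y <- 2 * demo$x + rnorm(200, sd = 0.1)
cv_result <- cv_rmse(demo, "y", function(d) lm(y ~ x, data = d))
print(cv_result)
```

With the notebook's data, `fit_fn` could be `function(d) randomForest(speed_mph ~ brightness + frp + WindSpeedMph, data = d, nodesize = 5000, ntree = 20)` with `response = "speed_mph"` and `data = train`.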

In [None]:
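# RMSE on the test set; caret's postResample() expects predictions first, then observed values
rmse <- round(postResample(pred = predict(modFit_rf, test), obs = test$speed_mph)["RMSE"], 4)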
your_rmse <- as.numeric(rmse)

cat("\n Your RMSE is now: ",your_rmse, "\n The base RMSE was: " , answer_baseline)

cat("\n\n Here is the summary of the last model you tested. \n")

answers <- list('0' = your_rmse)
print(modFit_rf)
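Rather than changing `nodesize` and `ntree` by hand one run at a time, you can score a small grid of combinations. In this sketch, `tune_grid()`, the grid values, and `fake_score()` are hypothetical; `fake_score()` is a made-up stand-in so the code runs on its own, and the commented `score_rf()` shows how a real scorer could train `randomForest()` on `train` and return the test RMSE.

```r
# Hypothetical helper: score every (nodesize, ntree) pair, sorted by RMSE (best first)
tune_grid <- function(grid, score_fn) {
  grid$rmse <- mapply(score_fn, grid$nodesize, grid$ntree)
  grid[order(grid$rmse), , drop = FALSE]
}

grid <- expand.grid(nodesize = c(5, 50, 500, 5000), ntree = c(20, 100, 500))

# Made-up scorer for demonstration only (NOT a model): pretends that smaller
# leaves and more trees both lower the RMSE
fake_score <- function(nodesize, ntree) 0.2 + log10(nodesize) / 100 - log10(ntree) / 1000

results <- tune_grid(grid, fake_score)
print(head(results, 3))

# A real scorer for the notebook might look like this (requires randomForest and caret):
# score_rf <- function(nodesize, ntree) {
#   set.seed(556)
#   fit <- randomForest::randomForest(speed_mph ~ brightness + frp + WindSpeedMph,
#                                     data = train, nodesize = nodesize, ntree = ntree)
#   caret::RMSE(predict(fit, test), test$speed_mph)
# }
# results <- tune_grid(grid, score_rf)
```

Choosing parameters by test-set RMSE alone can tune the model to that one particular split, so a cross-validated score on `train` is a more reliable guide when comparing many settings.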


#### Base Model Score (test RMSE): 0.2133, using the default parameter values

    Call:
     randomForest(formula = speed_mph ~ brightness + frp + WindSpeedMph, data = train, nodesize = 5000, ntree = 20, trControl = trainControl(method = "cv", number = 10))
                   Type of random forest: regression
                         Number of trees: 20
    No. of variables tried at each split: 1

              Mean of squared residuals: 0.1501773
                        % Var explained: 0.16
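The last two lines of the printed summary are out-of-bag (OOB) estimates on the training data, not test-set results. Following the `randomForest` documentation (`mse` is the sum of squared residuals divided by n, and `rsq` is the "pseudo R-squared" 1 - mse / Var(y)), with $\hat{y}_i^{\mathrm{OOB}}$ the prediction for training row $i$ from the trees whose bootstrap sample did not include it:

```latex
\text{Mean of squared residuals} = \mathrm{MSE}_{\mathrm{OOB}} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i^{\mathrm{OOB}}\right)^2,
\qquad
\text{\% Var explained} = 100\left(1 - \frac{\mathrm{MSE}_{\mathrm{OOB}}}{\operatorname{Var}(y)}\right)
```

For the base model, $\sqrt{0.1501773} \approx 0.3875$ is an OOB RMSE on the training rows, so it need not match the test RMSE of 0.2133. The printed 0.16 is already a percentage, meaning the base forest explains only about 0.16% of the variance in `speed_mph`.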



## Complete Challenge

In [None]:
## Paste validation code below
