In [2]:
# Loading Libraries
library(tidyverse)
library(tidymodels)
library(RColorBrewer)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
── Attaching packages ────────────────────────────────────── tidymodels 1.1.1 ──

✔ broom        1.0.5     ✔ rsample

### Introduction


### Preliminary Analysis

Our dataset comes from a study that used machine learning methods to predict fires in the Montesinho park. A version of the dataset is available at this [link](https://archive.ics.uci.edu/static/public/162/forest+fires.zip). To load it, we download the archive, unzip it, and read the extracted CSV file into a tibble. The code for downloading and extracting the file was taken from this [StackOverflow question](https://stackoverflow.com/questions/3053833/using-r-to-download-zipped-data-file-extract-and-import-data).

In [57]:
# Setting up data tibble from url
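zip_temp <- tempfile(fileext = ".zip") # temp file for the downloaded .zip
extract_temp <- tempfile() # temp directory for the extracted contents

# mode = "wb" keeps the binary archive intact on every platform
download.file("https://archive.ics.uci.edu/static/public/162/forest+fires.zip", zip_temp, mode = "wb")
unzip(zipfile = zip_temp, exdir = extract_temp)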

forest_data <- read_csv(file.path(extract_temp, "forestfires.csv"))
# recursive = TRUE is required to delete the extraction directory
unlink(c(zip_temp, extract_temp), recursive = TRUE)

Rows: 517 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (2): month, day
dbl (11): X, Y, FFMC, DMC, DC, ISI, temp, RH, wind, rain, area

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


The data is already in a tidy format, so we only need to convert the `month` and `day` variables into factors.

In [58]:
# Changing month and day to factors
forest_data <- forest_data |>
            mutate(month = as_factor(month), day=as_factor(day))

head(forest_data)

X,Y,month,day,FFMC,DMC,DC,ISI,temp,RH,wind,rain,area
<dbl>,<dbl>,<fct>,<fct>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
7,5,mar,fri,86.2,26.2,94.3,5.1,8.2,51,6.7,0.0,0
7,4,oct,tue,90.6,35.4,669.1,6.7,18.0,33,0.9,0.0,0
7,4,oct,sat,90.6,43.7,686.9,6.7,14.6,33,1.3,0.0,0
8,6,mar,fri,91.7,33.3,77.5,9.0,8.3,97,4.0,0.2,0
8,6,mar,sun,89.3,51.3,102.2,9.6,11.4,99,1.8,0.0,0
8,6,aug,sun,92.3,85.3,488.0,14.7,22.2,29,5.4,0.0,0
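
Note that `as_factor()` orders the levels of a character column by first appearance (here `mar`, `oct`, `aug`, ...), not by calendar order. If calendar order is wanted in later tables or plots, one option is to set the levels explicitly. The sketch below illustrates this on a small stand-in tibble built from the month and day values of the first three rows shown above. The names `toy`, `month_levels`, `day_levels`, and `toy_ordered` are hypothetical and not part of the analysis, and the code assumes lowercase three-letter abbreviations as in the dataset.

```r
library(dplyr)
library(tibble)

# Stand-in for forest_data: month and day from the first three rows above
toy <- tibble(
  month = c("mar", "oct", "oct"),
  day   = c("fri", "tue", "sat")
)

# Calendar-ordered levels; month.abb is a base R constant ("Jan", "Feb", ...)
month_levels <- tolower(month.abb)
day_levels <- c("mon", "tue", "wed", "thu", "fri", "sat", "sun")

toy_ordered <- toy |>
  mutate(
    month = factor(month, levels = month_levels),
    day   = factor(day, levels = day_levels)
  )

levels(toy_ordered$month)
```

Levels that do not occur in the data are kept, which may or may not be desirable depending on how the factor is used later.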


To get a better feel for the data, we will compute some summary statistics for the variables in the dataset. First, an overview of what these values represent:




In [54]:

# Split the columns by type
forest_categorical <- forest_data |> select(month, day)
forest_numerical <- forest_data |> select(-month, -day)

# Apply a summary function to every numerical column (returns a one-row tibble)
fire_by_stat <- function(fn) {
    map_df(forest_numerical, fn)
}

fire_mean <- fire_by_stat(mean)
fire_max <- fire_by_stat(max)
fire_stdev <- fire_by_stat(sd)

# Number of observations per month, one column per month
month_appearances <- forest_categorical |>
                    count(month, name = "count") |>
                    pivot_wider(names_from = month, values_from = count)

month_appearances
fire_mean
fire_max
fire_stdev

mar,oct,aug,sep,apr,jun,jul,feb,jan,dec,may,nov
<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>
54,15,184,172,9,17,32,20,2,9,2,1


X,Y,FFMC,DMC,DC,ISI,temp,RH,wind,rain,area
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
4.669246,4.299807,90.64468,110.8723,547.94,9.021663,18.88917,44.2882,4.017602,0.02166344,12.84729


X,Y,FFMC,DMC,DC,ISI,temp,RH,wind,rain,area
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
9,9,96.2,291.3,860.6,56.1,33.3,100,9.4,6.4,1090.84


X,Y,FFMC,DMC,DC,ISI,temp,RH,wind,rain,area
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
2.313778,1.2299,5.520111,64.04648,248.0662,4.559477,5.806625,16.31747,1.791653,0.2959591,63.65582
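
The three one-row tables above can be hard to compare side by side. As an alternative layout, the same statistics could be gathered into a single table with one row per variable. The sketch below shows the idea on a stand-in tibble holding `temp`, `wind`, and `area` from the first three rows of the `head()` output. The names `toy` and `summary_table` are hypothetical, and this is only one possible arrangement, not the method used in the analysis.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Stand-in for forest_numerical: three columns from the first three rows
toy <- tibble(
  temp = c(8.2, 18.0, 14.6),
  wind = c(6.7, 0.9, 1.3),
  area = c(0, 0, 0)
)

# Reshape to long form, then compute every statistic per variable
summary_table <- toy |>
  pivot_longer(everything(), names_to = "variable", values_to = "value") |>
  group_by(variable) |>
  summarize(mean = mean(value), max = max(value), sd = sd(value))

summary_table
```

With the full data, replacing `toy` by `forest_numerical` should give one row for each of the eleven numerical columns.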


In [None]:
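# Months present in the data, in factor-level order
forest_data |> pull(month) |> levels()

# Distribution of the area variable, with bars filled by month
area_distribution_plot <- forest_data |>
                        ggplot(aes(x = area, fill = month)) +
                        geom_histogram(bins = 250) +
                        labs(x = "area", y = "Count", fill = "Month") +
                        # zoom the y-axis without dropping bars taller than 75
                        coord_cartesian(ylim = c(0, 75))

area_distribution_plot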

### Methods

### Expected Outcome and Significance

### Bibliography