## Fighting Forest Fires with Functions


### University of Virginia
### Programming for Data Science
### Last Updated: July 26, 2021
---

### Objectives: 
- Work with functions (built-in and user-defined), lambda functions, and list comprehensions

### Executive Summary


You will work with the Forest Fires Data Set from the UCI Machine Learning Repository.  

Information about the dataset: https://archive.ics.uci.edu/ml/datasets/Forest+Fires

Background: This dataset was used in a regression task whose aim was to predict the burned area of forest fires in the northeast region of Portugal from meteorological and other data.

We will apply some of the data preparation steps that typically precede an ML task.

### Instructions

Run the pre-populated code, and along the way, you will be asked to perform several graded tasks <span style="color:blue">(prompted in blue font)</span>.  
Show your code and solutions clearly in the cells following each question.   
When the file is completed, submit the notebook through Collab.

**TOTAL POINTS: 14**

---


In [30]:
import pandas as pd
import numpy as np

#### Read in the dataset from the UCI Machine Learning Repository  

In [3]:
path_to_data = "https://archive.ics.uci.edu/ml/machine-learning-databases/forest-fires/forestfires.csv"
fire = pd.read_csv(path_to_data)

In [4]:
fire.head(3)

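|   | X | Y | month | day | FFMC | DMC | DC | ISI | temp | RH | wind | rain | area |
|---|---|---|-------|-----|------|-----|----|-----|------|----|------|------|------|
| 0 | 7 | 5 | mar | fri | 86.2 | 26.2 | 94.3 | 5.1 | 8.2 | 51 | 6.7 | 0.0 | 0.0 |
| 1 | 7 | 4 | oct | tue | 90.6 | 35.4 | 669.1 | 6.7 | 18.0 | 33 | 0.9 | 0.0 | 0.0 |
| 2 | 7 | 4 | oct | sat | 90.6 | 43.7 | 686.9 | 6.7 | 14.6 | 33 | 1.3 | 0.0 | 0.0 |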


**Working with spatial coordinates X, Y**

X - x-axis spatial coordinate within the Montesinho park map: 1 to 9  
Y - y-axis spatial coordinate within the Montesinho park map: 2 to 9

In [5]:
# extract the spatial coords

X, Y = fire.X.values, fire.Y.values

**<span style="color:blue">(2 PTS) 1. Write a function called `coord_builder()` with these requirements:</span>**

- takes X, Y as inputs
- contains a docstring with short description of the function
- uses the zip() function (details: https://realpython.com/python-zip-function/)
- builds and returns a list of tuples [(x1,y1), (x2,y2), ..., (xn,yn)] where (xi,yi) are the ordered pairs from X, Y

Hint: You'll need to call list() on the zipped object to show the results


In [9]:
def coord_builder(X, Y):
    '''
    Purpose: 
    Given two lists of integers of equal length, create a list of tuples of ordered pairs
    
    Inputs: 
    X = list of X-coordinates
    Y = list of Y-coordinates
    
    Output:
    list of ordered pairs of (x, y)
    '''
    zipped = zip(X, Y)
    
    return list(zipped)

**<span style="color:blue">(1 PT) 2. Call your `coord_builder()` function, passing in X, Y.  
    Please subset the returned list to show a list with only the FIRST FIVE TUPLES. </span>**

In [13]:
coord_builder(X, Y)[0:5]

[(7, 5), (7, 4), (7, 4), (8, 6), (8, 6)]
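
As a small illustration of how `zip()` pairs elements, the sketch below uses two short made-up lists (`xs_demo` and `ys_demo` are hypothetical values, not taken from the dataset). It also shows that `zip()` stops at the shorter input, which is why inputs of equal length are assumed.

```python
# Toy coordinates (hypothetical values, not from the forest fires data)
xs_demo = [1, 2, 3]
ys_demo = [4, 5, 6]

# zip() yields one tuple per position; list() materializes the pairs
pairs_demo = list(zip(xs_demo, ys_demo))
print(pairs_demo)  # [(1, 4), (2, 5), (3, 6)]

# With unequal lengths, zip() silently stops at the shorter input
short_pairs = list(zip(xs_demo, [10, 20]))
print(short_pairs)  # [(1, 10), (2, 20)]
```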

**Working with AREA**

In [26]:
# extract values for area
area = fire.area.values

**<span style="color:blue">(1 PT) 3. Write code to print the minimum area and maximum area in a tuple
(min_value, max_value) where the min_value, max_value are floats.</span>** 

In [121]:
min_max = (min(area), max(area))
print(min_max)

(0.0, 1090.84)


**<span style="color:blue">(2 PTS) 4. Write a lambda function that computes the following transformation of a variable:</span>**   
```    
    logarithm(base10) of [1 + x]
```

**<span style="color:blue">Then call the lambda function on *area*, printing the LAST 10 values.</span>**  
Hint: numpy has a function that can be applied to an array.

In [123]:
logs = lambda x: np.log10(1+x)
logs(area[-10:])

array([0.        , 0.        , 0.50105926, 0.15533604, 0.        ,
       0.87157294, 1.74264659, 1.08493357, 0.        , 0.        ])
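
To see why the `1 + x` shift matters, the sketch below applies the same transformation to a few made-up values (`area_demo` is hypothetical). A zero area maps to zero instead of negative infinity, and large areas are compressed. The `np.log1p` form is shown only as an equivalent alternative, not as the required approach.

```python
import numpy as np

# Toy burned-area values (hypothetical), including zero and a large value
area_demo = np.array([0.0, 9.0, 99.0, 999.0])

# Same transformation as above: log base 10 of (1 + x)
log_transform = lambda x: np.log10(1 + x)
transformed = log_transform(area_demo)
print(transformed)

# Equivalent form: natural log of (1 + x), converted to base 10
alt = np.log1p(area_demo) / np.log(10)
print(np.allclose(transformed, alt))  # True
```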

**Working with MONTH**

month - month of the year: 'jan' to 'dec'

In [40]:
month = fire.month.values

**<span style="color:blue">(1 PT) 5. Call the `unique()` function (from the numpy package) on *month*, printing the unique months:</span>**   

In [43]:
np.unique(month)

array(['apr', 'aug', 'dec', 'feb', 'jan', 'jul', 'jun', 'mar', 'may',
       'nov', 'oct', 'sep'], dtype=object)
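
Note that `np.unique()` returns its values sorted, which is why the months above appear in alphabetical rather than calendar order. As a minimal sketch on made-up data (`months_demo` is hypothetical), the `return_counts=True` option also reports how often each value occurs:

```python
import numpy as np

# Toy month labels (hypothetical values, not from the dataset)
months_demo = np.array(["aug", "sep", "aug", "mar", "sep", "aug"])

# Sorted unique labels together with the number of occurrences of each
labels, counts = np.unique(months_demo, return_counts=True)
print(labels)  # ['aug' 'mar' 'sep']
print(counts)  # [3 1 2]
```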

**<span style="color:blue">(1 PT) 6. Write a list comprehension to select all months starting with letter 'a' from *month*.   
    Next, call set() on the result, to get the unique months starting with letter 'a'. Print this result.</span>**   

In [125]:
# use a distinct loop variable so the name `month` is not reused inside the comprehension
months_a = [m for m in month if m.startswith('a')]
set(months_a)

{'apr', 'aug'}
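
The filter-then-deduplicate pattern can be sketched on a short made-up list (`months_demo` is hypothetical). The comprehension keeps duplicates, and `set()` removes them; since sets are unordered, `sorted()` is used here to get a reproducible order.

```python
# Toy month labels (hypothetical values, not from the dataset)
months_demo = ["aug", "sep", "apr", "aug", "mar", "apr"]

# The comprehension keeps every matching element, duplicates included
starts_with_a = [m for m in months_demo if m.startswith("a")]
print(starts_with_a)  # ['aug', 'apr', 'aug', 'apr']

# set() removes duplicates; sorted() gives a stable order for display
unique_a = sorted(set(starts_with_a))
print(unique_a)  # ['apr', 'aug']
```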

**Working with DMC**  
DMC - DMC index from the FWI system: 1.1 to 291.3  

In [119]:
dmc = fire.DMC.values

**<span style="color:blue">(2 PTS) 7. Write a function called `bandpass_filter()` with these requirements:</span>**

- takes three inputs: 
  - a numpy array to be filtered
  - an integer serving as a lower bound L
  - an integer serving as an upper bound U
- returns a new array containing only the values from the original array which are greater than L AND less than U

In [126]:
def bandpass_filter(arr, lower, upper):
    '''
    Purpose:
    Given an array of numbers and lower/upper bounds, create a new array of the values strictly within the range
    
    Inputs:
    arr = array of numbers
    lower = integer
    upper = integer
    
    Output:
    array of values greater than lower and less than upper
    '''
    mt_list = []
    for x in arr:
        if (x > lower) and (x < upper):
            mt_list.append(x)
    # pass the list directly; wrapping it in another list would add an extra dimension
    new_arr = np.array(mt_list)
    return new_arr

**<span style="color:blue">(1 PT) 8. Call `bandpass_filter()` passing DMC as the array, L=25, U=35, printing the result. </span>**


In [127]:
bandpass_filter(dmc, 25, 35)

array([26.2, 33.3, 32.8, 27.9, 27.4, 25.7, 33.3, 33.3, 30.7, 33.3, 25.7,
       25.7, 25.7, 32.8, 27.2, 27.8, 26.4, 25.4, 25.4, 25.4, 25.4, 26.7,
       25.4, 27.5, 28. , 25.4])
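
The same filter can also be sketched with a boolean mask instead of an explicit loop. This is an alternative, not the approach the exercise requires; `bandpass_mask` and `dmc_demo` are hypothetical names and values introduced for illustration.

```python
import numpy as np

def bandpass_mask(arr, lower, upper):
    """Return the values strictly between lower and upper (hypothetical helper)."""
    arr = np.asarray(arr)
    # Element-wise comparisons give boolean arrays; & combines them element-wise
    return arr[(arr > lower) & (arr < upper)]

# Toy DMC-like values (hypothetical), including both boundary values
dmc_demo = np.array([24.9, 25.0, 26.2, 33.3, 35.0, 40.1])
filtered_demo = bandpass_mask(dmc_demo, 25, 35)
print(filtered_demo)  # [26.2 33.3]
```

Because both comparisons are strict, the boundary values 25.0 and 35.0 are excluded, and the result is a one-dimensional array.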

**Working with FFMC**  
FFMC - FFMC index from the FWI system: 18.7 to 96.20

In [79]:
FFMC = fire.FFMC.values

array([86.2, 90.6, 90.6, 91.7, 89.3, 92.3, 92.3, 91.5, 91. , 92.5, 92.5,
       92.8, 63.5, 90.9, 92.9, 93.3, 91.7, 84.9, 89.2, 86.3, 91. , 91.8,
       94.3, 90.2, 93.5, 91.4, 92.4, 90.9, 93.4, 93.5, 94.3, 88.6, 88.6,
       91.7, 91.8, 90.3, 90.6, 90. , 90.6, 88.1, 79.5, 90.2, 94.8, 92.5,
       90.1, 94.3, 90.9, 94.2, 87.2, 87.6, 92.9, 90.2, 92.1, 92.1, 91.7,
       92.9, 90.3, 92.6, 84. , 86.6, 89.3, 89.3, 93. , 90.2, 91.1, 91.7,
       92.4, 92.4, 92.4, 91.7, 91.2, 94.3, 91.7, 88.8, 93.3, 84.2, 86.6,
       87.6, 90.1, 91. , 91.4, 90.2, 94.8, 92.1, 91.7, 92.9, 92.9, 92.9,
       93.5, 91.7, 90.2, 91.7, 92.3, 91.4, 91.1, 89.7, 83.9, 69. , 91.4,
       91.4, 91.4, 88.8, 94.8, 92.5, 82.1, 85.9, 91.4, 90.2, 92.5, 88.6,
       85.9, 91.7, 89.7, 91.8, 88.1, 88.1, 91.7, 91.7, 90.1, 93. , 91.5,
       91.5, 92.4, 84.4, 94.3, 92.6, 87.6, 93.5, 91.4, 92.6, 68.2, 87.2,
       89.3, 93.7, 88.1, 93.5, 92.4, 90.9, 85.8, 91. , 90.9, 95.5, 90.1,
       90. , 95.5, 95.2, 90.1, 84.4, 94.8, 93.7, 92

**<span style="color:blue">(2 PTS) 9. Write a function called `sum_sq_err()` with these requirements:</span>**

- takes a numpy array as input
- computes the mean of the array, mu
- using a for-loop, computes the squared deviation of each array element xi from the mean, (xi - mu)**2  
  Hint: it may be helpful to keep a running sum of the squared deviations
- computes the sum of squared deviations
- returns the sum of squared deviations

In [128]:
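def sum_sq_err(arr):
    '''Return the sum of squared deviations of arr from its mean.'''
    mu = np.sum(arr)/len(arr)
    # running sum of squared deviations (named `total` to avoid shadowing the built-in sum)
    total = 0
    for xi in arr:
        dev = (xi-mu)**2
        total = total+dev
    return total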

**<span style="color:blue">(1 PT) 10. Call `sum_sq_err()` passing FFMC as the array, printing the result. </span>**

In [129]:
sum_sq_err(FFMC)

15723.357872340408
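
As a cross-check on the loop, the sum of squared deviations can also be computed in vectorized form, and it equals the population variance multiplied by the number of elements. The sketch below uses made-up values (`values_demo` is hypothetical), not the FFMC data.

```python
import numpy as np

# Toy values (hypothetical); the mean is 2.5
values_demo = np.array([1.0, 2.0, 3.0, 4.0])

# Vectorized sum of squared deviations from the mean
mu = values_demo.mean()
sse_vectorized = np.sum((values_demo - mu) ** 2)
print(sse_vectorized)  # 5.0

# Same quantity via the population variance (ddof=0 is the NumPy default)
sse_from_var = values_demo.var() * len(values_demo)
print(np.isclose(sse_vectorized, sse_from_var))  # True
```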

---  