## Fighting Forest Fires with Functions


### Programming for Data Science
### Last Updated: Jan 15, 2023
---

### Objectives: 
- Work with functions (built-in and user-defined), lambda functions, and list comprehensions

### Executive Summary


You will work with the Forest Fires Data Set from the UCI Machine Learning Repository.  

Information about the dataset: https://archive.ics.uci.edu/ml/datasets/Forest+Fires

Background: This dataset was used in a regression task whose aim was to predict the burned area of forest fires in the northeast region of Portugal using meteorological and other data.

We will apply some of the data-preparation steps that typically lead up to a machine learning (ML) task.

### Instructions

Run the pre-populated code; along the way, you will be asked to perform several graded tasks <span style="color:blue">(prompted in blue font)</span>.  
Show your code and solutions clearly in the cells following each question.   
When the file is completed, submit the notebook through Collab.

**TOTAL POINTS: 14**

---


```python
import pandas as pd
import numpy as np
```

#### Read in the dataset from the UCI Machine Learning Repository  

```python
# read the CSV directly from its URL into a DataFrame
path_to_data = "https://archive.ics.uci.edu/ml/machine-learning-databases/forest-fires/forestfires.csv"
fire = pd.read_csv(path_to_data)
```

```python
# preview the first three rows
fire.head(3)
```

**Working with spatial coordinates X, Y**

X - x-axis spatial coordinate within the Montesinho park map: 1 to 9  
Y - y-axis spatial coordinate within the Montesinho park map: 2 to 9

```python
# extract the spatial coordinates as NumPy arrays
X, Y = fire.X.values, fire.Y.values
```

**<span style="color:blue">(2 PTS) 1. Write a function called `coord_builder()` with these requirements:</span>**

- takes X, Y as inputs
- contains a docstring with a short description of the function
- uses the zip() function (details: https://realpython.com/python-zip-function/)
- builds and returns a list of tuples [(x1,y1), (x2,y2), ..., (xn,yn)] where (xi,yi) are the ordered pairs from X, Y

Hint: In Python 3, zip() returns a lazy iterator, so you'll need to call list() on the zipped object to see the results.
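
A minimal sketch of one way to meet these requirements, assuming X and Y are equal-length NumPy arrays. The arrays `X_demo` and `Y_demo` are hypothetical stand-ins, not rows from the dataset. The last line also shows slicing with `[:5]`, the pattern task 2 asks for.

```python
import numpy as np

def coord_builder(X, Y):
    """Pair up x and y coordinates into a list of (x, y) tuples."""
    # zip() yields one (x, y) pair per position; list() materializes the iterator
    return list(zip(X, Y))

# Hypothetical sample coordinates, for illustration only
X_demo = np.array([1, 2, 3, 4, 5, 6])
Y_demo = np.array([2, 3, 4, 5, 6, 7])

pairs = coord_builder(X_demo, Y_demo)
print(pairs[:5])  # slicing keeps only the first five tuples
```

Note that zip() stops at the shorter input, so arrays of unequal length would silently drop pairs.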


**<span style="color:blue">(1 PT) 2. Call your `coord_builder()` function, passing in X, Y.  
    Subset the returned list to show only the FIRST FIVE TUPLES. </span>**

**Working with AREA**

```python
# extract values for area
area = fire.area.values
```

**<span style="color:blue">(1 PT) 3. Write code to print the minimum and maximum area as a tuple
(min_value, max_value), where min_value and max_value are floats.</span>** 
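
A sketch of one approach on a small hypothetical array (`area_demo` is made up for illustration). `np.min` and `np.max` return NumPy scalars, so each is wrapped in `float()` to get built-in Python floats; under NumPy 2.x this also keeps the printed tuple free of `np.float64(...)` wrappers.

```python
import numpy as np

# Hypothetical burned-area values, for illustration only
area_demo = np.array([0.0, 0.36, 43.32, 6.38, 0.0])

# np.min / np.max return NumPy scalars; float() converts them to built-in floats
min_max = (float(np.min(area_demo)), float(np.max(area_demo)))
print(min_max)  # → (0.0, 43.32)
```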

**<span style="color:blue">(2 PTS) 4. Write a lambda function that computes the following transformation of a variable:</span>**   
```    
    logarithm(base10) of [1 + x]
```

**<span style="color:blue">Then call the lambda function on *area*, printing the LAST 10 values.</span>**  
Hint: NumPy has a vectorized base-10 logarithm function that can be applied to a whole array at once.
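
A minimal sketch, assuming a NumPy array input; `values_demo` is hypothetical. Because `np.log10` operates elementwise, the lambda applies to the whole array without a loop, and the negative slice `[-10:]` picks out the last 10 values.

```python
import numpy as np

# Lambda implementing log10(1 + x); np.log10 is vectorized, so it works on whole arrays
log_transform = lambda x: np.log10(1 + x)

# Hypothetical non-negative values, for illustration only
values_demo = np.array([0.0, 9.0, 99.0, 999.0] * 3)

transformed = log_transform(values_demo)
print(transformed[-10:])  # negative slice keeps the last 10 values
```

The +1 shift keeps the result finite at x = 0, where `np.log10(0)` would give `-inf`.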

**Working with MONTH**

month - month of the year: 'jan' to 'dec'

```python
month = fire.month.values
```

**<span style="color:blue">(1 PT) 5. Call the `unique()` function from the NumPy package on *month* and print the unique months.</span>**   
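
A short illustration with a hypothetical array `month_demo`. `np.unique` returns the distinct values sorted, which for strings means alphabetical order rather than calendar order.

```python
import numpy as np

# Hypothetical month labels, for illustration only
month_demo = np.array(['mar', 'oct', 'oct', 'aug', 'mar', 'sep'])

# np.unique returns the distinct values, sorted (alphabetically for strings)
unique_months = np.unique(month_demo)
print(unique_months)
```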

**<span style="color:blue">(1 PT) 6. Write a list comprehension that selects all months in *month* starting with the letter 'a'.   
    Next, call set() on the result to get the unique months starting with 'a', and print this result.</span>**   
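
A sketch of the comprehension-then-set pattern on hypothetical labels (`month_demo` is illustrative only). NumPy string elements behave like Python strings, so `str.startswith` works on them; the printed order of a set may vary between runs.

```python
import numpy as np

# Hypothetical month labels, for illustration only
month_demo = np.array(['mar', 'aug', 'apr', 'oct', 'aug', 'jan'])

# Keep each label whose first character is 'a' (duplicates are kept at this stage)
a_months = [m for m in month_demo if m.startswith('a')]

# set() collapses the duplicates to the unique labels
print(set(a_months))
```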

**Working with DMC**  
DMC - DMC index from the FWI system: 1.1 to 291.3  

```python
dmc = fire.DMC.values
```

**<span style="color:blue">(2 PTS) 7. Write a function called `bandpass_filter()` with these requirements:</span>**

- takes three inputs: 
  - a numpy array to be filtered
  - an integer serving as a lower bound L
  - an integer serving as an upper bound U
- returns a new array containing only the values from the original array which are greater than L AND less than U
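
A minimal sketch using boolean masking, one idiomatic NumPy approach (an explicit loop would also work). The sample values in `dmc_demo` are hypothetical; the call at the end mirrors the L=25, U=35 bounds used in task 8.

```python
import numpy as np

def bandpass_filter(arr, L, U):
    """Return the elements of arr strictly greater than L and strictly less than U."""
    # Elementwise comparisons give boolean arrays; & combines them (parentheses required)
    mask = (arr > L) & (arr < U)
    # Boolean indexing returns a new array; the input is left unchanged
    return arr[mask]

# Hypothetical DMC-like readings, for illustration only
dmc_demo = np.array([26.2, 35.4, 43.7, 33.3, 25.0, 35.0, 30.1])

print(bandpass_filter(dmc_demo, 25, 35))
```

Because both comparisons are strict, values equal to a bound (25.0 and 35.0 here) are excluded.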

**<span style="color:blue">(1 PT) 8. Call `bandpass_filter()`, passing the DMC values (*dmc*) as the array with L=25 and U=35, and print the result. </span>**


**Working with FFMC**  
FFMC - FFMC index from the FWI system: 18.7 to 96.20

```python
FFMC = fire.FFMC.values
```

**<span style="color:blue">(2 PTS) 9. Write a function called `sum_sq_err()` with these requirements:</span>**

- takes a numpy array as input
- computes the mean of the array, mu
- uses a for-loop to compute the squared deviation of each array element xi from the mean, (xi - mu)**2  
  Hint: it may be helpful to keep a running sum of the squared deviations.
- computes the sum of the squared deviations
- returns the sum of the squared deviations
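
The quantity being computed is $\sum_{i=1}^{n} (x_i - \mu)^2$. Below is a minimal sketch that follows the for-loop requirement with a running total; `ffmc_demo` is hypothetical data.

```python
import numpy as np

def sum_sq_err(arr):
    """Return the sum of squared deviations of arr's elements from their mean."""
    mu = np.mean(arr)   # mean of the array
    total = 0.0         # running sum of squared deviations
    for xi in arr:
        total += (xi - mu) ** 2
    return total

# Hypothetical FFMC-like readings, for illustration only
ffmc_demo = np.array([86.0, 90.0, 92.0, 88.0])

print(sum_sq_err(ffmc_demo))  # → 20.0
```

As a sanity check, the result should equal `len(arr) * np.var(arr)`, since `np.var` by default divides this same sum by n.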

**<span style="color:blue">(1 PT) 10. Call `sum_sq_err()`, passing *FFMC* as the array, and print the result. </span>**

---  