In [70]:
# Initialize Otter
import otter
grader = otter.Notebook("practice.ipynb")

In [71]:
import practice_test

# Lab-P5: Looping Patterns and Hurricane API

**WARNING:** Please go through Segment 1 of [lab-p5](https://git.doit.wisc.edu/cdis/cs/courses/cs220/cs220-f22-projects/-/tree/main/lab-p5) **before** you start to solve this notebook.

## Segment 2: Learning the API

### Task 2.1: Examine the `hurricanes` CSV file

The `project.py` file will allow you to access the dataset you'll use this week, `hurricanes.csv`. We generated this data file by writing a Python program to extract data from several lists of hurricanes over the Atlantic Ocean on Wikipedia (here is an [example](https://en.wikipedia.org/wiki/2022_Atlantic_hurricane_season)). You can take a look at the script `gen_csv.ipynb` yourself. At the end of the semester, you will be able to write it yourself.

Open `hurricanes.csv` with Microsoft Excel or some other Spreadsheet viewer and look at the hurricanes in the dataset. The data shows:

* name
* the date of formation
* the date of dissipation
* max wind speed (in mph)
* damage (in US dollars)
* deaths

Often, we'll organize data by assigning numbers (called **indexes**) to different parts of the data (e.g., rows or columns in a table). In Computer Science, indexing typically starts with the number `0`; i.e., when you have a sequence of things, you'll start counting them from `0` instead of `1`. Thus, you should **ignore the numbers shown by your Spreadsheet Viewer to the left of the rows**. From the perspective of `project.py`, the indexes of `1804 New England hurricane`, `1806 Great Coastal hurricane`, and `1812 Louisiana hurricane` are `0`, `1`, and `2` respectively (and so on).

For example, consider this excerpt from `hurricanes.csv` as viewed in Microsoft Excel:

![table.PNG](attachment:table.PNG)

The **index** for the `1812 Louisiana hurricane` is `2`, but its actual **location** is `3`, and it is on **row** `4` of the table. Therefore, you must follow the zero-based convention for all the questions asking for the value at a particular **index**.
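
To make the difference between index, location, and row concrete, here is a minimal sketch that uses a plain Python list in place of the dataset. The names come from the text above; the single header row (`HEADER_ROWS`) is an assumption about how the spreadsheet is laid out, and the list is only a stand-in for `project.py`.

```python
# The first three hurricane names, as listed in the text above.
names = [
    "1804 New England hurricane",
    "1806 Great Coastal hurricane",
    "1812 Louisiana hurricane",
]

HEADER_ROWS = 1  # assumption: the spreadsheet shows one header row above the data

for index, name in enumerate(names):
    location = index + 1                # 1-based position among the data rows
    sheet_row = location + HEADER_ROWS  # row number shown by a spreadsheet viewer
    print(index, location, sheet_row, name)
```

The last line printed pairs index `2` with location `3` and row `4`, matching the description above.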

### Task 2.2: Explore the API
Use the inspection process we learned in [lab-p3](https://git.doit.wisc.edu/cdis/cs/courses/cs220/cs220-f22-projects/-/tree/main/lab-p3) and [lab-p4](https://git.doit.wisc.edu/cdis/cs/courses/cs220/cs220-f22-projects/-/tree/main/lab-p4) to learn more about the `project` API. In lab-p3, we saw how to use `dir` and `help` to learn an API. Run the following cells to explore the API:

In [72]:
# it is considered a good coding practice to place all import statements at the top of the notebook
# please place all your import statements in this cell if you need to import any more modules for this project
import project

In [73]:
# use the 'dir' function to learn more about the project API.

Spend some time reading about each of the seven functions that don't begin with two underscores. For example, run this to learn about `count`:

In [6]:
help(project.count) 

Help on function count in module project:

count()
    This function will return the number of records in the dataset



Alternatively, you could run the following to just see the function's documentation:

In [7]:
print(project.count.__doc__)

This function will return the number of records in the dataset


You may also open up the `project.py` file directly to learn about the functions provided. E.g., you might see this:

```python
def count():
    """This function will return the number of records in the dataset"""
    return len(__hurricane__)
```

You don't need to understand the code in the functions, but the strings in triple quotes (called *docstrings*) explain what each function does. In fact, `project.count.__doc__` simply gives you the docstring of the `count` function.
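
You can see the same mechanism on a function of your own. The sketch below defines a small hypothetical function, `count_items`, which is not part of `project.py`, and reads its docstring back through `__doc__`.

```python
def count_items(items):
    """Return the number of elements in items."""
    return len(items)

# __doc__ holds exactly the string written in triple quotes above
doc = count_items.__doc__
print(doc)

# help() shows the same text, along with the function signature
help(count_items)
```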

Try to learn about the other functions in `project.py` by using the `help` function. For example, you may try:

In [75]:
help(project.get_name)
help(project.get_formed)

Help on function get_name in module project:

get_name(idx)
    get_name(idx) returns the name of the hurricane in row idx

Help on function get_formed in module project:

get_formed(idx)
    get_formed(idx) returns the date of formation of the hurricane in row idx



In [13]:
# now try getting help for the other functions in the `project` module
dir(project)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__hurricane__',
 '__init__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'count',
 'get_damage',
 'get_deaths',
 'get_dissipated',
 'get_formed',
 'get_mph',
 'get_name']

### Task 2.2.1: Getting familiar with `project.py`

You will now demonstrate your familiarity with the functions inside the `project` module by answering a few simple questions. You must have already imported the `project` module to this notebook. Make sure you placed the `import` statement at the **top** of the notebook in the designated cell.

**Remember:** In Computer Science, we start indexing at `0`.

**Question 1.1:** What is the `name` of the hurricane at **index** `0`? 

In [9]:
# we have done this for you!
name_idx0 = project.get_name(0)

name_idx0

'1804 New England hurricane'

In [10]:
grader.check("q1-1")

**Question 1.2:** What is the `name` of the hurricane at **index** `1`? 

In [11]:
# replace the ... below with your code
name_idx1 = project.get_name(1)
name_idx1

'1806 Great Coastal hurricane'

In [12]:
grader.check("q1-2")

**Question 1.3:** What is the speed in `mph` of the hurricane at **index** `7`? 

In [14]:
# replace the ... below with your code
mph_idx7 = project.get_mph(7)
mph_idx7

105

In [15]:
grader.check("q1-3")

**Question 1.4:** What is the `damage` in dollars caused by the hurricane at **index** `5`? 

In [18]:
# replace the ... below with your code
damage_idx5 = project.get_damage(5)
damage_idx5

'1M'

In [19]:
grader.check("q1-4")

Notice that the damage amount ends with a "M". In this dataset, "K" represents one thousand, "M" represents one million, and "B" represents one billion. For p5, you'll need to convert these strings to integers (e.g., `"1.5K"` will become `1500`, `"2.55M"` will become `2550000`).

**Question 2:** What is the `name` of the **last** hurricane in the dataset?

In [20]:
# we have done this for you!
name_idx_last = project.get_name(project.count() - 1)
name_idx_last

'Fiona'

In [21]:
grader.check("q2")

Now, let us try to get the `name` at index `project.count()` instead. What happens? Why? Feel free to reach out to your TA/PM, if you are not sure.

In [23]:
# execute this cell without changing anything
# valid indexes run from 0 to project.count() - 1, so this call is expected to raise an error
project.get_name(project.count())
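
A plain Python list shows the same boundary. This is only a sketch: it assumes the dataset behaves like a list (the `count` source shown earlier returns `len(__hurricane__)`, which suggests so), and the three-name list is a stand-in for the real data.

```python
names = [
    "1804 New England hurricane",
    "1806 Great Coastal hurricane",
    "1812 Louisiana hurricane",
]

# The last valid index is len(names) - 1, because counting starts at 0.
last = names[len(names) - 1]
print(last)

# Index len(names) is one past the end, so Python raises an IndexError.
try:
    names[len(names)]
    error_raised = False
except IndexError:
    error_raised = True

print(error_raised)
```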

## Segment 3: Working with strings

### Task 3.1: Indexing / slicing Strings

Stepping back from the Hurricane data, Tasks 3.1 and 3.2 introduce us to performing operations with strings. While this will be covered in more detail during Friday's lecture, we will cover the essentials now.

We can think of a string as a sequence of characters. For example, the string `my_str = 'hello_world!'` can be written as...

| index  | 0    | 1    | 2    | 3    | 4    | 5    | 6    | 7    | 8    | 9    | 10   | 11   |
| ------ | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| string | h    | e    | l    | l    | o    | _    | w    | o    | r    | l    | d    | !    |

... where we can then access specific characters of the string by an index, e.g. `my_str[0]` which returns `'h'` or `my_str[8]` which returns `'r'`.

Furthermore, we can "slice" strings -- that is, get a particular section of characters. For example,

- `my_str[1:5]` returns `'ello'`
- `my_str[:8]` returns `'hello_wo'`
- `my_str[5:]` returns `'_world!'`
- `my_str[:]` returns `'hello_world!'`

Try running this in the cell below.

In [24]:
my_str = 'hello_world!'
print("my_str[0] returns", my_str[0])
print("my_str[8] returns", my_str[8])
print("my_str[1:5] returns", my_str[1:5])
print("my_str[:8] returns", my_str[:8])
print("my_str[5:] returns", my_str[5:])
print("my_str[:] returns", my_str[:])

my_str[0] returns h
my_str[8] returns r
my_str[1:5] returns ello
my_str[:8] returns hello_wo
my_str[5:] returns _world!
my_str[:] returns hello_world!


Notice that slicing is *inclusive* on the lower bound and *exclusive* on the upper bound. We can also leave out the lower bound to start from the beginning (e.g. `my_str[:6]`) or leave out the upper bound to continue through the end (e.g. `my_str[8:]`). Lastly, a negative index counts *backwards* from the *end* of the string.

Try running the cell below.

In [25]:
print("my_str[-1] returns", my_str[-1])
print("my_str[-4:-1] returns", my_str[-4:-1])

my_str[-1] returns !
my_str[-4:-1] returns rld
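
One way to read a negative index is as "length of the string plus the index". The short sketch below checks that reading against the two examples above.

```python
my_str = 'hello_world!'
n = len(my_str)  # 12 characters

# -1 refers to the same character as index n - 1
print(my_str[-1] == my_str[n - 1])

# the slice -4:-1 covers the same characters as (n - 4):(n - 1), i.e. 8:11
print(my_str[-4:-1] == my_str[n - 4:n - 1])
```

Both comparisons print `True`.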


**Your Turn!** Try slicing the below phone number! Can you extract the area code (first 3 digits), exchange code (middle 3 digits), and line number (last 4 digits) of the given phone number?

**Question 3.1:** What is the **last digit** of the phone number: `608-867-5309`?

In [26]:
# replace the ... with your code
phone_number = "608-867-5309"
last_digit = phone_number[-1]

last_digit

'9'

In [27]:
grader.check("q3-1")

**Question 3.2:** What is the **area code** (i.e., the first three characters) of the phone number: `608-867-5309`?

In [30]:
# replace the ... with your code
phone_number = "608-867-5309"
area_code = phone_number[:3]

area_code

'608'

In [31]:
grader.check("q3-2")

**Question 3.3:** What is the **line number** (i.e., the last four characters) of the phone number: `608-867-5309`?

In [32]:
# replace the ... with your code
phone_number = "608-867-5309"
line_number = phone_number[8:]

line_number

'5309'

In [33]:
grader.check("q3-3")

**Question 3.4:** What is the **exchange code** (i.e., middle three characters) of the phone number: `608-867-5309`?

In [34]:
# replace the ... with your code
phone_number = "608-867-5309"
exchange_code = phone_number[4:7]

exchange_code

'867'

In [35]:
grader.check("q3-4")

**Question 4.1:** What is the **department code** (i.e., the letters at the start) of the course: `CS220`?

In [36]:
course = 'CS220'
dept_code = course[:2]

dept_code

'CS'

In [37]:
grader.check("q4-1")

**Question 4.2:** What is the **course code** (i.e., the numbers at the end) of the course: `CS220`?

In [38]:
course = 'CS220'
course_code = course[2:5]

course_code

'220'

In [39]:
grader.check("q4-2")

After that short detour, we will now go back to working on the hurricane dataset.

### Task 3.2: Calculating Damage Costs

`Q1.4` showed us that damage costs are represented as strings with suffixes for thousands, millions, and billions.

We can **index** the last character of these damages to find the suffix. We can then potentially use it to determine whether the suffix represents a thousand, million, or a billion.

**Question 5.1:** What is the **suffix** (i.e., the last character) of the cost `"3.19B"`?

In [40]:
# replace the ... with your code
cost = "3.19B"
suffix = cost[-1]  # a negative index finds the last character whatever the string's length

suffix

'B'

In [41]:
grader.check("q5-1")

**Question 5.2:** How many billions are there in the cost `"3.19B"`?

Just as we found the suffix by **indexing**, we can also find the number by **slicing**. Answer the question by slicing the string to obtain the number of billions, and typecasting the string into a float.

In [42]:
# replace the ... with your code
cost = "3.19B"
billions = float(cost[:-1])  # everything except the last character (the suffix)

billions

3.19

In [43]:
grader.check("q5-2")
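
Putting the suffix and the numeric part together gives one possible way to turn a damage string into a whole number of dollars. This is a rough sketch, not the required p5 solution: the function name `parse_damage` is hypothetical, and treating a string without a suffix as a plain dollar amount is an assumption about the data.

```python
def parse_damage(damage):
    """Convert a damage string such as "2.55M" into an integer number of dollars."""
    suffix = damage[-1]
    if suffix == "K":
        multiplier = 1_000
    elif suffix == "M":
        multiplier = 1_000_000
    elif suffix == "B":
        multiplier = 1_000_000_000
    else:
        # assumption: no suffix means the string is already a dollar amount
        return round(float(damage))
    # round() returns an int and guards against small floating-point errors
    return round(float(damage[:-1]) * multiplier)

print(parse_damage("1.5K"))
print(parse_damage("2.55M"))
```

The two calls print `1500` and `2550000`, the values quoted in the note after Question 1.4.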

### Task 3.3: Slicing dates

Run the below cell which prints the formation and dissipation date of the first hurricane.

In [None]:
print(project.get_formed(0))
print(project.get_dissipated(0))

The dates are represented as a string in `mm/dd/yyyy` notation. Two digits are used to represent the month and day even when they can be represented with a single digit, that is, `'9/4/1804'` is represented as `'09/04/1804'`.

To extract the month, we could run the following code...

In [None]:
project.get_formed(0)[:2]

Notice, however, that this is the *string* `'09'`.

Write the code to get this as the *int* (e.g. `9`).

**Question 6:** In which `month` did the hurricane at **index** `0` form?

Your answer **must** be an `int` between `1` and `12`. You **must not** hardcode the answer, but use the appropriate function from the `project` module to find the date of formation of the hurricane.

In [46]:
# replace the ... with your code
month_idx0 = int(project.get_formed(0)[:2])  # take both month digits so that months 10-12 also work
month_idx0

9

In [47]:
grader.check("q6")

### Task 3.4: Helper Functions for Month, Day, and Year

The functions below will be useful in p5. Answer the questions below to get the day and the year as an `int`. The function to get the month has already been written for you.

In [48]:
def get_month(date):
    """get_month(date) returns the month when the date is in the 'mm/dd/yyyy' format"""
    return int(date[:2])

You can confirm that `get_month` works by running the cell below.

In [49]:
month = get_month("10/05/2022")
month

10

### Task 3.4.1: Define `get_year(date)`

You must now define this function, which will take in the `date` as a `str` and return the `year` as an `int`.

In [60]:
def get_year(date):
    """get_year(date) returns the year when the date is in the 'mm/dd/yyyy' format"""
    return int(date[6:10])

**Question 7:** What is the `year` in the date `"10/05/2022"`?

You **must** answer this question by calling the `get_year` function.

In [61]:
# replace the ... with your code
year = get_year("10/05/2022")
year

2022

In [62]:
grader.check("q7")

### Task 3.4.2: Define `get_day(date)`

You must now define this function, which will take in the `date` as a `str` and return the `day` as an `int`.

In [63]:
def get_day(date):
    """get_day(date) returns the day when the date is in the 'mm/dd/yyyy' format"""
    return int(date[3:5])

**Question 8:** What is the `day` in the date `"10/05/2022"`?

You **must** answer this question by calling the `get_day` function.

In [64]:
# replace the ... with your code
day = get_day("10/05/2022")
day

5

In [65]:
grader.check("q8")
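
The slicing helpers above rely on every date having exactly two digits for the month and day. As an alternative sketch, the hypothetical function `split_date` below (not part of the lab or of `project.py`) splits on the `/` separator instead, so it does not depend on fixed character positions.

```python
def split_date(date):
    """Return (month, day, year) as ints for a date written as 'mm/dd/yyyy'."""
    month, day, year = date.split("/")  # three pieces separated by '/'
    return int(month), int(day), int(year)

print(split_date("10/05/2022"))
```

This prints `(10, 5, 2022)`. The lab questions still require the `get_month`, `get_day`, and `get_year` helpers, so treat this only as a comparison.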

### Task 3.5: Using Helper Functions

Using the helper functions you made above, complete the following questions.

**Hint:** You'll use these helper functions in combination with `project.get_formed(idx)` and `project.get_dissipated(idx)`!

**Question 9:** On what `day` did the hurricane at **index** `100` **form**?

You **must** answer this question by calling the `get_day` function.

In [67]:
help(project.get_formed)

Help on function get_formed in module project:

get_formed(idx)
    get_formed(idx) returns the date of formation of the hurricane in row idx

In [80]:
# replace the ... with your code
day_formed_idx100 = get_day(project.get_formed(100))
day_formed_idx100

13

In [81]:
grader.check("q9")

**Question 10:** In what `year` did the hurricane at **index** `200` **form**?

You **must** answer this question by calling the `get_year` function.

In [82]:
# replace with your code
year_formed_idx200 = get_year(project.get_formed(200))
year_formed_idx200

1979

In [83]:
grader.check("q10")

**Question 11:** In what `month` did the hurricane at **index** `300` **dissipate**?

You **must** answer this question by calling the `get_month` function.

In [84]:
# replace the ... with your code
# the question asks about dissipation, so look up the dissipation date
month_diss_idx300 = get_month(project.get_dissipated(300))
month_diss_idx300

In [85]:
grader.check("q11")

## Segment 4: Looping

### Task 4.1: `while` and `for` loops

Run the below code and observe the output.

In [86]:
i = 0
while i < 5:
    print(i)
    i += 1

0
1
2
3
4


Equivalently, we can use `for` and `range(n)`. The `range(n)` function returns a sequence of numbers starting at `0` and going up to, but not including, `n`.

In [87]:
for i in range(5):
    print(i)

0
1
2
3
4
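
To see the numbers a `range` produces without writing a loop, you can convert it to a list. The short sketch below also shows the two-argument form `range(start, stop)`, which this lab does not otherwise use.

```python
print(list(range(5)))     # [0, 1, 2, 3, 4]
print(list(range(2, 5)))  # [2, 3, 4]
print(len(range(5)))      # 5 numbers in total, even though 5 itself is excluded
```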


Now, we will try to use `while` and `for` loops to answer a few simple questions.

**Question 12:** What is the sum of the numbers *0 to 25*, both inclusive?

You **must** answer this with a `while` loop. Ask your TA/PM if you are not sure what to do.

In [88]:
i = 0
sum_while = 0 # replace the ... with the correct initial value for the sum
while i <= 25: # replace the ... with the correct comparison operator
    sum_while += i 
    i += 1
    
sum_while

325

In [89]:
grader.check("q12")

**Question 13:** What is the sum of the numbers *0 to 25*, both inclusive?

You **must** answer this with a `for` loop. Ask your TA/PM if you are not sure what to do.

In [104]:
# replace the ... with your code
sum_for = 0
for i in range(26):  # the for loop advances i on its own, so no manual increment is needed
    sum_for += i
sum_for

325

In [105]:
grader.check("q13")
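
As a sanity check on both loops, the sum of the whole numbers from `0` to `n` also has the closed form `n * (n + 1) / 2`. The sketch below compares a loop total against that formula for `n = 25`.

```python
n = 25

loop_total = 0
for i in range(n + 1):  # n + 1 so that n itself is included
    loop_total += i

formula_total = n * (n + 1) // 2  # integer division keeps the result an int

print(loop_total, formula_total)  # 325 325
```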

### Task 4.2: Looping through hurricanes

You have had some practice with simple looping structures. You will now loop through the hurricanes dataset.

Run the below code and observe the output.

In [108]:
for idx in range(10):
    print(project.get_name(idx))

1804 New England hurricane
1806 Great Coastal hurricane
1812 Louisiana hurricane
1821 Norfolk and Long Island hurricane
1848 Tampa Bay hurricane
1867 San Narciso hurricane
1875 Indianola hurricane
Gale of 1878
1886 Indianola hurricane
1887 Halloween tropical storm


Can you make the code above display the year of the formation of the first 10 hurricanes? How about the first 15 hurricanes? Please feel free to reach out to your TA/PM and ask them for help, if you face any issues.

You are now ready to answer some interesting questions with loops.

**Question 14:** What is the **total** `deaths` caused by the **first** `10` hurricanes in the dataset?

In [118]:
# replace the ... with your code
total_deaths_first10 = 0
for idx in range(10):
    total_deaths_first10 += project.get_deaths(idx)

total_deaths_first10

1920

In [119]:
grader.check("q14")

**Question 15:** What is the **average** speed (in `mph`) of **all** the hurricanes in the dataset?

In [120]:
# replace the ... with your code
sum_wind_speed = 0
for idx in range(project.count()):
    sum_wind_speed += project.get_mph(idx)
average_wind_speed = sum_wind_speed/project.count()

average_wind_speed

99.44241316270566

In [121]:
grader.check("q15")

### Task 4.3: Filtering

You will now *filter* the data using an `if` condition as you loop through the dataset.

**Question 16:** How many hurricanes caused **more than** `1000` deaths in the dataset?

In [134]:
# replace the ... with your code
num_hurr_1000_deaths = 0
for idx in range(project.count()): # loop through ALL hurricanes in the dataset; do NOT hardcode the number here
    if project.get_deaths(idx) > 1000: # replace ... with a Boolean expression
        num_hurr_1000_deaths += 1

num_hurr_1000_deaths

In [127]:
grader.check("q16")

**Question 17:** How many hurricane `names` **start** with the letter *D* in the dataset?

In [135]:
# compute and store the answer in the variable 'num_hurr_D'
# TODO: initialize the variable 'num_hurr_D'
# TODO: loop through all hurricanes in the dataset
# TODO: update the value of 'num_hurr_D' only if
#       the name of the hurricane at the current idx starts with 'D'

num_hurr_D = 0
for idx in range(project.count()):
    if project.get_name(idx)[0] == "D":
        num_hurr_D += 1
# display the variable 'num_hurr_D' here
num_hurr_D

40

In [136]:
grader.check("q17")
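
The counting pattern used in this task (start a counter at `0`, loop over every index, add `1` only when a condition holds) works on any sequence. Here is a self-contained sketch on the ten names printed in Task 4.2, with a plain list standing in for the `project` module; `str.startswith` is used as an alternative to comparing the first character.

```python
names = [
    "1804 New England hurricane",
    "1806 Great Coastal hurricane",
    "1812 Louisiana hurricane",
    "1821 Norfolk and Long Island hurricane",
    "1848 Tampa Bay hurricane",
    "1867 San Narciso hurricane",
    "1875 Indianola hurricane",
    "Gale of 1878",
    "1886 Indianola hurricane",
    "1887 Halloween tropical storm",
]

num_starting_with_1 = 0
for idx in range(len(names)):
    if names[idx].startswith("1"):  # the filter condition
        num_starting_with_1 += 1

print(num_starting_with_1)  # 9, since only "Gale of 1878" starts with another character
```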

### Task 4.4: Maximization/Minimization

You will now find the maximum/minimum using loops. Run the following two cells and observe the output.

In [137]:
def f(n):
    return 3 + (n % 7)

for n in range(11):
    print('f(' + str(n) + ') = ' + str(f(n)))

f(0) = 3
f(1) = 4
f(2) = 5
f(3) = 6
f(4) = 7
f(5) = 8
f(6) = 9
f(7) = 3
f(8) = 4
f(9) = 5
f(10) = 6


In [138]:
best_n = 0
for n in range(11):
    if f(n) > f(best_n):
        best_n = n

best_n

6

Can you figure out what the code above is doing? It is using the variable `best_n` to keep track of the value of `n` with the maximum value of `f(n)`. At each iteration of the loop, `best_n` stores the **best** value of `n` observed **so far**. At each iteration of the loop, the code checks if `f(n) > f(best_n)`. If this is the case, then the **new** value `n` has a higher value of `f(n)` than the **previous best** value `best_n`, so `best_n` is **updated** to take the value of `n`, which is now the **new best** seen **so far**. At the end of the loop, `best_n` is the value of `n` for which `f(n)` is maximum.

Notice that this code does **not** find the maximum value of `f(n)`; it finds the value of `n` for which `f(n)` is maximum. This is far more useful than simply finding the maximum value of `f(n)`, as you shall see when you solve p5.
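
The sketch below illustrates why tracking `best_n` is the more useful choice: once you know `best_n`, the maximum value is a single call away, whereas the maximum value alone would not tell you where it occurs. The last line uses the built-in `max` with a `key` argument as a cross-check; it is shown only for comparison and is not a replacement for the loop.

```python
def f(n):
    return 3 + (n % 7)

best_n = 0
for n in range(11):
    if f(n) > f(best_n):
        best_n = n

best_value = f(best_n)        # the maximum value follows directly from best_n
print(best_n, best_value)     # 6 9

print(max(range(11), key=f))  # 6, the same n found by the loop
```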

### If the above explanation is not clear, reach out to your TA/PM. You will have to find maximums in p5 and in future projects. It is very important that you understand how this code works.

**Question 18:** What is the `name` of the hurricane which has the **fastest** wind speed (in `mph`)?

In [141]:
# replace the ... with your code
fastest_idx = 0
for idx in range(project.count()):
    if project.get_mph(idx) > project.get_mph(fastest_idx):
        fastest_idx = idx
fastest_name = project.get_name(fastest_idx)
        
fastest_name

'Allen'

In [142]:
grader.check("q18")

**Question 19:** What is the `name` of the hurricane which has the **slowest** wind speed (in `mph`)?

You **must** break ties in favor of the hurricanes that appear **first** in the dataset.
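
The comparison operator decides how ties are broken. The sketch below reuses `f` from Task 4.4, whose minimum value `3` occurs at both `n = 0` and `n = 7`: a strict `<` keeps the first minimum seen, while `<=` keeps replacing it and ends on the last one.

```python
def f(n):
    return 3 + (n % 7)

# Strict comparison: a later tie does not replace the current best.
first_min = 0
for n in range(11):
    if f(n) < f(first_min):
        first_min = n

# Non-strict comparison: every tie replaces the current best.
last_min = 0
for n in range(11):
    if f(n) <= f(last_min):
        last_min = n

print(first_min, last_min)  # 0 7
```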

In [143]:
# replace the ... with your code
slowest_idx = 0
for idx in range(project.count()):
    if project.get_mph(idx) < project.get_mph(slowest_idx): # replace the ... with a comparison operator
        slowest_idx = idx
slowest_name = project.get_name(slowest_idx)
        
slowest_name

'1975 Tropical Depression Six'

In [144]:
grader.check("q19")

### Task 4.5: More Filtering

You will now create a function that takes in two years, `start_year` and `end_year`, and returns the number of hurricanes that were formed between these two years (both years inclusive).

You **must** use the `get_year` function you defined above to find the year of formation of each hurricane. 

In [145]:
def count_hurricanes_between(start_year, end_year):
    """count_hurricanes_between(start_year, end_year) returns the number of hurricanes formed between the two years (both inclusive)"""
    num_hurricanes = 0
    for idx in range(project.count()):
        year_formed = get_year(project.get_formed(idx))  # look up the year once per hurricane
        if start_year <= year_formed <= end_year:
            num_hurricanes += 1
    return num_hurricanes
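
When both ends of a range are inclusive, it is worth testing values that sit exactly on the boundaries. The sketch below does this on a plain list; the years are taken from the ten hurricane names printed in Task 4.2 and serve only as sample values, and `count_between` is a hypothetical stand-in that does not use the `project` module.

```python
# Sample years, read off the hurricane names printed in Task 4.2.
years = [1804, 1806, 1812, 1821, 1848, 1867, 1875, 1878, 1886, 1887]

def count_between(values, low, high):
    """Return how many values lie between low and high, both inclusive."""
    total = 0
    for value in values:
        if low <= value <= high:  # both boundaries count
            total += 1
    return total

print(count_between(years, 1812, 1875))  # 5: 1812 and 1875 are both included
print(count_between(years, 1813, 1874))  # 3: moving each bound inward drops the two ends
```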

**Question 20.1:** How many hurricanes were `formed` between *1980 and 2002*, both inclusive?

You **must** answer this question by calling the `count_hurricanes_between` function.

In [147]:
# replace the ... with your code
hurr_between_1980_2002 = count_hurricanes_between(1980,2002)
hurr_between_1980_2002

130

In [148]:
grader.check("q20-1")

**Question 20.2:** How many hurricanes were `formed` between *1901 and 2000*, both inclusive?

You **must** answer this question by calling the `count_hurricanes_between` function.

In [149]:
# replace the ... with your code
hurr_between_1901_2000 = count_hurricanes_between(1901,2000)
hurr_between_1901_2000

294

In [150]:
grader.check("q20-2")

## Great work! You are now ready to start [p5](https://git.doit.wisc.edu/cdis/cs/courses/cs220/cs220-f22-projects/-/tree/main/p5)