# Lab 4: Conditionals, Iterations, and Sampling

Welcome to Lab 4! This week, we will go over more table organizing techniques, conditionals and iteration, and introduce the concept of randomness. Refer to the [textbook](https://eldridgejm.github.io/dive_into_data_science/front.html) for help. This lab is due **Monday (07/27) at 11:59pm**.


First, set up the tests and imports by running the cell below.

In [1]:
import numpy as np
import babypandas as bpd

# These lines set up graphing capabilities.
import matplotlib
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

# Don't change this part
import otter
otter = otter.Notebook('tests')

%matplotlib inline

## 1. Nachos and Conditionals

In Python, a Boolean value is either `True` or `False`. We get Boolean values when using comparison operators, among which are `<` (less than), `>` (greater than), and `==` (equal to). For a complete list, refer to [this table](https://www.pylenin.com/img/comparison-operators/comparison-table-2.png).

Run the cell below to see an example of a comparison operator in action.

In [2]:
3 > 1 + 1

We can even assign the result of a comparison operation to a variable.

In [3]:
result = 10 / 2 == 5
result

Arrays are compatible with comparison operators. The comparison is applied to each element, and the output is an array of Boolean values.

In [4]:
np.array([1, 5, 7, 8, 3, -1]) > 3
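
Because `True` counts as nonzero, `np.count_nonzero` can tally how many entries of a Boolean array are `True`. Here is a minimal sketch on a made-up array (the names `temperatures`, `is_warm`, and `number_warm` are hypothetical and not part of the lab):

```python
import numpy as np

# Hypothetical data for illustration only.
temperatures = np.array([68, 75, 81, 59, 90])

# Comparing an array to a number gives one Boolean per element.
is_warm = temperatures > 70

# True counts as nonzero, so this tallies the True entries.
number_warm = np.count_nonzero(is_warm)
print(number_warm)
```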

Waiting on the dining table just for you is a hot bowl of nachos! Let's say that whenever you take a nacho, it will have cheese, salsa, both, or neither (just a plain tortilla chip). 

Using the function call `np.random.choice(array_name)`, let's simulate taking nachos from the bowl at random. Start by running the cell below several times, and observe how the results change.

In [5]:
nachos = np.array(['cheese', 'salsa', 'both', 'neither'])
np.random.choice(nachos)
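
`np.random.choice` also accepts a second argument giving the number of draws to make, in which case it returns an array of results; this lab uses that form later in `np.random.choice(nachos, 250)`. Here is a small sketch, where the `coin` array and the fixed seed are assumptions made only for illustration:

```python
import numpy as np

# Assumption: fix the seed only so that rerunning gives the same draws.
np.random.seed(0)

# Hypothetical array, not part of the lab.
coin = np.array(['heads', 'tails'])

# The second argument is the number of draws.
# By default the draws are made with replacement.
flips = np.random.choice(coin, 5)
print(flips)
print(len(flips))
```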

**Question 1.** Assume we took ten nachos at random, and stored the results in an array called `ten_nachos`. Find the number of nachos with only cheese using code (do not hardcode the answer).  

*Hint:* Our solution involves a comparison operator and the `np.count_nonzero` function.

In [6]:
ten_nachos = np.array(['neither', 'cheese', 'both', 'both', 'cheese', 'salsa', 'both', 'neither', 'cheese', 'both'])

In [7]:
number_cheese = ...
number_cheese

In [8]:
otter.check('q1_1')

**Conditional Statements**

A conditional statement is made up of multiple lines of code that allow Python to choose from different alternatives based on whether some condition is true.

Here is a basic example.

```
def sign(x):
    if x > 0:
        return 'Positive'
```

If the input `x` is greater than `0`, the function returns the string `'Positive'`.
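
Note that this version of `sign` has no branch for other inputs, so when the condition is false the function reaches its end and returns `None`. The sketch below repeats the definition to show both cases:

```python
def sign(x):
    if x > 0:
        return 'Positive'

# The condition is true, so the body runs.
print(sign(3))

# The condition is false, so nothing is returned explicitly.
print(sign(-3))
```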

If we want to check several conditions one after another, we use the following general format.

```
if <if expression>:
    <if body>
elif <elif expression 0>:
    <elif body 0>
elif <elif expression 1>:
    <elif body 1>
...
else:
    <else body>
```

Only one of the bodies will ever be executed. Each `if` and `elif` (else-if) expression is evaluated and considered in order, starting at the top. As soon as a true value is found, the corresponding body is executed, and the rest of the conditional statement is skipped. If none of the `if` or `elif` expressions are true, then the `else body` is executed. For more examples and explanation, refer to [Section 9.1](http://sierra.ucsd.edu/dsc10-book/chapters/09/1/Conditional_Statements.html).
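
As an illustration of this ordering, here is a sketch that extends the idea of `sign` with `elif` and `else` branches. The name `describe_number` and its return strings are made up for this example.

```python
def describe_number(x):
    # Conditions are checked from top to bottom; only the first true one runs.
    if x > 100:
        return 'Large'
    elif x > 0:
        # Reached only if x > 100 was false.
        return 'Positive'
    elif x == 0:
        return 'Zero'
    else:
        # Runs when none of the conditions above are true.
        return 'Negative'

print(describe_number(250))
print(describe_number(7))
print(describe_number(0))
print(describe_number(-4))
```

The input `250` satisfies both `x > 100` and `x > 0`, but only the first matching body runs.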

**Question 2.** Complete the following conditional statement so that the string `'More please'` is assigned to `say_please` if the number of nachos with cheese in `ten_nachos` is less than `5`.

In [9]:
say_please = '?'

if ...:
    say_please = 'More please'
    
say_please

In [11]:
otter.check('q1_2')

**Question 3.** Write a function called `nacho_reaction` that returns a string based on the type of nacho passed in. From top to bottom, the conditions should correspond to: `'cheese'`, `'salsa'`, `'both'`, `'neither'`.  

In [12]:
def nacho_reaction(nacho):
    if ...:
        return 'Cheesy!'
    # next condition should return 'Spicy!'
    ...
    # next condition should return 'Wow!'
    ...
    # next condition should return 'Meh.'
    ...

spicy_nacho = nacho_reaction('salsa')
spicy_nacho

In [14]:
otter.check('q1_3')

**Question 4.** Add a column `'Reactions'` to the table `ten_nachos_reactions` that consists of reactions for each of the nachos in `ten_nachos`. 

*Hint:* Use the `apply` method. 

In [15]:
ten_nachos_reactions = bpd.DataFrame().assign(Nachos=ten_nachos)
ten_nachos_reactions

In [16]:
ten_nachos_reactions = ...
ten_nachos_reactions

In [17]:
otter.check('q1_4')

**Question 5.** Using code, find the number of `'Wow!'` reactions for the nachos in `ten_nachos_reactions`. Think about how you could find this either by using table methods or by using `np.count_nonzero`.

In [18]:
number_wow_reactions = ...
number_wow_reactions

In [19]:
otter.check('q1_5')

**Question 6.** Change some of the `==`s in the expression below to something else (like `<` or `>`) so that `should_be_true` is `True`.

In [20]:
should_be_true = number_cheese == number_wow_reactions == np.count_nonzero(ten_nachos == 'neither')
should_be_true
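
Python reads a chained comparison such as `a == b == c` as `a == b and b == c`, and different operators can be mixed within one chain. Here is a small sketch with made-up values (the names `a`, `b`, `c`, `chained`, and `spelled_out` are hypothetical):

```python
# Hypothetical values for illustration only.
a, b, c = 1, 2, 2

# Each adjacent pair is compared, and the results are combined with `and`.
chained = a < b == c
spelled_out = (a < b) and (b == c)

print(chained, spelled_out)

# The all-equal chain fails here because a and b differ.
print(a == b == c)
```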

In [21]:
should_be_true = ...
should_be_true

In [22]:
otter.check('q1_6')

**Question 7.** Complete the function `both_or_neither`, which takes in a table of nachos with reactions (with the same labels as the one from Question 4) and returns `'Wow!'` if there are more nachos with both cheese and salsa, or `'Meh.'` if there are more nachos with neither. If there are an equal number of each, return `'Okay!'`.

In [23]:
def both_or_neither(nacho_table):
    reactions = ...
    number_wow_reactions = ...
    number_meh_reactions = ...
    if ...:
        return 'Wow!'
    # next condition should return 'Meh.'
    ...
    # next condition should return 'Okay!'
    ...
    
many_nachos = bpd.DataFrame().assign(Nachos=np.random.choice(nachos, 250))
many_nachos = many_nachos.assign(Reactions=many_nachos.get("Nachos").apply(nacho_reaction))
result = both_or_neither(many_nachos)
result

In [25]:
otter.check('q1_7')

## 2. Iteration
Using a `for` statement, we can perform a task multiple times. This is known as iteration. Here, we'll simulate drawing different suits from a deck of cards. 

In [26]:
suits = np.array(["♤", "♡", "♢", "♧"])

draws = np.array([])

repetitions = 6

for i in np.arange(repetitions):
    draws = np.append(draws, np.random.choice(suits))

draws

Another use of iteration is to loop through a set of values. For instance, we can print out all of the colors of the rainbow.

In [27]:
rainbow = np.array(["red", "orange", "yellow", "green", "blue", "indigo", "violet"])

for color in rainbow:
    print(color)

We can see that the indented part of the `for` loop, known as the body, is executed once for each item in `rainbow`. Note that the name `color` is arbitrary; we could easily have named it something else.
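
A common pattern is to create a counter before the loop and update it inside the body. Here is a minimal sketch using a made-up `numbers` array (the names `numbers`, `n`, and `even_count` are hypothetical):

```python
import numpy as np

# Hypothetical data for illustration only.
numbers = np.array([4, 7, 10, 13, 16])

# Start the counter before the loop.
even_count = 0

for n in numbers:
    # The body runs once per element; only even values update the counter.
    if n % 2 == 0:
        even_count = even_count + 1

print(even_count)
```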

**Question 1.** Clay is playing darts. His dartboard contains ten equal-sized zones with point values from 1 to 10. Write code using `np.random.choice` that simulates his total score after 1000 dart tosses.

In [28]:
possible_point_values = ...
tosses = 1000

total_score = ...

total_score

In [30]:
otter.check('q2_1')

**Question 2.** What is the average point value of a dart thrown by Clay?

In [31]:
average_score = ...
average_score

In [32]:
otter.check('q2_2')

**Question 3.** In the following cell, we've loaded the text of _Pride and Prejudice_ by Jane Austen, split it into individual words, and stored these words in an array. Using a `for` loop, assign `longer_than_five` to the number of words in the novel that are more than 5 letters long. Look at [Section 9.2](http://sierra.ucsd.edu/dsc10-book/chapters/09/2/Iteration.html) if you get stuck; the textbook is very useful!

*Hint*: You can find the number of letters in a word with the `len` function.

In [33]:
austen_string = open('Austen_PrideAndPrejudice.txt', encoding='utf-8').read()
p_and_p_words = np.array(austen_string.split())

...

longer_than_five

In [35]:
otter.check('q2_3')

**Question 4.** Using simulation with 10,000 trials, assign `chance_of_all_different` to an estimate of the chance that if you pick three words from _Pride and Prejudice_ uniformly at random (with replacement), they all have different lengths.

In [36]:
trials = 10000

...

chance_of_all_different

In [38]:
otter.check('q2_4')

## 3. Finding Probabilities
After a long day of class, Clay decides to go to Pines for dinner. Today's menu has Clay's four favorite foods: enchiladas, hamburgers, pizza, and spaghetti. However, each dish has a 30% chance of running out before Clay can get to Pines, independently of the other dishes.

**Question 1.** What is the probability that Clay will be able to eat pizza at Pines?

In [39]:
pizza_prob = ...
pizza_prob

In [40]:
otter.check('q3_1')

**Question 2.** What is the probability that Clay will be able to eat all four of these foods at Pines?

In [41]:
all_prob = ...
all_prob

In [42]:
otter.check('q3_2')

**Question 3.** What is the probability that Pines will have run out of something (anything) before Clay can get there?

In [43]:
something_is_out = ...
something_is_out

In [44]:
otter.check('q3_3')

To make up for their unpredictable food supply, Pines decides to hold a contest for some free HDH Dining swag. There is a bag with two red marbles, two green marbles, and two blue marbles. Clay has to draw three marbles without replacement. In order to win, all three of these marbles must be of different colors.

**Question 4.** What is the probability of Clay winning the contest?

In [45]:
winning_prob = ...
winning_prob

In [46]:
otter.check('q3_4')

## 4. California National Parks
This part of the lab will help you get acquainted with the table operations `merge` and `groupby`.
You can read more about them in [Chapter 8](http://sierra.ucsd.edu/dsc10-book/chapters/08/Functions_and_Tables.html) of your textbook.
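
As a warm-up, here is a rough sketch of both operations on two tiny made-up tables. It is written with `pandas` on the assumption that `babypandas` behaves the same way for these methods; the tables `orders` and `prices` and their columns are hypothetical and unrelated to the parks data.

```python
import pandas as pd  # assumption: babypandas mirrors these pandas methods

# Hypothetical tables for illustration only.
orders = pd.DataFrame({
    'Item': ['apple', 'pear', 'apple', 'fig'],
    'Quantity': [2, 1, 5, 3],
})
prices = pd.DataFrame({
    'Item': ['apple', 'pear', 'fig'],
    'Price': [0.5, 0.75, 2.0],
})

# groupby + count: one row per distinct Item, counting the rows in each group.
# The grouping column becomes the index of the result.
counts = orders.groupby('Item').count()
print(counts)

# merge: attach the matching Price to every order, matching rows on 'Item'.
merged = orders.merge(prices, on='Item')
print(merged)
```

Here both tables store the matching values in a column named `'Item'`. When the matching values live in the index of one table instead, `merge` needs to be told so through its other arguments.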

We'll begin by importing our two datasets, `california_parks.csv` and `california_parks_species.csv`, which provide information on California National Parks and their species, respectively. These are a subset of a [larger dataset the National Park Service provides](https://www.kaggle.com/nationalparkservice/park-biodiversity).

In [47]:
parks = bpd.read_csv("california_parks.csv")
species = bpd.read_csv("california_parks_species.csv")
parks_species = bpd.DataFrame().assign(
    count=species.groupby('Park Name').count().get('Category')
)
parks

**Question 1.** Say we want to see which National Park has the greatest number of species (highest biodiversity), but the species counts are in a different table, `parks_species`.

Use the `merge` command to make a new table `parks_with_species`, which will have the parks' existing information and the number of species each has. Make sure the table only has one row per park containing the count of species. Your table should look like this:

<img width=75% src="./merge-result.png"/>

In [48]:
parks_with_species = ...
parks_with_species

In [49]:
otter.check('q4_1')

**Species Abundance** The next question will ask you about the species abundances at each park. Take a second to look at the `species` table to get acquainted with its components.

**Question 2.** Each park has a lot of different species, and each species varies in abundance at each park. Using the `groupby` command, assign the variable `species_abundance` to a DataFrame that *classifies* the parks by both Park Name and Abundance.

*Hint:* Reset the index and assign columns so that you have three columns: `'Park Name'`, `'Abundance'`, and `'Count'`. Your table should look like this:

<img width=40% src="./groupby-result.png"/>

In [50]:
species_abundance = ...
species_abundance

In [52]:
otter.check('q4_2')

# Finish Line

Congratulations! You are done with lab04.

To submit your assignment:

1. Select `Kernel -> Restart & Run All` to ensure that you have executed all cells, including the test cells.
2. Read through the notebook to make sure everything is fine and all tests passed.
3. Run the cell below to run all tests, and make sure that they all pass.
4. Download your notebook using `File -> Download as -> Notebook (.ipynb)`, then upload your notebook to Gradescope.

In [53]:
# For your convenience, you can run this cell to run all the tests at once!
otter.check_all()