In [1]:
#: imports!

import numpy as np
import babypandas as bpd

import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
%matplotlib inline

# Lecture 7

## Group, Merge, Conditionals, and Iteration

# Grouping (Again)

Sub-groups

## Our familiar NBA data...

In [3]:
#: read from csv and relabel
nba = bpd.read_csv('data/nba_salaries.csv').set_index('PLAYER')
nba = nba.assign(SALARY=nba.get("'15-'16 SALARY")).drop(columns="'15-'16 SALARY")
nba

## How big is each team?

- We know how to do this: `.groupby()`.
- **Notice**: team names become the row labels.

In [7]:
...

## How much does each team pay in payroll?

- Instead of counting, we want to sum the `SALARY` column.

In [8]:
...

## How many of each position does each team have?

- We want to count...
- but sizes of groups within groups.
- i.e., sizes of position groups within teams.

In [9]:
...

## `.groupby()` with subgroups

- To make groups within groups (with groups, etc.)...
- Pass a list of column names to `.groupby()`:

```
table.groupby([col_1, col_2, col_3])
```
- Groups by `col_1` first.
- Within each group, groups by `col_2`.
- And so on.

## Notice the index...

- This is a "MultiIndex"
- We won't worry about those...
- Use `.reset_index()` to move index back to columns.

In [10]:
nba.groupby(['TEAM', 'POSITION']).count()
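
To see what the MultiIndex looks like and how `.reset_index()` flattens it, here is a minimal sketch on a made-up roster. It uses plain `pandas` (which `babypandas` is built on) so that it runs on its own. The `roster` table and its values are invented for illustration and are not the NBA data.

```python
import pandas as pd

# Hypothetical toy roster (not the real NBA data).
roster = pd.DataFrame({
    'PLAYER': ['A', 'B', 'C', 'D', 'E'],
    'TEAM': ['Hawks', 'Hawks', 'Hawks', 'Bulls', 'Bulls'],
    'POSITION': ['C', 'C', 'PG', 'C', 'SF'],
})

# Grouping by a list of columns counts players per (TEAM, POSITION) pair.
counts = roster.groupby(['TEAM', 'POSITION']).count()
print(counts.index.nlevels)  # 2: the row labels have two levels

# reset_index() turns the two index levels back into ordinary columns.
flat = counts.reset_index()
print(flat)
```

After `reset_index()`, `TEAM` and `POSITION` are regular columns again, so they can be used in queries like any other column.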

## Which team has the most centers?

In [11]:
position_counts = ...
position_counts

In [12]:
# select only the centers
...

## Example: Sea Temperatures

- The sea surface temperature in La Jolla, every day since August 22, 1916

In [13]:
sea_temp = bpd.read_csv('data/sea_temp.csv')
sea_temp

## What was the hottest month (average temp)?

In [14]:
# define table `hottest` using `sea_temp`, in descending order by temp
...

In [15]:
hottest.get('YEAR').iloc[0]

In [16]:
hottest.get('MONTH').iloc[0]

## Bonus Plot

- Yearly average surface temperature

In [17]:
sea_temp.groupby('YEAR').mean().plot(y='SURFACE_TEMP')

## Summary: `.groupby`

- Pass a list of columns to make subgroups.
- *Always* use `.reset_index()` afterwards to move the index back to columns.

# Merge

Combining columns from two different tables

## Example

In [18]:
products = bpd.DataFrame().assign(
    Location=['Cups', 'Cups', 'Cups', 'Art of Espresso', 'Art of Espresso', 'Perks', 'Perks'],
    Product=['Green Tea', 'Latte', 'Drip Coffee', 'Espresso', 'Latte', 'Drip Coffee', 'Green Tea'],
    Price=[1.25, 2.50, 1.00, 2.00, 3.00, 1.25, 1.50]
)
products

## Example

In [19]:
coupons = bpd.DataFrame().assign(
    Location=['Cups', 'Art of Espresso'],
    Discount=[.25, .10]
)
coupons

## How do we calculate the discounted price of each product?

- Idea: "cross-reference" tables.
- I.e., for each row in `products`, find the discount in `coupons` for that row's `Location`.
- This is what `.merge()` does:

In [20]:
with_discounts = products.merge(coupons, left_on='Location', right_on='Location')
with_discounts

In [21]:
# assumes `Discount` is the fraction taken off the price
with_discounts.assign(
    Discounted=with_discounts.get('Price') * (1 - with_discounts.get('Discount'))
)

## Merging

- Pick a "left" table and a "right" table.
- Choose a column from each to "merge on".

<img src="data/merge.png" />

## `.merge()` method

```python
left_table.merge(
    right_table, 
    left_on=left_column_name,
    right_on=right_column_name
)
```
- `left_on` and `right_on` should be column names (they don't have to be the same).
- The result has one row for every match.
- Rows that don't have a match in the other table are dropped!
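
The dropped rows are easy to miss, so here is a small sketch of that behavior using the `products` and `coupons` tables from above. It is written with plain `pandas` rather than `babypandas` so that it is self-contained; the `.merge()` call itself is the same.

```python
import pandas as pd

products = pd.DataFrame({
    'Location': ['Cups', 'Cups', 'Cups', 'Art of Espresso',
                 'Art of Espresso', 'Perks', 'Perks'],
    'Product': ['Green Tea', 'Latte', 'Drip Coffee', 'Espresso',
                'Latte', 'Drip Coffee', 'Green Tea'],
    'Price': [1.25, 2.50, 1.00, 2.00, 3.00, 1.25, 1.50],
})
coupons = pd.DataFrame({
    'Location': ['Cups', 'Art of Espresso'],
    'Discount': [.25, .10],
})

merged = products.merge(coupons, left_on='Location', right_on='Location')

# 'Perks' has no row in coupons, so its two products are not in the result.
print(products.shape[0], merged.shape[0])      # 7 5
print('Perks' in merged['Location'].tolist())  # False
```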

## What if column names don't match?

In [22]:
cafes = coupons.assign(
    Cafe=coupons.get('Location')
).drop(columns='Location')
cafes

In [23]:
products.merge(cafes, left_on='Location', right_on='Cafe')

## What if we want to "merge on" an index?

- Instead of using `left_on` or `right_on`, use `left_index=True` or `right_index=True`

In [24]:
coupons_by_location = coupons.set_index('Location')
coupons_by_location

In [25]:
products.merge(
    coupons_by_location, 
    left_on='Location', 
    right_index=True
)

# Finish Line

Those are all of the table methods we'll learn.

With the exception of `table.sample`, which we'll see soon.

# Booleans and Conditionals

## Booleans

- A **Boolean** variable is either true or false.
    - yes or no
    - on or off
    - 0 or 1
- Named after George Boole.
- In Python: 
    - we have the `bool` type, `True` and `False` literals.
    - `and`, `or`, `not` operators.

In [26]:
x = True

In [27]:
type(x)

## The `not` operator

- Flips a `True` to a `False`, and a `False` to a `True`.

In [28]:
is_sunny = True

not is_sunny

## The `and` operator

- Placed between two `bool`s.
- `True` if *both* are true, otherwise `False`.

In [29]:
is_sunny = True
is_warm = False

is_sunny and is_warm

## The `or` operator

- Placed between two `bool`s.
- `True` if at least one of them is `True`, otherwise `False`.

In [30]:
is_sunny = True
is_warm = False

is_sunny or is_warm

## Building expressions

- We can chain together longer expressions.
- Not simply evaluated left to right: `not` is applied first, then `and`, then `or`.
- But use parentheses to make things clearer.

In [31]:
is_sunny = True
is_warm = False
is_humid = True

is_humid and not is_sunny or is_warm
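
A minimal sketch of how Python groups the operators in an expression like this one: `not` binds most tightly, then `and`, then `or`. The names `implicit` and `explicit` are only for illustration.

```python
is_sunny = True
is_warm = False
is_humid = True

# Without parentheses, Python groups the expression like the second line.
implicit = is_humid and not is_sunny or is_warm
explicit = (is_humid and (not is_sunny)) or is_warm
print(implicit, explicit)  # False False

# A strict left-to-right reading would give a different answer here:
print(True or False and False)    # True, because `and` is evaluated before `or`
print((True or False) and False)  # False
```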

## Discussion question

    a = True
    b = True
    not(((not a) and b) or ((not b) or a))
    
What does the expression evaluate to?

- A) `True`
- B) `False`
- C) 32.7

In [33]:
#: let's see...
...

## Comparisons

- Comparisons produce `bool`s:

In [34]:
4 > 2

## Comparison operators

Operator | Description
-------------| ----------
`>` | greater than
`>=` | greater than or equal to
`<` | less than
`<=` | less than or equal to
`==` | equals
`!=` | not equals

## Careful!

- Note that there's a difference between `=` and `==`.
- Using the wrong one can result in a `SyntaxError`.

In [35]:
3 = 5
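
The difference can be sketched as follows. The use of `compile()` is only a convenience for this illustration: it lets us observe the `SyntaxError` without stopping the rest of the cell.

```python
x = 3          # `=` assigns: x now refers to 3
print(x == 5)  # `==` compares: False

# Assigning to a literal is not valid Python.
try:
    compile('3 = 5', '<demo>', 'exec')
    raised = False
except SyntaxError:
    raised = True
print(raised)  # True
```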

## Conditionals

- Do something if an expression is `True`.
- Syntax (don't forget the colon):


    if <condition>:
        <body>
            
- Indentation matters!

In [36]:
#: in San Diego
is_sunny = True

if is_sunny:
    print('Wear sunglasses!')

## Conditionals

- `else`: do something else if condition is `False`

In [37]:
#: in San Diego
is_sunny = False

if is_sunny:
    print('Wear sunglasses')
else:
    print('Stay inside')

## Conditionals

- `elif`: If condition is `False`, check another condition
- "Falls through" until first `True` condition.
- But doesn't continue after that.
- "Catch" everything that falls through with `else`

In [38]:
#: in San Diego
is_raining = True
is_warm = False
is_sunny = True

if is_raining:
    print('Grab an umbrella')
elif is_warm:
    print('Wear shorts')
elif is_sunny:
    print('Wear sunglasses')
else:
    print('All conditions false!')

## Example: sign function

Write a function that takes a single number and prints "positive" if it is positive, "negative" if it is negative, and "neither!" if it is zero.

In [39]:
def sign(x):
    if x > 0:
        print('positive')
    elif x < 0:
        print('negative')
    else:
        print('neither!')

In [40]:
sign(7)

In [41]:
sign(-2)

In [42]:
sign(0)

## Example: the other one

- Develop a function which takes a 2-element array and a value.
- If the value is:
    - the first element, return the second.
    - the second element, return the first.
    
    
    >>> choices = np.array(['moon', 'sun'])
    >>> other_one(choices, 'moon')
    sun
    >>> other_one(choices, 'sun')
    moon

In [43]:
#- define `other_one(arr, value)`
def other_one(arr, value):
    if value == arr[0]:
        return arr[1]
    elif value == arr[1]:
        return arr[0]
    else:
        print('Invalid input!')

## Discussion question

```
def func(a, b):
    if (a + b > 4 and b > 0):
        return 'foo'
    elif (a*b >= 4 or b < 0):
        return 'bar'
    else:
        return 'baz'
```

What is returned when `func(2, 2)` is called?

- A) foo
- B) bar
- C) baz
- D) more than one of the above

## Using parentheses...

Instead of:

    if (a + b > 4 and b > 0):
        ...

You might prefer: 

    if (a + b > 4) and (b > 0):
        ...
        
They do the same thing, because comparison operators are evaluated before `and`.

Fun fact: if `a = 2`, and `b = 2`, `a + b > (4 and b) > 0` evaluates to `True`.
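
The fun fact can be unpacked step by step. This sketch relies on two Python behaviors: `and` returns one of its operands, and comparisons can be chained. The names `middle`, `chained`, and `spelled_out` are only for illustration.

```python
a = 2
b = 2

# `and` returns one of its operands: 4 is truthy, so `4 and b` evaluates to b.
middle = 4 and b
print(middle)  # 2

# Python chains comparisons: x > y > z means (x > y) and (y > z).
chained = a + b > (4 and b) > 0
spelled_out = (a + b > middle) and (middle > 0)
print(chained, spelled_out)  # True True

# The condition from func is different:
print(a + b > 4 and b > 0)  # False, because 4 > 4 is False
```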

# Iteration

We can use Python to help automate our job at NASA:

In [44]:
#: counting down...
import time

print("Launching in...")
print("t-minus", 10)
time.sleep(1)
print("t-minus", 9)
time.sleep(1)
print("t-minus", 8)
time.sleep(1)
print("t-minus", 7)
time.sleep(1)
print("t-minus", 6)
time.sleep(1)
print("t-minus", 5)
time.sleep(1)
print("t-minus", 4)
time.sleep(1)
print("t-minus", 3)
time.sleep(1)
print("t-minus", 2)
time.sleep(1)
print("t-minus", 1)
time.sleep(1)
print("Blast off!")

## Better approach: use a `for`-loop.

In [45]:
print("Launching in...")

for t in [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]:
    print("t-minus", t)
    time.sleep(1)
    
print("Blast off!")

## `for`-loops

- Do something for every value in a sequence
- Syntax (don't forget the colon):

```
for <loop variable> in <sequence>:
    <body>
```

- Indentation matters!


In [46]:
#: loop variable can be anything
for x in [1, 2, 3, 4]:
    print(x ** 2)

## Ranges

- We can use `np.arange` to create sequences to iterate over:

In [47]:
#: count to 9, starting from 0
for x in np.arange(10):
    print(x)

In [48]:
#: countdown
for x in np.arange(10, 0, -1):
    print(x)
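
As a quick reference, here is a sketch of the three ways `np.arange` is commonly called. In every form the stop value itself is excluded. The variable names are only for illustration.

```python
import numpy as np

# One argument: 0 up to, but not including, the stop value.
up_to_five = np.arange(5)
print(up_to_five.tolist())  # [0, 1, 2, 3, 4]

# start, stop, step: counting down, stopping before 0.
countdown = np.arange(10, 0, -1)
print(countdown.tolist())   # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

# A positive step other than 1.
by_threes = np.arange(0, 10, 3)
print(by_threes.tolist())   # [0, 3, 6, 9]
```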

## Iterating over array by indexing

In [49]:
#: use np.arange(size)
flavors = np.array(['Chocolate', 'Vanilla', 'Strawberry'])

for index in np.arange(flavors.size):
    print('Flavor number', index, 'is', flavors[index])

In [50]:
# using enumerate()
for index, flavor in enumerate(flavors):
    print('Flavor number', index, 'is', flavor)

## Building an array by iterating

- How many letters are in each name?
- We want to save our results!
- Use `np.append`: returns a new array with the element added at the end, so assign the result back to the variable.

In [51]:
#: names
names = ['Winona', 'Xanthippe', 'Yvonne', 'Zelda']

# empty array
lengths = np.array([])

for name in names:
    lengths = np.append(lengths, len(name))
    
lengths
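
Two details of this pattern are worth seeing in isolation: `np.append` leaves its input unchanged, and an empty array created with `np.array([])` holds floats, so the lengths above come out as `6.0`, `9.0`, and so on. The sketch below shows both. Passing `dtype=int` is one optional way to keep whole numbers; it is not something the cell above requires.

```python
import numpy as np

names = ['Winona', 'Xanthippe', 'Yvonne', 'Zelda']

# np.append does not change its input; it returns a new array.
empty = np.array([])
result = np.append(empty, 6)
print(empty.size, result.size)  # 0 1

# Starting from an integer array keeps the lengths as whole numbers.
lengths = np.array([], dtype=int)
for name in names:
    lengths = np.append(lengths, len(name))
print(lengths.tolist())  # [6, 9, 6, 5]
```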