# Lecture 7

## Group, Join, Conditionals, Iteration

# Announcements

- The project is due on Sunday evening.
- No homework this week (due to the project), but there is a lab due Thursday.
- Midterm study resources have been posted on Piazza. Exam is May 2 in class.

# Grouping

Classifying variables

In [None]:
#: imports!

import numpy as np
from datascience import *

import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
%matplotlib inline

## Our familiar NBA data...

In [None]:
#: read from csv and relabel
nba = Table.read_table('nba_salaries.csv').relabeled("'15-'16 SALARY", 'SALARY')
nba

## How big is each team?

- We know how to do this: `.group()`.
- Can visualize distribution of team sizes with `.hist()`.

In [None]:
nba.group('TEAM')#.hist('count')

## How much does each team pay in payroll?

- Instead of counting, we want to sum the `SALARY` column: `nba.group('TEAM', sum)`.
- `sum` is applied to every column except `TEAM`.
- Notice how the columns are renamed automatically (e.g., `SALARY` becomes `SALARY sum`).
- But some columns, such as `PLAYER`, can't be summed.
- Those columns are left empty in the result.
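What does `.group('TEAM', sum)` compute for the `SALARY` column? Below is a minimal sketch of the same idea in plain NumPy, without the `datascience` library. It finds the distinct group labels, then applies the collect function to the values in each group. The `teams` and `salaries` arrays are hypothetical toy data, not rows from `nba_salaries.csv`.

```python
import numpy as np

# Hypothetical toy data (not the real NBA table): one entry per player
teams = np.array(['Hawks', 'Celtics', 'Hawks', 'Nets', 'Celtics'])
salaries = np.array([10.0, 5.0, 2.5, 7.0, 1.0])

# One result per distinct team (np.unique returns them sorted)
unique_teams = np.unique(teams)

# Apply the collect function (here: sum) to the salaries within each group
salary_sums = np.array([salaries[teams == team].sum() for team in unique_teams])

for team, total in zip(unique_teams, salary_sums):
    print(team, total)
```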

## Which position has the highest average salary?

- We need to group by position.
- Within each group, find the average.
- Then sort by average salary.

In [None]:
#: average salary by position, highest first
nba.group('POSITION', np.mean).sort('SALARY mean', descending=True)

## What is the max salary of each position?

- Group by position.
- Within each group, use `max`.

In [None]:
nba.group('POSITION', max)

## Discussion question

Does Zaza Pachulia play for the Washington Wizards?

A. Yes  
B. No  
C. I cannot tell from this table.

## For each position, which team has the most players at that position?

- We want to count again...
- but this time, the sizes of groups within groups,
- i.e., the number of players at each position within each team.

In [None]:
#: sort by count, then keep the first (largest) row for each position
nba.group(['TEAM', 'POSITION']).sort('count', descending=True).sort('POSITION', distinct=True)

## How many players are at each position on *every* team?

In [None]:
nba.group(['TEAM', 'POSITION'])

## A better approach: `.pivot()` to create a two-way table

In [None]:
nba.pivot('POSITION', 'TEAM')
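The counting behind `.pivot('POSITION', 'TEAM')` can be sketched in plain NumPy. The result has one row per team and one column per position, and each cell holds the number of players with that (team, position) pair. This sketch uses hypothetical toy data and only illustrates the idea; it is not how `datascience` implements `.pivot()`.

```python
import numpy as np

# Hypothetical toy data: team and position of each player
teams = np.array(['Hawks', 'Hawks', 'Nets', 'Nets', 'Nets'])
positions = np.array(['C', 'PG', 'C', 'C', 'SF'])

row_labels = np.unique(teams)       # one row per team
col_labels = np.unique(positions)   # one column per position

# counts[i, j] = number of players on row_labels[i] at position col_labels[j]
counts = np.zeros((row_labels.size, col_labels.size), dtype=int)
for team, pos in zip(teams, positions):
    i = np.where(row_labels == team)[0][0]
    j = np.where(col_labels == pos)[0][0]
    counts[i, j] += 1

print(col_labels)
print(counts)
```

In this sketch, a pair that never occurs, such as (Hawks, SF), gets a 0. The output of `.group(['TEAM', 'POSITION'])`, by contrast, simply has no row for that pair.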

## `.pivot()` can do more than count...

- What is the *average* salary of each position on every team?

In [None]:
nba.pivot('POSITION', 'TEAM', 'SALARY', np.mean)

# Join

Combining columns from two different tables

## Example: Drinks

In [None]:
#: table of products
products = Table(['Location', 'Product', 'Price']).with_rows([
    ['Cups', 'Green Tea', 1.25],
    ['Cups', 'Latte', 2.50],
    ['Cups', 'Drip Coffee', 1.00],
    ['Art of Espresso', 'Espresso', 2.00],
    ['Art of Espresso', 'Latte', 3.00],
    ['Perks', 'Drip Coffee', 1.25],
    ['Perks', 'Green Tea', 1.50]
])
products

## Example: Drinks

In [None]:
#: table of coupons
#: discounts are percentages off

coupons = Table(['Location', 'Discount']).with_rows([
    ['Cups', .10],
    ['Art of Espresso', .25]
])
coupons

## How do we calculate the discounted price of each product?

- Idea: "cross-reference" tables.
- I.e., for each row in `products`, find discount in `coupons` for that row's `Location`.
- This is what `.join()` does:

In [None]:
discounted = products.join('Location', coupons)
discounted

In [None]:
discounted.with_column(
    'Discounted Price',
    np.round(discounted.column('Price') * (1 - discounted.column('Discount')), 2)
)

## The `.join()` method

- `this_table.join(common_column, that_table)`
- The result only contains rows whose value in `common_column` appears in *both* tables.
    - For example, Perks was omitted, since `coupons` has no row for it.
- What if the "common columns" have different names?
- Then use `this_table.join(this_column, that_table, that_column)`.
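An inner join can be sketched in plain Python. First build a lookup from `Location` to `Discount`, then keep each product whose location has a match. The rows are copied from the `products` and `coupons` tables above. This illustrates the idea only and is not the `datascience` implementation. The dictionary also assumes each location appears at most once in `coupons`.

```python
# Rows copied from the products and coupons tables above
products = [
    ('Cups', 'Green Tea', 1.25),
    ('Cups', 'Latte', 2.50),
    ('Cups', 'Drip Coffee', 1.00),
    ('Art of Espresso', 'Espresso', 2.00),
    ('Art of Espresso', 'Latte', 3.00),
    ('Perks', 'Drip Coffee', 1.25),
    ('Perks', 'Green Tea', 1.50),
]
coupons = [('Cups', 0.10), ('Art of Espresso', 0.25)]

# Map each location to its discount (assumes one coupon per location)
discount_for = dict(coupons)

# Keep only products whose location also appears in coupons (an inner join)
joined = []
for location, product, price in products:
    if location in discount_for:
        joined.append((location, product, price, discount_for[location]))

for row in joined:
    print(row)
```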

## Common Columns with Different Names

In [None]:
cafes = coupons.relabeled('Location', 'Cafe')
cafes

In [None]:
products

In [None]:
products.join('Location', cafes, 'Cafe')

# Booleans and Conditionals

## Booleans

- A **Boolean** variable is either true or false.
    - yes or no
    - on or off
    - 0 or 1
- In Python: 
    - `bool` type
    - `True` and `False` literals
    - `and`, `or`, `not` operators

In [None]:
x = True

In [None]:
type(x)

## The `not` operator

- Flips a `True` to a `False`, and a `False` to a `True`.

In [None]:
is_sunny = True

not is_sunny

## The `and` operator

- Placed between two `bool`s.
- `True` if *both* are true, otherwise `False`.

In [None]:
is_sunny = True
is_warm = False

is_sunny and is_warm

## The `or` operator

- Placed between two `bool`s.
- `True` if at least one of them is `True`, otherwise `False`.

In [None]:
is_sunny = True
is_warm = False

is_sunny or is_warm

## Building expressions

- We can chain together longer expressions.
- Without parentheses, Python does not simply read left to right: `not` is applied first, then `and`, then `or`.
- But use parentheses to make things clearer.
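When an expression has no parentheses, Python groups it by operator precedence: `not` is applied before `and`, and `and` before `or`. The small check below uses literal values to show that adding parentheses the other way changes the answer.

```python
# `and` binds more tightly than `or`
result_1 = True or True and False     # same as True or (True and False)
print(result_1)

# Grouping the other way gives a different answer
result_2 = (True or True) and False
print(result_2)

# `not` binds more tightly than `and`
result_3 = not False and False        # same as (not False) and False
print(result_3)
```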

In [None]:
is_sunny = True
is_warm = False
is_humid = True

(is_humid and (not is_sunny)) or is_warm

## Discussion Question

What does the expression below evaluate to?

- A) `True`
- B) `False`
- C) I'm lost.

In [None]:
a = True
b = True
not(((not a) and b) or ((not b) or a))

## Comparisons

- Comparisons produce `bool`s:

In [None]:
4 > 2

## Comparison operators

Operator | Description
-------------| ----------
`>` | greater than
`>=` | greater than or equal to
`<` | less than
`<=` | less than or equal to
`==` | equals
`!=` | not equals
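Here is a quick tour of the six operators in the table, using two hypothetical variables `x` and `y`. Every result is a `bool`.

```python
x = 3
y = 5

print(x > y)    # False
print(x >= 3)   # True
print(x < y)    # True
print(x <= 2)   # False
print(x == y)   # False
print(x != y)   # True
```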

## Careful!

- Note that there's a difference between `=` (assignment) and `==` (comparison).
- Using the wrong one can result in a `SyntaxError`.

In [None]:
3 = 5
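For contrast, here are the two operators used correctly. `=` stores a value under a name, while `==` compares two values and produces a `bool`. The variable names are just for illustration.

```python
temperature = 72                 # assignment: stores 72 under the name
is_72 = temperature == 72        # comparison: evaluates to True
print(is_72)
print(temperature == 80)         # False
```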

## Conditionals

- Do something if an expression is `True`.
- Syntax (don't forget the colon):


    if <condition>:
        <body>
            
- Indentation matters!

In [None]:
#: in San Diego
is_sunny = True

if is_sunny:
    print('Wear sunglasses!')

## Conditionals

- `else`: do something else if condition is `False`

In [None]:
#: in San Diego
is_sunny = False

if is_sunny:
    print('Wear sunglasses')
else:
    print('Stay inside')

## Conditionals

- `elif`: if the original condition is `False`, check another condition.
    - stands for "else, if"
- Checks conditions one by one until the first `True` condition is found, runs only that branch, then stops.
- "Catch" everything that remains with `else`.

In [None]:
#: in San Diego
is_raining = False
is_warm = True
is_sunny = True

if is_raining:
    print('Get an umbrella')
elif is_warm:
    print('Wear shorts')
elif is_sunny:
    print('Wear sunglasses')
else:
    print('All conditions false!')
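Since only the first `True` branch runs, the order of the conditions matters. The sketch below wraps the cell above in a function and returns the message instead of printing it, so the two orderings can be compared on the same inputs. The names `what_to_wear` and `what_to_wear_sunny_first` are hypothetical.

```python
def what_to_wear(is_raining, is_warm, is_sunny):
    # Same branches as the cell above: the first True condition wins
    if is_raining:
        return 'Get an umbrella'
    elif is_warm:
        return 'Wear shorts'
    elif is_sunny:
        return 'Wear sunglasses'
    else:
        return 'All conditions false!'

def what_to_wear_sunny_first(is_raining, is_warm, is_sunny):
    # Hypothetical variant: the is_sunny check is moved before is_warm
    if is_raining:
        return 'Get an umbrella'
    elif is_sunny:
        return 'Wear sunglasses'
    elif is_warm:
        return 'Wear shorts'
    else:
        return 'All conditions false!'

print(what_to_wear(False, True, True))              # Wear shorts
print(what_to_wear_sunny_first(False, True, True))  # Wear sunglasses
```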

## Example: sign function

Write a function that takes a single number and prints
- "positive" if it is a positive number
- "negative" if it is a negative number
- "neither" if it is zero

In [None]:
def sign(x):
    if x > 0:
        print('positive')
    elif x < 0:
        print('negative')
    else:
        print('neither')

In [None]:
sign(7)

In [None]:
sign(-2)

In [None]:
sign(0)
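`sign` prints its answer, so the answer can't be stored in a variable (the call itself evaluates to `None`). The variant below uses `return` instead, as `func` in the next question does. The name `sign_label` is hypothetical.

```python
def sign_label(x):
    # Return the label instead of printing it
    if x > 0:
        return 'positive'
    elif x < 0:
        return 'negative'
    else:
        return 'neither'

labels = [sign_label(7), sign_label(-2), sign_label(0)]
print(labels)
```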

## Discussion question

```
def func(a, b):
    if (a + b > 4 and b > 0):
        return 'foo'
    elif (a*b >= 4 or b < 0):
        return 'bar'
    else:
        return 'baz'
```

What is returned when `func(2, 2)` is called?

- A) foo
- B) bar
- C) baz
- D) more than one of the above

## Using parentheses...

Instead of:

    if (a + b > 4 and b > 0):
        ...

You might prefer: 

    if (a + b > 4) and (b > 0):
        ...
        
They do the same thing, because arithmetic and comparison operators are evaluated before `and` and `or`.
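One way to convince yourself that the two spellings agree is to try both on many inputs. The brute-force check below runs over small integers. It tests a finite range rather than proving the claim, and the helper name `forms_agree` is hypothetical.

```python
def forms_agree(a, b):
    # The condition from func, written without and with extra parentheses
    without_parens = a + b > 4 and b > 0
    with_parens = (a + b > 4) and (b > 0)
    return without_parens == with_parens

# Try every pair (a, b) with both values between -3 and 3
all_agree = all(forms_agree(a, b) for a in range(-3, 4) for b in range(-3, 4))
print(all_agree)
```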

## Example: the other one

- Develop a function which takes a 2-element array and a value.
- If the value is:
    - the first element, return the second.
    - the second element, return the first.
    
    
    >>> choices = make_array('moon', 'sun')
    >>> other_one(choices, 'moon')
    'sun'
    >>> other_one(choices, 'sun')
    'moon'

In [None]:
def other_one(a, value):
    if value == a.item(0):
        return a.item(1)
    elif value == a.item(1):
        return a.item(0)
    else:
        print('Invalid input!')

In [None]:
choices = make_array('moon', 'sun')
other_one(choices, 'moon')

# Iteration

We can use Python to help automate our job at NASA:

In [None]:
#: counting down...
import time

print("Launching in...")
print("t-minus", 10)
time.sleep(1)
print("t-minus", 9)
time.sleep(1)
print("t-minus", 8)
time.sleep(1)
print("t-minus", 7)
time.sleep(1)
print("t-minus", 6)
time.sleep(1)
print("t-minus", 5)
time.sleep(1)
print("t-minus", 4)
time.sleep(1)
print("t-minus", 3)
time.sleep(1)
print("t-minus", 2)
time.sleep(1)
print("t-minus", 1)
time.sleep(1)
print("Blast off!")

## Better approach: use a `for`-loop.

In [None]:
print("Launching in...")

for t in [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]:
    print("t-minus", t)
    time.sleep(1)
    
print("Blast off!")

## `for`-loops

- Do something for every value in a sequence
- Syntax (don't forget the colon):

```
for <loop variable> in <sequence>:
    <body>
```

- Indentation matters!


In [None]:
#: the loop variable can have any (valid) name
for x in [1, 2, 3, 4]:
    print(x ** 2)

## Ranges

- We can use `np.arange` to create sequences to iterate over:

In [None]:
#: count to 9, starting from 0
for x in np.arange(10):
    print(x)

In [None]:
#: countdown
for x in np.arange(10, 0, -1):
    print(x)
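`np.arange(start, stop, step)` includes `start` but stops *before* `stop`. That is why the first loop ends at 9 and the countdown ends at 1. Here are a few calls side by side:

```python
import numpy as np

print(np.arange(10))            # 0 through 9; 10 itself is excluded
print(np.arange(10, 0, -1))     # 10 down to 1; 0 is excluded
print(np.arange(0, 10, 2))      # even numbers 0, 2, 4, 6, 8
```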

## Iterating over array by indexing

In [None]:
#: use np.arange(size)

flavors = make_array('Chocolate', 'Vanilla', 'Strawberry')

for index in np.arange(flavors.size):
    print('Flavor at index', index, 'is', flavors.item(index))
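When a loop needs both the position and the value, Python's built-in `enumerate` yields `(index, value)` pairs directly. Below is a sketch of the same loop. It uses a plain list instead of `make_array` so that it runs without `datascience`.

```python
flavors = ['Chocolate', 'Vanilla', 'Strawberry']

# enumerate produces (index, element) pairs, starting from index 0
pairs = []
for index, flavor in enumerate(flavors):
    print('Flavor at index', index, 'is', flavor)
    pairs.append((index, flavor))
```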

## Building an array by iterating

- How many letters are in each name?
- We want to save our results!
- Use `np.append(array, value)`: it returns a *new* array with `value` added to the end, so we reassign the result.

In [None]:
#: names
names = ['Whitney', 'Xiang', 'Yekaterina', 'Zahara']

name_lengths = make_array()

for name in names:
    name_lengths = np.append(name_lengths, len(name))
    
name_lengths
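Two details of this pattern are shown below in isolation with plain NumPy. First, `np.append` returns a new array and leaves its input unchanged, which is why the loop reassigns `name_lengths`. Second, an empty array created with `np.array([])` has dtype `float64`, so appended integers are stored as floats such as `7.0`. Assuming `make_array()` with no arguments behaves the same way, the `name_lengths` above will also hold floats.

```python
import numpy as np

empty = np.array([])                 # an empty array; its dtype is float64
grown = np.append(empty, 7)

print(empty)        # still empty: np.append did not modify it
print(grown)        # [7.]  (the int 7 was stored as a float)

# Same loop as above, starting from a plain empty NumPy array
names = ['Whitney', 'Xiang', 'Yekaterina', 'Zahara']
name_lengths = np.array([])
for name in names:
    name_lengths = np.append(name_lengths, len(name))
print(name_lengths)
```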