In [None]:
from datascience import *
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

# Recap

## Arithmetic on numbers

## Built-in functions

## Variables
- Create
- Use
- Overwrite

## Errors
- undefined name
- unsupported operand
- wrong number of arguments
- invalid keyword argument
- division by zero

## Types 
- int
- float
- string/text
- bool
- table `Table.read_table('data/skyscrapers.csv')`
- `type()` function

**Question:** what will `type(int(float('3.14159')))` return?
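
As a small illustration of how conversion functions chain together, the sketch below converts a different string step by step and checks each intermediate type. The value is hypothetical, chosen so the question above stays open.

```python
# Hypothetical input, not the value from the question above.
text = "2.71828"

as_float = float(text)   # str -> float: parses the decimal text
as_int = int(as_float)   # float -> int: drops the fractional part

print(type(text))      # <class 'str'>
print(type(as_float))  # <class 'float'>
print(as_int)          # 2
```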

--- 
Back to slides

---

# Arrays
Arrays are ordered "lists" of elements that can be directly accessed by location.

## Making Arrays
**Exercise**: Make an array of 4 numbers using `make_array()`

**Exercise:** Arrays can be any type. Make an array of `Strings` called `string_array`:


**Exercise:** Mixing types (Strings, Numbers, Booleans).  Make an array of multiple types:

**Question**: What is the type of `weird_array`?
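
One way to explore what happens when types are mixed is to inspect the array's `dtype`. The sketch below uses `np.array` as a stand-in for `make_array` (an assumption: `make_array` is expected to build an ordinary NumPy array), and the name `mixed` is hypothetical.

```python
import numpy as np

# np.array stands in for make_array here (assumption: both build a NumPy array).
mixed = np.array(["cat", 1, True])

# A NumPy array holds a single element type, so every value is coerced to text.
print(mixed.dtype.kind)  # 'U' means Unicode string
print(mixed.item(1))     # the number 1 is now the string '1'
```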

## Simple Arithmetic on Arrays

Assume we have:
- an array with the inches part of each height
- an array with the feet part of each height
- an array of masses in kg
    
We want to:
- convert the heights to centimeters (1 ft = 12 in; 1 in = 2.54 cm)
- calculate the average height
    - manually
    - using an `np` function
- convert back to inches and feet
- add an offset
- calculate BMI
    - $\mathrm{BMI} = m/h^2$
    - units: $[m] = \mathrm{kg}$, $[h] = \mathrm{m}$

In [None]:
heights_inches = make_array(8, 6, 3, 5)
heights_feet = make_array(5, 5, 5, 6)
mass_kg = make_array(68, 75, 67, 80)
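
The steps listed above can be sketched roughly as follows. This is one possible approach, not the only one. It assumes `np.array` behaves like `make_array`, and the 2 cm offset is a hypothetical value.

```python
import numpy as np

# np.array stands in for make_array (assumption: both build a NumPy array).
heights_inches = np.array([8, 6, 3, 5])
heights_feet = np.array([5, 5, 5, 6])
mass_kg = np.array([68, 75, 67, 80])

# Feet and inches -> total inches -> centimeters (1 ft = 12 in; 1 in = 2.54 cm).
total_inches = heights_feet * 12 + heights_inches
heights_cm = total_inches * 2.54

# Average height, computed manually and with NumPy.
avg_manual = sum(heights_cm) / len(heights_cm)
avg_numpy = np.mean(heights_cm)

# Back to feet and inches; rounding guards against floating-point drift.
inches_again = np.round(heights_cm / 2.54).astype(int)
feet_part = inches_again // 12
inches_part = inches_again % 12

# Adding an offset applies it to every element (2 cm is a hypothetical value).
heights_with_offset = heights_cm + 2

# BMI = m / h^2 with mass in kg and height in meters.
bmi = mass_kg / (heights_cm / 100) ** 2
print(np.round(bmi, 1))
```

Each arithmetic operation is applied element by element, so no loop is needed.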

### Aggregation Operations

You will often need to compute summaries of an array like the `sum`, `max`, or the `min`.  These are all **member functions** of an array.  Here is the documentation on all the **[member functions](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html)** for arrays.

**Exercise:** Use the `sum`, `min`, `mean`, and `max` operations to summarize the cool numbers array.
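
A minimal sketch of these member functions on a hypothetical array (`numbers` is an illustrative name, and `np.array` stands in for `make_array`):

```python
import numpy as np

# Hypothetical data; np.array stands in for make_array.
numbers = np.array([3, 1, 4, 1, 5, 9])

print(numbers.sum())   # 23
print(numbers.min())   # 1
print(numbers.max())   # 9
print(numbers.mean())  # 23 / 6
```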

## Ranges
We use ranges to make arrays of number sequences easily. The NumPy function `np.arange(start, stop, step)` produces an array starting at `start` and ending *before* `stop`, in increments of `step`.

In [None]:
help(np.arange)
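
To make the "ending *before* `stop`" rule concrete, here is a short sketch with hypothetical values (the variable names are illustrative only):

```python
import numpy as np

stops_early = np.arange(2, 11, 3)   # 2, 5, 8: the next value, 11, equals stop and is left out
includes_11 = np.arange(2, 12, 3)   # 2, 5, 8, 11: moving stop past 11 includes it
defaults = np.arange(4)             # 0, 1, 2, 3: start defaults to 0 and step to 1
counting_down = np.arange(6, 0, -2) # 6, 4, 2: a negative step counts down

print(stops_early, includes_11, defaults, counting_down)
```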

**Exercise:** Make an array of the numbers 0 through 6:

Can we write it shorter?

**Question 1:** can we create an array from 0 to 100, including 100, with a step increase of 10?

**Question 2:** can we create an array from 100 to 200, not including 200, with a step increase of 10?

**Challenge question:** can we create an array that *decreases* from 10 to 0 (including both 10 and 0)?

**Exercise:** What will the following produce:

```python
np.arange(40, -1, -5) 
```

## Accessing Elements

For this exercise, let's start with this array of strings.

**Question:** How do we get the largest item in the `heights` array? Hint: use `np.sort()`.

In [None]:
string_array = make_array("cat", "dog", "bird")
string_array

You can use `array_name.item( NUMBER )` to get an element from an array.

**Exercise:** What will the following expression return?

```python
string_array.item(1)
```

**Bonus!** This is called **array indexing**.  There is a shorter "equivalent" syntax that people will often use. However, for this class you only need to know about `.item()`.

```python
string_array[ INDEX ]
```
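
The two forms return the same value but not quite the same kind of object, which is one reason "equivalent" is in quotes. A small sketch, using a hypothetical array name and `np.array` as a stand-in for `make_array`:

```python
import numpy as np

# Hypothetical array; np.array stands in for make_array.
animals = np.array(["cat", "dog", "bird"])

first_item = animals.item(0)   # a plain Python str
first_index = animals[0]       # a NumPy string scalar (np.str_)

print(first_item == first_index)  # True: same value
print(type(first_item))           # <class 'str'>
print(type(first_index))          # <class 'numpy.str_'>
```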

**Exercise:** Use the `len` function to determine the length of the string array.

Arrays also have a **member variable** `array_name.size` that contains the size of the array.  

**Exercise:** Use the size **member variable** to check the size of the array:
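
For a one-dimensional array, `len` and `size` agree. A quick sketch with a hypothetical array (note that `size` is a variable, so it takes no parentheses):

```python
import numpy as np

# Hypothetical array; np.array stands in for make_array.
colors = np.array(["red", "green", "blue", "teal"])

print(len(colors))   # 4: len is a built-in function
print(colors.size)   # 4: size is a member variable
```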


## Common Bugs

**Exercise:** What happens if we run the following:

```python
a = make_array(0,1,2,3)
bigger_array = make_array(1,2,3,4,5)
a * bigger_array
```

In [None]:
# a = make_array(0, 1,2,3)
#bigger_array = make_array(1,2,3,4,5)
#a * bigger_array
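
Element-wise arithmetic pairs up values by position, so the two arrays need compatible lengths. The sketch below uses hypothetical arrays of lengths 3 and 4 and catches the resulting error so the message can be inspected:

```python
import numpy as np

# Hypothetical arrays with mismatched lengths.
short = np.array([10, 20, 30])
longer = np.array([1, 2, 3, 4])

try:
    short * longer
except ValueError as error:
    message = str(error)
    print("ValueError:", message)
```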

**Exercise:** What happens if I run the following:

```python
uhoh = make_array(0,1,2,3)
a / uhoh
```

In [None]:
#uhoh = make_array(0,1,2,3)
#a / uhoh
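
Unlike plain Python numbers, NumPy arrays do not stop with an exception when dividing by zero. By default they emit a warning and fill in special values. A sketch with hypothetical arrays, where `np.errstate` is used only to silence the warning:

```python
import numpy as np

# Hypothetical arrays chosen to show each case.
numerator = np.array([0.0, 1.0, 2.0])
denominator = np.array([0.0, 0.0, 2.0])

# errstate silences the RuntimeWarning NumPy would otherwise emit.
with np.errstate(divide="ignore", invalid="ignore"):
    result = numerator / denominator

print(result)  # nan for 0/0, inf for 1/0, 1.0 for 2/2
```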

**Exercise:** What happens if I run the following:

```python
a.item(4)
```

In [None]:
#a.item(4)

**Exercise:** What happens if I run the following:

```python
a.item(-1)
```

In [None]:
#a.item(-1)
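
The last two cases can be compared side by side. In this sketch (hypothetical array, `np.array` standing in for `make_array`), a negative position counts from the end, while a position past the end raises an error:

```python
import numpy as np

# Hypothetical array with valid positions 0 through 3.
values = np.array([10, 20, 30, 40])

last = values.item(-1)   # negative positions count from the end
print(last)              # 40

try:
    values.item(4)       # one past the last valid position
except IndexError as error:
    message = str(error)
    print("IndexError:", message)
```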

# Tables
Tables are Made of Arrays

We are covering arrays because working with a specific column of a table returns an array. Here we load a table of NBA salaries from the local file `data/nba_salaries.csv`.

In [None]:
nba = Table.read_table('data/nba_salaries.csv')
nba

Let's focus on the **Golden State Warriors**.

**Exercise:** Use the `my_table.where` function to select the rows where team is the `"Golden State Warriors"`.

We can also select columns by name. 

**Exercise**: Make a table with just the `"name"` and `"salary"` columns. 


**Exercise:** Compute the average salary of the Warriors.  Which of the following works?

*Option (A):*
```python
warriors.mean()
```

*Option (B):*
```python
warriors.select("salary").mean()
```

*Option (C):*
```python
warriors.column("salary").mean()
```

**Exercise:** Would the following work?

```python
np.average(warriors.select("salary"))
```

In [None]:
np.average(warriors.select("salary"))

Why?

**Exercise:** Use `np.average` to compute the average salary of the Warriors:

<details><summary>Solution</summary>
   
```python
np.average(warriors.column("salary"))
```
    
</details></br></br>

**Exercise:** Compute the difference between the average salaries of the Warriors and the `"Los Angeles Lakers"`.

<details><summary>Solution</summary>
   
```python
lakers = nba.where('team', 'Los Angeles Lakers')
warriors.column('salary').mean() - lakers.column('salary').mean()
```
    
</details></br></br>

## Creating a Table from Arrays

Let's start with an array of street names.

In [None]:
streets = make_array('Embarcadero De Norte', 
                     'Embarcadero De Mar', 
                     'Camino Pescadero', 
                     'Camino Del Sur')
streets

We can make an empty table (no rows, no columns, no problems ...).

The `Table()` function makes an empty table.

In [None]:
empty_table = Table()
empty_table

**Exercise:** Check that the empty table has 0 rows and 0 columns using the `num_rows` and `num_columns` attributes.

**Exercise:** Use the `table.with_column` function to add a column to the table and save the new table as `IV`.

**Exercise:** What is the output of:
```python
empty_table.with_column("Streets", streets)
empty_table.num_columns
```

</br></br></br></br>

**Exercise:** Can you do the same thing without using `empty_table`? Hint: use `Table()` directly.

**Exercise:** Extend the IV table to include the blocks from campus (use `np.arange`). ([map](https://goo.gl/maps/2pnJvxSWfmNaKJvUA))

**Exercise:** Build the entire table with blocks from campus in one call to the `Table.with_columns()` function.

---
Back to slides

---

# Case Study: Understanding the [W. E. B. Du Bois](https://en.wikipedia.org/wiki/W._E._B._Du_Bois) Visualization

![Picture from Wikipedia](https://upload.wikimedia.org/wikipedia/commons/thumb/f/fd/W.E.B._Du_Bois_by_James_E._Purdy%2C_1907_%28cropped%29.jpg/167px-W.E.B._Du_Bois_by_James_E._Purdy%2C_1907_%28cropped%29.jpg)

**From Wikipedia:**  *William Edward Burghardt Du Bois (/djuːˈbɔɪs/ dew-BOYSS; February 23, 1868 – August 27, 1963) was an American sociologist, socialist, historian, and Pan-Africanist civil rights activist. Born in Great Barrington, Massachusetts, Du Bois grew up in a relatively tolerant and integrated community. After completing graduate work at the University of Berlin and Harvard University, where he was the first African American to earn a doctorate, he became a professor of history, sociology, and economics at Atlanta University. Du Bois was one of the founders of the National Association for the Advancement of Colored People (NAACP) in 1909.*

For more context on the visualization in lecture, check out [Du Bois’ Data Portraits Tell A Story About Black Life In Georgia And Beyond](https://www.wabe.org/du-bois-data-portraits/).



In [1]:
du_bois = Table.read_table('data/du_bois.csv')
du_bois


**Exercise:** Compute the amount of money spent on food and add it to the table as `"FOOD $"`:

**Exercise:** Use the table functions we learned this week to find the income bracket ("class") that spent the most money on rent.

**Bonus:** Use the `set_format()` function to display the shares as percentages (using the `PercentFormatter`).