In [1]:
from datascience import *
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

# Recap

## Arithmetic on numbers

In [4]:
(1 + 4) / 5

1.0

In [5]:
2 ** 4

16

In [6]:
3 * 4

12

## Built-in functions

In [7]:
abs(5)

5

In [8]:
abs(-5)

5

In [11]:
help(abs)

Help on built-in function abs in module builtins:

abs(x, /)
    Return the absolute value of the argument.



In [10]:
abs?

In [12]:
min(4, -4)

-4

## Variables / Names
- Create
- Use
- Overwrite

In [13]:
a = 3

In [15]:
a = 5

In [16]:
a

5

## Errors
- undefined name
- unsupported operand
- wrong number of arguments
- invalid keyword argument
- division by zero

In [18]:
b

NameError: name 'b' is not defined

In [20]:
animal = 'dog'

In [23]:
1 + animal

TypeError: unsupported operand type(s) for +: 'int' and 'str'

In [24]:
abs(4, 5)

TypeError: abs() takes exactly one argument (2 given)

In [25]:
help(abs)

Help on built-in function abs in module builtins:

abs(x, /)
    Return the absolute value of the argument.



In [30]:
round(number=1.2435, ndigits=1)

1.2

In [31]:
round(ndigits=1, number=1.2435)

1.2

In [32]:
round(number=1.2435, ndigit=1)

TypeError: 'ndigit' is an invalid keyword argument for round()

In [27]:
help(round)

Help on built-in function round in module builtins:

round(number, ndigits=None)
    Round a number to a given precision in decimal digits.
    
    The return value is an integer if ndigits is omitted or None.  Otherwise
    the return value has the same type as the number.  ndigits may be negative.



In [33]:
1/0

ZeroDivisionError: division by zero
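
The five kinds of error listed above can be reproduced without stopping the notebook by catching each one. This is a minimal sketch using only built-ins; the names `checks`, `caught`, and `undefined_name` are hypothetical and not part of the lecture code.

```python
# Each entry pairs a label from the list above with an expression that fails.
checks = [
    ("undefined name", lambda: undefined_name),
    ("unsupported operand", lambda: 1 + 'dog'),
    ("wrong number of arguments", lambda: abs(4, 5)),
    ("invalid keyword argument", lambda: round(number=1.2435, ndigit=1)),
    ("division by zero", lambda: 1 / 0),
]

caught = {}
for label, attempt in checks:
    try:
        attempt()
    except Exception as err:
        # Record the kind of error instead of letting it halt the cell.
        caught[label] = type(err).__name__

for label in caught:
    print(label, '->', caught[label])
```

Note that three different mistakes all surface as a `TypeError`, so the message text after the error name is what tells them apart.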

## Types 
- int
- float
- string/text
- bool
- table `Table.read_table('data/skyscrapers.csv')`
- `type()` function

In [38]:
type(a)

int

**Question:** what will `type(int(float('3.14159')))` return?

In [39]:
type(int(float('3.14159')))

int
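
The nested conversion in the question can be unpacked one step at a time. A small sketch, with hypothetical variable names, that also shows why the intermediate `float` step is needed:

```python
text = '3.14159'
as_float = float(text)   # str -> float
as_int = int(as_float)   # float -> int, dropping the fractional part

print(type(text).__name__, type(as_float).__name__, type(as_int).__name__)
print(as_int)

# Converting the text straight to an int fails, because '3.14159'
# is not written as a whole number.
try:
    int(text)
    direct_error = None
except ValueError as err:
    direct_error = type(err).__name__
print(direct_error)
```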

--- 
Back to slides

---

# Arrays
Arrays are ordered "lists" of elements that can be directly accessed by location.

## Making Arrays
**Exercise**: Make an array of 4 numbers using `make_array()`

In [45]:
make_array(3, 7, 9, 1)

array([3, 7, 9, 1])

**Exercise:** Arrays can be any type. Make an array of `Strings` called `string_array`:

In [43]:
make_array('a', 'y', 'd', 'food')

array(['a', 'y', 'd', 'food'],
      dtype='<U4')


**Exercise:** Mixing types (Strings, Numbers, Booleans).  Make an array of multiple types:

In [44]:
make_array('Cat', 1, True)

array(['Cat', '1', 'True'],
      dtype='<U21')

**Question**: What is the type of the elements in this mixed array?
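
One way to check is to inspect the array's `dtype` and a single element. This sketch uses `np.array` on a list, on the assumption that it coerces mixed values the same way as the `make_array` output shown above; the name `mixed` is hypothetical.

```python
import numpy as np

mixed = np.array(['Cat', 1, True])

# A dtype kind of 'U' means every element is stored as Unicode text.
print(mixed.dtype.kind)

# The number and the Boolean were converted to strings.
print(repr(mixed.item(1)), repr(mixed.item(2)))
```

An array holds a single type, so when the values disagree NumPy falls back to one type that can represent all of them, which here is text.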

## Simple Arithmetic on Arrays

Assume we have:
- an array with the inches part of each height
- an array with the feet part of each height
- an array of masses in kg

We want to:
- convert the heights to centimeters (1 ft = 12 in; 1 in = 2.54 cm)
- calculate the average height
    - manually
    - using an `np` function
- convert back to inches and feet
- add an offset
- calculate BMI
    - $BMI = m/h^2$ 
    - $[m] = kg$, $[h]=m$

In [47]:
heights_inches = make_array(8, 6, 3, 5)
heights_feet = make_array(5, 5, 5, 6)
mass_kg = make_array(68, 75 ,67, 80)

In [58]:
heights_inches_total = heights_feet * 12 + heights_inches
heights_inches_total

array([68, 66, 63, 77])

In [62]:
np.floor(heights_inches_total / 12)

array([ 5.,  5.,  5.,  6.])

In [63]:
heights_inches_total % 12

array([8, 6, 3, 5])
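
The `np.floor` call above returns floats (`5.`, `6.`). As an alternative sketch, not the approach used in the lecture cells, the floor-division operator `//` keeps the whole feet as integers. The total heights are copied from the output above.

```python
import numpy as np

heights_inches_total = np.array([68, 66, 63, 77])

feet = heights_inches_total // 12    # whole feet, stays an integer array
inches = heights_inches_total % 12   # leftover inches

print(feet)
print(inches)
```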

In [60]:
height_m = heights_inches_total * 2.54 / 100
height_m

array([ 1.7272,  1.6764,  1.6002,  1.9558])

In [64]:
mass_kg / height_m**2

array([ 22.79416324,  26.6873812 ,  26.16533326,  20.91419261])

### Aggregation Operations

You will often need to compute summaries of an array like the `sum`, `max`, or the `min`.  These are all **member functions** of an array.  Here is the documentation on all the **[member functions](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html)** for arrays.

**Exercise:** Use the `sum`, `min`, `mean`, and `max` operations to summarize the `height_m` array.

In [65]:
np.mean(height_m)

1.7399

In [69]:
np.sum(height_m) / 4

1.7399

In [66]:
np.max(height_m)

1.9558000000000002

In [67]:
np.min(height_m)

1.6002000000000001
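
The cells above call the `np` functions, while the text mentions member functions. Both spellings give the same result, as this sketch shows; the heights are copied from the `height_m` output above and the variable names are hypothetical.

```python
import numpy as np

height_m = np.array([1.7272, 1.6764, 1.6002, 1.9558])

mean_by_function = np.mean(height_m)             # NumPy function
mean_by_method = height_m.mean()                 # member function of the array
mean_by_hand = height_m.sum() / height_m.size    # sum divided by the number of elements

print(mean_by_function, mean_by_method, mean_by_hand)
print(height_m.max(), height_m.min())
```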

## Ranges
We use ranges to easily make arrays of number sequences.  The NumPy function `np.arange(start, stop, step)` produces an array starting at `start` and ending *before* `stop`, in increments of `step`.

In [None]:
help(np.arange)

**Exercise:** Make an array of the numbers 0 through 6:

In [70]:
make_array(0, 1, 2, 3, 4, 5, 6)

array([0, 1, 2, 3, 4, 5, 6])

Can we write it shorter?

In [73]:
np.arange(0, 7, 1)

array([0, 1, 2, 3, 4, 5, 6])

In [89]:
np.arange(0, 7)

array([0, 1, 2, 3, 4, 5, 6])

In [87]:
np.arange(7)

array([0, 1, 2, 3, 4, 5, 6])

**Question 1:** can we create an array from 0 to 100, including 100, with a step increase of 10?

In [80]:
np.arange(start=0, stop=110, step=10)

array([  0,  10,  20,  30,  40,  50,  60,  70,  80,  90, 100])

**Question 2:** can we create an array from 100 to 200, not including 200, with a step increase of 10?

In [81]:
np.arange(100, 200, 10)

array([100, 110, 120, 130, 140, 150, 160, 170, 180, 190])

**Challenge question:** can we create an array that *decreases* from 10 to 0 (including both 10 and 0)?

In [85]:
np.arange(10, -1, -1)

array([10,  9,  8,  7,  6,  5,  4,  3,  2,  1,  0])

**Exercise:** What will the following produce:

```python
np.arange(40, -1, -5) 
```

In [84]:
np.arange(40, -1, -5) 

array([40, 35, 30, 25, 20, 15, 10,  5,  0])
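
Because `stop` itself is never included, a range that should contain its endpoint needs a `stop` one step past it. A short sketch of that pattern, with hypothetical variable names:

```python
import numpy as np

step = 10
up = np.arange(0, 100 + step, step)   # counts up and includes 100
down = np.arange(10, 0 - 1, -1)       # counts down and includes 0

print(up)
print(down)
```

Any `stop` strictly beyond the last wanted value and no further than one step past it gives the same array; writing it as "endpoint plus step" is just an easy rule to remember.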

## Accessing Elements

For this exercise, let's start with this array of strings.

**Question:** How do we get the largest item in the `height_m` array? Hint: use `np.sort()`.

In [90]:
string_array = make_array("cat", "dog", "bird")
string_array

array(['cat', 'dog', 'bird'],
      dtype='<U4')

You can use `array_name.item( NUMBER )` to get an element from an array.

**Exercise:** What will the following expression return?

```python
string_array.item(1)
```

In [94]:
string_array.item(1)

'dog'

**Bonus!** This is called **array indexing**.  There is a shorter "equivalent" syntax that people will often use. However, for this class you only need to know about `.item()`.

```python
string_array[ INDEX ]
```

In [95]:
string_array[1]

'dog'
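
The two forms return equal values but not quite the same kind of object, which is why "equivalent" is in quotes above. A sketch of the difference, using `np.array` in place of `make_array` and hypothetical variable names:

```python
import numpy as np

string_array = np.array(["cat", "dog", "bird"])

by_item = string_array.item(1)   # a plain Python str
by_index = string_array[1]       # a NumPy string scalar

print(type(by_item).__name__)
print(type(by_index).__name__)
print(by_item == by_index)
```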

**Exercise:** Use the `len` function to determine the length of the string array.

Arrays also have a **member variable** `array_name.size` that contains the size of the array.  

**Exercise:** Use the size **member variable** to check the size of the array:
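
A possible answer to both exercises, sketched with `np.array` in place of `make_array`; for a one-dimensional array like this one, `len` and `size` agree.

```python
import numpy as np

string_array = np.array(["cat", "dog", "bird"])

length = len(string_array)   # built-in function
size = string_array.size     # member variable, so no parentheses

print(length, size)
```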


## Common Bugs

**Exercise:** What happens if we run the following:

```python
a = make_array(0,1,2,3)
bigger_array = make_array(1,2,3,4,5)
a * bigger_array
```

In [None]:
# a = make_array(0, 1,2,3)
#bigger_array = make_array(1,2,3,4,5)
#a * bigger_array

**Exercise:** What happens if I run the following:

```python
uhoh = make_array(0,1,2,3)
a / uhoh
```

In [None]:
#uhoh = make_array(0,1,2,3)
#a / uhoh

**Exercise:** What happens if I run the following:

```python
a.item(4)
```

In [None]:
#a.item(4)

**Exercise:** What happens if I run the following:

```python
a.item(-1)
```

In [None]:
#a.item(-1)
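
The four exercises above can be checked in one cell by catching the errors instead of letting them stop execution. This is a sketch with `np.array` standing in for `make_array`; the `outcomes` dictionary is a hypothetical helper.

```python
import numpy as np

a = np.array([0, 1, 2, 3])
bigger_array = np.array([1, 2, 3, 4, 5])
outcomes = {}

# Element-wise arithmetic needs arrays of matching length.
try:
    a * bigger_array
except ValueError as err:
    outcomes["a * bigger_array"] = type(err).__name__

# Dividing by an array that contains 0 does not raise an error.
# NumPy warns and stores nan for 0/0; errstate silences the warning here.
uhoh = np.array([0, 1, 2, 3])
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = a / uhoh
outcomes["a / uhoh"] = ratio

# Positions run from 0 to 3, so position 4 does not exist.
try:
    a.item(4)
except IndexError as err:
    outcomes["a.item(4)"] = type(err).__name__

# A negative position counts from the end of the array.
outcomes["a.item(-1)"] = a.item(-1)

for expression in outcomes:
    print(expression, '->', outcomes[expression])
```

The division case is the easiest to miss, because the result silently contains `nan` rather than failing the way `1/0` does for plain numbers.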

# Tables
Tables are Made of Arrays

We are covering arrays because an array is the object we get back when we work with a specific column of a table. Here we load a table of NBA salaries from a local file `nba_salaries.csv`.

In [96]:
nba = Table.read_table('data/nba_salaries.csv')
nba

PLAYER,POSITION,TEAM,'15-'16 SALARY
Paul Millsap,PF,Atlanta Hawks,18.6717
Al Horford,C,Atlanta Hawks,12.0
Tiago Splitter,C,Atlanta Hawks,9.75625
Jeff Teague,PG,Atlanta Hawks,8.0
Kyle Korver,SG,Atlanta Hawks,5.74648
Thabo Sefolosha,SF,Atlanta Hawks,4.0
Mike Scott,PF,Atlanta Hawks,3.33333
Kent Bazemore,SF,Atlanta Hawks,2.0
Dennis Schroder,PG,Atlanta Hawks,1.7634
Tim Hardaway Jr.,SG,Atlanta Hawks,1.30452


Let's focus on the **Golden State Warriors**.

**Exercise:** Use the `my_table.where` function to select the rows where team is the `"Golden State Warriors"`.

In [97]:
warriors = nba.where('TEAM', "Golden State Warriors")
warriors

PLAYER,POSITION,TEAM,'15-'16 SALARY
Klay Thompson,SG,Golden State Warriors,15.501
Draymond Green,PF,Golden State Warriors,14.2609
Andrew Bogut,C,Golden State Warriors,13.8
Andre Iguodala,SF,Golden State Warriors,11.7105
Stephen Curry,PG,Golden State Warriors,11.3708
Jason Thompson,PF,Golden State Warriors,7.00847
Shaun Livingston,PG,Golden State Warriors,5.54373
Harrison Barnes,SF,Golden State Warriors,3.8734
Marreese Speights,C,Golden State Warriors,3.815
Leandro Barbosa,SG,Golden State Warriors,2.5


We can also select columns by name. 

**Exercise**: Make a table with just the player name (`"PLAYER"`) and salary (`"'15-'16 SALARY"`) columns.


In [99]:
nba.select("PLAYER", "'15-'16 SALARY")

PLAYER,'15-'16 SALARY
Paul Millsap,18.6717
Al Horford,12.0
Tiago Splitter,9.75625
Jeff Teague,8.0
Kyle Korver,5.74648
Thabo Sefolosha,4.0
Mike Scott,3.33333
Kent Bazemore,2.0
Dennis Schroder,1.7634
Tim Hardaway Jr.,1.30452


**Exercise:** Compute the average salary of the Warriors.  Which of the following works?

*Option (A):*
```python
warriors.mean()
```

*Option (B):*
```python
warriors.select("'15-'16 SALARY").mean()
```

*Option (C):*
```python
warriors.column("'15-'16 SALARY").mean()
```

**Exercise:** Would the following work?

```python
np.average(warriors.select("'15-'16 SALARY"))
```

In [None]:
np.average(warriors.select("'15-'16 SALARY"))

Why?

**Exercise:** Use `np.average` to compute the average salary of the Warriors:

<details><summary>Solution</summary>
   
```python
np.average(warriors.column("'15-'16 SALARY"))
```
    
</details></br></br>
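
`np.average` operates on an array of numbers, which is what `column` hands back. The sketch below avoids loading the file by typing in the salaries from the ten Warriors rows displayed earlier; the full table may contain more players, so the result is only the average of those displayed rows. The variable names are hypothetical.

```python
import numpy as np

# Salaries copied by hand from the ten Warriors rows displayed earlier.
displayed_salaries = np.array([15.501, 14.2609, 13.8, 11.7105, 11.3708,
                               7.00847, 5.54373, 3.8734, 3.815, 2.5])

average_displayed = np.average(displayed_salaries)
same_by_hand = displayed_salaries.sum() / displayed_salaries.size

print(average_displayed)
print(same_by_hand)
```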

**Exercise:** Compute the difference in the average salaries of the warriors and the `"Los Angeles Lakers"`.

<details><summary>Solution</summary>
   
```python
lakers = nba.where('TEAM', 'Los Angeles Lakers')
warriors.column("'15-'16 SALARY").mean() - lakers.column("'15-'16 SALARY").mean()
```
    
</details></br></br>

## Creating a Table from Arrays

Let's start with an array of street names.

In [100]:
streets = make_array('Embarcadero De Norte', 
                     'Embarcadero De Mar', 
                     'Camino Pescadero', 
                     'Camino Del Sur')
streets

array(['Embarcadero De Norte', 'Embarcadero De Mar', 'Camino Pescadero',
       'Camino Del Sur'],
      dtype='<U20')

We can make an empty table (no rows, no columns, no problems ...).

The `Table()` function makes an empty table.

In [101]:
empty_table = Table()
empty_table

**Exercise:** Check that the empty table has 0 rows and 0 columns using the `num_rows` and `num_columns` attributes.

In [102]:
empty_table.num_columns

0

In [103]:
empty_table.num_rows

0

**Exercise:** Use the `table.with_column` function to add a column to the table and save the new table as `iv`.

In [106]:
iv = empty_table.with_column('Street Names', streets)
iv

Street Names
Embarcadero De Norte
Embarcadero De Mar
Camino Pescadero
Camino Del Sur


**Exercise:** What is the output of:
```python
empty_table.with_column("Streets", streets)
empty_table.num_columns
```

</br></br></br></br>

**Exercise:** Can you do the same thing without using `empty_table`? Hint: use `Table()` directly.

In [110]:
Table().with_column('Street Names', streets)

Street Names
Embarcadero De Norte
Embarcadero De Mar
Camino Pescadero
Camino Del Sur


**Exercise:** Extend the `iv` table to include the number of blocks from campus (use `np.arange`). ([map](https://goo.gl/maps/2pnJvxSWfmNaKJvUA))

In [112]:
blocks = np.arange(1, 5, 1)

In [114]:
iv.with_column('blocks', blocks)

Street Names,blocks
Embarcadero De Norte,1
Embarcadero De Mar,2
Camino Pescadero,3
Camino Del Sur,4


**Exercise:** Build the entire table with blocks from campus in one call to the `Table.with_columns()` function.

In [118]:
Table().with_columns('Street Names', streets, 'blocks', blocks)

Street Names,blocks
Embarcadero De Norte,1
Embarcadero De Mar,2
Camino Pescadero,3
Camino Del Sur,4


---
Back to slides

---

# Case Study: Understanding the [W. E. B. Du Bois](https://en.wikipedia.org/wiki/W._E._B._Du_Bois) Visualization

![Picture from Wikipedia](https://upload.wikimedia.org/wikipedia/commons/thumb/f/fd/W.E.B._Du_Bois_by_James_E._Purdy%2C_1907_%28cropped%29.jpg/167px-W.E.B._Du_Bois_by_James_E._Purdy%2C_1907_%28cropped%29.jpg)

**From Wikipedia:**  *William Edward Burghardt Du Bois (/djuːˈbɔɪs/ dew-BOYSS; February 23, 1868 – August 27, 1963) was an American sociologist, socialist, historian, and Pan-Africanist civil rights activist. Born in Great Barrington, Massachusetts, Du Bois grew up in a relatively tolerant and integrated community. After completing graduate work at the University of Berlin and Harvard University, where he was the first African American to earn a doctorate, he became a professor of history, sociology, and economics at Atlanta University. Du Bois was one of the founders of the National Association for the Advancement of Colored People (NAACP) in 1909.*

For more context on the visualization shown in lecture, check out [Du Bois’ Data Portraits Tell A Story About Black Life In Georgia And Beyond](https://www.wabe.org/du-bois-data-portraits/).



In [1]:
du_bois = Table.read_table('data/du_bois.csv')
du_bois


**Exercise:** Compute the amount of money spent on food and add it to the table as `"FOOD $"`:

**Exercise:** Use the table functions we learned this week to find the income bracket ("class") that spent the most money on rent.

**Bonus:** use the `set_format()` function to display the shares as percentages (using the `PercentFormatter`)