In [1]:
from datascience import *
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

# Recap

## Arithmetic on numbers

In [2]:
(1 + 4) / 5 # add two ints, then divide by an int
# because we are dividing, the output is a float (it has a decimal point)
# so the result is 1.0, even though it is a whole number
# Python's default behavior: dividing with / returns a float, whether the inputs are ints or floats

1.0

In [3]:
2 ** 4

16

In [4]:
3 * 4

12

In [5]:
4.0 - 4 # if we do an operation, like subtract, between a float and an int
# our output will return as a float

0.0

In [6]:
0.0 == 0 # checking for equality here with two "=" equal signs

True

In [7]:
type(0.0)

float

In [8]:
type(0)

int
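
The rules from the cells above can be gathered into one cell. This is a minimal sketch; the variable names are illustrative and not part of the original notebook.

```python
# Division with / returns a float, even when both operands are ints.
quotient = (1 + 4) / 5

# Mixing an int and a float in an arithmetic operation gives a float.
difference = 4.0 - 4

# Exponentiation and multiplication of ints stay ints.
power = 2 ** 4
product = 3 * 4

# == compares values, so a float and an int can be equal
# even though their types differ.
same_value = (difference == 0)
same_type = (type(difference) == type(0))

print(quotient, type(quotient).__name__)      # 1.0 float
print(difference, type(difference).__name__)  # 0.0 float
print(power, product)                         # 16 12
print(same_value, same_type)                  # True False
```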

## Built-in functions

In [9]:
abs(5) # abs is a built-in function in python
# built-in meaning this function is built into the language, we do not need to import any packages to use it

5

In [10]:
abs(-5)

5

In [11]:
help(abs)

Help on built-in function abs in module builtins:

abs(x, /)
    Return the absolute value of the argument.



In [12]:
abs?

Signature: abs(x, /)
Docstring: Return the absolute value of the argument.
Type:      builtin_function_or_method

In [13]:
min(-5, 5)

-5

In [14]:
min?

Docstring:
min(iterable, *[, default=obj, key=func]) -> value
min(arg1, arg2, *args, *[, key=func]) -> value

With a single iterable argument, return its smallest item. The
default keyword-only argument specifies an object to return if
the provided iterable is empty.
With two or more arguments, return the smallest argument.
Type:      builtin_function_or_method

In [15]:
min(make_array(1,2,3))

1

In [16]:
min(1,2,3)

1
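
The docstring for `min` describes two calling styles plus the `default` and `key` options. The sketch below tries each one on a plain Python list; the list and variable names are made up for illustration.

```python
numbers = [4, -5, 2]

smallest_from_list = min(numbers)      # one iterable argument
smallest_from_args = min(4, -5, 2)     # several separate arguments

# default is returned when the iterable is empty
smallest_of_nothing = min([], default=None)

# key chooses what to compare; here, the absolute value
closest_to_zero = min(numbers, key=abs)

print(smallest_from_list, smallest_from_args)  # -5 -5
print(smallest_of_nothing)                     # None
print(closest_to_zero)                         # 2
```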

## Variables
- Create
- Use
- Overwrite

In [23]:
a = 3 # assignment statement: we assign the value 3 to the variable name a
# by default, Python does not display anything when we run an assignment statement
# if we want to check what's inside the variable we just created, we need to call on it
a # calling on the variable a to see what's inside
a = 9 # an example of me not working top to bottom in my notebook
# if I wanted to update the value of a, it would be tidier to add a cell down below

In [19]:
a = 5
a

5

In [21]:
a = 7


In [24]:
a # why is a 9 and not 7? cell In [23] above was run after In [21], and its last line sets a = 9

9
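
What matters is the order in which cells are run, not their position on the page. The sketch below replays the assignments in execution order, assuming the `In [n]` counters shown above reflect that order.

```python
# Each assignment overwrites the previous value of a.
history = []

a = 5            # In [19]
history.append(a)

a = 7            # In [21]
history.append(a)

a = 3            # In [23], first line
history.append(a)

a = 9            # In [23], last line
history.append(a)

print(history)   # [5, 7, 3, 9]
print(a)         # 9, which is what In [24] displays
```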

## Errors
- undefined name
- unsupported operand
- wrong number of arguments
- invalid keyword argument
- division by zero

In [25]:
# undefined name
b

NameError: name 'b' is not defined

In [26]:
animal = "dog"
1 + animal # unsupported operand (operation) between an int and str data type

TypeError: unsupported operand type(s) for +: 'int' and 'str'

In [27]:
abs(4, 5) # wrong number of arguments (inputs)

TypeError: abs() takes exactly one argument (2 given)

In [28]:
round( number = 1.2345, ndigits = 1)


1.2

In [29]:
round( ndigits = 1, number = 1.2345)


1.2

In [30]:
round?

Signature: round(number, ndigits=None)
Docstring:
Round a number to a given precision in decimal digits.

The return value is an integer if ndigits is omitted or None.  Otherwise
the return value has the same type as the number.  ndigits may be negative.
Type:      builtin_function_or_method

In [31]:
round( 1, 1.2345) # invalid argument: the second argument (ndigits) must be an int
# the first argument can be an int or a float


TypeError: 'float' object cannot be interpreted as an integer

In [32]:
round( 1, 1)

1

In [33]:
round(1.0, 1)

1.0

In [34]:
1 / 0

ZeroDivisionError: division by zero
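
The error categories listed at the top of this section can be triggered one by one and caught. In the sketch below, `error_name` is a hypothetical helper written for this example, and `digits` is a deliberately misspelled keyword used to provoke an invalid keyword argument error.

```python
def error_name(action):
    """Call action() and return the name of the exception it raises, or None."""
    try:
        action()
    except Exception as error:
        return type(error).__name__
    return None

results = {
    "undefined name": error_name(lambda: this_name_is_not_defined),
    "unsupported operand": error_name(lambda: 1 + "dog"),
    "wrong number of arguments": error_name(lambda: abs(4, 5)),
    "invalid keyword argument": error_name(lambda: round(1.2345, digits=1)),
    "wrong argument type": error_name(lambda: round(1, 1.2345)),
    "division by zero": error_name(lambda: 1 / 0),
}

for label, name in results.items():
    print(label, "->", name)
```

Several different mistakes surface as a `TypeError`, so the message after the error name is usually what tells them apart.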

## Types 
- int
- float
- string/text
- bool
- table `Table.read_table('data/skyscrapers.csv')`
- `type()` function

In [35]:
a

9

In [36]:
type(a)

int

In [37]:
type(1.0)

float

In [38]:
type('1.0')

str

In [39]:
type(True)

bool

In [40]:
type(False)

bool

In [42]:
type(Table.read_table('data/skyscrapers.csv'))
# the above is a Table data type, recognized in the datascience package
# unique data type, to the datascience package
#  not built-in the python language
# to work with this data type, we need to import the datascience package (which we did in the first cell in this notebook)

datascience.tables.Table

**Question:** what will `type(int(float('3.14159')))` return?

In [44]:
type('3.14159') # '3.14159' is a string that contains a float value

str

In [43]:
float('3.14159') # this is a float now, we converted a string that contained a float value to a float data type

3.14159

In [45]:
int(float('3.14159')) # convert 3.14159 to an int
# notice some information is lost: we chopped off the digits after the decimal point
# int() does not round up or down, it truncates

3

In [46]:
type(int(float('3.14159')))

int

In [47]:
'4' + '5.6'

'45.6'

In [48]:
int('4' + '5.6') # but we can't convert a string that stores a float value directly to an int
# first we need to convert it to a float, then an int

ValueError: invalid literal for int() with base 10: '45.6'

In [52]:
3 + float('4' + '5.6') # convert a string that stores a float value directly to a float


48.6

In [51]:
3 + int(float('4' + '5.6'))

48
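
Chaining conversions works from the inside out: string, then float, then int. The sketch below repeats that chain and also compares truncation with rounding and flooring on a negative number, where the three differ. The variable names are illustrative.

```python
import math

text = '4' + '5.6'            # string concatenation gives '45.6'

as_float = float(text)        # 45.6
as_int = int(as_float)        # 45, the fractional part is dropped

# int() truncates toward zero, which differs from rounding
# and, for negative numbers, from taking the floor.
truncated = int(-3.7)         # -3
floored = math.floor(-3.7)    # -4
rounded = round(-3.7)         # -4

print(as_float, as_int)
print(truncated, floored, rounded)
```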

--- 
Back to slides

---

# Arrays
Arrays are ordered "lists" of elements that can be directly accessed by location.

## Making Arrays
**Exercise**: Make an array of 4 numbers using `make_array()`

In [53]:
make_array(1, 2, 3, 4) 
# make_array is a function from the datascience package
# that creates arrays

array([1, 2, 3, 4])

In [55]:
make_array?
help(make_array)

Help on function make_array in module datascience.util:

make_array(*elements)
    Returns an array containing all the arguments passed to this function.
    A simple way to make an array with a few elements.
    
    As with any array, all arguments should have the same type.
    
    >>> make_array(0)
    array([0])
    >>> make_array(2, 3, 4)
    array([2, 3, 4])
    >>> make_array("foo", "bar")
    array(['foo', 'bar'],
          dtype='<U3')
    >>> make_array()
    array([], dtype=float64)



Signature: make_array(*elements)
Docstring:
Returns an array containing all the arguments passed to this function.
A simple way to make an array with a few elements.

As with any array, all arguments should have the same type.

>>> make_array(0)
array([0])
>>> make_array(2, 3, 4)
array([2, 3, 4])
>>> make_array("foo", "bar")
array(['foo', 'bar'],
      dtype='<U3')
>>> make_array()
array([], dtype=float64)
File:      ~/.local/lib/python3.11/site-packages/datascience/util.py
Type:      function

**Exercise:** Arrays can hold elements of any type, as long as all elements share that type. Make an array of strings called `string_array`:

In [57]:
string_array = make_array('i', 'am','a','string')
string_array

array(['i', 'am', 'a', 'string'],
      dtype='<U6')


**Exercise:** Mixing types (Strings, Numbers, Booleans).  Make an array of multiple types:

In [59]:
make_array('string', 1, 1.2, True) # an array needs every element to be the same data type
# when they aren't, the elements are converted to one common data type
# here, they would all become strings
make_array( 1, 1.2) # all elements are converted to floats

array([ 1. ,  1.2])

In [61]:
type(make_array(1, 1.2).item(0)) # access the first item in my array


float

In [62]:
make_array(1, 1.2, True)

# what do you see below? What did python do with my elements -- int, float, boolean value
# is any information lost?


array([ 1. ,  1.2,  1. ])

**Question**: What is the type of elements inside `weird_array`?

In [None]:
weird_array = make_array(1, 1.2, True)
# what do you see below? What did python do with my elements -- int, float, boolean value
# is any information lost?
#Take home!
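
One way to check what happened to the elements is to look at the array's `dtype`. The sketch below uses `np.array` as a stand-in for `make_array`, which is an assumption of this example; the variable names are also made up.

```python
import numpy as np

# np.array is used here in place of make_array (an assumption).
numbers_and_bool = np.array([1, 1.2, True])
with_a_string = np.array(['string', 1, 1.2, True])

# Ints, floats, and booleans are all converted to floats.
print(numbers_and_bool.dtype)                    # float64
print(type(numbers_and_bool.item(2)).__name__)   # float

# Once a string is present, every element becomes a string.
print(with_a_string.dtype.kind)                  # U, meaning Unicode string
```

The value `True` is stored as `1.0`, so the fact that it was once a boolean is no longer recoverable from the array.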

## Simple Arithmetic on Arrays

Assume we have:
- an array of heights given in inches 
- an array of heights given in feet
- an array of masses in kg
    
We want to:
- convert the heights to centimeters (1 ft = 12 in; 1 in = 2.54 cm)
- calculate the average height
    - manually
    - using an np function
- convert back to inches and feet
- add an offset
- calculate BMI
    - $BMI = m/h^2$ 
    - $[m] = kg$
    - $[h]=m$

In [63]:
heights_inches = make_array(8, 6, 3, 5)
heights_feet = make_array(5, 5, 5, 6)
mass_kg = make_array(68, 75 ,67, 80)

In [64]:
heights_inches_total = heights_feet * 12 + heights_inches
heights_inches_total

array([68, 66, 63, 77])

In [65]:
np.floor(heights_inches_total / 12)

array([ 5.,  5.,  5.,  6.])

In [67]:
np.average(heights_inches_total) # 5' 8.5"

68.5

In [69]:
sum(heights_inches_total) / 4 # manual average: the sum divided by the number of elements

68.5

In [None]:
# try on your own
# convert back to inch and ft
# add an offset
# calculate BMI


In [66]:
np.floor?

Call signature:  np.floor(*args, **kwargs)
Type:            ufunc
String form:     <ufunc 'floor'>
File:            /opt/conda/lib/python3.11/site-packages/numpy/__init__.py
Docstring:
floor(x, /, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True[, signature, extobj])

Return the floor of the input, element-wise.

The floor of the scalar `x` is the largest integer `i`, such that
`i <= x`.  It is often denoted as :math:`\lfloor x \rfloor`.

Parameters
----------
x : array_like
    Input data.
out : ndarray, None, or tuple of ndarray and None, optional
    A location into which the result is stored. If provided, it must have
    a shape that the inputs broadcast to. If not provided or None,
    a freshly-allocated array is returned. A tuple (possible only as a
    keyword argu

### Aggregation Operations

You will often need to compute summaries of an array like the `sum`, `max`, or the `min`.  These are all **member functions** of an array.  Here is the documentation on all the **[member functions](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html)** for arrays.

**Exercise:** Use the `sum`, `min`, `mean`, and `max` operations to summarize the `heights_inches_total` array.

In [70]:
heights_inches_total

array([68, 66, 63, 77])

In [71]:
sum(heights_inches_total)

274

In [72]:
max(heights_inches_total)

77

In [73]:
min(heights_inches_total)

63
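
The cells above use Python's built-in `sum`, `max`, and `min`. The same summaries are also available as member functions of the array itself, including `mean`. A minimal sketch, with `np.array` standing in for `make_array` (an assumption):

```python
import numpy as np

heights_inches_total = np.array([68, 66, 63, 77])

# Member functions are called on the array with a dot.
total = heights_inches_total.sum()
smallest = heights_inches_total.min()
largest = heights_inches_total.max()
average = heights_inches_total.mean()

print(total, smallest, largest, average)   # 274 63 77 68.5
```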

## Ranges
We use ranges to make arrays of number sequences easily. The NumPy function `np.arange(start, stop, step)` produces an array starting at `start` and ending *before* `stop`, in increments of `step`.

In [74]:
make_array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [75]:
np.arange(1, 11) # start at 1, go up to but not including 11

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [77]:
np.arange(10) # start at 0 go up to, but not including 10
# interestingly, this ensures i have 10 elements in my array

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [83]:
np.arange(0, 101, 10)
np.arange(0, 11, 2)
np.arange(0,10,2) # start from 0 go up to 10, not including 10, step increase of 2

array([0, 2, 4, 6, 8])

In [76]:
help(np.arange)

Help on built-in function arange in module numpy:

arange(...)
    arange([start,] stop[, step,], dtype=None, *, like=None)
    
    Return evenly spaced values within a given interval.
    
    ``arange`` can be called with a varying number of positional arguments:
    
    * ``arange(stop)``: Values are generated within the half-open interval
      ``[0, stop)`` (in other words, the interval including `start` but
      excluding `stop`).
    * ``arange(start, stop)``: Values are generated within the half-open
      interval ``[start, stop)``.
    * ``arange(start, stop, step)`` Values are generated within the half-open
      interval ``[start, stop)``, with spacing between values given by
      ``step``.
    
    For integer arguments the function is roughly equivalent to the Python
    built-in :py:class:`range`, but returns an ndarray rather than a ``range``
    instance.
    
    When using a non-integer step, such as 0.1, it is often better to use
    `numpy.linspace`.
    
    


**Exercise:** Make an array of the numbers 0 through 6:

In [84]:
make_array(0,1,2,3,4,5,6)

array([0, 1, 2, 3, 4, 5, 6])

Can we write it shorter?

In [91]:
np.arange(0,7)
np.arange(0,7, 1)
np.arange(7)

array([0, 1, 2, 3, 4, 5, 6])

**Question 1:** can we create an array from 0 to 100, including 100, with a step increase of 10?

In [88]:
np.arange(0, 101, 10) # this is the one we want: stopping at 101 includes 100
np.arange(0, 100, 10) # stops before 100, so 100 is excluded (only this last line is displayed)

array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

**Question 2:** can we create an array from 100 to 200, not including 200, with a step increase of 10?

In [89]:
np.arange(100, 200, 10)

array([100, 110, 120, 130, 140, 150, 160, 170, 180, 190])

**Challenge question:** can we create an array that *decreases* from 10 to 0 (including both 10 and 0)?

In [90]:
np.arange(10, -1, -1)

array([10,  9,  8,  7,  6,  5,  4,  3,  2,  1,  0])

**Exercise:** What will the following produce:

```python
np.arange(40, -1, -5) 
```

In [92]:
np.arange(40, -1, -5)

array([40, 35, 30, 25, 20, 15, 10,  5,  0])
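
Because `stop` is always excluded, including an endpoint means choosing a `stop` just past it. The sketch below compares the ranges from the questions above by their last element and length; the variable names are illustrative.

```python
import numpy as np

up_to_100 = np.arange(0, 101, 10)     # stop just past 100, so 100 is included
before_100 = np.arange(0, 100, 10)    # 100 itself is excluded
countdown = np.arange(40, -1, -5)     # a negative step counts down to 0

print(up_to_100.item(-1), len(up_to_100))    # 100 11
print(before_100.item(-1), len(before_100))  # 90 10
print(countdown.item(-1), len(countdown))    # 0 9
```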

## Accessing Elements

For this exercise, let's start with this array of strings.

**Question:** How do we get the largest item in the `string_array` array? Hint: use `np.sort()`

In [94]:
string_array = make_array("cat", "dog", "bird")
max(string_array) # strings are compared alphabetically, so 'dog' is the largest

'dog'

In [96]:
'a' < 'b'

True

In [98]:
'cat ' < 'dog'

True

In [99]:
np.sort(string_array)

array(['bird', 'cat', 'dog'],
      dtype='<U4')

In [100]:
1 < 2 < 3

True

In [101]:
'b' < 'c' < 'd'

True
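
Strings are compared character by character using each character's Unicode code point, which is why sorting looks alphabetical for lowercase words. The sketch below follows the hint and takes the last element of the sorted array; `np.array` stands in for `make_array` (an assumption), and the `'Zebra'` comparison is an extra example not in the original notebook.

```python
import numpy as np

string_array = np.array(["cat", "dog", "bird"])

# After sorting, the largest string is the last element.
sorted_strings = np.sort(string_array)
largest = sorted_strings.item(len(sorted_strings) - 1)
print(largest)                        # dog

# Code points explain the ordering of single letters.
print(ord('b'), ord('c'), ord('d'))   # 98 99 100

# Uppercase letters have smaller code points than lowercase ones.
print('Zebra' < 'apple')              # True
```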

You can use `array_name.item( NUMBER )` to get an element from an array.

**Exercise:** What will the following expression return?

```python
string_array.item(1)
```

In [102]:
string_array.item(1)

'dog'

In [103]:
string_array.item(0)

'cat'

In [104]:
string_array.item(2)

'bird'

In [105]:
string_array.item(3)

IndexError: index 3 is out of bounds for axis 0 with size 3

**Bonus!** This is called **array indexing**.  There is a shorter "equivalent" syntax that people will often use. However, for this class you only need to know about `.item()`.

```python
string_array[ INDEX ]
```

In [106]:
string_array[0]

'cat'

**Exercise:** Use the `len` function to determine the length of the string array.

In [107]:
len(string_array)

3

Arrays also have a **member variable** `array_name.size` that contains the size of the array.  

**Exercise:** Use the size **member variable** to check the size of the array:

In [108]:
string_array.size

3


## Common Bugs

**Exercise:** What happens if we run the following:

```python
a = make_array(0,1,2,3)
bigger_array = make_array(1,2,3,4,5)
a * bigger_array
```

In [None]:
# take home
# finish this section!

**Exercise:** What happens if I run the following:

```python
uhoh = make_array(0,1,2,3)
a / uhoh
```

**Exercise:** What happens if I run the following:

```python
a.item(4)
```

**Exercise:** What happens if I run the following:

```python
a.item(-1)
```
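
The four situations above can be explored safely by catching any exception that is raised. This is a sketch rather than an official solution: `np.array` stands in for `make_array`, and `np.errstate` is used only to silence the warning NumPy would otherwise print for the division.

```python
import numpy as np

a = np.array([0, 1, 2, 3])
bigger_array = np.array([1, 2, 3, 4, 5])
uhoh = np.array([0, 1, 2, 3])

# 1. Arrays of different lengths cannot be multiplied element by element.
try:
    a * bigger_array
    length_mismatch = None
except ValueError as error:
    length_mismatch = type(error).__name__

# 2. Dividing by an array that contains 0 does not raise an exception.
#    NumPy warns and stores nan for 0 / 0.
with np.errstate(divide='ignore', invalid='ignore'):
    ratio = a / uhoh

# 3. Index 4 is past the end of a 4-element array.
try:
    a.item(4)
    out_of_bounds = None
except IndexError as error:
    out_of_bounds = type(error).__name__

# 4. A negative index counts from the end of the array.
last = a.item(-1)

print(length_mismatch, ratio, out_of_bounds, last)
```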

# Tables
Tables are Made of Arrays

We are covering arrays because this is the mathematical object that is returned when we work on specific columns of a table. Here we load a table of NBA salaries from a local file `nba_salaries.csv`.

In [None]:
nba = Table.read_table('data/nba_salaries.csv')
nba

Let's focus on the **Golden State Warriors**.

**Exercise:** Use the `my_table.where` function to select the rows where team is the `"Golden State Warriors"`.

We can also select columns by name. 

**Exercise**: Make a table with just the `"name"` and `"salary"` columns. 


**Exercise:** Compute the average salary of the Warriors. Which of the following works?

*Option (A):*
```python
warriors.mean()
```

*Option (B):*
```python
warriors.select("salary").mean()
```

*Option (C):*
```python
warriors.column("salary").mean()
```

**Exercise:** Would the following work?

```python
np.average(warriors.select("salary"))
```

In [None]:
np.average(warriors.select("salary"))

Why?

**Exercise:** Use `np.average` to compute the average salary of the Warriors:

<details><summary>Solution</summary>
   
```python
np.average(warriors.column("salary"))
```
    
</details><br><br>

**Exercise:** Compute the difference in the average salaries of the Warriors and the `"Los Angeles Lakers"`.

<details><summary>Solution</summary>
   
```python
lakers = nba.where('team', 'Los Angeles Lakers')
warriors.column('salary').mean() - lakers.column('salary').mean()
```
</details>

## Creating a Table from Arrays

Let's start with an array of street names.

In [None]:
streets = make_array('Embarcadero De Norte', 
                     'Embarcadero De Mar', 
                     'Camino Pescadero', 
                     'Camino Del Sur')
streets

We can make an empty table (no rows, no columns, no problems ...).

The `Table()` function makes an empty table.

In [None]:
empty_table = Table()
empty_table

**Exercise:** Check that the empty table has 0 rows and 0 columns using the `num_rows` and `num_columns` attributes.

**Exercise:** Use the `table.with_column` function to add a column to the table and save the new table as `IV`.

**Exercise:** What is the output of:
```python
empty_table.with_column("Streets", streets)
empty_table.num_columns
```

**Exercise:** Can you do the same thing without using `empty_table`? Hint: use `Table()` directly.

**Exercise:** Extend the IV table to include the blocks from campus (use `np.arange`). ([map](https://goo.gl/maps/2pnJvxSWfmNaKJvUA))

**Exercise:** Build the entire table with blocks from campus in one call to the `Table.with_columns()` function.

---
Back to slides

---

# Case Study: Understanding the [W. E. B. Du Bois](https://en.wikipedia.org/wiki/W._E._B._Du_Bois) Visualization

![Picture from Wikipedia](https://upload.wikimedia.org/wikipedia/commons/thumb/f/fd/W.E.B._Du_Bois_by_James_E._Purdy%2C_1907_%28cropped%29.jpg/167px-W.E.B._Du_Bois_by_James_E._Purdy%2C_1907_%28cropped%29.jpg)

**From Wikipedia:**  *William Edward Burghardt Du Bois (/djuːˈbɔɪs/ dew-BOYSS; February 23, 1868 – August 27, 1963) was an American sociologist, socialist, historian, and Pan-Africanist civil rights activist. Born in Great Barrington, Massachusetts, Du Bois grew up in a relatively tolerant and integrated community. After completing graduate work at the University of Berlin and Harvard University, where he was the first African American to earn a doctorate, he became a professor of history, sociology, and economics at Atlanta University. Du Bois was one of the founders of the National Association for the Advancement of Colored People (NAACP) in 1909.*

For more context on the visualization in lecture, check out [Du Bois’ Data Portraits Tell A Story About Black Life In Georgia And Beyond](https://www.wabe.org/du-bois-data-portraits/).



In [None]:
du_bois = Table.read_table('data/du_bois.csv')
du_bois

**Exercise:** Compute the amount of money spent on food and add it to the table as `"FOOD $"`:

**Exercise:** Use the table functions we learned this week to find the income bracket ("class") that spent the most money on rent.

**Bonus:** use the `set_format()` function to display the shares as percentages (using the `PercentFormatter`)