In [1]:
from datascience import *
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plots
import math

# Review of Tables

## Basics

- what is a CSV?
- read the CSV 'data/skyscrapers.csv' using `read_table()`
- display a table using `show()`
- sort a table using `sort()` ascending/descending
- select column(s) using `select()`
- drop column(s) using `drop()`
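
To make "what is a CSV?" concrete, here is a minimal sketch that parses CSV text with Python's built-in `csv` module rather than `read_table()`. The header and the two rows are copied from the skyscrapers table in this notebook; the variable names `csv_text` and `rows` are only illustrative.

```python
import csv
import io

# Header line plus the first two rows of the skyscrapers data.
csv_text = """name,material,city,height,completed
One World Trade Center,composite,New York City,541.3,2014
Willis Tower,steel,Chicago,442.14,1974
"""

# DictReader treats the first line as column names and yields one dict per row.
rows = list(csv.DictReader(io.StringIO(csv_text)))

for row in rows:
    # Every value is read as a string, so numeric columns need converting.
    print(row["name"], float(row["height"]), int(row["completed"]))
```

Each line of the file is one row, and the commas mark where one column ends and the next begins.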

In [2]:
# a CSV is a comma-separated values file (similar to an .xlsx file or an Excel/Google spreadsheet)
# instead of spreadsheet columns separating the values, or attributes,
# we use commas to separate them
# this saves space and keeps our datasets small


skyscrapers = Table.read_table('data/skyscrapers.csv') # assignment statement for a new table
# the Table method .read_table reads in a csv file and converts it into a table
# we assigned the new table to the variable name skyscrapers
# below we use the variable name skyscrapers on its own line to display the table in our notebook
skyscrapers

# name - name of skyscraper
# material - material the skyscraper was largely built with
# city - city skyscraper located in
# height - height of the skyscraper in meters (m)
# completed - year the skyscraper was completed

name,material,city,height,completed
One World Trade Center,composite,New York City,541.3,2014
Willis Tower,steel,Chicago,442.14,1974
432 Park Avenue,concrete,New York City,425.5,2015
Trump International Hotel & Tower,concrete,Chicago,423.22,2009
Empire State Building,steel,New York City,381.0,1931
Bank of America Tower,composite,New York City,365.8,2009
Stratosphere Tower,concrete,Las Vegas,350.22,1996
Aon Center,steel,Chicago,346.26,1973
John Hancock Center,steel,Chicago,343.69,1969
WITI TV Tower,steel,Shorewood,329.0,1962


In [3]:
skyscrapers.show()

name,material,city,height,completed
One World Trade Center,composite,New York City,541.3,2014
Willis Tower,steel,Chicago,442.14,1974
432 Park Avenue,concrete,New York City,425.5,2015
Trump International Hotel & Tower,concrete,Chicago,423.22,2009
Empire State Building,steel,New York City,381.0,1931
Bank of America Tower,composite,New York City,365.8,2009
Stratosphere Tower,concrete,Las Vegas,350.22,1996
Aon Center,steel,Chicago,346.26,1973
John Hancock Center,steel,Chicago,343.69,1969
WITI TV Tower,steel,Shorewood,329.0,1962


In [4]:
#skyscrapers.show(20)
skyscrapers.show(2)

name,material,city,height,completed
One World Trade Center,composite,New York City,541.3,2014
Willis Tower,steel,Chicago,442.14,1974


In [5]:
#skyscrapers.sort('completed') # default order for sort is ascending (smallest to greatest)
skyscrapers.sort('completed', descending = True)

name,material,city,height,completed
432 Park Avenue,concrete,New York City,425.5,2015
Sky,concrete,New York City,206.0,2015
Mansions at Acqualina,concrete,Sunny Isles Beach,196.0,2015
One World Trade Center,composite,New York City,541.3,2014
One57,steel/concrete,New York City,306.07,2014
4 World Trade Center,composite,New York City,297.73,2014
Courtyard & Residence Inn Manhattan/Central Park,concrete,New York City,229.62,2013
Devon Energy Center,concrete,Oklahoma City,257.23,2012
Revel Resort and Casino,concrete,Atlantic City,218.92,2012
Eight Spruce Street,concrete,New York City,265.18,2011


In [6]:
#skyscrapers.select('name', 'material', 'completed').sort('completed')
# .select is a Table method
# we can "combine" Table methods on the same line
# all my old skyscrapers were made of steel!

# what about my newer skyscrapers?
skyscrapers.select('name', 'material', 'completed').sort('completed', descending = True).drop('name')
# above we have three Table methods "combined":
# first we select the columns we want,
# then we sort,
# then we drop the "name" column

material,completed
concrete,2015
concrete,2015
concrete,2015
composite,2014
steel/concrete,2014
composite,2014
concrete,2013
concrete,2012
concrete,2012
concrete,2011


In [7]:
skyscrapers

name,material,city,height,completed
One World Trade Center,composite,New York City,541.3,2014
Willis Tower,steel,Chicago,442.14,1974
432 Park Avenue,concrete,New York City,425.5,2015
Trump International Hotel & Tower,concrete,Chicago,423.22,2009
Empire State Building,steel,New York City,381.0,1931
Bank of America Tower,composite,New York City,365.8,2009
Stratosphere Tower,concrete,Las Vegas,350.22,1996
Aon Center,steel,Chicago,346.26,1973
John Hancock Center,steel,Chicago,343.69,1969
WITI TV Tower,steel,Shorewood,329.0,1962


In [8]:
material_completedyear = skyscrapers.select('name', 'material', 'completed').sort('completed', descending = True).drop('name')
material_completedyear

material,completed
concrete,2015
concrete,2015
concrete,2015
composite,2014
steel/concrete,2014
composite,2014
concrete,2013
concrete,2012
concrete,2012
concrete,2011


## Subsetting:
- subset rows using `where()`
    - skyscrapers in LA
    - skyscrapers in LA that were completed in 1971
- subset rows using `where()` and [predicates](http://www.data8.org/datascience/reference-nb/datascience-reference.html#Table.where-Predicates)
    - skyscrapers that were completed after 2014
    - skyscrapers that were completed in the 1990s
    
> some predicates: 
> - `are.equal_to()` 
> - `are.above()`,`are.below()`
> - `are.above_or_equal_to()`, `are.below_or_equal_to()`, 
> - `are.between()`, `are.between_or_equal_to()`, 

- skyscrapers that are a 'Tower' (using `are.containing()`) 
- skyscrapers that are in Chicago or Houston  (using `are.contained_in()`)

In [9]:
#subset rows using where()
#skyscrapers in LA
skyscrapers.where('city', 'Los Angeles').sort('completed').show() # notice 'los angeles' won't work, neither will 'LA'
# interesting, all my LA skyscrapers are made of steel. This is different from the general shift of skyscrapers
# being built with concrete and composite... ??

#skyscrapers in LA that were completed in 1971
#skyscrapers.where('city', 'Los Angeles').where('completed', 1971)
# we used the Table.where method twice here, first "filtering" for LA skyscrapers, then "filtering" for skyscrapers built in
# 1971


name,material,city,height,completed
City National Tower,steel,Los Angeles,213.06,1971
Paul Hastings Tower,steel,Los Angeles,213.06,1971
Aon Center,steel,Los Angeles,261.52,1974
Bank of America Plaza,steel,Los Angeles,224.03,1975
Wells Fargo Tower,steel,Los Angeles,220.37,1983
Figueroa at Wilshire,steel,Los Angeles,218.54,1989
U.S. Bank Tower,steel,Los Angeles,310.29,1990
Gas Company Tower,steel,Los Angeles,228.3,1991
777 Tower,steel,Los Angeles,221.0,1991
Two California Plaza,steel,Los Angeles,228.6,1992


In [10]:
#subset rows using where() and predicates
#skyscrapers in LA that were completed AFTER 1974
#skyscrapers.where('city', 'Los Angeles').where('completed', are.above(1974)).sort('completed')
# are.above is an example of a predicate

#skyscrapers in LA that were NOT completed in 1971
#skyscrapers.where('city', 'Los Angeles').where('completed', are.not_equal_to(1971)).sort('completed')

#skyscrapers in LA that were completed after 2014
#skyscrapers.where('city', 'Los Angeles').where('completed', are.above(2014)).sort('completed')

#skyscrapers in LA that were completed in the 1990s (1990 - 1999): are.between_or_equal_to(1990, 1999) OR are.between(1990, 2000), since are.between excludes its upper bound
#skyscrapers.where('city', 'Los Angeles').where('completed', are.between_or_equal_to(1990, 1999)).sort('completed')
skyscrapers.where('city', 'Los Angeles').where('completed', are.between(1990, 2000)).sort('completed')


name,material,city,height,completed
U.S. Bank Tower,steel,Los Angeles,310.29,1990
Gas Company Tower,steel,Los Angeles,228.3,1991
777 Tower,steel,Los Angeles,221.0,1991
Two California Plaza,steel,Los Angeles,228.6,1992


In [11]:
skyscrapers.sort('completed', descending = True)

name,material,city,height,completed
432 Park Avenue,concrete,New York City,425.5,2015
Sky,concrete,New York City,206.0,2015
Mansions at Acqualina,concrete,Sunny Isles Beach,196.0,2015
One World Trade Center,composite,New York City,541.3,2014
One57,steel/concrete,New York City,306.07,2014
4 World Trade Center,composite,New York City,297.73,2014
Courtyard & Residence Inn Manhattan/Central Park,concrete,New York City,229.62,2013
Devon Energy Center,concrete,Oklahoma City,257.23,2012
Revel Resort and Casino,concrete,Atlantic City,218.92,2012
Eight Spruce Street,concrete,New York City,265.18,2011


## Visualizations
1. read the csv `data/movies_by_year.csv`
1. Plot the number of movies over time using `plot()`
1. Plot the number of movies vs the total gross using `scatter()`
    - add a trendline
    - try adding year as label
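
A trendline is a straight line fitted to the points of a scatter plot. The sketch below computes one with `np.polyfit` instead of drawing it. The numbers are hypothetical and chosen to lie exactly on a line; they are not taken from `data/movies_by_year.csv`, and the names `num_movies` and `total_gross` are placeholders for whatever the real columns are called.

```python
import numpy as np

# Hypothetical values for illustration only (not from the movies dataset).
num_movies = np.array([100, 200, 300, 400])
total_gross = np.array([3.0, 5.0, 7.0, 9.0])

# Fit a degree-1 polynomial (a straight line) by least squares.
slope, intercept = np.polyfit(num_movies, total_gross, 1)

# The trendline's predicted value at each x position.
trend = slope * num_movies + intercept
print(slope, intercept)
print(trend)
```

With real data the points will not sit exactly on the line, so `trend` will differ from the observed values.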

# Numbers 
Python has two number types 
- int: an integer of any size
- float: a number with an optional fractional part

An **int** never has a decimal point; a **float** always does. A float might be printed using scientific notation.

Three limitations of float values:
- They have limited size (but the limit is huge)
- They have limited precision of 15-16 significant digits
- After arithmetic, the final few decimal places can be wrong
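
Each of the three limitations can be seen directly. This is a small sketch using the standard `sys.float_info` record; the variable names are only illustrative.

```python
import sys

# 1. Limited size: the largest float is huge, and going past it gives infinity.
largest = sys.float_info.max
print(largest)        # about 1.8e308
overflowed = largest * 10
print(overflowed)     # inf

# 2. Limited precision: number of decimal digits a float always preserves.
digits = sys.float_info.dig
print(digits)         # 15

# 3. After arithmetic, the final decimal places can be wrong.
total = 0.1 + 0.2
print(total)          # 0.30000000000000004
print(total == 0.3)   # False
```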

## Basics: ints vs floats
- multiplication of two ints -> int
- division of two ints -> float
- raise integer by integer -> int
- raise by float -> float
- inspect data type using `type()` function

In [12]:
2 * 2

4

In [13]:
type(2 * 2)

int

In [14]:
2 / 2 # when we divide numbers, we always get a float data type as output

1.0

In [15]:
type(2 / 2 )

float

In [16]:
type?

Init signature: type(self, /, *args, **kwargs)
Docstring:
type(object) -> the object's type
type(name, bases, dict, **kwds) -> a new type
Type:           type
Subclasses:     ABCMeta, EnumType, _AnyMeta, NamedTupleMeta, _TypedDictMeta, _DeprecatedType, PyCStructType, UnionType, PyCPointerType, PyCArrayType, ...

In [17]:
2 ** 3

8

In [18]:
2.0 ** 3.0

8.0

## Why integers and floats
- ints can (accurately) represent very large numbers
    - Try creating a very large number
- floats have a limit in precision
    - `10/3 != 3 1/3`
    - try calculating the difference between two precise numbers
    - $(\sqrt{13})^2 \neq 13$

In [19]:
123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789

123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789

In [20]:
123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789 == 123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789

True

In [21]:
type(123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789)

int

In [22]:
10 / 3 # mathematically this is 3 1/3 = 3.333... (repeating)
# a float keeps only about 16 significant digits, so the last digit shown is off

3.3333333333333335

In [23]:
3.333333333333333333 == 3.3333333333333335


True

In [24]:
0.123456789123456789 - 0.123456789123456788
# 0.000000000000000001

0.0

In [25]:
# (square root of 13 )^2 
# (13^ (1/2))^2 = 13
# sqrt(13)^2 = 13
((13)** (1/2))**2

12.999999999999998
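
Because of this rounding error, comparing floats with `==` can give surprising answers. One common workaround, sketched here, is the standard library's `math.isclose`, which checks whether two numbers agree within a small tolerance.

```python
import math

squared = (13 ** (1 / 2)) ** 2

exactly_equal = squared == 13
print(exactly_equal)   # False, the last digits differ

close_enough = math.isclose(squared, 13)
print(close_enough)    # True, equal within the default relative tolerance
```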

## Converting ("Casting") between int and float
- careful with decimals/rounding

In [26]:
float(5)

5.0

In [27]:
int(5.0)

5

In [28]:
int(5.9)

5
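
The "careful with decimals/rounding" note is worth a short sketch: `int()` drops the fractional part instead of rounding, while the built-in `round()` goes to the nearest integer.

```python
truncated = int(5.9)
print(truncated)            # 5, the fractional part is dropped

truncated_negative = int(-5.9)
print(truncated_negative)   # -5, truncation goes toward zero, not down

rounded = round(5.9)
print(rounded)              # 6, rounds to the nearest integer

tie = round(2.5)
print(tie)                  # 2, exact ties go to the nearest even integer
```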

## Scientific notation
- divide by large number
- write scientific notation
- underscores for thousand separators

In [29]:
1 / 123456789
# 8.1 * 10^-9
# 0.0000000081

8.100000073710001e-09

In [37]:
type(8.100000073710001e-09)


float

In [31]:
0.0000000081

8.1e-09

In [32]:
1.23456789e08
# same as 123456789

123456789.0

In [33]:
123,456,789

(123, 456, 789)

In [34]:
123_456_789

123456789
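
Going the other way, a number can be displayed in scientific notation or with separators by using a format specification inside an f-string. A short sketch, with illustrative variable names:

```python
small = 1 / 123456789

# .2e means scientific notation with two digits after the decimal point.
small_text = f"{small:.2e}"
print(small_text)        # 8.10e-09

# A comma or underscore in the format spec adds thousands separators.
with_commas = f"{123456789:,}"
with_underscores = f"{123456789:_}"
print(with_commas)       # 123,456,789
print(with_underscores)  # 123_456_789

# Scientific notation always produces a float, but it compares equal to the int.
same_value = 1.23456789e8 == 123456789
print(same_value)        # True
```

The separators only change how the number is displayed; the results here are strings, not numbers.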

# Text / Strings
A string value is a snippet of text of any length
- `'a'`
- `'word'`
- `"there can be 2 sentences. Here's the second!"`

Strings consisting of numbers can be converted to numbers
- `int('12')`
- `float('1.2')`

Any value can be converted to a string
- `str(5)`
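
A quick sketch of these conversions follows. Note that the string has to look like the kind of number being asked for: `int()` does not accept a string with a decimal point.

```python
whole = int('12')
print(whole + 1)          # 13

fractional = float('1.2')
print(fractional * 2)     # 2.4

label = 'Floor ' + str(5)
print(label)              # Floor 5

# int() rejects text that is not a whole number.
try:
    int('1.2')
    failed = False
except ValueError as error:
    failed = True
    print('ValueError:', error)
```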

 **Question:** What does the following evaluate to:

```python
'1' + '2'
```

## Basics
- single quotes vs double quotes 
    - escaping
    - apostrophe
- add/concatenate string.
- multiply by int/float
- add string and number
- convert from string to number
- string formatting
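
Several of these items are sketched below in one place. The variable names are only illustrative, and the height value is One World Trade Center's from the skyscrapers table.

```python
# A backslash escapes an apostrophe inside single quotes;
# double quotes avoid the need for it.
escaped = 'It\'s'
plain = "It's"
print(escaped == plain)   # True

# Multiplying by an int repeats the string.
repeated = 'ab' * 3
print(repeated)           # ababab

# Multiplying by a float is an error.
try:
    'ab' * 3.0
    float_failed = False
except TypeError:
    float_failed = True
print(float_failed)       # True

# To add a string and a number, convert one of them first.
as_text = '1' + str(2)
as_number = int('1') + 2
print(as_text, as_number) # 12 3

# String formatting with an f-string.
height = 541.3
sentence = f"One World Trade Center is {height} m tall"
print(sentence)
```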

In [35]:
'1' + '2' # adding strings together concatenates them

'12'

In [36]:
'1' + 2

TypeError: can only concatenate str (not "int") to str

In [None]:
"1" + "2"

In [None]:
"1" + '2'

In [None]:
"1'

## Discussion question:
Assume you have run the following statements:
```python
x = 3
y = '4'
z = '5.6'
```

What's the source of the error in each example?
1. `x + y`
1. `x + int(y + z)`
1. `str(x) + int(y)`
1. `y + float(z)`


# Boolean
- inequalities
- logic (and / or)
- cast bool to int
- sum/add bools
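
A minimal sketch of each item in this list, using plain Python; the variable names are only illustrative.

```python
# Inequalities evaluate to a bool.
bigger = 3 > 2
print(bigger)                 # True

# Logic: "and" needs both sides True, "or" needs at least one.
both = 3 > 2 and 2 > 5
either = 3 > 2 or 2 > 5
print(both, either)           # False True

# Casting: True becomes 1 and False becomes 0.
print(int(True), int(False))  # 1 0

# Adding bools counts how many are True.
count = sum([True, False, True])
print(True + True, count)     # 2 2
```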

# Types
We’ve seen 6 types so far:
- `int: 2`
- `float: 2.2`
- `str: 'Red fish, blue fish'`
- `builtin_function_or_method: abs`
- `Table`
- `bool: True`


The `type` function can tell you the type of a value
- `type(2)`
- `type(2 + 2)`

An expression’s “type” is based on its value, not how it looks

- `x = 2`
- `type(x)`
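
The sketch below checks the built-in types from the list above (`Table` is left out because it needs the `datascience` package). The last lines show that the type follows the value, not how the expression looks.

```python
print(type(2))                      # <class 'int'>
print(type(2.2))                    # <class 'float'>
print(type('Red fish, blue fish'))  # <class 'str'>
print(type(abs))                    # <class 'builtin_function_or_method'>
print(type(3 > 2))                  # <class 'bool'>

# A name has the type of the value it refers to.
x = 2
same_type = type(x) == type(2 + 2)
print(same_type)                    # True

# 4 / 2 looks like a whole number, but division always produces a float.
quotient_type = type(4 / 2)
print(quotient_type)                # <class 'float'>
```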

## Arrays 
An array contains a sequence of values
- All elements of an array should have the same type
- Arithmetic is applied to each element individually
- Adding arrays adds elements (**if same length!**)
- A column of a table is an array

## Let's:
- Create an array using `make_array()`
- multiply/raise/add/divide array with/by constant
- sum up all elements in array using `sum()`
- calculate the average value
- create a new array 
- add two arrays
    - check the size of array using `len()`
- make an array of strings
- use numpy functions on arrays
    - `np.average()` / `np.mean`
    - `np.median()`
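
The steps above can be sketched with NumPy alone. Here `np.array` stands in for `make_array()` so that the block runs without the `datascience` package; this assumes `make_array()` builds an equivalent NumPy array. The heights are the first three from the skyscrapers table, and `extra` is a made-up array for illustration.

```python
import numpy as np

heights = np.array([541.3, 442.14, 425.5])

# Arithmetic with a constant is applied to each element.
doubled = heights * 2
print(doubled)

# Sum, length, and the average computed two ways.
average = sum(heights) / len(heights)
print(average, np.mean(heights))
middle = np.median(heights)
print(middle)                 # 442.14

# Adding two arrays of the same length adds element by element.
extra = np.array([1, 2, 3])   # made-up values
combined = heights + extra
print(combined)

# Arrays can hold strings too.
materials = np.array(['composite', 'steel', 'concrete'])
print(len(materials))         # 3
```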

## Columns of Tables are Arrays 
- select the height column of the skyscrapers using
    - `select`
    - `column`
- calculate the average height of the skyscrapers in SF vs LA
- What units are the heights in?