In [1]:
from datascience import *
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plots
from datetime import date
plots.style.use('fivethirtyeight')

# Words of Caution
- Remember to run the cell above. It sets up the environment so that you have access to everything needed for this lecture. For now, don't worry about what it means: we'll learn more about what's inside it in the next few lectures.
- Data science is not just about code, so please don't go through this notebook on its own. Keep the relevant textbook sections or lecture video at hand so that you can follow the discussion along with the code. Thank you!

# Markdown
1. write bold and italic
1. create a list
    1. bullet
    1. numbered
1. create a table
1. write a formula
1. write code
1. add a link
1. add an image

*italic text*, **bold text**

- item 1
- item 2

1. item 1
1. item 1a
1. item 2
1. item 3


| Column a | Column b |
| ---      | ---|
| 1 | 2|

$E = mc^2$
$\alpha$

`a=1`

[Link to Canvas](https://ucsb.instructure.com/courses/9595)

![](https://upload.wikimedia.org/wikipedia/commons/1/15/Cat_August_2010-4.jpg)

# Intro to Python

Let's do some basic math operations on numbers:
- add
- divide
- multiply
- raise to a power
- evaluate inequalities

Let's do some operations on text (aka: strings):
- add (concatenate)
- multiply (repeat)

Optional: Let's do some operations on dates
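
For the optional date operations, here is a minimal sketch using `date` from Python's standard `datetime` module (already imported in the setup cell). The two dates and the names `new_year`, `year_end`, `days_between`, and `is_earlier` are chosen only for illustration.

```python
from datetime import date

# Two example dates, chosen only for illustration
new_year = date(2023, 1, 1)
year_end = date(2023, 12, 31)

# Subtracting two dates gives a timedelta; .days is the number of days between them
days_between = (year_end - new_year).days

# Dates can be compared just like numbers
is_earlier = new_year < year_end

print(days_between)
print(is_earlier)
print(new_year.year)
```

Subtraction and comparison work on dates much as they do on numbers, but adding two dates together is not defined.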

In [3]:
3 + 4

7

In [4]:
3 / 4

0.75

In [13]:
3/1

3.0

In [5]:
3 * 4

12

In [8]:
2 ** 3

8

In [14]:
1 >= 4

False

In [16]:
"this is a text" + "more text"

'this is a textmore text'

In [18]:
'ha' * 4

'hahahaha'

In [19]:
'ha' + 1

TypeError: can only concatenate str (not "int") to str
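
The error above says that text and numbers cannot be added directly. One way around it, sketched here with the illustrative names `laugh` and `total`, is to convert one of the values first so that both sides have the same type:

```python
# Convert the number to text first, then concatenate
laugh = 'ha' + str(1)
print(laugh)

# Going the other way: convert text to a number before doing arithmetic
total = int('3') + 4
print(total)
```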

# Names (aka: variables)
- let's assign some values to variables/names
- let's use the variables
- let's overwrite variables

## Why Names?
- Calculate the annual salary for a person working full-time at the California minimum wage of 15 USD/hour.
- On 2023-01-01, the minimum wage was raised to 15.50 USD/hour. Recalculate the annual salary.

In [20]:
15 * 40  * 52

31200

In [23]:
hours_per_week = 40
weeks_per_year = 52
salary = 15.0  # hourly wage in USD/h

In [24]:
hours_per_year = hours_per_week * weeks_per_year

In [25]:
hours_per_year

2080

In [None]:
weekly_wages = hours_per_week * salary
weekly_wages

In [None]:
yearly_wages = hours_per_year * salary
yearly_wages
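
To finish the exercise, here is a sketch of the recalculation after the raise to 15.50 USD/hour. It reuses the names from the cells above; `old_yearly_wages` and `new_yearly_wages` are added here only to keep both results side by side.

```python
hours_per_week = 40
weeks_per_year = 52
hours_per_year = hours_per_week * weeks_per_year

salary = 15.0   # USD/h, the old minimum wage
old_yearly_wages = hours_per_year * salary

salary = 15.50  # USD/h, overwrites the old value
new_yearly_wages = hours_per_year * salary

print(old_yearly_wages)
print(new_yearly_wages)
print(new_yearly_wages - old_yearly_wages)
```

Only one line had to change for the new wage. That is the main reason to use names instead of retyping `15 * 40 * 52`.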

# Functions
- let's use some built-in functions. E.g:
    - absolute value
    - the lower of two values
    - round    
- let's use keywords for arguments
- let's define our own function
- let's use the `help()` function to learn about the usage of functions

In [27]:
abs(-45)

45

In [28]:
temp_day = 27
temp_night = 18

abs(temp_day - temp_night)

9

In [29]:
min(3, 7)

3

In [None]:
round(1.1, 1)

In [37]:
round(ndigits=1, number=17.234)

17.2

In [32]:
help(round)

Help on built-in function round in module builtins:

round(number, ndigits=None)
    Round a number to a given precision in decimal digits.
    
    The return value is an integer if ndigits is omitted or None.  Otherwise
    the return value has the same type as the number.  ndigits may be negative.
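
The help text is easier to digest with a few concrete calls. The sketch below tries each case it mentions; the variable names are illustrative only. The last two lines show a behavior that often surprises people: Python's built-in `round()` sends exact halves to the nearest even number.

```python
# With ndigits omitted, the result is an integer
whole = round(17.234)

# With ndigits given, the result keeps the type of the input (here: float)
one_decimal = round(17.234, 1)

# A negative ndigits rounds to the left of the decimal point
nearest_hundred = round(1234, -2)

# Exact halves are rounded to the nearest even number, not always up
half_to_two = round(2.5)
half_to_four = round(3.5)

print(whole, one_decimal, nearest_hundred, half_to_two, half_to_four)
```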



In [38]:
def f(a):
    return a

In [47]:
def add_values(a, b):
    added_values = a + b
    return added_values

In [48]:
add_values(5, 5)

10

In [39]:
f(1)

1
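
Our own functions can also take keyword arguments, just like `round()`. As a sketch, the hypothetical function `compute_yearly_wages` below wraps the wage calculation from earlier. The default values (40 hours, 52 weeks) come from that example, and the docstring is what `help()` would display.

```python
def compute_yearly_wages(hourly_wage, hours_per_week=40, weeks_per_year=52):
    """Return the yearly wages in USD for a given hourly wage in USD/h."""
    return hourly_wage * hours_per_week * weeks_per_year

# Arguments with a default value can be left out
full_time = compute_yearly_wages(15.0)

# Keywords make it clear which argument is which
part_time = compute_yearly_wages(hourly_wage=15.0, hours_per_week=20)

print(full_time)
print(part_time)
help(compute_yearly_wages)
```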

# Tables A)
1. read a CSV from `data/cones.csv` into a table using `Table.read_table()`
1. show the first n_rows using `show()`
1. select a single column from the table using `select()`
1. select multiple columns from the table using `select()`
1. remove a column from the table using `drop()`
1. subset the table to only chocolate cones using `where()` 
1. sort the cones by price using `sort()`
    1. most expensive first
    1. cheapest first
1. add a new column containing your rating using `with_column()`
    
Remember that you can use the `help()` function or `?` to learn about each function.

In [51]:
cones = Table.read_table('data/cones.csv')
cones

Flavor,Color,Price
strawberry,pink,3.55
chocolate,light brown,4.75
chocolate,dark brown,5.25
strawberry,pink,5.25
chocolate,dark brown,5.25
bubblegum,pink,4.75


In [52]:
cones.show(2)

Flavor,Color,Price
strawberry,pink,3.55
chocolate,light brown,4.75


In [55]:
cones.select('Price')

Price
3.55
4.75
5.25
5.25
5.25
4.75


In [56]:
cones.select('Flavor', 'Price')

Flavor,Price
strawberry,3.55
chocolate,4.75
chocolate,5.25
strawberry,5.25
chocolate,5.25
bubblegum,4.75


In [57]:
cones.drop('Color')

Flavor,Price
strawberry,3.55
chocolate,4.75
chocolate,5.25
strawberry,5.25
chocolate,5.25
bubblegum,4.75


In [59]:
cones_without_color = cones.drop('Color')

In [60]:
cones.where('Flavor', 'chocolate')

Flavor,Color,Price
chocolate,light brown,4.75
chocolate,dark brown,5.25
chocolate,dark brown,5.25


In [61]:
cones.sort('Price')

Flavor,Color,Price
strawberry,pink,3.55
chocolate,light brown,4.75
bubblegum,pink,4.75
chocolate,dark brown,5.25
strawberry,pink,5.25
chocolate,dark brown,5.25


In [63]:
cones.sort('Price', descending=True)

Flavor,Color,Price
chocolate,dark brown,5.25
strawberry,pink,5.25
chocolate,dark brown,5.25
chocolate,light brown,4.75
bubblegum,pink,4.75
strawberry,pink,3.55


In [65]:
cones.with_column('rating', (5, 1, 5, 4, 1, 2))

Flavor,Color,Price,rating
strawberry,pink,3.55,5
chocolate,light brown,4.75,1
chocolate,dark brown,5.25,5
strawberry,pink,5.25,4
chocolate,dark brown,5.25,1
bubblegum,pink,4.75,2


In [66]:
cones

Flavor,Color,Price
strawberry,pink,3.55
chocolate,light brown,4.75
chocolate,dark brown,5.25
strawberry,pink,5.25
chocolate,dark brown,5.25
bubblegum,pink,4.75
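
Notice that the last cell still shows `cones` without a rating column: `with_column()` returned a new table and left the original untouched. To build some intuition for what `where()`, `sort()`, and `with_column()` do, here is a plain-Python sketch on the same six rows. It is only an analogy using lists and dictionaries, not how the `datascience` library works internally.

```python
# The six rows of the cones table, written as a list of dictionaries
cones = [
    {'Flavor': 'strawberry', 'Color': 'pink', 'Price': 3.55},
    {'Flavor': 'chocolate', 'Color': 'light brown', 'Price': 4.75},
    {'Flavor': 'chocolate', 'Color': 'dark brown', 'Price': 5.25},
    {'Flavor': 'strawberry', 'Color': 'pink', 'Price': 5.25},
    {'Flavor': 'chocolate', 'Color': 'dark brown', 'Price': 5.25},
    {'Flavor': 'bubblegum', 'Color': 'pink', 'Price': 4.75},
]

# Like where('Flavor', 'chocolate'): keep only the rows that match
chocolate_cones = [row for row in cones if row['Flavor'] == 'chocolate']

# Like sort('Price'): a new list, cheapest first
cheapest_first = sorted(cones, key=lambda row: row['Price'])

# Like with_column('rating', ...): new rows with one extra entry each
ratings = (5, 1, 5, 4, 1, 2)
cones_with_rating = [{**row, 'rating': rating} for row, rating in zip(cones, ratings)]

print(len(chocolate_cones))
print(cheapest_first[0])
print('rating' in cones[0])
```

To keep the result of a table operation, assign it to a name, as the notebook does with `cones_without_color`.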


# Tables B)
1. read the CSV `data/skyscrapers.csv`
1. show the number of skyscrapers in the dataset using the *attribute* `num_rows`
1. sort the table by completion year. What can we learn?
1. subset the data to skyscrapers in 'Los Angeles'
    1. subset to skyscrapers in `Los Angeles` that were built in the year 1971
1. get data on the 'Empire State Building' in 'New York City'
1. rename the 'completed' column using `relabel()`
1. get all skyscrapers in 'New York City' and sort them by the year they were completed

In [71]:
skyscrapers = Table.read_table('data/skyscrapers.csv')
skyscrapers.num_rows

200

In [74]:
skyscrapers

name,material,city,height,completed
One World Trade Center,composite,New York City,541.3,2014
Willis Tower,steel,Chicago,442.14,1974
432 Park Avenue,concrete,New York City,425.5,2015
Trump International Hotel & Tower,concrete,Chicago,423.22,2009
Empire State Building,steel,New York City,381.0,1931
Bank of America Tower,composite,New York City,365.8,2009
Stratosphere Tower,concrete,Las Vegas,350.22,1996
Aon Center,steel,Chicago,346.26,1973
John Hancock Center,steel,Chicago,343.69,1969
WITI TV Tower,steel,Shorewood,329.0,1962


In [80]:
skyscrapers.sort('completed', descending=True).show(10)

name,material,city,height,completed
432 Park Avenue,concrete,New York City,425.5,2015
Sky,concrete,New York City,206.0,2015
Mansions at Acqualina,concrete,Sunny Isles Beach,196.0,2015
One World Trade Center,composite,New York City,541.3,2014
One57,steel/concrete,New York City,306.07,2014
4 World Trade Center,composite,New York City,297.73,2014
Courtyard & Residence Inn Manhattan/Central Park,concrete,New York City,229.62,2013
Devon Energy Center,concrete,Oklahoma City,257.23,2012
Revel Resort and Casino,concrete,Atlantic City,218.92,2012
Eight Spruce Street,concrete,New York City,265.18,2011


In [83]:
skyscrapers.where('city', 'Los Angeles').where('completed', 1971)

name,material,city,height,completed
City National Tower,steel,Los Angeles,213.06,1971
Paul Hastings Tower,steel,Los Angeles,213.06,1971
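
The Empire State Building step from the list follows the same pattern of two chained filters. With the table, chaining `where('city', 'New York City')` and `where('name', 'Empire State Building')` in the same way as the cell above should do it. The sketch below shows the same idea in plain Python, using only the first five rows displayed earlier, so that it runs without the CSV file.

```python
# First five rows of the skyscrapers table, as displayed above
skyscrapers = [
    {'name': 'One World Trade Center', 'material': 'composite', 'city': 'New York City', 'height': 541.3, 'completed': 2014},
    {'name': 'Willis Tower', 'material': 'steel', 'city': 'Chicago', 'height': 442.14, 'completed': 1974},
    {'name': '432 Park Avenue', 'material': 'concrete', 'city': 'New York City', 'height': 425.5, 'completed': 2015},
    {'name': 'Trump International Hotel & Tower', 'material': 'concrete', 'city': 'Chicago', 'height': 423.22, 'completed': 2009},
    {'name': 'Empire State Building', 'material': 'steel', 'city': 'New York City', 'height': 381.0, 'completed': 1931},
]

# First filter: keep the rows for New York City
in_nyc = [row for row in skyscrapers if row['city'] == 'New York City']

# Second filter, applied to the result of the first: keep the matching name
empire_state = [row for row in in_nyc if row['name'] == 'Empire State Building']

print(empire_state)
```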


In [86]:
skyscrapers.relabel('completed', 'year finished')

name,material,city,height,year finished
One World Trade Center,composite,New York City,541.3,2014
Willis Tower,steel,Chicago,442.14,1974
432 Park Avenue,concrete,New York City,425.5,2015
Trump International Hotel & Tower,concrete,Chicago,423.22,2009
Empire State Building,steel,New York City,381.0,1931
Bank of America Tower,composite,New York City,365.8,2009
Stratosphere Tower,concrete,Las Vegas,350.22,1996
Aon Center,steel,Chicago,346.26,1973
John Hancock Center,steel,Chicago,343.69,1969
WITI TV Tower,steel,Shorewood,329.0,1962


In [88]:
skyscrapers.where('city', 'New York City').sort('year finished')

name,material,city,height,year finished
Metropolitan Life Tower,steel,New York City,213.36,1909
Woolworth Building,steel,New York City,241.4,1913
Chanin Building,steel,New York City,197.8,1929
Mercantile Building,steel,New York City,192.6,1929
Chrysler Building,steel,New York City,318.9,1930
The Trump Building,steel,New York City,282.55,1930
One Grand Central Place,steel,New York City,205.13,1930
Empire State Building,steel,New York City,381.0,1931
Twenty Exchange,steel,New York City,225.86,1931
500 Fifth Avenue,steel,New York City,212.45,1931
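
Notice that the cell above sorts by 'year finished' even though the result of `relabel()` was never assigned to a name. This suggests that `relabel()` changed `skyscrapers` itself, unlike `with_column()`, which left `cones` untouched. Plain Python has the same distinction between operations that return a copy and operations that change a value in place. Here is a small sketch using the completion years of the first five rows shown earlier:

```python
# Completion years from the first five rows of the table
years = [2014, 1974, 2015, 2009, 1931]

# sorted() returns a new list; `years` keeps its original order
sorted_copy = sorted(years)
snapshot_before = list(years)

# .sort() changes `years` in place and returns None
result = years.sort()

print(sorted_copy)
print(snapshot_before)
print(years)
print(result)
```

When in doubt about which kind of operation you are using, `help()` or `?` will usually tell you.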


In [90]:
skyscrapers.where?

# Visualizations
1. read the CSV `data/movies_by_year.csv`
1. Plot the number of movies vs the total gross using `scatter()`
    - add a trendline
1. Plot the number of movies over time using `plot()`
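
A trendline is a straight line fitted to the points of a scatter plot. To the best of my knowledge, `scatter()` in the `datascience` library can draw one when given `fit_line=True`, but treat that as an assumption and check `help()`. The sketch below shows what such a fit computes, using a least-squares line in plain Python. The points are made up for illustration and are NOT taken from `data/movies_by_year.csv`.

```python
# Made-up illustrative points, NOT taken from data/movies_by_year.csv
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Least-squares slope: how much y changes, on average, per unit of x
numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
denominator = sum((xi - mean_x) ** 2 for xi in x)
slope = numerator / denominator

# The fitted line passes through the point (mean_x, mean_y)
intercept = mean_y - slope * mean_x

# Value of the trendline at each x
trend = [slope * xi + intercept for xi in x]

print(slope, intercept)
print(trend)
```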