In [1]:
from datascience import *
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plots
from datetime import date
plots.style.use('fivethirtyeight')

# Words of Caution
- Remember to run the cell above. It sets up the environment (the libraries and plotting settings this lecture uses). For now, don't worry about what it means: we'll learn more about what's inside of it in the next few lectures.
- Data science is not just about code, so please don't work through this notebook on its own. Keep the relevant textbook sections or the lecture video at hand so that you can follow the discussion along with the code. Thank you!

# Markdown
1. write bold and italic text
1. create a list
    1. bullet
    1. numbered
1. create a table
1. write a formula
1. write code (pseudocode)
1. add a link
1. add an image

- item 1
- item 2

*italic text*, **bold text**

| Column a | Column b|
| --       | ---     |
|1|2|

1. item 1
    1. item 1a
1. item 2
1. item 3

$E = mc^2$

$\alpha$

`a = 1`

[Link to Canvas](https://ucsb.instructure.com/courses/17881)

![A kitten sitting at a window, looking outside](https://img.freepik.com/free-photo/cute-domestic-kitten-sits-window-staring-outside-generative-ai_188544-12519.jpg)

# Intro to Python

Let's do some basic math operations on numbers:
- add
- divide
- multiply
- raise to a power
- evaluate inequalities

Let's do some operations on text (aka strings):
- add (concatenate)
- multiply (repeat)

Optional: let's do some operations on dates.

In [5]:
3+4

7

In [6]:
3*4

12

In [7]:
3/1

3.0

In [8]:
2**3

8

In [9]:
1 >= 4

False

In [10]:
'this is a character string'

'this is a character string'

In [12]:
'add this' + ' and this'

'add this and this'

In [13]:
"single or double quotes don't matter in character strings"

"single or double quotes don't matter in character strings"

In [15]:
'until they do or don't"

SyntaxError: unterminated string literal (detected at line 1) (4139057957.py, line 1)

In [16]:
# let's fix the above
"until they do or don't"

"until they do or don't"

In [17]:
'ha'*4

'hahahaha'

In [18]:
'ha'+'ha'+'ha'+'ha'

'hahahaha'
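Back to the quoting error from a few cells earlier: switching to double quotes is one fix. Another is to keep the single quotes and escape the inner apostrophe with a backslash. A minimal sketch (the names `escaped` and `double_quoted` are just for illustration):

```python
# The backslash tells Python that this apostrophe is part of the text,
# not the end of the string
escaped = 'until they do or don\'t'
double_quoted = "until they do or don't"
print(escaped)                   # until they do or don't
print(escaped == double_quoted)  # True: both spell the same string
```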

In [19]:
'ha'+ 1

TypeError: can only concatenate str (not "int") to str

In [20]:
# let's fix the above using the information in the TypeError message
'ha'+'1'

'ha1'
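The TypeError says `+` needs a string on both sides. Instead of retyping the number as `'1'`, the built-in `str()` function can convert it; `int()` goes the other way. A short sketch, where `number`, `text` and `total` are illustrative names:

```python
number = 1
text = 'ha' + str(number)   # str(1) is '1', so this is 'ha1'
print(text)                 # ha1

# int() turns a string of digits back into a number
total = int('1') + 4        # 1 + 4
print(total)                # 5
```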

In [21]:
'ha'+'one'

'haone'

In [22]:
'ha'+ one

NameError: name 'one' is not defined

In [23]:
one = '1' # assigning value of '1' to the variable name one

In [24]:
'ha' + one # both of these are strings
# so python allows us to do this

'ha1'
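The optional date operations mentioned at the start of this section can use the `date` type imported in the first cell. A small sketch with dates picked only for illustration: subtracting two dates gives a `timedelta`, whose `.days` attribute is the gap in days, and dates can be compared with inequalities just like numbers.

```python
from datetime import date

new_year_2022 = date(2022, 1, 1)
new_year_2023 = date(2023, 1, 1)

gap = new_year_2023 - new_year_2022    # a datetime.timedelta
print(gap.days)                        # 365 (2022 is not a leap year)
print(new_year_2023 > new_year_2022)   # True: later dates compare as greater
```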

# Names (aka: variables)
- let's assign some values to variables/names
- let's use the variables
- let's overwrite variables

In [25]:
one = 1 
# assigned the value 1 to the name one (this overwrites the string '1' from before)
# an assignment displays nothing; to see the value, evaluate the name on its own line

In [27]:
one # evaluating the name one, which we assigned above

1
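Overwriting a name replaces its value entirely, including its type: `one` first held the string `'1'` and now holds the integer `1`. The built-in `type()` function makes the change visible, as in this sketch:

```python
one = '1'
print(type(one))   # <class 'str'>
one = 1            # the same name now refers to a different value
print(type(one))   # <class 'int'>
print(one + 1)     # 2: integer addition works now
```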

In [28]:
16 * 40 * 52

33280

In [37]:
hours_per_week = 38
weeks_per_year = 52
dollars_per_hour = 17

In [31]:
hours_per_week * weeks_per_year * dollars_per_hour

33592

In [38]:
hours_per_year = hours_per_week * weeks_per_year # assigning a value here
hours_per_year # evaluating the name here displays its value

1976

In [34]:
weekly_wages = hours_per_week * dollars_per_hour
weekly_wages

646

In [39]:
yearly_wages = hours_per_year * dollars_per_hour

In [40]:
yearly_wages

33592

In [42]:
# spaces around operators don't change the result; they only affect readability
yearlywages=hours_per_year*dollars_per_hour
yearlywages

33592

## Why Names?
- Calculate the annual salary for a person working full-time at the California minimum wage of 15 USD/hour.
- On 2023-01-01, the minimum wage was raised to 15.50 USD/hour. Recalculate the annual salary.

In [46]:
hours_per_week = 39
weeks_per_year = 52
salary = 16.0 # hourly wage in USD/hour; to redo the exercise, change only this value and rerun the cells below

In [49]:
hours_per_year = hours_per_week * weeks_per_year
hours_per_year

2028

In [47]:
39 * 52

2028

In [50]:
hours_per_year

2028

In [None]:
weekly_wages = hours_per_week * salary
weekly_wages

In [None]:
yearly_wages = hours_per_year * salary
yearly_wages
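The benefit of names shows up when a value changes. A sketch of the two calculations in the exercise, assuming the 39-hour week and 52-week year used in the cells above; `hourly_wage` and the two result names are new, illustrative names. Only `hourly_wage` changes between the calculations, and the formula stays the same.

```python
hours_per_week = 39
weeks_per_year = 52
hours_per_year = hours_per_week * weeks_per_year      # 2028

hourly_wage = 15.00                                   # USD/hour
yearly_wages_at_15 = hours_per_year * hourly_wage     # 30420.0

hourly_wage = 15.50                                   # raised on 2023-01-01
yearly_wages_at_15_50 = hours_per_year * hourly_wage  # 31434.0

print(yearly_wages_at_15, yearly_wages_at_15_50)
```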

# Comments
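Comments have appeared throughout this notebook: Python ignores everything from a `#` to the end of the line, so comments are notes for human readers only. A small illustration (the name `minutes_per_hour` is just an example):

```python
# A comment can sit on its own line...
minutes_per_hour = 60  # ...or after code on the same line
# minutes_per_hour = 0    <- commented-out code never runs
print(minutes_per_hour)  # 60
```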

# Functions
- let's use some [built-in functions](https://docs.python.org/3/library/functions.html), e.g.:
    - absolute value
    - the smaller of two values
    - round
- let's use keywords for arguments
- let's define our own function
- let's use the `help()` function to learn about the usage of functions

In [51]:
help(max)

Help on built-in function max in module builtins:

max(...)
    max(iterable, *[, default=obj, key=func]) -> value
    max(arg1, arg2, *args, *[, key=func]) -> value
    
    With a single iterable argument, return its biggest item. The
    default keyword-only argument specifies an object to return if
    the provided iterable is empty.
    With two or more arguments, return the largest argument.
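The help text describes two ways to call `max`: with several separate arguments, or with a single iterable (such as a list) plus an optional keyword-only `default` for when that iterable is empty. A quick check of both forms, with illustrative names:

```python
largest = max([15, 27, 30])         # one list argument
same_as_separate = max(15, 27, 30)  # several separate arguments
empty_default = max([], default=0)  # empty list, so the default is returned
print(largest, same_as_separate, empty_default)  # 30 30 0
# max([]) without default raises ValueError: an empty list has no largest item
```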



In [52]:
max(15, 27)

27

In [53]:
max(27, 15)

27

In [54]:
max(27, 15, 30)

30

In [56]:
abs(-45)

45

In [59]:
temp_day = 62
temp_night = 40

abs(temp_day - temp_night) # the argument of abs() can be an expression
# Python evaluates the subtraction first, then takes the absolute value:
# abs(62 - 40) = abs(22) = 22

22

In [60]:
temp_day = 64

In [61]:
difference_in_temp = abs(temp_day - temp_night)
difference_in_temp

24

In [62]:
temp_night = 39
difference_in_temp # still 24: the name is not recomputed when temp_night changes

24

In [63]:
difference_in_temp = abs(temp_day - temp_night)
difference_in_temp

25
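Because `difference_in_temp` is computed once, it kept its old value after `temp_night` changed until the cell was rerun. One way to avoid stale results is to wrap the calculation in a function (defined with `def`, as in the cells further down) so it is recomputed from whatever inputs are passed. A sketch with a hypothetical helper name, `temp_difference`:

```python
def temp_difference(temp_day, temp_night):
    """Return the absolute difference between two temperatures."""
    return abs(temp_day - temp_night)

print(temp_difference(64, 40))  # 24
print(temp_difference(64, 39))  # 25
```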

In [64]:
min(3,7)

3

In [65]:
round(1.1, 1)

1.1

In [66]:
round(1.1, 0)

1.0

In [67]:
round(1.1)

1

In [68]:
help(round)

Help on built-in function round in module builtins:

round(number, ndigits=None)
    Round a number to a given precision in decimal digits.
    
    The return value is an integer if ndigits is omitted or None.  Otherwise
    the return value has the same type as the number.  ndigits may be negative.



In [69]:
round(number=17.234, ndigits=1)

17.2

In [70]:
round(ndigits=1, number=17.234)

17.2
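The help text notes that `ndigits` may be negative and that the result is an integer when `ndigits` is omitted. A few more calls worth trying; note that Python's `round` sends an exact tie to the even neighbor ("round half to even") rather than always rounding up:

```python
print(round(1234, -2))   # 1200: negative ndigits rounds to the hundreds
print(round(17.234, 2))  # 17.23
print(round(2.5))        # 2: a tie goes to the even neighbor
print(round(3.5))        # 4
```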

In [71]:
def f(a): # function header, it needs to end with a colon :
    # this is my function body
    return a

In [72]:
f(1)

1

In [73]:
f('a')

'a'

In [75]:
a = 'my input'

In [76]:
f(a)

'my input'

In [77]:
def add_values(a, b): # function header, remember comments are not executed by python
    added_values = a + b # we are adding a and b together, and assigning its value to added_values
    return added_values # returning added_values, the sum of a and b

In [78]:
add_values(1,1)

2

In [79]:
add_values(5,5)

10

In [80]:
add_values(5) # what's b?
# add_values needs two arguments, a and b, but only one was given

TypeError: add_values() missing 1 required positional argument: 'b'

In [81]:
add_values(5,5) # fixed: pass both a and b

10
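Calling `add_values(5)` failed because `b` had no value. When a sensible fallback exists, a parameter can be given a default value in the function header. A sketch with a hypothetical variant, `add_values_with_default`, which also has a docstring so that `help()` can describe it:

```python
def add_values_with_default(a, b=0):
    """Return a + b; b defaults to 0 when it is not given."""
    return a + b

print(add_values_with_default(5))         # 5: b falls back to 0
print(add_values_with_default(5, 5))      # 10
print(add_values_with_default(a=5, b=2))  # 7: keyword arguments work too
help(add_values_with_default)             # shows the docstring
```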

---
slides

---

# Tables A)
[Documentation](http://www.data8.org/datascience/reference-nb/datascience-reference.html#Table-Functions-and-Methods)

1. read a CSV from `data/cones.csv` into a table using `Table.read_table()`
1. show the first n rows using `show()`
1. select a single column from the table using `select()`
1. select multiple columns from the table using `select()`
1. remove a column from the table using `drop()`
1. subset the table to only chocolate cones using `where()` 
1. sort the cones by price using `sort()`
    1. most expensive first
    1. cheapest first
1. add a new column containing your rating using `with_column()`
    
Remember that you can use the `help()` function or `?` to learn about each function.

In [82]:
cones = Table.read_table('data/cones.csv')
cones

Flavor,Color,Price,Rating
strawberry,pink,3.55,1
chocolate,light brown,4.75,4
chocolate,dark brown,5.25,3
strawberry,pink,5.25,2
chocolate,dark brown,5.25,5
bubblegum,pink,4.75,1


In [83]:
cones.show(2)

Flavor,Color,Price,Rating
strawberry,pink,3.55,1
chocolate,light brown,4.75,4


In [84]:
cones.select('Price')

Price
3.55
4.75
5.25
5.25
5.25
4.75


In [85]:
cones.drop('Color')

Flavor,Price,Rating
strawberry,3.55,1
chocolate,4.75,4
chocolate,5.25,3
strawberry,5.25,2
chocolate,5.25,5
bubblegum,4.75,1


# Tables B)
1. read the CSV `data/skyscrapers.csv`
1. show the number of skyscrapers in the dataset using the *attribute* `num_rows`
1. sort the table by completion year. What can we learn?
1. subset the data to skyscrapers in 'Los Angeles'
    1. subset to skyscrapers in 'Los Angeles' that were built in the year 1971
1. get data on the 'Empire State Building' in 'New York City'
1. rename the 'completed' column using `relabel()`
1. get all skyscrapers in 'New York City' and sort them by when they were built

# Visualizations
1. read the CSV `data/movies_by_year.csv`
1. Plot the number of movies vs the total gross using `scatter()`
    - add a trendline
1. Plot the number of movies over time using `plot()`