# DSC 10 Lecture 2: Expressions and Data Types

## Jupyter Notebooks, expressions, and data types
![image](./logos.png)

# Programming languages help extract information from data

* Python is popular for data science and software development.
* Learn through practice!
* Learn just enough to use it, as you need it!
* Follow along at [datahub.ucsd.edu](https://datahub.ucsd.edu)
* See the documentation for references: [Jupyter Documentation](https://jupyter-notebook.readthedocs.io/)

# Notebooks mix text with code

* Perfect for experimenting with code
    - Annotate code experimentation for others
* Perfect for presentations about data
    - Annotate data analysis with explanations
* It's not perfect for everything!
    - Software development, big projects, compiled languages...

# Notebook cells have two types: 
* `markdown`
* `code`
### (DEMO)

This is my text cell.

# Getting started with Python: expressions

Write an expression in a "code cell" and either hit "Shift-Enter" or press the "Run" button to evaluate the code!

In [134]:
3+500

503

# Read, Evaluate, Print
* Type an **expression** into a code cell.
* The Python interpreter **evaluates** the expression.
* The notebook **displays the value** of the (last) expression in the cell.

In [137]:
# this is a comment
1 + 2 #addition

3

In [138]:
5*5

25

# Numbers and Arithmetic

<img src="arithmetic_table.png" width=900/>

## Python uses typical order of operations (PEMDAS)

In [141]:
# example: exponentiation and multiplication
(2**4)*(3**1)*4

192

In [143]:
2*3**2

18
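
As a small illustration of the precedence rules above (a sketch, not part of the original demo), the cell below shows how parentheses change the result and how unary minus interacts with `**`. The variable names are made up for this example.

```python
# ** binds tighter than *, so the exponent is applied first
without_parens = 2 * 3 ** 2      # same as 2 * (3 ** 2)
with_parens = (2 * 3) ** 2       # parentheses force the multiplication first

# unary minus binds more loosely than **
negative_square = -3 ** 2        # same as -(3 ** 2)

print(without_parens, with_parens, negative_square)   # 18 36 -9
```

When in doubt, add parentheses: they cost nothing and make the intended order explicit.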

# Assignment: names and variables

$$ \overbrace{\texttt{myvariable}}^{\text{name}} = \overbrace{\texttt{2 + 3}}^{\text{any expression}} $$

* Assignment statements like the one above don't have a value.
* An assignment statement changes the meaning of the name to the left of the `=` symbol.
* `myvariable` is bound to `5` (value) not `2 + 3` (expression).

In [150]:
myvariable = 2 + 3 #assignment statements

In [151]:
myvariable #typing the name of a variable displays its value (contents of a box in memory)

5

In [152]:
myvariable = 6*7

In [153]:
myvariable

42

Try doubling the value of `myvariable`.

In [159]:
myvariable = myvariable*2
myvariable

84

### A variable's value is set at the time of assignment

In [160]:
x = 2
y = 3 + x
y

5

In [161]:
x = 4
x

4

In [162]:
y

5
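
The three cells above can be condensed into one sketch. It assumes nothing beyond plain Python; `y_before` is a name introduced only for this illustration. The point is that `y` stores the value `5`, not the expression `3 + x`, so it changes only when the assignment is run again.

```python
x = 2
y = 3 + x        # y is bound to the value 5, not to the expression "3 + x"
x = 4            # rebinding x leaves y untouched
y_before = y     # still 5

y = 3 + x        # re-run the assignment to pick up the new value of x
print(y_before, y)   # 5 7
```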

# Call Expressions
* Call expressions invoke functions
* Functions are called in Python just like in standard mathematics:
$$ y = f(x) $$
* Inputs are called arguments

In [164]:
abs(-67)

67

### Functions can be named

In [166]:
f = abs #assignment statement
x = -12
f(x)

12

### Functions can take more than one argument and even a variable number of arguments

In [171]:
max?

In [169]:
max(3, -4, 2, 5, 6) #five arguments

6

### Put ```?``` after a function name to see its documentation
* or use the `help` function.

In [172]:
round?

In [177]:
# round
my_number = 11244.22
round(my_number, -1)

11240.0

In [178]:
help(round)

Help on built-in function round in module builtins:

round(...)
    round(number[, ndigits]) -> number
    
    Round a number to a given precision in decimal digits (default 0 digits).
    This returns an int when called with one argument, otherwise the
    same type as the number. ndigits may be negative.



In [179]:
round(1.22222, 3, 5)

TypeError: round() takes at most 2 arguments (3 given)

### What functions are available for use?     [built-in functions](https://docs.python.org/3/library/functions.html)

<img src="./python_builtins.png"  width="900" align="middle"/>

## Import functions from Python modules
* Modules are collections of Python functions.
* Access these functions via an *import statement*.
* Call the functions using `module.function()` syntax.

### Import the `math` module and look around
* sqrt, log, etc...

In [180]:
import math

In [184]:
# tab completion for browsing
math.sqrt(45)

6.708203932499369

In [188]:
math.pow?

In [187]:
# what base is log?
math.log(9, 3)

2.0
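
To answer the "what base is log?" question more fully, here is a short sketch using only functions from the standard `math` module. With one argument, `math.log` is the natural logarithm; the optional second argument sets the base. The variable names are chosen just for this example.

```python
import math

natural = math.log(math.e)     # one argument: natural log (base e)
base_3 = math.log(9, 3)        # two arguments: log of 9 in base 3
base_10 = math.log10(1000)     # dedicated base-10 function
base_2 = math.log2(8)          # dedicated base-2 function

print(natural, base_3, base_10, base_2)
```

For bases 10 and 2, the dedicated functions are usually the safer choice, since the two-argument form can be off in the last decimal place.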

<center><img src="q2.png"  width="1000"/></center>

In [189]:
x=3 
y=-2

In [190]:
abs(x, y)

TypeError: abs() takes exactly one argument (2 given)

In [191]:
math.pow(x, abs(y))

9.0

In [198]:
9

9

In [196]:
max?

In [197]:
math.pow(x, math.pow(y,x))

0.00015241579027587258

# Data Types
* Every value in Python has a type, which describes how the value is stored. 
* Use the `type` function to find out the data type of any value.
* Understanding the data often requires understanding how the data was stored.

# Two data types for numbers: ```float``` and ```int```
* ```int``` : an integer of any size
* ```float```: a number with an optional fractional part

### ```int```
* ints have arbitrary precision
* integer arithmetic: `+`, `-`, `*`, `**`

In [201]:
type(3+5)

int

In [202]:
2**6000

15134705823042370725134100673293919554234823566220... (output truncated; the full value has 1807 digits)

### ```float```
* a float is specified using a decimal point
* a float might be printed using scientific notation

In [203]:
type(2.0 + 3.2)

float

In [206]:
3.0**400

7.055079108655333e+190

### ```float```
* floats have limited size (but the limit is huge)
* floats have limited precision: about 15-16 significant decimal digits
* after arithmetic, the final few digits can be wrong (limited precision!)

In [207]:
1/3

0.3333333333333333

In [209]:
1/3*100000000000000

33333333333333.332
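
A classic illustration of limited precision (a sketch added here, not from the original demo): `0.1` and `0.2` cannot be stored exactly, so their sum is not exactly `0.3`. One common workaround, assumed here, is to compare floats with `math.isclose` instead of `==`.

```python
import math

total = 0.1 + 0.2
exactly_equal = (total == 0.3)             # exact comparison fails
close_enough = math.isclose(total, 0.3)    # tolerance-based comparison succeeds

print(total)           # 0.30000000000000004
print(exactly_equal)   # False
print(close_enough)    # True
```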

## Type coercion: changing the type between ```int``` and ```float```
* By default, Python changes ``int`` to ``float`` in an expression with both types.
* The type can be explicitly changed using ```int``` and ```float``` functions.
* Division of two integers automatically returns a float value.

In [210]:
int(3.0)

3

In [211]:
float(3)

3.0

### Type coercion: changing the type between ```int``` and ```float```

In [214]:
2.5 + 3

5.5

In [213]:
3/2 #division of two integers always results in a float

1.5

In [215]:
round(3.9)

4

In [216]:
# int truncates toward zero: it drops the fractional part (for positive floats, this rounds down)
int(3.9)

3
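
The difference between `int`, `math.floor`, and `round` only shows up clearly with negative numbers and ties, so here is a small comparison sketch. The tuple names are invented for this example.

```python
import math

truncated = (int(3.9), int(-3.9))                # drops the fractional part
floored = (math.floor(3.9), math.floor(-3.9))    # always rounds down
rounded = (round(3.9), round(-3.9))              # rounds to the nearest integer
ties = (round(2.5), round(3.5))                  # exact ties go to the even neighbor

print(truncated)   # (3, -3)
print(floored)     # (3, -4)
print(rounded)     # (4, -4)
print(ties)        # (2, 4)
```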

### Be careful converting between ```int``` and ```float```

In [217]:
2.51 * 100

250.99999999999997

In [219]:
round(2.51 * 100) #round goes to the nearest integer; int would truncate to 250

251

### The consequences of `float` to `int` conversion error

The Ariane 5 exploded shortly after launch in 1996 when a 64-bit floating-point value overflowed during conversion to a 16-bit integer:
[see story here](https://itsfoss.com/a-floating-point-error-that-caused-a-damage-worth-half-a-billion/)

<center><img src="ariane.jpg" width="400"/></center>

# Text, Strings, and Types

## A string value is a snippet of text of any length
* Enclose a string in either single or double quotes.

In [222]:
"word"

'word'

In [223]:
'word'


'word'

### String arithmetic: + and *

In [225]:
s1 = 'hello'
s2 = 'world'
s1+" "+s2 #concatenation

'hello world'

In [226]:
s1+s1+s1

'hellohellohello'

In [227]:
s1*3

'hellohellohello'

In [230]:
int("3")+int("3")

6

In [231]:
(s1+" ")*2+s1

'hello hello hello'

### String methods
* Strings are associated with certain functions called *string methods*.
* Access string methods with a `.` after the string.
* e.g. `.upper()`, `.replace()`,...

In [236]:
my_cool_string = 'data science is super cool!'
my_cool_string = my_cool_string.capitalize()

In [237]:
my_cool_string

'Data science is super cool!'

In [238]:
my_cool_string = my_cool_string.replace('super', 'really')
my_cool_string

'Data science is really cool!'

In [239]:
my_cool_string.upper()

'DATA SCIENCE IS REALLY COOL!'
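
One detail worth spelling out: string methods return a new string and leave the original unchanged, which is why the cells above reassign `my_cool_string`. The sketch below shows this, along with chaining two method calls; `original`, `shouted`, and `chained` are names made up for the example.

```python
original = 'data science is super cool!'

shouted = original.upper()     # returns a new string
print(original)                # data science is super cool!  (unchanged)
print(shouted)                 # DATA SCIENCE IS SUPER COOL!

# method calls can be chained: each one acts on the result of the previous one
chained = original.capitalize().replace('super', 'really')
print(chained)                 # Data science is really cool!
```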

### Special characters in strings
* apostrophes, quotes, newlines, etc...

In [240]:
'my string's full of apostrophes!'

SyntaxError: invalid syntax (<ipython-input-240-50d82291aa71>, line 1)

In [241]:
"my string's full of apostrophes!"

"my string's full of apostrophes!"

In [242]:
# escape the apostrophe with a backslash!
'my string\'s "full" of apostrophes!'

'my string\'s "full" of apostrophes!'

In [243]:
print('my string\'s "full" of apostrophes!')

my string's "full" of apostrophes!


## Digression: ```print()``` vs. string representation
* By default, a Jupyter notebook displays the string representation of the value of the last expression in a cell.
* The ```print``` function displays the value as human-readable text.

In [244]:
my_newline_str = 'Here is a string with two lines.\nHere is the second line'  # '\n' inserts a new line
my_newline_str

'Here is a string with two lines.\nHere is the second line'

In [245]:
print(my_newline_str)  # notice the quotes disappear!

Here is a string with two lines.
Here is the second line
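
The "string representation" that the notebook displays is what the built-in `repr` function returns, so the two views can be compared inside a single cell. This is a sketch; `as_repr` and `as_text` are names introduced for illustration.

```python
my_newline_str = 'Here is a string with two lines.\nHere is the second line'

as_repr = repr(my_newline_str)   # what the notebook shows: quotes and \n are visible
as_text = str(my_newline_str)    # what print shows: the newline is an actual line break

print(as_repr)
print(as_text)

# repr also makes it easy to tell a number from a string that looks like one
print(repr(3.0), repr('3.0'))    # 3.0 '3.0'
```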


# Type conversion to and from strings

### Type conversion causes messy data!

Genomics data (string-to-date):
> "Geneticists use MARCH1 as shorthand for membrane associated ring-CH-type finger 1. But Excel interprets MARCH1 as a date, automatically converting it to 1-Mar or another designation for the first of March."

[Excel Is Autocorrecting Scientific Research. And That's Not Cool](https://science.howstuffworks.com/innovation/scientific-experiments/excel-is-autocorrecting-scientific-research-thats-not-cool.htm)

### Type conversion causes messy data!

Genomics data (string-to-float):

> "Excel misidentifies some other gene names as coordinates or floating points. You might be able to suss out that 1-Mar is actually MARCH1, but how about 2.31E+13? That's how Excel converts the RIKEN identifier 2310009E13."

[Excel Is Autocorrecting Scientific Research. And That's Not Cool](https://science.howstuffworks.com/innovation/scientific-experiments/excel-is-autocorrecting-scientific-research-thats-not-cool.htm)

<center><img src="./type_inference_2.png"  width="800"/></center>

## Type conversion to and from strings
* Any value can be converted to a string using ```str```
* Strings can be converted to ```int``` and ```float``` when possible

In [246]:
str(3.0)

'3.0'

In [247]:
float('3')

3.0

In [248]:
int('chicken!')

ValueError: invalid literal for int() with base 10: 'chicken!'

In [250]:
float('6.0') + 3

9.0

In [253]:
int('4.0')

ValueError: invalid literal for int() with base 10: '4.0'

In [254]:
int(float('4.0'))

4
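
When strings come from messy data, a failed conversion should not crash the whole analysis. One possible approach, sketched below, wraps the conversions shown above in `try`/`except`. The helper `to_int` is hypothetical (it is not part of Python or of any course library), and returning `None` for bad input is just one design choice.

```python
def to_int(text):
    """Hypothetical helper: turn '4' or '4.0' into the int 4, or return None."""
    try:
        return int(text)              # works for strings like '4'
    except ValueError:
        pass
    try:
        value = float(text)           # works for strings like '4.0' or '5.6'
    except ValueError:
        return None                   # not a number at all, e.g. 'chicken!'
    # only accept floats that are whole numbers, so '5.6' is rejected
    return int(value) if value.is_integer() else None


print(to_int('4'), to_int('4.0'), to_int('5.6'), to_int('chicken!'))   # 4 4 None None
```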

<center><img src="q3.png"  width="1000"/></center>

In [255]:
x=3
y='4'
z='5.6'

In [256]:
x+y #like 5+"chicken"

TypeError: unsupported operand type(s) for +: 'int' and 'str'

In [257]:
x+int(y+z)

ValueError: invalid literal for int() with base 10: '45.6'

In [258]:
str(x)+int(y)

TypeError: must be str, not int

In [259]:
str(x)+z

'35.6'

# Arrays and Ranges

# Arrays
* An array contains a sequence of values.
* All elements of an array should have the **same type**.
* Arithmetic is applied to each element individually
* When two arrays are added, they must have the same size; corresponding elements are added in the result.
    - Unless one of the arrays has size one.
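
The rules above can be sketched with plain NumPy. Here `np.array` stands in for the course's `make_array`, on the assumption that both produce ordinary NumPy arrays; the variable names are made up for this example.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([3, 2, 1])

doubled = a * 2                    # arithmetic applies to each element
summed = a + b                     # same size: corresponding elements are added
shifted = a + np.array([10])       # size-one array: its value is added to every element

print(doubled.tolist())   # [2, 4, 6]
print(summed.tolist())    # [4, 4, 4]
print(shifted.tolist())   # [11, 12, 13]

# sizes 3 and 2 do not match, and neither has size one, so this fails
try:
    a + np.array([1, 2])
    sizes_compatible = True
except ValueError as err:
    sizes_compatible = False
    print('ValueError:', err)
```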

In [None]:
from datascience import *        # datascience library for course
import numpy as np               # 'numerical python library' for working with arrays

## Arrays make working with data easy
* Add, subtract, multiply, divide, exponentiate.
* Use ``.item`` to access an array element by index.
* Warning: array indices start with zero!

In [None]:
a1 = make_array(1,2,3)
a2 = make_array(3,2,1)

In [None]:
a1 + a2

## Arrays for basic statistics: newborn birth weight

In [None]:
baby1 = 3.405 
baby2 = 3.207
baby3 = 2.42
baby4 = 3.984

### Load the weights into an array of floats
* `make_array()`

In [None]:
weights_kg = make_array(baby1, baby2, baby3, baby4)

### Calculate the deviation of weights from average
* Subtracting a number from an array subtracts the number from each element.

In [None]:
avg_weight = 3.5 # average weight of ALL newborns (in kg)

### Convert the weights to pounds (2.2 lbs per kg)

### How many baby weights are recorded in the array?
* `len()` or `.size`
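
One possible way to work through the three prompts above is sketched below. It uses `np.array` in place of `make_array` (assuming the two behave alike) and the conversion factor of 2.2 lbs per kg given in the heading; the names `deviations`, `weights_lb`, and `count` are invented for this example.

```python
import numpy as np

weights_kg = np.array([3.405, 3.207, 2.42, 3.984])   # baby1 .. baby4 from above
avg_weight = 3.5                                     # average weight of ALL newborns (in kg)

deviations = weights_kg - avg_weight   # the number is subtracted from each element
weights_lb = weights_kg * 2.2          # each weight is converted to pounds
count = len(weights_kg)                # weights_kg.size gives the same answer

print(deviations)
print(weights_lb)
print(count)   # 4
```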

## Arrays for basic statistics: daily temperatures

### Below is an array of daily high temperatures in San Diego from August 2018

In [None]:
temps = make_array(86, 85, 85, 84, 85, 86, 91, 89, 90, 88, 88, 85, 83, 82, 79, 81, 82,
                   83, 82, 79, 81, 83, 83, 79, 80, 80, 79, 80, 82, 82, 80)

Number of days on which temperatures were collected in August:

### Temperature statistics (mean, min, max)

In [None]:
temps.sum() / temps.size  # use sum and size

In [None]:
temps.mean() # mean method

In [None]:
min(temps), max(temps) # built-in functions work on arrays

In [None]:
temps.min(), temps.max() # the array has its own min/max methods (faster)

### Sort the temperatures / calculate differences

In [None]:
np.sort(temps)

In [None]:
np.diff(temps)

### Convert from Fahrenheit to Celsius
$$C = \dfrac{5}{9}\left(F-32\right)$$
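
Because arithmetic applies to each element, the formula translates almost directly into code. The sketch below converts a handful of the August highs; `temps_f` and `temps_c` are names chosen for this example, and `np.array` again stands in for `make_array`.

```python
import numpy as np

temps_f = np.array([86, 85, 91, 79])     # a few of the daily highs listed above

# C = 5/9 * (F - 32), applied to every element at once
temps_c = 5 / 9 * (temps_f - 32)

print(np.round(temps_c, 1))              # roughly 30.0, 29.4, 32.8, 26.1
```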

# Ranges
* A range is an array of evenly spaced numbers (consecutive integers by default)
* ```np.arange(end)```: An array of increasing integers from 0 up to end
* ```np.arange(start, end)```: An array of increasing integers from start up to end
* ```np.arange(start, end, step)```: A range with step between consecutive values
* The range always includes start but excludes end (i.e. a half-open interval)

In [None]:
np.arange(5)

In [None]:
np.arange(3, 9)

In [None]:
np.arange(3, 30, 5)

In [None]:
np.arange(-3, 2, 0.5)

In [None]:
np.arange(1, -3)

In [None]:
np.arange(1, -3, -1)
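
For reference, here is a sketch of what the range cells above produce, with each array converted to a plain list via `.tolist()` so the values are easy to read. The variable names are introduced only for this example.

```python
import numpy as np

r1 = np.arange(5).tolist()            # [0, 1, 2, 3, 4]
r2 = np.arange(3, 9).tolist()         # [3, 4, 5, 6, 7, 8]  (9 is excluded)
r3 = np.arange(3, 30, 5).tolist()     # [3, 8, 13, 18, 23, 28]
r4 = np.arange(-3, 2, 0.5).tolist()   # ten floats from -3.0 up to 1.5
r5 = np.arange(1, -3).tolist()        # []  (cannot count up from 1 to -3)
r6 = np.arange(1, -3, -1).tolist()    # [1, 0, -1, -2]  (a negative step counts down)

for r in (r1, r2, r3, r4, r5, r6):
    print(r)
```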

<center><img src="q4.png"  width="800"/></center>


In [None]:
x = make_array(2, 3, 4)
y = np.arange(2, 3, 4)
z = np.arange(3)

In [None]:
x+y

In [None]:
x+z

In [None]:
z.item(0)+y.item(0)

In [None]:
x.item(1)+y.item(1)
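
A sketch of how the four expressions above evaluate, using `np.array` in place of `make_array` (an assumption for this example). The key observation is that `np.arange(2, 3, 4)` starts at 2 and stops before 3, so it holds the single value 2.

```python
import numpy as np

x = np.array([2, 3, 4])
y = np.arange(2, 3, 4)     # just [2]
z = np.arange(3)           # [0, 1, 2]

x_plus_y = (x + y).tolist()          # y has size one, so 2 is added to each element
x_plus_z = (x + z).tolist()          # same size: corresponding elements are added
first_items = z.item(0) + y.item(0)  # 0 + 2

print(x_plus_y)      # [4, 5, 6]
print(x_plus_z)      # [2, 4, 6]
print(first_items)   # 2

# y has only one element, so index 1 does not exist
try:
    x.item(1) + y.item(1)
    index_ok = True
except IndexError as err:
    index_ok = False
    print('IndexError:', err)
```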