<h1 style='color:white'> Statistics 21 <br/> Python & Other Technologies for Data Science </h1>

<h3 style='color:white'>Vivian Lew, PhD - Monday, Week 2</h3>

## (previously) Important Notes about Python Syntax
##### based on A Whirlwind Tour of Python by Jake VanderPlas (https://jakevdp.github.io/WhirlwindTourOfPython/)
### Comments Are Marked by `#`

In [1]:
# this is a comment and is not run

### Lines
The end of a line terminates a statement. There is no need to end a statement with a semicolon `;`, although you can optionally use a semicolon to put two statements on one line.


In [45]:
# example
x = 5
print(x)

5


### Lines (cont'd)
If you want to have a single statement cover multiple lines, you can use a backslash `\` or enclose the statement in parentheses. If you are defining a list or other data structure that already uses some sort of bracket, this is handled automatically.

In [3]:
# semicolon to include multiple statements in one line
y = 6; z = 7
print(y + z)

13


Having multiple statements in one line is generally considered bad style and should be avoided.

### Lines (cont'd)
We use the backslash, parentheses, or in certain cases brackets to continue a statement over multiple lines.


In [4]:
a = 1 + 2 + 3 \
    + 4 + 5
print(a)

15


In [5]:
# or use parentheses
b = (1 + 2 + 3
    + 4 + 5)
print(b)

15


### Lines (cont'd)

Data structures written with brackets (a list here) can continue over multiple lines without a backslash, typically breaking after a comma:

In [6]:
l = ['a', 2, 3, 'd',
    'e', 6, 
    'b']
print(l)

['a', 2, 3, 'd', 'e', 6, 'b']


### (Important) Indentation defines code blocks

- Python does not use curly braces `{}` to define code blocks.
- IPython is smart enough to automatically indent lines after you use a colon `:` which indicates that the following lines are part of a code block.
- We haven't covered conditionals yet, but I'll introduce them here briefly to show how code blocks work.

In [7]:
# we will learn if statements later, but here's an example
x = 8
if(x > 5):
    print('x is greater than 5')   # the two indented lines only run 
    print(x)                       # when the if statement is true
print('hello')    # this line is not indented and will run regardless of the if statement

x is greater than 5
8
hello


In [8]:
x = 4
if(x > 5):
    print('x is greater than 5')   # the two indented lines only run 
    print(x)                       # when the if statement is true
print('hello')    # this line is not indented and will run regardless of the if statement

hello


In [9]:
x = 4
if(x > 5):
    print('x is greater than 5')
print(x)
print('hello')

4
hello
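
Blocks can also be nested, with each level indented one step further. A minimal sketch of that idea (the names `temperature` and `label` are made up for this illustration):

```python
# nested blocks: each extra level of indentation is a block inside a block
temperature = 72
label = 'cool'
if temperature > 60:
    label = 'warm'          # runs because 72 > 60
    if temperature > 90:
        label = 'hot'       # skipped because 72 is not greater than 90
print(label)                # not indented, so it always runs
```

Moving the inner `if` back to the left margin would make it an independent check rather than part of the outer block.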


## Getting Help

- Official Python Documentation https://docs.python.org/3/

- PEP  https://peps.python.org/  (e.g., PEP 8, PEP 20)

- Connect with others -- in here and elsewhere

## and the help function

In [10]:
help(print)

Help on built-in function print in module builtins:

print(*args, sep=' ', end='\n', file=None, flush=False)
    Prints the values to a stream, or to sys.stdout by default.
    
    sep
      string inserted between values, default a space.
    end
      string appended after the last value, default a newline.
    file
      a file-like object (stream); defaults to the current sys.stdout.
    flush
      whether to forcibly flush the stream.



## Lexis and Semantics

- Lexis - the words and symbols used to write programs

A lexically correct example in Python (no syntax errors):

`x = "1" * 2`

- Syntax - the way you combine words and symbols has rules and structure

but this example may break

- semantic rules: multiplying a string by a number is valid Python (it repeats the string, giving `"11"`), yet it may not make sense or produce the desired result.

Lexical/syntax errors are easy to catch (Python will do this well).  Semantic errors are a challenge - always consider "Is this what we want/need?"
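
To make the distinction concrete, here is a small sketch of what the statement above actually does, next to one possible "intended" version. The assumption that arithmetic was intended is ours, for illustration only.

```python
x = "1" * 2          # valid syntax: * repeats the string
print(x)             # 11  (the string "11", not the number 2)
print(type(x))       # <class 'str'>

y = int("1") * 2     # convert to an integer first to get arithmetic
print(y)             # 2
```

Python raises no error in either case, which is why semantic mistakes like this one are easy to miss.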


## Variables

- A powerful feature of Python is its ability to manipulate variables

- A variable is a named storage location (which holds a value) in memory. 

- We assign values to variables using `=`, with the variable name on the left and the value(s) on the right

In [11]:
quote = 'Every new discovery is just a reminder... '

sample_size = 100

GPA = 3.78

print(quote)
print(sample_size)
print(GPA)

Every new discovery is just a reminder... 
100
3.78


## Variable Types and Identity

- Variables have types and variable type is determined by the (data) value a variable is referencing. For example:

In [12]:
print(type(quote))

print(type(sample_size))

print(type(GPA))

married = True
print(type(married))

<class 'str'>
<class 'int'>
<class 'float'>
<class 'bool'>


## Variable Types and Identity (cont'd)

- Variables have an ID. This will be more important later, but for now:

In [13]:
id(quote)

4411691568

In [14]:
id(sample_size)

4375496888

In [15]:
id(GPA)

4410832496

In [16]:
id(married)

4374487360

## More about variables

- We can create variables just by naming them and assigning a value(s), and we can change their value(s) and their type(s) dynamically... but note what happens to the `id()` each time.

In [17]:
quote = quote + "We're all small and stupid."
print(quote)
id(quote)

Every new discovery is just a reminder... We're all small and stupid.


4411937072

In [18]:
sample_size = 120
print(sample_size)
id(sample_size)

120


4375497528

## More about variables (cont'd)

In [19]:
GPA = 'Grade Point Average'
print(GPA)
id(GPA)

Grade Point Average


4411973168

In [20]:
sample_size += 10
print(sample_size)
id(sample_size)

130


4375497848

In [21]:
max_age = 130
id(max_age)

4375497848

## More about variables (cont'd)

In CPython (the standard interpreter), the id of sample_size is its memory address (location), and because max_age has the same value, it is given the same id. This is how Python manages small integer values: it uses an optimization technique for small integers, booleans, and short strings, reusing existing objects to minimize resource use. This generally does not apply to floats in Python.

In [22]:
a = 4.712
b = 4.712
print(id(a))
print(id(b))

4410823888
4410831120


In [23]:
c = "happy"
d = "happy"
print(id(c))
print(id(d))

4412015856
4412015856
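
Because sharing an id is an interpreter optimization rather than a guarantee, it is usually safer to compare values with `==` than to compare ids. A small sketch (the names `same_value` and `same_object` are hypothetical):

```python
a = 4.712
b = 4.712

same_value = (a == b)             # compares the values
same_object = (id(a) == id(b))    # compares the ids; the result can vary by interpreter and context

print(same_value)     # True
print(same_object)
```

Only the first comparison has a result you can rely on everywhere.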


## Naming Variables

- can contain letters (both uppercase and lowercase), numbers, and underscores (_).
- must start with a letter or an underscore. They cannot start with a number.
- variable names are case-sensitive, so **ucla** and **UCLA** are different names.
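- avoid using names that Python already uses: keywords such as True and False cannot be assigned to at all, and reusing built-in names such as print or input hides the original function.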
- try to choose descriptive names that reflect the purpose of the object. So maybe movie_quote instead of quote
- use underscores to separate words (snake case) instead of camel case. Example, use `my_variable` rather than `myVariable`.
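
These rules can be checked from Python itself. The sketch below uses the string method `isidentifier()` and the standard library `keyword` module; the example names are made up.

```python
import keyword

ok_snake = 'my_variable'.isidentifier()     # letters and underscores
ok_underscore = '_hidden2'.isidentifier()   # may start with an underscore
bad_digit = '2fast'.isidentifier()          # cannot start with a number
true_is_kw = keyword.iskeyword('True')      # reserved word, cannot be assigned to
print_is_kw = keyword.iskeyword('print')    # a built-in name, not a keyword

print(ok_snake, ok_underscore, bad_digit)   # True True False
print(true_is_kw, print_is_kw)              # True False
```

Note that `print` passes as a legal name, so Python will not stop you from overwriting it; avoiding that is up to you.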



## Data Types

- Data type in Python defines the type of values that a variable can hold. 

- Python has numerous built-in data types, we will identify a few vital ones now and more later:

| data type      | examples          |
| -------------- | ----------------- |
| int (integer)  | 1, 3, 1000        |
| float          | 2., 2.0           |
| str (string)   | 'UCLA', 'abcde'   |
| bool (Boolean) | True, False       |


## Basic Type Conversion

In Python, type conversion can be done explicitly at any time by calling one of the conversion functions below.

int(x) - converts x to an integer. If x is a floating-point number, it is truncated towards zero.  
float(x) - converts x to a floating-point number.  
str(x) - converts x to a string.  
bool(x) - converts x to a Boolean value. Any non-zero number or non-empty sequence is converted to True, and 0 or an empty sequence (such as an empty string or list) is converted to False.  
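
A short sketch of each conversion in action, with values chosen only for illustration:

```python
truncated_pos = int(3.9)       # truncates toward zero
truncated_neg = int(-3.9)      # also toward zero, not down
as_float = float('2.5')
as_text = str(100) + '%'

print(truncated_pos, truncated_neg)    # 3 -3
print(as_float)                        # 2.5
print(as_text)                         # 100%
print(bool(0), bool(''), bool([]))     # False False False
print(bool(-1), bool('False'))         # True True
```

The last line is a common surprise: the string `'False'` is non-empty, so it converts to True.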

## Why Convert?

Data preprocessing - real-world data is messy, and conversions may be necessary to facilitate:

- Compatibility with other parts of your code
- Efficiency in storage
- Function and method type constraints (more on methods later)

In [24]:
age = input("How old are you?")

How old are you? 18


In [25]:
type(age)

str

In [26]:
new_age = int(age)
type(new_age)

int

## How large?

It depends. For integers, in theory there is no limit (it depends on your available memory). For floats, it's more of a precision issue (see https://docs.python.org/3/library/sys.html).

In [27]:
# sys module is part of the Python Standard Library (PSL), no need to install it

import sys

# sys.maxsize is the largest value a container index (Py_ssize_t) can take
# on this machine; Python integers themselves can be larger than this
print(sys.maxsize)

# The range is not quite symmetric around zero: the minimum
# is one less than the negation of the maximum, so we
# subtract 1 from the negation to get the true minimum

print(-sys.maxsize -1)

9223372036854775807
-9223372036854775808


In [28]:
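print(sys.float_info.max)   # largest representable finite float
print(sys.float_info.min)   # smallest positive normalized float (not the most negative float)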


1.7976931348623157e+308
2.2250738585072014e-308
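
The difference between the two types shows up quickly in practice. A brief sketch, with values picked only for illustration:

```python
import sys

big = 2 ** 100                       # integers grow as needed
print(big)                           # 1267650600228229401496703205376
print(big > sys.maxsize)             # True

tenth_sum = 0.1 + 0.2                # floats have limited precision
print(tenth_sum)                     # 0.30000000000000004
print(tenth_sum == 0.3)              # False

overflow = sys.float_info.max * 10   # past the largest float
print(overflow)                      # inf
```

Integers stay exact at any size, while floats trade exactness for a fixed amount of storage.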


## Operators

- In Python, operators are built-in language features.
- They have pre-defined behavior depending on the data type.
- Here they all are in one place:

| Symbol   | Operation            |
| -------- | -------------------- |
| +        | Addition             |
| -        | Subtraction          |
| *        | Multiplication       |
| **       | Exponentiation       |  
| /        | Division             |
| //       | Floor division       |
| %        | Modulo (remainder)   |   
| @        | Matrix multiplication|   
| <        | Less than            |
| \>       | Greater than         |
| <=       | LT or equal to       | 
| \>=      | GT or equal to       |
| ==       | Equality comparison  |
| !=       | Inequality comparison|



## And more operators

( --  Left parenthesis (used to open a group or a function call)       
) --  Right parenthesis (used to close a group or function call)    
\[ --  Left square bracket (used to index or slice a sequence)  
] --  Right square bracket (used to close an index or a slice)  
{ -- Left curly brace (used to create a dictionary or a set)  
} -- Right curly brace (used to close a dictionary or a set)  
, -- Comma  separates elements (in a list, tuple, or function call)   
: -- Colon  used in function definitions, in slices, and other places  
. -- Dot accesses attributes of a Python object  
; -- Semicolon separates statements on the same line  
@ --  `At` used for matrix multiplication   
' --  Single quote encloses a string literal e.g., 'UCLA'  
" -- Double quote also encloses a string literal e.g. "McDonald's"  
\# -- Hash or pound begins a single-line or inline comment in Python  
\ -- Backslash escaping special characters or for line continuation  
= -- Assignment operator assigns values to a variable or an attribute   


## Other Assignment Operators (useful in loops)

`+=` -- Increment and assign  
`-=` -- Decrement and assign  
`*=` -- Multiply and assign  
`/=` -- Divide and assign  
`%=` -- Modulus (returns remainder) and assign   
`**=`  -- Exponentiate and assign  
`//=` -- Floor (integer) divide and assign
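
Each of these is shorthand for "apply the operator, then store the result back in the same variable". A minimal sketch follows; the `for` loop is only a preview of something covered later, and the names are made up.

```python
total = 10
total += 5     # 15
total -= 3     # 12
total *= 2     # 24
total //= 5    # 4
total **= 2    # 16
total %= 7     # 2
total /= 4     # 0.5 (true division gives a float)
print(total)   # 0.5

# typical use in a loop: keep a running sum
running_sum = 0
for value in [3, 4, 5]:
    running_sum += value
print(running_sum)   # 12
```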


## Working with Strings
### Formatting strings

From the documentation:

*F-strings provide a way to embed expressions inside string literals, using a minimal syntax. It should be noted that an f-string is really an expression evaluated at run time, not a constant value. In Python source code, an f-string is a literal string, prefixed with ‘f’, which contains expressions inside braces. The expressions are replaced with their values.*


In [29]:
name = 'Alice'
age = 7
geography = 'Wonderland'

intro = f"My name is {name} and I am {age} years old. \
I live in {geography}."
print(intro)

My name is Alice and I am 7 years old. I live in Wonderland.


## more examples

You can read all about formats in the Python documentation.  See

https://docs.python.org/3/library/string.html#format-examples

Here, price is an integer, but I'd like to have a thousands separator. Inside the curly brackets, put a `:` after the variable name, followed by a format, so: `{variablename:,}`

In [30]:
price = 1234567
formatted_price = f"The price is ${price:,} dollars."
print(formatted_price)

The price is $1,234,567 dollars.


In this example, price is a float and the `.2f` inside the curly braces specifies that price should be formatted as a floating point number with two decimal places (the `,` before it adds the thousands separator). We can use a format to round floats too.

In [31]:
price = 37149.9912345678
formatted_price = f"The price is ${price:,.2f}."
print(formatted_price)

The price is $37,149.99.


In [32]:
formatted_price = f"The price is ${price:,.0f}."
print(formatted_price)

The price is $37,150.


In [33]:
price = 37149.49
formatted_price = f"The price is ${price:,.0f}."
print(formatted_price)

The price is $37,149.
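
A few other format specifications from the same mini-language, sketched with made-up values:

```python
rate = 0.256
school = 'UCLA'
count = 42

as_percent = f"{rate:.1%}"       # multiply by 100 and add a percent sign
right = f"[{school:>8}]"         # right-align in a field 8 characters wide
left = f"[{school:<8}]"          # left-align in a field 8 characters wide
padded = f"{count:05d}"          # pad an integer with leading zeros to width 5
computed = f"{count + 1}"        # any expression can go inside the braces

print(as_percent)   # 25.6%
print(right)        # [    UCLA]
print(left)         # [UCLA    ]
print(padded)       # 00042
print(computed)     # 43
```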


### What is a string anyway?

In Python, a string can be understood as a sequence of characters (more on sequences generally next time)

We access strings in Python using indexing and slicing.

Indexing a string means accessing a character in the string using its position (or index) within the string. 

**In Python, indexing starts at 0, which means that the first character in the string has an index of 0, the second character has an index of 1, and so on.** 

To index a string, you use square brackets and the index of the character you want to access. 

### String Index Example

In [34]:
s = "I love pizza"
len(s)

12

In [35]:
# but...
s[11]

'a'

s[11] accesses the 12th (last) character in the string s because indexing starts at 0:

In [36]:
s[0]

'I'
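
Two related behaviors are worth seeing once: negative indices count from the end of the string, and an index past the end raises an error. The `try`/`except` below is used only to show the error message without stopping the program.

```python
s = "I love pizza"

last = s[-1]               # -1 is the last character
also_last = s[len(s) - 1]  # same character, counted from the front
first = s[-12]             # -len(s) is the first character

print(last, also_last, first)   # a a I

try:
    s[12]                  # valid indices run from 0 to 11
except IndexError as err:
    message = str(err)
    print('IndexError:', message)
```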

### String Slices

- Slicing a string means accessing part of it by specifying a range of indices. 

- We still use square brackets, but instead of a single index, we use two indices separated by a colon. The first index specifies the starting position of the slice (inclusive), and the second index specifies the ending position of the slice (exclusive).  

- If the first index is omitted it implies 0, and if the second is omitted the slice runs through the end of the string

- You can also have a third index (step size, defaults to 1 if omitted)

`string[start:stop:step]`

## Slicing examples

The main difference between indexing and slicing a string is that indexing returns a single character, while slicing returns a substring that can contain zero or more characters.

In [37]:
# slice from the beginning of the string (index 0) up to index 6 (exclusive)
print(s[:6])  # output: "I love"

# slice from index 2 (the 3rd character) to the end of the string
print(s[2:])  # output: "love pizza"

# slice from index 2 up to index 6 (exclusive)
print(s[2:6])  # output: "love"

# slice every other character 
print(s[::2])  # output: "Ilv iz"

# slice the string in reverse order
print(s[::-1])  # output: "azzip evol I"

I love
love pizza
love
Ilv iz
azzip evol I
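
A few more slicing behaviors, sketched on the same string. Unlike indexing, a slice that runs past the end does not raise an error.

```python
s = "I love pizza"

from_seven = s[7:]        # index 7 to the end
last_five = s[-5:]        # negative start counts from the end
past_end = s[2:100]       # stop beyond the end is simply clipped
rejoined = s[:6] + s[6:]  # the two halves always rebuild the original
stepped = s[7:12:2]       # indices 7, 9, 11

print(from_seven)     # pizza
print(last_five)      # pizza
print(past_end)       # love pizza
print(rejoined == s)  # True
print(stepped)        # pza
```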


## Splitting a string and string methods
https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str

In [38]:
s = "I love pizza with pepperoni"

- `s` is an instance of the str class and it represents a string object. 
- A class is a template or blueprint that defines the properties and methods (associated functions) that are common to all concrete realizations (instances) of that class.
- For example, we split a string by invoking a string method

In [39]:
words = s.split()  # split the string into a list of words
print(words)  # output: ['I', 'love', 'pizza', 'with', 'pepperoni']
print(len(words))


['I', 'love', 'pizza', 'with', 'pepperoni']
5
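
With no argument, `split()` breaks on any run of whitespace. It also accepts a separator and a maximum number of splits, as this sketch with a made-up comma-separated string shows:

```python
csv_line = 'apple,banana,cherry'

fields = csv_line.split(',')            # split on commas
first_only = csv_line.split(',', 1)     # split at most once
cleaned = '  extra   spaces  '.split()  # whitespace runs collapse, ends are trimmed

print(fields)       # ['apple', 'banana', 'cherry']
print(first_only)   # ['apple', 'banana,cherry']
print(cleaned)      # ['extra', 'spaces']
```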


## More string methods

In [40]:
' '.join(words)

'I love pizza with pepperoni'

In [41]:
s.title()

'I Love Pizza With Pepperoni'

In [42]:
s.upper()

'I LOVE PIZZA WITH PEPPERONI'

In [43]:
s.lower()

'i love pizza with pepperoni'

In [44]:
s.replace("pepperoni", "mushrooms")

'I love pizza with mushrooms'
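
Note that none of these methods change `s` itself. Strings are immutable, so each method returns a new string, and to keep a result you assign it to a name. A short sketch (the names `shout` and `veggie` are made up):

```python
s = "I love pizza with pepperoni"

shout = s.upper()     # a new string; s is untouched
veggie = s.replace("pepperoni", "mushrooms").title()   # methods can be chained

print(s)        # I love pizza with pepperoni
print(shout)    # I LOVE PIZZA WITH PEPPERONI
print(veggie)   # I Love Pizza With Mushrooms
```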