# Midterm Exam Study Guide

## 1. Python Basics

Type | Mutable?
----- | -----
String | No
List | Yes
Tuple | No
Set | Yes (`frozenset` is the immutable variant)
Dictionary | Yes (but keys must be immutable, i.e. hashable)

Proving the mutability of lists

In [2]:
temp_list = [1, 2, 3]
print(temp_list)
temp_list.append(4)
print(temp_list)

[1, 2, 3]
[1, 2, 3, 4]


Proving the mutability of dictionaries

In [3]:
my_dict = {
    "brand": "Subaru",
    "model": "Outback",
    "year": 2011
}
print(my_dict)
my_dict["color"] = "black" #adding a new item to the dictionary
print(my_dict)

{'brand': 'Subaru', 'model': 'Outback', 'year': 2011}
{'brand': 'Subaru', 'model': 'Outback', 'year': 2011, 'color': 'black'}
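
For contrast, here is a minimal sketch of what happens when you try to change an immutable type in place, and why a mutable object cannot be used as a dictionary key. The variable names (`temp_tuple`, `temp_str`, `bad_dict`) are illustrative and not part of the original notes.

```python
# Immutable types reject in-place modification with a TypeError
temp_tuple = (1, 2, 3)
try:
    temp_tuple[0] = 99          # tuples do not support item assignment
    tuple_changed = True
except TypeError:
    tuple_changed = False

temp_str = "abc"
try:
    temp_str[0] = "x"           # strings do not support item assignment
    str_changed = True
except TypeError:
    str_changed = False

# Dictionary keys must be hashable, so a (mutable) list cannot be a key
try:
    bad_dict = {[1, 2]: "value"}
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(tuple_changed, str_changed, list_key_allowed)  # False False False
```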


What is Aliasing?
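- Aliasing is when two variable names refer to the same object, similar to a reference in C++. Assigning one variable to another does not copy the object; it binds a second name to it
- In the example below, `num2` is assigned from `num1`, so both names have the same value and the same `id`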

In [4]:
num1 = 5
num2 = num1
print("num1 =", num1, id(num1))
print("num2 =", num2, id(num2))

num1 = 5 140729725886336
num2 = 5 140729725886336
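
Aliasing matters most with mutable objects, because a change made through one name is visible through the other. The sketch below uses made-up names (`list_a`, `list_b`, `list_c`) to contrast an alias with a copy.

```python
list_a = [1, 2, 3]
list_b = list_a          # alias: both names refer to the same list object
list_c = list_a.copy()   # shallow copy: a new, separate list object

list_b.append(4)         # mutate the list through the alias

print(list_a)            # [1, 2, 3, 4]  (changed through list_b)
print(list_a is list_b)  # True
print(list_c)            # [1, 2, 3]     (the copy is unaffected)
print(list_a is list_c)  # False
```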


What is inheritance?
- Inheritance is creating a class (the child) that inherits the attributes and methods of a parent class. This cuts down on redundant code and makes it reusable

In [5]:
class Person: #creating parent class Person
    def __init__(self, fname, lname):
        self.first_name = fname
        self.last_name = lname
    
    def print_name(self):
        print(self.first_name, self.last_name)

x = Person("Zach", "Fechko")
x.print_name()

Zach Fechko


In [6]:
#creating child class Student that inherits from the Person class
class Student(Person): #by putting the parent class in parentheses, Student inherits from Person
    pass

y = Student("Anthony", "Ghimpu")
y.print_name()

Anthony Ghimpu
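
A child class usually adds or overrides something instead of just using `pass`. The sketch below is one possible extension, not part of the original notes: the `grad_year` attribute, its value, and the `full_name` method are hypothetical and only there to show `super()`.

```python
class Person:
    def __init__(self, fname, lname):
        self.first_name = fname
        self.last_name = lname

    def full_name(self):
        return f"{self.first_name} {self.last_name}"


class Student(Person):
    def __init__(self, fname, lname, grad_year):
        super().__init__(fname, lname)   # reuse the parent's initializer
        self.grad_year = grad_year       # hypothetical extra attribute

    def full_name(self):
        # override the parent method, but build on its result
        return f"{super().full_name()} (class of {self.grad_year})"


s = Student("Anthony", "Ghimpu", 2024)   # 2024 is an illustrative value
print(s.full_name())                     # Anthony Ghimpu (class of 2024)
print(isinstance(s, Person))             # True: a Student is also a Person
```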


Functions vs Methods
- **Methods** are member functions, or functions that are defined as part of a class
- **Functions** are defined outside of a class
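
A small sketch of the difference, using hypothetical names (`shout`, `Greeter`, `greet`): the function is called on its own, while the method is called on an instance and receives that instance as `self`.

```python
def shout(text):                 # function: defined outside any class
    return text.upper() + "!"


class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self):             # method: defined inside a class
        return shout("hello " + self.name)


g = Greeter("Zach")
print(shout("hi"))               # HI!
print(g.greet())                 # HELLO ZACH!
print(Greeter.greet(g))          # same call, passing the instance explicitly
```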

## 2. Python for Data Analytics

What is the `ndarray` class? 
- `ndarray` means *N-dimensional array*. It is NumPy's array type: a multidimensional container of items that all have the same type and size. Its size is fixed when the array is created

In [7]:
import numpy as np
# A 2-dimensional array of size 2x3 composed of 4-byte integer values
x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)
type(x)


numpy.ndarray
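
The "same type and size" part of the definition can be checked through the array's attributes. This is a short sketch that rebuilds the same 2x3 array; the `mixed` array is an added illustration of how NumPy picks one common type.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

print(x.ndim)      # 2       number of dimensions
print(x.shape)     # (2, 3)  rows, columns
print(x.dtype)     # int32   every item has this type
print(x.itemsize)  # 4       bytes per item
print(x.nbytes)    # 24      6 items * 4 bytes

# Mixed input is converted to one common type, here float64
mixed = np.array([1, 2.5])
print(mixed.dtype) # float64
```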

What is NumPy vectorizing?
- Vectorizing means applying an operation to a whole array at once instead of writing a Python loop over one value at a time. This cuts down on runtime and makes your code cleaner
- Vectorized operations run faster because the loop over the elements happens inside NumPy's pre-compiled C code, on a block of same-typed values, instead of in the Python interpreter, which has to dispatch and type-check every iteration
- As you can see below, `multiply_arrays` runs in a little over half the time of `multiply_lists`, even on only four elements

In [8]:
import timeit
def multiply_lists(li_a, li_b):
    for i in range(len(li_a)):
        li_a[i] * li_b[i]

def multiply_arrays(arr_a, arr_b):
    arr_a * arr_b

li_a = [1, 2, 3, 4]
li_b = [5, 6, 7, 8]

arr_a = np.array(li_a)
arr_b = np.array(li_b)

%timeit -n 10000 -r 5 multiply_lists(li_a, li_b)
%timeit -n 10000 -r 5 multiply_arrays(arr_a, arr_b)

844 ns ± 117 ns per loop (mean ± std. dev. of 5 runs, 10000 loops each)
466 ns ± 27.2 ns per loop (mean ± std. dev. of 5 runs, 10000 loops each)
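
Four elements is a very small test, so here is a hedged sketch of the same comparison on larger inputs. It uses `timeit.timeit` instead of the `%timeit` magic so it also runs outside a notebook. The size `n` is an arbitrary choice, the functions return their results (unlike the cell above) so they can be compared, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

n = 100_000                                   # arbitrary size for illustration
li_a = list(range(n))
li_b = list(range(n))
arr_a = np.arange(n, dtype=np.int64)          # int64 avoids overflow in the products
arr_b = np.arange(n, dtype=np.int64)

def multiply_lists(a, b):
    # element-by-element loop in the Python interpreter
    return [x * y for x, y in zip(a, b)]

def multiply_arrays(a, b):
    # one vectorized operation; the loop runs inside NumPy
    return a * b

loop_time = timeit.timeit(lambda: multiply_lists(li_a, li_b), number=10)
vec_time = timeit.timeit(lambda: multiply_arrays(arr_a, arr_b), number=10)
print(f"list loop:  {loop_time:.4f} s for 10 runs")
print(f"vectorized: {vec_time:.4f} s for 10 runs")

# Both approaches compute the same values
results_match = multiply_lists(li_a, li_b) == multiply_arrays(arr_a, arr_b).tolist()
print(results_match)
```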


What is NumPy broadcasting?
- Broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. The smaller array is "broadcast" across the larger array so that the two have compatible shapes
- In the example below, the scalar `b = 2` is stretched to the shape of `a` so that `a` can be multiplied by `b` element by element

In [9]:
a = np.array([1, 2, 3])
b = 2
a * b #when multiplying, b gets turned from 2 into [2, 2, 2] in order to make them the same size

array([2, 4, 6])
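
Broadcasting also works between arrays, not just with a scalar. The sketch below uses made-up arrays (`matrix`, `row`, `col`) to show a row and a column being stretched across a 2x3 array, plus one shape that cannot be broadcast.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])          # shape (2, 3)
row = np.array([10, 20, 30])            # shape (3,)   stretched down the rows
col = np.array([[100], [200]])          # shape (2, 1) stretched across the columns

print(matrix + row)   # [[11 22 33]
                      #  [14 25 36]]
print(matrix + col)   # [[101 102 103]
                      #  [204 205 206]]

# Shapes (2, 3) and (2,) are not compatible: the trailing dimensions are 3 and 2
try:
    matrix + np.array([1, 2])
    compatible = True
except ValueError:
    compatible = False
print(compatible)     # False
```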

## 3. Algorithm Analysis

Algorithm analysis is checking how efficiently a function uses its resources, in terms of either memory or runtime, as the size of its input grows

- **Time Complexity** is the amount of time a function takes to run based on the size of its inputs
- **Space Complexity** is the amount of memory a function takes during runtime based on the size of its inputs

- <font color='green'>Constant time</font> = $O(1)$
- <font color='lime'>Logarithmic</font> = $O(\log n)$
- <font color='yellow'>Linear time</font> = $O(n)$
- <font color='orange'>Log linear</font> = $O(n \log n)$
- <font color='salmon'>Quadratic</font> = $O(n^2)$
- <font color='salmon'>Polynomial</font> = $O(n^k)$
- <font color='salmon'>Exponential</font> = $O(2^n)$
- <font color='salmon'>Factorial</font> = $O(n!)$

Operation counting example
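```python
# each line is assumed to cost some constant number of operations k
def count_to(n):                # renamed so it does not shadow the built-in sum()
    total = 0                   #k
    for i in range(0, n, 1):    #k |
        total += 1              #k |  repeated n times
    return total                #k
```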
Time Complexity ~ No. of operations <br>
$= k + n \cdot (k + k) + k$ <br>
$= 2kn + 2k$ <br>
$= O(n)$, i.e. linear time
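
The count can also be checked by actually counting. In this sketch (the functions `count_linear` and `count_quadratic` are illustrative, not from the notes), doubling `n` doubles the work of a single loop but quadruples the work of a nested loop.

```python
def count_linear(n):
    ops = 0
    for _ in range(n):          # body runs n times
        ops += 1
    return ops

def count_quadratic(n):
    ops = 0
    for _ in range(n):          # outer loop: n times
        for _ in range(n):      # inner loop: n times per outer iteration
            ops += 1
    return ops

for n in (10, 20, 40):
    print(n, count_linear(n), count_quadratic(n))
# 10 10 100
# 20 20 400
# 40 40 1600
```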

**Asymptotic Analysis** <br>
How does $T(n)$ vary with $n$?
- <font color='green'>Constant</font>: $T(n) = k = O(1)$
- <font color='yellow'>Linear</font>: $T(n) = a \cdot n + b = O(n)$
- <font color='salmon'>Quadratic</font>: $T(n) = a \cdot n^2 + b \cdot n + c = O(n^2)$
- <font color='salmon'>Exponential</font>: $T(n) = a \cdot 2^n + b = O(2^n)$
- <font color='salmon'>Factorial</font>: $T(n) = a \cdot n! + b = O(n!)$

A hack for determining big O notation is to keep only the fastest-growing term of $T(n)$ and drop its constant coefficient. For example, in the Quadratic one the fastest-growing term is the $n^2$ term, therefore the time complexity is $O(n^2)$
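
To see why the lower-order terms can be dropped, the sketch below evaluates a quadratic $T(n)$ with made-up coefficients (`a=3`, `b=50`, `c=1000` are assumptions chosen only for illustration) and divides by $n^2$. As `n` grows, the ratio settles near `a`, so the $n^2$ term dominates.

```python
def T(n, a=3, b=50, c=1000):
    # hypothetical quadratic cost function: a*n^2 + b*n + c
    return a * n**2 + b * n + c

for n in (10, 1_000, 100_000):
    print(n, T(n) / n**2)
# 10 18.0
# 1000 3.051
# 100000 3.0005001
```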

Time complexities of common algorithms:
- Linear Search: <font color='yellow'>$O(n)$</font>
- Binary Search: <font color='lime'>$O(\log n)$</font>
- Merge Sort: <font color = 'orange'>$O(n \log n)$</font>
- Bubble Sort: <font color='salmon'>$O(n^2)$</font>
- Selection Sort: <font color='salmon'>$O(n^2)$</font>
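- Shell sort: <font color='salmon'>$O(n (\log n)^2)$</font> (depends on the gap sequence; this bound holds for Pratt's gaps, while Shell's original gaps give $O(n^2)$ in the worst case)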
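
As a rough illustration of the gap between $O(n)$ and $O(\log n)$, the sketch below counts how many elements each search inspects. These are minimal versions written for this guide, with a hypothetical `steps` counter added; binary search assumes the input is already sorted.

```python
def linear_search(items, target):
    steps = 0
    for index, value in enumerate(items):
        steps += 1                       # one element inspected
        if value == target:
            return index, steps
    return -1, steps

def binary_search(sorted_items, target):
    steps = 0
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        steps += 1                       # one element inspected
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            low = mid + 1                # discard the lower half
        else:
            high = mid - 1               # discard the upper half
    return -1, steps

data = list(range(1024))                 # sorted input, n = 1024
print(linear_search(data, 1023))         # worst case: inspects all 1024 elements
print(binary_search(data, 1023))         # inspects at most 11 elements
```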