# Final Exam Review

## Topics Covered
- Everything we covered during lectures and assignments except Python packages
- Python basics
- Algorithm Analysis
    - What is algorithm analysis?
    - Time vs space complexity
    - Big O Notation
- Sorting Algorithms
- Data Structures 
- Hashing
    - Why use hashing?
    - When is hashing appropriate?
- Trees
- Graphs


## Hashing
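Hashing maps a key to a slot position (its hash value) in a table. That is, there is a function $f$ such that $f(\text{key}) = \text{hashvalue}$. Ideally every key gets its own slot, but two different keys can map to the same slot (a collision).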

**Types of hash methods**
- Truncation
- Folding
- Radix Conversion
- Remainder method (the method we covered in class)
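
As a minimal sketch of the remainder method, the slot is the key modulo the table size. The function name `remainder_hash`, the table size of 11, and the sample keys are assumptions chosen for illustration.

```python
def remainder_hash(key: int, table_size: int) -> int:
    """Remainder method: the slot is the key modulo the table size."""
    return key % table_size


table_size = 11  # assumed table size (a prime number)
keys = [54, 26, 93, 17, 77, 31]  # made-up sample keys

slots = [remainder_hash(key, table_size) for key in keys]
print(slots)  # [10, 4, 5, 6, 0, 9]
```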

**Why use hashing?**
- Hashing allows for very fast insertion and searching, <mark style='background-color:green'>$O(1)$</mark> on average, because the hash value points directly at the slot to check. This assumes collisions are rare

**When is hashing appropriate?**
- When we don't care about the order that the data is in
- When we want an alternative to binary search on sorted data

**What is a collision?**
- A collision occurs when two or more different keys produce the same hash value
- *How do you handle a collision?*
    - Linear Probing: Starting from the collided slot, check the following slots one by one (wrapping around to the start of the table) until an empty slot is found
    - Chaining: Have a list or a linked list at each slot in the hashtable/map and append to it
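
Both collision strategies can be sketched in a few lines. This is an illustrative sketch, not a full hash table: the names `linear_probe_insert` and `ChainedHashTable`, the table size of 11, and the integer keys hashed with the remainder method are all assumptions.

```python
def linear_probe_insert(slots, key):
    """Insert key with linear probing and return the slot index used."""
    size = len(slots)
    start = key % size
    for step in range(size):
        index = (start + step) % size  # wrap around at the end of the table
        if slots[index] is None:
            slots[index] = key
            return index
    raise RuntimeError("hash table is full")


class ChainedHashTable:
    """Hash table where each slot holds a list (chain) of (key, value) pairs."""

    def __init__(self, size=11):
        self.size = size
        self.slots = [[] for _ in range(size)]

    def put(self, key, value):
        chain = self.slots[key % self.size]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)  # overwrite an existing key
                return
        chain.append((key, value))

    def get(self, key):
        for existing_key, value in self.slots[key % self.size]:
            if existing_key == key:
                return value
        return None


# 44, 55 and 66 all hash to slot 0 in a table of size 11.
probe_slots = [None] * 11
probe_indices = [linear_probe_insert(probe_slots, k) for k in (44, 55, 66)]
print(probe_indices)  # [0, 1, 2]

table = ChainedHashTable()
table.put(44, "first")
table.put(55, "second")
print(len(table.slots[0]))  # 2
```

With chaining, a lookup costs time proportional to the length of the chain in that slot, which is why keeping chains short matters.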

**How can you make hashing better?**
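- Use a larger hash table, so that fewer keys compete for each slot
- Make the size of the hash table/map a prime number, which tends to spread keys more evenly when using the remainder method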

## Algorithm Analysis
- **Time Complexity**: The amount of time an algorithm takes based on the size of its inputs
- **Space Complexity**: The amount of memory an algorithm takes up based on the size of its inputs
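
To make the time versus space trade-off concrete, here is a hedged sketch of two ways to detect duplicates in a list. The function names are hypothetical. The first uses no extra memory but compares every pair, and the second trades $O(n)$ extra memory for a single pass.

```python
def has_duplicates_quadratic(items):
    """O(n^2) time, O(1) extra space: compare every pair of items."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    """O(n) average time, O(n) extra space: remember items already seen."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


print(has_duplicates_quadratic([3, 1, 4, 1]))  # True
print(has_duplicates_linear([3, 1, 4, 5]))     # False
```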

<img src='../images/bigo.png' style='height:50%;width:50%'>

### Sorting Algorithm Complexities
Operation | Best Time | Average Time | Worst Time | Space
----- | ----- | ----- | -----| -----
Bubble Sort | <mark>$O(n)$</mark> | <mark style='background-color:salmon'>$O(n^{2})$</mark> | <mark style='background-color:salmon'>$O(n^{2})$</mark> |  <mark style='background-color:green'>$O(1)$</mark>
Insertion Sort | <mark>$O(n)$</mark> | <mark style='background-color:salmon'>$O(n^{2})$</mark> | <mark style='background-color:salmon'>$O(n^{2})$</mark> |  <mark style='background-color:green'>$O(1)$</mark>
Selection Sort | <mark style='background-color:salmon'>$O(n^{2})$</mark> | <mark style='background-color:salmon'>$O(n^{2})$</mark> | <mark style='background-color:salmon'>$O(n^{2})$</mark> |  <mark style='background-color:green'>$O(1)$</mark>
Shell Sort | <mark style='background-color:orange'>$O(nlog(n))$</mark> | <mark style='background-color:salmon'>$O(n(log(n))^{2})$</mark> | <mark style='background-color:salmon'>$O(n(log(n))^{2})$</mark> |  <mark style='background-color:green'>$O(1)$</mark>
Heap Sort | <mark style='background-color:orange'>$O(nlog(n))$</mark> | <mark style='background-color:orange'>$O(nlog(n))$</mark> | <mark style='background-color:orange'>$O(nlog(n))$</mark> |  <mark style='background-color:green'>$O(1)$</mark>
Merge Sort | <mark style='background-color:orange'>$O(nlog(n))$</mark> | <mark style='background-color:orange'>$O(nlog(n))$</mark> | <mark style='background-color:orange'>$O(nlog(n))$</mark> | <mark>$O(n)$</mark>
Quick Sort | <mark style='background-color:orange'>$O(nlog(n))$</mark> | <mark style='background-color:orange'>$O(nlog(n))$</mark> | <mark style='background-color:salmon'>$O(n^{2})$</mark> | <mark style='background-color:lime'>$O(log(n))$</mark>
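
The $O(n)$ best case listed for bubble sort assumes a variant that stops as soon as a full pass makes no swaps. The sketch below is one such variant. Returning the pass count is an addition for illustration only.

```python
def bubble_sort(items):
    """Return a sorted copy of items and the number of passes made."""
    items = list(items)  # work on a copy
    passes = 0
    for end in range(len(items) - 1, 0, -1):
        swapped = False
        passes += 1
        for i in range(end):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:
            break  # already sorted: stop early
    return items, passes


print(bubble_sort([1, 2, 3, 4, 5]))  # ([1, 2, 3, 4, 5], 1)
print(bubble_sort([5, 4, 3, 2, 1]))  # ([1, 2, 3, 4, 5], 4)
```

Already sorted input needs a single pass (best case), while reversed input needs a pass for nearly every element (worst case).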

### Data Structure Operations
Structure | Average Access | Average Search | Average Insertion | Average Deletion
----- | ----- | ----- | ----- | -----
Linked List | <mark>$O(n)$</mark> | <mark>$O(n)$</mark> | can be either <mark>$O(n)$</mark> or <mark style='background-color:green'>$O(1)$</mark> | can be either <mark>$O(n)$</mark> or <mark style='background-color:green'>$O(1)$</mark>
Stack | <mark>$O(n)$</mark> | <mark>$O(n)$</mark> | <mark style='background-color:green'>$O(1)$</mark> | <mark style='background-color:green'>$O(1)$</mark>
Queue | <mark>$O(n)$</mark> | <mark>$O(n)$</mark> | <mark style='background-color:green'>$O(1)$</mark> | <mark style='background-color:green'>$O(1)$</mark>
Binary Search Tree | <mark style='background-color:lime'>$O(log(n))$</mark> | <mark style='background-color:lime'>$O(log(n))$</mark> | <mark style='background-color:lime'>$O(log(n))$</mark> | <mark style='background-color:lime'>$O(log(n))$</mark>
AVL Tree | <mark style='background-color:lime'>$O(log(n))$</mark> | <mark style='background-color:lime'>$O(log(n))$</mark> | <mark style='background-color:lime'>$O(log(n))$</mark> | <mark style='background-color:lime'>$O(log(n))$</mark>
HashTable/HashMap | <mark style='background-color:grey'>N/A</mark> | <mark style='background-color:green'>$O(1)$</mark> | <mark style='background-color:green'>$O(1)$</mark> | <mark style='background-color:green'>$O(1)$</mark>
Binary Heap | $O(1)$ | $O(n)$ | <mark style='background-color:lime'>$O(log(n))$</mark> | <mark style='background-color:lime'>$O(log(n))$</mark>

**Note**: A binary search tree has a worst-case time of <mark>$O(n)$</mark> per operation if all the nodes are arranged in a line. Tree operations can also be expressed as $O(h)$, where $h$ is the height of the tree.



## Trees
A tree is a non-linear, hierarchical data structure consisting of a collection of nodes (elements) and a collection of edges between the nodes
- Every node except the root has exactly one parent node, and each node can have zero or more child nodes
- A leaf node is a node that has no children
- A tree is built up of subtrees, which are trees themselves

### Height vs Depth
- The **Height** of a node in a tree is the number of edges on the longest path from that node down to a leaf node
- The **Depth** of a node in a tree is the number of edges from that node up to the root node
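
Both definitions can be sketched recursively on a small binary tree. The `Node` class and the sample values are assumptions for illustration, and giving an empty tree a height of -1 is a common convention rather than something stated in these notes.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def height(node):
    """Edges on the longest path from node down to a leaf (-1 for an empty tree)."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


def depth(root, target, current_depth=0):
    """Edges from the root down to the node holding target, or None if absent."""
    if root is None:
        return None
    if root.value == target:
        return current_depth
    found = depth(root.left, target, current_depth + 1)
    if found is not None:
        return found
    return depth(root.right, target, current_depth + 1)


#        8
#       / \
#      3   10
#     / \
#    1   6
#       /
#      4
root = Node(8, Node(3, Node(1), Node(6, Node(4))), Node(10))

print(height(root))    # 3
print(depth(root, 4))  # 3
print(depth(root, 10)) # 1
```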

### Tree Traversal
The three main ways to traverse a binary tree are
1. **Pre-Order**: Process data, visit left subtree, visit right subtree
2. **In Order**: Visit left subtree, process data, visit right subtree
3. **Post-Order**: Visit left subtree, visit right subtree, process data
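
The three orders differ only in where the "process data" step sits relative to the two recursive calls. A minimal sketch, assuming a simple `Node` class and treating "process" as appending the value to a list:

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def pre_order(node):
    if node is None:
        return []
    return [node.value] + pre_order(node.left) + pre_order(node.right)


def in_order(node):
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)


def post_order(node):
    if node is None:
        return []
    return post_order(node.left) + post_order(node.right) + [node.value]


#        4
#      /   \
#     2     6
#    / \   / \
#   1   3 5   7
root = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))

print(pre_order(root))   # [4, 2, 1, 3, 6, 5, 7]
print(in_order(root))    # [1, 2, 3, 4, 5, 6, 7]
print(post_order(root))  # [1, 3, 2, 5, 7, 6, 4]
```

For a binary search tree, the in-order traversal visits the values in sorted order, as the second line of output shows.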

### Binary Tree Properties
- **Full Binary Tree**: Every node in the tree has either zero or two child nodes
- **Complete Binary Tree**: Every level of the tree is full except possibly the deepest level, and all the nodes in the deepest level are as far left as possible
- **Balanced Tree**: A tree is balanced if the left and right subtrees of any node have a height that differs by not more than 1
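
The full and balanced properties translate directly into recursive checks. This is a simple sketch with hypothetical helper names, and it is not optimized: `is_balanced` recomputes heights at every node.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def height(node):
    """Height in edges, with -1 for an empty tree (assumed convention)."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


def is_full(node):
    """True if every node has either zero or two children."""
    if node is None:
        return True
    if (node.left is None) != (node.right is None):
        return False  # exactly one child
    return is_full(node.left) and is_full(node.right)


def is_balanced(node):
    """True if subtree heights differ by at most 1 at every node."""
    if node is None:
        return True
    if abs(height(node.left) - height(node.right)) > 1:
        return False
    return is_balanced(node.left) and is_balanced(node.right)


small = Node(1, Node(2), Node(3))   # full and balanced
chain = Node(1, Node(2, Node(3)))   # nodes in a line: neither

print(is_full(small), is_balanced(small))  # True True
print(is_full(chain), is_balanced(chain))  # False False
```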

### Binary Search Tree Properties
- The left subtree of a node only contains values that are less than the node's value
- The right subtree of a node only contains values that are greater than the node's value
- Each subtree of a node must also be a Binary Search Tree
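
These properties are what make insertion and search follow a single path from the root. The sketch below assumes a simple `Node` class and ignores duplicate values, which is a choice made here for illustration. It also shows how inserting already sorted values produces a tree whose height grows with $n$.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def insert(root, value):
    """Insert value and return the (possibly new) root; duplicates are ignored."""
    if root is None:
        return Node(value)
    if value < root.value:
        root.left = insert(root.left, value)
    elif value > root.value:
        root.right = insert(root.right, value)
    return root


def search(root, value):
    """Walk down one path, going left or right at each node."""
    while root is not None:
        if value == root.value:
            return True
        root = root.left if value < root.value else root.right
    return False


def height(node):
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


mixed = None
for v in [8, 3, 10, 1, 6]:
    mixed = insert(mixed, v)

line = None
for v in [1, 2, 3, 4, 5]:  # sorted input: every node goes to the right
    line = insert(line, v)

print(search(mixed, 6), search(mixed, 7))  # True False
print(height(mixed), height(line))         # 2 4
```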

<img src="../images/example tree.png">

### AVL Trees
**Why is it important to make sure a BST is balanced?**
- Balance maintains the <mark style='background-color:lime'>$O(log(n))$</mark> search time. In an unbalanced tree the nodes could end up in a straight line, at which point the tree behaves like a linked list and searching takes <mark>$O(n)$</mark> time

**What is the balance factor of a node?**
- The balance factor (BF) of a node is the difference between the heights of its left and right subtrees. If $|\text{BF}| > 1$ at some node, the tree is rotated to rebalance it
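
As a rough sketch, the balance factor can be computed from subtree heights, and a single right rotation repairs a left-heavy chain. The left-minus-right sign convention, the -1 height for an empty tree, and the function names are assumptions. A real AVL tree also needs left and double rotations, which are omitted here.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def height(node):
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


def balance_factor(node):
    """Height of the left subtree minus height of the right subtree."""
    return height(node.left) - height(node.right)


def rotate_right(node):
    """Lift the left child to be the new root of this subtree."""
    new_root = node.left
    node.left = new_root.right
    new_root.right = node
    return new_root


#     3
#    /
#   2        left-heavy chain
#  /
# 1
root = Node(3, Node(2, Node(1)))
before = balance_factor(root)

root = rotate_right(root)  # now 2 is the root with children 1 and 3
after = balance_factor(root)

print(before, after)  # 2 0
```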

## Heaps

### Priority Queue
A priority queue is a queue that orders the items in the queue by their *priority*. The items with the highest priority are at the front of the queue and the items with the lowest priority are at the back of the queue. If an item with a high priority is enqueued, it will be stored toward (or possibly at) the front of the queue. It will thus be one of the first (or *the* first) items dequeued from the queue.
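
Python's standard `heapq` module can serve as a priority queue. In this sketch the items are `(priority, task)` tuples, and the smallest number is treated as the highest priority. The tasks and that numbering convention are assumptions for illustration.

```python
import heapq

queue = []
heapq.heappush(queue, (3, "write report"))
heapq.heappush(queue, (1, "fix outage"))
heapq.heappush(queue, (2, "review code"))

# Items come out by priority, not in the order they were enqueued.
order = []
while queue:
    priority, task = heapq.heappop(queue)
    order.append(task)

print(order)  # ['fix outage', 'review code', 'write report']
```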

### Binary Heap Properties
#### Structure Property
- Complete Binary Tree
#### Heap Order Property
- For every parent node *p* with child nodes *m* and *n* (the comparisons flip to >= for a Max Heap):
    - *p* item <= *m* item (Min Heap)
    - *p* item <= *n* item (Min Heap)
- Note:
    - The min (or max) value is always in the root node
    - No relationship between *m* item and *n* item
    - Duplicate values are allowed
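
Because a binary heap is a complete binary tree, it is often stored in a list. The sketch below assumes the common layout in which the children of index `i` sit at `2*i + 1` and `2*i + 2`, which these notes do not state, and checks the min heap order property.

```python
def is_min_heap(items):
    """True if every parent is <= each of its children in the list layout."""
    for child in range(1, len(items)):
        parent = (child - 1) // 2
        if items[parent] > items[child]:
            return False
    return True


#        1
#      /   \
#     3     2
#    / \   /
#   7   4 2
print(is_min_heap([1, 3, 2, 7, 4, 2]))  # True (duplicates are allowed)
print(is_min_heap([2, 1, 3]))           # False (child 1 is smaller than root 2)
```

Note that the siblings 3 and 2 in the first example are in no particular order, which matches the point that there is no relationship between the two children.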

### Min vs Max Heap
In a min heap, the smallest item has the highest priority and sits at the root of the tree. In a max heap, the largest item has the highest priority and sits at the root.

## Graphs
A graph is an abstract data type representing nodes (vertices) and their connections (edges). It can be thought of as a generalized tree where a node can be connected to any other node

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/5b/6n-graf.svg/640px-6n-graf.svg.png" width="300">

### Terminology
1. **Adjacent**: two vertices $v_k$ and $v_l$ are adjacent if they are connected by an edge ($(v_k, v_l) \in E$)
    - Ex: vertices 4 and 6 are adjacent
    - Ex: vertices 4 and 2 are *not* adjacent
2. **Path**: a sequence of edges leading from a source (starting) vertex to a destination (ending) vertex
    - Ex: a path from 1 to 6 is: (1, 2), (2, 3), (3, 4), (4, 6)
3. **Path Length**: The number of edges in the path
    - Ex: The length of the above path is 4 edges
4. **Distance**: The distance between two vertices is the path length for the shortest path between two vertices
    - Ex: the shortest path from 1 to 6 is: (1, 5), (5, 4), (4, 6) which has a path length of 3, therefore the distance from 1 to 6 is 3
5. **Cycle**: A path that starts and ends at the same vertex. Graphs without cycles are called *acyclic*
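
The terminology can be checked in code. The edge list below is an assumption reconstructed from the figure and the examples above, stored as an undirected adjacency structure. Distance is computed with a breadth first search, which finds shortest paths when every edge counts as one step.

```python
from collections import deque

# Assumed edges of the example graph (undirected).
edges = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]

graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)


def are_adjacent(graph, u, v):
    return v in graph[u]


def distance(graph, source, destination):
    """Number of edges on a shortest path, or None if unreachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        if vertex == destination:
            return dist[vertex]
        for neighbor in graph[vertex]:
            if neighbor not in dist:
                dist[neighbor] = dist[vertex] + 1
                queue.append(neighbor)
    return None


print(are_adjacent(graph, 4, 6))  # True
print(are_adjacent(graph, 4, 2))  # False
print(distance(graph, 1, 6))      # 3
```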

### Weighted Graph
Each edge in a weighted graph has an associated "weight" or value. This weight represents the cost to move from one vertex to another. The shortest path between two vertices in a weighted graph is the path with the smallest total edge weight, rather than the fewest edges
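
A small made-up example shows why fewer edges does not mean a shorter path in a weighted graph. The vertices, weights, and the helper `path_weight` are hypothetical.

```python
# Made-up edge weights: a direct but expensive edge and a cheap detour.
weights = {
    ("A", "D"): 10,
    ("A", "B"): 1,
    ("B", "C"): 1,
    ("C", "D"): 1,
}


def path_weight(path):
    """Total cost of a path given as a list of edges."""
    return sum(weights[edge] for edge in path)


direct = [("A", "D")]
detour = [("A", "B"), ("B", "C"), ("C", "D")]

print(len(direct), path_weight(direct))  # 1 10
print(len(detour), path_weight(detour))  # 3 3
```

The detour uses three edges instead of one, yet its total weight is smaller, so it is the shorter path from A to D.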

### Graph Traversal Algorithms
- **Breadth First Search**: Makes use of a <mark style='background-color:cyan'>Queue</mark> and visits all the vertices adjacent to the current vertex before moving further away from the start
- **Depth First Search**: Makes use of a <mark style='background-color:cyan'>Stack</mark> and follows each path as deep as it goes, then backtracks to explore the next adjacent vertex
- Both of them have a time complexity of $O(V + E)$ where $V$ is the number of vertices and $E$ is the number of edges
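
The two traversals can be sketched with nearly identical code, where the main difference is the queue versus the stack. The adjacency lists below are an assumption based on the example graph, and the visiting order depends on the order in which neighbors are listed.

```python
from collections import deque

# Assumed adjacency lists for the example graph, neighbors in ascending order.
graph = {
    1: [2, 5],
    2: [1, 3, 5],
    3: [2, 4],
    4: [3, 5, 6],
    5: [1, 2, 4],
    6: [4],
}


def bfs(graph, start):
    """Visit vertices level by level using a queue."""
    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for neighbor in graph[vertex]:
            if neighbor not in seen:
                seen.add(neighbor)
                visited.append(neighbor)
                queue.append(neighbor)
    return visited


def dfs(graph, start):
    """Follow one path as deep as possible using a stack, then backtrack."""
    visited = []
    seen = set()
    stack = [start]
    while stack:
        vertex = stack.pop()
        if vertex in seen:
            continue
        seen.add(vertex)
        visited.append(vertex)
        # Push in reverse so the smallest neighbor is explored first.
        for neighbor in reversed(graph[vertex]):
            if neighbor not in seen:
                stack.append(neighbor)
    return visited


print(bfs(graph, 1))  # [1, 2, 5, 3, 4, 6]
print(dfs(graph, 1))  # [1, 2, 3, 4, 5, 6]
```

Each vertex is added to the visited list once and each adjacency list is scanned once, which is where the $O(V + E)$ bound comes from.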