# 1. Algorithm analysis and Big-O notation
- **Describes how an algorithm's running time or memory use grows relative to the size of its input, as the input size approaches infinity**
    - Usually quoted for the worst-case scenario, in terms of the execution time required or the memory space used
    - E.g. O(n): Runtime grows linearly with the input size

<table>
<tr>
    <th><strong>Big-O</strong></th>
    <th><strong>Name</strong></th>
</tr>
<tr>
    <td>1</td>
    <td>Constant</td>
</tr>
<tr>
    <td>log(n)</td>
    <td>Logarithmic</td>
</tr>
    <tr><td>n</td>
    <td>Linear</td>
</tr>
    <tr><td>nlog(n)</td>
    <td>Log Linear</td>
</tr>
    <tr><td>n^2</td>
    <td>Quadratic</td>
</tr>
    <tr><td>n^3</td>
    <td>Cubic</td>
</tr>
    <tr><td>2^n</td>
    <td>Exponential</td>
</tr>
</table>

In [None]:
from math import log
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('bmh')

# Set up runtime comparisons
n = np.linspace(1,10,1000)
labels = ['Constant','Logarithmic','Linear','Log Linear','Quadratic','Cubic','Exponential']
big_o = [np.ones(n.shape),np.log(n),n,n*np.log(n),n**2,n**3,2**n]

# Plot setup
plt.figure(figsize=(12,10))
plt.ylim(0,50)

for i in range(len(big_o)):
    plt.plot(n,big_o[i],label = labels[i])


plt.legend(loc=0)
plt.ylabel('Relative Runtime')
plt.xlabel('n')

In [None]:
def comp(lst):
    '''
    This function prints the first item: O(1)
    Then it prints the first half of the list: O(n/2)
    Then it prints a string 10 times: O(10)
    '''
    print(lst[0])
    
    midpoint = len(lst) // 2
    
    for val in lst[:midpoint]:
        print(val)
        
    for x in range(10):
        print('number')

In [None]:
lst = [1,2,3,4,5,6,7,8,9,10]

comp(lst)

So let's break down the operations here. We can combine each operation to get the total Big-O of the function:

$$O(1 + n/2 + 10)$$

We can see that as n grows larger the 1 and 10 terms become insignificant and the 1/2 term multiplied against n will also not have much of an effect as n goes towards infinity. This means the function is simply O(n)!
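
As a quick sanity check of that simplification, the sketch below counts the steps `comp` performs instead of timing it. The helper name `count_ops` is made up for this illustration; it simply evaluates 1 + n/2 + 10 with integer division, mirroring the three parts of `comp`.

```python
def count_ops(n):
    # 1 print of the first item + n // 2 prints in the loop + 10 fixed prints
    return 1 + n // 2 + 10


for n in (10, 1_000, 1_000_000):
    ops = count_ops(n)
    # The ratio ops / n settles towards the constant 0.5 as n grows
    print(n, ops, ops / n)
```

The ratio approaches a constant, which is what O(n) means here: the 1 and 10 terms fade away, and the factor 1/2 changes the slope but not the linear shape of the growth.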

In [None]:
def printer(n=10):
    '''
    Prints "hello world!" n times
    '''
    for x in range(n):
        print('Hello World!')
        
printer()

- Note how we only assign the 'hello world!' variable once, not every time we print. **So the algorithm has O(1) space complexity and an O(n) time complexity**

In [None]:
def create_list(n):
    new_list = []
    
    for num in range(n):
        new_list.append('new')
    
    return new_list

print(create_list(5))

- Note how the size of the new_list object scales with the input n. **So the algorithm has O(n) space complexity and an O(n) time complexity**

# 2. Array sequences

- There are 3 main sequence classes:
    - List: [1,2,3]
    - Tuple: (1,2,3)
    - String: '123'
    
<br></br>
- **Dynamic arrays** grow their memory allocation (for example by doubling it) when they reach full capacity, before the data overflows
    - Appending has an amortized Big-O cost of **O(1)** (constant), even though a single append that triggers a resize costs O(n)
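
To see where the amortized O(1) figure comes from, here is a toy dynamic array that doubles its capacity when full and counts how many element copies the resizes cost. The class `DynamicArray` and its `copies` counter are invented for this sketch; Python's built-in `list` uses its own growth strategy, so treat this as an illustration of the idea only.

```python
class DynamicArray:
    """Toy dynamic array that doubles its capacity when full (illustrative only)."""

    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.slots = [None] * self.capacity
        self.copies = 0  # element copies spent on resizing so far

    def append(self, item):
        if self.size == self.capacity:
            self._resize(2 * self.capacity)
        self.slots[self.size] = item
        self.size += 1

    def _resize(self, new_capacity):
        new_slots = [None] * new_capacity
        for i in range(self.size):
            new_slots[i] = self.slots[i]
            self.copies += 1
        self.slots = new_slots
        self.capacity = new_capacity


arr = DynamicArray()
n = 1000
for value in range(n):
    arr.append(value)

# Total copy work stays below 2n, so the average cost per append is constant
print(arr.capacity, arr.copies, arr.copies / n)
```

For 1000 appends the resizes copy 1 + 2 + 4 + ... + 512 = 1023 elements in total, roughly one copy per append on average.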

# 3. Stacks, queues and deques
- Linear structures
    - Similar to arrays but each differs by how they:
        - Add items
        - Remove items

## Stacks
- Last in, first out (**LIFO**)

- **Stack()**:
    - creates a new stack that is empty
        - *It needs no parameters and returns an empty stack*
- **.push(item)**: 
    - Adds a new item to the top of the stack
        - *It needs the item and returns nothing*
- **.pop()**:
    - Removes the top item from the stack
        - *It needs no parameters and returns the item. The stack is modified*
- **.peek()**:
    - Returns the top item from the stack but does not remove it
        - *It needs no parameters. The stack is not modified*
- **.isEmpty()**:
    - Tests to see whether the stack is empty
        - *It needs no parameters and returns a boolean value*
- **.size()**:
    - Returns the number of items on the stack
        - *It needs no parameters and returns an integer*

In [None]:
class Stack:
    
    def __init__(self):
        self.items = []

    def isEmpty(self):
        return self.items == []

    def push(self, item):
        self.items.append(item)

    def pop(self):
        return self.items.pop()

    def peek(self):
        return self.items[-1]

    def size(self):
        return len(self.items)
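
A classic use of the LIFO behaviour is checking that brackets are balanced. The sketch below uses a plain Python list as the stack (`append` to push, `pop` to pop) so that it runs on its own; the function name `is_balanced` is an assumption made for this example.

```python
def is_balanced(text):
    """Return True if every bracket in text is closed in the right order."""
    pairs = {')': '(', ']': '[', '}': '{'}
    stack = []  # a plain list used as a stack

    for char in text:
        if char in pairs.values():
            stack.append(char)  # push an opening bracket
        elif char in pairs:
            # A closing bracket must match the most recent opening bracket
            if not stack or stack.pop() != pairs[char]:
                return False

    return not stack  # leftover opening brackets mean the text is unbalanced


print(is_balanced('([]{})'))
print(is_balanced('([)]'))
```

The first call prints `True` and the second prints `False`, because `)` arrives while `[` is still on top of the stack.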

## Queues
- First in, first out (**FIFO**)
    - **Enqueue**: Adding an item to the back of the queue
    - **Dequeue**: Removing an item from the front of the queue

- **Queue()**:
    - Creates a new queue that is empty
        - *It needs no parameters and returns an empty queue*
- **.enqueue(item):** 
    - Adds a new item to the rear of the queue
        - *It needs the item and returns nothing*
- **.dequeue():**
    - Removes the front item from the queue
        - *It needs no parameters and returns the item. The queue is modified*
- **.isEmpty():**
    - Tests to see whether the queue is empty
        - *It needs no parameters and returns a boolean value*
- **.size():**
    - Returns the number of items in the queue
        - *It needs no parameters and returns an integer*

In [None]:
class Queue:
    def __init__(self):
        self.items = []

    def isEmpty(self):
        return self.items == []

    def enqueue(self, item):
        self.items.insert(0,item)

    def dequeue(self):
        return self.items.pop()

    def size(self):
        return len(self.items)
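
One caveat with the list-based class above: `insert(0, item)` has to shift every existing element, so each enqueue costs O(n). As a possible alternative, the standard library's `collections.deque` supports O(1) appends and pops at both ends. A minimal sketch of FIFO behaviour with it:

```python
from collections import deque

queue = deque()

# Enqueue at the rear
queue.append('a')
queue.append('b')
queue.append('c')

# Dequeue from the front: items leave in the order they arrived (FIFO)
first = queue.popleft()
second = queue.popleft()

print(first, second, len(queue))
```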

## Deques
- Double-ended queue
     - Items can be added at the front or rear
     - Items can be removed at the front or rear

- **Deque()**:
    - Creates a new deque that is empty
        - *It needs no parameters and returns an empty deque*
- **.addFront(item)**:
    - Adds a new item to the front of the deque
        - *It needs the item and returns nothing*
- **.addRear(item)**:
    - Adds a new item to the rear of the deque
        - *It needs the item and returns nothing*
- **.removeFront()**:
    - Removes the front item from the deque
        - *It needs no parameters and returns the item. The deque is modified*
- **.removeRear()**:
    - Removes the rear item from the deque
        - *It needs no parameters and returns the item. The deque is modified*
- **.isEmpty()**:
    - Tests to see whether the deque is empty
        - *It needs no parameters and returns a boolean value*
- **.size()**:
    - Returns the number of items in the deque
        - *It needs no parameters and returns an integer*

In [None]:
class Deque:
    def __init__(self):
        self.items = []

    def isEmpty(self):
        return self.items == []

    def addFront(self, item):
        self.items.append(item)

    def addRear(self, item):
        self.items.insert(0,item)

    def removeFront(self):
        return self.items.pop()

    def removeRear(self):
        return self.items.pop(0)

    def size(self):
        return len(self.items)

# 4. Linked lists
- Singly linked lists
- Doubly linked lists

## Singly linked lists
- A collection of nodes that collectively form a linear sequence
    - Each node stores two references:
        - One to an element of the sequence
        - One to the next node of the list

- The list keeps a reference called the **head**, which identifies the first node of the list  
- It may also keep a reference called the **tail**, which identifies the last node of the list

<img src="pics/singly_linked_lists_head_and_tail.png">

- Starting at the head and following each node's next reference, we eventually reach the tail, which is the node that has **None** as its next reference
    - This process is commonly known as **traversing** the linked list 
        - Aka link-hopping or pointer-hopping, as the next reference can be seen as a link or pointer

- A linked list does not have a predetermined size
- Its size depends on the number of items in the list

### Insert an element at the head of a singly linked list
- Create a new node
- Set its element to the new element value
- Set its next link reference to the previous head
- Set the head reference to point to the new node

<img src="pics/singly_linked_lists_insert_at_front.png">

### Insert an element at the end of a singly linked list
- Create a new node and set its element to the new element value
- Set its next link reference to **None**
- Set the next reference of the previous tail to the new node
- Set the tail reference to point to the new node

<img src="pics/singly_linked_lists_insert_at_end.png">

### Removing an element at the head of a singly linked list
- Set the head reference to point to the next node (linking out)
- The previous head node is no longer part of the list and can be removed

### Removing an element at the end of a singly linked list
- Cannot easily delete the last node of a singly linked list
- We need to be able to access the node before the last node, in order to remove the last node
- We cannot reach the node before the tail by following the next reference links from the tail
- To make such an operation efficient the list needs to be made **doubly linked**

In [None]:
class Node(object):
    
    def __init__(self,value):
        
        self.value = value
        self.nextnode = None

In [None]:
a = Node(1)
b = Node(2)
c = Node(3)

In [None]:
a.nextnode = b

In [None]:
b.nextnode = c

In [None]:
a.value

In [None]:
a.nextnode.value
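
The insert and remove steps described above can be sketched as a small class. This is one possible arrangement under stated assumptions: the class name `SinglyLinkedList` and the method names `add_first`, `add_last`, `remove_first` and `to_list` are made up for this example, and `Node` is repeated so the block runs on its own.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.nextnode = None


class SinglyLinkedList:
    """Minimal list with head and tail references (illustrative names)."""

    def __init__(self):
        self.head = None
        self.tail = None

    def add_first(self, value):
        node = Node(value)
        node.nextnode = self.head      # new node points at the previous head
        self.head = node               # head now points at the new node
        if self.tail is None:          # the first node is also the tail
            self.tail = node

    def add_last(self, value):
        node = Node(value)             # its nextnode is already None
        if self.tail is None:
            self.head = node
        else:
            self.tail.nextnode = node  # previous tail points at the new node
        self.tail = node

    def remove_first(self):
        node = self.head
        if node is None:
            raise IndexError('remove from empty list')
        self.head = node.nextnode      # link out the previous head
        if self.head is None:
            self.tail = None
        return node.value

    def to_list(self):
        values, node = [], self.head
        while node is not None:        # traverse until the next reference is None
            values.append(node.value)
            node = node.nextnode
        return values


numbers = SinglyLinkedList()
numbers.add_last(2)
numbers.add_last(3)
numbers.add_first(1)
print(numbers.to_list())
print(numbers.remove_first(), numbers.to_list())
```

Note that there is deliberately no `remove_last`: without a previous reference it would need a full traversal to find the node before the tail.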

## Doubly linked lists
- A collection of nodes that collectively form a linear sequence
    - Each node stores three references:
        - One to an element of the sequence
        - One to the next node of the list
        - One to the previous node of the list

- A **header** node at the beginning of the list
- A **trailer** node at the end of the list
    - These dummy nodes are known as **sentinels**

<img src='pics/doubly_linked_lists_sentinal.png'>

### Insert a new node in a doubly linked list
- **Create** a new-node
- **Link** the previous-node to the new-node
    - **Link** the new-node to the previous-node
- **Link** the next-node to the new-node
    - **Link** the new-node to the next-node

### Delete a node from a doubly linked list

- **Linking out**: The two nodes either side of the node targeted for deletion are linked to each other
- The node targeted for deletion is then no longer part of the list and can be reclaimed by the system
- The same method is used to delete the first or last element, as the sentinels guarantee that every element has a node on either side

In [None]:
class DoublyLinkedListNode(object):
    
    def __init__(self,value):
        
        self.value = value
        self.next_node = None
        self.prev_node = None

In [None]:
a = DoublyLinkedListNode(1)
b = DoublyLinkedListNode(2)
c = DoublyLinkedListNode(3)

In [None]:
b.prev_node = a
a.next_node = b

In [None]:
b.next_node = c
c.prev_node = b
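
To illustrate why sentinels simplify things, here is a sketch of a doubly linked list with header and trailer nodes. The names `DNode`, `DoublyLinkedList`, `insert_between`, `delete` and `to_list` are assumptions made for this example; the point is that inserting and deleting never need a special case for the first or last element.

```python
class DNode:
    def __init__(self, value=None):
        self.value = value
        self.prev_node = None
        self.next_node = None


class DoublyLinkedList:
    """Sketch of a doubly linked list with header/trailer sentinels."""

    def __init__(self):
        self.header = DNode()    # dummy node before the first element
        self.trailer = DNode()   # dummy node after the last element
        self.header.next_node = self.trailer
        self.trailer.prev_node = self.header

    def insert_between(self, value, before, after):
        node = DNode(value)
        node.prev_node, node.next_node = before, after
        before.next_node = node
        after.prev_node = node
        return node

    def delete(self, node):
        # Linking out: the two neighbours are linked to each other
        node.prev_node.next_node = node.next_node
        node.next_node.prev_node = node.prev_node
        return node.value

    def to_list(self):
        values, node = [], self.header.next_node
        while node is not self.trailer:
            values.append(node.value)
            node = node.next_node
        return values


dll = DoublyLinkedList()
first = dll.insert_between(1, dll.header, dll.trailer)
third = dll.insert_between(3, first, dll.trailer)
dll.insert_between(2, first, third)
print(dll.to_list())

dll.delete(third)  # deleting the last element needs no special case
print(dll.to_list())
```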

## 5. Recursion 

### What is recursion
- Two main instances of recursion
    - ***A technique for a function to make one or more calls to itself (most use cases)***
    - When a data structure uses smaller instances of the exact same type of data structure when it represents itself 

### Why use recursion
- Provides a powerful alternative for performing repetitions of tasks
    - In which a loop is not ideal

### Create using recursion
- **Define a base case** - Every chain of recursive calls must eventually reach the base case, which returns a result without making a further recursive call

The factorial function is denoted with an exclamation point and is defined as the product of the integers from 1 to *n*. Formally, we can state this as:

$$ n! = n \cdot (n-1) \cdot (n-2) \cdots 3 \cdot 2 \cdot 1 $$

We can rewrite the formal definition in terms of recursion like so:

$$ n! = n \cdot (n-1)! $$

Note, **if n = 0, then n! = 1**. This means the **base case** occurs once n = 0; the *recursive cases* are defined by the equation above.

In [None]:
def fact(n):
    '''
    Note: use of recursion
    Returns factorial of n (n!).
    '''
    # Base case
    if n == 0:
        return 1
    
    # Recursion
    else:
        return n * fact(n-1)

In [None]:
from IPython.display import Image
from IPython.core.display import HTML 
Image(url= 'http://faculty.cs.niu.edu/~freedman/241/241notes/recur.gif')

### Example 1
* Given an integer, create a function which returns the sum of all the individual digits in that integer. For example:
$$ if: n = 4321 $$ 
$$ return: 4+3+2+1 $$

In [None]:
def sum_func(n):
    # Base case
    if len(str(n)) == 1:
        return n
    
    # Recursion
    else:
        return n%10 + sum_func(n//10)

In [None]:
print(sum_func(4321))

### Example 2
* Create a function called word_split() which takes in a string **phrase** and a list **list_of_words**
* The function will then determine if it is possible to split the string in a way in which words can be made from the list of words
* You can assume the phrase will only contain words found in the dictionary if it is completely splittable

In [None]:
def word_split(phrase,list_of_words, output = None):
    '''
    Note: This is a very "python-y" solution.
    ''' 
    
    # Checks to see if any output has been initiated.
    # A default of output=[] would be created only once and then shared between separate calls!
    if output is None:
        output = []
    
    # For every word in list
    for word in list_of_words:
        
        # If the current phrase begins with the word, we have a split point!
        if phrase.startswith(word):
            
            # Add the word to the output
            output.append(word)
            
            # Recursively call the split function on the remaining portion of the phrase--- phrase[len(word):]
            # Remember to pass along the output and list of words
            return word_split(phrase[len(word):],list_of_words,output)
    
    # Finally return output if no phrase.startswith(word) returns True
    return output        

In [None]:
word_split('themanran',['the','ran','man'])

In [None]:
word_split('ilovedogsJohn',['i','am','a','dogs','lover','love','John'])

In [None]:
word_split('themanran',['clown','ran','man'])
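
The solution above is greedy: it commits to the first word that matches and relies on the assumption that the phrase is cleanly splittable. As a hedged alternative, the sketch below backtracks when a choice leads to a dead end and returns `None` if no complete split exists. The name `word_split_backtrack` is made up for this example, and it assumes every word in the list is non-empty.

```python
def word_split_backtrack(phrase, list_of_words):
    """Return a list of words that exactly spells phrase, or None if impossible."""
    if phrase == '':
        return []  # base case: nothing left to split

    for word in list_of_words:
        if phrase.startswith(word):
            rest = word_split_backtrack(phrase[len(word):], list_of_words)
            if rest is not None:  # this choice led to a complete split
                return [word] + rest
            # otherwise fall through and try the next candidate word

    return None  # no word leads to a complete split


print(word_split_backtrack('themanran', ['the', 'ran', 'man']))
print(word_split_backtrack('themanran', ['clown', 'ran', 'man']))
print(word_split_backtrack('abcd', ['ab', 'abc', 'd']))
```

The last call shows the backtracking: `'ab'` matches first but leaves `'cd'`, which cannot be split, so the function retries with `'abc'` and returns `['abc', 'd']`.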

## Memoisation
* Memoisation effectively refers to remembering ("memoisation" -> "memorandum" -> to be remembered) results of method calls based on the method inputs and then returning the remembered result rather than computing the result again. 
* It can be thought of as a cache for method results

### Fibonacci example
- The Fibonacci sequence: each term is the sum of the two preceding terms (1, 1, 2, 3, 5, 8, ...)

In [82]:
# Create a recursive function
def fibonacci(n):
    # Base condition
    if n == 1 or n == 2:
        return 1
    # When n > 2, the function will run as recursive function
    elif n > 2:
        return fibonacci(n-1) + fibonacci(n-2)
    else:
        print('The input must be a positive integer')

In [86]:
fibonacci(10)

55

- If fibonacci(100) were run, the program would take an impractically long time, because the number of recursive calls grows exponentially with n  
<br></br>
- **Memoisation** - *can be used to reduce the time taken to compute the result as it will cache previously computed values*

#### Implement explicitly
- Understand how memoisation works

In [None]:
fibonacci_cache = {}

def fibonacci(n):
    # Check if the nth term has been cached
    # If cached then return that value, rather than computing it again
    if n in fibonacci_cache:
        return fibonacci_cache[n]

    # If not then compute the nth term
    if n == 1 or n == 2:
        value = 1
    elif n > 2:
        value = fibonacci(n-1) + fibonacci(n-2)
    else:
        # Without this branch, 'value' would be undefined for n < 1
        raise ValueError('The input must be a positive integer')
    
    # Add the new value to the cached values
    # Then return the value
    fibonacci_cache[n] = value
    return value

In [None]:
fibonacci(100)

#### Implement using a built-in tool
- Implement memoisation, and save time by using a decorator built into Python
- **LRU cache**: Least Recently Used cache
    - Add memoisation to a function in only one line

In [None]:
from functools import lru_cache

# Default max size = 128
@lru_cache(maxsize = 1000)
def fibonacci(n):
    if n == 1:
        return 1
    elif n == 2:
        return 1
    elif n > 2:
        return fibonacci(n-1) + fibonacci(n-2)

In [None]:
fibonacci(100)

#### Account for edge-cases

In [None]:
from functools import lru_cache

# Default max size = 128
@lru_cache(maxsize = 1000)
def fibonacci(n):
    if not isinstance(n, int):
        raise TypeError("n must be a positive integer")
    if n < 1:
        raise ValueError("n must be a positive integer")
    if n == 1:
        return 1
    elif n == 2:
        return 1
    elif n > 2:
        return fibonacci(n-1) + fibonacci(n-2)

In [None]:
fibonacci('one')

In [None]:
fibonacci(100.2)

In [None]:
fibonacci(100)
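
To make the saving concrete, the sketch below counts how many times the function body runs with and without caching. The names `fib_plain`, `fib_cached` and the `calls` dictionary are invented for this comparison, and the base case is shortened to `n <= 2`, which assumes n is a positive integer.

```python
from functools import lru_cache

calls = {'plain': 0, 'cached': 0}


def fib_plain(n):
    calls['plain'] += 1
    if n <= 2:
        return 1
    return fib_plain(n - 1) + fib_plain(n - 2)


@lru_cache(maxsize=None)
def fib_cached(n):
    calls['cached'] += 1  # only runs on a cache miss
    if n <= 2:
        return 1
    return fib_cached(n - 1) + fib_cached(n - 2)


print(fib_plain(20), fib_cached(20))
print(calls)
```

Both versions return 6765, but the plain version runs its body 13529 times while the cached version runs it only 20 times, once for each distinct value of n.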

## 6. Trees

### Introduction
- Can be illustrated as an upside-down tree
- More general items on top, and more specific items towards the bottom
- All the children of one node are independent of the children of another node
- Each leaf node is unique  
    - E.g File systems/folders are structured as trees  

- A **node**'s name is called the key
    - A node may have additional information, which is called the payload 
- An **edge** connects two nodes that show a relationship between them
    - The root is the only node that has no incoming edge
- A **path** is an ordered list of nodes
    - That are connected by edges  
<br></br>
- **Children** *(c)* are the set of nodes that have incoming edges from the same node
- The **parent** is a node that has outgoing edges to other nodes
- **Siblings** are nodes that are children of the same parent
- A **leaf** node is a node without children  
<br></br>
- The **level** *(n)* of a node is the number of edges on the path from the root node to that node
- The **height** of the tree is the maximum level of any node in the tree

- A **binary tree** is a tree where each node has a maximum of 2 children  
<br></br>
- Defined **recursively**, a tree is either empty or consists of a root and zero or more sub-trees
    - Each sub-tree's root is connected to the root of the parent tree, by an edge

### Representing a Tree through Lists

#### Method 1: List of lists

In [54]:
# 'r' is the root
# The 2nd item in the list, is the left child and the 3rd item is the right child
def BinaryTree_l(r):
    return [r,[],[]]

In [23]:
# Each branch will have a left and right child, whether it's being used or not
# If a left child ('t') already exists, it is pushed down to become the left child of newBranch (index position 1)
def insertLeft(root, newBranch):
    t = root.pop(1)
    
    if len(t) > 1:
        root.insert(1,[newBranch, t, []])
    else:
        root.insert(1,[newBranch, [], []])
    
    return root

In [24]:
# Each branch will have a left and right child, whether it's being used or not
# If a right child ('t') already exists, it is pushed down to become the right child of newBranch (index position 2)
def inserRight(root, newBranch):
    t = root.pop(2)
    
    if len(t) > 1:
        root.insert(2,[newBranch, [], t])
    else:
        root.insert(2,[newBranch, [], []])
    
    return root

##### Access functions

In [55]:
def getRootVal(root):
    return root[0]

In [56]:
def setRootVal(root, newVal):
    root[0] = newVal

In [57]:
def getLeftChild(root):
    return root[1]

In [58]:
def getRightChild(root):
    return root[2]

In [59]:
tree_1 = BinaryTree_l(3)

In [60]:
insertLeft(tree_1, 4)

[3, [4, [], []], []]

In [61]:
insertLeft(tree_1, 5)

[3, [5, [4, [], []], []], []]

In [62]:
inserRight(tree_1, 6)

[3, [5, [4, [], []], []], [6, [], []]]

In [63]:
inserRight(tree_1, 7)

[3, [5, [4, [], []], []], [7, [], [6, [], []]]]

In [64]:
left_child = getLeftChild(tree_1)
print(left_child)

[5, [4, [], []], []]


In [47]:
# Set the root of the left child from 5 to 9
setRootVal(left_child, 9)
print(tree_1)

[3, [9, [4, [], []], []], [7, [], [6, [], []]]]


#### Method 2: OOP

In [80]:
# Define a class that has attributes for the root value, left, and right sub-tree(s)
    # When we create a new left child, we'll be creating another instance of BinaryTree_OPP

class BinaryTree_OPP:
    
    def __init__(self, rootObj):
        self.key = rootObj
        self.leftChild = None
        self.rightChild = None
    
    def insertLeft(self, newNode):
        if self.leftChild is None:
            self.leftChild = BinaryTree_OPP(newNode)
        else:
            # Create a new sub-tree
            t = BinaryTree_OPP(newNode)
            # The existing left child becomes the left child of the new sub-tree
            t.leftChild = self.leftChild
            # Make this new sub-tree the left child of the parent tree
            self.leftChild = t
             

    def insertRight(self, newNode):
        if self.rightChild is None:
            self.rightChild = BinaryTree_OPP(newNode)
        else:
            # Create a new sub-tree
            t = BinaryTree_OPP(newNode)
            # The existing right child becomes the right child of the new sub-tree
            t.rightChild = self.rightChild
            # Make this new sub-tree the right child of the parent tree
            self.rightChild = t
            
    # Access functions
    def getRightChild(self):
        return self.rightChild
    
    def getLeftChild(self):
        return self.leftChild
    
    def setRootVal(self, obj):
        self.key = obj
    
    def getRootVal(self):
        return self.key

In [81]:
tree_1 = BinaryTree_OPP('a')

In [82]:
tree_1.getRootVal()

'a'

In [83]:
tree_1.getLeftChild()

In [84]:
print(tree_1.getLeftChild())

None


In [85]:
tree_1.insertLeft('b')

In [87]:
tree_1.getLeftChild()

<__main__.BinaryTree_OPP at 0x2202f5d8908>

In [88]:
tree_1.getLeftChild().getRootVal()

'b'

### Tree Traversal
- Preorder
- Inorder
- Postorder

#### Preorder traversal
- Visit the root node first
- Do a recursive preorder traversal of the left sub-tree
- Do a recursive preorder traversal of the right sub-tree

In [91]:
# Internal method (to be defined inside the BinaryTree_OPP class)
def preorder(self):
    print(self.key)
    if self.leftChild:
        self.leftChild.preorder()
    if self.rightChild:
        self.rightChild.preorder()
        
# External function
def preorder(tree):
    if tree:
        print(tree.getRootVal())
        preorder(tree.getLeftChild())
        preorder(tree.getRightChild())

#### Inorder traversal
- Do a recursive inorder traversal of the left sub-tree
- Visit the root node
- Do a recursive inorder traversal of the right sub-tree

In [92]:
# The external function is a more useful solution
    # Typically other tasks will be done whilst traversing a tree
def inorder(tree):
    if tree:
        inorder(tree.getLeftChild())
        print(tree.getRootVal())
        inorder(tree.getRightChild())

#### Postorder traversal
- Do a recursive postorder traversal of the left sub-tree
- Do a recursive postorder traversal of the right sub-tree
- Visit the root node last

In [93]:
# External function
def postorder(tree):
    if tree:
        postorder(tree.getLeftChild())
        postorder(tree.getRightChild())
        print(tree.getRootVal())
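
To compare the three orders side by side, the sketch below runs them on one small tree and collects the keys into lists instead of printing them. The `TreeNode` class with `left`/`right` attributes is a simplified stand-in invented for this example, not the `BinaryTree_OPP` class above.

```python
class TreeNode:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def preorder(node):
    if node is None:
        return []
    return [node.key] + preorder(node.left) + preorder(node.right)


def inorder(node):
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)


def postorder(node):
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.key]


#       a
#      / \
#     b   c
#    / \
#   d   e
tree = TreeNode('a', TreeNode('b', TreeNode('d'), TreeNode('e')), TreeNode('c'))

print(preorder(tree))   # root first
print(inorder(tree))    # root between its sub-trees
print(postorder(tree))  # root last
```

The only difference between the three functions is where the root's key is placed relative to the two recursive calls.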

### Priority Queues with Binary Heaps
- Acts like a queue, where an item is dequeued by removing it from the front  
<br></br>
- However the logical order of the items inside a priority queue is determined by their priority
    - The highest priority items are at the front of the queue
    - The lowest priority items are at the back of the queue  
<br></br>
- When an item is enqueued on a priority queue it may move all the way to the front  
<br></br>
- Priority queues are usually implemented using **Binary heaps**
    - A binary heap will allow for enqueuing and dequeuing items in O(log(n))
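
Before building a heap by hand, it may help to see the behaviour using Python's standard-library `heapq` module, which maintains a min heap inside a plain list. The task names and priority numbers below are made up; a lower number is treated as a higher priority.

```python
import heapq

tasks = []  # heapq keeps this plain list arranged as a min heap

# Enqueue (priority, item) pairs
heapq.heappush(tasks, (3, 'write report'))
heapq.heappush(tasks, (1, 'fix outage'))
heapq.heappush(tasks, (2, 'review code'))

# Dequeue: the smallest priority value always comes out first
order = [heapq.heappop(tasks)[1] for _ in range(3)]
print(order)
```

Although 'fix outage' was enqueued second, it is dequeued first because it has the smallest priority value.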

### Binary Heap
- Two common variations:
    - **Min heap** - Where the smallest key is always at the root (the front of the queue)
    - **Max heap** - Where the largest key is always at the root (the front of the queue)

<img src='pics/binary_heap_img_1.png'>

- Because a binary heap is a complete binary tree, it can be stored in a single list
    - Rather than using a list of lists  
<br></br>
- E.g. the left child at level 1 has the value 9  
<br></br>
- The index position (**P**) of the value 9 = ***2***
    - Its left child will be found at the index position **2P** = ***4***
        - Value at index position ***4*** = 14
    - Its right child will be found at the index position **2P+1** = ***5***
        - Value at index position ***5*** = 18
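
The index arithmetic can be checked with a few lines of code. In the list below only the values 9, 14 and 18 (at positions 2, 4 and 5) come from the example above; the remaining values and the helper names `left_child`, `right_child` and `parent` are assumptions made for this sketch.

```python
# Index 0 is an unused placeholder so that the root sits at index 1.
# Only 9, 14 and 18 are taken from the example; the other values are made up.
heap_list = [0, 5, 9, 11, 14, 18, 19, 21]


def left_child(p):
    return 2 * p


def right_child(p):
    return 2 * p + 1


def parent(p):
    return p // 2


p = heap_list.index(9)
print(p, heap_list[left_child(p)], heap_list[right_child(p)])

# Integer division takes both children back to the same parent
print(heap_list[parent(4)], heap_list[parent(5)])
```

The placeholder at index 0 is what makes the simple 2P and 2P+1 formulas work, and it is why the implementation below starts with `heap_list = [0]`.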

#### Implementation

In [18]:
# How to build a binary heap
class Bin_heap():
    def __init__(self):
        self.heap_list = [0]
        self.current_size = 0

- A new item can be appended to the end of the list easily
    - **insert()** function
    - However to keep it a binary heap it must be swapped into the correct position 
        - **perc_up()** function  
<br></br>
- **The heap order property must be restored**
    - Compare the new item with its parent
    - If the new item is less than the parent, swap these items  
<br></br>
- The new item will percolate its way up to its correct position within the tree

<img src='pics/binary_heap_append_1.png'>

<img src='pics/binary_heap_append_2.png'>

<img src='pics/binary_heap_append_3.png'>

In [None]:
def perc_up(self, i):
    while i // 2 > 0:    # While the item at position i still has a parent
        if self.heap_list[i] < self.heap_list[i // 2]:    # If the item < its parent
            tmp = self.heap_list[i // 2]
            self.heap_list[i // 2] = self.heap_list[i]    # Swap these values
            self.heap_list[i] = tmp
        i = i // 2    # Move up one level in the tree to repeat the swapping process

In [None]:
def insert(self, k):
    self.heap_list.append(k)    # Add a new item to the end of the list
    self.current_size = self.current_size + 1    # Increase value of current_size by 1
    self.perc_up(self.current_size)    # Percolate the last item into position

- Finding the item for the **delMin** method to remove is trivial
    - As the heap order property requires the root of the tree to be the smallest item in the tree  
<br></br>
- **The heap structure must be restored**
    - Restore the root by taking the last item and moving it to the root position
- **The heap order property must be restored**
    - Move the new root node to its correct position within the tree, **percDown()**
        - Keep swapping items between nodes and their children
            - Until each node is less than both of its children

<img src='pics/binary_heap_remove_1.png'>

<img src='pics/binary_heap_remove_2.png'>

<img src='pics/binary_heap_remove_3.png'>

<img src='pics/binary_heap_remove_4.png'>

In [7]:
# When the index position of the node = p:
    # left child = 2p
    # right child = 2p +1
def minChild(self, i):
    '''
    Provides the index position of the smaller child of the node at position i.
    '''
    # If there is no right child, the left child is the only candidate
    if i * 2 + 1 > self.current_size:
        return i * 2
    else:
        # Compare and see whether the left or right child has the lowest value
        if self.heap_list[i * 2] < self.heap_list[i * 2 + 1]:
            # If the left child's value is lower than the right child's value
            return i * 2
        else:
            # Otherwise the right child's value is the lower (or equal) one
            return i * 2 + 1 

In [8]:
# percDown() is dependent on minChild()
def percDown(self, i):
    '''
    If a node's value is larger than its smallest child's value, then swap their positions.
    The larger value will then percolate down the tree to its correct position.
    '''
    # Keep going while the node at position i has at least one child
    # Then, check to see if the current value is smaller or larger than its smallest child
    while (i * 2) <= self.current_size:
        mc = self.minChild(i)
        # If the item's value is greater than the minimum child (mc), switch the values
        if self.heap_list[i] > self.heap_list[mc]:
            tmp = self.heap_list[i]
            self.heap_list[i] = self.heap_list[mc]
            self.heap_list[mc] = tmp
        i = mc

In [9]:
# An empty self.heap_list = [0]
def delMin(self):
    '''
    Remove the minimum value in the tree (the root node's value) and return it.
    '''
    retval = self.heap_list[1]
    # The last item is made the root node
    self.heap_list[1] = self.heap_list[self.current_size]
    # The current size of the list is reduced by 1 after deleting an item
    self.current_size = self.current_size - 1
    # After setting the root node with the value of the last item, it is removed
    self.heap_list.pop()
    # Move the new root's value to the correct position within this binary tree
    self.percDown(1)
    return retval

In [17]:
# Build the binary heap
def build_bin_heap(self, a_list):
    # The starting position is the middle of the list
    i = len(a_list) // 2
    # Add a '0' as the first item, as the heap list starts with a '0' placeholder
    self.heap_list = [0] + a_list[:]
    # The current size is the len of the list
    self.current_size = len(a_list)
    # Stop once the root (index position 1) has been percolated down
    while (i > 0):
        # Move values into the correct positions, smallest at the top (min heap)
        self.percDown(i)
        # Keep moving up the levels, towards the top of the heap
        i = i - 1
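
The methods above are shown in separate cells, so as a sketch here they are collected into one runnable class. The class name `BinHeap` and the snake_case method names (`min_child`, `perc_down`, `del_min`, `build_heap`) are choices made for this example; the logic follows the cells above, with tuple assignment used for the swaps.

```python
class BinHeap:
    """Min heap stored in a list; index 0 is an unused placeholder."""

    def __init__(self):
        self.heap_list = [0]
        self.current_size = 0

    def perc_up(self, i):
        h = self.heap_list
        while i // 2 > 0:
            if h[i] < h[i // 2]:
                h[i], h[i // 2] = h[i // 2], h[i]  # swap with the parent
            i = i // 2

    def insert(self, k):
        self.heap_list.append(k)
        self.current_size += 1
        self.perc_up(self.current_size)

    def min_child(self, i):
        if i * 2 + 1 > self.current_size:  # no right child
            return i * 2
        if self.heap_list[i * 2] < self.heap_list[i * 2 + 1]:
            return i * 2
        return i * 2 + 1

    def perc_down(self, i):
        h = self.heap_list
        while i * 2 <= self.current_size:
            mc = self.min_child(i)
            if h[i] > h[mc]:
                h[i], h[mc] = h[mc], h[i]  # swap with the smaller child
            i = mc

    def del_min(self):
        retval = self.heap_list[1]
        self.heap_list[1] = self.heap_list[self.current_size]
        self.current_size -= 1
        self.heap_list.pop()
        self.perc_down(1)
        return retval

    def build_heap(self, a_list):
        i = len(a_list) // 2
        self.heap_list = [0] + a_list[:]
        self.current_size = len(a_list)
        while i > 0:
            self.perc_down(i)
            i -= 1


heap = BinHeap()
heap.build_heap([9, 6, 5, 2, 3])
heap.insert(1)
drained = [heap.del_min() for _ in range(6)]
print(drained)
```

Repeatedly calling `del_min` returns the items in ascending order, which is a handy way to check that the heap order property is being maintained.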

### Binary Search Trees
- Implementations of map ADT (Abstract Data Type):
    - Binary search on list
    - Hash tables
    - Binary search tree  
<br></br>
- These methods are used to map a key to a value  
<br></br>
- For **Binary Search Trees**:
    - Not interested in the exact placement of items in the tree
    - Interested in providing efficient searching
    - Have a ***bst property*** - *ordering property*:
        - Keys that are less than the parent, are found in the left subtree
        - Keys that are greater than the parent, are found in the right subtree

- **Basic operations on a BST**  
<br></br>
    - *Create*: creates an empty tree
    - *Insert*: insert a node in the tree
    - *Search*: Searches for a node in the tree
    - *Delete*: deletes a node from the tree  
<br></br>
    - *Inorder*: in-order traversal of the tree
    - *Preorder*: pre-order traversal of the tree
    - *Postorder*: post-order traversal of the tree

* Arrange this list, using the bst ordering property [70, 31, 93, 94, 14, 23, 73]
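
As a rough check of that arrangement, the sketch below inserts the keys one at a time, in list order, using only the ordering property. `BSTNode`, `bst_insert` and `inorder_keys` are names invented for this example and are much simpler than the classes defined later in this section.

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def bst_insert(node, key):
    """Insert key below node, returning the (possibly new) sub-tree root."""
    if node is None:
        return BSTNode(key)
    if key < node.key:
        node.left = bst_insert(node.left, key)    # smaller keys go left
    else:
        node.right = bst_insert(node.right, key)  # larger (or equal) keys go right
    return node


def inorder_keys(node):
    if node is None:
        return []
    return inorder_keys(node.left) + [node.key] + inorder_keys(node.right)


root = None
for key in [70, 31, 93, 94, 14, 23, 73]:
    root = bst_insert(root, key)

print(root.key, root.left.key, root.right.key)
print(inorder_keys(root))
```

The first key, 70, becomes the root with 31 and 93 as its children, and an inorder traversal of the finished tree returns the keys in sorted order.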

<img src='pics/bst_ex1.png'>

* In order to create and work with an empty binary search tree:
    - Make one class for the binary search tree (**binary_search_tree**)
    - Make one class for the tree node (**tree_node**)

In [129]:
class tree_node():
    # The construction of this object's attributes
    def __init__(self,key,val,left=None,right=None,parent=None):
        self.key = key
        self.payload = val
        self.leftChild = left
        self.rightChild = right
        self.parent = parent
    
    # Return the left child
    def hasLeftChild(self):
        return self.leftChild
    
    # Return the right child
    def hasRightChild(self):
        return self.rightChild
    
    # Return true if this node is the left child of its parent
    def isLeftChild(self):
        return self.parent and self.parent.leftChild == self
    
    # Return true if this node is the right child of its parent
    def isRightChild(self):
        return self.parent and self.parent.rightChild == self
    
    # Return true if root node, if parent value is 'None' or 'False'
    def isRoot(self):
        return not self.parent
    
    # Return true if no children present (leaf node)
    def isLeaf(self):
        return not (self.rightChild or self.leftChild)
    
    # Return true if any children present
    def hasAnyChildren(self):
        return self.rightChild or self.leftChild
    
    # Return true if both children are present
    def hasBothChildren(self):
        return self.rightChild and self.leftChild
    
    # Replace a node's data with a new key, payload and children
        # Update the parent field of the new left and right children
    def replaceNodeData(self,key,value,lc,rc):
        self.key = key
        self.payload = value
        self.leftChild = lc
        self.rightChild = rc
        # The left-child's parent field is reset to self
            # Which is an object with these 4 fields
                # As this object will be replacing the old root node
        if self.hasLeftChild():
            self.leftChild.parent = self
        # The right-child's parent field is reset to self
            # Which is an object with these 4 fields
                # As this object will be replacing the old root node
        if self.hasRightChild():
            self.rightChild.parent = self

In [134]:
class binary_search_tree():
    
    def __init__(self):
        self.root = None
        self.size = 0
        
    def length(self):
        return self.size
    
    def __len__(self):
        return self.size

    # Check to see if the tree already has a root
    def put(self,key,val):
        # If a root is present, call the private, recursive helper function _put
        if self.root:
            # Place the new node in the correct position (bst ordering property) 
                # Enter the new node key and value to insert (1st and 2nd argument)
                # Enter the self.root-node-object as the current_node (3rd argument)
            self._put(key,val,self.root)
        # If there is no root then create a new tree_node, and install it as the root
        else:
            self.root = tree_node(key,val) # N.B. An instance attribute can be an object
        self.size = self.size + 1
    
    # Starting at the root of the tree, recursively search the binary-tree
        # Compare the new key to the key of the current node
            # The new node is placed into the first empty child field
                # Based on the ordering property
                    # Values less than the current node go to the left
                    # Values greater than or equal to the current node go to the right
    def _put(self,key,val,currentNode):
        # If the new key is less than the current node, search the left sub-tree
        if key < currentNode.key:
            # If left child present
            if currentNode.hasLeftChild():
                self._put(key,val,currentNode.leftChild)
            # If no left child present
            else:
                # Create a new node, using the tree_node object, at the position
                currentNode.leftChild = tree_node(key,val,parent=currentNode)
        # If the new key is greater than or equal to the current node, search the right sub-tree
        else:
            # If right child present
            if currentNode.hasRightChild():
                self._put(key,val,currentNode.rightChild)
            # If no right child present
            else:
                # Create a new node, using the tree_node object, at the position
                currentNode.rightChild = tree_node(key,val,parent=currentNode)
                
    # Rather than calling the method on the object (binary_search_tree.put(key, value)),
        # The object can be used as a dictionary (binary_search_tree[key])
    def __setitem__(self,k,v):
        self.put(k,v)
        

    # Starting at the root of the tree, and recursively search the binary-tree
    def get(self,key):
        # If a root node is present
        if self.root:
            # If the _get method finds a matching key from the binary tree
            res = self._get(key,self.root)
            if res:
                # Return the payload (value), for the node with the found key
                return res.payload
            # If the _get method does not find a matching key from the binary tree
            else:
                return None
        # If a root node is not present
        else:
            return None

    def _get(self,key,currentNode):
        # If the starting node (root) is not present
        if not currentNode:
            return None
        # If the targeted key matches the current node
        elif currentNode.key == key:
            return currentNode
        # If the targeted key is less than the current node
        elif key < currentNode.key:
            return self._get(key,currentNode.leftChild)
        # If the targeted key is greater than current node
        else:
            return self._get(key,currentNode.rightChild)

    # Rather than calling the method on the object (binary_search_tree.get(key)),
        # The object can be used as a dictionary (binary_search_tree[key])
    def __getitem__(self,key):
        return self.get(key)


    # Find the node to delete, by searching through the binary-tree
        # Search using the _get method to find the tree_node that will be removed
    def delete(self,key):
        '''
        1. If more than one node (size > 1) and a match exists then remove the node
            -> The delete function will use remove(), when:
                1.1 Only leaf node(s) exists
                1.2 Both children exists
                1.3 Only one child exists
                
        2. If only one node (size = 1) and a match exists then reset the root node
        '''
        
        if self.size > 1: # If the tree has more than one node
            nodeToRemove = self._get(key,self.root)
            if nodeToRemove: # If the target key is found
                self.remove(nodeToRemove) # Remove the node (remove() is defined below)
                self.size = self.size-1 # Update the size of the tree
            else:
                # Raise an error if the search key is not found
                raise KeyError('Error, key not in tree')
        # If the tree has only one node, check if the key also matches
        elif self.size == 1 and self.root.key == key:
            self.root = None # Reset the root node to 'None'
            self.size = self.size - 1 # Update the size of the tree
        else:
            # Raise an error if the search key is not found
            raise KeyError('Error, key not in tree')

    def __delitem__(self,key):
        self.delete(key)

    # <-- currentNode is referencing self.root, which if present, is a treeNode -->
    def remove(self,currentNode):
        
        # <---------Deleting nodes case 1 (image below)--------->
        
        if currentNode.isLeaf(): # If the current node is a leaf node (no children)
            # N.B. The child node will be stored as an object within the parent
            # If the target-node matches as the left child
            if currentNode == currentNode.parent.leftChild: 
                currentNode.parent.leftChild = None # Remove current node
             # If the target-node matches as the right child (the only other option)
            else:
                currentNode.parent.rightChild = None # Remove current node
        
        
        # <--------- Deleting nodes case 2 (image below) --------->
        
        # If a node has both its children
            # The node with the next largest key will take the current node's place
                # This node is called the successor (succ) and must be found
        elif currentNode.hasBothChildren(): # If current node has both children
            succ = currentNode._findSuccessor() # Find node with the next largest key
            succ._spliceOut() # Splice the successor out of its current position
            currentNode.key = succ.key # Succ's key becomes the current node's key
            currentNode.payload = succ.payload # Succ's pl becomes the current node's pl
        
        
        # <--------- Deleting nodes case 3 (image below) --------->
        
        # The only possible scenario now, is that the node has one child
            # Promote the child to take the place of its parent
        # UPDATE the child node's parent-field, to the current node's parent-field
        # UPDATE the parent node's child-field, to the current node's child-field 
            # These updates will remove all references to the current node
                # From its parent and child
        else: 
            # Firstly check if the current node has a left child
                # Seeing as the left child will have a lower value than the right child
                # The left node will take the place of the current node
            # All the sub conditions in this condition, have a left child
                # Which will take the place of the current node
            if currentNode.hasLeftChild():
                # If the current node is a left child node
                if currentNode.isLeftChild():
                    currentNode.leftChild.parent = currentNode.parent
                    currentNode.parent.leftChild = currentNode.leftChild
                # If the current node is a right child node 
                elif currentNode.isRightChild():
                    currentNode.leftChild.parent = currentNode.parent
                    currentNode.parent.rightChild = currentNode.leftChild
                # If the current-node is not a left or right child then it is the root
                    # replaceNodeData() is used as there is no parent
                # Rather than connecting the current node's only child and parent
                    # Seeing as there is no parent to the root
                # The fields of the root are reset
                    # Using the values of the current node's only child node
                else:
                    currentNode.replaceNodeData(currentNode.leftChild.key,
                                                currentNode.leftChild.payload,
                                                currentNode.leftChild.leftChild,
                                                currentNode.leftChild.rightChild)
            
            # If a left child is NOT present then the right node will take its place
            else:
                # If the current node is a left child node
                if currentNode.isLeftChild():
                    currentNode.rightChild.parent = currentNode.parent
                    currentNode.parent.leftChild = currentNode.rightChild
                # If the current node is a right child node
                elif currentNode.isRightChild():
                    currentNode.rightChild.parent = currentNode.parent
                    currentNode.parent.rightChild = currentNode.rightChild
                # If the current-node is not a left or right child then it is the root
                    # replaceNodeData() is used as there is no parent
                # Rather than connecting the current node's only child and parent
                    # Seeing as there is no parent to the root
                # The fields of the root are reset
                    # Using the values of the current node's only child node
                else:
                    currentNode.replaceNodeData(currentNode.rightChild.key,
                                                currentNode.rightChild.payload,
                                                currentNode.rightChild.leftChild,
                                                currentNode.rightChild.rightChild)

# <-- The helpers below are called on nodes ('self' is a tree_node), e.g. succ._spliceOut() -->
    # So they are defined as functions and attached to tree_node at the end of this cell

# * Helps _findSuccessor() to find the lowest left child value
def _findMin(self):
    current = self
    while current.hasLeftChild(): # Whilst a left child exists for the current node
        current = current.leftChild # Reset 'current''s value with this new value
    return current # Return the furthest left child, in this search

# Helps remove() to find the replacement node when both children are present
# The successor is taken to be the next largest value
def _findSuccessor(self):
    succ = None 
    if self.hasRightChild(): # Check if there is a right child to the current node
        # If there is a right child for the current node
        # Then the furthest left node of the right sub-tree is the next value *
        succ = self.rightChild._findMin() 
    else: # If no right-child for the current node
        if self.parent: # If a parent node exists, for the current-node
            if self.isLeftChild(): # If the current-node is a left child
                succ = self.parent # Then the next value is the parent
            # If current node is a right child, but doesn't have a right child
                # Then the successor is the successor of its parent
                # I.e The next largest key, after the parent excluding current node
            else: 
                self.parent.rightChild = None # Disregard current node from search
                succ = self.parent._findSuccessor() # Find successor of parent node
                self.parent.rightChild = self # Restore the link to the current node
    return succ

# Splice succ node from binary tree, and then move the nodes to maintain bst order
def _spliceOut(self): # 'self' here is the successor node
    
    # Is a leaf node
    if self.isLeaf(): # If leaf node
        if self.isLeftChild(): # If successor node is a left child leaf node
            self.parent.leftChild = None # Remove the parent's reference to succ node
        else: # If successor node is a right child leaf node
            self.parent.rightChild = None # Remove the parent's reference to succ node

    # Succ guarantee condition of only one child
    elif self.hasAnyChildren(): # Figure 3.1
        # Update the references for the parent, left and right child nodes, for self
        if self.hasLeftChild(): # If the successor has a left child
            if self.isLeftChild():
                self.parent.leftChild = self.leftChild # Reset parent's left child
            else:
                self.parent.rightChild = self.leftChild # Reset parent's right child
            self.leftChild.parent = self.parent # Resets the left-child's parent
            
        else: # If the successor has a right child
            if self.isLeftChild():  
                self.parent.leftChild = self.rightChild # Reset parent's left child
            else:
                self.parent.rightChild = self.rightChild # Reset parent's right child
            self.rightChild.parent = self.parent # Resets the right-child's parent

# In-order traversal of a node's sub-tree, yielding one key each time it is advanced
def _iterNode(self):
    if self.hasLeftChild():
        for elem in self.leftChild:
            yield elem
    yield self.key
    if self.hasRightChild():
        for elem in self.rightChild:
            yield elem

# Iterating over the tree starts the in-order traversal at the root (if present)
def _iterTree(self):
    return iter(self.root) if self.root else iter([])

# Attach the node helpers to tree_node, and the tree iterator to binary_search_tree
tree_node._findMin = _findMin
tree_node._findSuccessor = _findSuccessor
tree_node._spliceOut = _spliceOut
tree_node.__iter__ = _iterNode
binary_search_tree.__iter__ = _iterTree

#### Deleting nodes case 1  
<img src='pics/del_node_1.png'>

#### Deleting nodes case 2   
<img src='pics/del_node_2_i.png'>  
##### 2.1  
<img src='pics/binary_trees_remove_1_child.png'>

#### Deleting nodes case 3  
<img src='pics/del_node_3.png'>  
##### 3.1  
<img src='pics/binary_trees_remove_2_child.png'>
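
The three deletion cases can also be sketched more compactly with a recursive function that returns the (possibly new) root of each sub-tree. This is a minimal, self-contained sketch and not the class-based version above: the names `Node`, `insert`, `delete` and `in_order` are assumptions made only for this illustration, and nodes here carry a key but no payload or parent link.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def insert(node, key):
    # Return the root of the sub-tree after inserting key
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    return node

def delete(node, key):
    # Return the root of the sub-tree after deleting key (if present)
    if node is None:
        return None
    if key < node.key:
        node.left = delete(node.left, key)
    elif key > node.key:
        node.right = delete(node.right, key)
    else:
        # Leaf node or one child: promote the only child (or None for a leaf)
        if node.left is None:
            return node.right
        if node.right is None:
            return node.left
        # Both children: copy the successor's key, then delete the successor
        succ = node.right
        while succ.left is not None:
            succ = succ.left
        node.key = succ.key
        node.right = delete(node.right, succ.key)
    return node

def in_order(node):
    # Keys in ascending order
    if node is None:
        return []
    return in_order(node.left) + [node.key] + in_order(node.right)

root = None
for k in [17, 5, 35, 2, 11, 29, 38, 9, 16]:
    root = insert(root, k)

root = delete(root, 2)    # leaf node
root = delete(root, 11)   # node with both children (9 and 16)
root = delete(root, 16)   # after the previous step, this node only has a left child (9)
print(in_order(root))
```

Returning the sub-tree root from each call avoids the need for parent references, at the cost of re-assigning child links on the way back up the recursion.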

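* If a duplicate key is inserted, this collision should be handled
    * One method is to replace the previous key's value with the new entry's value
    * The `put` method above does not do this: a duplicate key is inserted as a new node in the right sub-tree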
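
As a rough sketch of the replace-on-duplicate approach, the insert routine can check for an equal key before descending. The `Node` class and the `put` function below are hypothetical and deliberately minimal (no parent links), so they illustrate the idea only and do not modify the `binary_search_tree` class.

```python
class Node:
    def __init__(self, key, val):
        self.key = key
        self.val = val
        self.left = None
        self.right = None

def put(node, key, val):
    # Returns (sub-tree root, True if a new node was created)
    if node is None:
        return Node(key, val), True
    if key == node.key:
        node.val = val            # Duplicate key: overwrite the payload
        return node, False
    if key < node.key:
        node.left, added = put(node.left, key, val)
    else:
        node.right, added = put(node.right, key, val)
    return node, added

root, size = None, 0
for k, v in [(3, "red"), (4, "blue"), (3, "green")]:
    root, added = put(root, k, v)
    size += added                 # Only count genuinely new keys

print(root.val, size)
```

Tracking whether a node was actually added keeps the size of the tree consistent when a key is overwritten.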

In [None]:
mytree = binary_search_tree()
mytree[3]="red"
mytree[4]="blue"
mytree[6]="yellow"
mytree[2]="at"

print(mytree[6])
print(mytree[2])

## 7. Searching and sorting
- Go through all the data and compare the elements

### Sequential search

#### Unordered list

In [None]:
def seq_search(arr,ele):
    pos = 0          # Starting position
    found = False    # True when number is found

    while pos < len(arr) and not found:
        if arr[pos] == ele:
            found = True
        else:
            pos +=1
    return found 

In [None]:
arr = [1,2,3,4,5]
seq_search(arr,6)

In [None]:
arr = [1,2,3,4,5]
seq_search(arr,3)

In [None]:
# Check edge cases
arr = [1,2,3,4,5]
seq_search(arr,1)

In [None]:
# Check edge cases
arr = [1,2,3,4,5]
seq_search(arr,5)

#### Ordered list

In [None]:
def seq_search(arr,ele):
    '''
    Input array must be ordered
    '''
    
    pos = 0          # Starting position
    found = False    # True when number is found
    stopped = False  # Stop when the value is greater than the target element
    
    while pos < len(arr) and not found and not stopped:
        if arr[pos] == ele:
            found = True
        else:
            if arr[pos] > ele:
                stopped = True
            else:
                pos +=1
    
    return found 

In [None]:
arr = [1,2,3,4,5,6,7,8,9,10]
seq_search(arr,11)

In [None]:
arr = [1,2,3,4,5,6,7,8,9,10]
seq_search(arr,9)

In [None]:
# Check edge cases
arr = [1,2,3,4,5,6,7,8,9,10]
seq_search(arr,1)

In [None]:
# Check edge cases
arr = [1,2,3,4,5,6,7,8,9,10]
seq_search(arr,10)
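
To make the benefit of the ordered version concrete, the sketch below counts how many items each approach examines when the target is absent. The helper names `count_unordered` and `count_ordered` are made up for this illustration; both searches remain O(n) in the worst case.

```python
def count_unordered(arr, ele):
    # Number of items examined by a plain sequential search
    comparisons = 0
    for item in arr:
        comparisons += 1
        if item == ele:
            break
    return comparisons

def count_ordered(arr, ele):
    # Number of items examined when the search stops at the first item >= ele
    comparisons = 0
    for item in arr:
        comparisons += 1
        if item >= ele:
            break
    return comparisons

arr = [1, 3, 5, 7, 9, 11, 13, 15]
print(count_unordered(arr, 4))  # Item absent: every item is examined
print(count_ordered(arr, 4))    # Item absent: stops at the first value above 4
```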

### Binary search
- For ordered lists
- Starts by examining the middle item
    - If the **target item** is:
        - **Greater than the middle item**
            - The entire lower half is discarded
                - Search continues at the new middle-item
        - **Less than the middle item**
            - The entire upper half is discarded
                - Search continues at the new middle-item

#### Iteration
- On each iteration, one of these variables is reassigned
    - first
    - last  
<br></br>
- These values are used to change the value of 'mid' 

In [None]:
def binary_search(arr,ele):
    
    first = 0
    last  = len(arr)-1
    found = False

    while first <= last and not found:
        mid = (first+last)//2
        
        if ele == arr[mid]:
            found = True
        elif ele < arr[mid]:
            last = mid - 1
        else:
            first = mid + 1
    
    # Return only after the loop has finished narrowing the search
    return found

In [None]:
# Item present
arr = [1,2,3,4,5,6,7,8,9,10]
binary_search(arr,5)

In [None]:
# Item not present
arr = [1,2,3,4,5,6,7,8,9,10]
binary_search(arr,13)
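
For comparison, Python's standard library provides binary search on sorted lists through the `bisect` module. The wrapper name `bisect_search` below is an assumption for this sketch; `bisect_left` itself is part of the standard library.

```python
from bisect import bisect_left

def bisect_search(arr, ele):
    # bisect_left returns the leftmost index at which ele could be inserted
    # while keeping arr sorted
    i = bisect_left(arr, ele)
    # The item is present only if that index holds an equal value
    return i < len(arr) and arr[i] == ele

arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(bisect_search(arr, 5))
print(bisect_search(arr, 13))
```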

#### Recursion
- In a recursion the first argument of the function is reinitialised 
    - 'mid' is fed back into the function argument as a list index value 
        - arr[:mid]
        - arr[mid+1:]  
<br></br>
- The new list with a different length, will change the value of 'mid'

In [29]:
def rec_binary_search(arr,ele):
    
    # Base case
    if len(arr) == 0:
        return False 
    
    else: 

        mid = len(arr)//2
        
        if arr[mid] == ele:
            return True
        else:
            if ele < arr[mid]:
                return rec_binary_search(arr[:mid],ele)
            else:
                return rec_binary_search(arr[mid+1:],ele)
    

In [None]:
# Item present
rec_binary_search(arr,3)

In [None]:
# Item not present
rec_binary_search(arr,13)
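
Slicing with `arr[:mid]` and `arr[mid+1:]` copies part of the list on every call, which adds linear work on top of the logarithmic number of calls. One possible alternative, sketched here under the assumption that passing index bounds is acceptable, keeps the same list and narrows `first` and `last` instead. The function name `rec_binary_search_idx` is hypothetical.

```python
def rec_binary_search_idx(arr, ele, first=0, last=None):
    # On the first call, search the whole list
    if last is None:
        last = len(arr) - 1

    # Base case: the search range is empty
    if first > last:
        return False

    mid = (first + last) // 2
    if arr[mid] == ele:
        return True
    if ele < arr[mid]:
        # Discard the upper half
        return rec_binary_search_idx(arr, ele, first, mid - 1)
    # Discard the lower half
    return rec_binary_search_idx(arr, ele, mid + 1, last)

arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(rec_binary_search_idx(arr, 3))
print(rec_binary_search_idx(arr, 13))
```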

## Hashing / Hash table

- **Hashing**: 
    - Building a data structure that can be searched in constant time O(1)  
<br></br>
- **Hash tables**: 
    - An array containing all of the keys to search on  
<br></br>
- **Hash functions**: 
    - Determines the position of each key in the array   
        - Can be any function which always maps the same input to the same output
            - Inserting a new value ~ O(1)
            - Looking up a value ~ O(1)
    - Hash function methods:
        - *Folding method*: 
            - 6574837601, 6+5+7+4+8+3+7+6+0+1 = 47, 47 % 11 = 3
            - **slot_pos = 3**
        - *Mid-square method*: 
            - 44, 44^2 = 1936, 2 central ints = 93, 93 % 11 = 5
            - **slot_pos = 5**
        - *Non-integer values*: 
            - cat, use ordinal values, ord('c') = 99, a=97, t=116, cat=312, 312%11=4
            - **slot_pos = 4**
 
 <br></br>
- **Collision resolution**: 
    - There is nothing which guarantees that the hash function won't produce the same output for two different inputs  
        - Unless it's a perfect hash function which is difficult to produce  
 <br></br>
    - Solutions(s):
        - **Open addressing**:
            - Hash again (**rehash/rehashing**) to find another location with an empty slot
                - The *open addressing* technique of systematically visiting each slot:
                    -  **1 by 1 = linear probing**
                        - With linear probing we keep moving down until an empty slot is found
                    -  **Skipping slots = quadratic probing**
                        - Skipping slots helps to avoid clustering
                            - More evenly distributing the items that cause collisions
                        - Rather than using a constant 'skip' value
                            - Use a rehash function that increments the hash value by successive perfect squares
                                - E.g. if the first value is = h
                                    - Successive values = h+1, h+4, h+9, h+16
                                        - I.e. the skips between probes (1, 3, 5, 7) keep increasing
        - ***Separate chaining***: 
            - Allow each slot to hold a reference to a collection (or chain) of items
            - Allows many items to exist in the same location in the hash table
                - Each array slot has a linked list
            - O(1) to find the correct index in the array + a potential linear search down the chain
            - As more items are stored in the hash table, the chains get longer and the linear searches skew the complexity towards O(n)
                - The table is then *rehashed*, where:
                    - A new larger hash table is created
                    - All the data is inserted into the new hash table
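
The hash function methods and the quadratic probing idea above can be sketched as small functions. The function names are assumptions for illustration only; the inputs reproduce the worked examples from the list (a table of 11 slots), and `mid_square` assumes the squared number has an even number of digits so that two central digits exist.

```python
def fold_digits(number, table_size):
    # Folding (one digit per piece): add the digits, then take the remainder
    return sum(int(d) for d in str(number)) % table_size

def mid_square(number, table_size):
    # Square the number, keep the two central digits, then take the remainder
    squared = str(number ** 2)
    mid = len(squared) // 2
    return int(squared[mid - 1:mid + 1]) % table_size

def ordinal_sum(text, table_size):
    # Non-integer keys: add the ordinal value of each character
    return sum(ord(ch) for ch in text) % table_size

def quadratic_probe(h, table_size, attempts):
    # Slots tried after a collision at h: h+1, h+4, h+9, ... (wrapping around)
    return [(h + i * i) % table_size for i in range(1, attempts + 1)]

print(fold_digits(6574837601, 11))
print(mid_square(44, 11))
print(ordinal_sum("cat", 11))
print(quadratic_probe(3, 11, 4))
```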

In [141]:
class HashTable(object):
     
    def __init__(self,size):
        # Sets up slots, data and size
        self.size = size
        self.slots = [None] * self.size
        self.data = [None] * self.size


    def hashfunction(self,key,size):
        # Remainder method
        return key % size


    def rehash(self,oldhash,size):
        # Find the next possible position
        return (oldhash + 1) % size


    def put(self,key,data):
        # Get the hash value
        hashvalue = self.hashfunction(key,len(self.slots))

        # If slot is empty
        if self.slots[hashvalue] == None:
            self.slots[hashvalue] = key
            self.data[hashvalue] = data

        # If the key already exists, replace the old value
        elif self.slots[hashvalue] == key:
            self.data[hashvalue] = data

        else: # Find the next available slot
            nextslot = self.rehash(hashvalue,len(self.slots))

             # If the slot is not empty, and is not the key
            while self.slots[nextslot] != None and self.slots[nextslot] != key:
                nextslot = self.rehash(nextslot,len(self.slots))

            # Set new key and data, if element is None
            if self.slots[nextslot] == None:
                self.slots[nextslot] = key
                self.data[nextslot] = data

            # Set new data, if element is equal is key
            else:
                self.data[nextslot] = data


    def get(self,key):
        '''
        Getting items given a key
        '''
        # Set up variables for the search
        startslot = self.hashfunction(key,len(self.slots))
        data = None
        # Stops the while loop from searching after passing over all the elements once
        stop = False
        found = False
        # The starting position of the search
        position = startslot
        
        # The while loop will continue to cycle through the slots in the list
            # Whilst the slot is not empty, the key is not found and the search has not wrapped around
        while self.slots[position] != None and not found and not stop:
            if self.slots[position] == key:
                data = self.data[position]
                found = True
            else:
                position = self.rehash(position,len(self.slots))
                # Until returning to the first slot, which has already been checked
                if position == startslot:
                    # Make stop = True, so that the while loop will exit
                    stop = True
                    
        return data

    # Allows for list indexing to be used to set a key and data item
    def __setitem__(self,key,data):
        return self.put(key,data)

    # Allows for list indexing to be used to return the data element for a given key
    def __getitem__(self,key):
        return self.get(key)
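
The class above resolves collisions with linear probing. For contrast, here is a minimal sketch of separate chaining, assuming integer keys and using a Python list per slot in place of a linked list. The class name `ChainedHashTable` and its attributes are hypothetical.

```python
class ChainedHashTable:

    def __init__(self, size):
        self.size = size
        # Each slot holds its own chain of (key, data) pairs
        self.buckets = [[] for _ in range(size)]

    def put(self, key, data):
        bucket = self.buckets[key % self.size]
        # If the key already exists in the chain, replace the old value
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, data)
                return
        # Otherwise add the new pair to the end of the chain
        bucket.append((key, data))

    def get(self, key):
        # O(1) to find the chain, then a linear search down it
        for k, v in self.buckets[key % self.size]:
            if k == key:
                return v
        return None

table = ChainedHashTable(5)
table.put(1, "one")
table.put(6, "six")     # 6 % 5 == 1, so this collides with key 1
table.put(1, "uno")     # Existing key: the value is replaced
print(table.get(1), table.get(6), table.get(11))
```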

## Bubble sort
- Makes multiple passes through a list
- Compares adjacent items 
- Values are swapped if they are out of order
- The numbers are sorted in ascending order
    - With the largest value as the last item

<img src='pics/bubble_sort.png'>

In [None]:
def bubble_sort(arr):
    for n in range(len(arr)-1,0,-1):
        for k in range(n):
            if arr[k] > arr[k+1]:
                temp = arr[k]
                arr[k] = arr[k+1]
                arr[k+1] = temp
    
    return arr

In [None]:
arr_b = [5,6,4,8,2,3]
bubble_sort(arr_b)
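
A common variation, sketched here as an assumption and not as part of the original notebook, stops as soon as a full pass makes no swaps, since the list must then already be sorted. The name `short_bubble_sort` is hypothetical; it sorts in place and returns the number of passes made.

```python
def short_bubble_sort(arr):
    passes = 0
    for n in range(len(arr) - 1, 0, -1):
        swapped = False
        passes += 1
        for k in range(n):
            if arr[k] > arr[k + 1]:
                # Exchange the adjacent items
                arr[k], arr[k + 1] = arr[k + 1], arr[k]
                swapped = True
        # No swaps in this pass means the list is already in order
        if not swapped:
            break
    return passes

unsorted_list = [5, 6, 4, 8, 2, 3]
short_bubble_sort(unsorted_list)
print(unsorted_list)

sorted_list = [1, 2, 3, 4, 5]
print(short_bubble_sort(sorted_list))  # Number of passes for an already sorted list
```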

## Selection sort
https://stackoverflow.com/questions/15799034/insertion-sort-vs-selection-sort
- In comparison to the bubble sort, only one swap is made for every pass
    - Rather than up to n-1 swaps for every pass
- Looks for the largest value at every pass
- Will place the item in position so that the numbers will be in ascending order
    - With the largest value as the last item

<img src='pics/selection_sort.png'>

In [5]:
def selection_sort(arr):
    
    for sorted_n in range(len(arr) - 1, 0, -1):
        current_max = 0
        
        for unsorted_n in range(1, sorted_n + 1):
            if arr[unsorted_n] > arr[current_max]:
                current_max = unsorted_n
        
        temp = arr[sorted_n]
        arr[sorted_n] = arr[current_max]
        arr[current_max] = temp
        
    return arr

In [6]:
arr_s = [5, 3, 4, 6]
selection_sort(arr_s)

[3, 4, 5, 6]
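
To illustrate the difference in the number of exchanges, the sketch below counts swaps for both algorithms on a reversed list, which is the worst case for bubble sort. The helper names `bubble_swaps` and `selection_swaps` are made up for this comparison, and each works on a copy so the input is left untouched.

```python
def bubble_swaps(arr):
    arr = list(arr)
    swaps = 0
    for n in range(len(arr) - 1, 0, -1):
        for k in range(n):
            if arr[k] > arr[k + 1]:
                arr[k], arr[k + 1] = arr[k + 1], arr[k]
                swaps += 1
    return swaps

def selection_swaps(arr):
    arr = list(arr)
    swaps = 0
    for sorted_n in range(len(arr) - 1, 0, -1):
        current_max = 0
        for unsorted_n in range(1, sorted_n + 1):
            if arr[unsorted_n] > arr[current_max]:
                current_max = unsorted_n
        # Exactly one exchange per pass
        arr[sorted_n], arr[current_max] = arr[current_max], arr[sorted_n]
        swaps += 1
    return swaps

data = [6, 5, 4, 3, 2, 1]
print(bubble_swaps(data), selection_swaps(data))
```

Both algorithms still make the same number of comparisons, so both remain O(n^2); only the number of exchanges differs.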


## Insertion sort
https://stackoverflow.com/questions/15799034/insertion-sort-vs-selection-sort
- *Selection sort*: The **comparisons** take place in the **unsorted** part of the list
- *Insertion sort*: The **comparisons** take place in the **sorted** part of the list  
<br></br>
- *In comparison to a selection sort*: An insertion sort compares the next item with the items found in the sorted part of the list
    - The item is placed into the sorted part, in order  
<br></br>
- In some cases insertion sort can be advantageous, as it shifts the values rather than exchanging them
    - A shift takes roughly 1/3 of the processing work of an exchange, as only one assignment takes place
    - In contrast to the 3 assignments per exchange in bubble and selection sorts

<img src='pics/insertion_sort.png'>

In [81]:
def insertion_sort(arr):
    
    for i in range(1, len(arr)):
        current_value = arr[i]
        position = i
        
        # The while loop is used to allow for multiple swaps to be made
            # As, the value for position is fed back into the statement to be compared
        while position > 0 and arr[position - 1] > current_value:
            arr[position] = arr[position - 1]
            position -= 1
        
        arr[position] = current_value
    
    return arr

In [82]:
arr_i = [1, 9, 6, 2, 5, 7, 4, 8, 3]
insertion_sort(arr_i)

[1, 2, 3, 4, 5, 6, 7, 8, 9]
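
As a small illustration of the shifting behaviour, the sketch below counts how many shifts an insertion sort performs. The function name `insertion_shifts` is an assumption for this example; it works on a copy and returns both the sorted list and the shift count.

```python
def insertion_shifts(arr):
    arr = list(arr)
    shifts = 0
    for i in range(1, len(arr)):
        current_value = arr[i]
        position = i
        while position > 0 and arr[position - 1] > current_value:
            # One assignment per shift, rather than three per exchange
            arr[position] = arr[position - 1]
            position -= 1
            shifts += 1
        arr[position] = current_value
    return arr, shifts

print(insertion_shifts([1, 2, 3, 5, 4]))  # Nearly sorted: very few shifts
print(insertion_shifts([5, 4, 3, 2, 1]))  # Reversed: the maximum number of shifts
```

A nearly sorted list needs very few shifts, which is why insertion sort is often a reasonable choice for data that is already mostly in order.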

## Shell sort
 
<br></br>
* Improves on the insertion sort by breaking the original list into a number of smaller sublists
* The sublists are made by choosing elements at a specific interval from each other
* These sublists are sorted using insertion sort

<br></br>
* When using insertion sort:
    - If the item to be sorted is near the opposite end of the list
    - The item would be swapped with every item between itself and the sorted position
* With the shell sort
    - Only the items in the sublists are sorted
    - Items can move far larger distances along the array with far fewer swaps
    - Thereby reducing the time taken by the operation
    - If there are any unsorted items a final insertion sort on the list takes place
    - However, by now only a few items will be out of place
        - So the time complexity will not be significantly affected by this last sort

* Create the sublists
<br></br>
<img src='pics/shell_sort_1.png'>
<br></br>
* Sort the sublists
<br></br>
<img src='pics/shell_sort_2.png'>
<br></br>
* The final insertion sort will have an increment of 1 (the standard insertion sort)
<br></br>
<img src='pics/shell_sort_3.png'>

In [83]:
# In this example, after sorting all the sublists for a given gap, the gap will halve
def shell_sort(arr):
    # The gap between items of the same sublist, which is also the number of sublists
    sublist_gap = len(arr) // 2
    # Continue as long as the gap is greater than 0, stopping after a gap of 1 
    while sublist_gap > 0:
        # start is the index of the first item of each sublist
        for start in range(sublist_gap):
            gap_insertion_sort(arr,start,sublist_gap)
            #print("When gap size is: ", sublist_gap)
            #print("List: " , arr)
        # Reduce the gap by half, until the increment (gap) between comparisons is 1
        sublist_gap = sublist_gap // 2
        
def gap_insertion_sort(arr,start,gap):
    # Iterate through the sublist that is created using the gaps
    for i in range(start+gap,len(arr),gap):
        current_value = arr[i]
        position = i
        # The first comparison to be made after crossing the first gap (position >= gap)
        # The items are to increase in value towards the end of the list
            # close-to-beginning = arr[position-gap] 
            # close-to-end = current_value 
            # If item close-to-beginning > close-to-end, shift it to sort correctly
        # The while loop is used to allow for multiple shifts to be made
            # As the value for position is fed back into the statement to be compared
        while position >= gap and arr[position - gap] > current_value:
            # If the condition above is met, shift the larger item one gap towards the end
            arr[position] = arr[position - gap]
            # After the shift, move to the index position of the item that was shifted
            position = position - gap
        # If a shift is made, current_value is placed into the pos closest to the beginning
        # If no shift is made, the position of the current_value is unchanged
        arr[position] = current_value

In [84]:
arr = [45,67,23,45,21,24,7,2,6,4,90]
shell_sort(arr)
arr

[2, 4, 6, 7, 21, 23, 24, 45, 45, 67, 90]
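
To see which items end up in the same sublist, the sketch below uses extended slicing (`arr[start::gap]`) on the example list for the first gap of 5. It uses the built-in `sorted` as a stand-in for the gap insertion sort, purely to show the effect of one round; it is an illustration and not a replacement for `shell_sort`.

```python
arr = [45, 67, 23, 45, 21, 24, 7, 2, 6, 4, 90]
gap = len(arr) // 2   # 5

# Each sublist takes every gap-th item, beginning at a different start index
sublists = [arr[start::gap] for start in range(gap)]
print(sublists)

# Sort each sublist and write it back into the same positions
for start in range(gap):
    arr[start::gap] = sorted(arr[start::gap])
print(arr)
```

After this first round the small values have already moved a long way towards the front, which leaves fewer shifts for the later, smaller gaps.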

## Merge sort

* A recursive algorithm that continually splits a list in half

<br></br>
* If the list is more than one item:
    - It is split in half 
    - We then recursively invoke merge sort on both halves
    
<br></br>
* If the list is empty or has one item:
    - It is sorted by definition as there are no items to sort it with 
    
<br></br>
* Once the two halves are sorted, they are **merged**
    - Taking two smaller sorted lists and combining them into a single sorted list

* Split
<br></br>
<img src='pics/merge_sort_1.png'>
<br></br>
* Order and merge
<br></br>
<img src='pics/merge_sort_2.png'>
<br></br>
* Animation
<br></br>
![SegmentLocal](pics/Merge-sort-animation.gif "segment")
* Double recursion
<br></br>
<img src='pics/merge_sort_merge.png'>

In [1]:
def merge_sort(arr):
    
    # The base case, to terminate the splitting process
        # If argument is a list of less than 2 items then the if statement does not run
            # The splitting will then stop when only one item is left in the list
        # In one level up (the previous level in the call stack)
            # The merge_sort(left_half) has no further actions to run
                # So we move onto the next line: merge_sort(right_half)
    if len(arr) > 1:
        mid = len(arr) // 2
        left_half = arr[:mid]
        right_half = arr[mid:]
        
        # The elements in the left_half will keep on splitting
            # Until the elements are in their own list
                # The left_half is passed in as arr recursively until the base case
        print('Left half:    ',left_half)
        merge_sort(left_half)
        
        # After all the left_half elements are split in their own list
            # The same process repeats for all the elements in the right_half
                # The right_half is passed in as arr recursively until the base case
        print('Right half:   ',right_half)
        merge_sort(right_half)
        
        i = 0
        j = 0
        k = 0
        
        # 'and' prevents out of bounds errors when indexing through both the lists
        while i < len(left_half) and j < len(right_half):
            if left_half[i] < right_half[j]:
                arr[k] = left_half[i]
                i += 1
            else:
                arr[k] = right_half[j]
                j += 1
            # After the smallest value for a given 'k' index position is set
            k += 1 # Point to the next position in the result arr by using k
        
        # If the left_half still has items after the main loop exits
        # (i.e. 'j' reached the length of the right_half first)
            # The remaining left_half items are already sorted
                # So they are copied into arr[k] in order, without further comparisons
        while i < len(left_half):
            arr[k] = left_half[i]
            i += 1
            k += 1
        
        # If the right_half still has items after the main loop exits
        # (i.e. 'i' reached the length of the left_half first)
            # The remaining right_half items are already sorted
                # So they are copied into arr[k] in order, without further comparisons
        while j < len(right_half):
            arr[k] = right_half[j]
            j += 1
            k += 1
        
        print('\nMerged list:  ',arr, '\n')
    
    return arr

In [2]:
arr = [54,26,93,17,77,31,44,55,20]
merge_sort(arr)

Left half:     [54, 26, 93, 17]
Left half:     [54, 26]
Left half:     [54]
Right half:    [26]

Merged list:   [26, 54] 

Right half:    [93, 17]
Left half:     [93]
Right half:    [17]

Merged list:   [17, 93] 


Merged list:   [17, 26, 54, 93] 

Right half:    [77, 31, 44, 55, 20]
Left half:     [77, 31]
Left half:     [77]
Right half:    [31]

Merged list:   [31, 77] 

Right half:    [44, 55, 20]
Left half:     [44]
Right half:    [55, 20]
Left half:     [55]
Right half:    [20]

Merged list:   [20, 55] 


Merged list:   [20, 44, 55] 


Merged list:   [20, 31, 44, 55, 77] 


Merged list:   [17, 20, 26, 31, 44, 54, 55, 77, 93] 



[17, 20, 26, 31, 44, 54, 55, 77, 93]

* As each split halves the list size, there are **log(n)** levels of merging
* At each level, every item is compared and copied, so **n** work is done
* Time complexity = **O(n log n)**; the extra space needed for the temporary halves is **O(n)**
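
As a rough check of the **n log n** figure, the comparisons made by a merge sort can be counted and set against `n * ceil(log2(n))`, a standard upper bound on the comparisons a top-down merge sort makes. The sketch below is an illustration only: `merge_sort_count` is a hypothetical helper that returns a sorted copy instead of sorting in place as the cell above does.

```python
from math import ceil, log2


def merge_sort_count(items):
    """Return (sorted_copy, comparisons) for a top-down merge sort."""
    if len(items) <= 1:
        return list(items), 0
    mid = len(items) // 2
    left, left_count = merge_sort_count(items[:mid])
    right, right_count = merge_sort_count(items[mid:])
    merged = []
    comparisons = left_count + right_count
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1  # one comparison per pass through this loop
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons


for n in (8, 64, 512):
    data = list(range(n, 0, -1))  # reversed input
    result, comparisons = merge_sort_count(data)
    print(n, comparisons, n * ceil(log2(n)))
```

Each printed row shows the list size, the comparisons counted, and the bound; the count should stay at or below the bound for every size.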

* Merge sort keeps its **O(n log n)** running time on every input, which makes it a dependable choice for larger arrays or datasets
* Quick sort is often faster in practice on arrays that fit in memory, particularly smaller ones, but it can degrade to **O(n^2)** when the pivots split the list badly

## Quick sort

* Uses divide and conquer as done in merge sort
    - While not using additional storage
* As a trade off, it is possible that the list may not be split in half
     - Which pushes the number of splits from **log n** to **n**, and so the time complexity from **O(n log n)** to **O(n^2)**
         - Which will decrease performance, by increasing the time taken to sort
         
 <br></br>
 * First select a value which is called the **pivot value**
 * The pivot value will be used to split the list
     - Values lower than it go to its left and values larger than it go to its right
 * The position the pivot is moved to is called the **split point**
     - Which is used to divide the list for subsequent calls to the quick sort
 * The **partition** process will:
     - Find the split point
     - Values lower than the value at the split point will be moved to the left
     - Values greater than the value at the split point will be moved to the right
         - Move the leftmark until a value greater than the pivot is found
         - Move the rightmark until a value less than the pivot is found
             - Swap these two values
     - The split point is found once the position of the rightmark is less than that of the leftmark

* 54 is chosen to be the first pivot point
<img src='pics/quick_sort_1.png'>
<br></br>
* The partition process
<img src='pics/quick_sort_2.png'>
<br></br>

In [21]:
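def quick_sort(arr,first=0,last=None):
    
    # 'last' defaults to the final index of arr
        # It is worked out at call time, as a default of len(arr) - 1
            # Would be evaluated only once, when the function is defined
    if last is None:
        last = len(arr) - 1
    
    # Base case, to terminate the recursion
    if first < last:
        split_point = split(arr,first,last)
        
        quick_sort(arr,first,split_point - 1)
        quick_sort(arr,split_point + 1,last)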
    
        
def split(arr,first,last):
    
    pivot_value = arr[first]
    
    left_mark = first + 1
    right_mark = last
    
    done = False
    
    while not done:
        
        while left_mark <= right_mark and arr[left_mark] <= pivot_value:
            left_mark += 1
            
        while right_mark >= left_mark and arr[right_mark] >= pivot_value:
            right_mark -= 1
        
        # Exit the main while loop
        # Return right_mark to create a new pivot_value for the new split
        if right_mark < left_mark:
            done = True
            
        else: # Make the swap, as at this point:
            # The value at left_mark is greater than the pivot_value, and
            # The value at right_mark is less than the pivot_value
            arr[left_mark], arr[right_mark] = arr[right_mark], arr[left_mark]
            
    # After the left_mark and right_mark cross
        # Swap the pivot with the current right_mark value
    # So that the pivot_value sits at the split point of the current partition
        # All items before this position are <= the pivot, all items after it are >= the pivot
    arr[first], arr[right_mark] = arr[right_mark], arr[first]
    
    return right_mark

if __name__ == '__main__':
    arr = [2,5,4,6,7,3,1,4,12,11]
    quick_sort(arr)
    print(arr)

[1, 2, 3, 4, 4, 5, 6, 7, 11, 12]


In [22]:
arr = [2,5,4,6,7,3,1,4,12,11]
quick_sort(arr)
arr

[1, 2, 3, 4, 4, 5, 6, 7, 11, 12]
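
The trade-off of uneven splits can be observed by counting comparisons. The sketch below is an illustration under stated assumptions, not the `split` function used above: `quick_sort_count` is a hypothetical helper that uses a simpler single-scan (Lomuto-style) partition with the first element as the pivot, and an explicit stack instead of recursion. On already-sorted input every split is as uneven as possible, so the count is n(n-1)/2.

```python
import random


def quick_sort_count(arr):
    """Sort arr in place using a first-element pivot; return comparisons made."""
    comparisons = 0
    stack = [(0, len(arr) - 1)]  # explicit stack avoids deep recursion
    while stack:
        first, last = stack.pop()
        if first >= last:
            continue
        pivot = arr[first]
        boundary = first  # last index known to hold a value < pivot
        for idx in range(first + 1, last + 1):
            comparisons += 1
            if arr[idx] < pivot:
                boundary += 1
                arr[boundary], arr[idx] = arr[idx], arr[boundary]
        # Place the pivot at its split point
        arr[first], arr[boundary] = arr[boundary], arr[first]
        stack.append((first, boundary - 1))
        stack.append((boundary + 1, last))
    return comparisons


n = 200
sorted_input = list(range(n))
print('Sorted input:  ', quick_sort_count(sorted_input))  # 19900 = n(n-1)/2

random.seed(0)
shuffled_input = list(range(n))
random.shuffle(shuffled_input)
print('Shuffled input:', quick_sort_count(shuffled_input))
```

The shuffled input needs far fewer comparisons than the sorted one, which is why practical implementations often pick the pivot at random or use a median-of-three rule.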

### Merge Sort vs Quick Sort
* Merge sort behaves in the same way regardless of the number or initial order of the elements, so its running time is consistently **O(n log n)**
* It works well with large data sets
* Quick sort is faster than merge sort in some cases, such as for small data sets
* Merge sort requires additional memory space to store the auxiliary arrays, **O(n)**
    - On the other hand, quick sort doesn’t require much space for extra storage
* Merge sort is more efficient than quick sort in the worst case: **O(n log n)** against **O(n^2)**
* Quick sort is an internal sorting method, where all the data to be sorted is held in main memory at the same time
    - Conversely, merge sort is also suited to external sorting
        - Where the data to be sorted cannot all be accommodated in memory at the same time, so some has to be kept in auxiliary storage

<br></br>
#### Conclusion
* Quick sort is fast for smaller lists, but can become inefficient for larger lists when the pivots split them unevenly
* It also tends to perform more comparisons than merge sort
* Although merge sort requires fewer comparisons, it needs additional memory space of **O(n)** for storing the extra array, while quick sort needs space of **O(log n)** on average for its recursion stack

<br></br>
#### Considerations
**Why Quick Sort is preferred over MergeSort for sorting Arrays**
* Quick Sort in its general form is an in-place sort (i.e. it doesn’t require any extra storage beyond its recursion stack), whereas merge sort requires O(N) extra storage, where N is the array size, which may be quite expensive. Allocating and de-allocating the extra space used for merge sort increases the running time of the algorithm. Comparing average complexity, we find that both sorts have O(N log N) average complexity, but the constants differ. For arrays, merge sort loses due to its use of the extra O(N) storage space.  
<br></br>

**Why MergeSort is preferred over QuickSort for Linked Lists?**
* In the case of linked lists the situation is different, mainly due to the difference in memory allocation between arrays and linked lists. Unlike array elements, linked list nodes may not be adjacent in memory. Unlike in an array, in a linked list we can insert items in the middle in O(1) extra space and O(1) time. Therefore the merge operation of merge sort can be implemented without extra space for linked lists.  
<br></br>
* In arrays, we can do random access as elements are contiguous in memory. Let us say we have an integer (4-byte) array A and let the address of A[0] be x; then to access A[i], we can directly access the memory at (x + i*4). Unlike arrays, we cannot do random access in a linked list. Quick Sort requires a lot of this kind of access. In a linked list, to access the i’th index we have to traverse every node from the head to the i’th node, as we don’t have a contiguous block of memory. Therefore, the overhead increases for quick sort. Merge sort accesses data sequentially, so its need for random access is low.
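
The claim that linked lists can be merged without extra space can be sketched as follows. This is a minimal illustration with assumed names: `Node`, `merge_sorted`, `from_list` and `to_list` are hypothetical helpers, not part of the notebook. The merge only re-points the existing `next` links, so apart from one placeholder node it allocates nothing.

```python
class Node:
    """A singly linked list node (hypothetical, for illustration only)."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def merge_sorted(a, b):
    """Merge two sorted linked lists by relinking their existing nodes."""
    dummy = Node(None)  # placeholder head, the only extra allocation
    tail = dummy
    while a is not None and b is not None:
        if a.value <= b.value:
            tail.next, a = a, a.next
        else:
            tail.next, b = b, b.next
        tail = tail.next
    # Attach whichever list still has nodes left
    tail.next = a if a is not None else b
    return dummy.next


def from_list(values):
    """Build a linked list from a Python list."""
    head = None
    for value in reversed(values):
        head = Node(value, head)
    return head


def to_list(head):
    """Collect the values of a linked list into a Python list."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out


print(to_list(merge_sorted(from_list([17, 26, 54]), from_list([20, 31, 93]))))
# [17, 20, 26, 31, 54, 93]
```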

## 8. Graph

* Graphs are a more general structure than trees
    * Trees are a type of graph  
<br></br>
* Graphs can be used to represent: 
    * Roads
    * Airline flights from city to city
    * How the internet is connected  
<br></br>
* Once the problem can be represented as a graph, standard graph algorithms can be used to solve it more quickly and with less complexity

## Components

* **Vertex** also known as a **node**
* If the vertex is named, its name is called the **key**
* If a vertex has additional information attached to its key, that information is called the **payload**