# Week 1 Notes – Basic Algorithms
## 2024 Data Structures & Algorithms Challenge
### Notes by Cyril Michino | Zindua School

## Before We Begin...
### Why learn data structures and algorithms (Computer Programming Fundamentals)
- Build optimised code: increase speed and reduce cost (Particularly at scale)
- Advance your problem-solving skills with code (Paradigms for solving code problems)
- Crack technical interviews (Typical questions for Big Tech companies)

### What this course will cover:
1. Week 1: Basic Algorithms
    - Big-O Notation
    - Arrays & Hashmaps
    - Search
    - Sorting
2. Week 2: Data Structures
    - Linear Data Structures: Linked Lists, Stacks, Queues
    - Non-Linear Data Structures: Trees, Graphs
3. Week 3: Divide & Conquer Algorithms (Real problem-solving starts)
    - Recursion
    - Dynamic Programming
    - Greedy Algorithms
4. Week 4: Advanced Algorithms
    - Dynamic Programming in Graphs & Grid
    - Advanced Graph Algorithms: Search, Pathfinding, Vertex Coloring
    - Optimise Greedy Algorithms: Hill Climbing (Gradient Descent)
    - NP-Completeness
5. Week 5: Bonus Sessions
    - 2 career sessions by Gebeya
    - 2 bonus sessions (unstuck)
    - Bonus: Numpy, Monte Carlo, Markov Chains

## Day 1: Introduction to Algorithms 
Objectives: Understand Big-O Notation, learn array search algorithms (Linear vs Binary Search)

### Big-O Notation
Recommended Readings:
1. [FreeCodeCamp, What is Big O Notation Explained: Space and Time Complexity](https://www.freecodecamp.org/news/big-o-notation-why-it-matters-and-why-it-doesnt-1674cfa8a23c/)
2. [Big-O cheatsheet, Know thy complexities](https://www.bigocheatsheet.com/)

**Why the notation:** Different computers run at different speeds, but this notation lets us describe how the cost of an algorithm grows with the size of its input, regardless of the machine. Note that we use Big-O notation for both runtime and space complexity. We'll use runtime complexity (the number of operations your code makes) to understand this notation better.

These are the notations to take note of when assessing the complexity of code:
- Big O (O()) describes an upper bound of the complexity (usually quoted for the worst case)
- Omega (Ω()) describes a lower bound of the complexity (usually quoted for the best case)
- Theta (Θ()) describes a tight bound of the complexity (the upper and lower bounds match)

When gauging the complexity of our code, we usually focus on the worst-case scenario. Hence, we use Big-O notation to describe the complexity of our code. However, since upper bounds can be arbitrarily loose (e.g. n^2 and n^3 are both upper bounds of n), most of the time when we talk of Big-O, we are actually talking about Theta (Θ()) i.e. the tightest bound of the complexity.

#### Runtime Complexity Examples
##### Here is code that runs in constant time O(1)

In [7]:
## Constant time, operations remain the same regardless of the scale of the elements
arr = [3,4,6,7,8]
print(arr) #Whether the array has 2 elements or a million elements, the print operation only runs once

[3, 4, 6, 7, 8]


In [16]:
4 + 5 #Regardless of the weight of the numbers, this operation also has constant time

9

##### Here is code that runs in linear time O(n)

In [17]:
arr = [3,4,6,7,8]
for i in arr:
    print(i) ## Number of print operations will be equal to the length of the array

3
4
6
7
8


##### Here is code that runs in quadratic time O(n<sup>2</sup>)

In [18]:
arr = [[3,4,6,7,8],[3,4,6,7,8],[3,4,6,7,8],[3,4,6,7,8],[3,4,6,7,8]]

for i in arr: # This loop runs n times
    for j in i: # The sub-loop runs n-times every time it is called
        print(j) ## Hence, we have n^2 print operations

3
4
6
7
8
3
4
6
7
8
3
4
6
7
8
3
4
6
7
8
3
4
6
7
8


Here is a diagram of different complexities ranked:

O(1) < O(log n) < O(n) < O(n log n) < O(n<sup>2</sup>) < O(n<sup>k</sup>) < O(2<sup>n</sup>) < O(n!) | where n is the number of elements and k is a constant greater than 2

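Note that the base of the log in computer programming is usually taken to be 2, given that we'll be dealing mostly with binary operations (e.g. splitting the input in half). The base does not change the Big-O class, since logs of different bases differ only by a constant factor.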

![Rank of Complexities](https://www.freecodecamp.org/news/content/images/2021/06/1_KfZYFUT2OKfjekJlCeYvuQ.jpeg)
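
To get a feel for the ranking above, here is a small sketch that computes rough operation counts for each complexity class. The function name `growth_table` and the chosen sizes are assumptions made for illustration, and k is fixed at 3. Note that the ordering only holds once n is large enough (for very small n some classes tie or swap places).

```python
import math

def growth_table(sizes):
    """Return rough operation counts per complexity class for each input size n."""
    table = {}
    for n in sizes:
        table[n] = {
            "O(1)": 1,
            "O(log n)": math.log2(n),
            "O(n)": n,
            "O(n log n)": n * math.log2(n),
            "O(n^2)": n ** 2,
            "O(n^3)": n ** 3,  # n^k with k = 3 (assumed for illustration)
            "O(2^n)": 2 ** n,
            "O(n!)": math.factorial(n),
        }
    return table

for n, row in growth_table([16, 32]).items():
    print(n, row)
```

Doubling n from 16 to 32 barely moves the `O(log n)` entry, while the `O(2^n)` and `O(n!)` entries explode.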

### Linear Search
We search for elements sequentially in an array from the first index till we find it.
- Worst-case scenario O(n)
- Best-case scenario Ω(1)

In [15]:
def linearsearch(arr,element):
    
    for i in range(len(arr)):
        if arr[i] == element:
            return i
        
    return None ## If element is not in array

arr = [2,45,34,67,4,23,56]
print(linearsearch(arr,4))

4


### Binary Search
Only works for sorted arrays. We look at the mid-point of the array and ask ourselves whether the element is at the mid-point, on the right side (higher value), or on the left side (lower value). If the element is at the mid-point, we return its index. If it is not, we focus on the side (sub-array) where it must be and repeat the split-and-search process until we find the element or run out of elements to check.
- Worst-case scenario O(log n) i.e. the number of elements left to search halves with every step
- Best-case scenario Ω(1)

In [21]:
### Day 1 Challenge: Write a script to search for an element in an array using binary search
def binarysearch(arr,element):
    left = 0
    right = len(arr) - 1 ## Last valid index (len(arr) itself would be out of range)

    while left<=right:
        mid = (left+right)//2 ## Use floor division to truncate out the decimal places
        if arr[mid] == element:
            return mid
        elif arr[mid] > element:
            right = mid - 1
        else:
            left = mid + 1
    return None ## If element is not in the array


arr = [2,4,6,7,9,10,13] ## Binary search only works for a sorted array
element = 10

print(binarysearch(arr,element))

5
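
To compare the two searches, the sketch below counts how many elements each one inspects before finding the last item of a sorted array. The helper names `linear_steps` and `binary_steps` are made up for this illustration and are not part of the session code.

```python
def linear_steps(arr, element):
    """Count how many elements a linear search inspects."""
    steps = 0
    for value in arr:
        steps += 1
        if value == element:
            break
    return steps

def binary_steps(arr, element):
    """Count how many mid-points a binary search inspects (arr must be sorted)."""
    left, right = 0, len(arr) - 1
    steps = 0
    while left <= right:
        steps += 1
        mid = (left + right) // 2
        if arr[mid] == element:
            break
        if arr[mid] > element:
            right = mid - 1
        else:
            left = mid + 1
    return steps

arr = list(range(1024))  # Sorted array of 1,024 elements
print(linear_steps(arr, 1023))  # Inspects every element
print(binary_steps(arr, 1023))  # Inspects roughly log2(1024) elements
```

For 1,024 elements the linear search inspects all 1,024 of them, while the binary search needs at most 11 checks.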


## Day 2: Introduction to Data Structures

### Big-O Notation – Space Complexity

### What are Data Structures?
A way to store, organise and manage data/information that we use in our programs. There are primitive (Integer, Float, String, Boolean) and non-primitive data structures. Here is how we can categorise non-primitive data structures:
1. Linear Data Structures
    - Arrays
    - Linked Lists
    - Stacks
    - Queues
2. Non-Linear Data Structures
    - Hashmaps
    - Trees
    - Graphs

Here are the operations we care about when choosing data structures:

- Accessing data
- Searching for data
- Inserting data
- Deleting data

In [1]:
arr = [4, "we", 5] ## Not an array (Data type should be similar)
arr = ["Cyril", "Gerald", "Ivy", "Shadrach"] ## Array of strings
arr = [4,5,6,7,8] ## Array of integers
arr = [[4,5,6,7,8],[4,5,6,7,8],[4,5,6,7,8],[4,5,6,7,8]] ## 2-dimensional array (an array of arrays)

In [2]:
arr = [4,5,6,7,8]
arr[2] = 9 ## Replacing values at an index O(1)
arr

[4, 5, 9, 7, 8]

In [14]:
## Finding (searching) a value O(n)
arr = [4,5,6,7,8]
for i in range(len(arr)):
    if arr[i] == 6:
        print(i)

2


### Arrays
- List of items with similar data types (primitive data type).
- Array size cannot be changed (fixed size). There are caveats: Python lists can grow, and Java has both static arrays and dynamic ones (ArrayLists)

#### Common operations in arrays
- Accessing an element (Use of indexes, usually integers, to access data) O(1)
- Replacing values in an array O(1)
- Searching O(n)
- Inserting O(n)
- Deleting O(n)
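
Here is a rough sketch of why inserting is O(n): every element after the insertion point has to move one slot to the right. The helper `insert_at` is hypothetical and only mimics what `list.insert` does for you behind the scenes.

```python
def insert_at(arr, index, value):
    """Insert value at index by shifting later elements one slot to the right."""
    arr.append(None)  # Make room at the end (Python lists can grow)
    shifts = 0
    for i in range(len(arr) - 1, index, -1):
        arr[i] = arr[i - 1]  # Move each element one slot to the right
        shifts += 1
    arr[index] = value
    return shifts

arr = [4, 5, 6, 7, 8]
print(insert_at(arr, 0, 3), arr)  # Inserting at the front shifts every element
```

Inserting at the front is the worst case (all n elements shift), while inserting at the very end needs no shifts at all. Deleting works the same way in reverse.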

#### Limitations of Arrays
- Fixed Size
- Inefficient when inserting and deleting

#### Alternative to Arrays: Dynamic Arrays (ArrayLists)
- Arrays that do not have a fixed size, available in languages such as Java and C++
- ArrayLists actually take more memory space to save the same amount of data (each slot holds a reference to an object stored elsewhere in memory, not the value itself)
- ArrayLists are slower given how data is stored (the array contains memory locations of the actual values, so each access follows a reference)
- ArrayLists cannot store primitive values directly (they only take in objects, autoboxing is used to make this seamless)
- Python Lists are a special hybrid of Arrays and ArrayLists (call them dynamic arrays): they grow as needed and store references to their items
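
As a rough illustration of the dynamic array behaviour, the sketch below records the lengths at which the memory reported for a growing Python list changes. It assumes CPython, where `sys.getsizeof` reports the list's own allocation; the exact lengths are an implementation detail and differ between versions. The function name `resize_points` is made up for this example.

```python
import sys

def resize_points(n):
    """Return the list lengths at which the reported size of a growing list changes."""
    arr = []
    last_size = sys.getsizeof(arr)
    points = []
    for _ in range(n):
        arr.append(0)
        size = sys.getsizeof(arr)
        if size != last_size:  # The list grabbed a bigger block of memory
            points.append(len(arr))
            last_size = size
    return points

print(resize_points(64))
```

The list does not resize on every append. It reserves extra slots each time it grows, which is why appending stays cheap on average.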

In [17]:
## Hashmaps
students = {1:"John",56:"Maureen",345:"Cyril"}
students[56] ## Accessing items through the key O(1)

'Maureen'

In [18]:
students[78] = "Edwin" ## Adding items to the hashmap O(1)
students

{1: 'John', 56: 'Maureen', 345: 'Cyril', 78: 'Edwin'}

In [19]:
students.pop(345) ## Removing items O(1)
students

{1: 'John', 56: 'Maureen', 78: 'Edwin'}

### Hashmaps (Dictionaries)
- Stores data as key and value pairs (keys replace the indexes used in arrays; keys have to be unique)
- Hashmaps use hash functions and hash tables to map keys to an index (memory address) of the value
    - Hash functions should minimise collisions (two different keys mapping to the same index)
    - To handle collisions: Open Addressing (use the next free address), Closed Addressing (keep a linked list at each address)
- Time complexity for hashmap operations:
    - O(1) for all operations on average
    - O(n) in the worst case, if the hash function is inefficient (many keys end up in a linked list on one address)
    - Note that in most programming languages the hash function has been optimised to keep collisions rare, hence operations are still O(1) in practice
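
The collision handling described above can be sketched with a toy hashmap. This is only an illustration, not how Python's `dict` is implemented: the class name `ChainedHashMap` is hypothetical, and a plain Python list stands in for the linked list at each address.

```python
class ChainedHashMap:
    """Toy hashmap that resolves collisions by chaining (one list per bucket)."""

    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        return hash(key) % len(self.buckets)  # Hash function maps a key to a bucket index

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # Keys are unique, so overwrite
                return
        bucket.append((key, value))

    def get(self, key):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return None  # Key is not in the hashmap

students = ChainedHashMap(size=4)
students.put(1, "John")
students.put(5, "Maureen")  # 1 and 5 collide: both map to bucket 1 when size is 4
print(students.get(5))
```

If every key landed in the same bucket, `get` would have to scan the whole chain, which is where the O(n) worst case comes from.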

In [None]:
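## Trying to build a hash function if our keys could only be strings of dates in 2024

### Rough sketch only, watch the recording to understand what I was trying to raise about collisions
safaricom = {'01/01/2024':234,'02/02/2024':231, '23/11/2024':456}
days_before = [0,31,31+29,31+29+31] ## Days before Jan, Feb, Mar, Apr (Feb has 29 days in 2024)

def datehash(key):
    day, month = int(key[:2]), int(key[3:5]) ## Convert the 'dd' and 'mm' parts to integers
    if month > len(days_before):
        return None ## Months after April are not covered in this sketch
    return day + days_before[month-1] ## Day of the year, used as the index

for key in safaricom:
    print(key, datehash(key))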

## Day 3: Basic Sorting Algorithms

### Selection Sort
#### Space Inefficient Selection Sort
- Worst-case scenario: O(n^2)
- Best-case scenario: O(n^2) – you always have to complete the loops even if array is already sorted
- Space complexity: O(n) – we are creating a new array that will take in n elements (size of unsorted array)

In [4]:
def selectionsort(unsorted):
    sorted_ = []
    while len(unsorted) > 0: ## Keep going until every element has been moved (stopping at 1 drops the largest element)
        min_ = float('inf')
        for item in unsorted:
            if item < min_:
                min_ = item

        sorted_.append(min_)
        unsorted.remove(min_)

    return sorted_

In [11]:
arr = [23,1,45,32,90,87,78,78]
selectionsort(arr)

[1, 23, 32, 45, 78, 78, 87, 90]

#### Space Efficient Selection Sort
- This is still selection sort, therefore, best and worst scenarios are still the same O(n^2)
- Space complexity: O(1) – instead of creating a new array, we are sorting by swapping indexes

In [9]:
def selectionsort2(unsorted):
    for j in range(len(unsorted)): # switch the start-point by one
        min_ = float('inf')
        for i in range(j,len(unsorted)): #search for minimum element loop
            if unsorted[i] < min_:
                min_ = unsorted[i]
                index = i
        unsorted[index] = unsorted[j]
        unsorted[j] = min_

    return unsorted   

In [10]:
arr = [23,1,45,32,90,87,78]
selectionsort2(arr)

[1, 23, 32, 45, 78, 87, 90]

### Bubble Sort – O(n^2)
- Best-case runtime: O(n) – when array is already sorted
- Worst-case runtime: O(n^2) – when smallest element is at the last index
- Space complexity: O(1) – sorting happens within the input array

In [15]:
def bubblesort(unsorted):
    swapped = True

    while swapped: ## Repeat until a full pass makes no swaps
        swapped = False
        for i in range(len(unsorted)-1):
            if unsorted[i] > unsorted[i+1]:
                unsorted[i], unsorted[i+1] = unsorted[i+1], unsorted[i]
                swapped = True

    return unsorted

In [16]:
arr = [23,1,45,32,90,87,78]
bubblesort(arr)

[1, 23, 32, 45, 78, 87, 90]
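
To see the best and worst cases of bubble sort, here is a small sketch that counts how many passes over the array are needed. The helper `bubble_passes` is made up for this illustration, and it sorts a copy so the input is left untouched.

```python
def bubble_passes(arr):
    """Bubble sort a copy of arr and return how many passes over the array were needed."""
    arr = list(arr)
    passes = 0
    swapped = True
    while swapped:
        swapped = False
        passes += 1
        for i in range(len(arr) - 1):
            if arr[i] > arr[i + 1]:
                arr[i], arr[i + 1] = arr[i + 1], arr[i]
                swapped = True
    return passes

print(bubble_passes([1, 2, 3, 4, 5]))  # Already sorted: a single pass
print(bubble_passes([2, 3, 4, 5, 1]))  # Smallest element last: n passes
```

Each pass costs about n comparisons, so one pass gives O(n) and n passes give O(n^2).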

### Insertion Sort
- Best-case runtime: O(n) – when array is already sorted
- Worst-case runtime: O(n^2) – when array is already sorted in reverse
- Space complexity: O(1) – sorting happens within the input array

In [6]:
def insertionsort(unsorted):
    for i in range(1,len(unsorted)):
        j = i-1
        target = unsorted[i]
        while j >= 0:
            if unsorted[j] > target:
                unsorted[j+1] = unsorted[j]
                unsorted[j] = target
                j = j-1
            else:
                break

    return unsorted

In [7]:
arr = [23,1,45,32,90,87,78,78]
insertionsort(arr)

[1, 23, 32, 45, 78, 78, 87, 90]
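
The best and worst cases of insertion sort can be checked by counting how many times an element is shifted to the right. This is a minimal sketch: `insertion_shifts` is a hypothetical helper, written slightly differently from the function above, and it works on a copy of the input.

```python
def insertion_shifts(arr):
    """Insertion sort a copy of arr and return the number of element shifts performed."""
    arr = list(arr)
    shifts = 0
    for i in range(1, len(arr)):
        target = arr[i]
        j = i - 1
        while j >= 0 and arr[j] > target:
            arr[j + 1] = arr[j]  # Shift the larger element one slot to the right
            j -= 1
            shifts += 1
        arr[j + 1] = target  # Drop the target into the gap
    return shifts

print(insertion_shifts([1, 2, 3, 4, 5]))  # Already sorted: no shifts
print(insertion_shifts([5, 4, 3, 2, 1]))  # Sorted in reverse: n(n-1)/2 shifts
```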

In [13]:
unsorted = [23,1,45,32,90,87,78,78]

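### Debugged version: insertion sort written with two for loops
for i in range(1,len(unsorted)):
    for j in range(i-1,-1,-1): ## Walk backwards from i-1 to 0 (the stop value used to be len(unsorted), so the loop never ran)
        if unsorted[j] > unsorted[j+1]: ## Compare neighbours so the target keeps moving left
            unsorted[j], unsorted[j+1] = unsorted[j+1], unsorted[j]
        else:
            break

print(unsorted)

[1, 23, 32, 45, 78, 78, 87, 90]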


## Day 4: Recursive Sorting Algorithms

### What is Recursion?
These are functions that call themselves. Recursive functions make it easier to solve problems that have repetitive sub-problems. Here are some examples of recursive functions:
1. **Example 1:** Sum of numbers up to the nth number `1 + 2 + ... + (n-1) + n`. Let's put this into practice: the sum of numbers till 4 will be `4 + 3 + 2 + 1 = 10` and the sum of numbers till 3 will be `3 + 2 + 1 = 6`. What you'll realise from this is that the sum of numbers till 4 is simply the sum of numbers till 3 plus the number 4 (repetitive sub-problem). The sum of numbers till 5 will simply be the sum of numbers till 4 plus the number 5. If we were to encode this as a recursive function it would be `f(n) = n + f(n-1)`. However, it is worth noting that if you implemented this in code as is, the function would keep calling itself, always subtracting 1 `f(n-1)`, heading towards `-infinity`. This is not what we want, therefore, we introduce a base case `if n = 1; return 1` which stops the recursion when n is 1. Base cases are very important to avoid recursive functions that run till the end of time.
2. **Example 2:** The Fibonacci sequence is another good example where recursive functions can be applied. The nth term in the sequence is found by adding the two previous terms, such that the terms are: `1,1,2,3,5,8,13,21,...`. If you were asked to compute the nth term of the sequence, the recursive subproblem can be written as `f(n) = f(n-1) + f(n-2)`. However, base cases are important: at the first and second term there are no two previous terms, therefore in these instances we'll `return 1; if n = 1 or n = 2`.

See code solutions to these examples below:

In [8]:
## Example 1 code solution (Sum of numbers till n)
def sum_(n):
    if n == 1: ## The base case
        return 1
    return sum_(n-1) + n ## Recursive call

In [10]:
sum_(100)

5050

In [17]:
## Example 2 code solution (Nth term of the fibonacci sequence)
def fibonacci(n):
    if n == 1: ## First base case
        return 1
    if n == 2: ## 2nd base case
        return 1
    return fibonacci(n-1) + fibonacci(n-2) ## Recursive call

In [18]:
fibonacci(10)

55
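
The point about base cases in Example 1 can be seen by leaving the base case out. In Python the function does not literally run forever: the interpreter stops it with a `RecursionError` once the call stack gets too deep. The function name `sum_without_base_case` is made up for this sketch.

```python
def sum_without_base_case(n):
    return sum_without_base_case(n - 1) + n  # No base case: never stops on its own

try:
    sum_without_base_case(4)
except RecursionError as error:
    print("Recursion stopped by Python:", error)
```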

### Merge Sort - O(n log n)

In [14]:
def merge(left,right):
    '''Takes in two sorted arrays;
    merges them into one sorted array'''
    
    merged = []
    i = 0 ## Position of the next unmerged item in left
    j = 0 ## Position of the next unmerged item in right
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]) ## Add item to the merged array
            i += 1 ## Move the pointer instead of slicing (slicing copies the whole sub-array every time)
        else:
            merged.append(right[j]) ## Add item to the merged array
            j += 1

    merged.extend(left[i:]) ## At most one of these two still has items left
    merged.extend(right[j:])

    return merged

In [15]:
def mergesort(unsorted):
    if len(unsorted) == 0:
        return []
    if len(unsorted) == 1:
        return unsorted
    
    midpoint = len(unsorted)//2 ## We need to get an integer
    
    left = mergesort(unsorted[:midpoint])
    right = mergesort(unsorted[midpoint:])
    
    #print(merge(left,right))
    return merge(left,right)

In [16]:
unsorted = [23,1,45,32,90,87,78,78,78]
mergesort(unsorted)

[1, 23, 32, 45, 78, 78, 78, 87, 90]
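
To connect the code to the O(n log n) in the heading, the sketch below counts the comparisons made while merging and prints them next to `n * log2(n)`. The function `mergesort_count` is a hypothetical variant written for this illustration, and the reverse-sorted inputs are just an assumed test case.

```python
import math

def mergesort_count(arr):
    """Return (sorted copy of arr, number of comparisons made while merging)."""
    if len(arr) <= 1:
        return list(arr), 0
    mid = len(arr) // 2
    left, left_count = mergesort_count(arr[:mid])
    right, right_count = mergesort_count(arr[mid:])
    merged, i, j = [], 0, 0
    comparisons = left_count + right_count
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons

for n in [8, 64, 512]:
    data = list(range(n, 0, -1))  # Reverse-sorted input of size n
    _, comparisons = mergesort_count(data)
    print(n, comparisons, round(n * math.log2(n)))
```

The array is halved about log2(n) times, and each level of halving needs at most n comparisons to merge back together, so the count stays at or below n log2(n).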

### Quicksort

#### Space Inefficient Quicksort

In [53]:
def quicksort(unsorted):
    if len(unsorted) <= 1:
        return unsorted
    
    pivot = unsorted[-1]
    left = [] ### Challenge: try implementing quicksort in place (without these extra arrays)
    right = []

    for i in range(len(unsorted)-1):
        if unsorted[i] < pivot:
            left.append(unsorted[i])
        else:
            right.append(unsorted[i])

    return quicksort(left) + [pivot] + quicksort(right)

In [55]:
unsorted = [23,1,45,32,90,87,78,78]
quicksort(unsorted)

[1, 23, 32, 45, 78, 78, 87, 90]

In [10]:
def quicksort2(unsorted):
    if len(unsorted) <= 1:
        return unsorted
    
    pivot = unsorted[-1]
    i = 0
    j = len(unsorted)-2
    
    while True:
        while i <= j and unsorted[i] < pivot: ## Skip items already on the correct (left) side
            i += 1
        while j >= i and unsorted[j] >= pivot: ## Skip items already on the correct (right) side
            j -= 1
        if i >= j: ## Pointers have crossed, partitioning is done
            break
        unsorted[i], unsorted[j] = unsorted[j], unsorted[i]
    unsorted[-1] = unsorted[i] ## Move the pivot to index i, its final position
    unsorted[i] = pivot

    return quicksort2(unsorted[:i]) + [pivot] + quicksort2(unsorted[i+1:])
        

In [11]:
unsorted = [23,1,45,32,90,87,78,78]
quicksort2(unsorted)

[1, 23, 32, 45, 78, 78, 87, 90]