### Quick Sort

- Quick sort is a comparison-based sorting algorithm that runs in $O(n \log(n))$ time on average
- The idea of quick sort is quite simple, but takes some getting used to. We'll talk through this using an example
    - Imagine we have an input array [6,4,2,3,9,8,9,4,7,6,1]
    - We first need to choose a value to sort against, known as the **pivot**. 
        - We will iterate through this list, and compare all the other values to this pivot
        - At the end of this process, we want to end up with 2 regions. One of the regions will have all values $v \le \text{pivot}$, and the other will have values $v \gt \text{pivot}$
        - Finally, we place our pivot in the middle of these 2 regions, and the pivot is now in the correct position
        - Call quicksort on the left and right regions
- Intuition and issues
    - Quicksort is fast because a good pivot roughly halves the range at each level of recursion. Every level still scans about $n$ elements in total across its partitions, but there are only about $\log(n)$ levels, which gives $O(n \log(n))$
    - However, quicksort performance is very dependent on your pivot choice. It is only $O(n \log(n))$ on average. In the worst case, it is $O(N^2)$, which is basically selection sort
    - To see how it is $O(N^2)$ in the worst case, we can simply imagine a list that is already sorted, and we pick the pivot from the end of the list
    - So instead of halving the list at each step, every partition only removes one element (the pivot). We then need $N$ levels of recursion with up to $N$ comparisons each
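
- A single partition step from the walkthrough above can be sketched on its own. This is a minimal sketch, assuming the last element is the pivot (as in the full implementation that follows); the helper name `partition` is ours, for illustration only. On the example array the last element happens to be the minimum, so the left region comes out empty, which is exactly the lopsided split described under "Intuition and issues"

```python
def partition(array, left, right):
    """Partition array[left..right] around its last element.

    Returns the final index of the pivot.
    """
    pivot = array[right]
    boundary = left  # first index of the "> pivot" region
    for index in range(left, right):
        if array[index] <= pivot:
            # Grow the "<= pivot" region by one element
            array[index], array[boundary] = array[boundary], array[index]
            boundary += 1
    # Place the pivot between the two regions
    array[right], array[boundary] = array[boundary], array[right]
    return boundary

example = [6, 4, 2, 3, 9, 8, 9, 4, 7, 6, 1]
print(partition(example, 0, len(example) - 1), example)
# 0 [1, 4, 2, 3, 9, 8, 9, 4, 7, 6, 6]   (pivot 1 is the minimum: very uneven split)

better = [10, 4, 9, 1, 8, 3, 7, 2, 5]
print(partition(better, 0, len(better) - 1), better)
# 4 [4, 1, 3, 2, 5, 9, 7, 10, 8]        (pivot 5 lands in the middle)
```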

In [10]:
def quicksort(array, left_inclusive, right_inclusive, verbose=False, log_recurs=0):
    '''
    Time complexity: O(N log(N)) on average, O(N^2) in worst case
    Space complexity: O(N) in the worst case, from the recursion call stack (the array itself is sorted in place)
    '''

    indent = ' '*3*log_recurs
    if verbose:
        print(indent + '='*50)
        print(indent + f'Calling {array=} from {left_inclusive=} to {right_inclusive=}')

    # If left and right indices are the same, then array is already sorted
    if left_inclusive >= right_inclusive:
        return array
    
    # take last value as the pivot value
    pivot = array[right_inclusive]
    if verbose:
        print(indent + f'Pivot value from index {right_inclusive=} is {array[right_inclusive]=}')

    # Loop from leftmost index to second last value, since last value is pivot
    left_segment_end_index = left_inclusive
    if verbose:
        print(f"{indent} {array=}")
    for index in range(left_inclusive, right_inclusive):
        if verbose:
            print(indent + f'Comparing {array[index]=} with {pivot=}')
        if array[index] <= pivot:
            if verbose:
                print(indent + f'{array[index]=} is less than or equal to {pivot=}')
            array[index], array[left_segment_end_index] = array[left_segment_end_index], array[index]
            left_segment_end_index+=1
        if verbose:
            print(f"{indent} {array=}")
    array[right_inclusive], array[left_segment_end_index] = array[left_segment_end_index], array[right_inclusive]
    if verbose:
        print(f"{indent} {array=}")
    
    array = quicksort(array, left_inclusive, left_segment_end_index-1, log_recurs=log_recurs+1, verbose=verbose)
    array = quicksort(array, left_segment_end_index+1, right_inclusive, log_recurs=log_recurs+1, verbose=verbose)
    return array

    
array = [10,4,9,1,8,3,7,2,5]
quicksort(array,0,len(array)-1, verbose=False)
    

[1, 2, 3, 4, 5, 7, 8, 9, 10]

- We can improve the practical running time of quicksort by randomising the pivot instead of always taking the rightmost/leftmost element. With a random pivot, no fixed input (such as an already sorted list) reliably triggers the worst case, so the expected run time is $O(N \log(N))$ for every input

In [11]:
import numpy as np

def randomised_quicksort(array, left_inclusive, right_inclusive, verbose=False, log_recurs=0):
    '''
    Time complexity: O(N log(N)) on average, O(N^2) in worst case
    Space complexity: O(N) in the worst case, from the recursion call stack (the array itself is sorted in place)
    '''
    indent = ' '*3*log_recurs
    if verbose:
        print(indent + '='*50)
        print(indent + f'Calling {array=} from {left_inclusive=} to {right_inclusive=}')

    # If left and right indices are the same, then array is already sorted
    if left_inclusive >= right_inclusive:
        return array
    
    # pick a random pivot index; np.random.randint excludes the upper bound, hence the +1
    pivot_index = np.random.randint(left_inclusive, right_inclusive + 1, 1)[0]
    pivot = array[pivot_index]
    if verbose:
        print(indent + f'Pivot value from index {pivot_index=} is {array[pivot_index]=}')

    # Move the pivot to the end, then loop from leftmost index to second last value
    array[pivot_index], array[right_inclusive] = array[right_inclusive], array[pivot_index]
    left_segment_end_index = left_inclusive
    if verbose:
        print(f"{indent} {array=}")
    for index in range(left_inclusive, right_inclusive):
        if verbose:
            print(indent + f'Comparing {array[index]=} with {pivot=}')
        if array[index] <= pivot:
            if verbose:
                print(indent + f'{array[index]=} is less than or equal to {pivot=}')
            array[index], array[left_segment_end_index] = array[left_segment_end_index], array[index]
            left_segment_end_index+=1
        if verbose:
            print(f"{indent} {array=}")
    array[right_inclusive], array[left_segment_end_index] = array[left_segment_end_index], array[right_inclusive]
    if verbose:
        print(f"{indent} {array=}")
    
    array = randomised_quicksort(array, left_inclusive, left_segment_end_index-1, log_recurs=log_recurs+1, verbose=verbose)
    array = randomised_quicksort(array, left_segment_end_index+1, right_inclusive, log_recurs=log_recurs+1, verbose=verbose)
    return array

array = [10,4,9,10,1,8,1,3,7,1,2,5]
randomised_quicksort(array,0,len(array)-1, verbose=False)

[1, 1, 1, 2, 3, 4, 5, 7, 8, 9, 10, 10]

- What happens to quicksort when all elements are equal to each other?
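    - It becomes $O(N^2)$!
    - Because every element is `<=` the pivot, the partition puts all of them in the left region regardless of which element you choose as pivot. Each call only removes one element (the pivot), so you never split into 2 even arrays and the recursion goes $N$ levels deep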
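
- As a worked count, assuming the two-region partition used above: on $N$ equal elements, one partition makes $N-1$ comparisons and leaves $N-1$ elements in the left region and none in the right. Writing $T(N)$ for the total number of comparisons:

$$T(N) = T(N-1) + (N-1), \quad T(1) = 0 \quad\Longrightarrow\quad T(N) = \frac{N(N-1)}{2} = O(N^2)$$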

- To overcome this issue, we can modify our partitioning approach
    - Instead of maintaining 2 regions (left region $\le$ pivot and right region $>$ pivot), we maintain 3 instead!
    - left region where values < pivot
    - middle region where values == pivot
    - right region where values > pivot

In [65]:
import numpy as np

def quicksort_3waypartition(array, left_index, right_index, verbose=False, log_recurs=0):
    indent = ' ' * 3 * log_recurs
    if left_index >= right_index:
        return array

    if verbose:
        print('='*50)
        print(indent + f"{array=}, {left_index=}, {right_index=}")

    # np.random.randint excludes the upper bound, so add 1 to allow the last index as pivot
    pivot_index = np.random.randint(left_index, right_index + 1, 1)[0]
    pivot = array[pivot_index]
    array[pivot_index], array[right_index] = array[right_index], array[pivot_index]
    if verbose:
        print(indent + f"{pivot_index=}, {pivot=}, {array=}")

    after_left_array_end_index = left_index
    index = left_index
    right_array_start_first_index = right_index

    while (index < right_array_start_first_index):
        if verbose:
            print(indent + f"Start loop {index}: {array=}, {index=}, {array[index]=}, {pivot=}, {after_left_array_end_index=}, {right_array_start_first_index=}")
        if array[index] < pivot:
            array[index], array[after_left_array_end_index] = array[after_left_array_end_index], array[index]
            after_left_array_end_index += 1
            index += 1
        elif array[index] > pivot:
            array[index], array[right_array_start_first_index-1] = array[right_array_start_first_index-1], array[index]
            right_array_start_first_index -= 1
        elif array[index] == pivot:
            index += 1
        if verbose:
            print(indent + f"End loop {index}: {array=}, {index=}, {array[index]=}, {pivot=}, {after_left_array_end_index=}, {right_array_start_first_index=}")

    if verbose:
        print(indent + f"Post loop {array=}, {after_left_array_end_index=}, {right_array_start_first_index=}")

    array[right_index], array[right_array_start_first_index] = array[right_array_start_first_index], array[right_index]
    if verbose:
        print(indent + f"Post loop post swap {array=}")

    array = quicksort_3waypartition(array, left_index, after_left_array_end_index-1, verbose=verbose, log_recurs=log_recurs+1)
    if verbose:
        print(indent + f"Recurs left {array=}")

    array = quicksort_3waypartition(array, right_array_start_first_index+1, right_index, verbose=verbose, log_recurs=log_recurs+1)
    if verbose:
        print(indent + f"Recurs right {array=}")

    return array

array = [10,1,5,9,2,5,8,3,5,7,4,5]
# array = [10,1,5,9,5]
quicksort_3waypartition(array, 0, len(array)-1, verbose=False)

### Space complexity of quicksort

- We already know that quicksort has an average time complexity of $O(N \log(N))$ and worst case time complexity of $O(N^2)$
- What about space complexity?
    - The `partition` process (select pivot, loop to compare values) takes $O(N)$ time, but takes $O(1)$ space! Because you don't store any additional information that scales with the size of the input
    - But the recursive calls to left and right also take some space on the call stack! So even though the quicksort algorithm just modifies the original input array, every recursive call stores a stack frame (its arguments and local variables) on the call stack
    - In the worst case, when the array is already sorted and we pick the last element as pivot (i.e. the $O(N^2)$ time complexity case), the recursion goes $N$ levels deep, leading to $N$ stack frames being alive at the same time. The array itself is never copied
    - So because of these stack frames, out-of-the-box quicksort takes $O(N)$ space in the worst case!!

- However, there is a way to modify quicksort so that it only takes $O(\log(N))$ space on the call stack instead of $O(N)$!
    - Instead of calling quicksort recursively on both the left and right arrays at each step, we amend our recursive calls to recurse only into the shorter of the 2 halves
    - After the shorter of the two halves is processed, we continue with quicksort on the longer half in the same call, without adding a new stack frame

- Why is this helpful? Because we only recurse into the shorter half, each recursive call handles at most half of the elements of its caller, so the recursion depth goes to AT MOST $O(\log(N))$! The longer half adds no stack frame, since it is processed in the same call. The worst case for space complexity happens when we split the array exactly in 2 at every step, so the call stack goes up to (at most) about $\log_2(N)$ frames
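
- The idea above can be sketched as follows. This is a minimal sketch under our own assumptions: the function name `quicksort_small_first` and the `stats` dictionary (used only to record the deepest call) are hypothetical, and the pivot is the last element so that a sorted input is the worst case for time. The longer half is handled by the `while` loop rather than by a second recursive call

```python
def quicksort_small_first(array, left, right, depth=1, stats=None):
    """In-place quicksort that recurses only into the smaller partition."""
    if stats is not None:
        stats["max_depth"] = max(stats.get("max_depth", 0), depth)
    while left < right:
        # Two-region partition with the last element as pivot
        pivot = array[right]
        boundary = left
        for index in range(left, right):
            if array[index] <= pivot:
                array[index], array[boundary] = array[boundary], array[index]
                boundary += 1
        array[right], array[boundary] = array[boundary], array[right]

        if boundary - left < right - boundary:
            # Left side is shorter: recurse into it, then loop on the right side
            quicksort_small_first(array, left, boundary - 1, depth + 1, stats)
            left = boundary + 1
        else:
            # Right side is shorter (or equal): recurse into it, loop on the left
            quicksort_small_first(array, boundary + 1, right, depth + 1, stats)
            right = boundary - 1
    return array

stats = {}
data = list(range(2000))  # already sorted: worst case for a last-element pivot
quicksort_small_first(data, 0, len(data) - 1, stats=stats)
print(stats["max_depth"])  # 2: each recursive call here is on an empty range
```

- The run time on this input is still $O(N^2)$; only the stack depth improves. The plain two-call version would need about 2000 nested calls here, which exceeds Python's default recursion limit of 1000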

#### Tail call optimisation

- Tail call optimisation can apply when we write our quicksort algorithm using tail recursion
    - What is tail recursion? This is when the recursive call is the last thing executed by the algorithm

- How does tail recursion help? Let's imagine a recursive Fibonacci function, and we use it to compute `fib(4)`
```
def fib(n):
    if n == 0:
        return 0
    if n in [1,2]:
        return 1
    return fib(n-1) + fib(n-2)
```

- To compute `fib(4)`, we take `fib(3)` + `fib(2)`
    - To compute `fib(3)`, we take `fib(2)` + `fib(1)`
    - `fib(2)` and `fib(1)` are base cases, so they return 1 directly
    - Notice that, to compute `fib(4)`, the computer needs to hold the call tree of `fib(3)` and `fib(2)` in memory, because the addition can only happen after both calls return
    - Practically, every time we make a recursive call, a **stack frame** is added to the computer's **call stack**. If too many frames are added (e.g. when too many recursive calls are made), then we have a **stack overflow**.

- However, some compilers have optimisations when a method is tail recursive! Because your function returns a recursive call, there is nothing else left to do in the function, so you can return to the parent frame! This means that (again, only for some compilers) you never have additional stack frames, which minimises the memory use!

- Unfortunately, Python doesn't offer this capability. But we can still pretend it does, and implement a Fibonacci function with tail recursion.
    - Generally, tail recursion passes the partial result along in an `accumulator` argument, so that nothing is left to compute after the recursive call returns
    - Here, the term `a+b` serves as the accumulator. So at each recursion step, we already pass the Fib value accumulated to that point into the function

- For better explanation: https://www.youtube.com/watch?v=XMBgja5u70M

```
def fib_tail_recurs(n, a, b):
    if n == 0:
        return a
    if n == 1:
        return b
    return fib_tail_recurs(n-1, b, a+b)
```

In [92]:
def fib_tail_recurs(n, a=0, b=1):
    if n == 0:
        return a
    if n == 1:
        return b
    return fib_tail_recurs(n-1, b, a+b)

fib_tail_recurs(10)

55
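
- Since Python does not eliminate tail calls, the tail-recursive version still adds one stack frame per step. A tail call can always be rewritten by hand as a loop that reassigns the arguments, which is effectively what an optimising compiler does. A minimal sketch, where `fib_loop` is our own name for the loop version:

```python
import sys

def fib_tail_recurs(n, a=0, b=1):
    if n == 0:
        return a
    if n == 1:
        return b
    return fib_tail_recurs(n - 1, b, a + b)

def fib_loop(n):
    """Same computation as fib_tail_recurs, with the tail call turned into a loop."""
    a, b = 0, 1
    while n > 0:
        # Equivalent to calling fib_tail_recurs(n - 1, b, a + b)
        a, b = b, a + b
        n -= 1
    return a

print(fib_loop(10))  # 55

# The recursive version runs out of stack frames for large n in CPython
try:
    fib_tail_recurs(sys.getrecursionlimit() + 100)
    hit_limit = False
except RecursionError:
    hit_limit = True
print(hit_limit)

# The loop version uses a single frame, so large n is not a problem
large_value = fib_loop(5000)
```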

### Intro Sort

- Introsort is like a chimera of a few sorting algorithms. We give an overview here, but won't provide a full implementation

- Basically, we make use of Quicksort's faster average case performance, but in the event that the recursion depth exceeds some value, we switch to heap sort to avoid quicksort's worst case performance

- This guarantees an $O(N \log N)$ worst case, because heap sort is $O(N \log N)$ even on its worst input
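
- A rough sketch of the switch-over logic, under stated assumptions rather than a full implementation: the depth limit of $2 \lfloor \log_2 N \rfloor$ is a common choice that we assume here, the names `introsort`, `_introsort` and `_heap_fallback` are ours, and the fallback uses Python's `heapq` on a temporary list for brevity instead of a true in-place heap sort. Production versions typically also switch to insertion sort on small ranges, which is omitted

```python
import heapq
import math

def _heap_fallback(array, left, right):
    # Heap-based sort of array[left..right]. Assumption for brevity: a
    # temporary list plus heapq, rather than sifting in place.
    heap = array[left:right + 1]
    heapq.heapify(heap)
    for index in range(left, right + 1):
        array[index] = heapq.heappop(heap)

def _introsort(array, left, right, depth_limit):
    if left >= right:
        return
    if depth_limit == 0:
        # Recursion is too deep: the pivots have been bad, so stop using quicksort
        _heap_fallback(array, left, right)
        return
    # Two-region partition with the last element as pivot
    pivot = array[right]
    boundary = left
    for index in range(left, right):
        if array[index] <= pivot:
            array[index], array[boundary] = array[boundary], array[index]
            boundary += 1
    array[right], array[boundary] = array[boundary], array[right]
    _introsort(array, left, boundary - 1, depth_limit - 1)
    _introsort(array, boundary + 1, right, depth_limit - 1)

def introsort(array):
    """Sort array in place: quicksort first, heap-based fallback when too deep."""
    if len(array) > 1:
        depth_limit = 2 * math.floor(math.log2(len(array)))
        _introsort(array, 0, len(array) - 1, depth_limit)
    return array

print(introsort([10, 4, 9, 1, 8, 3, 7, 2, 5]))
# [1, 2, 3, 4, 5, 7, 8, 9, 10]

# Already sorted input no longer causes deep recursion or quadratic time
print(introsort(list(range(5000))) == list(range(5000)))  # True
```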