## Sorting

### Introduction to Sorting
* Sorting algorithms rearrange the elements of a collection based on a common characteristic of those elements
* An ordering relation has two key properties:
  + Given two elements a and b, exactly one of a<b, a=b, or a>b must be true ( Law of Trichotomy )
  + If a<b and b<c, then a<c ( Law of Transitivity )
* A sort is formally defined as a rearrangement of a sequence of elements that puts all elements into a non-decreasing order based on the ordering relation
* In practice, the ordering relation is defined as a method of comparison in the programming language. Most programming languages let you pass in a custom function whenever you want to sort a sequence of elements. In Python, `list.sort` takes a `key` function whose return values are compared, as shown in the following code:
```python
from typing import List

class Solution:
    def sort_by_length(self, lst: List[str]) -> None:
        """
        Sorts a list of strings in place by the length of each string
        """
        lst.sort(key=lambda x: len(x))  # Note we can also do lst.sort(key=len)
```

* inversion
  + An inversion in a sequence is defined as a pair of elements that are out of order with respect to the ordering relation
  + example: in the list \[“are”, “we”, “sorting”, “hello”, “world”, “learning”\] ordered by string length, the following pairs are inversions because the longer string comes first:
    + (“are”, “we”), (“sorting”, “hello”), and (“sorting”, “world”)
  + a sorting algorithm is a sequence of operations that reduces inversions to 0
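
The definition of an inversion can be checked directly with a brute-force count. The sketch below is illustrative only: `count_inversions` and its `key` parameter are names assumed here, not part of the original notes, and the O(n^2) double loop is chosen for clarity rather than speed.

```python
from typing import Any, Callable, List


def count_inversions(items: List[Any], key: Callable[[Any], Any] = lambda x: x) -> int:
    """Count pairs (i, j) with i < j whose keys are out of order."""
    count = 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            # A pair is an inversion when the earlier element has the larger key
            if key(items[i]) > key(items[j]):
                count += 1
    return count


words = ["are", "we", "sorting", "hello", "world", "learning"]
print(count_inversions(words, key=len))  # 3
```

A sorted sequence has a count of 0, which matches the view of sorting as reducing inversions to 0.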
  
* stability of sorting algorithms
  + The key feature of a stable sorting algorithm is that it preserves the relative order of elements that compare as equal
  + example: 
    + in the original list \[“hello”, “world”, “we”, “are”, “learning”, “sorting”\], there are two valid sorts by string length: 
      1. \[“we”, “are”, “hello”, “world”, “sorting”, “learning”\]
      2. \[“we”, “are”, “world”, “hello”, “sorting”, “learning”\] 
    + the first sort is the stable one, since the equal elements "hello" and "world" are kept in the same relative order as in the original sequence
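
As a quick check of the stability example, Python's built-in `sorted` is documented to be stable, so sorting the same list by length should produce the first of the two orderings. This is a small sketch; the variable names are chosen here for illustration.

```python
words = ["hello", "world", "we", "are", "learning", "sorting"]

# sorted() is stable: "hello" and "world" have equal keys (length 5),
# so they keep their original relative order
stable = sorted(words, key=len)
print(stable)  # ['we', 'are', 'hello', 'world', 'sorting', 'learning']
```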
    
#### Exercise
1. Given the following array of strings \['hello', 'your', 'above', 'year', 'alone', 'friendly', 'crazy'\], where the ordering relation is the string length, what is the stable sort?
  + solution: \['your', 'year', 'hello', 'above', 'alone', 'crazy', 'friendly'\]
  + explanation: all the four-letter words and all the five-letter words are in the same relative order as in the original list
  
2. How many inversions exist in the following list of integers: \[3, 4, 6, 5, 2\] 
  + solution: 3 > 2, 4 > 2, 6 > 2, 5 > 2, 6 > 5 (altogether 5 inversions)
  
3. Which of the following are the key parts of an ordering relation? (Select all that apply)
  + It must be true that a<b, a=b, or a>b ( Law of Trichotomy ) 
  + If a<b and b<c, then a<c ( Law of Transitivity )

### Comparison Based Sort
* Comparison based sorts are sorting algorithms that require a direct method of comparison defined by the ordering relation

#### Selection Sort
* Selection sort builds up the sorted list by repeatedly finding the minimum element in the unsorted part of the list and moving it to the front of that part through a swap.
* Not a stable sorting algorithm: the swap can move an element past other elements that are equal to it
* time complexity
  + O(n^2) in every case, not only the worst case. For each position we have to search the entire remaining array to find its minimum element
* space complexity
  + O(1)  

```python
# implementation of selection sort
from typing import List

class Solution:
    def selection_sort(self, lst: List[int]) -> None:
        """
        Mutates lst so that it is sorted via selecting the minimum element and
        swapping it with the corresponding index
        """
        for i in range(len(lst)):
            min_index = i
            for j in range(i + 1, len(lst)):
                # Update minimum index
                if lst[j] < lst[min_index]:
                    min_index = j

            # Swap current index with minimum element in rest of list
            lst[min_index], lst[i] = lst[i], lst[min_index]
```
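
To see why selection sort is not stable, it helps to sort records that carry a label next to the sort key. The sketch below is a hypothetical variant (`selection_sort_by_key` is a name assumed here) that compares only the first field of each pair, so the labels reveal where equal keys end up.

```python
from typing import List, Tuple


def selection_sort_by_key(pairs: List[Tuple[int, str]]) -> None:
    """Selection sort that compares only the integer key of each pair."""
    for i in range(len(pairs)):
        min_index = i
        for j in range(i + 1, len(pairs)):
            if pairs[j][0] < pairs[min_index][0]:
                min_index = j
        # This long-distance swap is what can reorder equal keys
        pairs[min_index], pairs[i] = pairs[i], pairs[min_index]


records = [(2, "a"), (2, "b"), (1, "c")]
selection_sort_by_key(records)
print(records)  # [(1, 'c'), (2, 'b'), (2, 'a')]
```

The first swap moves `(2, 'a')` behind `(2, 'b')`, so the two equal keys end up in the opposite order from the input.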

#### Leetcode 75. Sort Colors
* overview
  + Given an array nums with n objects colored red, white, or blue, sort them in-place so that objects of the same color are adjacent, with the colors in the order red, white, and blue.
  + We will use the integers 0, 1, and 2 to represent the color red, white, and blue, respectively.
  + You must solve this problem without using the library's sort function.
* Algorithm
  + apply a three-way partition (as in quick sort) to split the array into three sections: 0s, then 1s, then 2s
  + initialize three pointers: index = 0, left = 0, and right = n-1
  + every element to the left of left is 0; whenever index > left, left points to the first non-zero element
  + every element to the right of right is 2, but the element that right points to has not been checked yet, so the loop must still run when index == right
  + if the element at index is 0, swap it with the element at left, and then increment both left and index
  + if the element at index is 1, increment index
  + if the element at index is 2, swap it with the element at right and decrement right; do not increment index, because the element that was swapped in has not been checked yet
  + exit the while loop once index > right
* time complexity:
  + O(N)
* space complexity
  + O(1)
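
The three-pointer steps above can be sketched as follows. This is one possible implementation under the stated pointer invariants, not necessarily the author's; the function name `sort_colors` is chosen here for illustration.

```python
from typing import List


def sort_colors(nums: List[int]) -> None:
    """Sort a list of 0s, 1s and 2s in place with a single pass."""
    left, index, right = 0, 0, len(nums) - 1
    while index <= right:
        if nums[index] == 0:
            # Move the 0 into the left section
            nums[index], nums[left] = nums[left], nums[index]
            left += 1
            index += 1
        elif nums[index] == 2:
            # Move the 2 into the right section; the element swapped in
            # is unchecked, so index stays where it is
            nums[index], nums[right] = nums[right], nums[index]
            right -= 1
        else:
            index += 1


colors = [2, 0, 2, 1, 1, 0]
sort_colors(colors)
print(colors)  # [0, 0, 1, 1, 2, 2]
```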

#### Bubble sort
* Compare each pair of neighboring elements, and swap them if they are out of order
  + at the beginning, the target position is the rightmost index (n-1). To move the maximum element there, compare each element with its right neighbor, from index 0 to n-2, and swap the two whenever the left one is bigger
  + then repeat on the range that excludes the rightmost position, which places the second-largest element at index n-2, and so on until every position in the array is filled
  + if a full pass makes no swap, the array is already sorted and we can stop early
  + a stable sort algorithm, because equal neighbors are never swapped
* time complexity:
  + O(N^2)
* space complexity
  + O(1)  

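```python
from typing import List

def bubble_sort(lst: List[int]) -> None:
    """
    Mutates lst so that it is sorted by repeatedly swapping out-of-order neighbors
    """
    if not lst or len(lst) == 1:
        return

    n = len(lst)
    for i in range(n - 1, 0, -1):
        swap = False
        for j in range(i):
            if lst[j] > lst[j + 1]:
                swap = True
                lst[j], lst[j + 1] = lst[j + 1], lst[j]
        # No swap in a full pass means the list is already sorted
        if not swap:
            break
```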

```python
arr1 = [8, 7, 6, 5, 4, 3, 2]
arr2 = [1, 2, 5, 10, 9, 8]
bubble_sort(arr2)
print(arr2)  # [1, 2, 5, 8, 9, 10]
```


#### Leetcode 1051. Height Checker
* overview
  + A school is trying to take an annual photo of all the students. The students are asked to stand in a single file line in non-decreasing order by height. Let this ordering be represented by the integer array expected where expected\[i\] is the expected height of the ith student in line.
  + You are given an integer array heights representing the current order that the students are standing in. Each heights\[i\] is the height of the ith student in line (0-indexed).
  + Return the number of indices where heights\[i\] != expected\[i\].
* Algorithm
  + sort a copy of heights into a new list, which plays the role of expected
  + traverse the original and the new list together, and increment the result at every index where their values don't match
* time complexity
  + O(NlogN)
* space complexity
  + O(N)
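
The two algorithm steps above can be sketched in a few lines. This is a minimal version that relies on Python's built-in `sorted` to build the expected order; the function name `height_checker` is chosen here for illustration.

```python
from typing import List


def height_checker(heights: List[int]) -> int:
    """Count the indices where heights differs from its sorted copy."""
    expected = sorted(heights)  # new list, so heights is left untouched
    return sum(1 for h, e in zip(heights, expected) if h != e)


print(height_checker([1, 1, 4, 2, 1, 3]))  # 3
```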

#### Insertion sort
* Traverse from the second element, and find each element's position by comparing its value to the elements on its left. While its value is smaller than the element to its left, swap the two, and stop once the left element is no bigger
* each element may have to pass all the elements to its left, so each insertion is O(N); with N elements, the algorithm is O(N^2) in the worst case
* It is a stable sorting algorithm
* fast on almost sorted arrays due to the small number of swaps required
* a good choice for small arrays, but not a good option for big arrays with many inversions

```python
from typing import List

class Solution:
    def insertion_sort(self, lst: List[int]) -> None:
        """
        Mutates elements in lst by inserting out of place elements into appropriate
        index repeatedly until lst is sorted
        """
        for i in range(1, len(lst)):
            current_index = i

            while current_index > 0 and lst[current_index - 1] > lst[current_index]:
                # Swap elements that are out of order
                lst[current_index], lst[current_index - 1] = lst[current_index - 1], lst[current_index]
                current_index -= 1
```
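
The claim that insertion sort is fast on almost sorted arrays can be made concrete: each adjacent swap removes exactly one inversion, so the number of swaps equals the number of inversions in the input. The sketch below is an instrumented variant (`insertion_sort_swaps` is a name assumed here) that returns the swap count.

```python
from typing import List


def insertion_sort_swaps(lst: List[int]) -> int:
    """Sort lst in place with insertion sort and return the number of swaps."""
    swaps = 0
    for i in range(1, len(lst)):
        current_index = i
        while current_index > 0 and lst[current_index - 1] > lst[current_index]:
            lst[current_index], lst[current_index - 1] = lst[current_index - 1], lst[current_index]
            current_index -= 1
            swaps += 1
    return swaps


print(insertion_sort_swaps([1, 2, 3, 5, 4]))  # 1 (almost sorted)
print(insertion_sort_swaps([5, 4, 3, 2, 1]))  # 10 (fully reversed)
```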

#### Leetcode 147. Insertion Sort List
* Overview
  + Given the head of a singly linked list, sort the list using insertion sort, and return the sorted list's head.
  + The steps of the insertion sort algorithm:
    + Insertion sort iterates, consuming one input element each repetition and growing a sorted output list.
    + At each iteration, insertion sort removes one element from the input data, finds the location it belongs within the sorted list and inserts it there.
    + It repeats until no input elements remain.
    + In the problem's graphical example (not reproduced here), the partially sorted list (black) initially contains only the first element in the list. One element (red) is removed from the input data and inserted in-place into the sorted list with each iteration.
    
* Algorithm
  + if head is None or head.next is None, return head
  + initialize a dummy node whose next points to head
  + initialize curr = head
  + while curr.next exists, compare curr.val to curr.next.val
    + if curr.val <= curr.next.val, the pair is in order, so curr = curr.next
    + otherwise, set tmp = curr.next, bypass it with curr.next = curr.next.next, and then find the position of tmp by scanning from head
      + if tmp.val <= head.val, insert it before head
        + tmp.next = head
        + dummy.next = tmp
      + otherwise, while head.next.val < tmp.val, head = head.next; once out of the while loop
        + tmp.next = head.next and head.next = tmp
      + reset head = dummy.next each time after reordering the position of tmp
  + return dummy.next
* time complexity
  + O(N^2)
* space complexity
  + O(1)              
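
The linked-list steps above can be sketched as follows. This version simplifies the scan by starting from the dummy node, which removes the special case for inserting before head. The `ListNode` class and the `build_list` / `to_list` helpers are assumptions added here so the example runs on its own.

```python
from typing import List, Optional


class ListNode:
    def __init__(self, val: int = 0, next: "Optional[ListNode]" = None):
        self.val = val
        self.next = next


def insertion_sort_list(head: Optional[ListNode]) -> Optional[ListNode]:
    if head is None or head.next is None:
        return head
    dummy = ListNode(0, head)
    curr = head
    while curr.next:
        if curr.val <= curr.next.val:
            curr = curr.next  # pair already in order
            continue
        tmp = curr.next       # node to relocate
        curr.next = tmp.next  # bypass it
        prev = dummy
        # curr.val > tmp.val, so this scan stops at or before curr
        while prev.next.val <= tmp.val:
            prev = prev.next
        tmp.next = prev.next
        prev.next = tmp
    return dummy.next


def build_list(values: List[int]) -> Optional[ListNode]:
    """Hypothetical helper: build a linked list from a Python list."""
    dummy = ListNode()
    tail = dummy
    for v in values:
        tail.next = ListNode(v)
        tail = tail.next
    return dummy.next


def to_list(node: Optional[ListNode]) -> List[int]:
    """Hypothetical helper: collect linked-list values into a Python list."""
    out = []
    while node:
        out.append(node.val)
        node = node.next
    return out


print(to_list(insertion_sort_list(build_list([4, 2, 1, 3]))))  # [1, 2, 3, 4]
```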

#### Heap sort
* not a stable sort
* time complexity
  + O(NlogN)
* space complexity
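  + O(1) auxiliary space when heapify is written iteratively; a recursive heapify adds an O(logN) call stack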
* In practice, this algorithm performs worse than other O(NlogN) sorts as a result of bad cache locality properties. 
  + Heapsort swaps elements based on locations in heaps, which can cause many read operations to access indices in a seemingly random order, causing many cache misses, which will result in practical performance hits.  
  
* Algorithm
  + convert the input array to a max heap by the following steps:
    1. Start from the end of the array (bottom of the binary tree).
    2. There are two cases for a node
      + It is greater than or equal to its left child and right child (if any). In this case, proceed to the next node (one index before the current array index)
      + There exists a child node that is greater than the current node. In this case, swap the current node with its largest child. This fixes a violation of the max-heap property
        + Repeat the process with the node at its new position until the max-heap property is no longer violated
    3. Repeat step 2 on every node in the binary tree from bottom-up.
    + A key property of this method is that by processing the nodes from the bottom-up, once we are at a specific node in our heap, it is guaranteed that all child nodes are also heaps. 
  + Once we have “heapified” the input, we can begin using the max-heap to sort the list. To do so, we will:
    1. Take the maximum element at index 0 (we know this is the maximum element because of the max-heap property) and swap it with the last element in the array (this element's proper place).
    2. We now have sorted an element (the last element). We can now ignore this element and decrease heap size by 1, thereby omitting the max element from the heap while keeping it in the array.
    3. Treat the remaining elements as a new heap. There are two cases:
      + The root element violates the max-heap property
        + Sink this node into the heap until it no longer violates the max-heap property. Here, "sinking" a node means swapping the node with the larger of its children until the heap property is no longer violated.
      + The root element does not violate the max-heap property
        + Proceed to step (4)
    4. Repeat step 1 on the remaining unsorted elements. Continue until all elements are sorted.
  

```python
from typing import List

class Solution:
    def heap_sort(self, lst: List[int]) -> None:
        """
        Mutates elements in lst by utilizing the heap data structure
        """
        def max_heapify(heap_size, index):
            left, right = 2 * index + 1, 2 * index + 2
            largest = index
            if left < heap_size and lst[left] > lst[largest]:
                largest = left
            if right < heap_size and lst[right] > lst[largest]:
                largest = right
            if largest != index:
                lst[index], lst[largest] = lst[largest], lst[index]
                max_heapify(heap_size, largest)

        # heapify original lst
        # note that we only traverse parent nodes (index in [0, n//2 - 1])
        for i in range(len(lst) // 2 - 1, -1, -1):
            max_heapify(len(lst), i)

        # use heap to sort elements, from the last index down to 1
        for i in range(len(lst) - 1, 0, -1):
            # swap last element with first element
            lst[i], lst[0] = lst[0], lst[i]
            # note that we reduce the heap size by 1 every iteration
            max_heapify(i, 0)
```

#### Leetcode 912. Sort an Array
* Overview
  + Given an array of integers nums, sort the array in ascending order and return it.
  + You must solve the problem without using any built-in functions in O(nlog(n)) time complexity and with the smallest space complexity possible.
* Algorithm
  + heap sort is O(nlogn) and sorts in place. This implementation only uses the heapify-down function
  + we don't need a heapify-up function. Heapify down first computes the left and right child indices and sets largest to the current index. After checking that each child index is within the current heap size, it updates largest to the index of the biggest of the three values. If largest != index, we swap the two elements and heapify down from largest. If largest == index, we just return

```python
# heap sort implementation
from typing import List

class Solution:

    def sortArray(self, nums: List[int]) -> List[int]:
        if not nums or len(nums) == 1 or nums.count(nums[0]) == len(nums):
            return nums

        def heapify(size: int, index: int) -> None:
            left = 2 * index + 1
            right = 2 * index + 2

            largest = index
            if left < size and nums[left] > nums[largest]:
                largest = left
            if right < size and nums[right] > nums[largest]:
                largest = right
            if largest != index:
                nums[index], nums[largest] = nums[largest], nums[index]
                heapify(size, largest)

        # to sort, we first heapify all the array elements, by starting from
        # the index of n//2 - 1, back to 0 (elements with index > n//2 - 1 don't have children)
        for i in range(len(nums) // 2 - 1, -1, -1):
            heapify(len(nums), i)

        # when sorting, we go from n-1 (the last index) back to 1
        # we first swap the current index with 0, then call heapify(i, 0)
        # i is one less than the current heap length, so heapify(i, 0)
        # shrinks the heap by one element in each iteration
        for i in range(len(nums) - 1, 0, -1):
            if nums[0] > nums[i]:
                nums[0], nums[i] = nums[i], nums[0]
                # size comes first, then the index to sink (the root)
                heapify(i, 0)
        return nums
```

```python
# quick sort implementation
from typing import List

class Solution:
    def sortArray(self, nums: List[int]) -> List[int]:
        if not nums or len(nums) == 1 or nums.count(nums[0]) == len(nums):
            return nums

        def partition(start: int, end: int) -> int:
            # use the middle element as the pivot and park it at the end
            mid = start + (end - start) // 2
            target = nums[mid]
            nums[mid], nums[end] = nums[end], nums[mid]

            # left marks the boundary of the section with values <= pivot
            left = start
            while start < end:
                if nums[start] <= target:
                    nums[start], nums[left] = nums[left], nums[start]
                    left += 1

                start += 1

            # put the pivot in its final position
            nums[left], nums[end] = nums[end], nums[left]
            return left

        def quick_sort(start: int, end: int) -> None:

            if start < end:
                # a range of identical values is already sorted
                if (nums[start:end + 1]).count(nums[start]) == end - start + 1:
                    return
                index = partition(start, end)
                quick_sort(start, index - 1)
                quick_sort(index + 1, end)

        quick_sort(0, len(nums) - 1)
        return nums
```
            
            

#### Leetcode 215. Kth Largest Element in an Array
* Overview
  + Given an integer array nums and an integer k, return the kth largest element in the array.
  + Note that it is the kth largest element in the sorted order, not the kth distinct element.
  + You must solve it in O(n) time complexity.
* Algorithm
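  + the first option is quick partition (quickselect): partition in descending order and recurse only into the side that contains index k-1. This takes O(n) time on average, and O(n^2) in the worst case
  + the second option is a min heap of size k, which takes O(nlogk) time
    + heapify the first k elements, so the min heap has a depth of about logk
    + for each remaining element, heapreplace the top of the min heap if the element is bigger than the top element
    + return the heap top, which is the kth largest element; note that the heap size is always kept at k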

```python
# min heap implementation
import heapq
from typing import List

class Solution:
    def findKthLargest(self, nums: List[int], k: int) -> int:
        if not nums or len(nums) < k:
            return 0

        # min heap that holds the k largest elements seen so far
        min_heap = nums[:k]
        heapq.heapify(min_heap)

        for e in nums[k:]:
            if e > min_heap[0]:
                heapq.heapreplace(min_heap, e)

        return min_heap[0]
```

```python
# quick partition (quickselect) implementation
import random
from typing import List

class Solution:
    def findKthLargest(self, nums: List[int], k: int) -> int:

        def partition(start: int, end: int) -> int:
            # pick a random pivot and park it at the end
            left = start
            idx = random.randint(start, end)
            pivot = nums[idx]
            nums[idx], nums[end] = nums[end], nums[idx]

            # descending order: values >= pivot go to the left section
            idx = start
            while idx < end:
                if nums[idx] >= pivot:
                    nums[left], nums[idx] = nums[idx], nums[left]
                    left += 1
                idx += 1
            nums[left], nums[end] = nums[end], nums[left]
            return left

        def quick_find(start: int, end: int) -> int:
            if start < end:
                idx = partition(start, end)
                # the kth largest element sits at index k-1 in descending order
                if idx == k - 1:
                    return nums[idx]
                elif idx > k - 1:
                    return quick_find(start, idx - 1)
                else:
                    return quick_find(idx + 1, end)
            return nums[start]

        return quick_find(0, len(nums) - 1)
```
            