# Priority Queues and Disjoint Sets

## Priority Queue
**Definition:** A generalisation of a queue where each element is assigned a priority and elements come out in order by priority.

#### Priority Queue Operations
| Operation | Output | Time (sorted array/list) | Time (unsorted array/list) | Time (binary heap) |
| :- | :- | :-: | :-: | :-: |
| `Insert(p)` | add a new element with priority `p` | $O(n)$ | $O(1)$ | $O(\log n)$ |
| `ExtractMax()` | extract an element with maximum priority | $O(1)$ | $O(n)$ | $O(\log n)$ |
| `Remove(it)` | remove the element pointed to by an iterator `it` | $O(1)$ | $O(1)$ | $O(\text{tree height})$ |
| `GetMax()` | return an element with maximum priority (without changing the set of elements) | $O(1)$ | $O(n)$ | $O(1)$ |
| `ChangePriority(it, p)` | change the priority of the element pointed to by `it` to `p` | | | $O(\text{tree height})$ |
| \*`SiftUp()` | helper: move a node up while it is larger than its parent | | | $O(\text{tree height})$ |
| \*`SiftDown()` | helper: move a node down while it is smaller than one of its children | | | $O(\text{tree height})$ |

$O(\text{tree height}) = O(\log n)$
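The two naive array-based implementations can be sketched in a few lines of Python. This is only an illustration: the class names `UnsortedPQ` and `SortedPQ` and their method names are hypothetical and do not come from these notes; `bisect.insort` is from the standard library.

```python
import bisect


class UnsortedPQ:
    """Priority queue backed by an unsorted list (illustrative sketch)."""

    def __init__(self):
        self.items = []

    def insert(self, p):
        self.items.append(p)                  # O(1): append at the end

    def extract_max(self):
        # O(n): scan for the position of the maximum
        i = max(range(len(self.items)), key=self.items.__getitem__)
        self.items[i], self.items[-1] = self.items[-1], self.items[i]
        return self.items.pop()


class SortedPQ:
    """Priority queue backed by a list kept in ascending order (illustrative sketch)."""

    def __init__(self):
        self.items = []

    def insert(self, p):
        bisect.insort(self.items, p)          # O(n): shifting elements dominates

    def get_max(self):
        return self.items[-1]                 # O(1): the maximum is the last element

    def extract_max(self):
        return self.items.pop()               # O(1)
```

The unsorted variant makes insertion cheap and extraction expensive; the sorted variant does the opposite. A binary heap balances both at $O(\log n)$.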

**Algorithms that use Priority Queues**
* Dijkstra's algorithm: finding a shortest path in a graph
* Prim's algorithm: constructing a minimum spanning tree of a graph
* Huffman's algorithm: constructing an optimum prefix-free encoding of a string
* Heap sort: sorting a given sequence

## Binary Max-Heap
**Definition:** A binary tree (each node has zero, one or two children) where the value of each node is **at least** the values of its children.  
I.e. for each edge of the tree, the value of the parent is at least the value of the child.

<p align="center">
    <img src="images/binary_max_heap.png" width="450" style="display: inline-block; margin-right: 0px;">
</p>
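As a small illustration of the definition, the following sketch checks the max-heap property on a pointer-based tree. The `Node` class and the function `is_max_heap` are hypothetical names introduced here, not part of the notes.

```python
class Node:
    """A binary tree node with an integer value and up to two children."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def is_max_heap(node):
    """Return True if every parent's value is at least each child's value."""
    if node is None:
        return True
    for child in (node.left, node.right):
        if child is not None and child.value > node.value:
            return False                      # an edge violates the property
    return is_max_heap(node.left) and is_max_heap(node.right)
```

Note that this checks only the ordering along edges, not the shape of the tree.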

#### How to Keep a Tree Shallow?
A binary tree is **complete** if all its levels are filled except possibly the last one which is filled from left to right.

**Complete Binary Tree Advantage 1 (LEMMA):**  
A complete binary tree with $n$ nodes has height $O(\log n)$.

**Complete Binary Tree Advantage 2:**  
A complete binary tree can be stored compactly as an array, level by level, without explicit pointers.
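When a complete binary tree is written into an array level by level, parent and child positions follow from index arithmetic alone. The sketch below uses 0-based Python indices (the pseudocode in these notes is 1-based, where the parent of `i` is `floor(i/2)` and its children are `2i` and `2i+1`). The helper names are hypothetical.

```python
def parent(i):
    return (i - 1) // 2


def left_child(i):
    return 2 * i + 1


def right_child(i):
    return 2 * i + 2


def height(n):
    """Edges on the longest root-to-leaf path of a complete tree with n >= 1 nodes."""
    return n.bit_length() - 1     # equals floor(log2(n))
```

For example, in the array `[42, 29, 18, 14, 7, 18, 12]` the node at index 1 (value 29) has children at indices 3 and 4 (values 14 and 7).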


To ensure the tree stays complete,
1. `Insert` an element as a leaf in the **leftmost vacant position in the last level** and let it sift up.
2. `ExtractMax` removes the maximum value by replacing the root with **the last leaf** and letting it sift down.
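The two rules above can be sketched as an array-backed max-heap. This is a minimal illustration with 0-based indices (parent of `i` is `(i - 1) // 2`, children are `2i + 1` and `2i + 2`); the class `MaxHeap` and its method names are hypothetical.

```python
class MaxHeap:
    """Array-backed binary max-heap; an illustrative sketch."""

    def __init__(self):
        self.a = []

    def sift_up(self, i):
        # Swap with the parent while the parent is smaller.
        while i > 0 and self.a[(i - 1) // 2] < self.a[i]:
            p = (i - 1) // 2
            self.a[p], self.a[i] = self.a[i], self.a[p]
            i = p

    def sift_down(self, i):
        # Swap with the larger child while some child is larger.
        n = len(self.a)
        while True:
            largest = i
            for c in (2 * i + 1, 2 * i + 2):
                if c < n and self.a[c] > self.a[largest]:
                    largest = c
            if largest == i:
                return
            self.a[i], self.a[largest] = self.a[largest], self.a[i]
            i = largest

    def insert(self, p):
        self.a.append(p)                  # leftmost vacant position in the last level
        self.sift_up(len(self.a) - 1)

    def get_max(self):
        return self.a[0]

    def extract_max(self):
        result = self.a[0]
        last = self.a.pop()               # remove the last leaf
        if self.a:
            self.a[0] = last              # it replaces the root
            self.sift_down(0)             # and sifts down to restore the heap
        return result
```

Appending to the end of the array is exactly "the leftmost vacant position in the last level", so the tree stays complete after every operation.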

## Heap Sort

Heap Sort is a comparison-based algorithm.  
Running time: $O(n \log n)$

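```python
import heapq

def HeapSort(A):
    """Sort A in ascending order using an auxiliary priority queue (numeric values assumed)."""
    pq = []                          # create an empty priority queue
    for x in A:
        heapq.heappush(pq, -x)       # Insert; heapq is a min-heap, so negate to emulate a max-heap
    for i in range(len(A) - 1, -1, -1):
        A[i] = -heapq.heappop(pq)    # ExtractMax fills A from right to left
    return A
```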

**Disadvantage:** Not in-place, i.e. uses additional space to store the priority queue.

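**Solution:** Turn the array itself into a heap by rearranging its elements. (Running time: $O(n \log n)$ by the simple bound of $n/2$ calls to `SiftDown`; a tighter analysis gives $O(n)$.)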

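```python
def SiftDown(A, i, size):
    # Move A[i] down until neither child within A[:size] is larger (0-based indices).
    while True:
        largest = i
        for c in (2 * i + 1, 2 * i + 2):
            if c < size and A[c] > A[largest]:
                largest = c
        if largest == i:
            return
        A[i], A[largest] = A[largest], A[i]
        i = largest

def BuildHeap(A):
    # Repair the heap property bottom-up, starting from the last internal node.
    for i in range(len(A) // 2 - 1, -1, -1):
        SiftDown(A, i, len(A))

def InPlaceHeapSort(A):
    BuildHeap(A)
    size = len(A)
    for _ in range(len(A) - 1):
        size -= 1
        A[0], A[size] = A[size], A[0]    # move the current maximum to its final position
        SiftDown(A, 0, size)             # restore the heap on the remaining prefix
```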

**In-place Heap Sort:** worst case $O(n \log n)$  
**Quick sort:** average $O(n \log n)$, worst case $O(n^2)$.  

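Popular approach in practice: **Introsort algorithm**.
1. Run quicksort.
2. If the recursion depth exceeds $c \log n$ for some constant $c$, stop quicksort on the current subarray.
3. Sort that subarray with heap sort instead, which keeps the worst case at $O(n \log n)$.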
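The idea can be sketched as follows. This is a simplified illustration, not the implementation used by any particular library: the constant $c = 2$, the middle-element pivot, and the function names are assumptions made here, and the heap sort fallback uses `heapq` on a temporary copy for brevity, so that part is not in-place.

```python
import heapq
import math


def introsort(a):
    """Sort list a in place: quicksort with a heap sort fallback (sketch)."""
    if len(a) > 1:
        depth_limit = 2 * int(math.log2(len(a)))   # c = 2 is an assumed constant
        _introsort(a, 0, len(a) - 1, depth_limit)


def _heap_sort_range(a, lo, hi):
    # Heap sort of a[lo..hi]; copies the range into a temporary heap.
    heap = a[lo:hi + 1]
    heapq.heapify(heap)
    for i in range(lo, hi + 1):
        a[i] = heapq.heappop(heap)


def _introsort(a, lo, hi, depth):
    while lo < hi:
        if depth == 0:                   # quicksort went too deep: switch algorithms
            _heap_sort_range(a, lo, hi)
            return
        depth -= 1
        # Lomuto partition around the middle element.
        mid = (lo + hi) // 2
        a[mid], a[hi] = a[hi], a[mid]
        pivot, store = a[hi], lo
        for i in range(lo, hi):
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]
        _introsort(a, lo, store - 1, depth)   # recurse on the left part
        lo = store + 1                        # loop on the right part
```

The depth limit bounds the work done by quicksort on bad inputs, and heap sort handles whatever remains.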

### Partial Sorting

**Input:** An array $A[1 \dots n]$ and an integer $k \in [1, n]$.

**Output:** The last $k$ elements of a sorted version of $A$.

Running time: $O(n+k \log n) = O(n)$ if $k = O(\frac{n}{\log n})$.
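The bound can be checked with a short calculation, assuming `BuildHeap` runs in $O(n)$ time (which a tight analysis of the bottom-up construction gives) and each of the $k$ `ExtractMax` calls costs $O(\log n)$. The constant $c$ below is introduced only for illustration: if $k \leq c \cdot \frac{n}{\log n}$, then

$$
n + k \log n \;\leq\; n + c \cdot \frac{n}{\log n} \cdot \log n \;=\; (1 + c)\, n \;=\; O(n).
$$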

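```python
import heapq

def PartialSorting(A, k):
    """Return the k largest elements of A in ascending order (numeric values assumed)."""
    heap = [-x for x in A]           # negate: heapq provides a min-heap
    heapq.heapify(heap)              # BuildHeap, O(n)
    largest = [-heapq.heappop(heap) for _ in range(k)]   # k calls to ExtractMax, O(k log n)
    return largest[::-1]
```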

## Binary Min-Heap
**Definition:** A binary tree (each node has zero, one or two children) where the value of each node is **at most** the values of its children.  
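Python's standard-library `heapq` module implements a binary min-heap on top of a plain list, so it can serve as a quick illustration. The variable names below are arbitrary.

```python
import heapq

heap = [5, 3, 8, 1, 9, 2]
heapq.heapify(heap)               # rearrange the list into a min-heap
smallest = heapq.heappop(heap)    # ExtractMin: removes and returns the root
heapq.heappush(heap, 0)           # Insert
new_min = heap[0]                 # GetMin: the root is stored at index 0
```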

## $d$-ary Heap
**Definition:** A heap in which nodes on all levels except possibly the last one have exactly $d$ children.  
* Height: $\approx \log_d n$
* Running time of `SiftUp`: $O(\log_d n)$
* Running time of `SiftDown`: $O(d \log_d n)$ (on each level, we find the largest value among the $d$ children).
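The difference between the two running times shows up directly in code. Below is a minimal sketch of a $d$-ary max-heap stored in a 0-based array, where the children of node `i` sit at indices `d*i + 1` through `d*i + d`. All function names are hypothetical.

```python
def d_parent(i, d):
    return (i - 1) // d


def d_children(i, d, n):
    """Indices of the children of node i that exist in a heap of n elements."""
    first = d * i + 1
    return range(first, min(first + d, n))


def d_sift_up(a, i, d):
    # One comparison per level: O(log_d n).
    while i > 0 and a[d_parent(i, d)] < a[i]:
        p = d_parent(i, d)
        a[p], a[i] = a[i], a[p]
        i = p


def d_sift_down(a, i, d):
    # Up to d comparisons per level: O(d log_d n).
    n = len(a)
    while True:
        # Largest child of i, or i itself if it is a leaf.
        largest = max(d_children(i, d, n), key=a.__getitem__, default=i)
        if a[largest] <= a[i]:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest
```

A larger $d$ makes the tree shallower, which speeds up `SiftUp` but makes each step of `SiftDown` more expensive.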

## Disjoint-set data structure
**Supports the following operations:**

| Operation | Output | Time (array) | Time (linked list) | Time (tree with rank heuristic) | Time (tree with rank heuristic and path compression) |
| :- | :- | :-: | :-: | :-: | :-: |
| `MakeSet(x)` | create a singleton set $\{x\}$ | $O(1)$ | $O(1)$ | $O(1)$ | $O(1)$ |
| `Find(x)` | return the ID of the set containing $x$: <br> • if $x$ and $y$ lie in the same set, then <br> `Find(x)` $=$ `Find(y)` <br> • otherwise, `Find(x)` $\neq$ `Find(y)` | $O(1)$ | $O(n)$ | $O(\log n)$ | amortized $O(\log^* n)$ |
| `Union(x, y)` | merge the two sets containing `x` and `y` | $O(n)$ | $O(1)$ | $O(\log n)$ | amortized $O(\log^* n)$ |

With both heuristics, any sequence of $m$ operations takes $O(m \log^* n)$ time in total. For practical values of $n$ (up to $2^{65536}$), $\log^* n \leq 5$, so each operation is effectively constant time, although the bound is not formally $O(1)$.

**Example Use-case:** Find whether there exists a path from point $A$ to point $B$ in a maze.  
(Apply `MakeSet` to each cell in the maze, then apply `Union` to every pair of adjacent open cells to build the connected regions. A path exists if and only if `Find(A)` $=$ `Find(B)`.)
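The maze idea can be sketched as follows, using a tree-based disjoint-set structure with the rank heuristic and path compression. Everything here is an illustrative assumption: the class `DisjointSet`, the function `connected_in_maze`, and the maze encoding (a list of strings where `.` is an open cell and `#` is a wall) are not from the notes.

```python
class DisjointSet:
    """Forest-based disjoint sets with union by rank and path compression."""

    def __init__(self):
        self.parent = {}
        self.rank = {}

    def make_set(self, x):
        self.parent[x] = x
        self.rank[x] = 0

    def find(self, x):
        root = x
        while self.parent[root] != root:      # walk up to the root
            root = self.parent[root]
        while x != root:                      # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx                  # hang the shallower tree under the deeper one
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1


def connected_in_maze(maze, a, b):
    """maze: list of strings ('.' open, '#' wall); a, b: (row, col) of open cells."""
    ds = DisjointSet()
    open_cells = [(r, c) for r, row in enumerate(maze)
                  for c, ch in enumerate(row) if ch == "."]
    for cell in open_cells:
        ds.make_set(cell)
    for r, c in open_cells:
        for nb in ((r + 1, c), (r, c + 1)):   # checking down and right covers every adjacent pair
            if nb in ds.parent:
                ds.union((r, c), nb)
    return ds.find(a) == ds.find(b)
```

Once the regions are built, any number of connectivity queries can be answered with two `find` calls each.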

### Basic structures for disjoint sets
**ARRAY**  
Cons:  
* `Union(x,y)` works in time $O(n)$
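A minimal sketch of the array representation shows where the $O(n)$ cost comes from. It assumes the elements are the integers `0..n-1` and uses the smallest element of each set as its ID; both choices, and the class name `ArrayDisjointSet`, are assumptions made for this illustration.

```python
class ArrayDisjointSet:
    """smallest[i] stores the ID (smallest element) of the set containing i."""

    def __init__(self, n):
        self.smallest = list(range(n))        # MakeSet for each of 0..n-1

    def find(self, x):
        return self.smallest[x]               # O(1): a single array lookup

    def union(self, x, y):
        x_id, y_id = self.find(x), self.find(y)
        if x_id == y_id:
            return
        m = min(x_id, y_id)
        for i in range(len(self.smallest)):   # O(n): every entry must be inspected
            if self.smallest[i] in (x_id, y_id):
                self.smallest[i] = m
```

`Find` is a single lookup, but `Union` has to relabel every member of the merged sets, which requires scanning the whole array.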

**LINKED LIST**  
Pros:
* Running time of `Union` is $O(1)$
* well-defined ID  

Cons:
* Running time of `Find` is $O(n)$ as we need to traverse the list to find its tail.
* `Union(x,y)` works in time $O(1)$ **only** if we can get the tail of the list of `x` and the head of the list of `y` in constant time!

**TREES**  
<p align="left">
    <img src="images/disjoint_set_tree.png" width="450" style="display: inline-block; margin-right: 0px;">
</p>