### Kruskal's Algorithm

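- Given a connected, undirected, weighted graph, find a minimum spanning tree (MST): a set of edges that connects every vertex with the smallest possible total weight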

- Kruskal's Algorithm is a greedy algorithm. In general, it will
    - Sort all edges in non-decreasing order of weight
    - Take the edges in that order, adding each edge that does not form a cycle with the edges already chosen
    - Stop once there are $V-1$ edges in the tree, where $V$ is the number of vertices

### Example Walkthrough

- Imagine the following graph

In [12]:
import networkx as nx
G = nx.Graph()
G.add_edge(0, 1, weight=4)
G.add_edge(0, 7, weight=8)
G.add_edge(1, 2, weight=8)
G.add_edge(1, 7, weight=11)
G.add_edge(2, 3, weight=7)
G.add_edge(2, 5, weight=4)
G.add_edge(2, 8, weight=2)
G.add_edge(3, 4, weight=9)
G.add_edge(3, 5, weight=14)
G.add_edge(4, 5, weight=10)
G.add_edge(5, 6, weight=2)
G.add_edge(6, 7, weight=1)
G.add_edge(6, 8, weight=6)
G.add_edge(7, 8, weight=7)
# nx.draw_networkx_edge_labels(G, pos=nx.spring_layout(G))
# nx.draw_networkx(G)

- We represent every edge as a tuple `(from_node, to_node, edge_weight)`
- Sort every edge in ascending order by weight

- Sorted edges
    - (6,7,1)
    - (2,8,2)
    - (5,6,2)
    - (0,1,4)
    - (2,5,4)
    - (6,8,6)
    - (2,3,7)
    - (7,8,7)
    - (0,7,8)
    - (1,2,8)
    - (3,4,9)
    - (4,5,10)
    - (1,7,11)
    - (3,5,14)
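
The sorted list above can be reproduced in plain Python. This is a minimal sketch: the names `edges` and `sorted_edges` are chosen here for illustration, and the tuples follow the `(from_node, to_node, edge_weight)` convention. Because Python's `sorted` is stable, edges with equal weight keep the order in which they were added.

```python
# Edges as (from_node, to_node, edge_weight), in the order they were added to G
edges = [
    (0, 1, 4), (0, 7, 8), (1, 2, 8), (1, 7, 11), (2, 3, 7),
    (2, 5, 4), (2, 8, 2), (3, 4, 9), (3, 5, 14), (4, 5, 10),
    (5, 6, 2), (6, 7, 1), (6, 8, 6), (7, 8, 7),
]

# Sort by weight (index 2); ties keep their original relative order
sorted_edges = sorted(edges, key=lambda edge: edge[2])

for edge in sorted_edges:
    print(edge)
```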

- Add edges from the top, so long as no cycle is formed
    - Adding (6,7,1)
        - (6,7)
    - Adding (2,8,2)
        - (2,8), (6,7)
    - Adding (5,6,2)
        - (2,8), (5,6,7)
    - Adding (0,1,4)
        - (2,8), (5,6,7), (0,1)
    - Adding (2,5,4)
        - (2,5,6,7,8), (0,1)
    - Adding (6,8,6)
        - 6 and 8 are already connected through the path 6-5-2-8! Adding edge 6-8 would form a cycle. **DO NOT ADD**
        - (2,5,6,7,8), (0,1)
    - Adding (2,3,7)
        - (2,3,5,6,7,8), (0,1)
    - Adding (7,8,7)
        - Already connected, do not add
        - (2,3,5,6,7,8), (0,1)
    - Adding (0,7,8)
        - (0,1,2,3,5,6,7,8)
    - Adding (1,2,8)
        - Already connected, do not add
        - (0,1,2,3,5,6,7,8)
    - Adding (3,4,9)
        - (0,1,2,3,4,5,6,7,8)
    - All $V = 9$ nodes are connected using $V - 1 = 8$ edges, terminate
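
The walkthrough can be replayed with a deliberately naive sketch that keeps each connected group as a Python `set`. The function name `trace_kruskal` and the list-of-sets representation are assumptions made for illustration only. Looking up a group this way is linear in the number of groups, which is the cost that Union Find is meant to avoid.

```python
def trace_kruskal(sorted_edges, n_vertices):
    """Replay the walkthrough, tracking connected groups as plain sets."""
    groups = [{v} for v in range(n_vertices)]  # every node starts alone
    accepted = []
    for f, t, w in sorted_edges:
        group_f = next(g for g in groups if f in g)
        group_t = next(g for g in groups if t in g)
        if group_f is group_t:
            print(f"skip ({f},{t},{w}): already connected")
            continue
        # Merge the group containing t into the group containing f
        groups.remove(group_t)
        group_f |= group_t
        accepted.append((f, t, w))
        print(f"add  ({f},{t},{w}): {[sorted(g) for g in groups if len(g) > 1]}")
        if len(accepted) == n_vertices - 1:
            break  # the spanning tree is complete
    return accepted


sorted_edges = [
    (6, 7, 1), (2, 8, 2), (5, 6, 2), (0, 1, 4), (2, 5, 4), (6, 8, 6),
    (2, 3, 7), (7, 8, 7), (0, 7, 8), (1, 2, 8), (3, 4, 9), (4, 5, 10),
    (1, 7, 11), (3, 5, 14),
]
mst = trace_kruskal(sorted_edges, 9)
print(sum(w for _, _, w in mst))  # total weight of the tree: 37
```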

### Union Find: How do we tell if a cycle will be formed from adding an edge?

- Kruskal's algorithm requires us to tell whether adding an edge will result in a cycle. How can we write a procedure to do this?
    - We rely on an algorithm called **Union Find**

- Idea
    - We maintain 2 arrays while iterating through all the edges
        - `parent`, representing the parent of each node. Initially, all nodes are parents of themselves, so every node is the root of its own set
        - `rank`, representing the number of nodes in the tree rooted at the node (itself included). Initially, all nodes have `rank = 1`. Strictly speaking this is a size, so the scheme below is usually called union by size

    - When adding some arbitrary edge $(i, j)$, we check if $i$ and $j$ have the same root
        - Comparing the direct parents `parent[i]` and `parent[j]` is not enough, because two nodes in the same set can have different direct parents. We follow the parent pointers from each node until we reach a root
        - That is, we check if `find_parent(i, parents) == find_parent(j, parents)`
        - If the roots are equal, then $i$ and $j$ are already connected, and we should not add the edge
        - If the roots are not equal, then we add $(i, j)$, but at the same time, we also need to merge the two sets so that both nodes lead to the same root
    
    - How do we update the `parent` array when the roots differ?
        - Let `ri` and `rj` be the roots of $i$ and $j$. We check `rank[ri]` and `rank[rj]`
        - If `rank[ri] >= rank[rj]`
            - update `parent[rj] = ri`
            - update `rank[ri] = rank[ri] + rank[rj]`
        - Else
            - update `parent[ri] = rj`
            - update `rank[rj] = rank[rj] + rank[ri]`
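
The bookkeeping described above can be sketched in isolation. This is an illustrative sketch, not the notebook's own implementation: it assumes every node starts with a size of 1 (counting itself), compares the roots of the two nodes rather than the nodes themselves, and names the array `size` instead of `rank` because it stores subtree sizes. The helper `find_root` is a hypothetical name and does no path compression.

```python
def find_root(parent, i):
    # Follow parent pointers until a node is its own parent
    while parent[i] != i:
        i = parent[i]
    return i


def union(parent, size, i, j):
    """Merge the sets containing i and j; return False if already connected."""
    root_i, root_j = find_root(parent, i), find_root(parent, j)
    if root_i == root_j:
        return False
    # Keep the larger tree as the new root so that trees stay shallow
    if size[root_i] < size[root_j]:
        root_i, root_j = root_j, root_i
    parent[root_j] = root_i
    size[root_i] += size[root_j]
    return True


parent = list(range(4))  # every node is its own parent
size = [1] * 4           # every set contains one node

print(union(parent, size, 0, 1))  # True
print(union(parent, size, 2, 3))  # True
print(union(parent, size, 1, 3))  # True, merges {0, 1} with {2, 3}
print(union(parent, size, 0, 2))  # False, this edge would close a cycle
print(parent, size)               # [0, 0, 0, 2] [4, 1, 2, 1]
```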

- Sub-Idea: Path Compression
    - In the Union-Find process, it is very likely that we come across a case where a node's parent is not the root of its set
        - That is, suppose we have this case: 
            - `parent[i] = j`, `parent[j] = k`, `parent[k] = root`
        - Then `root` is the root of `i`, but to get to `root`, we need to make 3 jumps
        - How do we know we've reached the root? When the node is the parent of itself. That is, `parent[root] == root`
    
    - To make things more efficient, we want to shorten the number of jumps needed to get to the root

    - So whenever we come across a case where `parent[i] != parent[parent[i]]` (i.e. the parent of the node differs from the grandparent, so the parent is not the root), the node is not directly connected to its root
        - So before we continue towards the root, we set `parent[i] = parent[parent[i]]`, so that we skip 1 hop in future 
        - Graphically, imagine we have A --> B --> C --> D, where each arrow points from parent to child and A is the root
            - Then the immediate parent of D is C, the immediate parent of C is B etc. 
            - So we know `parent[D] = C`, and since `parent[C] != C`, we perform path compression
            - `parent[D] = parent[parent[D]] = parent[C] = B`
            - So the new graph is A --> B --> C / D, i.e. C and D are now both children of B
        - This one-hop variant is often called path halving. The `find_parent` function in the code implementation goes further and points every node on the path directly at the root
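
To make the two flavours concrete, here is a small sketch that compares the one-hop shortcut described above (often called path halving) with full path compression. The function names `find_halving` and `find_full` are chosen here for illustration, and the chain uses integers 0 to 3 in place of A to D.

```python
def find_halving(parent, i):
    """Iterative find: point each visited node at its grandparent."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # skip one hop
        i = parent[i]
    return i


def find_full(parent, i):
    """Recursive find: point every node on the path directly at the root."""
    if parent[i] != i:
        parent[i] = find_full(parent, parent[i])
    return parent[i]


# Chain 0 --> 1 --> 2 --> 3 (parent to child), so 0 is the root
chain = [0, 0, 1, 2]

halved = chain.copy()
print(find_halving(halved, 3), halved)  # 0 [0, 0, 1, 1]

full = chain.copy()
print(find_full(full, 3), full)         # 0 [0, 0, 0, 0]
```

Both variants return the same root. They differ only in how much of the path they flatten per call.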

### Code Implementation

In [66]:
import numpy as np
import networkx as nx
G = nx.Graph()
G.add_edge(0, 1, weight=4)
G.add_edge(0, 7, weight=8)
G.add_edge(1, 2, weight=8)
G.add_edge(1, 7, weight=11)
G.add_edge(2, 3, weight=7)
G.add_edge(2, 5, weight=4)
G.add_edge(2, 8, weight=2)
G.add_edge(3, 4, weight=9)
G.add_edge(3, 5, weight=14)
G.add_edge(4, 5, weight=10)
G.add_edge(5, 6, weight=2)
G.add_edge(6, 7, weight=1)
G.add_edge(6, 8, weight=6)
G.add_edge(7, 8, weight=7)

In [79]:
inputs = [
    (6,7,1), (2,8,2),(5,6,2),(0,1,4),(2,5,4),(6,8,6),
    (2,3,7),(7,8,7),(0,7,8),(1,2,8),(3,4,9),(4,5,10),
    (1,7,11),(3,5,14)
]

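def union(parents, rank, f, t):
    # Resolve both endpoints to their roots before touching either array
    root_f = find_parent(f, parents)
    root_t = find_parent(t, parents)
    if root_f == root_t:
        return parents, rank

    # Union by size: attach the smaller tree under the root of the larger one
    if rank[root_f] >= rank[root_t]:
        parents[root_t] = root_f
        rank[root_f] += rank[root_t]
    else:
        parents[root_f] = root_t
        rank[root_t] += rank[root_f]

    return parents, rank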

def find_parent(i, parents):
    if parents[i] != i:
        parents[i] = find_parent(parents[i], parents)
    
    return parents[i]
       
def kruskals_algorithm(inputs: list[tuple[int,int,int]], n_vertices: int) -> list[tuple[int, int, int]]:
    sorted_inputs = sorted(inputs, key=lambda x: x[2])
    res = []

    parents = list(range(n_vertices))
    rank = [1] * n_vertices

    for (f,t,e) in sorted_inputs:
        # print('='*50)
        # print(f,t,e)
        # print(parents)
        # print(rank)
        if find_parent(f, parents) == find_parent(t, parents):
            continue
        
        res.append((f,t,e))
        parents, rank = union(parents, rank, f, t)
        
        # print(parents)
        # print(rank)
        # print(res)

    return res
    
kruskals_algorithm(inputs, 9)

[(6, 7, 1),
 (2, 8, 2),
 (5, 6, 2),
 (0, 1, 4),
 (2, 5, 4),
 (2, 3, 7),
 (0, 7, 8),
 (3, 4, 9)]

### Time Complexity

- Time complexity
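    - Let the number of input edges be $E$ and the number of vertices be $V$
    - We sort the edges in $O(E \log E)$ time

    - Then, we loop through all edges in $O(E)$ iterations
        - Find parent in amortized $O(\alpha(V))$ time, where $\alpha$ is the inverse Ackermann function (effectively constant)
        - Perform union in $O(1)$ time once both roots are known

    - Total time complexity is $O(E \log E + E \, \alpha(V)) = O(E \log E)$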

- Space complexity
    - We use $O(E)$ space to hold the sorted edges
    - We use $O(V)$ space to hold the `parents` and `rank` arrays
    - So total space complexity is $O(E+V)$