### Boruvka Algorithm

- Recap
    - In Kruskal's, we sort edges from smallest to largest, and iteratively add them so long as adding an edge doesn't create a cycle
    - In Prim's, we push the edges leaving the current tree onto a min-heap, and repeatedly pop the smallest one, adding it to the tree so long as it doesn't form a cycle

- Boruvka's Algorithm is similar to Kruskal's, in that we use Union-Find to decide whether an edge joins two different components and can therefore be added

- Idea
    - At the start, Boruvka's treats every vertex as a standalone component (i.e. if you have 10 vertices, then you have 10 components)
    - Then at each iteration, find the cheapest edge from every component to another component and join the two
    - Every component merges with at least one other, so each iteration at least halves the number of components, which goes on until only 1 component remains


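- **DOWNSIDE**
    - Boruvka's assumes that an MST exists, i.e. that the graph is connected
    - If the graph is disconnected, the component count never reaches 1, so a loop that runs until 1 component remains never terminates unless it also stops when an iteration merges nothing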
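
One way to guard against this is to check connectivity before running the algorithm. The following is a minimal sketch, not part of the original notebook: the helper name `is_connected` and the two toy edge lists are assumptions made for illustration, and edges are assumed to be `(from, to, weight)` tuples.

```python
def is_connected(n_vertices, edge_list):
    """Return True if the undirected graph has exactly one component."""
    parents = list(range(n_vertices))

    def find(v):
        # Walk up to the root, shortening the path as we go (path halving)
        while parents[v] != v:
            parents[v] = parents[parents[v]]
            v = parents[v]
        return v

    n_components = n_vertices
    for f, t, _w in edge_list:
        f_root, t_root = find(f), find(t)
        if f_root != t_root:
            # The edge joins two different components, so merge them
            parents[f_root] = t_root
            n_components -= 1
    return n_components == 1


# Hypothetical toy inputs, not the graph used in the walkthrough
connected = [(0, 1, 4), (1, 2, 8), (2, 3, 7)]
disconnected = [(0, 1, 4), (2, 3, 7)]  # no edge joins {0, 1} and {2, 3}
print(is_connected(4, connected))
print(is_connected(4, disconnected))
```

If the check fails, no spanning tree exists and Boruvka's should not be run as-is.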

### Example Walkthrough

- Imagine the following graph

In [None]:
import networkx as nx
G = nx.Graph()
G.add_edge(0, 1, weight=4)
G.add_edge(0, 7, weight=8)
G.add_edge(1, 2, weight=8)
G.add_edge(1, 7, weight=11)
G.add_edge(2, 3, weight=7)
G.add_edge(2, 5, weight=4)
G.add_edge(2, 8, weight=2)
G.add_edge(3, 4, weight=9)
G.add_edge(3, 5, weight=14)
G.add_edge(4, 5, weight=10)
G.add_edge(5, 6, weight=2)
G.add_edge(6, 7, weight=1)
G.add_edge(6, 8, weight=6)
G.add_edge(7, 8, weight=7)
# nx.draw_networkx_edge_labels(G, pos=nx.spring_layout(G))
# nx.draw_networkx(G)

- There are 9 nodes. All are assumed to be standalone components
    - Let's create a variable to track component counts `n_components`

- As per the usual union find approach, create:
    - An array `parents` of size $V$. At the start, each vertex is its own parent
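    - An array `rank` of size $V$, with value 1. The implementation below uses it as the component size, so the smaller component is merged into the larger one

- Beyond the 2 arrays from union-find, we init an array `cheapest_edge` of size $V$ that records the cheapest outgoing edge of every component, indexed by the component's parent (root) vertex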
    
- Iterate until `n_components` = 1
    
- Iteration 1: 
    - Compute `cheapest_edge` by iterating over every edge
        - For each edge, find the parents (component roots) of vertex 1 and vertex 2
        - If equal, then the vertices belong to the same component, so ignore this edge
        - Otherwise, check the current record in `cheapest_edge` for vertex 1's parent and vertex 2's parent
            - If the record is empty, or its weight exceeds the current weight, replace the entry in `cheapest_edge` with this edge
        - By doing this for every edge, you end up with the cheapest outgoing edge of every component
        - Remember, we don't store the cheapest edge of every vertex, but of every **parent** vertex, so each component's minimum is tracked in 1 location
    
    - Iterate over every entry of `cheapest_edge`
        - For each recorded edge, find the parents of vertex 1 and 2 again
        - If equal, an earlier merge in this iteration already joined the two components, so the edge can be ignored
        - Else
            - Add edge to `edges`
            - Add edge weight to `weights`
            - Perform a union on the parents of vertex 1 and 2, so `parents` and `rank` are updated
            - Since the edge is added, there must be 1 fewer component, so decrement `n_components` by 1
    - Reset `cheapest_edge` before the next iteration

- Keep this up until `n_components == 1`
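
To make iteration 1 concrete, here is a small sketch that computes `cheapest_edge` for the example graph by hand. It assumes the edges are given as `(from, to, weight)` tuples sorted by weight; the names `edge_list` and `round_one_edges` are introduced here for illustration only. Because every vertex is still its own component in the first iteration, no find step is needed yet.

```python
edge_list = [
    (6, 7, 1), (2, 8, 2), (5, 6, 2), (0, 1, 4), (2, 5, 4), (6, 8, 6),
    (2, 3, 7), (7, 8, 7), (0, 7, 8), (1, 2, 8), (3, 4, 9), (4, 5, 10),
    (1, 7, 11), (3, 5, 14),
]
n_vertices = 9

# Iteration 1: each vertex is its own root, so index directly by vertex
cheapest_edge = [None] * n_vertices
for f, t, w in edge_list:
    for root in (f, t):
        if cheapest_edge[root] is None or cheapest_edge[root][2] > w:
            cheapest_edge[root] = (f, t, w)

for root, edge in enumerate(cheapest_edge):
    print(root, edge)

# Two neighbouring vertices can pick the same edge, so deduplicate.
# In this graph the selected edges contain no cycle, so each distinct
# edge merges two components.
round_one_edges = sorted(set(cheapest_edge), key=lambda e: (e[2], e[0], e[1]))
print(round_one_edges)
print(n_vertices - len(round_one_edges))  # components left after iteration 1
```

On this graph, iteration 1 selects six distinct edges and leaves three components: {0, 1}, {2, 3, 4, 8} and {5, 6, 7}.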


### Code Implementation

In [15]:
inputs = [
    (6,7,1), (2,8,2),(5,6,2),(0,1,4),(2,5,4),(6,8,6),
    (2,3,7),(7,8,7),(0,7,8),(1,2,8),(3,4,9),(4,5,10),
    (1,7,11),(3,5,14)
]
n_vertices = 9

n_components = n_vertices
def find_parent(vertex, parents):
    if parents[vertex] == vertex:
        return vertex

    parents[vertex] = find_parent(parents[vertex], parents)
    return parents[vertex]

parents = list(range(n_vertices))
rank = [1] * n_vertices
cheapest_edge = [None] * n_vertices
edges = []
weights = []

while n_components > 1:
    # Pass 1: record the cheapest outgoing edge of every component root
    for (f, t, w) in inputs:
        f_parent, t_parent = find_parent(f, parents), find_parent(t, parents)
        if f_parent == t_parent:
            ## nodes are in the same component, no need for edge
            continue

        # Keep this edge if it beats the best one seen so far for either root
        if cheapest_edge[f_parent] is None or cheapest_edge[f_parent][2] > w:
            cheapest_edge[f_parent] = (f, t, w)

        if cheapest_edge[t_parent] is None or cheapest_edge[t_parent][2] > w:
            cheapest_edge[t_parent] = (f, t, w)
        
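    # Pass 2: merge components along the recorded edges
    components_before = n_components
    for vertex in range(n_vertices):
        if cheapest_edge[vertex]:
            f, t, w = cheapest_edge[vertex]
            f_parent, t_parent = find_parent(f, parents), find_parent(t, parents)

            if f_parent == t_parent:
                ## The cheapest edge is redundant
                continue
            
            edges.append((f,t))
            weights.append(w)
            # Union by size: `rank` holds the number of vertices under each root
            if rank[f_parent] >= rank[t_parent]:
                parents[t_parent] = f_parent
                rank[f_parent] += rank[t_parent]
            else:
                parents[f_parent] = t_parent
                rank[t_parent] += rank[f_parent]
            n_components -= 1

    if n_components == components_before:
        # Nothing was merged: the graph is disconnected and has no
        # spanning tree, so stop instead of looping forever
        break

    cheapest_edge = [None] * n_vertices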

edges
sum(weights)

37
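
As a sanity check on the total of 37, the same edge list can be run through a short Kruskal's sketch, which should give the same MST weight. The helper `kruskal_weight` is a name introduced here for illustration, not part of the original notebook, and it assumes a connected graph.

```python
def kruskal_weight(n_vertices, edge_list):
    """Total MST weight via Kruskal's algorithm (assumes a connected graph)."""
    parents = list(range(n_vertices))

    def find(v):
        # Walk up to the root, shortening the path as we go (path halving)
        while parents[v] != v:
            parents[v] = parents[parents[v]]
            v = parents[v]
        return v

    total = 0
    # Consider edges from cheapest to most expensive
    for f, t, w in sorted(edge_list, key=lambda e: e[2]):
        f_root, t_root = find(f), find(t)
        if f_root != t_root:
            # The edge joins two components, so it belongs to the MST
            parents[f_root] = t_root
            total += w
    return total


edge_list = [
    (6, 7, 1), (2, 8, 2), (5, 6, 2), (0, 1, 4), (2, 5, 4), (6, 8, 6),
    (2, 3, 7), (7, 8, 7), (0, 7, 8), (1, 2, 8), (3, 4, 9), (4, 5, 10),
    (1, 7, 11), (3, 5, 14),
]
print(kruskal_weight(9, edge_list))  # 37, matching the Boruvka result
```

The two algorithms may pick different edges when weights tie (here `(0, 7, 8)` and `(1, 2, 8)`), but the total weight of any MST is the same.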

### Time Complexity

- Time Complexity
    - For $V$ vertices, there are at most $\lceil \log_2 V \rceil$ iterations, since the number of components at least halves in each iteration
    - For each iteration, we scan all $E$ edges to find the cheapest outgoing edge of every component
    - Union find with path compression takes near-constant amortised time per operation, so we treat it as approximately $O(1)$
    - This gives us a time complexity of $O(E \log V)$

- Space complexity
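    - Due to union find, we have 2 arrays `parents` and `rank` of size $V$, plus the `cheapest_edge` array, also of size $V$
    - The output `edges` holds $V - 1$ edges
    - Thus, the auxiliary space complexity is $O(V)$, not counting the $O(E)$ input edge list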
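
The iteration bound used in the time complexity can be written out as a short derivation. This is a sketch under the assumption that the graph is connected; the symbol $c_k$ (the number of components after iteration $k$) is introduced here for illustration.

$$
c_0 = V, \qquad c_{k+1} \le \frac{c_k}{2} \;\Longrightarrow\; c_k \le \frac{V}{2^k}
$$

The loop stops once $c_k = 1$, which is guaranteed by $k = \lceil \log_2 V \rceil$. Each iteration scans all $E$ edges, so, treating union find as $O(1)$ per operation:

$$
T(V, E) = O(E) \cdot \lceil \log_2 V \rceil = O(E \log V)
$$

For the 9-vertex example, the bound is $\lceil \log_2 9 \rceil = 4$ iterations.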