### Disjoint set 

- As the name suggests, the disjoint set data structure maintains a collection of non-overlapping groups and lets us check efficiently whether two elements belong to the same group. For example:
    - Imagine a maze in which every path is made up of discrete units (cells)
    - Suppose we want to know whether an arbitrary point B can be reached from an arbitrary point A
        - This is the case exactly when A and B belong to the same set of connected cells
            - To know this, we need a way of iteratively merging cells into common sets

- From the requirements above, we define a **disjoint set** data structure with the following methods:
    - `MakeSet(x)`: Creates a singleton set $\{x\}$
    - `Find(x)`: Return the ID of the set containing $x$
        - If x and y are in the same set, then $\text{Find}(x) = \text{Find}(y)$
    - `Union(x,y)`: Merges the sets containing $x$ and $y$ if $\text{Find}(x) \neq \text{Find}(y)$
    
- In the maze example, we answer our question by first running a preprocessing step that puts every cell in its own set and then unions each cell with its open neighbours:

```
def preprocess(maze):
    for cell in maze:
        MakeSet(cell)
    for cell in maze:
        # neighbours = adjacent cells not separated from `cell` by a wall
        for neighbour in cell.neighbours:
            Union(cell, neighbour)
```

- Then each reachability query reduces to comparing set IDs:
```
def is_reachable(A,B):
    return Find(A) == Find(B)
```
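
The pseudocode above leaves the maze representation open. The following is a minimal runnable sketch that assumes a hypothetical grid encoding (`'.'` is an open cell, `'#'` is a wall, moves go up/down/left/right). It also uses a small dict-based `DisjointSet` class whose name and layout we introduce for illustration. Like the naive version later in these notes, it stores each element's set ID directly. Unlike the pseudocode, `is_reachable` takes the disjoint set as an explicit argument rather than relying on globals.

```python
# Hypothetical maze encoding (an assumption): '.' = open cell, '#' = wall,
# moves go up/down/left/right between open cells.
MAZE = [
    "..#.",
    ".##.",
    "...#",
    "#.#.",
]


class DisjointSet:
    """Minimal disjoint set: each element stores its set ID directly (naive, O(N) union)."""

    def __init__(self):
        self.group_id = {}

    def make_set(self, x):
        self.group_id[x] = x

    def find(self, x):
        return self.group_id[x]

    def union(self, x, y):
        a, b = self.find(x), self.find(y)
        if a == b:
            return
        # Relabel every member of y's set with x's set ID
        for key, gid in list(self.group_id.items()):
            if gid == b:
                self.group_id[key] = a


def open_cells(maze):
    for r, row in enumerate(maze):
        for c, ch in enumerate(row):
            if ch == ".":
                yield (r, c)


def neighbours(maze, cell):
    # Open cells adjacent to `cell` (no wall in between)
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(maze) and 0 <= nc < len(maze[nr]) and maze[nr][nc] == ".":
            yield (nr, nc)


def preprocess(maze):
    ds = DisjointSet()
    for cell in open_cells(maze):
        ds.make_set(cell)
    for cell in open_cells(maze):
        for neighbour in neighbours(maze, cell):
            ds.union(cell, neighbour)
    return ds


def is_reachable(ds, a, b):
    return ds.find(a) == ds.find(b)


ds = preprocess(MAZE)
print(is_reachable(ds, (0, 0), (3, 1)))  # True: connected via column 0 and row 2
print(is_reachable(ds, (0, 0), (0, 3)))  # False: (0, 3) is walled off together with (1, 3)
```

After preprocessing, every query costs two `find` calls. All of the expensive work happens once, up front.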

### Naive implementation of disjoint set

- Let's take the value of the smallest element of a set as its ID
- We will maintain an array called `group_id` that stores the group ID of each value
    - Assume we have 3 sets: {9,3,2,4,7}, {5}, {6,1,8}
    - then the `group_id` array, indexed by value from 1 to 9, is [1,2,2,2,5,1,2,1,2]
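
The following short sketch shows how the value-indexed `group_id` array above is built from the three example sets. Python lists are 0-indexed, so we store value $k$ at position $k - 1$; that shift is our choice, since the notes list the array starting from value 1. The `find` helper here is illustrative only.

```python
sets = [{9, 3, 2, 4, 7}, {5}, {6, 1, 8}]
n = 9

# group_id[k - 1] holds the ID of the set containing value k
group_id = [None] * n
for s in sets:
    set_id = min(s)              # the smallest element is the set's ID
    for value in s:
        group_id[value - 1] = set_id


def find(value):
    return group_id[value - 1]   # O(1) lookup


print(group_id)                  # [1, 2, 2, 2, 5, 1, 2, 1, 2]
print(find(7) == find(9))        # True: both have ID 2
print(find(6) == find(4))        # False: IDs 1 and 2 differ
```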

```python
class DisjointSetNaive:
    def __init__(self, input_array):
        self.input_array = input_array
        # group_id[i] is the set ID of the element at position i of input_array
        self.group_id = [None] * len(input_array)

    def make_set(self, index):
        '''
        O(1)
        '''
        self.group_id[index] = self.input_array[index]

    def find(self, index):
        '''
        O(1)
        '''
        return self.group_id[index]

    def union(self, index1, index2):
        '''
        O(N) due to the for loop
        '''
        index1_id = self.find(index1)
        index2_id = self.find(index2)

        # If both indices have the same group ID, they are already in the same set
        if index1_id == index2_id:
            return

        # Otherwise they are in different sets:
        # use the smaller of the two IDs as the ID of the merged set
        merge_into_group_id = min(index1_id, index2_id)

        # Go through the entire array and redirect every element of either set
        # to the merged group ID
        for i in range(len(self.input_array)):
            if self.group_id[i] in (index1_id, index2_id):
                self.group_id[i] = merge_into_group_id


input_array = [9, 3, 2, 4, 7, 5, 6, 1, 8]
djs = DisjointSetNaive(input_array)
for i in range(len(input_array)):
    djs.make_set(i)
djs.union(0, 1)
djs.union(0, 2)
djs.union(0, 3)
djs.union(0, 4)
djs.union(5, 5)   # no-op: {5} stays a singleton
djs.union(6, 7)
djs.union(6, 8)

# Here group_id is indexed by position in input_array
print(djs.group_id)
# [2, 2, 2, 2, 2, 5, 1, 1, 1]

# The slide's array is indexed by value instead (its k-th entry is the ID of value k),
# so look up each value's position first to reproduce it
print([djs.group_id[input_array.index(x)] for x in range(1, 10)])
# [1, 2, 2, 2, 5, 1, 2, 1, 2]
```

- In the current implementation, `Union` is the obvious bottleneck. How might we make it more efficient?
    - Among the data structures we have seen so far, a linked list supports efficient merging
    - As long as we know the tail of one linked list and the head of the other, we can merge them in $O(1)$ by pointing that tail at that head

- However:
    - if we use a linked list, `Find` becomes $O(N)$, because we may need to traverse the whole list to reach its tail (the tail is taken to be the set ID)
    - we also need to store where each list's `tail` is in order to do the union, which takes additional space
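
To make the trade-off concrete, here is a sketch of the linked-list idea. It assumes each node's `next` pointer leads toward the tail, and the tail's value is the set ID. The names `Node` and `LinkedListDisjointSet` are ours, and so is the `head` dictionary, which maps each set ID to the head of its list. That dictionary is one possible way to do the bookkeeping the bullets mention. The splice inside `union` is $O(1)$, but locating the two tails still means walking the lists, which is the $O(N)$ `Find` cost described above.

```python
class Node:
    """One element of a linked list; `next` points toward the list's tail."""

    def __init__(self, value):
        self.value = value
        self.next = None


class LinkedListDisjointSet:
    """Sketch: each set is a singly linked list whose tail's value is the set ID."""

    def __init__(self):
        self.nodes = {}   # value -> its Node
        self.head = {}    # set ID (tail value) -> head Node of that list

    def make_set(self, x):
        node = Node(x)
        self.nodes[x] = node
        self.head[x] = node   # singleton list: the node is both head and tail

    def _tail(self, x):
        # Walk from x toward the tail: O(length of the list), O(N) in the worst case
        node = self.nodes[x]
        while node.next is not None:
            node = node.next
        return node

    def find(self, x):
        return self._tail(x).value

    def union(self, x, y):
        tail_x, tail_y = self._tail(x), self._tail(y)
        if tail_x is tail_y:
            return
        # The splice itself is O(1): point x's tail at the head of y's list
        tail_x.next = self.head[tail_y.value]
        # The merged list keeps x's head and y's tail, so its ID is y's old ID
        self.head[tail_y.value] = self.head.pop(tail_x.value)


ds = LinkedListDisjointSet()
for v in [1, 2, 3, 4]:
    ds.make_set(v)
ds.union(1, 2)   # list 1 -> 2, ID 2
ds.union(3, 4)   # list 3 -> 4, ID 4
ds.union(1, 3)   # list 1 -> 2 -> 3 -> 4, ID 4
print([ds.find(v) for v in [1, 2, 3, 4]])  # [4, 4, 4, 4]
```

After the last union, `find(1)` has to walk three links to reach the ID. This is the cost that grows with the list length.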

- A simple hack
    - Instead of attaching the tail of one list to the head of the other, we attach it directly to the other list's tail (its ID)!
    - The structure is then no longer a linked list but a tree, rooted at the set ID
    - TO BE CONTINUED in the next section
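
As a preview of the tree idea, here is a bare-bones sketch in which each element stores a parent pointer and a root (its own parent) serves as the set ID. The names `DisjointSetTree` and `parent` are ours. This version deliberately has no balancing heuristic and no path compression, so an unlucky sequence of unions can still produce a long chain, and `find` remains $O(N)$ in the worst case.

```python
class DisjointSetTree:
    """Sketch of the tree idea: a root is its own parent and serves as the set ID.
    No balancing or path compression is applied here."""

    def __init__(self):
        self.parent = {}

    def make_set(self, x):
        self.parent[x] = x   # each element starts as the root of its own tree

    def find(self, x):
        # Follow parent pointers up to the root; the cost is the depth of x
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        root_x, root_y = self.find(x), self.find(y)
        if root_x != root_y:
            # "Tail to tail": attach x's root directly under y's root
            self.parent[root_x] = root_y


ds = DisjointSetTree()
for v in [1, 2, 3, 4]:
    ds.make_set(v)
ds.union(1, 2)
ds.union(3, 4)
ds.union(1, 3)
print(ds.parent)  # {1: 2, 2: 4, 3: 4, 4: 4}
```

In this run, element 1 reaches the root 4 in two parent hops. Chaining the same elements into one list 1 → 2 → 3 → 4 would take three.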