## Disjoint Set
### Overview
* disjoint set as a data structure
  + The primary use of disjoint sets is to address the connectivity between the components of a network
    + to quickly check if two vertices are connected
  + also known as union-find
* terminologies
  + parent node: the direct parent node of a vertex
  + root node: a node without a parent node; it can be considered the parent node of itself

### Implementation
* find function finds the root node of a given vertex
* union function unions two vertices and makes their root nodes the same
#### implementation of Quick Find
* time complexity of find function will be O(1)
* time complexity of union function will be O(N)
* the basic idea is to maintain a root array that keeps the roots for all elements
* when we union two nodes x and y, we first check whether they have the same root; if not, we traverse the root array and change the root of every element whose root is root(y) to root(x)
  + by doing this, all elements that had y's root as their root now have x's root as their root, including y itself
* a single union is therefore O(N), and connecting N nodes takes O(N^2) in the worst case
```python
# UnionFind class
class UnionFind:
    def __init__(self, size):
        self.root = [i for i in range(size)]

    def find(self, x):
        return self.root[x]
		
    def union(self, x, y):
        rootX = self.find(x)
        rootY = self.find(y)
        if rootX != rootY:
            for i in range(len(self.root)):
                if self.root[i] == rootY:
                    self.root[i] = rootX

    def connected(self, x, y):
        return self.find(x) == self.find(y)
```
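
As a quick check of the behaviour described above, the sketch below repeats the same quick-find logic under the hypothetical name `QuickFindDemo` (so that it does not shadow the class above) and prints the root array after a few unions. The size and the pairs are arbitrary choices for illustration.

```python
class QuickFindDemo:
    """Same quick-find logic as above; the class name is only for this demo."""

    def __init__(self, size):
        self.root = list(range(size))

    def find(self, x):
        return self.root[x]

    def union(self, x, y):
        rootX, rootY = self.find(x), self.find(y)
        if rootX != rootY:
            # O(N) scan: relabel every element whose root is rootY
            for i in range(len(self.root)):
                if self.root[i] == rootY:
                    self.root[i] = rootX

    def connected(self, x, y):
        return self.find(x) == self.find(y)


uf = QuickFindDemo(6)
uf.union(0, 1)
uf.union(2, 3)
uf.union(1, 3)              # merges {0, 1} with {2, 3}
print(uf.root)              # [0, 0, 0, 0, 4, 5]
print(uf.connected(0, 2))   # True
print(uf.connected(0, 4))   # False
```

Every element of a component stores the root directly, which is why find is a single array lookup.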


#### implementation of Quick Union
* the find function is O(N) in the worst case, because it follows parent links up to the root
* the union function is also O(N) in the worst case, since it calls find twice; the linking step itself is O(1)
* connecting N nodes takes at most O(N^2). On average, quick union is better than quick find, because a union no longer scans the whole array
* quick union builds a hierarchy of parent links, but it is possible for all vertices to form a single line, which is where the O(N) find comes from

In [None]:
# UnionFind class
class UnionFind:
    def __init__(self, size):
        self.root = [i for i in range(size)]

    def find(self, x):
        while x != self.root[x]:
            x = self.root[x]
        return x

    def union(self, x, y):
        rootX = self.find(x)
        rootY = self.find(y)
        if rootX != rootY:
            self.root[rootY] = rootX

    def connected(self, x, y):
        return self.find(x) == self.find(y)
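
To illustrate the "line" case mentioned above, here is a small sketch. It repeats the quick-union logic under the hypothetical name `QuickUnionDemo` and adds a hypothetical helper, `depth`, that counts how many parent links separate a vertex from its root. The union order is chosen on purpose to produce the worst case and then a best case.

```python
class QuickUnionDemo:
    def __init__(self, size):
        self.root = list(range(size))

    def find(self, x):
        while x != self.root[x]:
            x = self.root[x]
        return x

    def union(self, x, y):
        rootX, rootY = self.find(x), self.find(y)
        if rootX != rootY:
            self.root[rootY] = rootX

    def depth(self, x):
        # hypothetical helper: number of parent links followed to reach the root
        hops = 0
        while x != self.root[x]:
            x = self.root[x]
            hops += 1
        return hops


n = 8
line = QuickUnionDemo(n)
for i in range(n - 1):
    line.union(i + 1, i)      # the old root always ends up below the new vertex
print(line.root)              # [1, 2, 3, 4, 5, 6, 7, 7]
print(line.depth(0))          # 7

star = QuickUnionDemo(n)
for i in range(1, n):
    star.union(0, i)          # vertex 0 stays the root
print(star.root)              # [0, 0, 0, 0, 0, 0, 0, 0]
print(star.depth(7))          # 1
```

The same seven unions give a depth of 7 or 1 depending only on the argument order.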

#### Union by rank
* the purpose of union by rank is to keep the trees wide and shallow, i.e. to limit their height
* rank refers to the height of the tree rooted at a vertex (a standalone vertex has rank 1)
* when we union two components, the root with the higher rank becomes the parent of the root with the lower rank, so the height of the merged tree does not grow
  + when we union a rank 3 root with a rank 2 root: if the rank 3 root becomes the parent, the height of the merged set stays 3; otherwise it becomes 4. Using the higher-rank root as the parent therefore keeps the overall height lower
  + when the two roots have the same rank, either can be the parent, and its rank is incremented by 1 after the union
* time complexity
  + the worst case is repeatedly unioning components of equal rank; the tree height is then at most log2(N)+1, so find is O(log(N)) in the worst case
  + union is O(log(N)) since it uses find()
  + the constructor is O(N)

In [None]:
# union by rank
# UnionFind class
class UnionFind:
    def __init__(self, size):
        self.root = [i for i in range(size)]
        self.rank = [1] * size

    def find(self, x):
        while x != self.root[x]:
            x = self.root[x]
        return x
 
    def union(self, x, y):
        rootX = self.find(x)
        rootY = self.find(y)
        if rootX != rootY:
            if self.rank[rootX] > self.rank[rootY]:
                self.root[rootY] = rootX
            elif self.rank[rootX] < self.rank[rootY]:
                self.root[rootX] = rootY
            else:
                self.root[rootY] = rootX
                self.rank[rootX] += 1

    def connected(self, x, y):
        return self.find(x) == self.find(y)
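
The log(N)+1 height bound above can be checked on a small case. The sketch below repeats the union-by-rank logic under the hypothetical name `UnionByRankDemo` and always merges components of equal rank, which is the worst case described in the notes. The choice of N = 8 and the union order are assumptions made for the demo.

```python
import math


class UnionByRankDemo:
    def __init__(self, size):
        self.root = list(range(size))
        self.rank = [1] * size

    def find(self, x):
        while x != self.root[x]:
            x = self.root[x]
        return x

    def union(self, x, y):
        rootX, rootY = self.find(x), self.find(y)
        if rootX == rootY:
            return
        if self.rank[rootX] > self.rank[rootY]:
            self.root[rootY] = rootX
        elif self.rank[rootX] < self.rank[rootY]:
            self.root[rootX] = rootY
        else:
            self.root[rootY] = rootX
            self.rank[rootX] += 1


n = 8
uf = UnionByRankDemo(n)
# always merge components of equal rank: pairs, then pairs of pairs, and so on
for x, y in [(0, 1), (2, 3), (4, 5), (6, 7), (0, 2), (4, 6), (0, 4)]:
    uf.union(x, y)

print(uf.root)                  # [0, 0, 0, 2, 0, 4, 4, 6]
print(uf.rank[uf.find(0)])      # 4
print(int(math.log2(n)) + 1)    # 4
```

The deepest vertex is 7 (7 -> 6 -> 4 -> 0), so the tree has height 4, matching log2(8)+1.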

#### Path compression optimization
* to further decrease the tree height, after finding the root node we update the parent of every traversed element to point directly to the root
  + the recursive calls keep going up until they reach the root element
* this effectively flattens the tree. After find(x) has run, there are only two possibilities for each node on the traversed path:
  + the node is the root of its tree, where x == root\[x\]
  + the node has the root as its direct parent, so the next find needs only two recursive calls
  + note that this only pays off when we look up the node again. The first find on a long chain may still take O(N), although this is rare
* notice the different implementation of find(x): instead of a while loop, it uses an if statement as the base case, recursively calls find(root\[x\]), stores the returned root in root\[x\], and then returns it
* we don't use union by rank in this implementation, so find() can still be O(N) in the worst case while the chain is being built
* Time complexity
  + constructor O(N)
  + find O(logN) on average, worst case O(N)
  + union O(logN) on average, worst case O(N)
  + connected O(logN) on average, worst case O(N)

In [None]:
class UnionFind:
    def __init__(self, size):
        self.root = [i for i in range(size)]

    def find(self, x):
        if x == self.root[x]:
            return x
        self.root[x] = self.find(self.root[x])
        return self.root[x]
 
    def union(self, x, y):
        rootX = self.find(x)
        rootY = self.find(y)
        if rootX != rootY:
            self.root[rootY] = rootX

    def connected(self, x, y):
        return self.find(x) == self.find(y)
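
The effect of compression is easiest to see on the root array itself. The sketch below repeats the logic above under the hypothetical name `PathCompressionDemo`, builds a chain on purpose, and prints the array before and after a single find. The size of 6 is arbitrary.

```python
class PathCompressionDemo:
    def __init__(self, size):
        self.root = list(range(size))

    def find(self, x):
        if x == self.root[x]:
            return x
        # point x directly at the root found further up the chain
        self.root[x] = self.find(self.root[x])
        return self.root[x]

    def union(self, x, y):
        rootX, rootY = self.find(x), self.find(y)
        if rootX != rootY:
            self.root[rootY] = rootX


uf = PathCompressionDemo(6)
for i in range(5):
    uf.union(i + 1, i)      # builds the chain 0 -> 1 -> 2 -> 3 -> 4 -> 5
before = list(uf.root)
print(before)               # [1, 2, 3, 4, 5, 5]

print(uf.find(0))           # 5
print(uf.root)              # [5, 5, 5, 5, 5, 5]
```

The first find(0) walks the whole chain; afterwards every vertex on that path points straight at the root.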

#### optimized disjoint set with path compression and union by rank
* here we combine the path compression in find() and union by rank in union()
* time complexity
  + constructor O(N)
  + find: amortized O(alpha(N)), where alpha is the inverse Ackermann function; regarded as O(1) in practice
  + union and connected: amortized O(alpha(N)), regarded as O(1) in practice

In [1]:
class UnionFind:
    def __init__(self, size):
        self.root = [i for i in range(size)]
        # Use a rank array to record the height of each vertex, i.e., the "rank" of each vertex.
        # The initial "rank" of each vertex is 1, because each of them is
        # a standalone vertex with no connection to other vertices.
        self.rank = [1] * size

    # The find function here is the same as that in the disjoint set with path compression.
    def find(self, x):
        if x == self.root[x]:
            return x
        self.root[x] = self.find(self.root[x])
        return self.root[x]

    # The union function with union by rank
    def union(self, x, y):
        rootX = self.find(x)
        rootY = self.find(y)
        if rootX != rootY:
            if self.rank[rootX] > self.rank[rootY]:
                self.root[rootY] = rootX
            elif self.rank[rootX] < self.rank[rootY]:
                self.root[rootX] = rootY
            else:
                self.root[rootY] = rootX
                self.rank[rootX] += 1

    def connected(self, x, y):
        return self.find(x) == self.find(y)


#### Leetcode 547
* we can use union-find to solve this problem. The key points:
  + applying UF is straightforward: union nodes i and j when isConnected\[i\]\[j\] == 1
  + it is better to keep a count initialized to the number of nodes, and decrement it each time a union actually joins two different components
  + return the count as the result
  + we could instead count the root nodes, but if we read root\[i\] directly for a node whose parent has not been updated yet, instead of calling find(i), we get an incorrect result
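
The counting approach from the key points can be sketched as follows. The class name `UnionFindCount` and the function name `find_circle_num` are hypothetical, and the two sample matrices are small inputs made up for the demo; this is one possible arrangement rather than a reference solution.

```python
class UnionFindCount:
    def __init__(self, size):
        self.root = list(range(size))
        self.rank = [1] * size
        self.count = size           # every node starts as its own component

    def find(self, x):
        if x != self.root[x]:
            self.root[x] = self.find(self.root[x])
        return self.root[x]

    def union(self, x, y):
        rootX, rootY = self.find(x), self.find(y)
        if rootX == rootY:
            return                  # already joined, count stays the same
        if self.rank[rootX] < self.rank[rootY]:
            rootX, rootY = rootY, rootX
        self.root[rootY] = rootX
        if self.rank[rootX] == self.rank[rootY]:
            self.rank[rootX] += 1
        self.count -= 1             # two components became one


def find_circle_num(isConnected):
    n = len(isConnected)
    uf = UnionFindCount(n)
    for i in range(n):
        for j in range(i + 1, n):   # the matrix is symmetric
            if isConnected[i][j] == 1:
                uf.union(i, j)
    return uf.count


print(find_circle_num([[1, 1, 0], [1, 1, 0], [0, 0, 1]]))  # 2
print(find_circle_num([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))  # 3
```

Decrementing only when the two roots differ is what keeps the count correct when the same pair is seen again.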
  
#### Leetcode 261
* a tree is a connected graph without cycles
  + the graph must end up as a single component (one root node), which can be checked by letting union() decrement a component count
  + the number of edges must equal n-1, with no repeated edges
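
A minimal sketch of these two checks, assuming the input is a node count `n` and a list of undirected `edges`. The function name `valid_tree` and the sample inputs are hypothetical, and the plain quick-union find is used only to keep the example short.

```python
def valid_tree(n, edges):
    # a tree on n nodes has exactly n - 1 edges
    if len(edges) != n - 1:
        return False

    root = list(range(n))

    def find(x):
        while x != root[x]:
            x = root[x]
        return x

    count = n                       # number of components
    for a, b in edges:
        rootA, rootB = find(a), find(b)
        if rootA == rootB:
            return False            # this edge would close a cycle
        root[rootB] = rootA
        count -= 1
    return count == 1               # everything ended up in one component


print(valid_tree(5, [[0, 1], [0, 2], [0, 3], [1, 4]]))          # True
print(valid_tree(5, [[0, 1], [1, 2], [2, 3], [1, 3], [1, 4]]))  # False
print(valid_tree(4, [[0, 1], [1, 2], [2, 0]]))                  # False
```

The third input has the right number of edges but contains a cycle and leaves node 3 isolated, so the union step rejects it.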
  
#### Leetcode 1202
* the key points are the following:
  + we use union-find to connect the indices of the input string that are listed in pairs
  + we then traverse the input string by index and char, and find the root of each index
  + we append each index and char to the corresponding defaultdict, using the root index as the key
  + we then traverse the two defaultdicts, retrieve the index and char lists, and sort them
  + we then write the chars from the sorted char list to the positions given by the sorted index list
  + we finally join the output char list into a string
* time complexity
  + connecting all E pairs takes O(E * alpha(V)), where V is the length of the string; sorting the groups adds O(V log V)
  + space complexity: O(V)
  ```python
  from collections import defaultdict

  def smallestStringWithSwaps(s, pairs):
      # relies on the UnionFind class defined above
      n = len(s)
      uf = UnionFind(n)
      for x, y in pairs:
          uf.union(x, y)

      # traverse the input string and group the chars and indices
      # by the root index of each component, stored in defaultdicts
      char_dict = defaultdict(list)
      index_dict = defaultdict(list)
      for i, char in enumerate(s):
          root_index = uf.find(i)
          char_dict[root_index].append(char)
          index_dict[root_index].append(i)

      # prepare empty string list for output
      rs = [""] * n

      # retrieve the char and index lists, sort them,
      # and fill the output list by pairing the sorted
      # indices with the sorted chars
      for key, value in char_dict.items():
          value.sort()
          index_dict[key].sort()
          for i, char in zip(index_dict[key], value):
              rs[i] = char
      return "".join(rs)
  ```

In [4]:
s = "abc"
for i, char in enumerate(s):
    print(i, char)

0 a
1 b
2 c


In [15]:
from collections import defaultdict
a = defaultdict(list)
a[3].append(0)
a[3].append(3)
a[2].append(1)
a[2].append(2)

In [16]:
a

defaultdict(list, {3: [0, 3], 2: [1, 2]})

In [17]:
b = defaultdict(list)
b[3].append('d')
b[3].append('b')
b[2].append('c')
b[2].append('a')

In [18]:
b

defaultdict(list, {3: ['d', 'b'], 2: ['c', 'a']})

In [19]:
for key, value in a.items():
    print(key, value)

3 [0, 3]
2 [1, 2]


In [11]:
rs =[""] * 5

In [12]:
rs

['', '', '', '', '']

In [13]:
rs[0] ='a'

In [14]:
rs

['a', '', '', '', '']