#### Union-Find Data Structure: Pointer-Based Implementation

Instead of using arrays, an alternative way of implementing a union-find data structure is pointer-based. In this approach, each item is contained inside a `record` which holds the item and a `pointer`. Initially, the pointer for each item's record is null. A null pointer indicates that a record represents a connected component labeled by the item contained in that record.

We will again maintain an array `Size[n]` which stores the size of each connected component. 

Given two components `A` and `B`, let $u \in A$ be the record of the item labelling `A` and $v \in B$ be the record of the item labelling `B`. The `Union(A,B)` operation merges the two connected components by going to the record of the item labelling the smaller of the two components and updating its pointer to point to the record of the item labelling the larger component. For example, if `Size[A] < Size[B]`, then we would update `pointer(u) -> v` and also update the component sizes. Thus the cost of the union operation is $O(1)$.

Given a record `u`, the `Find(u)` operation involves following the pointer of that record back to the root record, i.e. we follow the pointers until we reach a record with a null pointer, which indicates we've reached the record whose item labels the connected component containing `u`. 

Every time we perform a merge, the root of the smaller component gets a pointer to the root of the larger one, so only the items in the smaller component end up one pointer further from their root. Now suppose that a particular item `v` is involved in a sequence of `Union(A,B)` operations, i.e. `v` is in either `A` or `B`. Each time the label of the component containing `v` changes, `v` must have been in the smaller (or equal-sized) of the two components, so the size of the component containing `v` at least doubles. Since that size is initially 1 and can grow to be at most `n`, the number of such doublings can be at most $\log_2 n$. This means that the label of the connected component containing item `v` gets "updated" at most $\log_2 n$ times, so a `Find()` operation will require traversing at most $\log_2 n$ pointers and must therefore cost $O(\log n)$ in the worst case.
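The bound can be checked empirically. The following is a minimal, self-contained sketch rather than the implementation used in this notebook: the names `Node`, `find_root` and `union_by_size` are hypothetical, and the component size is stored on the root node instead of in a separate `Size` array. It merges equal-sized components pairwise, a pattern that makes the pointer chains as long as union by size allows, and then measures the longest chain.

```python
import math


class Node:
    """Hypothetical minimal record: an item plus a pointer (None marks a root)."""

    def __init__(self, item):
        self.item = item
        self.pointer = None
        self.size = 1  # only meaningful while this node is a root


def find_root(node):
    """Follow pointers to the root; return (root, number of pointers followed)."""
    hops = 0
    while node.pointer is not None:
        node = node.pointer
        hops += 1
    return node, hops


def union_by_size(a, b):
    """Point the root of the smaller component at the root of the larger one."""
    root_a, _ = find_root(a)
    root_b, _ = find_root(b)
    if root_a is root_b:
        return  # already in the same component
    if root_a.size < root_b.size:
        root_a, root_b = root_b, root_a
    root_b.pointer = root_a
    root_a.size += root_b.size


n = 16
nodes = [Node(i) for i in range(n)]

# Merge components of size 1, then 2, then 4, ... so every merge is a tie.
step = 1
while step < n:
    for i in range(0, n, 2 * step):
        union_by_size(nodes[i], nodes[i + step])
    step *= 2

max_hops = max(find_root(node)[1] for node in nodes)
print(max_hops, math.log2(n))  # prints: 4 4.0
```

With `n = 16` the longest chain has exactly $\log_2 16 = 4$ pointers, so the bound is tight for this merge order.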


In [3]:
class record():
    def __init__(self, item_id):
        self.item_id = item_id
        self.pointer =  None

    def update_pointer(self, pointer):
        self.pointer = pointer    


class UnionFindPointer():
    def __init__(self):
        pass
    
    def makeUnionFind(self, S):    
        # S contains a list of n items 
        self.S = S
        # we assign ids 0, 1, 2, 3, ...n-1 to the items in S
        self.item_id = {self.S[i]: i for i in range(len(S))}
        # create records for each item
        self.records = [record(i) for i in range(len(S))]
        self.Size = [1 for i in range(len(S))]

    def find(self, u): 
        # convert u to its corresponding id
        u = self.item_id[u]
        # return the root of the component containing u
        while self.records[u].pointer is not None:
            u = self.records[u].pointer.item_id
        return self.S[u]
    
    def union(self, A, B):
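        # A and B must be component labels (roots), e.g. obtained via find()
        # convert A and B to their corresponding ids
        A = self.item_id[A]
        B = self.item_id[B]
        # same component: nothing to merge (also avoids a record pointing to itself)
        if A == B:
            return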
        # update the pointer of the root of the smaller component to the root of the larger component
        if self.Size[A] < self.Size[B]:
            self.records[A].update_pointer(self.records[B])
            self.Size[B] += self.Size[A]
            self.Size[A] = 0
        else:
            self.records[B].update_pointer(self.records[A])
            self.Size[A] += self.Size[B]
            self.Size[B] = 0    

    def __str__(self):
        # create dictionary of component lists
        component_dict = {i:[] for i in self.S}
        for i in range(len(self.S)):
            component_dict[self.find(self.S[i])].append(self.S[i]) 
        
        return str(component_dict)                

In [4]:
S = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
uf = UnionFindPointer()
uf.makeUnionFind(S)
print(uf)

uf.union(uf.find('a'), uf.find('b'))
print(uf)
uf.union(uf.find('a'), uf.find('d'))
print(uf)
uf.union(uf.find('f'), uf.find('j'))
print(uf)

{'a': ['a'], 'b': ['b'], 'c': ['c'], 'd': ['d'], 'e': ['e'], 'f': ['f'], 'g': ['g'], 'h': ['h'], 'i': ['i'], 'j': ['j']}
{'a': ['a', 'b'], 'b': [], 'c': ['c'], 'd': ['d'], 'e': ['e'], 'f': ['f'], 'g': ['g'], 'h': ['h'], 'i': ['i'], 'j': ['j']}
{'a': ['a', 'b', 'd'], 'b': [], 'c': ['c'], 'd': [], 'e': ['e'], 'f': ['f'], 'g': ['g'], 'h': ['h'], 'i': ['i'], 'j': ['j']}
{'a': ['a', 'b', 'd'], 'b': [], 'c': ['c'], 'd': [], 'e': ['e'], 'f': ['f', 'j'], 'g': ['g'], 'h': ['h'], 'i': ['i'], 'j': []}


#### Optimization: Path Compression

Every time we invoke the `Find(u)` operation, we have to follow the pointers all the way back to the root record. Now imagine that we make successive calls for records that are all in the same component. On many of these calls, we may end up following a lot of the same pointers. However, after the first call, we already know the root of all records along the path. We can remove the redundant path traversal by *compressing* the path after each call to `Find(u)`. By this, we mean that after each call, we reset the pointers of all records along the path to point directly to the root. Then on a subsequent call, the traversal will reach the root in a single step from any record along that path.



In [7]:

class UnionFindPointer_Optimized():
    def __init__(self):
        pass
    
    def makeUnionFind(self, S):    
        # S contains a list of n items 
        self.S = S
        # we assign ids 0, 1, 2, 3, ...n-1 to the items in S
        self.item_id = {self.S[i]: i for i in range(len(S))}
        # create records for each item
        self.records = [record(i) for i in range(len(S))]
        self.Size = [1 for i in range(len(S))]

    def find(self, u): 
        # convert u to its corresponding id
        u = self.item_id[u]
        # follow the pointers back to root, store every record along the way in a list
        path = []
        while self.records[u].pointer is not None:
            path.append(u)
            u = self.records[u].pointer.item_id
        root_id = u
        # path compression: reset the pointers of all the records in the path to the root
        for item_id in path:
            self.records[item_id].update_pointer(self.records[root_id])        
        return self.S[root_id]
    
    def union(self, A, B):
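        # A and B must be component labels (roots), e.g. obtained via find()
        # convert A and B to their corresponding ids
        A = self.item_id[A]
        B = self.item_id[B]
        # same component: nothing to merge (also avoids a record pointing to itself)
        if A == B:
            return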
        # update the pointer of the root of the smaller component to the root of the larger component
        if self.Size[A] < self.Size[B]:
            self.records[A].update_pointer(self.records[B])
            self.Size[B] += self.Size[A]
            self.Size[A] = 0
        else:
            self.records[B].update_pointer(self.records[A])
            self.Size[A] += self.Size[B]
            self.Size[B] = 0    

    def __str__(self):
        # create dictionary of component lists
        component_dict = {i:[] for i in self.S}
        for i in range(len(self.S)):
            component_dict[self.find(self.S[i])].append(self.S[i]) 
        
        return str(component_dict)                

In [8]:
S = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
uf = UnionFindPointer_Optimized()
uf.makeUnionFind(S)
print(uf)

uf.union(uf.find('a'), uf.find('b'))
print(uf)
uf.union(uf.find('a'), uf.find('d'))
print(uf)
uf.union(uf.find('f'), uf.find('j'))
print(uf)

{'a': ['a'], 'b': ['b'], 'c': ['c'], 'd': ['d'], 'e': ['e'], 'f': ['f'], 'g': ['g'], 'h': ['h'], 'i': ['i'], 'j': ['j']}
{'a': ['a', 'b'], 'b': [], 'c': ['c'], 'd': ['d'], 'e': ['e'], 'f': ['f'], 'g': ['g'], 'h': ['h'], 'i': ['i'], 'j': ['j']}
{'a': ['a', 'b', 'd'], 'b': [], 'c': ['c'], 'd': [], 'e': ['e'], 'f': ['f'], 'g': ['g'], 'h': ['h'], 'i': ['i'], 'j': ['j']}
{'a': ['a', 'b', 'd'], 'b': [], 'c': ['c'], 'd': [], 'e': ['e'], 'f': ['f', 'j'], 'g': ['g'], 'h': ['h'], 'i': ['i'], 'j': []}
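
The printed components look the same with and without path compression; the difference is in how many pointers a later `find` has to follow. The sketch below tries to make that visible. It is self-contained and uses hypothetical names (`Node`, `hops_to_root`, `find_compress`) rather than the classes above. The chain is built by hand purely for illustration; union by size would not produce a chain this long from only five items.

```python
class Node:
    """Hypothetical minimal record: an item plus a pointer (None marks a root)."""

    def __init__(self, item):
        self.item = item
        self.pointer = None


def hops_to_root(node):
    """Count the pointers followed from node to its root, without modifying anything."""
    hops = 0
    while node.pointer is not None:
        node = node.pointer
        hops += 1
    return hops


def find_compress(node):
    """Find the root, then point every record visited on the way directly at it."""
    path = []
    while node.pointer is not None:
        path.append(node)
        node = node.pointer
    root = node
    for visited in path:
        visited.pointer = root
    return root


# Hand-built chain e -> d -> c -> b -> a, where 'a' is the root.
chain = [Node(ch) for ch in "abcde"]
for child, parent in zip(chain[1:], chain[:-1]):
    child.pointer = parent

before = [hops_to_root(node) for node in chain]
root = find_compress(chain[-1])  # a single find on the deepest record
after = [hops_to_root(node) for node in chain]

print(root.item, before, after)  # prints: a [0, 1, 2, 3, 4] [0, 1, 1, 1, 1]
```

A single call on the deepest record flattens the whole path, so every record that was on it now reaches the root in one step.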
