# A0149963M 
## YSC 2229 - Assignment 3 Solutions


### Problem 1

In this problem, we are required to implement the randomized solution to the vertex cover problem. The input to this function is a graph consisting of nodes and edges, represented by an open hash table. It is therefore sensible to implement a Node and a Linked List before starting on the problem. My implementation of these 2 basic data structures is shown below, followed by a small sanity check to confirm that it works:

In [138]:
class Node:
    def __init__(self, data = None, next = None):
        self.data = data
        self.next = next
    
    def get_name(self):
        return self.data
    
class LinkedList:
    def __init__(self):
        self.head = None
        
    def insert(self, data):
        new = Node(data)
        new.next = self.head
        self.head = new
    
    def display(self):
        head = self.head
        while head:
            print(head.data, "->", end=" ")
            head = head.next
        print("end")
        
    def search(self, x):
        curr = self.head
        while curr != None: 
            if curr.data == x:
                print("Match Found, Returning Pointer")
                return curr
            curr = curr.next
        print("Match not Found. Returning None")
        return curr
        
    
    def delete(self, x):
        curr = self.head
        prev = None
        while curr and curr.data != x:
            prev = curr
            curr = curr.next
        if curr is None:
            # x is not in the list, so there is nothing to delete
            return None
        else:
            remove = curr.data
            if not prev:
                self.head = curr.next
            else: 
                prev.next = prev.next.next
            return remove

The following cells test a few cases of the Linked List.

In [139]:
# test insert and display

l = LinkedList()
l.insert(3)
l.insert(6)
l.insert(9)
l.display()

9 -> 6 -> 3 -> end


In [140]:
# test search 
print("\nSearching 9...")
l.search(9)
print("\nSearching 6...")
l.search(6)
print("\nSearching 3...")
l.search(3)

# Values that don't exist in the linked list
print("\nSearching 10...")
l.search(10)
print("\nSearching 999...")
l.search(999)


Searching 9...
Match Found, Returning Pointer

Searching 6...
Match Found, Returning Pointer

Searching 3...
Match Found, Returning Pointer

Searching 10...
Match not Found. Returning None

Searching 999...
Match not Found. Returning None


In [141]:
# appending more elements before testing delete

l.insert(10)
l.insert(17)
l.insert(20)
l.display()

20 -> 17 -> 10 -> 9 -> 6 -> 3 -> end


In [142]:
# testing deleting of head
l.delete(20)
l.display()

17 -> 10 -> 9 -> 6 -> 3 -> end


In [143]:
# testing deleting of tail
l.delete(3)
l.display()

17 -> 10 -> 9 -> 6 -> end


In [144]:
# testing deleting of middle element
l.delete(10)
l.display()

17 -> 9 -> 6 -> end


Next, we need to implement the Graph. The Graph is supposed to be an open hash table. However, we need to consider the case where the number of vertices is very high (e.g. 100000) yet only a few of them have edges. Allocating one linked list per vertex would waste memory on lists that stay empty. Hence, we use an open hash table with a number of slots chosen by the user, where each slot maps to a linked list and an edge is stored in the slot given by its first vertex modulo the number of slots. Each node stores a pair **(vertex, edge)**. For example, (3, (3, 5)) is legitimate, but (4, (8, 10)) is not, because 4 is not a vertex of that edge.

In the graph, you can then insert an edge as an ordinary tuple **(a, b)**. The insert function automatically wraps it into the (vertex, edge) form stored in the linked list. With that in mind, please ensure that a < b as far as possible, since the functions are written with that convention.

In addition, I have defined a search function that takes in an edge and looks for the corresponding entry in the Graph. Since the edge (4, 7) may be stored as (4, (4, 7)) or as (7, (4, 7)), the function checks both possibilities. However, if you expect an edge to be present and it is not found, try searching for the tuple written the other way round, e.g. (7, 4), since the search compares the tuple exactly as given. I have not handled the reversed tuple automatically because the graph is undirected and edges are expected in the a < b form.
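
One way to avoid retrying a search with the endpoints swapped is to put every edge into a canonical order before inserting or searching. The following is a minimal sketch of that idea; `normalize_edge` is a hypothetical helper and is not part of the Graph class in this notebook.

```python
def normalize_edge(edge):
    """Return the edge with its endpoints in ascending order."""
    a, b = edge
    return (a, b) if a <= b else (b, a)

print(normalize_edge((7, 4)))  # (4, 7)
print(normalize_edge((4, 7)))  # (4, 7)
```

If both insert and search applied such a helper first, (7, 4) and (4, 7) would always refer to the same stored entry.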

Next, I have a display function that helps to visualize the linked list in each slot. Run it without any arguments to see what the graph looks like.

Last but not least, I have the **generate** function, which generates a random graph given a number of vertices and a maximum number of edges. It generates unique edges (a, b) where a < b, so you do not have to worry about repeated edges. Because a duplicate draw is skipped rather than redrawn, the graph may end up with fewer edges than the maximum. Vertex labels run from 0 up to and including the number of vertices passed in. Just input your desired number of vertices and (max) edges and run it.
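
If exactly the requested number of distinct edges is wanted, one option is to keep drawing until the set is full. The sketch below illustrates this under its own assumptions: `random_edges` and its `seed` parameter are hypothetical names, and vertex labels here run from 0 to n_vertices - 1, which differs slightly from **generate**.

```python
import random

def random_edges(n_vertices, n_edges, seed=None):
    """Draw n_edges distinct edges (a, b) with 0 <= a < b < n_vertices."""
    max_edges = n_vertices * (n_vertices - 1) // 2
    if n_edges > max_edges:
        raise ValueError("too many edges requested for this many vertices")
    rng = random.Random(seed)
    edges = set()
    while len(edges) < n_edges:
        a, b = rng.sample(range(n_vertices), 2)  # two distinct vertices
        edges.add((min(a, b), max(a, b)))        # store with a < b
    return edges

edges = random_edges(6, 5, seed=0)
print(len(edges))  # 5
```

The loop slows down when the requested number of edges is close to the maximum possible, since most draws are then duplicates.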


The aforementioned functions are shown below:

In [145]:
import random    
    
class Graph:
    def __init__(self, slots):
        # Initializing Open Hash
        self.hash = {i: LinkedList() for i in range(slots)}
        self.slots = slots
    
    def insert(self, edge):
        """
        edge - Edge that the user desires to insert
        in the open Hash Table
        """
        slot = edge[0] % self.slots        
        self.hash[slot].insert((edge[0], edge))
    
    def search(self, edge):
        """
        edge - Edge that the user desires to find 
        in the open Hash Table
        """
        # find first possibility 
        slot = edge[0] % self.slots 
        curr = self.hash[slot].head
        while curr:
            if (edge[0], edge) == curr.data:
                print("Match found, returning pointer for", edge)
                return curr
            curr = curr.next
        
        # find second possibility 
        slot = edge[1] % self.slots 
        curr = self.hash[slot].head
        while curr:
            if (edge[1], edge) == curr.data:
                print("Alternative Match found, returning pointer for", edge)
                return curr
            curr = curr.next
        
        print("Match not found, returning None for", edge)
        return
        
    def generate(self, m_vertice, n_edges_max):
        """
        Generates a maximum of n random edges for 
        m vertices to create a graph that can be 
        tested with.
        
        m_vertice - number of vertices
        n_edges_max - maximum number of edges created
        """
        x = set()
        for i in range(n_edges_max):
            low = random.randint(0, m_vertice - 1)
            high = random.randint(low + 1, m_vertice)
            edge = (low, high)
            if edge in x:
                continue
            else:
                x.add(edge)
                self.insert(edge)
                
    def display(self):
        """
        Prints the Open Hash Table for easy visualization
        """
        for i in range(self.slots):
            curr = self.hash[i].head
            print("\nPrinting Vertices and Edges for Slot Number", i)
            while curr:
                print(curr.data, " -> ", end = " ")
                curr = curr.next
            print("end", end = "")
                

These 2 code blocks demonstrate how to use the Graph class:
* To initialize the graph, pass the number of slots you want in the hash table.
* For the insert function, pass your desired edge as a tuple. The only restriction is that the values in the edge must be integers.
* For the search function, pass your desired edge as a tuple. If that doesn't work, consider swapping the values.
* For the generate function, pass your desired number of vertices and maximum number of edges. Please ensure that the graph is empty before calling it!

Feel free to change the values to your satisfaction.

In [146]:
g = Graph(10)
g.insert((3, 4))
g.search((3, 4))
g.search((6, 8))

Match found, returning pointer for (3, 4)
Match not found, returning None for (6, 8)


In [147]:
g = Graph(5)
g.generate(25, 50)
g.display()


Printing Vertices and Edges for Slot Number 0
(5, (5, 16))  ->  (0, (0, 2))  ->  (20, (20, 23))  ->  (15, (15, 23))  ->  (5, (5, 20))  ->  (10, (10, 18))  ->  (15, (15, 25))  ->  (10, (10, 14))  ->  end
Printing Vertices and Edges for Slot Number 1
(1, (1, 19))  ->  (16, (16, 19))  ->  (6, (6, 13))  ->  (11, (11, 15))  ->  (16, (16, 17))  ->  (21, (21, 23))  ->  (21, (21, 24))  ->  (21, (21, 22))  ->  (6, (6, 21))  ->  (6, (6, 9))  ->  (6, (6, 14))  ->  end
Printing Vertices and Edges for Slot Number 2
(2, (2, 16))  ->  (2, (2, 19))  ->  (17, (17, 20))  ->  (2, (2, 9))  ->  (7, (7, 21))  ->  (12, (12, 24))  ->  (12, (12, 23))  ->  (17, (17, 25))  ->  (22, (22, 24))  ->  (7, (7, 20))  ->  (2, (2, 3))  ->  (22, (22, 25))  ->  (2, (2, 23))  ->  end
Printing Vertices and Edges for Slot Number 3
(23, (23, 24))  ->  (13, (13, 20))  ->  (23, (23, 25))  ->  end
Printing Vertices and Edges for Slot Number 4
(4, (4, 20))  ->  (9, (9, 11))  ->  (9, (9, 25))  ->  (19, (19, 21))  ->  (19, (19, 23)

In this section, I cover the implementation of the Approximate Vertex Cover, which closely follows the algorithm given in the assignment brief. The function takes an argument of type Graph (as defined earlier) and first collects the set of all edges in the Graph.

A while loop then runs as long as the edge set is not empty. In each iteration, a random edge is chosen from the edge set and appended to the output. We then traverse the edge set, find every edge that touches either endpoint of the chosen edge (including the chosen edge itself), and add it to the remove array.

Lastly, we traverse the remove array to remove these edges from the set. If the edge set is still not empty, the loop runs again with a fresh remove array, and the cycle repeats until the edge set is empty. The function then returns the list of chosen edges; the endpoints of these edges make up the vertex cover.

To run this function, simply run the cell below the implementation. Since the edges are chosen at random, the function may return a different vertex cover each time it is run, and the cover is not necessarily a minimum vertex cover.
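
Because the output changes from run to run, it helps to have an independent check that a returned list really is a vertex cover. The sketch below assumes edges are given as plain tuples; `is_vertex_cover` is a hypothetical helper, not part of the assignment code.

```python
def is_vertex_cover(edges, chosen_edges):
    """Return True if every edge has at least one endpoint among the
    endpoints of chosen_edges."""
    cover = set()
    for a, b in chosen_edges:
        cover.add(a)
        cover.add(b)
    return all(a in cover or b in cover for a, b in edges)

edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(is_vertex_cover(edges, [(1, 2), (3, 4)]))  # True
print(is_vertex_cover(edges, [(0, 1)]))          # False, (2, 3) is uncovered
```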

In [148]:
import random

def approx_vertex_cover(graph):
    """
    graph - Graph with open Hash table
    
    Given an input graph, it prints the number of edges and
    the edgeset that denotes the approximate vertex cover
    solution. Solution should be non-unique.
    """
    out = []
    edgeset = set()
    
    # set of all edges
    for i in range(graph.slots):
        curr = graph.hash[i].head
        while curr:
            # append edges, not vertices
            edgeset.add(curr.data[1])
            curr = curr.next
    
    while (edgeset != set()):
        # making random choices
        (x_ran, y_ran) = random.choice(tuple(edgeset))
        out.append((x_ran, y_ran))
        remove = []
        for (a, b) in edgeset:
            if a == x_ran or b == x_ran or a == y_ran or b == y_ran:
                remove.append((a, b))
        
        for edge in remove:
            edgeset.remove(edge)
    
    print("Length of approx vertex cover is", len(out))
    print("Vertex cover is",out)
    return out


In [149]:
g = Graph(5)
g.generate(25, 50)

approx_vertex_cover(g)

Length of approx vertex cover is 10
Vertex cover is [(2, 21), (1, 14), (5, 19), (22, 24), (7, 10), (9, 15), (11, 17), (16, 23), (13, 20), (3, 8)]


[(2, 21),
 (1, 14),
 (5, 19),
 (22, 24),
 (7, 10),
 (9, 15),
 (11, 17),
 (16, 23),
 (13, 20),
 (3, 8)]

### Problem 2

In this problem, we are required to perform radix sort on letters instead of numbers. For the most part, the implementation is very similar to what we did with numbers. However, there are a few modifications to the code.

Counting Sort for letters differs in a few ways. Firstly, instead of 10 possible digits (0-9), we have 26 possible characters **(lowercase 'a' to lowercase 'z')**, so our count array has 26 entries instead of 10. Furthermore, since counting sort works on numbers, we need to convert the lowercase letters to meaningful numbers somehow. This is done through **ord()**, which accepts a string of length 1 and returns its Unicode code point. Hence,
* ord('a') returns 97
* ord('b') returns 98
* ...
* ord('y') returns 121
* ord('z') returns 122

From this, we know that the Unicode code points for lowercase letters are consecutive, starting at 97 and ending at 122. Since the count array should be as small as possible to keep counting sort running in O(n) time, we give it a size of 26 rather than making it large enough to be indexed by the raw code points. This is done by shifting the code points by -97, using the shift variable defined in the function count_sort_letters. Other than that, the main function does not change.
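
The shift can be illustrated on its own. In the sketch below, `letter_index` is a hypothetical helper (the notebook computes the same expression inline), and the range check is an added assumption about how invalid input might be rejected.

```python
def letter_index(ch):
    """Map 'a'..'z' to 0..25 by shifting the code point by ord('a')."""
    index = ord(ch) - ord('a')
    if not 0 <= index < 26:
        raise ValueError("expected a lowercase letter, got %r" % ch)
    return index

print(ord('a'), ord('z'))                    # 97 122
print(letter_index('a'), letter_index('z'))  # 0 25
```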

In order to use this function, please input an array of strings to be sorted. Please note that this algorithm only handles lowercase letters. Correct examples include
* ["bob", "danvy", "unittests", "mathematics", "jewel", "gem"]
* ["nancy", "dave", "zhao", "give", "job", "please"]

These are incorrect examples:
* "bob", "danvy", "unittests", "mathematics", "jewel", "gem"
* "nancy", "dave", "zhao", "give", "job", "please"
* ["2134", "AQUINAS", "Liberal", "Wine"]

The first 2 examples are incorrect because the strings are not placed inside a list. The last one is incorrect because it contains forbidden characters (digits, capital letters, etc).

To run the relevant functions, call radix_sort_letters with an array of strings consisting of lowercase letters only. As the outputs below show, the strings are returned in descending (reverse-alphabetical) order.

In [150]:
def count_sort_letters(array, size, col, max_len):
    out   = [0] * size 
    
    # 26 for all 26 letters of the alphabet
    count    = [0] * 26
    
    # Using this to shift Unicode by 97 to the left
    # To make the count table as small as possible. 
    shift = ord('a') 

    for item in array: 
        if col < len(item):
            # shift to the corresponding index in
            # the count array
            letter = ord(item[col]) - shift
        else: 
            letter = 0
        count[letter] += 1 
    
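    # Add cumulative counts
    for i in range(len(count) - 1):   
        count[i + 1] += count[i] 
    
    # Place items so that larger letters come first (descending order).
    # Scanning the input forwards keeps items with equal letters in
    # their original relative order, which radix sort relies on.
    for item in array:
        if col < len(item):
            letter = ord(item[col]) - shift
        else:
            # strings with no letter in this column share bucket 0 with 'a'
            letter = 0
        out[len(out) - count[letter]] = item
        count[letter] -= 1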
    return out

def radix_sort_letters(array):
    """
    array - input array of strings 
    
    Radix sort sorts the letters from the 
    rightmost letter to the leftmost letter.
    """
    if array == []:
        return []
    
    max_col = len(max(array, key = len)) 
    for col in reversed(range(max_col)): 
        # feed each pass the output of the previous pass
        array = count_sort_letters(array, len(array), col, max_col)
    return array

print(radix_sort_letters(["dog", "cat", "rain", "umbrella", "bob", "digit", "zeta"]))
print(radix_sort_letters(["bob", "danvy", "unittests", "mathematics", "jewel", "gem"]))
print(radix_sort_letters(["a", "z", "b", "l", "i", "e", "f"]))
print(radix_sort_letters(["abc", "aaa", "acb"]))

['zeta', 'umbrella', 'rain', 'dog', 'digit', 'cat', 'bob']
['unittests', 'mathematics', 'jewel', 'gem', 'danvy', 'bob']
['z', 'l', 'i', 'f', 'e', 'b', 'a']
['acb', 'abc', 'aaa']
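
For comparison, here is a sketch of an ascending variant. It is not the implementation above: it assumes 27 buckets, with bucket 0 reserved for "no character in this column" so that a shorter string sorts before a longer string sharing its prefix. The names `counting_pass` and `radix_sort_ascending` are hypothetical.

```python
def counting_pass(words, col):
    """Stable counting sort of words by the character in column col.
    Bucket 0 means "no character here"; 'a'..'z' use buckets 1..26."""
    def bucket(word):
        return ord(word[col]) - ord('a') + 1 if col < len(word) else 0

    count = [0] * 27
    for word in words:
        count[bucket(word)] += 1
    # start[i] = first output position for bucket i
    start = [0] * 27
    for i in range(1, 27):
        start[i] = start[i - 1] + count[i - 1]
    out = [None] * len(words)
    for word in words:  # forward scan keeps the pass stable
        b = bucket(word)
        out[start[b]] = word
        start[b] += 1
    return out

def radix_sort_ascending(words):
    """LSD radix sort of lowercase strings into ascending order."""
    if not words:
        return []
    for col in reversed(range(max(len(w) for w in words))):
        words = counting_pass(words, col)
    return words

print(radix_sort_ascending(["abc", "aaa", "acb", "ab"]))
# ['aaa', 'ab', 'abc', 'acb']
```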


### Problem 3

This question prompts us to implement a queue using 2 stacks. While this is possible, we will see that we have to pay a cost in terms of runtime. Before beginning, let's recap the nature of stacks and queues:
* Stacks - LIFO. The last element pushed is the first element popped
* Queue - FIFO. The first element enqueued is the first element to be dequeued

With such a large difference between LIFO and FIFO, to implement a queue using stacks, the second stack becomes an auxiliary stack that helps fulfill the requirements of a queue. 

In this implementation of the queue, I felt that it didn't really make sense to include a head and tail since the stacks handle all data manipulation, so I removed them. The implementation uses 2 stacks. The first stack is the main stack storing the queue, while the second stack functions as a cache used to reverse the order of the first stack so that its bottommost element can be popped for dequeuing, thereby fulfilling the requirements of a queue. We show the Stack first:

In [151]:
class Stack:
    def __init__(self, size):
        self.items = [None] * size
        self.top = 0
        self.size = size
    
    def isEmpty(self):
        return self.top == 0

    def push(self, data):
        if self.top >= self.size:
            raise ValueError("Stack Overflow")
        self.items[self.top] = data
        self.top += 1
    
    def pop(self):
        if self.top <= 0:
            raise ValueError("Stack Underflow")
        self.top -= 1
        return self.items[self.top]
    
    def display(self):
        """
        Prints the stack from bottom to top. The first line printed is
        the first element pushed, which is popped last. The last line 
        printed is the last element pushed, which is popped first. 
        """
        for i in range(self.top):
            print(self.items[i])

When implementing a Queue, we could choose to make either enqueuing or dequeuing costly. In this case, we choose the latter to be costly. We denote the 2 stacks as s1 (Main) and s2 (Auxiliary). When enqueuing, we simply push the data onto s1, which is an O(1) operation. In the FIFO paradigm, this means that the element at the bottom of s1 is the one that should be removed when dequeuing.

The top of stack s1 is therefore the most recently enqueued element. Since a queue is FIFO in nature, this element is the last to leave; it is the bottom element of s1 that is removed during a dequeue. To remove it, we first need to bring it to the top of a stack. This is done by popping all elements of s1 and pushing them, in that order, onto s2. We then pop the top element of s2. This works because the elements in s2 are arranged in the reverse order of s1.

After this is done, all that is needed is to pop all the elements of s2 in order and push them back onto s1, thereby preserving the order of the Queue. Hence, enqueuing is an O(1) operation but dequeuing is O(n) instead, which is rather costly. 
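
The three steps of the costly dequeue can be traced with plain Python lists standing in for the two stacks (the end of a list plays the role of the top of a stack). This is only an illustrative sketch: `dequeue_trace` is a hypothetical name, and it assumes the queue is not empty.

```python
def dequeue_trace(s1):
    """Dequeue from a queue stored in list s1 (end of list = top of stack),
    returning the removed item. s1 is modified in place."""
    s2 = []
    while s1:               # reverse s1 into s2
        s2.append(s1.pop())
    item = s2.pop()         # oldest element is now on top of s2
    while s2:               # restore the remaining elements to s1
        s1.append(s2.pop())
    return item

s1 = [1, 2, 3, 4]           # 1 was enqueued first, 4 last
print(dequeue_trace(s1))    # 1
print(s1)                   # [2, 3, 4]
```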

In [152]:
class Queue:
    
    def __init__(self, size):
        self.s1 = Stack(size)
        self.s2 = Stack(size)
        self.cache = 0 
        self.size = size
    
    def enqueue(self, data):
        """
        Normal, Bread and Butter enqueuing
        That costs O(1)
        """
        # check for overflow before changing any state
        if self.cache >= self.size:
            raise ValueError("Stack Overflow")
        self.cache += 1
        self.s1.push(data)
    
    def dequeue(self):
        """
        Costly dequeuing that costs O(n)
        because element to be dequeued is 
        in Bottom of stack
        """
        # check for underflow before changing any state
        if self.cache <= 0:
            raise ValueError("Stack Underflow")
        self.cache -= 1
        
        while not self.s1.isEmpty():
            self.s2.push(self.s1.pop())
        
        item = self.s2.pop()
        
        while not self.s2.isEmpty():
            self.s1.push(self.s2.pop())
            
        return item  
    
    def display(self):
        self.s1.display()

This section shows some of the tests I performed.

In [153]:
# test for underflow - uncomment to run
# q = Queue(1)
# q.dequeue()

In [154]:
# test for overflow - uncomment to run
# q = Queue(1)
# q.enqueue(2)
# q.enqueue(3)

In [155]:
# 4 is added last
# 1 is added first, should be removed upon dequeue
# 2 is added second, should be removed upon second dequeue

q = Queue(4)
q.enqueue(1)
q.enqueue(2)
q.enqueue(3)
q.enqueue(4)
q.display()
print("Dequeuing, ", q.dequeue())
print("Current queue: ")
q.display()
print("Dequeuing, ", q.dequeue())
print("Current queue: ")
q.display()

1
2
3
4
Dequeuing,  1
Current queue: 
2
3
4
Dequeuing,  2
Current queue: 
3
4


Another way of defining the Queue is to make enqueuing expensive (O(n) time) and dequeuing cheap (O(1) time). In this case, the top element of s1 is the first element that was added to the queue, making it the first to be removed under the FIFO rule. On the other hand, the bottom element of s1 is the last element that was added to the queue. Therefore, when dequeuing, we simply pop the element at the top of stack s1.

However, for enqueuing, we have to reverse the stack order by popping all elements of s1 onto s2, then pushing the new data onto s2. After that, we pop all the elements of s2 back onto s1, so the new data becomes the bottommost element of s1 and the last to be removed as a result. By doing this, we preserve the FIFO property of queues. In this implementation, enqueuing takes O(n) time while dequeuing takes O(1) time. 
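
The costly enqueue can be traced in the same way with plain Python lists standing in for the stacks (the end of a list is the top). This is a sketch only, and `enqueue_trace` is a hypothetical name.

```python
def enqueue_trace(s1, data):
    """Enqueue data into a queue stored in list s1, whose top (end of the
    list) is always the oldest element. s1 is modified in place."""
    s2 = []
    while s1:               # move everything out of the way
        s2.append(s1.pop())
    s2.append(data)         # new element sits on top of s2
    while s2:               # moving back puts it at the bottom of s1
        s1.append(s2.pop())

s1 = []
for value in [1, 2, 3]:
    enqueue_trace(s1, value)
print(s1)        # [3, 2, 1]
print(s1.pop())  # 1, the first element enqueued
```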

We have established that implementing a Queue using 2 stacks in these ways is more costly than implementing a Queue directly, since one of the two operations takes O(n) time. While it was a fun mental exercise, I think that the moral of the story is to use the most appropriate data structure to tackle problems and not reinvent the wheel.

In [156]:
class Queue_alt:
    def __init__(self, size):
        self.s1 = Stack(size)
        self.s2 = Stack(size)
        self.cache = 0 
        self.size = size
    
    def enqueue(self, data):
        """
        Costly enqueuing that costs O(n)
        because the new element has to be 
        placed at the Bottom of the stack
        """
        # check for overflow before changing any state
        if self.cache >= self.size:
            raise ValueError("Stack Overflow")
        self.cache += 1
        
        while not self.s1.isEmpty():
            self.s2.push(self.s1.pop())
        
        self.s2.push(data)
        
        while not self.s2.isEmpty():
            self.s1.push(self.s2.pop())
            
        
    def dequeue(self):
        """
        Cheap dequeuing operation that costs
        O(1) because element to be dequeued
        is already at the top of the Stack
        """
        # check for underflow before changing any state
        if self.cache <= 0:
            raise ValueError("Stack Underflow")
        self.cache -= 1
        
        item = self.s1.pop()
            
        return item  
    
    def display(self):
        self.s1.display()


In [157]:
# test for underflow - uncomment to run
# q = Queue_alt(1)
# q.dequeue()

In [158]:
# test for overflow - uncomment to run
# q = Queue_alt(1)
# q.enqueue(2)
# q.enqueue(3)

In [159]:
# 4 is added last
# 1 is added first, should be removed upon dequeue
# 2 is added second, should be removed upon second dequeue

q = Queue_alt(4)
q.enqueue(1)
q.enqueue(2)
q.enqueue(3)
q.enqueue(4)
q.display()
print("Dequeuing, ", q.dequeue())
print("Current queue: ")
q.display()
print("Dequeuing, ", q.dequeue())
print("Current queue: ")
q.display()

4
3
2
1
Dequeuing,  1
Current queue: 
4
3
2
Dequeuing,  2
Current queue: 
4
3


### Problem 4

Before beginning this problem, allow me to import the random library which will be used frequently in this question, together with the constant LARGE_PRIME which is very important for universal hashing. 

In [160]:
import random

LARGE_PRIME = 10888869450418352160768000001

In this question, we are required to implement an open hash table that satisfies the simple uniform hashing assumption (SUHA) and is robust to adversarial attacks. 

Before going through these parts in more depth, note that a hospital's patient record consists of a name, date of birth, IC number and illness. With that in mind, we define a Node and Linked List that take in these attributes and perform some basic tests on them. The more involved functions are defined in the universal hash class later on; this only serves as the rudimentary data structure underneath it. In this particular question, I feel that the deletion and search functions should be implemented on the hash class because it worked out better for me.

In [161]:
class Node:
    def __init__(self, name = None, dob = None, id_no = None, illness = None, next = None):
        self.name = name
        self.dob = dob
        self.id_no = id_no
        self.illness = illness
        self.next = next
    
    def get_name(self):
        return self.name
    
class LinkedList:
    def __init__(self):
        self.head = None
        
    def insert(self, name, dob, id_no, illness):
        new = Node(name, dob, id_no, illness)
        new.next = self.head
        self.head = new
    
    def display(self):
        curr = self.head
        index = 0
        while curr:
            print("Index:", index, ", Name:", curr.name, ", DOB:", curr.dob, ", ID:", curr.id_no, ", Illness:", curr.illness)
            index += 1
            curr = curr.next


In [162]:
ll = LinkedList()
ll.insert("Bob", "23 May 1923", 1231243, "Tuberculosis")
ll.insert("Mary", "2 Jun 2020", 9238412, "Cancer")
ll.display()

Index: 0 , Name: Mary , DOB: 2 Jun 2020 , ID: 9238412 , Illness: Cancer
Index: 1 , Name: Bob , DOB: 23 May 1923 , ID: 1231243 , Illness: Tuberculosis


Next, we need to define a universal hash function that satisfies SUHA. This is done by defining a hash function using a very large prime number p and making it adhere to the formula stated below:

$$ H = ((A \cdot \text{Key} + B) \bmod p) \bmod s $$
This is where:
* H is the hashed value
* p is the very large prime number
* s is the number of slots in the universal hash
* A is a random number from 1 to p - 1 inclusive (A must be nonzero, otherwise every key would hash to the same slot)
* B is a random number from 0 to p - 1 inclusive
* Key refers to the unique ID that will be hashed

The ID is chosen as the key because it is unique, and because our hashing function is deterministic once A and B are fixed. Given the same ID, we can find the slot it hashes to with certainty, because the hashing equation is fixed upon the initialization of the universal hash. This is why there is a utility function **find_key** defined. 

In the initialization of UniversalHash as a class, you can observe that a, b, s, p all correspond to the values listed above. When initializing it, please pass **LARGE_PRIME** as the prime parameter and any positive number you deem fit as the number of slots. The add function takes in **name (String)**, **date of birth (String)**, **identity number (Integer)** and **illness (String)**.

For the search and delete functions, defining them using the ID is easy because it is unique, and we can guarantee that the correct node will be deleted or found in O(keys/slots) expected time, which is very efficient. The count_SUHA function will be discussed later on. However, searching by name, illness or date of birth is not so trivial because these values are non-unique: more than 1 patient could have the same birthday, name or illness. Hence, it is not practical to return just one ID; we return all the IDs that correspond to the value. This is why I defined these search functions to return a list of matching ID values rather than a single one.

In [163]:
class UniversalHash:
    def __init__(self, prime, slots):
        # a must be nonzero; b may be zero
        self.a = random.randint(1, prime - 1)
        self.b = random.randint(0, prime - 1)
        self.s = slots
        self.p = prime
        self.info = {i: LinkedList() for i in range(self.s)}
        
    def find_key(self, id_no):
        """
        function that converts the id_no to the corresponding
        hash index in the universal hash.
        """
        return ((self.a * id_no + self.b) % self.p) % self.s
    
    def display_hash(self, id_no):
        """
        given an id number, it displays 
        the linked list associated with 
        its hash 
        """
        key = self.find_key(id_no)
        self.info[key].display()
    
    def add(self, name, dob, id_no, illness):
        index = self.find_key(id_no)
        head = self.info[index]
        head.insert(name, dob, id_no, illness)
        
    def search_id(self, key):
        """
        Given an id number, it searches for whether the ID is found in the associated hash.
        """
        index = self.find_key(key)
        curr = self.info[index].head
        while curr:
            if curr.id_no == key:
                print("Key found. returning Pointer")
                return curr
            curr =  curr.next
        print("Match not found. Returning None")
        return curr
    
    def delete_id(self, key):
        """
        Given an ID_no, it locates
        the hash it will be mapped 
        to and checks whether said 
        element exists. If yes, 
        then deletion is done. 
        """
        index = self.find_key(key)
        curr = self.info[index].head
        prev = None
        while curr and curr.id_no != key:
            prev = curr
            curr = curr.next
        
        if curr is None:
            return None
        
        else:
            remove = curr
            if prev is None:
                self.info[index].head = self.info[index].head.next
            else:
                prev.next = prev.next.next
            return remove
           
        
        
    
    def count_SUHA(self):
        """
        Utility function to see
        whether Hash table meets
        SUHA by counting the number
        of nodes in each slot and
        returning as a dictionary
        """
        out = {}
        for i in range(self.s):
            curr= self.info[i].head
            count = 0 
            while curr:
                count += 1
                curr = curr.next
            out[i] = count
        return out
    
    def search_name(self, key):
        """
        Given a key name, search the
        entire array for all possible
        IDs corresponding to this name
        """
        out = []
        for i in range(self.s):
            curr = self.info[i].head
            while curr:
                if curr.name == key:
                    out.append(curr.id_no)
                curr = curr.next
        return out
    
    
    def search_dob(self, key):
        """
        Given a key DOB, search the
        entire array for all possible
        IDs corresponding to this DOB
        """
        out = []
        for i in range(self.s):
            curr = self.info[i].head
            while curr:
                if curr.dob == key:
                    out.append(curr.id_no)
                curr = curr.next
        return out
    
    def search_illness(self, key):
        """
        Given a key illness, search the
        entire array for all possible IDs
        corresponding to this illness
        """
        out = []
        for i in range(self.s):
            curr = self.info[i].head
            while curr:
                if curr.illness == key:
                    out.append(curr.id_no)
                curr = curr.next
        return out
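
The hashing step performed by find_key can be traced by hand with small numbers. The sketch below uses made-up illustrative parameters (p = 101, s = 10, a = 3, b = 7) instead of LARGE_PRIME and random values; `universal_hash` is a hypothetical standalone version of the same formula.

```python
def universal_hash(key, a, b, p, s):
    """Return ((a * key + b) mod p) mod s."""
    return ((a * key + b) % p) % s

# Small illustrative parameters (assumed values, not LARGE_PRIME)
p, s = 101, 10
a, b = 3, 7
print(universal_hash(42, a, b, p, s))  # (133 % 101) % 10 = 32 % 10 = 2
print(universal_hash(43, a, b, p, s))  # (136 % 101) % 10 = 35 % 10 = 5
```

With a and b fixed, the same key always lands in the same slot, which is what lets search_id and delete_id look in a single linked list.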

In order to test for SUHA, we perform a rather extreme test on the hash function: we generate 1000000 keys, each a random ID from 0 to 10^8. For each draw, we check whether this ID has been used before (if it has, we draw again, since IDs are unique). If not, we add this ID to our hash table. After that, we count the number of nodes in each slot using **count_SUHA()** and check whether each count is roughly equal to **keys/slots** (which is printed near the end). As the output below shows, the counts in all slots are very close to this ratio, which is consistent with SUHA. Resistance to adversarial attacks on efficiency comes from a and b being drawn at random when the table is initialized: an adversary who does not know them cannot pick a set of IDs in advance that all land in the same slot. To run the **test_SUHA** function, just run the line below and modify the number of slots as you deem fit. 

In [164]:
def test_SUHA(slots):
    p = UniversalHash(LARGE_PRIME, slots)
    limit = 10 ** 8
    keys = 10 ** 6
    trials = keys
    used_id = set()
    while trials > 0:
        random_number = random.randint(0, limit)
        if random_number in used_id:
            continue
        else:
            p.add("dummy_name", "dummy_dob", random_number, "dummy_illness")
            used_id.add(random_number)
            trials -= 1

    test = p.count_SUHA()
    print(test)
    print("Exact ratio of keys to slots is: ", keys/slots)
    return

test_SUHA(50)

{0: 19978, 1: 20159, 2: 20044, 3: 20011, 4: 19983, 5: 20190, 6: 20037, 7: 20145, 8: 19819, 9: 19893, 10: 19711, 11: 20113, 12: 19891, 13: 19967, 14: 19985, 15: 20072, 16: 20006, 17: 20250, 18: 20202, 19: 19829, 20: 20149, 21: 20084, 22: 19984, 23: 19983, 24: 19958, 25: 19935, 26: 19913, 27: 19941, 28: 19588, 29: 19965, 30: 20185, 31: 19827, 32: 19876, 33: 19987, 34: 19977, 35: 20009, 36: 19930, 37: 20149, 38: 20107, 39: 20176, 40: 19906, 41: 19898, 42: 20109, 43: 19933, 44: 19902, 45: 20110, 46: 20108, 47: 20080, 48: 19943, 49: 20004}
Exact ratio of keys to slots is:  20000.0
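
As a rough way to put a number on "roughly equal", one could compute the largest relative deviation of any slot from the mean load. The sketch below is an assumption about how such a check might look; `max_relative_deviation` is a hypothetical helper that takes a dictionary shaped like the one returned by count_SUHA.

```python
def max_relative_deviation(counts):
    """Largest |count - expected| / expected over all slots, where
    expected is the mean number of keys per slot."""
    expected = sum(counts.values()) / len(counts)
    return max(abs(c - expected) / expected for c in counts.values())

sample = {0: 90, 1: 110, 2: 100}       # toy counts, mean load is 100
print(max_relative_deviation(sample))  # 0.1
```

A small value means that no slot is far from the average load; what counts as small enough is a judgment call that depends on the number of keys per slot.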


The next cell contains tests of the auxiliary functions of my Universal Hash class. Having passed these preliminary tests, the functions are usable in practice.

To use the search functions, simply call the related search function and search the desired value. Please use int for id_no and str for the rest of the data types and ensure that your spelling is correct for the strings. 

In [165]:
p = UniversalHash(LARGE_PRIME, 20)
p.add("Bob", "25 May 1998", 2125123, "Fever")
p.add("Mary", "28 Feb 1992", 1235124, "Tuberculosis")

print("\nTesting display_hash...")
p.display_hash(2125123)

print("\nTesting Delete...")
p.delete_id(2125123)
p.display_hash(2125123)

print("\nTesting search_dob...")
p.add("Carry", "25 May 1998", 7842355, "AIDS")
p.add("Rachel", "25 May 1998", 9826312, "Parkinsons'")
print(p.search_dob("25 May 1998"))

print("\nTesting search_illness...")
p.add("Megan", "31 Dec 1978", 2246713, "Hay Fever")
p.add("Sandra", "13 Mar 1987", 3451623, "Hay Fever")
p.add("Achsah", "4 Feb 1990", 2982641, "Hay Fever")
print(p.search_illness("Hay Fever"))

print("\nTesting search_name...")
p.add("John", "31 Jul 1991", 5628395, "Constipation")
p.add("John", "31 Dec 1983", 8264912, "Depression")
p.add("John", "23 May 1964", 1092752, "Workplace Injury")
print(p.search_name("John"))


Testing display_hash...
Index: 0 , Name: Bob , DOB: 25 May 1998 , ID: 2125123 , Illness: Fever

Testing Delete...

Testing search_dob...
[7842355, 9826312]

Testing search_illness...
[3451623, 2246713, 2982641]

Testing search_name...
[5628395, 1092752, 8264912]
