# A0149963M 
## YSC 2229 - Assignment 3 Solutions


### Problem 1

In this problem, we are required to implement the randomized solution to the vertex cover problem. The input to this function is a graph consisting of nodes and edges, represented by an open hash table. It therefore makes sense to implement a Node and a Linked List before starting on this problem. Below is my implementation of these 2 basic data structures:

In [281]:
class Node:
    def __init__(self, data = None, next = None):
        self.data = data
        self.next = next
    
    def get_name(self):
        return self.data
    
class LinkedList:
    def __init__(self):
        self.head = None
        
    def insert(self, data):
        new = Node(data)
        new.next = self.head
        self.head = new
    
    def display(self):
        head = self.head
        while head:
            print(head.data)
            head = head.next
    
    def search(self, x):
        curr = self.head
        while curr != None: 
            if curr.data == x:
                print("Match Found, Returning Pointer")
                return curr
            curr = curr.next
        print("Match not Found. Returning None")
        return curr
        
    
    def delete(self, x):
        temp = self.head 
        if (temp is not None):
            if (temp.data == x):
                self.head = temp.next
                temp = None
                return
            
        while(temp is not None):
            if temp.data == x:
                break
            prev = temp
            temp = temp.next
 
        if(temp == None):
            return
 
        prev.next = temp.next
        temp = None
        return


Next, we need to implement the Graph. The Graph is supposed to be an open hash table. However, we need to consider the case where the number of vertices is very high (100000) yet only a few edges are present. It would be a waste of memory to give every vertex its own linked list. Hence, the graph uses a fixed number of slots as keys, and each slot maps to a linked list. The slot of an edge is its first vertex modulo the number of slots. Each node in a list stores a pair **(vertex, edge)**. For example, (3, (3, 5)) is legitimate, but (4, (8, 10)) is not, because 4 is not a vertex of that edge.

In the graph, you can then insert edges as usual: **(a, b)**. The insert function automatically wraps the edge into a suitable entry for the linked list. With that in mind, please ensure that a < b as far as possible, since that is how I defined my function.

In addition, I have defined a search function that takes in an edge and searches for the corresponding entry in the Graph. Since (4, (4, 7)) is the same as (7, (4, 7)), I've included provisions for both possibilities. However, if you expect an edge to be present but it is not found, please input the reversed edge, e.g. (7, 4), and check whether there is such an edge.

Next, I have a display function that helps to visualize the linked list in each slot. Run it without any arguments if you want to see what the graph looks like.

Last but not least, I have the **generate** function, which generates a random graph given n, the number of vertices, and m, the maximum number of edges. This function generates unique edges (a, b) where a < b, so you do not have to worry about repeated edges. Just input your desired number of vertices and (max) edges and run it.


The aforementioned functions are shown below:

In [300]:
import random    
    
class Graph:
    def __init__(self, slots):
        self.hash = {i: LinkedList() for i in range(slots)}
        self.slots = slots
    
    def insert(self, edge):
        slot = edge[0] % self.slots        
        self.hash[slot].insert((edge[0], edge))
    
    def search(self, edge):
        # find first possibility 
        slot = edge[0] % self.slots 
        curr = self.hash[slot].head
        while curr:
            if (edge[0], edge) == curr.data:
                print("Match found, returning pointer for", edge)
                return curr
            curr = curr.next
        
        # find second possibility 
        slot = edge[1] % self.slots 
        curr = self.hash[slot].head
        while curr:
            if (edge[1], edge) == curr.data:
                print("Alternative Match found, returning pointer for", edge)
                return curr
            curr = curr.next
        
        print("Match not found, returning None for ", edge)
        return
        
    def generate(self, n_vertice, n_edges_max):
        x = set()
        for i in range(n_edges_max):
            low = random.randint(0, n_vertice - 1)
            high = random.randint(low + 1, n_vertice)
            edge = (low, high)
            if edge in x:
                continue
            else:
                x.add(edge)
                self.insert(edge)
                
    def display(self):
        for i in range(self.slots):
            curr = self.hash[i].head
            print("\nPrinting Vertices and Edges for Slot Number", i)
            while curr:
                print(curr.data, " -> ", end = " ")
                curr = curr.next
            print("end", end = "")
                

The following 2 code blocks demonstrate what these functions can do.

In [301]:
g = Graph(10)
g.insert((3, 4))
g.search((3, 4))
g.search((6, 8))

Match found, returning pointer for (3, 4)
Match not found, returning None for  (6, 8)


In [307]:
g = Graph(5)
g.generate(20, 30)
g.display()


Printing Vertices and Edges for Slot Number  0
(10, (10, 16))  ->  (0, (0, 9))  ->  (15, (15, 19))  ->  (5, (5, 20))  ->  (15, (15, 16))  ->  (15, (15, 17))  ->  end
Printing Vertices and Edges for Slot Number  1
(1, (1, 14))  ->  (1, (1, 19))  ->  (6, (6, 20))  ->  (6, (6, 11))  ->  (1, (1, 16))  ->  (16, (16, 17))  ->  end
Printing Vertices and Edges for Slot Number  2
(12, (12, 18))  ->  (17, (17, 20))  ->  (2, (2, 10))  ->  (2, (2, 20))  ->  (17, (17, 19))  ->  (12, (12, 17))  ->  end
Printing Vertices and Edges for Slot Number  3
(18, (18, 19))  ->  (8, (8, 13))  ->  (8, (8, 18))  ->  (18, (18, 20))  ->  end
Printing Vertices and Edges for Slot Number  4
(4, (4, 8))  ->  (9, (9, 17))  ->  (19, (19, 20))  ->  (14, (14, 16))  ->  end
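
The Graph above creates a linked list for every slot as soon as it is constructed. If memory for unused slots is a concern, one option is to create a chain only when the first edge lands in that slot. Below is a minimal sketch of that idea; `LazyGraph`, `table` and `contains` are hypothetical names that are not part of my submission, and plain Python lists stand in for the linked lists.

```python
class LazyGraph:
    """Hypothetical variant of Graph: a chain is created only when first used."""

    def __init__(self, slots):
        self.slots = slots
        self.table = {}  # slot index -> list of (vertex, edge) entries

    def insert(self, edge):
        slot = edge[0] % self.slots
        # setdefault allocates the chain only on the first insert into this slot
        self.table.setdefault(slot, []).append((edge[0], edge))

    def contains(self, edge):
        slot = edge[0] % self.slots
        return (edge[0], edge) in self.table.get(slot, [])


lazy = LazyGraph(100000)
lazy.insert((3, 4))
lazy.insert((7, 9))
print(len(lazy.table))          # number of chains actually allocated
print(lazy.contains((3, 4)))
```

With 100000 slots and two edges, only two chains exist in memory, while lookups still go through the same slot computation.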

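In this section, I will cover the implementation of the Approximate Vertex Cover. Firstly, I collect every edge in the graph into a set. Then, while the set is not empty, I pick a random edge, add it to the output, and remove every edge that shares an endpoint with it. Note that the function returns the chosen edges; the vertex cover itself consists of both endpoints of each returned edge.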

In [327]:
import random

def approx_vertex_cover(graph):
    out = []
    edgeset = set()
    
    # set of all edges
    for i in range(graph.slots):
        curr = graph.hash[i].head
        while curr:
            # append edges, not vertices
            edgeset.add(curr.data[1])
            curr = curr.next
    
    while (edgeset != set()):
        (x_rand, y_rand) = random.choice(tuple(edgeset))
        out.append((x_rand, y_rand))
        remove = []
        for (a, b) in edgeset:
            if a == x_rand or b == x_rand or a == y_rand or b == y_rand:
                remove.append((a, b))
        
        for edge in remove:
            edgeset.remove(edge)
    
    print("Length of approx vertex cover is:", len(out))
    return out
    
approx_vertex_cover(g)

Length of approx vertex cover is: 8


[(6, 20), (1, 14), (15, 16), (2, 10), (12, 17), (4, 8), (0, 9), (18, 19)]
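
Since approx_vertex_cover returns the chosen edges, the actual set of cover vertices still has to be read off from them. The sketch below shows one way to do that and to check that the result really covers every edge. The helper names `cover_from_edges` and `is_vertex_cover` are my own for illustration, and the small edge list is a hand-made example, not the random graph from the run above.

```python
def cover_from_edges(chosen_edges):
    """Collect both endpoints of every chosen edge into a vertex set."""
    cover = set()
    for a, b in chosen_edges:
        cover.add(a)
        cover.add(b)
    return cover


def is_vertex_cover(cover, edges):
    """Return True if every edge has at least one endpoint in cover."""
    return all(a in cover or b in cover for a, b in edges)


# Path graph 0 - 1 - 2 - 3 - 4, with two non-adjacent edges chosen
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
chosen = [(0, 1), (2, 3)]

cover = cover_from_edges(chosen)
print(sorted(cover))
print(is_vertex_cover(cover, edges))
```

Each chosen edge contributes 2 vertices, so a result of 8 edges corresponds to a cover of 16 vertices.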

### Problem 2

In this problem, we are required to perform radix sort on letters instead of numbers. For the most part, the implementation is very similar to what we did with numbers. However, there are a few modifications to the code.

In terms of Counting Sort for letters, there are a few differences. Firstly, instead of 10 possible digits (0-9), we have 26 possible characters **(lowercase 'a' to lowercase 'z')**. Hence, our count array has 26 entries instead of 10. Furthermore, since counting sort works on numbers, we need to convert our lowercase letters to meaningful numbers somehow. This is done through **ord()**, which accepts a string of length 1 and returns its Unicode code point. Hence,
* ord('a') returns 97
* ord('b') returns 98
* ...
* ord('y') returns 121
* ord('z') returns 122
From this, we know that the Unicode code points for lowercase letters are consecutive. However, they start at 97 and end at 122. Since the counting array should be as small as possible to ensure that counting sort runs in O(n) time, the count array should have size 26, not 123. This is done by shifting these code points by -97; the shift is defined in line 4 of the function count_sort_letters. Other than that, the main function does not change.
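
As a quick illustration of the shift described above, the snippet below prints the code point and the shifted index for a few letters. It is only a sketch to show the mapping; the variable `indices` is introduced here for the check and is not used in my solution.

```python
shift = ord('a')  # 97

# Code point and shifted index for the first and last two letters
for ch in "abyz":
    print(ch, ord(ch), ord(ch) - shift)

# Every lowercase letter lands in the range 0..25
indices = [ord(c) - shift for c in "abcdefghijklmnopqrstuvwxyz"]
print(indices[0], indices[-1])
```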

In order to use this function, please input a list of strings to be sorted. Please note that this algorithm only sorts lowercase letters. Correct examples include
* ["bob", "danvy", "unittests", "mathematics", "jewel", "gem"]
* ["nancy", "dave", "zhao", "give", "job", "please"]

These are incorrect examples:
* "bob", "danvy", "unittests", "mathematics", "jewel", "gem"
* "nancy", "dave", "zhao", "give", "job", "please"
* ["2134", "AQUINAS", "Liberal", "Wine"]

The first 2 examples are incorrect because the strings are not enclosed in a list. The last one is incorrect because it contains forbidden characters (numbers, capital letters, etc).

In [47]:
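def count_sort_letters(array, size, col, max_len):
    out = [0] * size
    count = [0] * 26
    shift = ord('a')  # 97, so 'a' maps to index 0 and 'z' to index 25

    # Count how many strings have each letter in column col.
    # Strings too short to have this column are treated as if they had 'a'.
    for item in array:
        if col < len(item):
            letter = ord(item[col]) - shift
        else:
            letter = 0
        count[letter] += 1

    # Prefix sums: count[i] becomes the number of strings with letter <= i
    for i in range(len(count) - 1):
        count[i + 1] += count[i]

    # Place items so that larger letters come first (descending order);
    # items with the same letter keep their relative order.
    for item in array:
        if col < len(item):
            letter = ord(item[col]) - shift
        else:
            letter = 0
        out[len(out) - count[letter]] = item
        count[letter] -= 1
    return out

def radix_sort_letters(array):
    max_col = len(max(array, key = len))
    # Sort on the last column first, then move towards the first column
    for col in reversed(range(max_col)):
        array = count_sort_letters(array, len(array), col, max_col)
    return array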

print(radix_sort_letters(["dog", "cat", "rain", "umbrella", "bob", "digit", "zeta"]))

['zeta', 'umbrella', 'rain', 'dog', 'digit', 'cat', 'bob']
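
The run above returns the words in descending order, because the placement step puts the largest letters at the front of the output array. If ascending order is wanted, one possible variant is sketched below. It is not the code I submitted: the names `count_sort_by_column` and `radix_sort_ascending` are hypothetical, and it assumes an extra bucket 0 for "no character in this column", so that a shorter string sorts before a longer one with the same prefix.

```python
def count_sort_by_column(words, col):
    """Stable counting sort of words on the character at index col (ascending)."""
    count = [0] * 27             # bucket 0 = no character, 1..26 = 'a'..'z'
    shift = ord('a') - 1

    def key(word):
        return ord(word[col]) - shift if col < len(word) else 0

    for word in words:
        count[key(word)] += 1
    for i in range(1, len(count)):
        count[i] += count[i - 1]     # count[k] = number of words with key <= k

    out = [None] * len(words)
    for word in reversed(words):     # walk backwards to keep the sort stable
        k = key(word)
        count[k] -= 1
        out[count[k]] = word
    return out


def radix_sort_ascending(words):
    if not words:
        return []
    # Least significant (last) column first
    for col in reversed(range(max(len(w) for w in words))):
        words = count_sort_by_column(words, col)
    return words


result = radix_sort_ascending(["dog", "cat", "rain", "umbrella", "bob", "digit", "zeta"])
print(result)
```

The only structural changes are the 27th bucket and the backwards pass when filling `out`; the column loop is the same as in radix_sort_letters.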


### Problem 3

This question prompts us to implement a queue using 2 stacks. While this is possible, we have to pay a cost in terms of runtime. Before beginning, let's recap the nature of stacks and queues:
* Stack - LIFO. The last element pushed is the first element popped
* Queue - FIFO. The first element enqueued is the first element to be dequeued

Given this difference between LIFO and FIFO, to implement a queue using stacks, the second stack becomes an auxiliary stack that helps fulfill the requirements of a queue.

In this implementation of the queue, I felt that it didn't really make sense to include a head and tail since the stacks handle all data manipulation, so I removed them. To begin with the implementation, I define a fixed-size Stack and then build the Queue on top of two such stacks.

In [50]:
class Stack:
    def __init__(self, size):
        self.items = [None] * size
        self.top = 0
        self.size = size
    
    def isEmpty(self):
        return self.top == 0

    def push(self, data):
        if self.top >= self.size:
            raise ValueError("Stack Overflow")
        self.items[self.top] = data
        self.top += 1
    
    def pop(self):
        if self.top <= 0:
            raise ValueError("Stack Underflow")
        self.top -= 1
        return self.items[self.top]
    
    def display(self):
        """
        Prints stack from bottom to top
        """
        temp = self.top
        for i in range(0 ,temp):
            print(self.items[i])

class Queue:
    """
    This one makes dequeuing costly
    """
    
    def __init__(self, size):
        self.s1 = Stack(size)
        self.s2 = Stack(size)
        self.cache = 0 
        self.size = size
    
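    def enqueue(self, data):
        # Check capacity before touching the counter, so that a failed
        # call leaves the queue unchanged
        if self.cache >= self.size:
            raise ValueError("Queue Overflow")
        self.s1.push(data)
        self.cache += 1
    
    def dequeue(self):
        if self.cache <= 0:
            raise ValueError("Queue Underflow")
        self.cache -= 1
        
        # Reverse s1 into s2 so that the oldest element ends up on top
        while not self.s1.isEmpty():
            self.s2.push(self.s1.pop())
        
        item = self.s2.pop()
        
        # Move the remaining elements back to restore insertion order
        while not self.s2.isEmpty():
            self.s1.push(self.s2.pop())
            
        return item  
    
    def display(self):
        # Prints the queue from front (oldest) to back (newest)
        for i in range(self.cache):
            print(self.s1.items[i])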

The implementation above makes dequeue the costly operation: enqueuing is O(1) and dequeuing is O(n). Instead, we could choose to make enqueue the costly operation. In that case, the time complexities are:
* cost of dequeuing = O(1)
* cost of enqueuing = O(n)
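
To make the alternative concrete, here is a minimal sketch of a queue where enqueue does the reshuffling. It is not part of my submission: `EnqueueCostlyQueue` is a hypothetical name, and for brevity it uses unbounded Python lists as the two stacks (append = push, pop = pop) rather than the fixed-size Stack class.

```python
class EnqueueCostlyQueue:
    """Hypothetical two-stack queue: enqueue is O(n), dequeue is O(1)."""

    def __init__(self):
        self.s1 = []  # main stack: the oldest element is always on top
        self.s2 = []  # auxiliary stack, only used during enqueue

    def enqueue(self, data):
        # Move everything aside, put the new element at the bottom,
        # then restore the older elements on top of it
        while self.s1:
            self.s2.append(self.s1.pop())
        self.s1.append(data)
        while self.s2:
            self.s1.append(self.s2.pop())

    def dequeue(self):
        if not self.s1:
            raise ValueError("Queue Underflow")
        return self.s1.pop()


q = EnqueueCostlyQueue()
for x in (1, 2, 3):
    q.enqueue(x)
print(q.dequeue(), q.dequeue(), q.dequeue())
```

Which version is preferable depends on the workload: this one suits cases where dequeues are far more frequent than enqueues.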

### Problem 4

In [16]:
import matplotlib
import random

LARGE_PRIME = 10888869450418352160768000001

What are the types of values we have here?

In [279]:
class Node:
    def __init__(self, name = None, dob = None, id_no = None, illness = None, next = None):
        self.name = name
        self.dob = dob
        self.id_no = id_no
        self.illness = illness
        self.next = next
    
    def get_name(self):
        return self.name
    
class LinkedList:
    def __init__(self):
        self.head = None
        
    def insert(self, name, dob, id_no, illness):
        new = Node(name, dob, id_no, illness)
        new.next = self.head
        self.head = new
    
    def display(self):
        curr = self.head
        index = 0
        while curr:
            print("Index:", index, ", Name:", curr.name, ", DOB:", curr.dob, ", ID:", curr.id_no, ", Illness:", curr.illness)
            index += 1
            curr = curr.next

            
class PerfectHash:
    def __init__(self, prime, slots):
        # a must be non-zero, otherwise every key hashes to the same slot
        self.a = random.randint(1, prime - 1)
        self.b = random.randint(0, prime - 1)
        self.s = slots
        self.p = prime
        self.info = {i: LinkedList() for i in range(self.s)}
        
    def find_key(self, id_no):
        return ((self.a * id_no + self.b) % self.p) % self.s
    
    def add(self, name, dob, id_no, illness):
        index = self.find_key(id_no)
        head = self.info[index]
        head.insert(name, dob, id_no, illness)
        
    def search_id(self, key):
        index = self.find_key(key)
        curr = self.info[index].head
        while curr:
            if curr.id_no == key:
                print("Key found. returning Pointer")
                return curr
            curr =  curr.next
        print("Match not found. Returning None")
        return curr
    
    def delete_id(self, key):
        index = self.find_key(key)
        bucket = self.info[index]
        curr = bucket.head
        prev = None
        
        # Walk the chain, remembering the previous node so we can unlink
        while curr:
            if curr.id_no == key:
                if prev is None:
                    # The match is the head of the chain
                    bucket.head = curr.next
                else:
                    prev.next = curr.next
                return
            prev = curr
            curr = curr.next
    
    def count_SUHA(self):
        out = {}
        for i in range(self.s):
            curr= self.info[i].head
            count = 0 
            while curr:
                count += 1
                curr = curr.next
            out[i] = count
        return out
            
    

In [280]:
p = PerfectHash(LARGE_PRIME, 50)
for i in range(10000, 90000):
    p.add("dummy_name", "dummy_dob", i, "dummy_illness")
    
test = p.count_SUHA()
print(test)

{0: 1599, 1: 1601, 2: 1600, 3: 1598, 4: 1602, 5: 1599, 6: 1601, 7: 1600, 8: 1598, 9: 1603, 10: 1598, 11: 1601, 12: 1599, 13: 1600, 14: 1602, 15: 1598, 16: 1602, 17: 1599, 18: 1600, 19: 1601, 20: 1599, 21: 1601, 22: 1599, 23: 1600, 24: 1600, 25: 1599, 26: 1601, 27: 1599, 28: 1599, 29: 1602, 30: 1599, 31: 1601, 32: 1599, 33: 1599, 34: 1602, 35: 1599, 36: 1601, 37: 1599, 38: 1599, 39: 1602, 40: 1599, 41: 1602, 42: 1599, 43: 1599, 44: 1601, 45: 1600, 46: 1601, 47: 1600, 48: 1598, 49: 1601}
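
Under the simple uniform hashing assumption, 80000 keys over 50 slots should give about 80000 / 50 = 1600 keys per slot, which is what the counts above hover around. The sketch below repeats that check without the linked lists, using the same hash formula. The function `bucket_loads` and its `seed` parameter are assumptions of mine for illustration and do not appear in the class above.

```python
import random


def bucket_loads(keys, slots, prime, seed=None):
    """Hash keys with ((a*k + b) mod prime) mod slots and count keys per slot."""
    rng = random.Random(seed)
    a = rng.randint(1, prime - 1)
    b = rng.randint(0, prime - 1)
    loads = [0] * slots
    for k in keys:
        loads[((a * k + b) % prime) % slots] += 1
    return loads


LARGE_PRIME = 10888869450418352160768000001
keys = range(10000, 90000)
loads = bucket_loads(keys, 50, LARGE_PRIME, seed=0)

expected = len(keys) / 50   # average number of keys per slot
print("expected load:", expected)
print("min load:", min(loads), "max load:", max(loads))
```

Comparing the minimum and maximum load against the expected value gives a quick sense of how evenly a particular choice of a and b spreads the keys.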
