# Set Membership

The cell below defines two **abstract classes**. The first represents a set together with its basic insert/search operations. You will need to implement this API four times, using (1) sequential search, (2) a binary search tree, (3) a balanced search tree, and (4) a Bloom filter. The second defines the synthetic data generator you will need to implement as part of your experimental framework. <br><br>**Do NOT modify the next cell**; use the dedicated cells further below for your implementation instead. <br>

In [4]:
# DO NOT MODIFY THIS CELL

from abc import ABC, abstractmethod  

# abstract class to represent a set and its insert/search operations
class AbstractSet(ABC):
    
    # constructor
    @abstractmethod
    def __init__(self):
        pass           
        
    # inserts "element" in the set
    # returns "True" after successful insertion, "False" if the element is already in the set
    # element : str
    # inserted : bool
    @abstractmethod
    def insertElement(self, element):     
        inserted = False
        return inserted   
    
    # checks whether "element" is in the set
    # returns "True" if it is, "False" otherwise
    # element : str
    # found : bool
    @abstractmethod
    def searchElement(self, element):
        found = False
        return found    
    
    
    
# abstract class to represent a synthetic data generator
class AbstractTestDataGenerator(ABC):
    
    # constructor
    @abstractmethod
    def __init__(self):
        pass           
        
    # creates and returns a list of length "size" of strings
    # size : int
    # data : list<str>
    @abstractmethod
    def generateData(self, size):     
        data = [""]*size
        return data   


Use the cell below to define any auxiliary data structure and python function you may need. Leave the implementation of the main API to the next code cells instead.

In [5]:
# ADD AUXILIARY DATA STRUCTURE DEFINITIONS AND HELPER CODE HERE



Use the cell below to implement the requested API by means of **sequential search**.

In [6]:
class SequentialSearchSet(AbstractSet):
    
    def __init__(self):
        # ADD YOUR CODE HERE
        self.val = []
     
    def insertElement(self, element):
        inserted = False
        # ADD YOUR CODE HERE
        if element not in self.val:
            self.val.append(element)
            inserted = True
        return inserted

    def searchElement(self, element):     
        found = False
        # ADD YOUR CODE HERE
        if element in self.val:
            found = True
        return found    
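Sequential search scans the list from the front, so its cost depends on where (and whether) the element appears. The sketch below uses a hypothetical helper `sequential_search_cost`, which is not part of the required API. It performs the same scan with an explicit loop and counts comparisons: a hit at position `i` costs `i + 1` comparisons, while a miss always costs `n`.

```python
def sequential_search_cost(items, target):
    """Scan `items` left to right; return (found, number_of_comparisons)."""
    comparisons = 0
    for item in items:
        comparisons += 1
        if item == target:
            return True, comparisons
    return False, comparisons


words = ["a", "b", "c", "d"]
print(sequential_search_cost(words, "c"))  # (True, 3)
print(sequential_search_cost(words, "z"))  # (False, 4)
```

Because `insertElement` above first runs a full miss scan before appending, building a set of `n` distinct elements costs about `n(n-1)/2` comparisons in total. This quadratic growth should show up clearly in the timing experiments.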

In [7]:
# Delete this cell before submission.
def sequentialCorrectnessTest():
    test = SequentialSearchSet()
    assert test.insertElement("a") == True
    assert test.insertElement("b") == True
    assert test.insertElement("c") == True
    assert test.insertElement("a") == False
    assert test.insertElement("b") == False
    assert test.insertElement("c") == False
    assert test.searchElement("a") == True
    assert test.searchElement("b") == True
    assert test.searchElement("c") == True
    assert test.searchElement("d") == False
    assert test.searchElement("e") == False
    assert test.searchElement("f") == False
    print("✅ Sequential search correctness test passed")

sequentialCorrectnessTest()

✅ Sequential search correctness test passed


Use the cell below to implement the requested API by means of **binary search tree**.

In [8]:
class BinarySearchTreeSet(AbstractSet):
    
    def __init__(self, val=None, left=None, right=None):
        # ADD YOUR CODE HERE
        # val=None marks an empty tree, so BinarySearchTreeSet() matches the AbstractSet API
        self.val = val
        self.left = left
        self.right = right

    def insertElement(self, element):
        inserted = False
        # ADD YOUR CODE HERE
        if self.val is None:  # the tree is empty: store the element in the root
            self.val = element
            inserted = True
            return inserted
        
        # traverse the tree
        thisNode = self
        while thisNode:
            if element < thisNode.val:
                if thisNode.left:
                    thisNode = thisNode.left
                else:
                    thisNode.left = BinarySearchTreeSet(element)
                    inserted = True
                    break
            elif element > thisNode.val:
                if thisNode.right:
                    thisNode = thisNode.right
                else:
                    thisNode.right = BinarySearchTreeSet(element)
                    inserted = True
                    break
            else:
                break
        return inserted
    
    def searchElement(self, element):
        found = False
        # ADD YOUR CODE HERE
        if self.val is None:  # empty tree: comparing element < None would raise TypeError
            return found
        thisNode = self
        while thisNode:
            if element < thisNode.val:
                thisNode = thisNode.left
            elif element > thisNode.val:
                thisNode = thisNode.right
            else:
                found = True
                break

        return found

In [9]:
# Delete this cell before submission.
def binaryTreeCorrectnessTest():
    test = BinarySearchTreeSet(None)
    assert test.insertElement("a") == True
    assert test.insertElement("b") == True
    assert test.insertElement("c") == True
    assert test.insertElement("a") == False
    assert test.insertElement("b") == False
    assert test.insertElement("c") == False
    assert test.searchElement("a") == True
    assert test.searchElement("b") == True
    assert test.searchElement("c") == True
    assert test.searchElement("d") == False
    assert test.searchElement("e") == False
    assert test.searchElement("f") == False
    print("✅ Binary tree correctness test passed")

binaryTreeCorrectnessTest()

✅ Binary tree correctness test passed
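An unbalanced binary search tree is only fast when its input arrives in a roughly random order. The self-contained sketch below uses a hypothetical minimal `Node` class and `bst_insert` function that mirror the insertion logic of `BinarySearchTreeSet`. It compares the tree height after inserting 1000 keys in sorted order with the height after inserting the same keys shuffled. If the real test files happen to be sorted (for example, an alphabetical word list), the unbalanced tree degrades to a linked list and every operation becomes O(n).

```python
import random
from collections import deque


class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def bst_insert(root, key):
    """Iterative unbalanced BST insert; returns the (possibly new) root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        elif key > node.key:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right
        else:
            return root  # duplicate: nothing to do


def height(root):
    """Nodes on the longest root-to-leaf path, computed level by level (no recursion)."""
    depth, level = 0, deque([root] if root else [])
    while level:
        depth += 1
        for _ in range(len(level)):
            node = level.popleft()
            level.extend(child for child in (node.left, node.right) if child)
    return depth


keys = [f"w{i:04d}" for i in range(1000)]  # zero-padded, so already sorted
sorted_root = None
for k in keys:
    sorted_root = bst_insert(sorted_root, k)

shuffled = keys[:]
random.Random(0).shuffle(shuffled)
shuffled_root = None
for k in shuffled:
    shuffled_root = bst_insert(shuffled_root, k)

print(height(sorted_root))    # 1000: each key hangs off the right of the previous one
print(height(shuffled_root))  # far smaller, on the order of a few dozen
```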


Use the cell below to implement the requested API by means of **balanced search tree**.

In [16]:
class ColourNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.is_red = True  # a new node is attached to its parent by a red link


# left-leaning red-black binary search tree (Sedgewick's 2-3 variant)
class BalancedSearchTreeSet(AbstractSet):
    def __init__(self):
        self.root = None

    def insertElement(self, key):
        self.root, inserted = self._insert(self.root, key)
        self.root.is_red = False  # the root is always black
        return inserted

    def _insert(self, node, key):
        # standard BST insertion; returns (new subtree root, inserted flag)
        if node is None:
            return ColourNode(key), True

        if key < node.key:
            node.left, inserted = self._insert(node.left, key)
        elif key > node.key:
            node.right, inserted = self._insert(node.right, key)
        else:
            return node, False  # duplicate: the tree is unchanged

        # on the way back up, restore the LLRB invariants
        # (recursion depth is bounded by the tree height, at most about 2 * log2(n))
        if self._is_red(node.right) and not self._is_red(node.left):
            node = self._left_rotation(node)
        if self._is_red(node.left) and self._is_red(node.left.left):
            node = self._right_rotation(node)
        if self._is_red(node.left) and self._is_red(node.right):
            self._colour_flip(node)
        return node, inserted

    def searchElement(self, key):
        node = self.root
        while node is not None:
            if key < node.key:
                node = node.left
            elif key > node.key:
                node = node.right
            else:
                return True
        return False

    @staticmethod
    def _is_red(node):
        # empty (None) links are black
        return node is not None and node.is_red

    def _left_rotation(self, node):
        # turn a right-leaning red link into a left-leaning one
        child = node.right
        node.right = child.left
        child.left = node
        child.is_red = node.is_red
        node.is_red = True
        return child

    def _right_rotation(self, node):
        # lean a red link to the right so that a temporary 4-node can be split
        child = node.left
        node.left = child.right
        child.right = node
        child.is_red = node.is_red
        node.is_red = True
        return child

    def _colour_flip(self, node):
        # split a temporary 4-node: both children become black, the parent red
        node.left.is_red = False
        node.right.is_red = False
        node.is_red = True

In [17]:
def balancedTreeCorrectnessTest():
    test = BalancedSearchTreeSet()
    assert test.insertElement("a") == True
    assert test.insertElement("b") == True
    assert test.insertElement("c") == True
    assert test.insertElement("a") == False
    assert test.insertElement("b") == False
    assert test.insertElement("c") == False
    assert test.searchElement("a") == True
    assert test.searchElement("b") == True
    assert test.searchElement("c") == True
    assert test.searchElement("d") == False
    assert test.searchElement("e") == False
    assert test.searchElement("f") == False
    print("✅ Balanced tree correctness test passed")

balancedTreeCorrectnessTest()

✅ Balanced tree correctness test passed
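The correctness test above only checks membership, so it would pass even if the tree never rebalanced. The sketch below adds a structural check. `llrb_black_height` and `is_valid_llrb` are hypothetical helpers that assume `ColourNode`-like nodes (`.key`, `.left`, `.right`, `.is_red`). They verify BST ordering, that no red link leans right, that no two red links occur in a row, that every root-to-leaf path has the same number of black links, and that the root is black.

```python
from types import SimpleNamespace


def _require(condition, message):
    # explicit raise instead of assert, so the checks survive `python -O`
    if not condition:
        raise ValueError(message)


def llrb_black_height(node, lo=None, hi=None):
    """Return the black height of the subtree at `node`; raise ValueError on the
    first broken invariant. Keys must lie strictly between `lo` and `hi`."""
    if node is None:
        return 1  # empty links count as black leaves

    def red(n):
        return n is not None and n.is_red

    _require((lo is None or lo < node.key) and (hi is None or node.key < hi),
             f"BST order broken at {node.key!r}")
    _require(not red(node.right), f"right-leaning red link at {node.key!r}")
    _require(not (red(node) and red(node.left)), f"two red links in a row at {node.key!r}")
    left = llrb_black_height(node.left, lo, node.key)
    right = llrb_black_height(node.right, node.key, hi)
    _require(left == right, f"unequal black heights below {node.key!r}")
    return left + (0 if node.is_red else 1)


def is_valid_llrb(root):
    """True if `root` is None or a black-rooted tree satisfying every LLRB invariant."""
    try:
        llrb_black_height(root)
    except ValueError:
        return False
    return root is None or not root.is_red


def demo_node(key, is_red=False, left=None, right=None):
    # hypothetical tiny node factory, only for the demo below
    return SimpleNamespace(key=key, left=left, right=right, is_red=is_red)


print(is_valid_llrb(demo_node("b", left=demo_node("a", is_red=True))))   # True: red link leans left
print(is_valid_llrb(demo_node("a", right=demo_node("b", is_red=True))))  # False: red link leans right
```

After many inserts (sorted input is the hardest case), `is_valid_llrb(test.root)` is a quick way to check whether `BalancedSearchTreeSet` really keeps its shape.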


Use the cell below to implement the requested API by means of **bloom filter**.

In [None]:
class BloomFilterSet(AbstractSet):
    
    def __init__(self):
        # ADD YOUR CODE HERE

        
        pass           
     
    
        
    def insertElement(self, element):
        inserted = False
        # ADD YOUR CODE HERE
      
        
        return inserted
    
    

    def searchElement(self, element):     
        found = False
        # ADD YOUR CODE HERE

        
        return found    
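One possible shape for the Bloom filter is sketched below as a standalone class named `BloomFilterSketch`. It is a hypothetical sketch, not the required `BloomFilterSet`. For `n` expected elements and a target false-positive rate `p`, it uses the standard sizing `m = ceil(-n * ln p / (ln 2)^2)` bits and `k = round((m / n) * ln 2)` hash functions, for which the false-positive rate is approximately `(1 - e^(-k*n/m))^k`. Its `k` bit positions come from one SHA-256 digest via double hashing (`h1 + i * h2 mod m`). The constructor parameters `expected_size` and `fp_rate` are assumptions; their defaults keep a no-argument constructor valid, as `AbstractSet` expects. Because a Bloom filter cannot tell a real duplicate from a false positive, this sketch assumes that `insertElement` returns `False` when all `k` bits were already set.

```python
import hashlib
import math


class BloomFilterSketch:
    """Bit-array Bloom filter (hypothetical sketch with the AbstractSet method names)."""

    def __init__(self, expected_size=10_000, fp_rate=0.01):
        n = max(1, expected_size)
        # optimal number of bits m and hash functions k for n items at rate fp_rate
        self.m = max(1, math.ceil(-n * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)  # m bits packed into bytes

    def _positions(self, element):
        # double hashing: derive k indices from two 64-bit slices of one digest
        digest = hashlib.sha256(element.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def insertElement(self, element):
        # False means every bit was already set, i.e. the element was (possibly
        # falsely) reported as present before this insert
        inserted = False
        for pos in self._positions(element):
            byte, bit = divmod(pos, 8)
            if not (self.bits[byte] >> bit) & 1:
                self.bits[byte] |= 1 << bit
                inserted = True
        return inserted

    def searchElement(self, element):
        # True means "possibly present"; False is always correct
        for pos in self._positions(element):
            byte, bit = divmod(pos, 8)
            if not (self.bits[byte] >> bit) & 1:
                return False
        return True
```

With the defaults (`n = 10000`, `p = 0.01`) this gives about 95,851 bits (roughly 12 KB) and `k = 7`, independent of string length. That memory profile is worth contrasting with the three exact implementations in the experiments.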

Use the cell below to implement the **synthetic data generator** as part of your experimental framework.

In [None]:
import string
import random

class TestDataGenerator(AbstractTestDataGenerator):
    
    def __init__(self):
        # ADD YOUR CODE HERE
        
        
        pass           
        
    def generateData(self, size):     
        # ADD YOUR CODE HERE
        data = [""]*size
        

        return data   
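A minimal generator could draw random lowercase strings of random length. In the sketch below, the class name `RandomWordGenerator` and its parameters `min_len`, `max_len` and `seed` are hypothetical. It keeps a private `random.Random` instance so that a fixed seed makes every run reproducible. In the notebook, the same logic would live in `TestDataGenerator`, subclassing `AbstractTestDataGenerator`, with default arguments so that `TestDataGenerator()` still works.

```python
import random
import string


class RandomWordGenerator:
    """Hypothetical generator of `size` random lowercase strings."""

    def __init__(self, min_len=3, max_len=12, seed=None):
        self.min_len, self.max_len = min_len, max_len
        self.rng = random.Random(seed)  # private RNG: a fixed seed gives reproducible data

    def generateData(self, size):
        letters = string.ascii_lowercase
        return [
            "".join(self.rng.choices(letters, k=self.rng.randint(self.min_len, self.max_len)))
            for _ in range(size)
        ]


print(RandomWordGenerator(seed=1).generateData(3))
```

Short lengths make duplicates likely, which exercises the `False` branch of `insertElement`. To obtain search keys that are guaranteed misses, one option is to draw them from a disjoint alphabet (for example, uppercase letters).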



Use the cells below for the Python code needed to **fully evaluate your implementations**, first on real data and then on synthetic data. That is: read data from the test files or generate synthetic data, instantiate each of the 4 set implementations in turn, then thoroughly experiment with insert/search operations and measure their performance.
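Both evaluation cells can share one timing routine. The sketch below is hypothetical (`time_set` and `BuiltinSetAdapter` are illustrative names, and reading the test files is not shown because their format is not specified here). It builds a fresh set, times all inserts and then all searches with `timeit.default_timer` (which is `time.perf_counter` in Python 3), and counts search hits. With the notebook's classes, `make_set` would be the class itself (for example `BalancedSearchTreeSet`) or `lambda: BinarySearchTreeSet(None)`.

```python
import random
import string
from timeit import default_timer as timer  # time.perf_counter in Python 3


def time_set(make_set, to_insert, to_search):
    """Build make_set(), insert every element of `to_insert`, then search every
    element of `to_search`. Returns (insert_seconds, search_seconds, hits)."""
    s = make_set()
    start = timer()
    for element in to_insert:
        s.insertElement(element)
    insert_seconds = timer() - start

    start = timer()
    hits = sum(1 for element in to_search if s.searchElement(element))
    search_seconds = timer() - start
    return insert_seconds, search_seconds, hits


class BuiltinSetAdapter:
    """Hypothetical stand-in with the same interface, used only to demo the harness."""

    def __init__(self):
        self._items = set()

    def insertElement(self, element):
        if element in self._items:
            return False
        self._items.add(element)
        return True

    def searchElement(self, element):
        return element in self._items


rng = random.Random(0)
present = ["".join(rng.choices(string.ascii_lowercase, k=8)) for _ in range(1000)]
absent = [w.upper() for w in present]  # uppercase strings never match lowercase ones
for size in (250, 500, 1000):
    ins, srch, hits = time_set(BuiltinSetAdapter, present[:size], present[:size] + absent[:size])
    print(f"n={size}: insert {ins:.4f}s, search {srch:.4f}s, hits={hits}")
```

Mixing known hits with guaranteed misses has a useful side effect for the Bloom filter: any hits beyond the number of present keys are false positives, so the same counter measures the empirical false-positive rate.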

In [None]:
import timeit

# ADD YOUR TEST CODE HERE TO WORK ON REAL DATA





In [None]:
import timeit

# ADD YOUR TEST CODE HERE TO WORK ON SYNTHETIC DATA



