# HashMap and Hash Table Notes

## Dictionary
- A dictionary is a collection of **key-value** pairs, with the keys serving as indices
- Python dictionaries are implemented using hash tables

## Hash Table
- Data structure to store data at a **specific** location based on its **key**

## Hashing
- Mapping of a key to its slot position (hashvalue), that is, there is a function $f$ such that $f(key) = hashvalue$
- Example:
    - $f(n)$ = remainder of $n / m$, where $m$ is the number of slots in the table (here $m = 11$)
    - $f(22)$ = `22 % 11 = 0`
    - $f(7)$ = `7 % 11 = 7`
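The remainder method above can be sketched in Python as follows. The function name `remainder_hash` and the default of 11 slots are illustrative choices, not something the notes prescribe.

```python
def remainder_hash(key, m=11):
    """Map an integer key to a slot in range(m) using the remainder method."""
    return key % m


print(remainder_hash(22))  # 22 % 11 = 0
print(remainder_hash(7))   # 7 % 11 = 7
```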

## HashTable / HashMap
- Data structure for storing **key <-> value** pairs
- A key and its value are stored at the same slot position in their respective tables
- Example:
    - 22 <-> 'Srini', 90 <-> 'Zach', 16 <-> 'Ellen', 7 <-> 'Vanessa', 42 <-> 'Yang'
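As a rough sketch of how these pairs could be laid out, the snippet below places each pair with the remainder hash into two parallel lists. The table size of 11 and the names `SIZE`, `keys`, and `values` are assumptions made for illustration. None of these five keys collide at this size, so no collision handling is needed here.

```python
SIZE = 11  # assumed table size
keys = [None] * SIZE
values = [None] * SIZE

pairs = [(22, "Srini"), (90, "Zach"), (16, "Ellen"), (7, "Vanessa"), (42, "Yang")]
for key, value in pairs:
    slot = key % SIZE    # remainder hash
    keys[slot] = key     # the key and its value share the same slot index
    values[slot] = value

print(keys)    # [22, None, 90, None, None, 16, None, 7, None, 42, None]
print(values)
```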

### String Keys
- How can we store string/text data as keys in Hash Tables?
- A common hash function for strings: `(sum of ord(char)) % location_size`
    - `'cat' -> (ord('c') + ord('a') + ord('t')) % 11 = 4`
    - `'dog' -> (ord('d') + ord('o') + ord('g')) % 11 = 6`
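A minimal sketch of this string hash, assuming a table of 11 slots (the name `string_hash` is illustrative). Because the sum ignores character order, anagrams such as `'cat'` and `'act'` always land in the same slot.

```python
def string_hash(text, m=11):
    """Sum the character codes of text and reduce modulo the number of slots m."""
    return sum(ord(ch) for ch in text) % m


print(string_hash("cat"))  # (99 + 97 + 116) % 11 = 4
print(string_hash("dog"))  # (100 + 111 + 103) % 11 = 6
print(string_hash("act"))  # same letters as "cat", so also 4
```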

## Collision Handling
- **Collision**: two different keys hash to the same value (slot position)
- Linear Probing:
    - Sequentially look for the next available slot position, wrapping around at the end of the table
- Chaining:
    - Add items to the collection referenced at a position
    - The collection can be a list or a linked list of items
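Chaining can be sketched as below. This is an illustrative class, not the implementation used later in these notes: the name `ChainedHashMap`, the use of a Python list for each chain, and the `default` argument of `get` are all assumptions.

```python
class ChainedHashMap:
    """Hypothetical chaining sketch: each slot holds a list of (key, value) pairs."""

    def __init__(self, size=11):
        self.size = size
        self.slots = [[] for _ in range(size)]  # one independent chain per slot

    def _slot(self, key):
        # Sum of character codes modulo the table size
        return sum(ord(ch) for ch in key) % self.size

    def insert(self, key, value):
        chain = self.slots[self._slot(key)]
        for i, (existing, _) in enumerate(chain):
            if existing == key:
                chain[i] = (key, value)  # key already present: update in place
                return
        chain.append((key, value))       # new key: extend the chain

    def get(self, key, default=None):
        for existing, value in self.slots[self._slot(key)]:
            if existing == key:
                return value
        return default


table = ChainedHashMap()
table.insert("cat", 1)
table.insert("act", 2)  # anagram of "cat": same hash value, same chain
print(table.get("cat"), table.get("act"))  # 1 2
print(table.slots[4])                      # [('cat', 1), ('act', 2)]
```

With chaining a collision never forces a search of other slots; the cost is a short scan of the chain at the key's own slot.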

## Average Time Complexity Comparison
Operation | Python List | BST | HashTable
----- | ----- | ----- | -----
Search | $O(n)$ | $O(\log n)$ | $O(1)$
Insertion | $O(1)$ | $O(\log n)$ | $O(1)$
Deletion (by value) | $O(n)$ | $O(\log n)$ | $O(1)$

- **Load Factor:** $\lambda = n / m$, where $n$ is the number of items and $m$ is the number of slots
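As a small worked example of the load factor, assuming a table of 11 slots holding 3 items (the function name `load_factor` is illustrative):

```python
def load_factor(num_items, num_slots):
    """Return lambda = number of items / number of slots."""
    return num_items / num_slots


print(round(load_factor(3, 11), 2))  # 0.27
```

A higher load factor means fewer empty slots, so collisions become more likely.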

## Implementation of Linear Probing HashMap

```python
class HashMap:
    """Fixed-size hash map that resolves collisions with linear probing."""

    def __init__(self, size=11):
        self.size = size
        self.keys = [None] * self.size    # string keys
        self.values = [None] * self.size  # numerical value stored with each key

    def __str__(self):
        # One "key:value, " entry per slot, including empty slots
        s = ""
        for slot, key in enumerate(self.keys):
            s += str(key) + ":" + str(self.values[slot]) + ", "
        return s

    def __len__(self):
        # Number of occupied slots
        count = 0
        for item in self.keys:
            if item is not None:
                count += 1
        return count

    def __getitem__(self, key):
        return self.get(key)

    def __setitem__(self, key, data):
        self.insert(key, data)

    def __delitem__(self, key):
        self.remove(key)

    def __contains__(self, key):
        # Check the slot rather than the value, so a stored -1 is still found
        return self._find_slot(key) != -1

    def hash_function(self, item):
        # For integer keys this could be: return item % self.size
        # For string keys: sum of character codes modulo the table size
        total = 0
        for ch in item:
            total += ord(ch)
        return total % self.size

    def rehash(self, oldhash):
        # Linear probing: step to the next slot, wrapping around at the end
        return (oldhash + 1) % self.size

    def _find_slot(self, key):
        """Return the slot holding key, or -1 if the key is absent."""
        startslot = self.hash_function(key)
        position = startslot
        while True:
            if self.keys[position] == key:
                return position
            # remove() leaves plain empty slots, so the search cannot stop
            # at None and keeps probing until it wraps around
            position = self.rehash(position)
            if position == startslot:
                return -1

    def insert(self, key, value):
        """Store value under key; return the slot used, or -1 if the table is full."""
        # Look for the key first so that a slot emptied by remove() earlier
        # in the probe sequence cannot produce a duplicate entry
        slot = self._find_slot(key)
        if slot == -1:
            # New key: probe from its home slot for the first empty slot
            startslot = self.hash_function(key)
            slot = startslot
            while self.keys[slot] is not None:
                slot = self.rehash(slot)
                if slot == startslot:
                    return -1
            self.keys[slot] = key
        self.values[slot] = value
        return slot

    def get(self, key):
        """Return the value stored under key, or -1 if the key is absent."""
        slot = self._find_slot(key)
        return self.values[slot] if slot != -1 else -1

    def remove(self, key):
        """Empty the slot holding key; return that slot, or -1 if the key is absent."""
        slot = self._find_slot(key)
        if slot != -1:
            self.keys[slot] = None
            self.values[slot] = None
        return slot
```


```python
h = HashMap()
h.insert("intro", 121)
h.insert("data structures", 122)
h.insert("data analytics", 215)
print(h)
```

Output:

```
None:None, None:None, data analytics:215, None:None, data structures:122, None:None, intro:121, None:None, None:None, None:None, None:None, 
```
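The three keys above all land in different slots, so no probing takes place. The standalone sketch below shows what linear probing does when keys do collide. It is independent of the `HashMap` class: the helper name `probe_insert` is hypothetical, and it assumes the same sum-of-character-codes hash with 11 slots.

```python
def probe_insert(keys, key):
    """Place key with linear probing; return its slot, or -1 if keys is full."""
    size = len(keys)
    start = sum(ord(ch) for ch in key) % size
    slot = start
    while keys[slot] is not None and keys[slot] != key:
        slot = (slot + 1) % size  # step to the next slot, wrapping around
        if slot == start:
            return -1             # probed every slot without finding room
    keys[slot] = key
    return slot


keys = [None] * 11
print(probe_insert(keys, "cat"))  # 4: the home slot is free
print(probe_insert(keys, "act"))  # 5: slot 4 is taken, so probe one step
print(probe_insert(keys, "tac"))  # 6: slots 4 and 5 are taken
```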
