# Hash Tables
When the **universe *U*** of possible keys is enormous, a direct-address table ***T*** of size $|U|$ becomes impractical: the set ***K*** of keys actually stored may be so small relative to ***U*** that most of the space allocated for ***T*** would be wasted. A **hash table** addresses this problem. When $|K| \ll |U|$, a hash table requires much less storage than a direct-address table.
## Hash function
In a hash table, an element with key $k$ is stored in slot $h(k)$. Here, the **hash function** $h$ computes the slot from the key $k$.
* $h$ maps the universe $U$ of keys into the slots of a hash table $T$ of size $m$:<br>
$h:U \rightarrow \{0,1,\ldots,m-1\}$, where $m \ll |U|$
* The array therefore needs only $m$ slots instead of $|U|$, which shrinks the range of array indices
* We say that an element with key $k$ **hashes** to slot $h(k)$, and that $h(k)$ is the **hash value** of key $k$.
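
As a small illustration of the mapping above, the sketch below uses the division method $h(k) = k \bmod m$, the same function used later in this notebook. The table size `m = 10` and the sample keys are arbitrary choices made for the example.

```python
def h(key, m):
    """Division-method hash: map an integer key to a slot in 0..m-1."""
    return key % m

m = 10  # assumed table size, chosen only for illustration
keys = [7, 1234, 987654321]  # sample keys drawn from a large universe

slots = {k: h(k, m) for k in keys}
print(slots)  # {7: 7, 1234: 4, 987654321: 1}
```

However large the key, the slot index always falls in the range $0$ to $m-1$.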

Comparison between a **hash table** and a **direct-address table**:

||**hash table**|**direct-address table**|
|:-|:--------------:|:------------------------:|
|Storage requirement|$\Theta(\|K\|)$|$\Theta(\|U\|)$|
|Search time|$O(1)$ in the *average case*|$O(1)$ in the *worst case*|
|Location of an element with key $k$|slot $h(k)$|slot $k$|


<br>Figure 11.2 shows how a hash table is constructed:
<img src="img/fig11.2.png" width="700"/>
## Collision
When two keys hash to the same slot, a **collision** occurs. Because $|U|>m$, at least two keys must share a hash value, so collisions are unavoidable. We can still reduce how often they occur, and we need a way to resolve them when they do:
1. Reduce collisions by choosing a **suitable hash function**:
    1. $h$ needs to appear **random**
    2. $h$ needs to be **deterministic**: a given input $k$ always gives the same $h(k)$
2. Resolve collisions by **chaining**
3. Resolve collisions by **open addressing**
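
The pigeonhole argument can be checked directly. The sketch below assumes the division method $k \bmod m$ as the hash function; `first_collision` is a hypothetical helper written for this example, not part of the notebook's code.

```python
def first_collision(keys, m):
    """Return the first pair of distinct keys sharing a slot under k mod m, or None."""
    seen = {}  # slot -> first key that hashed there
    for k in keys:
        slot = k % m
        if slot in seen and seen[slot] != k:
            return seen[slot], k
        seen.setdefault(slot, k)
    return None

m = 5
pair = first_collision(range(m + 1), m)  # m + 1 distinct keys, only m slots
print(pair)  # (0, 5)
```

With exactly $m$ distinct consecutive keys the same call returns `None`, but any additional key must land in an occupied slot.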

## Collision resolution by chaining
In the hash table $T$, we place all the elements that hash to the same slot (i.e. they have the same value of $h(k)$) into a **linked list** $L$.
* Slot $j$ in $T$ contains a pointer to the head of $L$ 
* If no elements hash to slot $j$, slot $j$ is `None`

Figure 11.3 shows the process of chaining:
<img src="img/fig11.3.png" width="700"/>
### Insert (T,x)
Insert $x$ at the head of list `T[h(x.key)]`
* *Worst-case* running time is $O(1)$

### Search (T,k)
Search for an element with key $k$ in list `T[h(k)]`
* *Worst-case* running time $\propto$ the length of `T[h(k)]` 
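
To make the dependence on chain length concrete, the sketch below counts how many chain elements a search examines. It is only an illustration: plain Python lists stand in for linked lists, the hash function is assumed to be $k \bmod m$, and `build_table` and `search_steps` are hypothetical helper names.

```python
def build_table(keys, m):
    """Chain keys into m slots using h(k) = k mod m (lists stand in for linked lists)."""
    table = [[] for _ in range(m)]
    for k in keys:
        table[k % m].insert(0, k)  # new key goes to the head of its chain
    return table

def search_steps(table, k):
    """Return (found, number of chain elements examined)."""
    chain = table[k % len(table)]
    for steps, key in enumerate(chain, start=1):
        if key == k:
            return True, steps
    return False, len(chain)

m = 10
spread = build_table(range(10), m)             # one key per slot
clustered = build_table(range(0, 100, 10), m)  # all ten keys land in slot 0

print(search_steps(spread, 0))     # (True, 1)
print(search_steps(clustered, 0))  # (True, 10)
```

Both tables hold ten keys, yet the clustered one forces the search to walk the whole chain, which is the worst case for chaining.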

### Delete (T,x)
Delete $x$ from the list `T[h(x.key)]`
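* Running time is $O(1)$ if `T[h(x.key)]` is **doubly linked**, because `x.prev` gives direct access to the predecessor of $x$
* Running time is asymptotically the same as Search (T,k) if `T[h(x.key)]` is **singly linked**, because the predecessor of $x$ must first be found by walking the list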
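
The extra cost in the singly linked case can be seen in a minimal sketch. `SNode` and `delete_singly` are hypothetical names introduced for this example, and the function assumes that `x` really is in the chain.

```python
class SNode:
    """Node of a singly linked list: a key and a next pointer only."""
    def __init__(self, key):
        self.key = key
        self.next = None

def delete_singly(head, x):
    """Unlink node x from a singly linked chain; return (new head, nodes visited)."""
    if head is x:
        return head.next, 0
    prev, visited = head, 1
    while prev.next is not x:  # walk until prev is the predecessor of x
        prev = prev.next
        visited += 1
    prev.next = x.next
    return head, visited

# Build the chain 3 -> 2 -> 1 -> 0 by inserting each node at the head.
head, nodes = None, []
for key in range(4):
    node = SNode(key)
    node.next = head
    head = node
    nodes.append(node)

head, visited = delete_singly(head, nodes[0])  # delete the tail (key 0)
print(visited)  # 3
```

Deleting the tail visits every remaining node, whereas a doubly linked node could be spliced out through `x.prev` without any walk.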

### In Python
We can build a chained hash table using the `DoublyLinkedList` class from *10.1_Linked_list.ipynb*. It is very similar to *Exc 11.1-3*, except that the key $k$ must first be hashed to slot $h(k)$ of table $T$. Assume that we have a set of keys $K \subseteq \mathbb{N}$. Each element `x` has an attribute `x.key` holding its key in $K$ and an attribute `x.data` pointing to its satellite data.
1. Define a hash function $h$. In the code below, $h(k)=k \bmod m$, where $m$ is the length of table $T$.
2. $h$ maps every key in $K$ to one of $m$ slots: $h:K \rightarrow \{0,1,\ldots,m-1\}$, where $m \ll |U|$
3. Define each slot `T[h(k)]` of the table such that:
* It is a pointer to a **doubly linked list** `L` containing all the keys with the same hash value
    * We only need to modify `class Node` slightly by adding another attribute `data` that points to satellite data.
    * If the slot's list is empty, `L.head is None`
* Special attention to **DELETE**:
    * To run in $O(1)$, delete must be given the node itself, such as `L.head` or `L.head.next`, rather than just its key
    * To delete a node with a given key value, a while loop must first search for it (running time $\propto$ length of list `L`)


```python
class Node:
    def __init__(self, key, data=None):
        self.key = key
        self.data = data  # satellite data
        self.next = None  # pointer to the next node
        self.prev = None  # pointer to the previous node


class DoublyLinkedList:
    def __init__(self):
        self.head = None  # an empty list has no head

    def list_insert(self, x):
        """Insert x at the head of the list; a bare key is wrapped in a Node."""
        new_node = x if isinstance(x, Node) else Node(x)
        # Link new_node in front of the current head, then make it the head.
        new_node.next = self.head
        if self.head is not None:
            self.head.prev = new_node
        self.head = new_node
        new_node.prev = None

    def list_search(self, key):
        """Return the first node with the given key, or None if it is absent."""
        current_node = self.head
        while current_node is not None and current_node.key != key:
            current_node = current_node.next
        if current_node is None:
            print('item not in list')
        else:
            print(current_node.key, current_node.data)
        return current_node

    def list_delete(self, node):
        """Unlink a node of this list in O(1) by splicing around it."""
        if node.prev is not None:  # node is not at the head
            node.prev.next = node.next
        else:  # node is at the head
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev

    def list_delete_by_key(self, key):
        """Search for the key, then delete the node if it was found."""
        node = self.list_search(key)
        if node is not None:
            self.list_delete(node)


class HashTable:
    def __init__(self, m):
        self.m = m  # number of slots
        self.array = [None] * m  # table T; each slot holds a list or None

    def h(self, key):
        """Division-method hash: h(k) = k mod m."""
        return key % self.m

    def insert(self, x):
        idx = self.h(x.key)  # compute h(k)
        # If the slot is empty, create a doubly linked list first.
        if self.array[idx] is None:
            self.array[idx] = DoublyLinkedList()
        self.array[idx].list_insert(x)  # insert x at the head of the list

    def search(self, k):
        chain = self.array[self.h(k)]
        if chain is None:  # nothing has ever hashed to this slot
            print('item not in list')
            return None
        return chain.list_search(k)

    def delete(self, x):
        chain = self.array[self.h(x.key)]
        if chain is None or chain.head is None:  # slot is empty
            print('item not found in table')
        else:
            chain.list_delete(x)


ht = HashTable(m=10)
s = ((1, 2), (21, 'apple'), (31, None), (5, 6.9))
for item in s:
    ht.insert(Node(item[0], item[1]))

ht.delete(ht.array[ht.h(1)].head)
ht.search(31)
```

```
item not in list
```


### Explanation:
* A chained hash table with 10 empty slots was created
* `(1,2),(21,'apple'),(31,None)` hash to the same slot because $h(1)=h(21)=h(31)=1$
* `ht.delete(ht.array[ht.h(1)].head)` first goes to slot $h(1)$, then deletes the head of list `L` in slot $h(1)$
* Because the doubly linked list `L` inserts each new item before `L.head`, the most recently added element `(31,None)` is the one removed
* `ht.search(31)` therefore prints "item not in list"