# 11.4 Open addressing
Open addressing is an alternative to chaining for resolving collisions: all elements are stored in the hash table itself.
* Each slot contains **either an element of the dynamic set or `None`** (so the load factor $\alpha$ never exceeds $1$)
* **Pointers are avoided**, so the memory they would have occupied can be used for more slots instead
* Instead of following pointers, we **compute the sequence of slots** to be examined (i.e., probed)

## Insertion
We successively examine (probe) slots of the hash table until we find an empty slot in which to put the key.
* We do NOT always probe in the fixed sequential order $0,1,2,...,m-1$
* The **sequence of slots** to be probed depends upon the **key being inserted**
* The **sequence of slots** can be calculated with a **hash function $h$** that takes the **probe number** (a value in $\{0,1,2,...,m-1\}$) as a **second input** (the first input is the key of the element)
$$
\begin{align}
h:U\times \{0,1,2,...,m-1\} \rightarrow\{0,1,2,...,m-1\}
\end{align}
$$

We require that for every key $k$, its probe sequence $\langle h(k,0), h(k,1),..., h(k,m-1)\rangle$ be a **permutation** of $\langle 0, 1,..., m-1\rangle$, so that every slot can be considered.
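
The permutation requirement can be checked mechanically. The sketch below is only an illustration: `probe_sequence`, `is_permutation`, `h_good` and `h_bad` are hypothetical names, and both probe functions are made-up examples rather than functions from the book.

```python
def probe_sequence(h, k, m):
    """Return the list <h(k, 0), h(k, 1), ..., h(k, m-1)>."""
    return [h(k, i) for i in range(m)]


def is_permutation(seq, m):
    """True if seq visits every slot 0..m-1 exactly once."""
    return sorted(seq) == list(range(m))


m = 8  # number of slots (arbitrary choice for the example)


def h_good(k, i):
    # hypothetical probe function: a step of 1 reaches every slot
    return (k + i) % m


def h_bad(k, i):
    # hypothetical probe function: a step of 2 with even m skips half the slots
    return (k + 2 * i) % m


print(probe_sequence(h_good, 5, m))                     # [5, 6, 7, 0, 1, 2, 3, 4]
print(is_permutation(probe_sequence(h_good, 5, m), m))  # True
print(probe_sequence(h_bad, 5, m))                      # [5, 7, 1, 3, 5, 7, 1, 3]
print(is_permutation(probe_sequence(h_bad, 5, m), m))   # False
```

With `h_bad`, key $5$ can never reach the even-numbered slots, so an insertion could fail even though the table still has empty slots.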

## Search
We probe the **same sequence of slots** to search for a key $k$ as we did to insert it. Therefore:
* When the search finds an empty slot, the search is unsuccessful
* There is no need to probe any further: had $k$ been in the table, it would have been inserted into this empty slot rather than into a later slot of its probe sequence (assuming that no keys have been deleted)

## Deletion
Why is deletion difficult in open addressing? If we deleted a key from slot $i$ and marked the slot as `None`, we might be unable to retrieve any key $k$ during **whose insertion we had probed slot $i$ and found it occupied**. As shown below in *Figure 11.4*:
* For key $k$ with probe sequence $\langle h(k,0), h(k,1),h(k,2)...\rangle$,
* If we delete the element $k_1$ that occupies slot $h(k,1)$ by replacing it with `None`,
* `hash_search` cannot retrieve element $k_2$, or any elements $k_3, k_4,...$ that were once inserted after probing the slot $h(k,1)$.
<img src="img/fig11.4_1.png" width="500">

A solution is to mark the slot of a deleted key as `deleted` instead of `None`, so that the slot is:
1. passed over in `hash_search` (the search continues with the next probe)
2. regarded as empty in `hash_insert` (which means we need to modify `hash_insert`)

Since slots flagged as `deleted` still have to be probed, search times no longer depend only on the load factor $\alpha$. For this reason, **chaining** is more commonly used as a collision resolution technique when keys must be deleted.
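
A small experiment makes the problem concrete. This is a minimal sketch, assuming linear probing with $h'(k)=k\:mod\:m$ and the string `"deleted"` as the flag. The helper names `h`, `insert` and `search` are hypothetical.

```python
DELETED = "deleted"  # assumed representation of the deletion flag


def h(k, i, m):
    # assumed probe function for this illustration: linear probing
    return (k % m + i) % m


def insert(T, k):
    m = len(T)
    for i in range(m):
        j = h(k, i, m)
        if T[j] is None or T[j] == DELETED:  # flagged slots can be reused
            T[j] = k
            return j
    raise OverflowError("hash table overflow")


def search(T, k):
    m = len(T)
    for i in range(m):
        j = h(k, i, m)
        if T[j] is None:  # truly empty slot: k cannot be further along
            return None
        if T[j] == k:
            return j
    return None


T = [None] * 5
insert(T, 8)   # lands in slot 3
insert(T, 18)  # slot 3 is occupied, lands in slot 4

naive = list(T)
naive[3] = None            # delete 8 by resetting its slot to None
print(search(naive, 18))   # None: the search stops at slot 3 and misses 18

marked = list(T)
marked[3] = DELETED        # delete 8 by flagging its slot
print(search(marked, 18))  # 4: the search passes over the flagged slot
print(insert(marked, 13))  # 3: the flagged slot is reused by a new key
```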

## Uniform hashing
In **uniform hashing**, the probe sequence of each key is equally likely to be any of the $m!$ permutations of $\langle 0, 1,..., m-1\rangle$. Although true uniform hashing is difficult to implement, the book provides three practical approximations: **linear probing**, **quadratic probing**, and **double hashing**.

### Linear probing
Given an ordinary hash function $h':U \rightarrow\{0,1,2,...,m-1\}$, which we refer to as an **auxiliary hash function**, linear probing uses the following hash function for $i=0,1,2,...,m-1$:
$$
\begin{align}
h(k,i)=(h'(k)+i)\:mod\:m
\end{align}
$$
* Because $h(k,i)$ is reduced modulo $m$ and $i$ takes $m$ consecutive values, the probe sequence visits $m$ distinct slots
* The probe sequence starts at $T[h'(k)]$
* It is followed by $T[h'(k)+1]$, $T[h'(k)+2]$, ..., up to $T[m-1]$
* Then it wraps around to $T[0]$, $T[1]$, ..., and ends at $T[h'(k)-1]$

Because the initial probe $T[h'(k)]$ determines the entire probe sequence, there are only $m$ distinct probe sequences (one for each of the $m$ possible starting slots).
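
To see how few probe sequences linear probing produces, we can enumerate them for a small table. This is only a sketch: it assumes the division method $h'(k)=k\:mod\:m$, and `linear_probe_sequence` is a hypothetical helper.

```python
from math import factorial


def linear_probe_sequence(start, m):
    """Probe sequence of linear probing when the first probe is slot `start`."""
    return tuple((start + i) % m for i in range(m))


m = 5
# assumed auxiliary hash function: h'(k) = k mod m
sequences = {linear_probe_sequence(k % m, m) for k in range(1000)}

print(len(sequences))  # 5 distinct probe sequences, one per starting slot
print(factorial(m))    # 120 permutations that uniform hashing could produce
```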

#### Primary Clustering
Linear probing can suffer from **primary clustering**, in which long runs of occupied slots build up that increase the average search time. For an empty slot $T[j]$ preceded by $i$ full slots, its probability of getting filled next is $(i+1)/m$. 
<img src="img/fig11.4_2.png" width="500">
Why?
1. When inserting a key $k$, the start of its probe sequence is equally likely to be any one of the $m$ slots in $T$;
2. If the start is one of the $i$ full slots preceding $T[j]$ (probability $i/m$), we will eventually insert $k$ in slot $T[j]$
    * because each probe moves to the slot right after the previous one
    * and we always insert in the first empty slot encountered
3. Don't forget that there is a $1/m$ chance that $T[j]$ itself is the start of the probe sequence
4. In total we have $P=(i+1)/m$

The larger $i$ is, the larger $P$ becomes: long runs of occupied slots tend to grow even longer!
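
The counting argument can be verified by brute force on a small table. The sketch below assumes that every starting slot is equally likely. `landing_slot` and `fill_probabilities` are hypothetical helpers, and `"x"` simply stands for an occupied slot.

```python
from fractions import Fraction


def landing_slot(T, start):
    """Slot in which linear probing inserts a key whose first probe is `start`."""
    m = len(T)
    for i in range(m):
        j = (start + i) % m
        if T[j] is None:
            return j
    return None  # table is full


def fill_probabilities(T):
    """Probability that each slot is the next one filled (uniform start assumed)."""
    m = len(T)
    counts = [0] * m
    for start in range(m):
        j = landing_slot(T, start)
        if j is not None:
            counts[j] += 1
    return [Fraction(c, m) for c in counts]


# slots 1-3 and slot 6 are occupied ("x"); the rest are empty
T = [None, "x", "x", "x", None, None, "x", None]
probs = fill_probabilities(T)

print(probs[4])  # 1/2 = (3 + 1)/8: slot 4 is preceded by 3 full slots
print(probs[7])  # 1/4 = (1 + 1)/8: slot 7 is preceded by 1 full slot
print(probs[5])  # 1/8 = (0 + 1)/8: slot 5 is preceded by an empty slot
```

The empty slot at the end of the longest run is four times as likely to be filled next as an isolated empty slot, which is exactly how clusters keep growing.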

#### Implementing linear probing in Python:
By defining $h'$ with the division method ($h'(k)=k\:mod\:m$), we can now implement complete `hash_insert`, `hash_search`, and `hash_delete` functions.
* We assume now that the elements in the hash table $T$ are keys with no satellite information
* Each slot contains either a key (filled), `None` (empty), or the flag `deleted`
* `h_linear` computes the probe sequence according to linear probing
* `hash_insert` and `hash_search` operate on a hash table $T$ and take a key $k$ as input; they return the slot number $j$, or print a message and return `None` when the table is full or the key is not present, respectively
* `hash_delete` searches for the element in $T$ and flags its slot with `deleted`; if the element is not present, it prints a message





```python
class OP:
    """Open-addressing hash table of integer keys, using linear probing."""

    DELETED = 'deleted'  # flag for a slot whose key has been deleted

    def __init__(self, m):  # m = no. slots in table
        self.array = [None] * m
        self.m = m

    def h_linear(self, k, i):
        # h(k, i) = (h'(k) + i) mod m, with h'(k) = k mod m (division method)
        return (k % self.m + i) % self.m

    def hash_insert(self, k):
        for i in range(self.m):
            j = self.h_linear(k, i)
            # a slot flagged as deleted is treated as empty on insertion
            if self.array[j] is None or self.array[j] == self.DELETED:
                self.array[j] = k
                return j
        print('hash table overflows')
        return None

    def hash_search(self, k):
        for i in range(self.m):
            j = self.h_linear(k, i)
            # case 1: key is found
            if self.array[j] == k:
                return j
            # case 2: the search finds an empty slot, so it is unsuccessful
            if self.array[j] is None:
                print('k not found in T')
                return None
            # otherwise (another key or a deleted slot): keep probing
        # case 3: the whole table was probed but k was not found
        print('k not found in T')
        return None

    def hash_delete(self, k):
        j = self.hash_search(k)
        if j is None:
            print('k not found in T, cannot be deleted')
        else:
            self.array[j] = self.DELETED


x1 = OP(m=5)
x1.hash_insert(8)
x1.hash_insert(18)
x1.hash_insert(28)
x1.hash_delete(20)  # 20 is not in the table
x1.hash_search(28)  # returns 0
x1.hash_search(18)  # returns 4
x1.hash_search(8)   # returns 3
```

k not found in T
k not found in T, cannot be deleted


3