### Phone book implementation with Hashmap

- Having discussed hash tables, maps, and sets, how do we implement a phone book?

- Requirements:
    1. Add and delete contacts fast
    2. Call person by name
    3. Determine who is calling given their phone number

- Requirements 2 and 3 are different lookups: one finds a number using the name as the key, the other finds a name using the number as the key
    - So we need 2 maps; `number -> name` and `name -> number`
    - Both of these are hash tables
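
The two-map idea can be sketched with Python dicts, which are themselves hash tables. This is a minimal illustration, not a definitive design: the class name `PhoneBook` and its method names are assumptions made for this example, and it assumes each name has exactly one number.

```python
class PhoneBook:
    """Phone book backed by two hash maps (Python dicts)."""

    def __init__(self):
        self.name_to_number = {}  # requirement 2: call a person by name
        self.number_to_name = {}  # requirement 3: identify a caller

    def add(self, name, number):
        # Remove stale entries first so the two maps never disagree.
        self.delete_by_name(name)
        old_name = self.number_to_name.pop(number, None)
        if old_name is not None:
            self.name_to_number.pop(old_name, None)
        self.name_to_number[name] = number
        self.number_to_name[number] = name

    def delete_by_name(self, name):
        # Delete the contact from both maps, if it exists.
        number = self.name_to_number.pop(name, None)
        if number is not None:
            self.number_to_name.pop(number, None)

    def number_of(self, name):
        return self.name_to_number.get(name)

    def who_is_calling(self, number):
        return self.number_to_name.get(number)
```

Every operation touches each dict a constant number of times, so adds, deletes, and both kinds of lookup take expected constant time.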

- Recall the following from previous notes:
    - Jargon
        - $n$: Number of objects stored in the table
        - $m$: Cardinality of the hash function (the number of cells it maps to)
        - $c$: Longest chain length
    - Asymptotics
        - $\Theta(n+m)$ memory
        - $\Theta(c+1)$ time per operation

- We want to keep $m$ and $c$ as small as possible!
- We further know that $c \ge \frac{n}{m}$
    - The smallest $c$ you can get is if you evenly divide all $n$ objects between the $m$ chains

### What is a good hash function for phone numbers?

- Options
    - First 3 digits? Bad, because area code is often the same, so you get large $c$
    - Last 3 digits? Might be bad, if many numbers end with `000`
    - Random? Good distribution, but the hash is not deterministic, so a stored key cannot be found again!
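
To make the first two options concrete, the sketch below buckets two synthetic sets of numbers and reports the longest chain for each choice of hash. The data is invented purely for illustration, and the helper names `longest_chain`, `first_three`, and `last_three` are assumptions made for this example.

```python
from collections import Counter


def longest_chain(keys, hash_function):
    """Length of the longest chain when keys are bucketed by hash_function."""
    return max(Counter(hash_function(k) for k in keys).values())


def first_three(number):
    return int(number[:3])


def last_three(number):
    return int(number[-3:])


# Synthetic data: 1000 ten-digit numbers sharing the made-up area code 415.
same_area = [f"415{i:07d}" for i in range(1000)]
# Synthetic data: 800 "round" numbers with distinct prefixes, all ending in 000.
round_numbers = [f"{i}5550000" for i in range(200, 1000)]

print(longest_chain(same_area, first_three))      # 1000: every key lands in one chain
print(longest_chain(same_area, last_three))       # 1
print(longest_chain(round_numbers, first_three))  # 1
print(longest_chain(round_numbers, last_three))   # 800
```

Each fixed rule works well on one data set and collapses into a single chain on the other, which is the weakness the options above point at.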

- Remember, we want our hash function to
    - Be deterministic (i.e. for a given value the computed hash is always the same)
    - Be fast to compute
    - Distribute keys well across different cells
    - Produce few collisions

- **Problem**: if the number of possible keys is much bigger than the cardinality of the hash function, $|U| \gg m$ (where $U$ is the set of all possible keys), then for any fixed hash function $h$ there is a bad input with many collisions
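
The reason is the pigeonhole principle. As a short worked argument, write $U$ for the set of all possible keys and fix any $h: U \rightarrow \{0, 1, \dots, m-1\}$. The $m$ cells partition $U$, so at least one cell receives an average share or more:

$$\max_{0 \le i < m} \left|\{x \in U : h(x) = i\}\right| \ge \frac{|U|}{m}$$

If $|U| \ge n \cdot m$, that cell contains at least $n$ keys. Storing exactly those $n$ keys puts every object in the same chain, so $c = n$ and each operation costs $\Theta(n)$ time.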

### Universal family

- Since no single hash function can guarantee few collisions on every input, we rely instead on a **universal family** of hash functions
    - This is similar to the idea behind randomized quicksort, where choosing the pivot at random gives good expected performance on every input

- We will choose a family (set) of hash functions, then choose a random one from the family

- Formally
    - Let $U$ be the **universe** i.e. set of all possible keys that we want to hash
    - Let $\mathbb{H} = \{h: U \rightarrow \{0, 1, \dots, m-1\} \}$ be a set of hash functions
    - $\mathbb{H}$ is a **universal family** if for any two keys $x, y \in U$ with $x \neq y$, the probability of collision is at most $\frac{1}{m}$ when $h$ is chosen uniformly at random from $\mathbb{H}$
    $$\Pr[h(x) = h(y)] \le \frac{1}{m}$$
    
- Intuitively, this means that if I randomly pick a hash function from this set and compute $h(x)$ and $h(y)$ for a specific pair $(x, y)$, the probability of a collision is at most $\frac{1}{m}$
    - As a thought experiment, if I pick one hash function uniformly at random for $x$ and, independently, another for $y$ (from the set of all functions into $m$ cells), the two values collide with probability $\frac{1}{m}$
    - Of course this doesn't work in practice, because then the hashing isn't deterministic; it only illustrates the idea
    - In an actual implementation, we pick $h$ once and use the same $h$ throughout the algorithm
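
One well-known universal family for integer keys is $h_{a,b}(x) = ((ax + b) \bmod p) \bmod m$, where $p$ is a prime larger than every key, $1 \le a \le p-1$ and $0 \le b \le p-1$. The notes above do not name a specific family, so treat the sketch below as one possible choice rather than the intended one. The function name `random_hash` and the prime $2^{61} - 1$ are assumptions made for this example, and keys are assumed to be integers smaller than that prime.

```python
import random

P = 2**61 - 1  # a Mersenne prime; keys are assumed to be integers in [0, P)


def random_hash(m, rng=random):
    """Pick h(x) = ((a*x + b) mod P) mod m at random from the family."""
    a = rng.randrange(1, P)  # 1 <= a <= P - 1
    b = rng.randrange(0, P)  # 0 <= b <= P - 1

    def h(x):
        return ((a * x + b) % P) % m

    return h


# The random choice happens once; afterwards h is deterministic.
h = random_hash(1000)
assert h(5551234) == h(5551234)
```

The randomness lives in the choice of `a` and `b`, not in the evaluation of `h`, which is why the same key always maps to the same cell once the function is fixed.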

### Load Factor

- Let's discuss 1 more concept, called the load factor $\alpha$
    - $\alpha = \frac{n}{m}$
    - It is simply the ratio between the number of stored objects and the cardinality of the hash function

- Theorem: If we choose $h$ randomly from a universal family, the expected length of the chain that any given key hashes to is $O(1 + \alpha)$, where $\alpha=\frac{n}{m}$ is the load factor of the table
    - That is, if $h$ is chosen from a universal family, hash table operations run in $O(1 + \alpha)$ time on average

- So effectively, our problem reduces to choosing a good $\alpha$, which we can do by choosing a good $m$
    - Ideally, we want $0.5 \lt \alpha \lt 1$
    - Once $\alpha$ is chosen, the memory we need is automatically $O(n + m) = O(n + \frac{n}{\alpha}) = O(n)$
    - Operations run in $O(1+\alpha) = O(1)$ time 

### Dynamic Hash Table

- We often don't know the size of $n$ in advance
- So instead of wasting space by starting with a big hash table, we can use the idea of dynamic arrays
    - Start with hash table of some size, and resize when $\alpha$ becomes too large
    - Then choose a new hash function from the universal family, and rehash all objects

In [None]:
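def Rehash(hash_table):
    # Load factor alpha = n / m; `num_buckets` is an assumed attribute holding m
    load_factor = len(hash_table.keys()) / hash_table.num_buckets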
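
A fuller sketch of the resize-and-rehash idea is given here as a small chained hash table. Everything specific in it is an assumption made for illustration: the class name `ChainedHashTable`, the threshold of 0.9, the doubling policy, the prime $2^{61} - 1$, and the use of the family $((ax + b) \bmod p) \bmod m$ for integer keys.

```python
import random


class ChainedHashTable:
    """Chained hash table that doubles in size when the load factor gets too large."""

    PRIME = 2**61 - 1      # assumed bound: keys are integers in [0, PRIME)
    MAX_LOAD_FACTOR = 0.9  # assumed resize threshold

    def __init__(self, num_buckets=8):
        self.count = 0
        self._reset(num_buckets)

    def _reset(self, num_buckets):
        # Allocate m empty chains and draw a fresh h(x) = ((a*x + b) mod p) mod m.
        self.buckets = [[] for _ in range(num_buckets)]
        self.a = random.randrange(1, self.PRIME)
        self.b = random.randrange(0, self.PRIME)

    def _hash(self, key):
        return ((self.a * key + self.b) % self.PRIME) % len(self.buckets)

    def load_factor(self):
        return self.count / len(self.buckets)

    def get(self, key, default=None):
        for k, v in self.buckets[self._hash(key)]:
            if k == key:
                return v
        return default

    def set(self, key, value):
        chain = self.buckets[self._hash(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # overwrite an existing key
                return
        chain.append((key, value))
        self.count += 1
        self.rehash()

    def rehash(self):
        # Resize only when alpha = n / m exceeds the threshold.
        if self.load_factor() <= self.MAX_LOAD_FACTOR:
            return
        old_items = [item for chain in self.buckets for item in chain]
        self._reset(2 * len(self.buckets))  # double m and pick a new h
        for key, value in old_items:
            self.buckets[self._hash(key)].append((key, value))
```

A single rehash costs $O(n)$, but because the table doubles each time, rehashes are rare, which is the same amortized argument used for dynamic arrays.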
