## 5.5.1. Hash Functions
* Given a collection of items, a hash function that maps each item into a unique slot is referred to as a perfect hash function. 
* If we know the items and the collection will never change, then it is possible to construct a perfect hash function (refer to the exercises for more about perfect hash functions). 
* Unfortunately, given an arbitrary collection of items, there is no systematic way to construct a perfect hash function. 
* Luckily, we do not need the hash function to be perfect to still gain performance efficiency.

* One way to always have a perfect hash function is to increase the size of the hash table so that each possible value in the item range can be accommodated. 
* This guarantees that each item will have a unique slot. 
* Although this is practical for small numbers of items, it is not feasible when the number of possible items is large. 
* For example, if the items were nine-digit Social Security numbers, this method would require up to one billion (10^9) slots, one for every possible nine-digit value. 
* If we only want to store data for a class of 25 students, we will be wasting an enormous amount of memory.
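The trade-off described above can be illustrated with a small sketch. The names `direct_slot`, `table`, `ssn_slots`, and `class_size` are hypothetical and chosen only for this example, and the sample items are borrowed from Table 5. The sketch assumes two-digit integer items, so a table of 100 slots covers every possible value.

```python
def direct_slot(item):
    # Identity mapping: every distinct item gets its own slot,
    # which is only possible because the table covers the whole item range.
    return item

# Small case: two-digit items need only 100 slots.
table = [None] * 100
for item in (54, 26, 93, 17, 77, 31):
    table[direct_slot(item)] = item

used = sum(slot is not None for slot in table)
print(used, "of", len(table), "slots used")

# Large case: nine-digit identifiers would need 10**9 slots
# even if only 25 students are ever stored.
ssn_slots = 10 ** 9
class_size = 25
print(f"fraction of slots used: {class_size / ssn_slots:.1e}")
```

With two-digit items the wasted space is tolerable. With nine-digit items almost the entire table would sit empty.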

* Our goal is to create a hash function that minimizes the number of collisions, is easy to compute, and evenly distributes the items in the hash table. 
* There are a number of common ways to extend the simple remainder method. 
* We will consider a few of them here.

* The folding method for constructing hash functions begins by dividing the item into equal-size pieces (the last piece may not be of equal size). 
* These pieces are then added together to give the resulting hash value. 
* For example, if our item were the phone number 436-555-4601, we would take the digits and divide them into groups of 2 (43, 65, 55, 46, 01). 
* After the addition, 43+65+55+46+01, we get 210. If we assume our hash table has 11 slots, then we need to perform the extra step of dividing by 11 and keeping the remainder. 
* In this case 210 % 11 is 1, so the phone number 436-555-4601 hashes to slot 1.
* Some folding methods go one step further and reverse every other piece before the addition. 
* For the above example, we get 43+56+55+64+01=219 which gives 219 % 11=10.
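The folding steps above can be sketched in a few lines of Python. The function name `fold_hash` and its parameters `piece_len` and `reverse_alternate` are assumptions made for this illustration and do not come from the text. The sketch also assumes the item is given as a string and that any non-digit characters, such as the dashes in a phone number, are ignored.

```python
def fold_hash(item, table_size, piece_len=2, reverse_alternate=False):
    """Fold the digits of item into a slot index (illustrative sketch)."""
    # Keep only the digit characters, so "436-555-4601" becomes "4365554601".
    digits = "".join(ch for ch in item if ch.isdigit())
    # Split into pieces of piece_len digits; the last piece may be shorter.
    pieces = [digits[i:i + piece_len] for i in range(0, len(digits), piece_len)]
    if reverse_alternate:
        # Reverse every other piece (the 2nd, 4th, ...) before adding.
        pieces = [p[::-1] if i % 2 == 1 else p for i, p in enumerate(pieces)]
    total = sum(int(p) for p in pieces)
    # The remainder step maps the total into the range 0..table_size-1.
    return total % table_size

print(fold_hash("436-555-4601", 11))                          # 210 % 11
print(fold_hash("436-555-4601", 11, reverse_alternate=True))  # 219 % 11
```

Both calls reproduce the worked example: the plain fold lands in slot 1 and the variant with reversed pieces lands in slot 10.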

* Another numerical technique for constructing a hash function is called the mid-square method. 
* We first square the item, and then extract some portion of the resulting digits. 
* For example, if the item were 44, we would first compute 44² = 1,936. By extracting the middle two digits, 93, and performing the remainder step, we get 5 (93 % 11). 
* Table 5 shows items under both the remainder method and the mid-square method. 
* You should verify that you understand how these values were computed.

## Table 5: Comparison of Remainder and Mid-Square Methods

| Item | Remainder | Mid-Square |
|------|-----------|------------|
| 54   | 10        | 3          |
| 26   | 4         | 7          |
| 93   | 5         | 9          |
| 17   | 6         | 8          |
| 77   | 0         | 4          |
| 31   | 9         | 6          |
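One way to check the values in Table 5 is to compute them directly. The sketch below uses a hypothetical function name, `mid_square_hash`, and makes one assumption that the text does not state explicitly: when the square has an odd number of digits (for example 26² = 676), only the single middle digit is extracted. That rule is inferred from the table entries.

```python
def mid_square_hash(item, table_size):
    """Mid-square hash of a non-negative integer (illustrative sketch)."""
    squared = str(item * item)
    mid = len(squared) // 2
    if len(squared) % 2 == 0:
        # Even number of digits: take the middle two, e.g. 1936 -> "93".
        middle = squared[mid - 1:mid + 1]
    else:
        # Assumption: odd number of digits, take the single middle one, e.g. 676 -> "7".
        middle = squared[mid]
    return int(middle) % table_size

items = [54, 26, 93, 17, 77, 31]
remainders = [item % 11 for item in items]
mid_squares = [mid_square_hash(item, 11) for item in items]

for item, r, m in zip(items, remainders, mid_squares):
    print(item, r, m)
```

Under that assumption, the printed rows match the Remainder and Mid-Square columns of Table 5 for a table with 11 slots.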


* We can also create hash functions for character-based items such as strings. 
* The word “cat” can be thought of as a sequence of ordinal values.

In [None]:
print(ord('c'))  # 99
print(ord('a'))  # 97
print(ord('t'))  # 116

* We can then take these three ordinal values, add them up, and use the remainder method to get a hash value (see Figure 6). 
* Listing 1 shows a function called hash that takes a string and a table size and returns the hash value in the range from 0 to tablesize-1.

In [None]:
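# Listing 1

def hash(astring, tablesize):
    # Note: this name shadows Python's built-in hash(); it is kept to match the text.
    total = 0  # running sum of ordinal values (avoids shadowing the built-in sum)
    for ch in astring:
        total = total + ord(ch)

    # The remainder step maps the total to a slot in 0..tablesize-1.
    return total % tablesize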

* It is interesting to note that when using this hash function, anagrams will always be given the same hash value. 
* To remedy this, we could use the position of the character as a weight. 
* Figure 7 shows one possible way to use the positional value as a weighting factor. 
* The modification to the hash function is left as an exercise.
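The anagram problem is easy to see in a short experiment. The sketch below is only one possible approach and not necessarily the scheme shown in Figure 7. It assumes each character's ordinal value is multiplied by its 1-based position in the string. The function names `simple_hash` and `weighted_hash` are hypothetical.

```python
def simple_hash(astring, tablesize):
    # Sum of ordinal values; the order of the characters does not matter.
    return sum(ord(ch) for ch in astring) % tablesize

def weighted_hash(astring, tablesize):
    # Assumption: weight each character by its 1-based position,
    # so rearranging the characters usually changes the total.
    total = sum(pos * ord(ch) for pos, ch in enumerate(astring, start=1))
    return total % tablesize

for word in ("cat", "act", "tac"):
    print(word, simple_hash(word, 11), weighted_hash(word, 11))
```

With 11 slots, all three anagrams land in slot 4 under the simple sum, while the position-weighted version places them in slots 3, 5, and 2. Weighting reduces collisions among anagrams but does not eliminate collisions in general.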

* You may be able to think of a number of additional ways to compute hash values for items in a collection. 
* The important thing to remember is that the hash function has to be efficient so that it does not become the dominant part of the storage and search process. 
* If the hash function is too complex, then it becomes more work to compute the slot name than it would be to simply do a basic sequential or binary search as described earlier. 
* This would quickly defeat the purpose of hashing.