#### Hash Table 

A **hash table (hash map)** is a data structure that implements the associative array abstract data type: a structure that maps **keys** to the **addresses of values (data)**. A hash table relies on a **hash function** to compute an index value (also called a **hash code**) that points into an array of **hash buckets**, the place where the data is stored.

A **collision** occurs when two or more keys return the same hash code/result.

<center>
<img src="adt_hash_table.png" width="400" align="center"/>
</center>

<u>For example</u>, to store a phone book, you use a person's name as the key, and the phone number is the value (data) to be looked up.
- The key is passed into the hash function to generate an index value, which points to the location where the data is stored.
- Several items may end up in the same bucket, i.e. multiple keys may map to the same bucket.
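
As a rough sketch of the flow described above, the snippet below turns a name into a bucket index. The hash function `toy_hash` (a sum of character codes) and the table size `NUM_BUCKETS` are assumptions chosen only for illustration; they are not part of any library.

```python
# Toy illustration only: `toy_hash` and NUM_BUCKETS are hypothetical choices.
NUM_BUCKETS = 8

def toy_hash(key):
    """Return a hash code: the sum of the character codes in the key."""
    return sum(ord(ch) for ch in key)

def bucket_index(key):
    """Reduce the hash code to a valid index into the bucket array."""
    return toy_hash(key) % NUM_BUCKETS

# Each bucket is a list so that it can hold more than one item.
buckets = [[] for _ in range(NUM_BUCKETS)]
for name, phone in [("Ben", "357-0394"), ("Alan", "558-9171")]:
    buckets[bucket_index(name)].append((name, phone))

print(bucket_index("Ben"), bucket_index("Alan"))
print(buckets)
```

Looking up a name repeats the same computation: hash the key, go to that bucket, and compare the stored keys.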

#### Hash Table Operations
The basic operations of a hash table are to add, find and remove items.
- `insert(key,value)`: add the `value` data associated with the key `key`
- `find(key)`: return the `value` data stored under the provided `key`
- `remove(key)`: remove `key` and its associated `value` data from the table
- `hash(key)`: return the address at which to store the data value


#### Features of a Hash function ####
*Note that these features are different from those of a cryptographic hash function such as MD5, SHA-256, SHA-3 or BLAKE2.*

To achieve a good hashing mechanism, it is important to have a good hash function that meets the following basic requirements:
* The hash function should produce **as few collisions as possible**.
* The hash function should be **fast to compute** for any key.
* The hash function should **prevent clustering** by distributing the mappings evenly across all the buckets in the array.
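
To make the clustering requirement concrete, the sketch below counts how many keys land in the fullest bucket for two toy hash functions. Both functions, the helper `largest_bucket` and the table size of 8 are assumptions made for illustration; the names are taken from the contact lists used later in this notebook.

```python
from collections import Counter

NUM_BUCKETS = 8  # assumed table size
names = ["Ben", "Stephanie", "Alan", "Amanda", "Christ", "Freddi", "Steven"]

def length_hash(key):
    """Hash by key length: names of equal length always collide."""
    return len(key) % NUM_BUCKETS

def char_sum_hash(key):
    """Hash by the sum of character codes."""
    return sum(ord(ch) for ch in key) % NUM_BUCKETS

def largest_bucket(hash_fn, keys):
    """Return how many keys fall into the most crowded bucket."""
    usage = Counter(hash_fn(key) for key in keys)
    return max(usage.values())

for fn in (length_hash, char_sum_hash):
    print(fn.__name__, "-> largest bucket holds", largest_bucket(fn, names))
```

On this small sample the length-based hash places four names in one bucket, because four of the names are six characters long.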


##### Python's hash function

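- maps a hashable (typically immutable) object to an integer in the range
```
[-2147483648, 2147483647] in a 32-bit CPython interpreter

[-9223372036854775808, 9223372036854775807] in a 64-bit CPython interpreter
```
- `hash(object) % n` always falls in [0, n-1] and, for typical keys, is spread roughly evenly over that range
> Since Python 3.3, hash randomization is enabled by default: each interpreter process seeds `hash()` differently for `str` and `bytes` objects, so the same string can produce a different hash value from one run to the next (unless the `PYTHONHASHSEED` environment variable is set).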
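
A few of these properties can be checked directly. The sketch below assumes CPython; the integer results shown in the comments are CPython implementation details rather than language guarantees.

```python
import sys

# Width in bits of the values hash() can return on this interpreter
print(sys.hash_info.width)

# In CPython, small integers hash to themselves...
print(hash(10))   # 10
# ...except -1, which CPython reserves as an internal error code
print(hash(-1))   # -2

# Objects that compare equal must have equal hashes
print(hash(1) == hash(1.0))

# Python's % with a positive n never returns a negative number,
# so this is always a valid index even when hash() is negative.
n = 8
index = hash("kavish") % n
print(0 <= index < n)
```

The value of `index` itself is not shown because string hashes vary between runs when hash randomization is active.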


In [None]:
import sys
sys.maxsize

9223372036854775807

In [None]:
hash('kavish')

-4362909624645209864

In [None]:
class Foo:
    def __hash__(self):
        return 999

f = Foo()
hash(f)

999

#### Cryptographic hash
- used in digital signatures
- used in authentication
- used to support non-repudiation

In [None]:
import hashlib

# Create an MD5 hash object
m = hashlib.md5()

# The input string to be hashed
input_string = "Hello, world"
m.update(input_string.encode())
# Get the hexadecimal representation of the hash
hex_dig = m.hexdigest()
print(hex_dig, len(hex_dig)*4)


In [None]:
m = hashlib.md5()
# Read the whole text file; the with-block closes it afterwards
with open("HASHEDDATA.TXT") as f:
    input_string = f.read()
m.update(input_string.encode())
hex_dig = m.hexdigest()
print(hex_dig, len(hex_dig)*4)

In [None]:
m = hashlib.md5()
# Open the image in binary mode; the with-block closes it afterwards
with open("adt_hash_table.png","rb") as f:
    m.update(f.read())
hex_dig = m.hexdigest()
print(hex_dig, len(hex_dig)*4)

#### Collision handling

Since the key space is usually much larger than the bucket space, collisions are bound to occur, so an appropriate collision resolution technique is needed. The following are some examples of resolution techniques.
- **Linear Probing** : When the hash function causes a collision by mapping a new key to a cell of the hash table that is already occupied by another key, linear probing searches the table for the next closest following free location and inserts the new key there. Lookups are performed in the same way, by searching the table sequentially starting at the position given by the hash function, until finding a bucket with a matching key or an empty bucket.

    - *Note that the probing algorithm needs to handle wraparound and overflow conditions*
    - (This type of algorithm is known as closed hashing/open addressing)
    
- **Separate Chaining** : this technique builds a linked list of key-value pairs for each index of the array. Items that collide are chained together in a singly linked list, which can be traversed to find the item with a given search key.
    - (This type of algorithm is known as open hashing/closed addressing)
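
The wraparound and overflow handling mentioned for linear probing can be sketched as follows. The helper names `probe_sequence` and `insert`, the table size of 5, and passing the home index in directly (instead of computing it with a hash function) are simplifying assumptions for illustration.

```python
def probe_sequence(start, size):
    """Yield every index exactly once, starting at `start` and wrapping around."""
    for step in range(size):
        yield (start + step) % size

def insert(table, key, home_index):
    """Place `key` in the first free slot at or after `home_index`.

    Returns the index used, or None if the table is full (overflow).
    """
    for i in probe_sequence(home_index, len(table)):
        if table[i] is None:
            table[i] = key
            return i
    return None

table = [None] * 5
print(insert(table, "A", 3))  # home slot is free
print(insert(table, "B", 3))  # collision: moves to the next slot
print(insert(table, "C", 3))  # collision: wraps around to the start
print(table)
```

Because the probe visits each index only once, a full table ends the loop instead of probing forever.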

> In A-Level, while students are **not expected** to know the technical terms for the different techniques, they are **still required** to be able to describe the collision resolution process.


### Exercise 1

Implement the following array-based `HashTable`, encapsulated using OOP.
Use Python's built-in `hash()` function to obtain the hash value of the object to be stored.


<center>

|`HashTable`|
|------------------------|
|------------------------|
|`+array: ARRAY OF OBJECT`|
|`+size: INTEGER`|
|------------------------|
|`+constructor(INTEGER)`|
|`+isFull(): BOOLEAN`|
|`+isEmpty(): BOOLEAN`|
|`+hash(OBJECT): INTEGER`|
|`+insert(OBJECT): BOOLEAN`|
|`+find(OBJECT): OBJECT`|
|`+delete(OBJECT): BOOLEAN`|
|`+print()`|

</center>

<center>

|Attribute/Method descriptions: |  |
|-|-|
| `HashTable.constructor(INTEGER)`| Initialises a `HashTable` with the given size. |
| `HashTable.insert(OBJECT): BOOLEAN` | Inserts an OBJECT into the `HashTable` (at the index given by the hash value of the object). When a collision occurs, the open addressing strategy of linear probing is used. Returns `True` if the OBJECT is inserted, else returns `False`. |
| `HashTable.find(OBJECT): OBJECT` | Determines whether the given OBJECT exists in the `HashTable` by checking the index equal to the hash value of the OBJECT. If not found there, probes further until all indices have been checked. Returns the OBJECT if found, else returns `None`. |
| `HashTable.delete(OBJECT): BOOLEAN` | Attempts to delete the given OBJECT by first checking at the index equal to the hash value of the OBJECT. If not found there, further probes until all indices have been checked. Returns `True` if found and deleted, else returns `False`. |
| `HashTable.print()` |	Prints the contents of the `HashTable` using `str()` method on each object stored.|	

</center>
Your solution should also include sufficient test cases to adequately test all functionality.

In [None]:
## Code
class HashTable:
    def __init__(self, size):
        self.array = [None for _ in range(size)]
        self.size = size
    def __repr__(self):
        return f"{self.array}"
    def hash(self, obj):
        # reduce the built-in hash to a valid index
        return hash(obj)%self.size
    def insert(self, obj):
        index = self.hash(obj)
        if self.array[index] is None:
            self.array[index] = obj
            return True
        else:
            ## linear probing with wraparound
            i = (index+1)%self.size
            while i != index:
                if self.array[i] is None:
                    self.array[i] = obj
                    return True
                else:
                    i = (i+1)%self.size
            return False  # every slot is occupied

In [None]:
## Complete Hash Table
##LZ
class HashTable:
    def __init__(self,size):
        self.arr = [None for i in range(size)]
        self.size = size

    def __repr__(self):
        return f'{self.arr}'

    def hash(self,obj):
        return hash(obj)%self.size #get remainder so that no index out of range
    
    def insert(self,obj):
        index = self.hash(obj)
        if self.arr[index] is None: #empty
            self.arr[index] = obj
            return True #say that insert was successful 
        else:
            ##linear probing
            i = (index + 1)%self.size #also implementing wrap around
            while i != index:         #probing
                if self.arr[i] is None: #if bucket empty, insert
                    self.arr[i] = obj
                    return True
                else:
                    i = (i+1)%self.size #if not, increment i
            return False                #while loop stopped, meaning no space to insert so return False
        
    def isFull(self):
        for i in self.arr:
            if i is None:       #found an empty bucket
                return False
        return True
    
    def isEmpty(self):
        for i in self.arr:
            if i is not None:   #found an occupied bucket
                return False
        return True

    def find(self,itm):
        index = self.hash(itm)
        if self.arr[index] == itm:
            return itm
        else:
            i = (index+1)%self.size
            while i!= index:
                if self.arr[i] == itm:
                    return itm
                else:
                    i = (i+1)%self.size
            return None
    
    def delete(self,itm):
        index = self.hash(itm)
        if self.arr[index] == itm:
            self.arr[index] = None
            return True
        else:
            i = (index+1)%self.size
            while i!= index:
                if self.arr[i] == itm:
                    self.arr[i] = None
                    return True
                else:
                    i = (i+1)%self.size
            return False

In [None]:
ht = HashTable(3)
ht.insert(1)
ht.insert(2)
ht.insert(1)
ht.insert(5)
print(ht)

### Exercise 2: 
Implement a hash table for a phone book. Each entry in the phone book is a pair of `Name` and `Phone`.
* `Name` is used as the key.
* `(Name, Phone)` tuple is saved as the data.

We will define a class `HashTable` to store the data.
* It has a list attribute `buckets` which keeps all data.
* Initialize the list size, i.e. how many buckets, by input parameter `size`.
* It has a <u>static</u> function `_hash()` which returns an `index` value based on input parameter `key`. The logic to be implemented in `_hash()` function is straightforward. We will simply return the length of the `key` as the `index` value.
* Linear probing is used to resolve collisions.
* The `index` value specifies which bucket to put the data.

In [1]:

## Code for hash table
## Convention: the key (name) is always hashed as a str;
## the data stored in each bucket is the (name, phone) tuple
class HashTable:
    def __init__(self,size=8):
        self.arr = [None for _ in range(size)]
        self.size = size

    def __repr__(self):
        return f"{self.arr}"

    def hash(self,string):
        # index is the length of the key, wrapped to the table size
        return len(string) % self.size

    def insert(self,value):
        # value is a (name, phone) tuple; the name is the key
        index = self.hash(value[0])
        for step in range(self.size):
            i = (index + step) % self.size   # linear probe with wraparound
            if self.arr[i] is None:
                self.arr[i] = value
                return True
        return False                         # table is full

    def find_tuple(self,key):
        # probe every bucket once, starting from the hashed index
        index = self.hash(key)
        for step in range(self.size):
            i = (index + step) % self.size
            if self.arr[i] is not None and self.arr[i][0] == key:
                return self.arr[i][1]        # the phone number
        return None                          # key not in the table

    def remove_tuple(self,key):
        index = self.hash(key)
        for step in range(self.size):
            i = (index + step) % self.size
            if self.arr[i] is not None and self.arr[i][0] == key:
                self.arr[i] = None
                return True
        return False                         # nothing to remove

### Exercise 2.1
Let's try to add following items into the Hash Table.
* Create a hash table of 8 buckets.
* For each element in the list `contacts` below:

>```python
>contacts = [
>    ('Ben', '357-0394'),
>    ('Alan', '558-9171'),
>    ('Freddi', '760-2466'),
>    ('Alison','123-3456'),
>    ('Amanda', '357-0394'),
>    ('Stephanie', '299-5109')]
>```

* Use `_hash()` function to find out which bucket it belongs to;
* Put the contact in the bucket.
* Print out the `buckets` to view how contacts are stored.

In [None]:
## Test cases for Hash Table

### Exercise 2.2

With the populated hash table, how do you retrieve the data for a name, e.g. `'Amanda'`?
* Use `_hash()` function to find `index` value.
* Locate the bucket by index.
* If the bucket holds that name, return it; otherwise probe the following buckets (linear probing) until the name is found.

In [None]:
## Test case for find

### Exercise 2.3

We may need to remove an item, e.g. `'Alison'`, from the hash table.
* Use `_hash()` function to find index value.
* Locate the bucket holding the name, probing onward from that index if necessary, and set it to `None`.

In [None]:
## Test case for delete


## Test case to find Amanda's Phone

#### Using Separate Chaining to resolve collision

Ideally, the hash function assigns each key to a unique bucket. But since a hash function maps a large key space onto a small range of indices, two keys may produce the same index. This is a **hash table collision**.

### Example 5

Consider the following list `contacts` where **the hash function generates the same index value for 4 entries**, and thus those entries are all stored in the same bucket.

>```python
>contacts = [
>    ('Ben', '357-0394'),
>    ('Stephanie', '299-5109'),
>    ('Alan', '558-9171'),
>    ('Amanda', '357-0394'),
>    ('Christ', '558-9171'),
>    ('Freddi', '760-2466'),
>    ('Steven', '299-5109')]
>```

Since four of the contacts have names that are 6 characters long, their hashed indexes point to the same bucket. Thus the bucket needs to be able to hold multiple contacts.
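
The grouping can be seen with a short sketch. It assumes the length-based hash from Exercise 2 and a table of 8 buckets, with each bucket held as a Python list acting as the chain.

```python
contacts = [
    ('Ben', '357-0394'),
    ('Stephanie', '299-5109'),
    ('Alan', '558-9171'),
    ('Amanda', '357-0394'),
    ('Christ', '558-9171'),
    ('Freddi', '760-2466'),
    ('Steven', '299-5109')]

NUM_BUCKETS = 8  # assumed table size
buckets = [[] for _ in range(NUM_BUCKETS)]

# Length-based hash: every 6-letter name maps to bucket 6
for name, phone in contacts:
    buckets[len(name) % NUM_BUCKETS].append((name, phone))

for index, chain in enumerate(buckets):
    print(index, [name for name, _ in chain])
```

Bucket 6 ends up holding Amanda, Christ, Freddi and Steven, which is why a single slot per bucket is not enough here.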

Modify the insert operation to use separate chaining to resolve collisions.

Provide test cases to validate your solution.

____

In [None]:
## Code Separate Chaining

class HashTable:
    def __init__(self, size):
        # one (initially empty) chain per bucket
        self.array = [ []  for _ in range(size)]
        self.size = size
    def __repr__(self):
        return f"{self.array}"
    def hash(self,key):
        return hash(key)%self.size

    def insert(self, item):
        # only (key, value) tuples are accepted
        if not isinstance(item, tuple):
            return False
        index = self.hash(item[0])
        self.array[index].append(item)
        return True

    def find(self, key):
        index = self.hash(key)
        # walk the chain in this bucket looking for the key
        for item in self.array[index]:
            if item[0] == key:
                return item
        return None

        

## Exercise 3 2018/A Level/P1/Q3 H2 Computing
- to be submitted in Coursemology

The file, `HASHEDDATA.TXT`, holds details of the names and telephone numbers of 250 people. 
There are a total of 500 lines in the file, and a number of these lines are empty of name and telephone number.
An index is stored for each line of the file. 
The format of the data in the file is: 
>```
><Index>, <PersonName>, <TelephoneNumber> 
>```

The first 10 lines from the file are shown as follows: 

>```
>0, ,
>1, ,
>2, ,
>3, Boon Keng V., 07492 546415
>4, ,
>5, ,
>6, Ahmad Yusof, 07439 778665
>7, Durno Peter, 07662 863518
>8, Batisah Wong, 07362 156265
>9, ,
>```

The values in the file are separated by the comma character. 

A record structure is used to store a name and telephone number. A data structure of 500 records is needed to store all the names and telephone numbers. Each line in the file is written to a corresponding position in the data structure.

The records with index six to eight from the data structure are: 

<center>

| **Index** | **PersonName** | **TelephoneNumber**  |
|-|-|-|
| 6 | Ahmad Yusof | 07439 778665 |
| 7| Durno Peter | 07662 863518 |
| 8| Batisah Wong | 07362 156265 |

</center>

### Task 1


Use program code to create a:

- record structure to hold the name and telephone number for one person
- data structure, using this record structure to store 500 records.
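
One possible shape for this, offered only as a sketch: a `dataclass` for the record and a list of 500 empty records. The names `Record`, `TABLE_SIZE` and `records` are assumptions, and using empty strings to mark an empty record is a design choice, not something the question prescribes.

```python
from dataclasses import dataclass

@dataclass
class Record:
    """One person's details; empty strings mean the record is unused."""
    name: str = ""
    phone: str = ""

TABLE_SIZE = 500

# A separate Record object for each of the 500 positions
records = [Record() for _ in range(TABLE_SIZE)]

# Example: the record at index 6 from the sample data
records[6] = Record("Ahmad Yusof", "07439 778665")
print(len(records), records[6])
```

The list comprehension matters: `[Record()] * 500` would repeat one shared object 500 times.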



Your program code.      
<div style="text-align: right">[6]</div>

### Task 2

Write program code to:

- read the lines from the file
- extract the `<Index>`, `<PersonName>` and `<TelephoneNumber>` values
- store these values in the data structure.

Create a procedure called `DisplayValues` that will loop through the data structure and display the index, name and telephone number for every record where the name is present.

Ensure your procedure uses headings to identify the data displayed.
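
A minimal sketch of the parsing and display steps is shown below. To stay self-contained it works on a few sample lines copied from the extract above rather than opening `HASHEDDATA.TXT`, stores each record as a plain `(name, phone)` tuple, and uses a table of 10 positions; the names `parse_line` and `display_values` are assumptions. It also assumes that names never contain a comma.

```python
sample_lines = [
    "0, ,",
    "3, Boon Keng V., 07492 546415",
    "6, Ahmad Yusof, 07439 778665",
]

def parse_line(line):
    """Split '<Index>, <PersonName>, <TelephoneNumber>' into its three values."""
    index_text, name, phone = (part.strip() for part in line.split(","))
    return int(index_text), name, phone

def display_values(records):
    """Print a heading, then every record whose name is present."""
    print(f"{'Index':<6}{'Name':<20}{'Telephone'}")
    for index, (name, phone) in enumerate(records):
        if name:
            print(f"{index:<6}{name:<20}{phone}")

records = [("", "")] * 10          # small stand-in for the 500-record structure
for line in sample_lines:
    index, name, phone = parse_line(line)
    records[index] = (name, phone)  # each line goes to its own position

display_values(records)
```

With the real file, the same loop would iterate over the lines of the opened file instead of `sample_lines`.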

Your program code.
<div style="text-align: right">[13]</div>

### Evidence 2

A screenshot showing the output.      
<div style="text-align: right">[1]</div>

A hashing function was used to create the file. The same hashing function can be used to search the data structure for a particular name. The hashing function generates a hash. This is calculated as follows:

>```
>Get SearchName
>Set HashTotal to 0
>FOR each Character in SearchName
>  Get the ASCII code for Character
>  Multiply the ASCII code by the position of Character in SearchName
>  Add the result to the HashTotal
>Calculate Hash as HashTotal MOD 500
>RETURN Hash
>```

### Task 3

Add the program code for the hashing function. Use the following specification:
>```
>FUNCTION GenerateHash(SearchName : STRING) : INTEGER
>```

The function has a single parameter `SearchName` and returns an integer value.
Write additional code for your program to allow you to test the implementation of this function.
The following test data will assist you.
- “Tait Davinder” should return a hash of 87
- “Anandan Yeo” should return a hash of 156
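
The pseudocode for the hashing function translates fairly directly into Python. The sketch below assumes character positions are counted from 1, which is the reading that reproduces both test values; the snake_case name `generate_hash` and the `table_size` parameter are choices made here, not part of the specification.

```python
def generate_hash(search_name, table_size=500):
    """Sum (ASCII code x 1-based position) over the name, then take MOD table_size."""
    hash_total = 0
    for position, character in enumerate(search_name, start=1):
        hash_total += ord(character) * position
    return hash_total % table_size

print(generate_hash("Tait Davinder"))  # 87
print(generate_hash("Anandan Yeo"))    # 156
```

Counting positions from 0 instead would give different results, so the two test values are a useful check on that detail.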



Your program code.     
<div style="text-align: right">[8]</div>

### Evidence

A screenshot (or screenshots) of your program to show the results of the hash calculation for both the given test data values.    
<div style="text-align: right">[2]</div>

The hash calculated from the `SearchName` can be used to find a corresponding record in the data structure.
If the `SearchName` is not found in the record given by the hash **and** the record is not empty:
- compare `SearchName` with the next record
- until the `SearchName` is found or an empty record is found.

If an empty record is found then the program will report that the name is “NOT FOUND”.

If the record is found, the program will output the index, name and telephone number.

### Task 4

Add the program code to implement the search as described.

Your program code.
<div style="text-align: right">[7]</div>

### Evidence

A screenshot (or screenshots) of your program to show the results of the following searches:

Search 1: Charlie Love <br>
Search 2: Chin Tan <br>
Search 3: John Barrowman
<div style="text-align: right">[3]</div>