# Example: Fun with Hashing Functions
This example familiarizes students with the concept of [hash functions](https://en.wikipedia.org/wiki/Hash_function), i.e., functions that map data to an index of a fixed-size table. In particular, we'll look at [the `myhash(...)` implementation](src/Compute.jl), which is an example of a `linear hash function` of the form:
$$
h(x) = (ax+b)~\text{mod}~{m}
$$
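where $\text{mod}$ denotes the [modulo operation](https://en.wikipedia.org/wiki/Modulo), and $a$, $b$, and $m$ are parameters. The $m$ parameter (called the `size`) fixes the number of possible hashcodes, $0,\dots,m-1$, and therefore strongly influences the likelihood of `collisions`, i.e., two different inputs mapping to the same index.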
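To make the formula concrete, here is a minimal sketch of $h(x)$ for integer inputs. The function name `linear_hash` and the default values of `a`, `b`, and `m` are hypothetical choices for illustration; they are not taken from `src/Compute.jl`.

```julia
# Hypothetical helper (not part of src/Compute.jl): h(x) = (a*x + b) mod m
# `mod` (rather than `%`) keeps the result in 0:(m - 1), even for negative inputs
linear_hash(x::Integer; a::Integer = 31, b::Integer = 7, m::Integer = 10) = mod(a * x + b, m)

# Hash the integers 0..19 into a table with m = 10 slots
hashcodes = [linear_hash(x) for x in 0:19]
println(hashcodes)

# 20 inputs but at most 10 distinct hashcodes, so some inputs must share a slot
println("distinct hashcodes: ", length(unique(hashcodes)))
```

Because every hashcode lies in `0:(m - 1)`, hashing more than `m` distinct inputs must produce at least one collision.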

## Setup
This example may use external third-party packages. In [the `Include.jl` file](Include.jl), we load our code so that it is accessible in the notebook, set the paths required for this example, and load any required external packages.

In [3]:
include("Include.jl");

## Example: Computing the hashcode of a String using a Linear hash function
One of the interesting (and amazing!) things about the [Dictionary type in Julia](https://docs.julialang.org/en/v1/base/collections/#Base.Dict) or [Python](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) is the ability to map `any key` to `any value` (written `key => value` in Julia), where each `key` is unique. But how does this work?
* Behind the scenes, we use a `hash` function to take a `key` and convert it into an `Int` index into an `Array{typeof(value),1}` that holds the value. Thus, the magic of a dictionary is just a clever way of computing an array index.
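The idea of "a dictionary is a computed array index" can be sketched in a few lines. The `toy_hash` function below is a hypothetical stand-in that applies the linear step $h \leftarrow (\beta h + c)~\text{mod}~m$ once per character $c$; it is an assumption made for illustration, not the code in `src/Compute.jl`, and the toy table ignores collisions entirely.

```julia
# Hypothetical string hash (an assumption, not the code in src/Compute.jl):
# apply the linear step h <- (β*h + c) mod size once per character c
function toy_hash(key::String; β::Int = 31, size::Int = 1000)
    h = 0
    for c in key
        h = mod(β * h + Int(c), size) # Int(c) is the character's code point
    end
    return h
end

# A toy "dictionary": a plain array whose index is computed from the key
table_size = 1000
slots = Vector{Union{Nothing, Float64}}(nothing, table_size)

# Store a value: hashcodes are 0-based, Julia arrays are 1-based, so shift by one
slots[toy_hash("pressure"; size = table_size) + 1] = 101.325

# Look the value up again by recomputing the same index from the key
value = slots[toy_hash("pressure"; size = table_size) + 1]
println(value)
```

A real dictionary also has to handle two keys that compute the same index, for example by storing the key next to the value and probing for another slot; this sketch omits that step.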

Let's specify a `test_string` with some data in it:

In [5]:
test_string = "This is a test string. In lecture ... wow!";

Next, let's use [the `myhash(...)` function](src/Compute.jl) to compute the `hashcode` of the `test_string`:

In [7]:
test_hashed_value = myhash(test_string, β = 31, size = 1000) # big size

227

### OK, but does this trick always work?
The short answer is no; sometimes, there may be `collisions` when generating the `hashcode` of a `key`. When the `size` parameter is small, there are only a few possible hashcodes, so there is a higher likelihood that two different strings will get mapped to the same `hashcode`.
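The effect of `size` on collisions can be checked empirically. The sketch below uses a hypothetical `toy_hash` function (an assumed character-by-character linear hash, not the code in `src/Compute.jl`) and made-up keys of the form `key-1`, `key-2`, and so on; it counts how many keys land on a hashcode that an earlier key already produced.

```julia
# Hypothetical string hash (an assumption, not the code in src/Compute.jl)
function toy_hash(key::String; β::Int = 31, size::Int = 10)
    h = 0
    for c in key
        h = mod(β * h + Int(c), size)
    end
    return h
end

# Count the keys whose hashcode was already produced by an earlier key
function count_collisions(ks::Vector{String}; size::Int = 10)
    seen = Set{Int}()
    n = 0
    for key in ks
        h = toy_hash(key; size = size)
        if h in seen
            n += 1
        else
            push!(seen, h)
        end
    end
    return n
end

# 50 made-up keys, hashed into tables of increasing size
test_keys = ["key-$(i)" for i in 1:50]
for m in (10, 100, 1000, 10_000)
    println("size = $(m): ", count_collisions(test_keys; size = m), " collisions")
end
```

With 50 keys and `size = 10`, there are at most 10 distinct hashcodes, so at least 40 keys must collide regardless of how good the hash function is.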

Let's check this out with another example. 

In [9]:
another_test_string = "CHEME-4800-5800-Fall-2024"

"CHEME-4800-5800-Fall-2024"

In [10]:
another_hashed_value = myhash(another_test_string, β = 31, size = 10)

6

#### Can we generate a collision?
Let's chop up `another_test_string::String` and hash each prefix with small and large values of the `size::Int64` parameter. We expect that with small values of `size`, we will see collisions. However, as `size` becomes larger, the frequency of collisions should decrease. Let's check out this intuition.

In [12]:
N = length(another_test_string)
collisions = Set{Int64}(); # hashcodes seen so far
for i ∈ 1:N

    # hash the first i characters of the string
    test_string = first(another_test_string, i)
    test_value = myhash(test_string, β = 31, size = 10)

    @show (test_string, test_value)

    # a collision occurs when an earlier prefix produced the same hashcode
    if test_value ∉ collisions
        push!(collisions, test_value);
    else
        println("We have a collision: $(test_string) with $(test_value)")
    end
end

(test_string, test_value) = ("C", 7)
(test_string, test_value) = ("CH", 9)
(test_string, test_value) = ("CHE", 8)
(test_string, test_value) = ("CHEM", 5)
(test_string, test_value) = ("CHEME", 4)
(test_string, test_value) = ("CHEME-", 9)
We have a collision: CHEME- with 9
(test_string, test_value) = ("CHEME-4", 1)
(test_string, test_value) = ("CHEME-48", 7)
We have a collision: CHEME-48 with 7
(test_string, test_value) = ("CHEME-480", 5)
We have a collision: CHEME-480 with 5
(test_string, test_value) = ("CHEME-4800", 3)
(test_string, test_value) = ("CHEME-4800-", 8)
We have a collision: CHEME-4800- with 8
(test_string, test_value) = ("CHEME-4800-5", 1)
We have a collision: CHEME-4800-5 with 1
(test_string, test_value) = ("CHEME-4800-58", 7)
We have a collision: CHEME-4800-58 with 7
(test_string, test_value) = ("CHEME-4800-580", 5)
We have a collision: CHEME-4800-580 with 5
(test_string, test_value) = ("CHEME-4800-5800", 3)
We have a collision: CHEME-4800-5800 with 3
(test_string, test_v