# Hashing Functions

## Introduction

Hashing functions map data of arbitrary size to values of a fixed size, called hash values (or simply hashes). They are used in various applications, including data structures, cryptography, and error detection. In Python, hashing functions are used to implement the hash tables behind dictionaries and sets, as well as for cryptographic purposes.

### Examples

<img src='files/hash_function.png' source='wikipedia'>

### Properties of Hashing Functions

1. **Deterministic**: The same input will always produce the same hash output.
1. **Fixed Size**: The hash value has a fixed size, regardless of the size of the input.
1. **Efficient**: The computation of the hash value is typically fast.
1. **Uniformity**: The distribution of hash values should be uniform to avoid clustering.
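1. **Irreversibility**: For cryptographic hash functions, it should be computationally infeasible to reconstruct the input from the hash value.
1. **Collision Resistance**: For cryptographic hash functions, it should be computationally infeasible to find two different inputs that produce the same hash value (a collision).
1. **"Salt"**: For security, a random value called a "salt" is sometimes added to the input before hashing, so that identical inputs do not produce identical hashes.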
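
A few of these properties can be checked directly. The sketch below uses SHA-256 from the standard `hashlib` module as an example hash function; the helper name `sha256_hex` and the sample inputs are introduced here only for illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of `text` as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Deterministic: hashing the same input twice gives the same digest.
same_input_same_hash = sha256_hex("hello") == sha256_hex("hello")

# Fixed size: a SHA-256 digest is always 64 hex characters (256 bits),
# whatever the length of the input.
short_len = len(sha256_hex("a"))
long_len = len(sha256_hex("a" * 10_000))

# Changing a single character of the input gives a different digest.
differs = sha256_hex("hello") != sha256_hex("hellp")

print(same_input_same_hash, short_len, long_len, differs)
```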

## Why use Hashing functions?

1. **Data Structures**: Hash tables are used in dictionaries and sets for efficient data retrieval.
1. **Cryptography**: Cryptographic hashing functions are used for digital signatures, message authentication codes (MACs), and password storage.
1. **Data Integrity**: Hashing is used to verify that data has not been altered.
1. And many more...
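
As a rough illustration of the data integrity and password storage use cases, here is a minimal sketch built on the standard `hashlib`, `hmac`, and `os` modules. The sample data, the helper names `is_intact` and `hash_password`, and the iteration count are assumptions chosen for the example, not recommended settings.

```python
import hashlib
import hmac
import os

# Data integrity: compare the digest of received data with a trusted digest.
original = b"important report v1"
expected_digest = hashlib.sha256(original).hexdigest()


def is_intact(data: bytes) -> bool:
    """Return True if `data` still matches the trusted digest."""
    actual_digest = hashlib.sha256(data).hexdigest()
    return hmac.compare_digest(actual_digest, expected_digest)


received_ok = is_intact(b"important report v1")
tampered_ok = is_intact(b"important report v2")


# Password storage: a random salt makes identical passwords hash differently.
def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a key with PBKDF2-HMAC-SHA256 (illustrative iteration count)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


salt_a = os.urandom(16)
salt_b = os.urandom(16)
hash_a = hash_password("s3cret", salt_a)
hash_b = hash_password("s3cret", salt_b)

print(received_ok, tampered_ok, hash_a != hash_b)
```

To check a password later, the stored salt is reused: `hash_password("s3cret", salt_a)` gives `hash_a` again.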

## Built-in Hashing with `hash()`

Python provides a built-in `hash()` function that returns an integer hash value for hashable objects, which include immutable built-ins such as strings, numbers, and tuples of hashable items. This function is primarily used to implement the hash tables behind dictionaries and sets.

### Examples:

In [None]:
# Hashing a string
hash("Hello, World!")

In [None]:

# Hashing a number
hash(12345)

In [None]:
# Hashing a tuple
hash((1, 2, 3))

In [None]:
# Raises a TypeError: sets are mutable and therefore unhashable (uncomment to try)
#hash({1, 2, 3})

In [None]:
# Raises a TypeError: dictionaries are mutable and therefore unhashable (uncomment to try)
#hash({'a': 1})
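
The cells above can be summarized in one runnable sketch. The helper `is_hashable` is a hypothetical name introduced here; it only wraps `hash()` in a `try`/`except`. The sketch also shows that objects comparing equal have equal hashes. Note that in CPython the hash of a string usually changes from one interpreter run to the next, because string hashing is randomized by default.

```python
# Objects that compare equal must have equal hashes.
equal_hashes = hash(1) == hash(1.0) == hash(True)


def is_hashable(obj) -> bool:
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


results = {
    "str": is_hashable("Hello, World!"),
    "int": is_hashable(12345),
    "tuple": is_hashable((1, 2, 3)),
    "set": is_hashable({1, 2, 3}),
    "dict": is_hashable({"a": 1}),
}

print(equal_hashes, results)
```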

## Mutable vs. Immutable Types

In Python, data types can be categorized as mutable or immutable.

### Immutable Types

Immutable types are types whose instances cannot be modified after they are created. Examples:

- **Strings**: `str`
- **Numbers**: `int`, `float`, `complex`
- **Tuples**: `tuple`
- **Frozen Sets**: `frozenset`
- **Booleans**: `bool`
- **Ranges**: `range`
- **Bytes**: `bytes`

### Mutable Types

Mutable types are types whose instances can be modified after they are created. Among Python's built-in types:

- **Lists**: `list`
- **Dictionaries**: `dict`
- **Sets**: `set`
- **Byte Arrays**: `bytearray`

Many objects provided by other libraries are mutable as well.

### Example: Mutable vs. Immutable

In [None]:
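# Immutable type: string
s = "hello"
print(id(s))  # Identity of the original string (its memory address in CPython)
s = s.upper()  # upper() returns a new string; the name s is rebound to it
print(id(s))  # Different id: the original string was not modified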

In [None]:
# Mutable type: list
l = [1, 2, 3]
print(id(l))  # Identity of the list (its memory address in CPython)
l.append(4)  # append() modifies the list in place
print(id(l))  # Same id: it is still the same object
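
This distinction matters for hashing: the built-in mutable containers cannot be used as dictionary keys or set elements, while immutable containers of hashable items can. A small sketch, with sample keys and variable names chosen only for illustration:

```python
# Immutable containers of hashable items work as dictionary keys.
locations = {(48.85, 2.35): "Paris"}
groups = {frozenset({"a", "b"}): "group 1"}

# A list is mutable, so Python refuses to hash it.
try:
    {[48.85, 2.35]: "Paris"}
    list_key_error = None
except TypeError as exc:
    list_key_error = str(exc)

# A tuple is only hashable if everything inside it is hashable too.
try:
    hash((1, [2, 3]))
    nested_error = None
except TypeError as exc:
    nested_error = str(exc)

print(locations[(48.85, 2.35)], groups[frozenset({"b", "a"})])
print(list_key_error)
print(nested_error)
```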

## Hash table

A hash map (or hash table) is a data structure that maps keys to values using a hash function. Python uses hash maps to implement dictionaries and sets efficiently.

In [None]:
67890 % 8

In [None]:
-8621292912452721651 % 8

In [None]:
8893 % 8

In [None]:
# Creating a dictionary
my_dict = {}

# Adding a key-value pair
my_dict['name'] = 'Alice'

# Conceptually, Python does the following (simplified):
# 1. Compute the hash value of the key 'name': hash_value = hash('name')
#    Let's say hash('name') = 12349
# 2. Determine the index in the hash table using the hash value.
#    index = hash_value % table_size
#    Let's say table_size = 8, so index = 12349 % 8 = 5
# 3. Store the key 'name', its hash value 12349, and the value 'Alice' in the hash table at index 5.

# Adding another key-value pair
my_dict['age'] = 25

# Conceptually, Python does the following (simplified):
# 1. Compute the hash value of the key 'age': hash_value = hash('age')
#    Let's say hash('age') = 67890
# 2. Determine the index in the hash table using the hash value.
#    index = hash_value % table_size
#    Let's say table_size = 8, so index = 67890 % 8 = 2
# 3. Store the key 'age', its hash value 67890, and the value 25 in the hash table at index 2.

### Graphic representation

| Index | Hash Value | Key  | Value  |
|-------|------------|------|--------|
| 0     |            |      |        |
| 1     |            |      |        |
| 2     | 67890      | age  | 25     |
| 3     |            |      |        |
| 4     |            |      |        |
| 5     | 12349      | name | Alice  |
| 6     |            |      |        |
| 7     |            |      |        |
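
The slot computation can be tried out directly. This is a simplified sketch: `slot_index` is a hypothetical helper, the table size of 8 is taken from the walkthrough, and the real CPython implementation is more involved than a plain remainder. String hashes differ between runs, so the printed slots for `'name'` and `'age'` will vary; small integers are used for the reproducible part because in CPython they hash to themselves.

```python
table_size = 8


def slot_index(key, size=table_size):
    """Simplified slot computation: hash the key, then take the remainder."""
    return hash(key) % size


# String hashes change between interpreter runs, so these slots vary.
for key in ("name", "age"):
    print(key, hash(key), slot_index(key))

# Small integers hash to themselves in CPython, which makes them easy to follow.
int_slots = {n: slot_index(n) for n in (2, 5, 12349)}
print(int_slots)
```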

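### What about collisions?

What if an index is already taken? Python does not rebuild the table for every collision. It probes other slots of the same table, following a sequence derived from the hash value, until it finds a free one. A larger table is created, and all the elements are inserted again, only when the table becomes too full (about two-thirds full in CPython).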
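
One classic way to handle a slot that is already taken is open addressing with linear probing: try the next slot until a free one is found, and grow the table when it gets too full. The toy class below (`ToyHashTable`, a hypothetical name) sketches that idea. It is a teaching simplification with no deletion support, not how CPython's `dict` is actually implemented.

```python
class ToyHashTable:
    """Minimal open-addressing hash table with linear probing (no deletion)."""

    def __init__(self, size=8):
        self.slots = [None] * size  # each slot: None or (hash, key, value)
        self.count = 0

    def _find_slot(self, key):
        """Return the index holding `key`, or the first free index for it."""
        size = len(self.slots)
        index = hash(key) % size
        while self.slots[index] is not None and self.slots[index][1] != key:
            index = (index + 1) % size  # collision: try the next slot
        return index

    def _grow(self):
        """Double the table and re-insert every stored entry."""
        old_entries = [entry for entry in self.slots if entry is not None]
        self.slots = [None] * (len(self.slots) * 2)
        self.count = 0
        for _, key, value in old_entries:
            self[key] = value

    def __setitem__(self, key, value):
        # Keep at least one third of the slots free so probing stays short.
        if (self.count + 1) * 3 > len(self.slots) * 2:
            self._grow()
        index = self._find_slot(key)
        if self.slots[index] is None:
            self.count += 1
        self.slots[index] = (hash(key), key, value)

    def __getitem__(self, key):
        entry = self.slots[self._find_slot(key)]
        if entry is None:
            raise KeyError(key)
        return entry[2]


table = ToyHashTable()
table[5] = "five"
table[13] = "thirteen"  # 13 % 8 == 5, so this collides with key 5
print(table[5], table[13])
```

With a table of 8 slots, key `13` lands in slot 6 because slot 5 is already used by key `5`.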

### Speed

That's why Python is so fast when managing dictionaries and sets. Instead of looping over the entire object to find a key, it computes the index directly from the key's hash value. 🤯
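
The difference can be measured with the standard `timeit` module. The sketch below compares a membership test on a list with the same test on a set; the collection size and the number of repetitions are arbitrary choices, and the exact timings depend on the machine.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
needle = n - 1  # worst case for the list: the last element

# The list is scanned element by element; the set uses the hash of the value.
list_time = timeit.timeit(lambda: needle in as_list, number=100)
set_time = timeit.timeit(lambda: needle in as_set, number=100)

print(f"list: {list_time:.6f} s, set: {set_time:.6f} s")
```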

## Let's go further!

### Using different algorithms with `hashlib`

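For cryptographic purposes, Python provides the `hashlib` module, which includes various hash and message digest algorithms such as SHA-256, SHA-3, and MD5. Note that MD5 is no longer considered collision resistant and should not be used where security matters.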

In [None]:
import hashlib

# Create a SHA-256 hash object
hash_object = hashlib.sha256()

# Update the hash object with the input data
hash_object.update("Hello, World!".encode('utf-8'))

# Get the hexadecimal representation of the hash value
hash_object.hexdigest()

In [None]:
import hashlib

# Create a SHA-3 hash object
hash_object = hashlib.sha3_256()

# Update the hash object with the input data
hash_object.update("Hello, World!".encode('utf-8'))

# Get the hexadecimal representation of the hash value
hash_object.hexdigest()
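
A few other ways of using `hashlib` give the same result as the cells above. This sketch relies only on the standard library; the variable names are chosen for illustration.

```python
import hashlib

data = "Hello, World!".encode("utf-8")

# Passing the data to the constructor is equivalent to calling update().
one_shot = hashlib.sha256(data).hexdigest()

# update() can be called several times; the chunks are concatenated.
chunked = hashlib.sha256()
chunked.update(b"Hello, ")
chunked.update(b"World!")

# hashlib.new() selects an algorithm by name.
by_name = hashlib.new("sha256", data).hexdigest()

# Digest sizes in bytes for a few algorithms.
sizes = {
    name: hashlib.new(name).digest_size
    for name in ("sha1", "sha256", "sha512", "sha3_256")
}

print(one_shot == chunked.hexdigest() == by_name)
print(sizes)
```

Feeding the data in chunks is useful for large files, since the whole content never has to be held in memory at once.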