# Computational Theory Problems

In [171]:
import numpy as np 
import math

## Introduction

This notebook explores key aspects of the Secure Hash Standard, with a particular focus on the SHA-256 hashing algorithm. The aim of this project is to understand how the various logical and mathematical operations specified in the standard come together to create a secure cryptographic hash function.

We examine each major stage of SHA-256 by analysing its underlying computational characteristics:
* Logical functions and bitwise operations that introduce non-linearity and variation at each round of the algorithm.
* Generation of the SHA-256 round constants, derived from the fractional parts of the cube roots of the first 64 prime numbers. Because anyone can recompute them, they are commonly described as "nothing-up-my-sleeve" values, chosen openly rather than to hide a weakness.



## Problem 1: Binary Words and Operations

### Overview

In this problem, we implement the core logical and bitwise operations defined in the Secure Hash Standard for the SHA-256 algorithm.

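These operations are Parity, Ch, Maj, Σ₀, Σ₁, σ₀, and σ₁. Parity belongs to SHA-1 and is included for background. Ch, Maj, Σ₀, and Σ₁ are applied in every round of the SHA-256 compression function, while σ₀ and σ₁ are used to expand each message block into the message schedule. Together they provide the non-linearity and bit mixing that the hash relies on for cryptographic security.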

To ensure correctness and match the behaviour described in the standard, all computations are performed using NumPy 32-bit integers.
This guarantees that operations such as bitwise shifts, rotates, and modular additions behave exactly as they do in SHA-256, where every value must remain within a 32-bit word.

By implementing these functions, we get a clear idea of how SHA-256 performs its basic logical operations.
We also need these functions later in Problems 3 and 4, where they are used during block processing and hash calculation.

### SHA-1 Functions
SHA-1 is shown here for background, because some of its logical functions (Parity, Ch, Maj) form the foundation for the expanded operations used in the SHA-2 family.<br><br>
SHA-1 runs 80 rounds of bitwise operations. Depending on the round, one of three logical functions (Ch, Parity, or Maj) mixes three 32-bit words (x, y, z).

The algorithm’s output is a 160-bit hash value, represented as five 32-bit words.<br><br>
Every rotation, XOR, or addition must stay within 32 bits — if anything overflows, <br>
`np.uint32` wraps it automatically, keeping the math identical to real 32-bit hardware.<br><br>
Although no longer considered secure, its structure influenced the development of SHA-224 and SHA-256.
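
The wrap-around described above is arithmetic modulo 2^32. As a minimal sketch in plain Python (whose integers never overflow), the same behaviour can be reproduced by masking with `0xFFFFFFFF`. The helper name `add32` is illustrative and is not part of the notebook.

```python
MASK32 = 0xFFFFFFFF  # 2**32 - 1, keeps only the low 32 bits

def add32(a, b):
    """Add two words modulo 2**32 (hypothetical helper for illustration)."""
    return (a + b) & MASK32

print(hex(add32(0xFFFFFFFF, 1)))           # 0x0: the carry out of bit 31 is discarded
print(hex(add32(0x12345678, 0x11111111)))  # 0x23456789: no overflow, ordinary sum
```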

### Parity Function

**Formula:** Parity(x, y, z) = x XOR y XOR z

**Purpose:** This function mixes three 32-bit numbers (x, y, and z) using XOR to add variation to the hash.

**What function is doing:**
1.  **x XOR y**:<br>
        Compares each bit of x with the corresponding bit of y.<br>
        If the bits differ, the result is 1; otherwise 0.

2.  **(x XOR y) XOR z**:<br>
        Takes the result of step 1 and compares it bit by bit with *z*.<br>
        If the bits differ, the result is 1; otherwise 0.<br>
        (At each bit position, the result is 1 exactly when an odd number of the three input bits are 1.)
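
The odd-count behaviour can be checked on single bits. The sketch below is plain Python, independent of the notebook's NumPy code, and builds the full truth table for one bit position. The name `table` is illustrative.

```python
from itertools import product

# Truth table for one bit position: all 8 combinations of x, y and z
table = {(x, y, z): x ^ y ^ z for x, y, z in product((0, 1), repeat=3)}

for (x, y, z), parity in table.items():
    # The XOR of three bits is 1 exactly when an odd number of them are 1
    assert parity == (x + y + z) % 2
    print(x, y, z, "->", parity)
```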



In [172]:
def Parity(x, y, z):
    """Computes the SHA-1 Parity function."""
    #Assigning variables as 32 bit unsigned integers
    x = np.uint32(x)
    y = np.uint32(y)
    z = np.uint32(z)
    
    #Returns result of parity function as 32-bit unsigned integer
    return (x ^ y ^ z)

#Testing function 
result = Parity(14, 6, 1)
print(f"Binary result: {result:032b}") #Binary output as 32-bit word
print(f"Decimal result: {result}") #Result in decimal form


Binary result: 00000000000000000000000000001001
Decimal result: 9


### Ch (Choose) Function
 **Formula:** Ch(x, y, z) = (x AND y) XOR ((NOT x) AND z)

 **Purpose:** This function picks bits from y or z depending on the value of x, helping to mix data in each round.

**What function is doing:**
1.  **x & y**:<br>
Compares each bit of x with the corresponding bit of y.<br>
If both bits are 1, the result is 1, otherwise 0.

2.  **~x & z**:<br>
Flips every bit of x to its opposite value.<br>
Compares each bit of the flipped x with the corresponding bit of z.<br>
If both bits are 1, the result is 1, otherwise 0.

3.  **XOR**:<br> 
Compares Step 1 against Step 2 bit by bit.<br>
If bits differ, result is 1; otherwise 0.
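
One way to read the three steps above is that x acts as a selector: where a bit of x is 1 the output bit comes from y, and where it is 0 the output bit comes from z. The following plain-Python sketch compares the formula with an explicit bit-by-bit selection. The names `ch` and `select_bits` are illustrative, and the mask is needed only because Python integers are unbounded.

```python
MASK32 = 0xFFFFFFFF

def ch(x, y, z):
    """Ch from the formula; ~x is masked back to 32 bits."""
    return ((x & y) ^ (~x & z)) & MASK32

def select_bits(x, y, z):
    """Pick each bit from y where x has a 1, and from z where x has a 0."""
    out = 0
    for i in range(32):
        source = y if (x >> i) & 1 else z
        out |= ((source >> i) & 1) << i
    return out

print(ch(13, 6, 8), select_bits(13, 6, 8))  # 4 4
```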

In [173]:
def Ch(x, y, z):
    """Computes the SHA-1 Ch (choose) function"""
    #Assigning variables as 32 bit unsigned integers
    x = np.uint32(x)
    y = np.uint32(y)
    z = np.uint32(z)

    #Returns result of Ch function as 32 bit unsigned integer
    return (x & y) ^ (~x & z)

#Testing function 
result = Ch(13, 6, 8)
print(f"Binary result: {result:032b}") #Binary output as 32-bit word
print(f"Decimal result: {result}") #Result in decimal form

Binary result: 00000000000000000000000000000100
Decimal result: 4


### Maj Function
 **Formula:** Maj(x, y, z) = (x AND y) XOR (x AND z) XOR (y AND z)

 **Purpose:** This function finds the most common bit among x, y, and z to help SHA-1 mix values fairly.

**What function is doing:**
1.  **x & y**:<br>
Compares each bit of x with the corresponding bit of y.<br>
If both bits are 1, the result is 1, otherwise 0.

2.  **x & z**:<br>
Compares each bit of x with the corresponding bit of z.<br>
If both bits are 1, the result is 1, otherwise 0.

3.  **y & z**:<br>
Compares each bit of y with the corresponding bit of z.<br>
If both bits are 1, the result is 1, otherwise 0.

4. **XOR**:<br>
Combines the three results from Steps 1–3 bit by bit.<br>
The result bit is 1 when an odd number of the three bits are 1; otherwise 0. For these three AND terms, that happens exactly when at least two of x, y, and z have a 1 in that position.
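
The "most common bit" reading can be confirmed by checking all eight single-bit inputs against a direct majority vote. This is a plain-Python sketch, and the name `maj_bit` is illustrative.

```python
from itertools import product

def maj_bit(x, y, z):
    """Maj formula from the text, applied to single bits."""
    return (x & y) ^ (x & z) ^ (y & z)

for x, y, z in product((0, 1), repeat=3):
    majority = 1 if (x + y + z) >= 2 else 0
    assert maj_bit(x, y, z) == majority
    print(x, y, z, "->", maj_bit(x, y, z))

# On whole words the same formula acts on every bit position independently
print((19 & 21) ^ (19 & 4) ^ (21 & 4))  # 21
```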

In [174]:
def Maj(x, y, z):
    """Computes the SHA-1 Maj function"""
    #Assigning variables as 32 bit unsigned integers
    x = np.uint32(x)
    y = np.uint32(y)
    z = np.uint32(z)

    #Returns result of maj function as 32 bit unsigned integer
    return ((x & y) ^ (x & z) ^ (y & z))

#Testing function 
result = Maj(19, 21, 4)
print(f"Binary result: {result:032b}") #Binary output as 32-bit word
print(f"Decimal result: {result}") #Result in decimal form

Binary result: 00000000000000000000000000010101
Decimal result: 21


###  SHA-2 Functions
SHA-224 and SHA-256 use the same logical building blocks as SHA-1, but operate within the SHA-2 family of hash algorithms.<br><br>
Unlike SHA-1, which runs 80 rounds, SHA-224 and SHA-256 each run 64 rounds,<br>
using a larger 256-bit state and more complex logical functions for stronger security.<br>

SHA-256 uses a mix of six functions: Ch, Maj, Σ₀, Σ₁, σ₀, and σ₁.<br>

Instead of five 32-bit words (like SHA-1’s 160-bit state), SHA-2 works with eight 32-bit words, giving outputs of 224 or 256 bits.<br><br>
These functions introduce non-linearity and help ensure that each round transforms the message in a way that is irreversible and highly sensitive to changes in input<br>([SHA-224 and SHA-256 explanation](https://mojoauth.com/compare-hashing-algorithms/sha-224-vs-sha-256/))

### Rotate Right Function

**Formula:** ROTR^n(x) = (x >> n) OR (x << (w - n))<br>
Where w = 32 for a 32-bit word


**Purpose:** This `rotr` function ([adapted from GeeksforGeeks](https://www.geeksforgeeks.org/dsa/rotate-bits-of-an-integer/)) performs a circular right shift within a fixed 32-bit word,<br>
moving the bits that fall off the right back to the left side.

**What function is doing:**<br>
**n = n % 32:**<br>
Ensures the rotation amount stays between 0–31.<br>
Rotating by 32 (or any multiple) returns the number unchanged.<br>
E.g. if n = 35, then n % 32 = 3, so x is rotated by 3 bits.<br>

1. **x >> n**:<br>
        Shifts every bit of x to the right by n positions.<br>
        The rightmost bits are dropped and zeros fill in on the left.<br>


2. **32 - n**:<br>
        Represents the number of bits that need to wrap around from the right side to the left.<br>

3. **x << (32 - n)**:<br>
        Moves the dropped bits back to the left side of the 32-bit word, maintaining circular rotation.<br>

4. **OR:**<br>
        Combines the results from Step 1 and Step 3 bit by bit.<br>
        If either bit is 1, the result bit is 1; otherwise it is 0.
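
The steps above can also be sketched without NumPy. In plain Python the left shift does not discard high bits, so the result has to be masked back to 32 bits explicitly. The name `rotr32` is illustrative and is not the notebook's `rotr`.

```python
MASK32 = 0xFFFFFFFF

def rotr32(x, n):
    """Rotate a 32-bit word right by n positions (plain-Python sketch)."""
    n %= 32
    # The mask discards bits pushed above bit 31 by the left shift
    return ((x >> n) | (x << (32 - n))) & MASK32

print(rotr32(12, 2))                         # 3
print(hex(rotr32(1, 1)))                     # 0x80000000: the dropped bit wraps to the top
print(rotr32(12, 35) == rotr32(12, 3))       # True: 35 % 32 == 3
print(rotr32(0xDEADBEEF, 32) == 0xDEADBEEF)  # True: a full rotation changes nothing
```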


        



In [175]:
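def rotr(x, n):
    """Performs a 32-bit circular right rotation"""
    #Assigning variables as 32 bit unsigned integer
    x = np.uint32(x)
    #Keep rotation within 32 bit word range
    n = n % 32
    #Shift bits right by n and move lost bits to left side
    #Shift amounts are cast to np.uint32 so the result stays a 32 bit word;
    #the second % 32 avoids shifting by the full word width when n is 0
    rotated = (x >> np.uint32(n)) | (x << np.uint32((32 - n) % 32))

    #Returning result as 32 bit unsigned integer
    return rotated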

#Testing function 
result = rotr(12, 2)
print(f"Binary result: {result:032b}") #Binary output as 32-bit word
print(f"Decimal result: {result}") #Result in decimal form

Binary result: 00000000000000000000000000000011
Decimal result: 3


### Right Shift Function

**Formula:** SHR^n(x) = (x >> n) <br>

**Purpose:** This `rightShift` function performs a right shift operation by moving bits of x to the right by n positions,<br>
filling in the left side with zeros.

**What function is doing:**<br>

1. **n = shift amount (0–31):**<br>
Specifies how many positions the bits in x will move to the right.  

2. **x >> n:**<br>
Shifts every bit of x right by n positions.  
The rightmost bits are dropped, and zeros fill in on the left.  

3. **np.uint32(x):**<br>
Converting x to a 32-bit unsigned integer before shifting ensures the result stays within a 32-bit unsigned integer range.   
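
The difference between a shift and a rotation is easiest to see side by side. In this plain-Python sketch (the names `shr32` and `rotr32` are illustrative), the same word is shifted and rotated by 2 positions: the shift loses the two low bits, while the rotation moves them to the top of the word.

```python
MASK32 = 0xFFFFFFFF

def shr32(x, n):
    """Logical right shift: bits shifted out are lost."""
    return (x & MASK32) >> n

def rotr32(x, n):
    """Circular right shift: bits shifted out re-enter on the left."""
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32

word = 0b1011
print(hex(shr32(word, 2)))   # 0x2: the low bits 11 are discarded
print(hex(rotr32(word, 2)))  # 0xc0000002: the low bits 11 wrap to bits 31 and 30
```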



In [176]:
def rightShift(x, n):
    """Performs a logical right shift operation"""
    #Assigning variable as 32 bit unsigned integer
    x = np.uint32(x)
    
    #Shift bits right by n; zeros fill the vacated positions on the left
    shifted = (x >> np.uint32(n))

    #Returning result as 32-bit unsigned integer
    return shifted

#Testing function 
result = rightShift(58, 2)
print(f"Binary result: {result:032b}") #Binary output as 32-bit word
print(f"Decimal result: {result}") #Result in decimal form

Binary result: 00000000000000000000000000001110
Decimal result: 14


### Σ₀ (Sigma0) Function

**Formula:** Σ₀(x) = ROTR^2(x) XOR ROTR^13(x) XOR ROTR^22(x)<br>

**Purpose:** This function rotates the bits of input x to the right by 2, 13, and 22 bits and XORs the three results together.<br>
It spreads each input bit across several output positions, which increases the mixing of bits in the hash.

**What the function is doing:**<br>

1. **rotr(x, 2):**<br>
   Calls the `rotr` function.<br>
   Rotates all bits of *x* to the right by 2 bits.

2. **rotr(x, 13):**<br>
   Calls the `rotr` function.<br>
   Rotates all bits of *x* to the right by 13 bits.

3. **rotr(x, 22):**<br>
   Calls the `rotr` function.<br>
   Rotates all bits of *x* to the right by 22 bits.

4. **XOR:**<br>
   Combines the results from Steps 1–3 bit by bit.<br>
   A result bit is 1 when an odd number of the three bits in that position are 1; otherwise 0.
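
To make the four steps concrete, the sketch below prints each rotation of x = 12 before combining them. It is written in plain Python with an illustrative `rotr32` helper rather than the notebook's NumPy-based `rotr`.

```python
MASK32 = 0xFFFFFFFF

def rotr32(x, n):
    """Rotate a 32-bit word right by n positions."""
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32

x = 12  # binary 1100: bits 2 and 3 are set
amounts = (2, 13, 22)
parts = [rotr32(x, n) for n in amounts]
for n, part in zip(amounts, parts):
    print(f"ROTR^{n:<2}: {part:032b}")

# Each rotation moves the two set bits to a different place, so nothing cancels
result = parts[0] ^ parts[1] ^ parts[2]
print(f"XOR    : {result:032b}")
print(result)  # 6303747
```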


In [177]:
def Sigma0(x):
    """Computes the SHA-2 Σ₀(x) function."""
    #Assigning variable as 32 bit unsigned integer
    x = np.uint32(x)

    #Returning result of right-rotating x by 2, 13, and 22 bits,
    #then XORing the three results together    
    return (rotr(x,2) ^ rotr(x, 13) ^ rotr(x, 22))

#Testing function 
result = Sigma0(12)
print(f"Binary result: {result:032b}") #Binary output as 32-bit word
print(f"Decimal result: {result}") #Result in decimal form

Binary result: 00000000011000000011000000000011
Decimal result: 6303747


### Σ₁ (Sigma1) Function

**Formula:** Σ₁(x) = ROTR^6(x) XOR ROTR^11(x) XOR ROTR^25(x)

**Purpose:** This function follows the same pattern as Σ₀(x) (see previous section),  
but uses different rotation amounts (6, 11, and 25 bits)  
to increase the mixing of bits at a different stage of the hash computation.

**Reference:** Implementation adapted from the same pattern used in Σ₀(x),  
with only rotation values changed.

In [178]:
def Sigma1(x):
    """Computes the Σ₁(x) function."""
    #Assigning variable as 32 bit unsigned integer
    x = np.uint32(x)

    #Returning result of right-rotating x by 6, 11, and 25 bits,
    #then XORing the three results together    
    return (rotr(x,6) ^ rotr(x, 11) ^ rotr(x, 25))

#Testing function 
result = Sigma1(12)
print(f"Binary result: {result:032b}") #Binary output as 32-bit word
print(f"Decimal result: {result}") #Result in decimal form

Binary result: 00110001100000000000011000000000
Decimal result: 830473728


### σ₀ (sigma0) Function

**Formula:** σ₀(x) = ROTR^7(x) XOR ROTR^18(x) XOR SHR^3(x)

**Purpose:** This function rotates the bits of input x to the right by 7 and 18 bits,<br>
performs a right shift on x by 3 bits, and XORs the three results together.<br>
It increases mixing of bits to strengthen the hash’s overall security.

**What the function is doing:**<br>

1. **rotr(x, 7):**<br>
   Calls the `rotr` function.<br>
   Rotates all bits of x to the right by 7 bits.

2. **rotr(x, 18):**<br>
   Calls the `rotr` function.<br>
   Rotates all bits of x to the right by 18 bits.

3. **rightShift(x, 3):**<br>
   Calls the `rightShift` function.<br>
   Shifts all bits of x to the right by 3 bits, replacing leftmost bits with zeros.

4. **XOR:**<br>
   Combines the results from Steps 1–3 bit by bit.<br>
   A result bit is 1 when an odd number of the three bits in that position are 1; otherwise 0.

In [179]:
def sigma0(x):
    """Computes the σ₀(x) function."""
    #Assigning variable as 32 bit unsigned integer
    x = np.uint32(x)

    #Returning result of right-rotating x by 7, and 18, and then
    #right-shifting x by 3
    return (rotr(x,7) ^ rotr(x, 18) ^ rightShift(x, 3))#XOR of 3 results

#Testing function 
result = sigma0(12)
print(f"Binary result: {result:032b}") #Binary output as 32-bit word
print(f"Decimal result: {result}") #Result in decimal form

Binary result: 00011000000000110000000000000001
Decimal result: 402849793


### σ₁ (sigma1) Function

**Formula:** σ₁(x) = ROTR^17(x) XOR ROTR^19(x) XOR SHR^10(x)

**Purpose:** This function follows the same pattern as σ₀(x) (see previous section),  
but uses different rotation and right shift amounts (17, 19, and 10 bits)  
to increase the mixing of bits at a different stage of the hash computation.

**Reference:** Implementation adapted from the same pattern used in σ₀(x),  
with only rotation and right shift values changed.
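
In the SHA-256 hash computation defined by the Secure Hash Standard, σ₀ and σ₁ expand the sixteen 32-bit words of a block into a 64-word message schedule using W[t] = σ₁(W[t-2]) + W[t-7] + σ₀(W[t-15]) + W[t-16], with addition modulo 2^32. The following is a rough plain-Python sketch of that expansion. The helper names are illustrative, and it is not the notebook's implementation.

```python
MASK32 = 0xFFFFFFFF

def rotr32(x, n):
    """Rotate a 32-bit word right by n positions (0 < n < 32)."""
    return ((x >> n) | (x << (32 - n))) & MASK32

def small_sigma0(x):
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3)

def small_sigma1(x):
    return rotr32(x, 17) ^ rotr32(x, 19) ^ (x >> 10)

def message_schedule(words):
    """Expand 16 words (one 512-bit block) into 64 schedule words."""
    w = list(words)
    for t in range(16, 64):
        w.append((small_sigma1(w[t - 2]) + w[t - 7]
                  + small_sigma0(w[t - 15]) + w[t - 16]) & MASK32)
    return w

schedule = message_schedule([12] + [0] * 15)
print(len(schedule), schedule[16], schedule[17], schedule[18])  # 64 12 0 491520
```

With only the first word set to 12, W[16] copies that word through the W[t-16] term, and W[18] equals σ₁(12) = 491520, matching the test output of `sigma1(12)` above.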

In [180]:
def sigma1(x):
    """Computes the σ₁(x) function."""
    #Assigning variable as 32 bit unsigned integers
    x = np.uint32(x)
    
    #Returning result of right-rotating x by 17, and 19, and then
    #right-shifting x by 10
    return (rotr(x,17) ^ rotr(x, 19) ^ rightShift(x, 10))#XOR of 3 results

#Testing function 
result = sigma1(12)
print(f"Binary result: {result:032b}") #Binary output as 32-bit word
print(f"Decimal result: {result}") #Result in decimal form

Binary result: 00000000000001111000000000000000
Decimal result: 491520


## Problem 2: Fractional Parts of Cube Roots

### Overview 
In this problem, we generate the first 64 prime numbers and use them to compute the constants defined in the Secure Hash Standard for the SHA-256 algorithm. These constants, listed at the bottom of page 11 of the standard, are derived by taking the fractional parts of the cube roots of the first 64 primes and extracting their first 32 bits.<br><br>
These constants are used to help mix the data and add randomness to the hashing process.  

### Find Prime Function

**The Sieve of Eratosthenes Algorithm**<br>
The Sieve of Eratosthenes is an efficient algorithm for finding all prime numbers up to a given limit.<br>
My understanding of the algorithm was developed through the explanation provided in [LibreTexts](<https://math.libretexts.org/Courses/Coalinga_College/Math_for_Educators_(MATH_010A_and_010B_CID120)/04%3A_Number_Theory/4.03%3A_The_Sieve_of_Eratosthenes>), which illustrates the process of iteratively crossing out multiples of each prime starting from 2.
The method continues until all non-prime numbers have been marked, leaving only prime numbers unmarked.


**Purpose:** The `find_primes` function ([adapted from GeeksforGeeks](https://www.geeksforgeeks.org/python/python-program-for-sieve-of-eratosthenes/)), finds the first `n` prime numbers using a vectorized version of the Sieve of Eratosthenes algorithm.<br><br>
I decided to use this method of finding prime numbers as it is a fast and memory efficient way of generating prime numbers, particularly as the number of required primes increases.<br><br>
When testing prime functions from the
[given documentation](https://github.com/ianmcloughlin/computational-theory/blob/main/materials/prime_numbers.ipynb), on larger numbers such as 90000, the time taken to return the primes was slower (6s), compared to vectorised Sieve Algorithm (0s).<br><br>
The algorithm systematically eliminates non-prime numbers by marking multiples of each prime, and when implemented with [NumPy vectorization](https://www.geeksforgeeks.org/numpy/vectorized-operations-in-numpy/), this marking is performed in a single optimized operation rather than in repeated Python loops, leveraging NumPy's underlying C implementation for faster and more efficient computations.

**What the function is doing:**<br><br>
1. **prime = np.ones(k + 1, dtype=bool):**<br>
Creates a boolean array filled with `True` values, to mark any potential primes.

2. **prime[0:2] = False:**<br>
Uses the [slice function](https://www.geeksforgeeks.org/python/python-slice-function/) to set 0 and 1 to False as they are not primes.

3. **for p in range(2, int(math.sqrt(k)) + 1):**<br>
Loops through every number from 2 up to square root of `k`. Any non-prime number will have at least one factor less than or equal to its square root, so checking beyond this point is unnecessary.
This means the loop only needs to test possible divisors up to the square root of k for efficiency.

4. **if prime[p]:**<br>
Checks whether the value at the current index is still `True`, i.e. `p` is prime.

5. **prime[p*p:k+1:p] = False**<br>
Starts from the square of `p` (`p*p`), because smaller multiples of `p` have already been marked by smaller primes.<br>
Marks every multiple of `p` up to `k` as `False` (not prime).<br>
Moves in steps of `p`.

6. **result = np.nonzero(prime)[0].tolist()**<br>
Returns the indices of all non-zero (`True`) values, i.e. the primes, as a list.

7. **return result[:n]**<br>
Uses list slicing to return the first `n` primes from the list.
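
The steps above use a fixed upper limit `k`. As a possible variation, the limit can be derived from `n` using the known bound that, for n ≥ 6, the n-th prime is smaller than n(ln n + ln ln n). The sketch below is plain Python rather than the notebook's NumPy version, and the name `first_n_primes` is illustrative.

```python
import math

def first_n_primes(n):
    """Return the first n primes using a plain-Python sieve (illustrative)."""
    if n < 1:
        return []
    # For n >= 6 the n-th prime is below n * (ln n + ln ln n); 13 covers n < 6
    if n < 6:
        limit = 13
    else:
        limit = int(n * (math.log(n) + math.log(math.log(n)))) + 1
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # Start at p*p: smaller multiples were marked by smaller primes
            is_prime[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return [i for i, flag in enumerate(is_prime) if flag][:n]

print(first_n_primes(10))      # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(first_n_primes(64)[-1])  # 311
```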


In [181]:
def find_primes(n):
    """Computes the first n prime numbers using a vectorized version of the Sieve of Eratosthenes"""
    #Upper limit for the sieve; 100000 covers the first 9592 primes, so a larger n would be silently truncated
    k = 100000
    #Creates array filled with 1's (1 meaning True), Using a boolean array
    prime = np.ones(k + 1, dtype=bool)
    #Setting 0 and 1 to false, they are not primes by definition
    prime[0:2] = False
    #Loops through numbers from 2 to square root of k
    for p in range(2, int(math.sqrt(k)) + 1):
        #If current index is prime
        if prime[p]:
            prime[p*p:k+1:p] = False #Set its multiples to false
    #Collects the indices still marked True, i.e. all primes up to k
    result = np.nonzero(prime)[0].tolist()
    #Keeps only the first n primes
    return result[:n]

#Finding first 64 prime numbers
primes = find_primes(64)
#Printing prime numbers and how many there are
print(primes)       
print(len(primes))  

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311]
64


### Frac Cube Function

**Purpose:** This function calculates the first 32 bits of the fractional parts of the cube roots of a given list of prime numbers. For the first 64 primes, these values are the 64 round constants used in the SHA-256 algorithm, as defined in the Secure Hash Standard.

**What the function is doing:**<br>

1. **cubeRoots = np.cbrt(primes)**:<br><br>
Uses NumPy's [cube root](https://www.geeksforgeeks.org/python/numpy-cbrt-python/) to compute the cube root of each prime number in the list `primes`.

2. **fracParts = np.modf(cubeRoots)[0]:**<br>
The NumPy [np.modf()](https://www.geeksforgeeks.org/python/python-modf-function/) function splits each value into 2 parts and returns them as a pair of arrays:
    * The fractional part of the value (digits after the decimal point)
    * The integer part (whole number before the decimal point)
    
    For example:
    * Cube root of 3 = 1.44224957031
    * np.modf(1.44224957031) = (0.44224957031, 1.0)

    Indexing with [0] returns only the fractional parts, while [1] would return the integer parts.

    This step therefore isolates the fractional part of each cube root. 

3. **fracParts = fracParts * (2 ** 32):**<br>
Multiplies each fractional part by 2^32, which moves its first 32 bits to the left of the binary point.<br>
Example:
    * Fractional part of 3's cube root = 0.44224957031
    * 0.44224957031 * (2 ** 32) ≈ 1899447441.1<br>
    The integer part, 1899447441 (0x71374491), holds the first 32 bits of the fractional value.

4. **first32bits = np.uint32(fracParts):**<br>
Converts the shifted values into 32-bit unsigned integers.
This ensures they are represented correctly as 32-bit binary words, matching the integer precision defined in the SHA-256 standard. 

5. **hexList = [hex(n) for n in first32bits]:**<br>
Converts each 32-bit integer into [hexadecimal format](https://blog.finxter.com/5-best-ways-to-convert-a-list-of-ints-to-hex-in-python/#:~:text=The%20built%2Din%20Python%20function,each%20integer%20in%20the%20list.) to compare against the 64 constant 32-bit words listed in the Secure Hash Standard (page 11).
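Matching outputs confirm that the fractional-part constants have been computed correctly. Note that `hex()` drops leading zeros, so an output such as `0xfc19dc6` corresponds to the word `0fc19dc6` in the standard.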
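
As an independent cross-check of the floating-point approach, the same constants can be obtained with exact integer arithmetic, because cbrt(p) · 2^32 equals cbrt(p · 2^96). The sketch below takes the integer cube root of p · 2^96 and keeps the low 32 bits. The function names are illustrative, and the values are printed zero-padded to 8 hex digits.

```python
def integer_cube_root(n):
    """Largest integer r with r**3 <= n, found by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid
    return lo

def cube_root_constant(p):
    """First 32 fractional bits of the cube root of p, as an integer."""
    # floor(cbrt(p) * 2**32) == floor(cbrt(p * 2**96)); the low 32 bits are the fraction
    return integer_cube_root(p << 96) & 0xFFFFFFFF

for p in (2, 3, 67, 311):
    print(p, f"0x{cube_root_constant(p):08x}")
```

For these four primes the printed values should agree with the first, second, 19th, and 64th entries of the `fracCube` output, with `0x0fc19dc6` shown in full for p = 67.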



In [182]:
def fracCube(primes):

    """Calculates the first 32 bits of the fractional parts of the cube roots of a given list of prime numbers.
        Function returns a list of hexadecimal values representing the first 32 bits of the fractional parts of the cube roots of the given primes."""

    #Calculating cube root of prime values using NumPy
    cubeRoots = np.cbrt(primes)   
    #Separating out fractional part of cube root of prime value
    fracParts = np.modf(cubeRoots)[0]
    #Shifting the first 32 fractional bits to the left of the binary point
    fracParts = fracParts * (2 ** 32)
    #Convert bits to 32-bit unsigned integers
    first32bits = np.uint32(fracParts)
    #Converting each 32 bit integer into hexadecimal format
    hexList = [hex(n) for n in first32bits]
    #Returning list of hex values
    return hexList

#Finding first 64 prime numbers
primes = find_primes(64)
#Calling function to return fractional part of cube root of prime as hexadecimal
fracCube(primes)


['0x428a2f98',
 '0x71374491',
 '0xb5c0fbcf',
 '0xe9b5dba5',
 '0x3956c25b',
 '0x59f111f1',
 '0x923f82a4',
 '0xab1c5ed5',
 '0xd807aa98',
 '0x12835b01',
 '0x243185be',
 '0x550c7dc3',
 '0x72be5d74',
 '0x80deb1fe',
 '0x9bdc06a7',
 '0xc19bf174',
 '0xe49b69c1',
 '0xefbe4786',
 '0xfc19dc6',
 '0x240ca1cc',
 '0x2de92c6f',
 '0x4a7484aa',
 '0x5cb0a9dc',
 '0x76f988da',
 '0x983e5152',
 '0xa831c66d',
 '0xb00327c8',
 '0xbf597fc7',
 '0xc6e00bf3',
 '0xd5a79147',
 '0x6ca6351',
 '0x14292967',
 '0x27b70a85',
 '0x2e1b2138',
 '0x4d2c6dfc',
 '0x53380d13',
 '0x650a7354',
 '0x766a0abb',
 '0x81c2c92e',
 '0x92722c85',
 '0xa2bfe8a1',
 '0xa81a664b',
 '0xc24b8b70',
 '0xc76c51a3',
 '0xd192e819',
 '0xd6990624',
 '0xf40e3585',
 '0x106aa070',
 '0x19a4c116',
 '0x1e376c08',
 '0x2748774c',
 '0x34b0bcb5',
 '0x391c0cb3',
 '0x4ed8aa4a',
 '0x5b9cca4f',
 '0x682e6ff3',
 '0x748f82ee',
 '0x78a5636f',
 '0x84c87814',
 '0x8cc70208',
 '0x90befffa',
 '0xa4506ceb',
 '0xbef9a3f7',
 '0xc67178f2']

## Problem 3: Padding


### Overview
In accordance with the Secure Hash Standard, messages must be padded before the hashing process begins. In this problem, we implement generator functions that process messages in accordance with Sections 5.1.1 and 5.2.1 of the standard. The purpose of padding is to ensure that the final message length is a multiple of 512 bits (64 bytes), as SHA-224 and SHA-256 process data strictly in 512-bit blocks.<br>The following functions specify how this padding will be performed.

### File To Blocks Function
**Purpose:**<br>
For the padding process to work, the input message must be processed in 64-byte blocks. This function reads the file in fixed-size chunks without loading the entire file into memory, which is important for hashing large files. The yield statement turns the function into a generator, producing each block one at a time. ([generator yield explanation](https://www.datacamp.com/tutorial/python-generators))

**What the function is doing:**<br>

**f = open(filepath, 'rb')**:<br>
Opens the file in binary mode and assigns file to object `f`.<br>

**while block := f.read(no_bytes):**<br>
Reads the next `no_bytes` (default 64) from the file and assigns it to block.
The loop continues as long as `f.read()` returns a non-empty bytes object.
When the end of the file is reached, `f.read()` returns b'', which stops the loop.

**yield block:**<br>
`yield` returns the current block and pauses the function.
Each time the next value is requested from the generator, execution resumes and the next block is read.
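
The pause-and-resume behaviour can be seen without touching the file system. The sketch below applies the same loop to an in-memory `io.BytesIO` stream. The name `stream_to_blocks` is a hypothetical stand-in for illustration, not the notebook's `file_to_blocks`.

```python
import io

def stream_to_blocks(stream, no_bytes=64):
    """Yield successive chunks from any binary stream (hypothetical helper)."""
    while block := stream.read(no_bytes):
        yield block

stream = io.BytesIO(b"A" * 150)
blocks = stream_to_blocks(stream)
first = next(blocks)             # runs the function up to the first yield
rest = [len(b) for b in blocks]  # resumes until read() returns b''
print(len(first), rest)          # 64 [64, 22]
```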




In [183]:
def file_to_blocks(filepath, no_bytes=64):
  """Generator function that reads a file in binary mode and yields it in 64-byte blocks. 
    Each call to the generator returns the next block until the end of the file is reached."""
  #Opening file in binary mode; the with statement closes it when the generator finishes.
  with open(filepath, 'rb') as f:
    #Yielding each new block of the file.
    while block := f.read(no_bytes):
        yield block

### Padded Message Generator Function

**Purpose:**<br>
Before the message block can be processed, it must be padded to ensure that its length is a multiple of 512 bits (64 bytes). Padding is accomplished by adding a 1-bit followed by as many 0-bits as necessary to achieve the desired length. The length of the original message is also appended to the end of the padded message, expressed as a 64-bit binary number. ([padding explanation](https://enlear.academy/blockchain-deep-dive-into-sha-256-secure-hash-algorithm-256-bit-824ac0e90b24)).<br>
This ensures the message is in the correct format to be split into 512-bit blocks and processed safely by the hashing algorithm.
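
The amount of zero padding follows from simple arithmetic: the message length, plus 1 byte for `0x80`, plus the zero bytes, plus the 8-byte length field must be a multiple of 64. A minimal sketch of that calculation, using the illustrative name `zero_padding_bytes` and working in whole bytes as the notebook does:

```python
def zero_padding_bytes(message_len):
    """Number of 0x00 bytes between the 0x80 byte and the 8-byte length."""
    # message_len + 1 + zeros + 8 must be a multiple of 64
    return (55 - message_len) % 64

for n in (0, 3, 55, 56, 63, 64):
    zeros = zero_padding_bytes(n)
    total = n + 1 + zeros + 8
    print(f"{n:>2} bytes -> {zeros:>2} zero bytes, {total // 64} block(s)")
```

A 55-byte message is the longest that still fits in one padded block. From 56 bytes onward the length field no longer fits and a second block is needed.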

**What the function is doing:**<br>
1. **no_bits = 0:**<br>
Initialises the counter that stores how many bits of the original message have been read.

---

2. **for block in file_to_blocks(filepath):**<br>
Reads the file in 64-byte chunks using the generator `file_to_blocks`.
    * **no_bits += len(block) * 8:**<br>
    Counts the number of bits in each block and adds it to the total.
    * **if len(block) == 64: yield block:**<br>
    If the block is a full 64 bytes, it is yielded immediately.

---

3. **if no_bits == 0: block = bytes():**<br>
If the file is empty, the loop body never runs, so `bytes()` creates an empty block for the padding logic to work on.

---

4. To complete the padding in one block we need at least 9 bytes free:<br>
    * `0x80`: 1 byte
    * The final message length: 8 bytes
    * Zero padding in between: whatever space remains

**if (64 - len(block)) >= 9:**<br>
If the block has 9 or more bytes free, there is enough room to finish the padding in this block.

To the block we add:<br>
* **bytes([0x80])**<br>
Appends the '1' bit followed by seven '0' bits (0x80 is 10000000 in binary).
* **bytes([0x00] * (64 - len(block) - 1 - 8))**<br>
The zero padding: 64 (the block size) minus the bytes already used (`len(block)`), minus 1 for the `0x80` byte, minus the 8 bytes reserved for the message length.
* **no_bits.to_bytes(8, byteorder='big')**<br>
Converts the original message length in bits to an 8-byte big-endian value, as required by the standard.

---


5. If there are between 1 and 8 bytes available, the padding has to spill into a second block:

**elif (64 - len(block)) >= 1:**<br>
If between 1 and 8 bytes are free, there is space for `0x80` but not for the 8-byte length field, so the padding spills into a second block.<br>

To the first block we add:<br>
* **yield block + bytes([0x80]) + bytes([0x00] * (64 - len(block) - 1)):**<br>
The same steps as in the previous branch, but without the `- 8`, because the 8-byte length moves to the new block.

To the second block we add:

* **yield bytes([0x00] * 56) + no_bits.to_bytes(8, byteorder='big')**<br>
56 zero bytes followed by the 8-byte original message length.

---

6. If there are no bytes available in the last block:

**else: yield bytes([0x80] + ([0x00] * 55)) + no_bits.to_bytes(8, byteorder='big'):**<br>
The last block read was a full 64 bytes and has already been yielded, so a new block is built from `0x80`, 55 zero bytes, and the 8-byte message length.
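
The three branches can be cross-checked against a simpler in-memory version that pads a whole `bytes` object at once and then slices it into blocks. This is a rough sketch for comparison only. The name `pad_message` is illustrative, and unlike the generator it holds the entire message in memory.

```python
def pad_message(message):
    """Pad a bytes object as for SHA-256 and return the list of 64-byte blocks."""
    bit_length = len(message) * 8
    zeros = (55 - len(message)) % 64  # zero bytes needed to reach a multiple of 64
    padded = message + b"\x80" + b"\x00" * zeros + bit_length.to_bytes(8, "big")
    return [padded[i:i + 64] for i in range(0, len(padded), 64)]

# Sizes chosen to hit each branch: room for everything, spill-over, and a full block
for size in (0, 55, 56, 63, 64):
    blocks = pad_message(b"A" * size)
    print(size, len(blocks), blocks[-1][-8:].hex())
```

Sizes 0 and 55 should produce one block, while 56, 63, and 64 should produce two, with the last 8 bytes holding the message length in bits (for example `00000000000001f8` for 63 bytes).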

In [184]:
def padded_message_generator(filepath):
    """Return blocks of the file at filepath, padded"""
    no_bits = 0 #Number of bits read (needed for the final 8-byte length field)
    #Read file in 64 byte blocks
    for block in file_to_blocks(filepath): #Read file in 64 byte blocks
      no_bits += len(block) * 8 #Track total message length in bits
      #If the block is already a full 64 bytes, yield it 
      if len(block) == 64:
         yield block
    #Handling empty file: creating empty block so padding can be applied
    if no_bits == 0:
      block = bytes()
    #If at least 9 bytes are free, there is enough room to finish the padding in this block
    if (64 - len(block)) >= 9: 
      yield (block + bytes([0x80]) + bytes([0x00] * (64 - len(block) - 1 - 8)) + no_bits.to_bytes(8, byteorder='big'))#Yield the block + Append 1 bit + Append 0 padding + Append final 8 byte length in big endian
    #If there is between 1 and 8 bytes available, spill into next block
    elif (64 - len(block)) >= 1:
       yield block + bytes([0x80]) + bytes([0x00] * (64 - len(block) - 1)) #Yield the block + Append 1 bit + Append 0 padding
       yield bytes([0x00] * 56) + no_bits.to_bytes(8, byteorder='big') #Add 56 bytes + 8 bytes for message length in big endian
    #If there are no bytes available in the last block
    else:
       yield bytes([0x80] + ([0x00] * 55)) + no_bits.to_bytes(8, byteorder='big') #Append 1 + 55 zero bytes + 8 bytes for message length in big endian

In [None]:
#Testing output when message is exactly 64 bytes 
#Writing 64 bytes of 'A' (ASCII 0x41) to file
with open("64-bytes.txt", "wb") as f:
    f.write(b"A" * 64)
#Looping through each block in file
for block in padded_message_generator("64-bytes.txt"):
    print(block.hex())

#Output:
#First block: 64 bytes of 0x41 (ASCII 'A')
#Second block: Begins with 0x80 - the '1' padding bit 
#              + 55 zero bytes
#              ends with 8 byte big endian value 0x200 = 512 in decimal
#Shows original message was 64 bytes - 512 bits

41414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141
80000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000200


In [None]:
#Testing output when message is exactly 63 bytes
#Writing 63 bytes of 'A' (ASCII 0x41) to file 
with open("63-bytes.txt", "wb") as f:
    f.write(b"A" * 63)
#Looping through each block in file
for block in padded_message_generator("63-bytes.txt"):
    print(block.hex())

#Output:
#First block: 63 bytes of 0x41 (ASCII 'A') + 0x80 - '1' padding bit
#Second block: Begins with 56 zero bytes
#              ends with 8 byte big endian value 0x1f8 = 504 in decimal
#Shows original message was 63 bytes - 504 bits

41414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414180
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001f8


In [None]:
#Testing output when message is empty
#Creating an empty file so the test can run on its own
with open("empty.txt", "wb") as f:
    pass
#Looping through each block in file
for block in padded_message_generator("empty.txt"):
    print(block.hex())

#Output:
#Block: Begins with 0x80 - '1' padding bit at front 
#       + 55 zero bytes 
#       ends with all zeros as message input length is 0

80000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000


## Problem 4: Hashes


## Problem 5: Passwords
