Testing whether a hash can be used for file integrity monitoring. This would be useful in forensics, where comparing hashes can show whether a file has changed.

In [None]:
# Step 1: Upload the file using Colab's file uploader
from google.colab import files  # Import Colab's file upload utility

uploaded = files.upload()  # Prompt the user to upload a file. The uploaded file(s) will be saved in the Colab session storage.

uploaded_filename = list(uploaded.keys())[0]  # Get the name of the uploaded file (handles any filename or capitalization)

# Step 2: Import necessary libraries for file operations and hashing
import hashlib  # hashlib provides various secure hash functions, including SHA256 (NEW LEARNING)
import os       # os is used to check if files exist in the current directory

# Function to compute the SHA256 hash of a file
def compute_file_hash(file_path):
    """
    LEARNING:
    Computes the SHA256 hash of the given file.
    SHA256 is a cryptographic hash function that produces a fixed-size 256-bit hash (64 hexadecimal characters);
    two different files are extremely unlikely to share the same hash.
    It is important because even a tiny change in the file will result in a completely different hash,
    making it ideal for file integrity checking.
    """
    sha256 = hashlib.sha256()  # Create a new SHA256 hash object
    with open(file_path, 'rb') as f:  # Open the file in binary mode for reading
        while True:
            data = f.read(65536)  # Read the file in 64KB chunks (efficient for large files)
            if not data:
                break  # Exit loop when no more data is read
            sha256.update(data)  # Update the hash object with the chunk of data
    return sha256.hexdigest()  # Return the final hash as a hexadecimal string (easy to read and store)

# Step 3: Check the file for the first time and print the hash
file_path = uploaded_filename  # Use the actual uploaded filename

if not os.path.exists(file_path):  # Check if the file exists in the current directory
    print(f"File not found: {file_path}. Please upload your file using the Colab file browser and rerun this cell.")
else:
    # Compute and print the hash of the uploaded file
    original_hash = compute_file_hash(file_path)
    print(f"Initial hash for '{file_path}': {original_hash}")

    # Step 4: Instruct the user to upload the file again (possibly a modified version)
    print(f"\nNow, if you want to check for changes, upload '{file_path}' again (overwrite the previous one if needed).")
    input("After uploading the file, press Enter to continue...")  # Wait for the user to upload and confirm

    # Check the file again and compare the hash
    if not os.path.exists(file_path):  # Check if the file exists after re-upload
        print(f"File not found: {file_path} after re-upload. Please make sure the file is uploaded.")
    else:
        new_hash = compute_file_hash(file_path)  # Compute the hash of the (possibly new) file
        print(f"New hash for '{file_path}': {new_hash}")
        if new_hash == original_hash:  # Compare the new hash to the original
            print("No change detected in the file.")  # The file is unchanged
        else:
            print("ALERT: The file has changed!")  # The file has been modified

Saving College.csv to College (3).csv
Initial hash for 'College (3).csv': 22bd737e1db91093552d009afe1d0f7614e40800337bbcbbb93c2e9ea63af478

Now, if you want to check for changes, upload 'College (3).csv' again (overwrite the previous one if needed).
After uploading the file, press Enter to continue...
New hash for 'College (3).csv': 22bd737e1db91093552d009afe1d0f7614e40800337bbcbbb93c2e9ea63af478
No change detected in the file.
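
The run above only shows the "no change" case. The docstring's claim that a tiny change gives a completely different hash can be checked without uploading anything. This is a minimal sketch, assuming two made-up byte strings (`original` and `modified` are illustrative, not taken from College.csv) that differ by a single character:

```python
import hashlib

# Hypothetical sample data for illustration only (not the real College.csv contents).
original = b"name,enrollment\nCollege A,1200\n"
modified = b"name,enrollment\nCollege A,1201\n"  # last digit changed

hash_original = hashlib.sha256(original).hexdigest()
hash_modified = hashlib.sha256(modified).hexdigest()

# Count the positions where the two 64-character hex strings disagree.
differing = sum(a != b for a, b in zip(hash_original, hash_modified))

print(hash_original)
print(hash_modified)
print(f"{differing} of {len(hash_original)} hex characters differ")
```

Even though only one byte of the input changed, most positions in the two hex strings should differ, which is why comparing hashes works as a change detector.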


This is a cool exercise, as I can see how a hash can be used to support security and data integrity! It can be useful in forensics and as part of locking down data access.
The code is pretty simple, but I am still getting used to ordering/sequencing. I need more practice with getting the right sequence and remembering the commands (almost impossible).
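
As a possible next step toward real file integrity monitoring, the single-file check could be extended to a whole folder: record a baseline of hashes once, then compare against it later. The sketch below is one way this might look, not a finished tool. The function names (`sha256_of_file`, `build_baseline`, `compare_to_baseline`) and the demo file names are hypothetical, and the demo runs in a temporary folder rather than on real evidence files.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks, same idea as compute_file_hash in the cell above."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha256.update(chunk)
    return sha256.hexdigest()

def build_baseline(folder):
    """Map each file's path (relative to folder) to its SHA256 hash."""
    baseline = {}
    for root, _dirs, names in os.walk(folder):
        for name in names:
            full_path = os.path.join(root, name)
            baseline[os.path.relpath(full_path, folder)] = sha256_of_file(full_path)
    return baseline

def compare_to_baseline(baseline, current):
    """Return sorted lists of changed, removed, and added relative paths."""
    changed = sorted(p for p in baseline if p in current and baseline[p] != current[p])
    removed = sorted(p for p in baseline if p not in current)
    added = sorted(p for p in current if p not in baseline)
    return changed, removed, added

# Demo in a throwaway folder with made-up files.
with tempfile.TemporaryDirectory() as folder:
    for name, text in [("a.csv", "1,2,3\n"), ("b.csv", "4,5,6\n")]:
        with open(os.path.join(folder, name), "w") as f:
            f.write(text)

    baseline = build_baseline(folder)  # snapshot taken before any tampering

    with open(os.path.join(folder, "a.csv"), "w") as f:
        f.write("1,2,4\n")                      # modify one file
    os.remove(os.path.join(folder, "b.csv"))    # delete one file
    with open(os.path.join(folder, "c.csv"), "w") as f:
        f.write("7,8,9\n")                      # add a new file

    changed, removed, added = compare_to_baseline(baseline, build_baseline(folder))

print("Changed:", changed)
print("Removed:", removed)
print("Added:", added)
```

In practice the baseline would need to be stored somewhere the monitored system cannot modify, otherwise someone who changes a file could also change its recorded hash.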