**This file contains the code for:**
1. **Data Preprocessing:** Cropping each slice of the data and converting it to grayscale.
2. **Visualization:** Rendering the data using ```matplotlib```.
3. **Data Storage:** Saving the preprocessed data as a series of ```.png``` slices to an AWS S3 Bucket.
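The grayscale step relies on matplotlib: when ```imsave``` is given ```cmap='gray'``` and no ```vmin```/```vmax```, it scales each slice between that slice's own minimum and maximum before mapping it to gray. The sketch below shows that min-max scaling in plain NumPy as an approximation of what ends up in each ```.png```. The helper ```to_uint8_grayscale``` is hypothetical and not part of this notebook, and the exact bytes matplotlib writes may differ (it stores color pixels rather than a single 8-bit channel).

```python
import numpy as np

def to_uint8_grayscale(slice_2d):
    """Hypothetical helper: min-max scale a 2D slice to 8-bit gray levels (0 = darkest, 255 = brightest)."""
    slice_2d = np.asarray(slice_2d, dtype=np.float64)
    lo, hi = slice_2d.min(), slice_2d.max()
    if hi == lo:
        # a constant slice (e.g. pure background) has no contrast; map it to black
        return np.zeros(slice_2d.shape, dtype=np.uint8)
    scaled = (slice_2d - lo) / (hi - lo)  # now in [0, 1]
    return np.round(scaled * 255).astype(np.uint8)

example = np.array([[0.0, 50.0], [100.0, 200.0]])
print(to_uint8_grayscale(example).tolist())  # [[0, 64], [128, 255]]
```

Scaling per slice keeps every ```.png``` at full contrast, at the cost of gray levels not being comparable between slices.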

**Installing the Required Libraries:**

In [1]:
%pip install boto3 nibabel numpy matplotlib scikit-image opencv-python

Note: you may need to restart the kernel to use updated packages.


**Importing the Libraries:**

In [2]:
import boto3
import nibabel as nib
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import io
import tempfile
import os

**In this cell, we render ```.nii``` files from the S3 Bucket and save each file as a collection of 2D ```.png``` slices in a folder in the same S3 Bucket.**
1. **Connect to S3:** we connect to our S3 bucket using the ```boto3``` library.
2. **Define Cropping Parameters:** we define ```crop``` margins (in pixels) that trim irrelevant border regions from each slice.
3. **Define Key Functions:** we implement functions to render and save the data.
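The cropping parameters are four margins subtracted from the edges of each 2D slice: top and bottom come off axis 0 (rows) and left and right come off axis 1 (columns), matching the slicing in ```save_png_from_nii()```. A small sketch on a synthetic slice; ```crop_slice``` is a hypothetical helper and the 240 × 240 shape is only an assumed example:

```python
import numpy as np

# same margins as the notebook: pixels removed from each edge of a slice
crop_left, crop_right = 20, 10
crop_top, crop_bottom = 30, 30

def crop_slice(slice_2d):
    """Hypothetical helper: trim top/bottom from axis 0 (rows) and left/right from axis 1 (columns)."""
    rows, cols = slice_2d.shape
    return slice_2d[crop_top:rows - crop_bottom, crop_left:cols - crop_right]

slice_2d = np.zeros((240, 240))  # synthetic slice, used only to show the resulting shape
print(crop_slice(slice_2d).shape)  # (180, 210)
```

Because the bounds are written as ```rows - crop_bottom``` rather than ```-crop_bottom```, a margin of 0 still keeps the full extent.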

**The Rendering Function:** ```render_nii_from_s3()```

The function displays the middle 2D brain slice of a 3D ```.nii``` file.

1. The function takes a ```.nii``` ```filename``` and locates it in the S3 Bucket.
2. Downloads the ```.nii``` file into a ```tempfile```.
3. Loads the ```data``` from the downloaded file as a 3D NumPy array.
4. Finds the middle index along the third axis of ```data```.
5. Displays that slice in grayscale using ```matplotlib.pyplot```, titled with the filename and slice index.
6. The function reports an empty ```.nii``` file, catches errors raised while loading or plotting the file (for example, a corrupted ```.nii``` file), and always deletes the temporary file afterwards.
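Without the S3 download and the plotting, the slice selection described above is a single integer division on the third axis. A sketch with a synthetic volume standing in for ```img.get_fdata()```; ```middle_slice``` is a hypothetical helper, and raising ```ValueError``` on an empty array is a choice of this sketch (the notebook prints a message and returns instead):

```python
import numpy as np

def middle_slice(volume):
    """Hypothetical helper: return (index, 2D slice) at the middle of a 3D volume's third axis."""
    if volume.ndim != 3 or volume.size == 0:
        raise ValueError(f"expected a non-empty 3D array, got shape {volume.shape}")
    slice_idx = volume.shape[2] // 2  # for an even depth this picks the upper of the two middle slices
    return slice_idx, volume[:, :, slice_idx]

volume = np.random.rand(240, 240, 155)  # synthetic stand-in for img.get_fdata()
idx, mid = middle_slice(volume)
print(idx, mid.shape)  # 77 (240, 240)
```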

**The Saving Function:** ```save_png_from_nii()```

The function converts a 3D ```.nii``` file into a series of ```.png``` slices and uploads them to the S3 Bucket.

1. The function takes a ```.nii``` ```filename``` and locates it in the S3 Bucket.
2. Downloads the ```.nii``` file into a ```tempfile```.
3. Loads the ```data``` from the downloaded file as a 3D NumPy array.
4. Iterates over each ```slice_2d``` along the third axis and does the following:

    - Applies the defined ```crop``` to the slice, producing a ```cropped_slice```.
    - Extracts ```brain_number``` and ```scan_type``` from the ```filename``` by splitting it on underscores.
    - Builds the ```slice_path``` (```brain_slices/{brain_number}/{scan_type}```) where the ```cropped_slice``` is stored in the S3 Bucket.
    - Names the ```.png``` file after the slice index (```slice_idx```) inside ```slice_path```.
    - Saves the ```cropped_slice``` as a grayscale ```.png``` to a ```tempfile``` and uploads it to the S3 Bucket under the ```tanmay/``` prefix.
    - Deletes the temporary ```.png``` after the upload.

5. The function reports an empty ```.nii``` file and catches errors raised while loading, processing, or uploading (for example, a corrupted ```.nii``` file); a failed upload is logged and the loop moves on to the next slice.
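The S3 key for each slice is derived purely from the filename, so it can be checked without touching S3. A sketch of that string handling, assuming filenames of the form ```<brain folder>/<prefix>_<brain_number>_<scan_type>.nii```; the example name ```BraTS20_Training_001/BraTS20_Training_001_flair.nii``` and the helper ```slice_key``` are hypothetical illustrations, not taken from the bucket:

```python
from pathlib import PurePosixPath

def slice_key(filename, slice_idx, prefix="tanmay"):
    """Hypothetical helper: build the S3 key for one .png slice from a .nii filename."""
    stem = PurePosixPath(filename).name.removesuffix(".nii")  # drop the folder and extension (Python 3.9+)
    parts = stem.split("_")
    brain_number, scan_type = parts[-2], parts[-1]  # the last two underscore-separated fields
    return f"{prefix}/brain_slices/{brain_number}/{scan_type}/{slice_idx}.png"

print(slice_key("BraTS20_Training_001/BraTS20_Training_001_flair.nii", 77))
# tanmay/brain_slices/001/flair/77.png
```

Taking the last two fields means any extra underscores earlier in the name are ignored, which is the same behavior as the ```split('_')[-2]``` and ```split('_')[-1]``` calls in the notebook.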

In [3]:
# setting up the data pipeline to access the brains in the AWS S3 Bucket folder path:

s3 = boto3.resource('s3')
bucket_name = 'chemocraft-data'
folder_path = 'MICCAI_BraTS2020_TrainingData/'
bucket = s3.Bucket(bucket_name)

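# crop margins in pixels: top/bottom are trimmed from axis 0 (rows), left/right from axis 1 (columns)
crop_left, crop_right = 20, 10
crop_top, crop_bottom = 30, 30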

def render_nii_from_s3(filename):
    """Download a .nii file from S3 and display its middle slice along the third axis."""
    print(f"Fetching file: {filename}")

    obj = bucket.Object(folder_path + filename)
    file_stream = io.BytesIO(obj.get()['Body'].read())

    with tempfile.NamedTemporaryFile(suffix='.nii', delete=False) as temp_file: # delete=False keeps the file on disk after it is closed, so nibabel can reopen it by path
        temp_file.write(file_stream.getvalue())
        temp_file.flush()

        temp_file_path = temp_file.name
        print(f"Temporary file created: {temp_file_path}")

    try:
        img = nib.load(temp_file_path)
        data = img.get_fdata() # storing the brain into the data variable

        print(f"Data shape for {filename}: {data.shape}")

        if data.size == 0: # checking for an empty array
            print(f"No data found in {filename}")
            return

        slice_idx = data.shape[2] // 2 # getting the index of the middle slice

        plt.figure(figsize=(3, 3)) # 3 x 3 inch figure
        plt.imshow(data[:, :, slice_idx], cmap='gray') # color is set to grayscale
        plt.title(f'{filename} - Slice {slice_idx}') # creating a title for the image
        plt.axis('off')  # hiding the axes for a cleaner display
        plt.show() # showing the image

    except Exception as e:
        print(f"Error loading file {filename}: {e}") # reports problems with loading or plotting the file
    finally:
        try:
            os.remove(temp_file_path) 
            print(f"Deleted temporary file: {temp_file_path}")
        except OSError as cleanup_error:
            print(f"Error deleting temp file: {cleanup_error}")

def save_png_from_nii(filename):
    """Crop every slice of a .nii volume in S3 and upload each one as a grayscale .png."""
    print(f"Fetching file: {filename}")
    obj = bucket.Object(folder_path + filename)
    file_stream = io.BytesIO(obj.get()['Body'].read())

    with tempfile.NamedTemporaryFile(suffix='.nii', delete=False) as temp_file:
        temp_file.write(file_stream.getvalue())
        temp_file.flush()

        temp_file_path = temp_file.name
        print(f"Temporary file created: {temp_file_path}")

    # the .nii file is closed here, so nibabel can reopen it on every platform
    try:
        img = nib.load(temp_file_path)
        data = img.get_fdata()

        if data.size == 0: # checking for an empty array before cropping
            print(f"No data found in {filename}")
            return

        # crop bounds: rows (axis 0) lose top/bottom, columns (axis 1) lose left/right
        start_y = crop_top
        end_y = data.shape[0] - crop_bottom
        start_x = crop_left
        end_x = data.shape[1] - crop_right

        # creating the path for the .png slices (the same for every slice of this file):
        stem = filename.removesuffix(".nii")
        brain_number = stem.split('_')[-2]
        scan_type = stem.split('_')[-1]
        slice_path = f"brain_slices/{brain_number}/{scan_type}"
        print(f"Saving files in directory: {slice_path}")

        for slice_idx in range(data.shape[2]): # for each slice in the .nii file
            slice_2d = data[:, :, slice_idx]
            cropped_slice = slice_2d[start_y:end_y, start_x:end_x]
            png_filename = f"{slice_path}/{slice_idx}.png"

            # reserve a temp .png path and close it, then let matplotlib write the grayscale slice there
            with tempfile.NamedTemporaryFile(suffix='.png', delete=False) as temp_png:
                temp_png_name = temp_png.name
            try:
                mpimg.imsave(temp_png_name, cropped_slice, cmap='gray')
                bucket.upload_file(temp_png_name, f"tanmay/{png_filename}")
            except Exception as e:
                print(f"Error saving file: {png_filename}, {e}")
            finally:
                os.remove(temp_png_name) # removes the temp .png even if the upload failed

    except Exception as e:
        print(f"Error saving file {filename}: {e}")
    finally:
        try:
            os.remove(temp_file_path) # the downloaded .nii is no longer needed
        except OSError as cleanup_error:
            print(f"Error deleting temp file: {cleanup_error}")

found_files = False

i = 0 # counter for the number of .nii files found

for obj in bucket.objects.filter(Prefix=folder_path): # for each object under the folder prefix
    if obj.key.endswith('.nii'):
        found_files = True
        # print(obj.key)
        filename = obj.key.split('/')[-2] + '/' + obj.key.split('/')[-1]  # "<brain folder>/<file>.nii", relative to folder_path
        # print(filename)
        # render_nii_from_s3(filename)
        # save_png_from_nii(filename)
        i += 1

print(f"There are {i} brains. ('.nii' files)")
if not found_files: # checks if the directory is empty
    print(f"No .nii files found in the folder {folder_path}")

There are 495 brains. ('.nii' files)
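The listing loop keeps only keys ending in ```.nii``` and turns each one into a name relative to ```folder_path```. The same filtering can be tested offline on a plain list of keys; ```nii_relative_names``` is a hypothetical helper and the keys below are made-up stand-ins for what ```bucket.objects.filter(Prefix=folder_path)``` returns:

```python
def nii_relative_names(keys, folder_path="MICCAI_BraTS2020_TrainingData/"):
    """Hypothetical helper: keep keys under folder_path ending in '.nii', relative to folder_path."""
    return [
        key[len(folder_path):]
        for key in keys
        if key.startswith(folder_path) and key.endswith(".nii")
    ]

keys = [  # hypothetical listing, not real bucket contents
    "MICCAI_BraTS2020_TrainingData/BraTS20_Training_001/BraTS20_Training_001_flair.nii",
    "MICCAI_BraTS2020_TrainingData/BraTS20_Training_001/BraTS20_Training_001_seg.nii",
    "MICCAI_BraTS2020_TrainingData/notes.csv",
]
names = nii_relative_names(keys)
print(len(names), names[0])  # 2 BraTS20_Training_001/BraTS20_Training_001_flair.nii
```

With a live bucket this could be called as ```nii_relative_names(obj.key for obj in bucket.objects.filter(Prefix=folder_path))```. Stripping the prefix gives the same result as joining the last two path components when every file sits exactly one folder deep, and it also keeps deeper nesting intact.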
