**The goal of this GenAI project is to create a machine learning model capable of generating different types of 3D ```.nii``` brain scans that contain malignant tumors.**

This notebook contains the ```python``` code as well as the related documentation.

There are two approaches being experimented with:
1. Converting each 3D ```.nii``` scan into 155 individual 2D ```.png``` slices and training the GAN on those, so that the GAN generates 2D slices.
2. Training the GAN on 3D ```.nii``` volumes loaded as arrays, so that the GAN generates a 3D volume which can later be converted to a ```.nii``` file.
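
The difference between the two approaches can be sketched with a NumPy array standing in for a loaded scan. The in-plane size of 240 x 240 is an assumption made for illustration (only the 155 slices are stated above), and ```volume``` is a hypothetical placeholder rather than real scan data.

```python
import numpy as np

# Hypothetical stand-in for a loaded .nii volume; the 240 x 240 in-plane size is an assumption.
volume = np.zeros((240, 240, 155), dtype=np.float32)

# Approach 1: split the volume along the last axis into individual 2D slices.
slices = [volume[:, :, k] for k in range(volume.shape[2])]
print(len(slices), slices[0].shape)

# Approach 2: keep the whole volume as one training sample (with a channel axis added).
sample_3d = volume[..., np.newaxis]
print(sample_3d.shape)

# Stacking the 2D slices back along the last axis recovers the 3D volume shape.
rebuilt = np.stack(slices, axis=-1)
print(rebuilt.shape)
```

In the first approach one scan yields many small training samples, while in the second each scan is a single, much larger sample.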

Installing the required Libraries:

In [1]:
%pip install boto3 nibabel numpy matplotlib scikit-image opencv-python tensorflow pillow




Importing those Libraries:

In [2]:
import boto3
import nibabel as nib
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import io
import tempfile
import os

**In this cell, we render the ```.nii``` files from the S3 Bucket, and then save each one as a collection of 2D ```.png``` slices in the S3 Bucket.**
1. First, we connect to our bucket using the ```boto3``` library.
2. Next, we define our ```crop``` values, which are used to crop away the unneeded border of each slice.
3. Then, we define our two main functions: one for rendering data for the user to see, and one for saving the data as slices.
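
As a rough check of what the ```crop``` values do, the sketch below applies them to a blank array. The 240 x 240 slice size is an assumption for illustration, and ```slice_2d``` is a hypothetical placeholder.

```python
import numpy as np

# Crop margins used in this notebook.
crop_left, crop_right = 20, 10
crop_top, crop_bottom = 30, 30

# Hypothetical 240 x 240 slice (assumed size, for illustration only).
slice_2d = np.zeros((240, 240))

# Rows are trimmed by the top/bottom margins, columns by the left/right margins.
start_y, end_y = crop_top, slice_2d.shape[0] - crop_bottom
start_x, end_x = crop_left, slice_2d.shape[1] - crop_right

cropped = slice_2d[start_y:end_y, start_x:end_x]
print(cropped.shape)  # (180, 210)
```

Under this assumption, the cropped slice has 180 rows and 210 columns. The ```(210, 180)``` shape used later for the model and for ```target_size``` lists height first, so it may be worth checking whether the saved slices are resized when they are loaded.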

**For the Rendering Function:**

We define the function ```render_nii_from_s3()```, which takes a ```.nii``` file and shows its middle 2D brain slice.
1. The function takes in a ```.nii``` ```filename``` and finds it in the S3 Bucket.
2. The ```.nii``` file is written to a ```tempfile```.
3. The ```data``` is then read from the ```tempfile```. (It is now a 3D array.)
4. The middle index along the third axis of ```data``` is found, and ```matplotlib```'s ```pyplot``` displays that slice in grayscale with an appropriate title.
5. The function also handles errors that could arise along the way, such as the ```.nii``` file being empty or corrupted, or failures when loading the file or displaying the slice. The ```tempfile``` is deleted afterwards in either case.
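
The temporary-file pattern described in these steps can be sketched on its own, without S3 or ```nibabel```. The helper name ```with_temp_copy``` and its ```payload``` argument are hypothetical; ```payload``` stands in for the bytes downloaded from the bucket.

```python
import os
import tempfile

def with_temp_copy(payload: bytes, suffix: str = ".nii") -> int:
    """Write bytes to a temporary file, read them back, and always clean up."""
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as temp_file:
        temp_file.write(payload)
        temp_file_path = temp_file.name  # Path stays valid after the handle is closed
    try:
        # A reader that needs a real path (such as nib.load) would be called here.
        with open(temp_file_path, "rb") as handle:
            return len(handle.read())
    finally:
        os.remove(temp_file_path)  # Runs whether or not reading succeeded

print(with_temp_copy(b"example bytes"))  # 13
```

Closing the file before reading it and deleting it in a ```finally``` block means no local copy is left behind, even when loading fails.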

**For the Saving Function:**

We define the function ```save_png_from_nii()```, which takes a ```.nii``` file and stores it as a collection of ```.png``` images in the S3 Bucket.
1. The function takes in a ```.nii``` ```filename``` and finds it in the S3 Bucket.
2. The ```.nii``` file is written to a ```tempfile```.
3. The ```data``` is then read from the ```tempfile```. (It is now a 3D array.)
4. The ```crop``` is then applied to the ```data``` using the ```crop``` values defined at the start.
5. Each cropped 2D slice is saved as a grayscale ```.png``` through a ```tempfile``` and uploaded to the S3 Bucket under ```tanmay/brain_slices/<brain number>/<scan type>/<slice index>.png```.
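
The way the output location is derived from the filename can be sketched in isolation. The helper ```slice_key``` and the example filename are hypothetical; the sketch only assumes that names end in ```_<brain number>_<scan type>.nii```, which is what the splitting logic in the code cell below relies on.

```python
def slice_key(filename: str, slice_idx: int) -> str:
    """Build the S3 key for one PNG slice from a '<folder>/<name>.nii' filename."""
    stem = filename.removesuffix(".nii")  # Requires Python 3.9 or newer
    parts = stem.split("_")
    brain_number, scan_type = parts[-2], parts[-1]
    return f"brain_slices/{brain_number}/{scan_type}/{slice_idx}.png"

# Hypothetical filename following the assumed '<prefix>_<number>_<type>.nii' pattern.
example = "BraTS20_Training_002/BraTS20_Training_002_flair.nii"
print(slice_key(example, 0))  # brain_slices/002/flair/0.png
```

The notebook then prepends ```tanmay/``` to this key when uploading.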

In [3]:
# Setting up the data pipeline to access the brain scans in the AWS S3 Bucket folder path:

s3 = boto3.resource('s3')
bucket_name = 'chemocraft-data'
folder_path = 'MICCAI_BraTS2020_TrainingData/'
bucket = s3.Bucket(bucket_name)

# Setting Crop values for Data Preprocessing:

crop_left, crop_right = 20, 10
crop_top, crop_bottom = 30, 30

def render_nii_from_s3(filename): # Function to display the middle slice of each brain scan type
    print(f"Fetching file: {filename}")

    obj = bucket.Object(folder_path + filename)
    file_stream = io.BytesIO(obj.get()['Body'].read())

    # Using temp files for efficient computing:

    with tempfile.NamedTemporaryFile(suffix='.nii', delete=False) as temp_file:  # Disable auto-delete
        temp_file.write(file_stream.getvalue())
        temp_file.flush()

        temp_file_path = temp_file.name
        print(f"Temporary file created: {temp_file_path}")

    try:
        img = nib.load(temp_file_path)
        data = img.get_fdata() # Storing brain into data variable

        print(f"Data shape for {filename}: {data.shape}") # Displaying shape of each brain

        if data.size == 0: # if data is nonexistent
            print(f"No data found in {filename}")
            return

        slice_idx = data.shape[2] // 2 # Getting index of middle slice to display it

        plt.figure(figsize=(3, 3)) # Displaying in 3x3 square
        plt.imshow(data[:, :, slice_idx], cmap='gray') # Color is set to grayscale
        plt.title(f'{filename} - Slice {slice_idx}') # Creating a title for the image
        plt.axis('off')  # Hide axes for cleaner display
        plt.show() # Finally showing the image

    except Exception as e:
        print(f"Error loading file {filename}: {e}") # Reports problems with getting the file
    finally:
        try:
            os.remove(temp_file_path)  # We do not want to save files locally, so we now delete the temp files
            print(f"Deleted temporary file: {temp_file_path}")
        except OSError as cleanup_error:
            print(f"Error deleting temp file: {cleanup_error}")

def save_png_from_nii(filename): # Function which will save .png grayscale brain slices to AWS S3 Buckets
    print(f"Fetching file: {filename}")
    obj = bucket.Object(folder_path + filename)
    file_stream = io.BytesIO(obj.get()['Body'].read())

    with tempfile.NamedTemporaryFile(suffix='.nii', delete=False) as temp_file:  # Disable auto-delete
        temp_file.write(file_stream.getvalue())
        temp_file.flush()

        temp_file_path = temp_file.name
        print(f"Temporary file created: {temp_file_path}")

    try:
        img = nib.load(temp_file_path)
        data = img.get_fdata()

        if data.size == 0: # Check for empty data before doing any work
            print(f"No data found in {filename}")
            return

        # Crop bounds (rows are cropped top/bottom, columns left/right):
        start_y = crop_top
        end_y = data.shape[0] - crop_bottom
        start_x = crop_left
        end_x = data.shape[1] - crop_right

        # Folder for each brain inside brain_slices, derived once from the filename:
        stem = filename.removesuffix(".nii") # Removes the .nii part
        brain_number = stem.split('_')[-2]
        scan_type = stem.split('_')[-1]

        slice_path = f"brain_slices/{brain_number}/{scan_type}"
        print(f"Saving files in directory: {slice_path}")

        for slice_idx in range(data.shape[2]): # For each slice of each brain
            cropped_slice = data[start_y:end_y, start_x:end_x, slice_idx]
            png_filename = f"{slice_path}/{slice_idx}.png"

            # Create a temp .png file used to save the grayscale brain slice:
            with tempfile.NamedTemporaryFile(suffix='.png', delete=False) as temp_png:
                temp_png_name = temp_png.name
            try:
                mpimg.imsave(temp_png_name, cropped_slice, cmap='gray')
                bucket.upload_file(temp_png_name, f"tanmay/{png_filename}")
            except Exception as e:
                print(f"Error saving file: {png_filename}, {e}")
            finally:
                os.remove(temp_png_name) # Always delete the temp .png

    except Exception as e:
        print(f"Error saving file {filename}: {e}")
    finally:
        os.remove(temp_file_path) # Delete the temp .nii once all slices are processed

found_files = False

i=0 # Counter for the number of .nii files (one per scan type per brain)

for obj in bucket.objects.filter(Prefix=folder_path):
    if obj.key.endswith('.nii'):
        found_files = True
        # print(obj.key)
        filename = obj.key.split('/')[-2] + '/' + obj.key.split('/')[-1]  # Get the filename 
        # render_nii_from_s3(filename)
        # save_png_from_nii(filename)
        i+=1

print(f"There are {i} brains. ('.nii' files)")
if not found_files:
    print(f"No .nii files found in the folder {folder_path}")

There are 495 brains. ('.nii' files)


Defining the GAN Architecture
1. The Generator
2. The Discriminator
3. Connecting the Generator & Discriminator
4. Compiling the Model

The Generator:

In [4]:
# Generator's output shape must be the same shape as the cropped real image: (210, 180, 1)

from tensorflow.keras import layers, models

dim = 100

def build_generator(latent_dim, output_shape=(210, 180, 1)):
    model = models.Sequential(name="ChemoCraft_Generator")
    print("Building Generator")

    model.add(layers.Input(shape=(latent_dim,)))
    model.add(layers.Dense(128, activation="relu"))
    model.add(layers.Dense(256 * 32, activation="relu"))
    model.add(layers.Reshape(target_shape=(16, 16, 32)))
    model.add(layers.Conv2DTranspose(filters=32, kernel_size=5, strides=6, padding="same", activation="relu"))
    model.add(layers.Conv2DTranspose(filters=8, kernel_size=3, strides=5, padding="same", activation="relu"))
    
    prev_out = model.layers[-1].output.shape

    model.add(layers.Conv2D(1, kernel_size=(prev_out[1]-output_shape[0]+1, prev_out[2]-output_shape[1]+1), strides=1, padding="valid", activation="tanh"))

    return model

chemocraft_generator = build_generator(latent_dim=dim)
chemocraft_generator.summary()

Building Generator
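
The layer sizes in the generator above can be checked by hand. The sketch below is plain arithmetic, not Keras code, and the helper names are hypothetical. It assumes the usual rules that a ```Conv2DTranspose``` with ```padding="same"``` multiplies the spatial size by its stride, and that a stride-1 ```Conv2D``` with ```padding="valid"``` shrinks it by ```kernel - 1```.

```python
def transpose_same(size: int, stride: int) -> int:
    """Output size of Conv2DTranspose with padding='same': input size times stride."""
    return size * stride

def conv_valid(size: int, kernel: int) -> int:
    """Output size of a stride-1 Conv2D with padding='valid'."""
    return size - kernel + 1

target_h, target_w = 210, 180

side = 16                        # After Reshape to (16, 16, 32)
side = transpose_same(side, 6)   # 96
side = transpose_same(side, 5)   # 480

# Kernel sizes chosen by the notebook so the final layer lands on the target shape.
kernel_h = side - target_h + 1   # 271
kernel_w = side - target_w + 1   # 301

print(conv_valid(side, kernel_h), conv_valid(side, kernel_w))  # 210 180
```

Under these assumptions the final convolution uses a 271 x 301 kernel, which is unusually large; upsampling to a size closer to the target would need a much smaller one.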


The Discriminator:

In [5]:
# Discriminator's input shape should be the same shape as the generator's output: (210, 180, 1)

shape = (210, 180, 1)

def build_discriminator(input_shape=(210, 180, 1)):

    model = models.Sequential(name="ChemoCraft_Discriminator")
    print("Building Discriminator Model")

    model.add(layers.Input(shape=input_shape))
    model.add(layers.Conv2D(filters=16, kernel_size=9, strides=5, padding="same", activation="relu")) 
    model.add(layers.Conv2D(filters=32, kernel_size=5, strides=4, padding="same", activation="relu"))
    model.add(layers.Conv2D(filters=128, kernel_size=3, strides=3, padding="same", activation="relu"))
    model.add(layers.Flatten())
    model.add(layers.Dense(1, activation="sigmoid", name="output"))

    return model
   
chemocraft_discriminator = build_discriminator(shape)
chemocraft_discriminator.summary()

Building Discriminator Model
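
The same kind of hand check works for the discriminator. This sketch assumes the standard rule that a ```Conv2D``` with ```padding="same"``` produces ```ceil(size / stride)``` outputs along each axis; ```conv_same``` is a hypothetical helper, not a Keras function.

```python
import math

def conv_same(size: int, stride: int) -> int:
    """Output size of Conv2D with padding='same': ceil(size / stride)."""
    return math.ceil(size / stride)

h, w = 210, 180
for stride in (5, 4, 3):  # Strides of the three convolutions, in order
    h, w = conv_same(h, stride), conv_same(w, stride)
    print(h, w)

flattened = h * w * 128  # 128 filters in the last convolution
print(flattened)
```

Under this rule the feature map shrinks from 210 x 180 to 4 x 3 before ```Flatten```, so the final ```Dense``` layer sees 1536 values.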


Connecting Generator & Discriminator through the GAN:

In [6]:
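def compile_gan(generator, discriminator, latent_dim):
    # Compile the discriminator on its own while it is still trainable
    discriminator.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    z = layers.Input(shape=(latent_dim,)) # Latent noise vector
    img = generator(z)
    discriminator.trainable = False # Freeze the discriminator inside the combined model
    validity = discriminator(img)
    gan = models.Model(z, validity) # Combined model: noise -> generated slice -> real/fake score
    gan.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return gan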

chemocraft_gan = compile_gan(chemocraft_generator, chemocraft_discriminator,latent_dim=dim)
chemocraft_gan.summary()

Setting up Data Pipeline for Training:

In [7]:
s3 = boto3.resource('s3')
bucket_name = 'chemocraft-data'
bucket = s3.Bucket(bucket_name)

folder_path = 'tanmay/brain_slices/' # Folder holding the saved .png slices

keys = []

for obj in bucket.objects.filter(Prefix=folder_path):
    if obj.key.endswith('.png'):
        sample_key = obj.key.split('/')[-3] # Getting the Brain numbers.
        if sample_key not in keys:
            keys.append(sample_key)

print(len(keys))

99
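
The brain-number extraction above can be tried without S3 access. In this sketch, ```brain_ids``` and the example keys are hypothetical; it assumes the ```tanmay/brain_slices/<brain>/<scan>/<slice>.png``` layout used when the slices were saved, and uses a ```dict``` instead of a list so duplicate checks stay fast.

```python
def brain_ids(object_keys):
    """Return unique brain folder names, in first-seen order, from PNG object keys."""
    ids = {}
    for key in object_keys:
        if key.endswith(".png"):
            ids[key.split("/")[-3]] = None  # dict keys keep insertion order and drop duplicates
    return list(ids)

# Hypothetical keys following the assumed folder layout.
example_keys = [
    "tanmay/brain_slices/002/flair/0.png",
    "tanmay/brain_slices/002/t1/0.png",
    "tanmay/brain_slices/320/flair/0.png",
    "tanmay/brain_slices/320/readme.txt",
]
print(brain_ids(example_keys))  # ['002', '320']
```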


In [8]:
from keras.preprocessing.image import load_img

folder_path = 'tanmay/brain_slices/'

def load_images(bucket_name, folder_path, folder_suffix):
    directory = f"{folder_path}{folder_suffix}/"
    print(f"Loading images from S3 Bucket: {bucket_name}{directory}")
    images = []

    for obj in bucket_name.objects.filter(Prefix=directory):
        if obj.key.endswith('.png'):
            try:
                file_stream = io.BytesIO(obj.get()['Body'].read())
                image = load_img(file_stream, target_size=(210, 180), color_mode='grayscale')
                print(f"Adding {obj.key.removeprefix('tanmay/brain_slices/')} into an array.")
                image = np.array(image) / 255.0  # Normalize to [0, 1]
                images.append(image)
            
            except Exception as e:
                print(f"Error loading image {obj.key}: {e}")

    return np.array(images)

my_arr = load_images(bucket_name=bucket, folder_path=folder_path, folder_suffix="320/flair") # Testing functionality on a small folder
print(my_arr.shape)

Loading images from S3 Bucket: s3.Bucket(name='chemocraft-data')tanmay/brain_slices/320/flair/
Adding 320/flair/0.png into an array.
Adding 320/flair/1.png into an array.
Adding 320/flair/10.png into an array.
Adding 320/flair/100.png into an array.
Adding 320/flair/101.png into an array.
Adding 320/flair/102.png into an array.
Adding 320/flair/103.png into an array.
Adding 320/flair/104.png into an array.
Adding 320/flair/105.png into an array.
Adding 320/flair/106.png into an array.
Adding 320/flair/107.png into an array.
Adding 320/flair/108.png into an array.
Adding 320/flair/109.png into an array.
Adding 320/flair/11.png into an array.
Adding 320/flair/110.png into an array.
Adding 320/flair/111.png into an array.
Adding 320/flair/112.png into an array.
Adding 320/flair/113.png into an array.
Adding 320/flair/114.png into an array.
Adding 320/flair/115.png into an array.
Adding 320/flair/116.png into an array.
Adding 320/flair/117.png into an array.
Adding 320/flair/118.png into a
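
One detail worth checking is the value range. The loader scales pixels to [0, 1], while the generator's final layer uses ```tanh```, whose outputs lie in [-1, 1]. A common convention is to scale real images to the same range as the generator's output. The sketch below shows both scalings; the function names are hypothetical and this is one possible option, not necessarily what the notebook intends.

```python
import numpy as np

def to_unit_range(pixels: np.ndarray) -> np.ndarray:
    """Scale 8-bit pixel values from [0, 255] to [0, 1] (what the loader does)."""
    return pixels.astype(np.float32) / 255.0

def to_tanh_range(pixels: np.ndarray) -> np.ndarray:
    """Scale 8-bit pixel values from [0, 255] to [-1, 1] (matches a tanh output)."""
    return pixels.astype(np.float32) / 127.5 - 1.0

pixels = np.array([0, 255], dtype=np.uint8)
print(to_unit_range(pixels))
print(to_tanh_range(pixels))
```

The alternative would be to keep the [0, 1] scaling and give the generator a ```sigmoid``` output instead.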

Batch Training:

In [9]:
import tensorflow as tf

def train_gan(generator, latent_dim, discriminator, gan, training_array, batch_size):
    for _ in range(len(training_array) // batch_size):
        # Select a random batch of real images (sampled with replacement)
        idx = np.random.randint(0, len(training_array), batch_size)
        real_slices = training_array[idx]

        # Generate fake images (verbose=0 suppresses the per-call progress bar)
        noise = np.random.normal(0, 1, (batch_size, latent_dim))
        fake_slices = generator.predict(noise, verbose=0)

        # Train the discriminator
        d_loss_real = discriminator.train_on_batch(real_slices, np.ones((batch_size, 1)))
        d_loss_fake = discriminator.train_on_batch(fake_slices, np.zeros((batch_size, 1)))
        d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)

        # Train the generator
        noise = np.random.normal(0, 1, (batch_size, latent_dim))
        g_loss = gan.train_on_batch(noise, np.ones((batch_size, 1)))

        print(f"D Loss: {d_loss[0]}, G Loss: {g_loss[0]}")
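
The labels passed to ```train_on_batch``` above determine what each loss measures. The NumPy sketch below recomputes the two losses by hand for made-up discriminator scores; ```binary_crossentropy```, ```d_on_real``` and ```d_on_fake``` are hypothetical names and values chosen for illustration, not outputs of the actual models.

```python
import numpy as np

def binary_crossentropy(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between labels and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))))

# Hypothetical discriminator scores for a batch of 5 real and 5 generated slices.
d_on_real = np.full((5, 1), 0.9)
d_on_fake = np.full((5, 1), 0.2)

# Discriminator: real slices are labelled 1, generated slices 0; the two losses are averaged.
d_loss = 0.5 * (binary_crossentropy(np.ones((5, 1)), d_on_real)
                + binary_crossentropy(np.zeros((5, 1)), d_on_fake))

# Generator: trained through the combined model with label 1, so its loss is low
# only when the discriminator scores generated slices as real.
g_loss = binary_crossentropy(np.ones((5, 1)), d_on_fake)

print(round(d_loss, 4), round(g_loss, 4))
```

With these scores the discriminator is doing well, so its loss is small while the generator's loss is large.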

In [10]:
folder_path = 'tanmay/brain_slices/'

epochs = 150

for epoch in range(epochs):
    for key in keys:
        brain_array = load_images(bucket_name=bucket, folder_path=folder_path, folder_suffix=key)
        print("Training GAN now:")
        train_gan(generator=chemocraft_generator, latent_dim=dim, discriminator=chemocraft_discriminator, gan=chemocraft_gan, training_array=brain_array, batch_size=5)
    print(f"Epoch {epoch + 1}/{epochs}") # Report once per epoch, after every brain has been used

Loading images from S3 Bucket: s3.Bucket(name='chemocraft-data')tanmay/brain_slices/002/
Adding 002/flair/0.png into an array.
Adding 002/flair/1.png into an array.
Adding 002/flair/10.png into an array.
Adding 002/flair/100.png into an array.
Adding 002/flair/101.png into an array.
Adding 002/flair/102.png into an array.
Adding 002/flair/103.png into an array.
Adding 002/flair/104.png into an array.
Adding 002/flair/105.png into an array.
Adding 002/flair/106.png into an array.
Adding 002/flair/107.png into an array.
Adding 002/flair/108.png into an array.
Adding 002/flair/109.png into an array.
Adding 002/flair/11.png into an array.
Adding 002/flair/110.png into an array.
Adding 002/flair/111.png into an array.
Adding 002/flair/112.png into an array.
Adding 002/flair/113.png into an array.
Adding 002/flair/114.png into an array.
Adding 002/flair/115.png into an array.
Adding 002/flair/116.png into an array.
Adding 002/flair/117.png into an array.
Adding 002/flair/118.png into an arra

AttributeError: 'NoneType' object has no attribute 'update_state'