<a href="https://colab.research.google.com/github/tylerb55/COMP530/blob/main/ResNet50Fed.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Load the dataset into Colab**

In [2]:
! git clone https://github.com/tylerb55/COMP530.git

Cloning into 'COMP530'...
remote: Enumerating objects: 7323, done.
remote: Counting objects: 100% (230/230), done.
remote: Compressing objects: 100% (223/223), done.
remote: Total 7323 (delta 19), reused 209 (delta 7), pack-reused 7093
Receiving objects: 100% (7323/7323), 1.09 GiB | 19.98 MiB/s, done.
Resolving deltas: 100% (244/244), done.
Checking out files: 100% (5010/5010), done.


In [3]:
! pip install --upgrade tensorflow-federated==0.20.0

Collecting tensorflow-federated==0.20.0
  Downloading tensorflow_federated-0.20.0-py2.py3-none-any.whl (819 kB)
[?25l[K     |▍                               | 10 kB 32.7 MB/s eta 0:00:01[K     |▉                               | 20 kB 8.8 MB/s eta 0:00:01[K     |█▏                              | 30 kB 7.7 MB/s eta 0:00:01[K     |█▋                              | 40 kB 3.5 MB/s eta 0:00:01[K     |██                              | 51 kB 3.7 MB/s eta 0:00:01[K     |██▍                             | 61 kB 4.4 MB/s eta 0:00:01[K     |██▉                             | 71 kB 4.6 MB/s eta 0:00:01[K     |███▏                            | 81 kB 4.8 MB/s eta 0:00:01[K     |███▋                            | 92 kB 5.4 MB/s eta 0:00:01[K     |████                            | 102 kB 4.2 MB/s eta 0:00:01[K     |████▍                           | 112 kB 4.2 MB/s eta 0:00:01[K     |████▉                           | 122 kB 4.2 MB/s eta 0:00:01[K     |█████▏                      

# **Import Necessary Libraries**

In [96]:
import numpy as np
import collections
import matplotlib.image as img
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow_federated as tff
import tensorflow_datasets as tfds
import nest_asyncio as na
import random 

In [97]:
NUM_CLIENTS=5 
NUM_EPOCHS=10
BATCH_SIZE=32
SHUFFLE_BUFFER=100
PREFETCH_BUFFER=10

# **Load the dataset into the environment and make it a federated dataset**

In [98]:
def Train_and_Test_Set(directory_name):
  """Load the images under a directory into TensorFlow dataset objects.

  The data is shuffled with a fixed seed and split 80:20 into training and test
  sets. Images are resized to 512x512 pixels (pixel values range from 0 to 255)
  and labelled by folder: NormalCases is 0 and cancercases is 1."""
  dataset_train=tf.keras.preprocessing.image_dataset_from_directory('/content/COMP530/'+directory_name,
                                                                    labels='inferred',
                                                                    label_mode='int',
                                                                    class_names=['NormalCases','cancercases'],
                                                                    color_mode='rgb',
                                                                    image_size=(512,512),
                                                                    shuffle=True,
                                                                    seed=305,
                                                                    validation_split=0.2,
                                                                    subset='training',
                                                                    batch_size=None
                                                                    )
  
  dataset_test=tf.keras.preprocessing.image_dataset_from_directory('/content/COMP530/'+directory_name,
                                                                    labels='inferred',
                                                                    label_mode='int',
                                                                    class_names=['NormalCases','cancercases'],
                                                                    color_mode='rgb',
                                                                    image_size=(512,512),
                                                                    shuffle=True,
                                                                    seed=305,
                                                                    validation_split=0.2,
                                                                    subset='validation',
                                                                    batch_size=None
                                                                    )

  return dataset_train,dataset_test

In [99]:
def Train_Test_Set(directory_name):
  """Load the images under a directory into TensorFlow dataset objects.

  Identical to Train_and_Test_Set except for the folder names: the data is
  shuffled with a fixed seed, split 80:20 into training and test sets, resized
  to 512x512 pixels, and labelled by folder: Normal is 0 and Cancer is 1."""
  dataset_train=tf.keras.preprocessing.image_dataset_from_directory('/content/COMP530/'+directory_name,
                                                                    labels='inferred',
                                                                    label_mode='int',
                                                                    class_names=['Normal','Cancer'],
                                                                    color_mode='rgb',
                                                                    image_size=(512,512),
                                                                    shuffle=True,
                                                                    seed=305,
                                                                    validation_split=0.2,
                                                                    subset='training',
                                                                    batch_size=None
                                                                    )
  
  dataset_test=tf.keras.preprocessing.image_dataset_from_directory('/content/COMP530/'+directory_name,
                                                                    labels='inferred',
                                                                    label_mode='int',
                                                                    class_names=['Normal','Cancer'],
                                                                    color_mode='rgb',
                                                                    image_size=(512,512),
                                                                    shuffle=True,
                                                                    seed=305,
                                                                    validation_split=0.2,
                                                                    subset='validation',
                                                                    batch_size=None
                                                                    )

  return dataset_train,dataset_test

In [100]:
def federate_dataset(Dataset,clients):
  """
  args:
  Dataset - the dataset object to be split between the clients in the simulation
  clients - the number of simulated clients the data is distributed across
  return:
  federated_dataset - a TestClientData object holding the original dataset split evenly between the clients.
  each client is keyed by its client id ("client_1", "client_2", ...) and each id maps to that client's own examples.
  examples left over after the even split are dropped
  """
  image_count=tf.data.experimental.cardinality(Dataset).numpy()
  image_per_set=int(np.floor(image_count/clients))

  client_train_dataset=collections.OrderedDict()
  Dataset=tfds.as_numpy(Dataset)
  count=0
  client_num=1
  y=[]
  x=[]
  # Assign training examples to each client. Here they are distributed evenly.
  # Heterogeneity of federated data could be explored by assigning each training
  # example to a random client instead, which would leave some clients with more
  # than enough local training data and others suffering from data paucity.
  for image in Dataset:
    count+=1
    y.append(image[1])
    x.append(image[0])
    if(count==image_per_set):
      x=np.asarray(x,dtype=np.float32)
      y=np.asarray(y,dtype=np.int32)
      data=collections.OrderedDict((('label', y), ('image', x)))
      client_train_dataset["client_"+str(client_num)]=data
      count=0
      client_num+=1
      y=[]
      x=[]

  federated_dataset=tff.simulation.datasets.TestClientData(client_train_dataset)
  return federated_dataset
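
The comment inside `federate_dataset` suggests exploring heterogeneity by assigning each example to a random client. The following is a minimal sketch of that idea in plain Python, working on example indices rather than TensorFlow datasets. The helper names `partition_evenly` and `partition_randomly` are hypothetical and not part of the notebook; the count of 878 is the training set size reported by the loader.

```python
import random

def partition_evenly(num_examples, num_clients):
    """Mirror the notebook's scheme: floor(n / clients) examples each, remainder dropped."""
    per_client = num_examples // num_clients
    return {
        f"client_{c + 1}": list(range(c * per_client, (c + 1) * per_client))
        for c in range(num_clients)
    }

def partition_randomly(num_examples, num_clients, seed=0):
    """Assign every example to a uniformly random client, giving unequal shard sizes."""
    rng = random.Random(seed)
    shards = {f"client_{c + 1}": [] for c in range(num_clients)}
    for idx in range(num_examples):
        shards[f"client_{rng.randint(1, num_clients)}"].append(idx)
    return shards

even = partition_evenly(878, 5)
skewed = partition_randomly(878, 5)
print({k: len(v) for k, v in even.items()})    # equal shard sizes
print({k: len(v) for k, v in skewed.items()})  # sizes vary from client to client
```

Each index list could then be used to select the images and labels for one client before building the `OrderedDict` passed to `TestClientData`.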

In [101]:
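def preprocess(dataset):

  def format_batch(element):
    """resize the images in a batch to 224x224 and return the features as an 'OrderedDict'"""
    # NOTE: per_image_standardization returns a new tensor. The result is not assigned,
    # so this call has no effect and the images keep their 0 to 255 pixel range.
    tf.image.per_image_standardization(element['image'])
    return collections.OrderedDict(
        x=tf.image.resize(element['image'],(224,224)),
        y=tf.reshape(element['label'],[-1,1])
    )
  # repeat the client data for NUM_EPOCHS local epochs, shuffle, batch, then format each batch
  return dataset.repeat(NUM_EPOCHS).shuffle(SHUFFLE_BUFFER, seed=1).batch(BATCH_SIZE).map(format_batch).prefetch(PREFETCH_BUFFER)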

In [102]:
data_augmentation = tf.keras.Sequential([
  tf.keras.layers.RandomFlip("horizontal_and_vertical"),
  tf.keras.layers.RandomRotation(0.2),
])

In [103]:
def make_federated_data(client_data,client_ids,training):
  """ build the list of per-client datasets that is passed into the federated environment to train or test the network """
  data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal_and_vertical"),
    tf.keras.layers.RandomRotation(0.2),
  ])
  if training:
    # sample a random subset of between 1 and NUM_CLIENTS clients for this round
    client_ids=random.sample(client_ids,random.randint(1,NUM_CLIENTS))
  # pass the training flag through so test images are not randomly flipped or rotated
  return[
         preprocess(client_data.create_tf_dataset_for_client(client_id)).map(
             lambda batch: (data_augmentation(batch['x'],training=training),batch['y']))
         for client_id in client_ids
  ]
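
Because `make_federated_data` draws between 1 and `NUM_CLIENTS` clients for each training round, the amount of data processed per round varies. The sketch below works through that arithmetic, assuming the 878 training images reported by the loader and the constants defined at the top of the notebook. The helper name `round_workload` is hypothetical.

```python
import math
import random

NUM_CLIENTS = 5
NUM_EPOCHS = 10
BATCH_SIZE = 32
TRAIN_IMAGES = 878  # "Using 878 files for training" in the loader output

# Each client holds floor(878 / 5) images, and preprocess() repeats them NUM_EPOCHS times.
images_per_client = TRAIN_IMAGES // NUM_CLIENTS
examples_per_client = images_per_client * NUM_EPOCHS
batches_per_client = math.ceil(examples_per_client / BATCH_SIZE)

def round_workload(num_participants):
    """Return (num_examples, num_batches) for a round with the given client count."""
    return (num_participants * examples_per_client,
            num_participants * batches_per_client)

# Sample participants the same way the notebook does.
client_ids = [f"client_{i}" for i in range(1, NUM_CLIENTS + 1)]
participants = random.sample(client_ids, random.randint(1, NUM_CLIENTS))
print(len(participants), round_workload(len(participants)))
```

With five, two, and one participants this gives 8750, 3500, and 1750 examples, which matches the `num_examples` values logged for training rounds 1 to 3.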

In [104]:
# the original dataset held in keras dataset objects
dataset_train, dataset_test=Train_and_Test_Set("Dataset1")
#dataset_train, dataset_test=Train_Test_Set("IQQ-OTHNCCD+")
# the dataset split by which client the data has come from
federated_train_dataset=federate_dataset(dataset_train,NUM_CLIENTS)
federated_test_dataset=federate_dataset(dataset_test,NUM_CLIENTS)
# an example dataset for a single client, used to get the input specification for the federated model
example_dataset = federated_train_dataset.create_tf_dataset_for_client(federated_train_dataset.client_ids[0])
preprocessed_example_dataset=preprocess(example_dataset)

Found 1097 files belonging to 2 classes.
Using 878 files for training.
Found 1097 files belonging to 2 classes.
Using 219 files for validation.
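
The file counts above follow from `validation_split=0.2`. As a rough sketch of the arithmetic, assuming (as the logged counts suggest) that the validation size is the split fraction times the file count, truncated to an integer. The helper name `split_counts` is hypothetical.

```python
def split_counts(num_files, validation_split):
    """Return (training, validation) sizes for a fractional hold-out split."""
    num_validation = int(validation_split * num_files)  # truncate toward zero
    return num_files - num_validation, num_validation

train_count, test_count = split_counts(1097, 0.2)
print(train_count, test_count)
```

Both loader calls use the same `seed`, which is what keeps the `training` and `validation` subsets consistent with each other across the two calls.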


# **Load a model pretrained on the ImageNet dataset as the CNN for the environment**

In [105]:
def ResNet_model():
  base_model=tf.keras.applications.ResNet50(weights='imagenet',input_shape=(224,224,3),include_top=False) # use resnet50 as the base for the tl model
  base_model.trainable = False # freeze the resnet50 layers

  inputs = tf.keras.Input(shape=(224, 224, 3))
  x = base_model(inputs, training=False) # run the frozen base in inference mode
  # Convert features of shape `base_model.output_shape[1:]` to vectors
  x = tf.keras.layers.GlobalAveragePooling2D()(x)
  x = tf.keras.layers.Dropout(0.2)(x)
  x = tf.keras.layers.Dense(64,activation='relu')(x)
  x = tf.keras.layers.Dense(32,activation='relu')(x)
  x = tf.keras.layers.Dense(16,activation='relu')(x)
  outputs = tf.keras.layers.Dense(1,activation='sigmoid')(x)# A Dense classifier with a single unit (binary classification)
  model = tf.keras.Model(inputs, outputs)
  return model


In [106]:
def MobileNet_model():
  base_model=tf.keras.applications.MobileNetV3Small(weights='imagenet',input_shape=(224,224,3),include_top=False) # use MobileNetV3Small as the base for the tl model
  base_model.trainable = False # freeze the MobileNetV3Small layers

  inputs = tf.keras.Input(shape=(224, 224, 3))
  x = base_model(inputs, training=False) # run the frozen base in inference mode
  # Convert features of shape `base_model.output_shape[1:]` to vectors
  x = tf.keras.layers.GlobalAveragePooling2D()(x)
  x = tf.keras.layers.Dropout(0.2)(x)
  x = tf.keras.layers.Dense(64,activation='relu')(x)
  x = tf.keras.layers.Dense(32,activation='relu')(x)
  x = tf.keras.layers.Dense(16,activation='relu')(x)
  outputs = tf.keras.layers.Dense(1,activation='sigmoid')(x)# A Dense classifier with a single unit (binary classification)
  model = tf.keras.Model(inputs, outputs)
  return model

In [107]:
def Inception_model():
  base_model=tf.keras.applications.InceptionV3(weights='imagenet',input_shape=(224,224,3),include_top=False) # use Inceptionv3 as the base for the tl model
  base_model.trainable = False # freeze the inceptionv3 layers

  inputs = tf.keras.Input(shape=(224, 224, 3))
  x = base_model(inputs, training=False) # run the frozen base in inference mode
  # Convert features of shape `base_model.output_shape[1:]` to vectors
  x = tf.keras.layers.GlobalAveragePooling2D()(x)
  x = tf.keras.layers.Dropout(0.2)(x)
  x = tf.keras.layers.Dense(64,activation='relu')(x)
  x = tf.keras.layers.Dense(32,activation='relu')(x)
  x = tf.keras.layers.Dense(16,activation='relu')(x)
  outputs = tf.keras.layers.Dense(1,activation='sigmoid')(x)# A Dense classifier with a single unit (binary classification)
  model = tf.keras.Model(inputs, outputs)
  return model

In [108]:
#resnet=ResNet_model()
#resnet.summary()
mobilenet=MobileNet_model()
mobilenet.summary()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v3/weights_mobilenet_v3_small_224_1.0_float_no_top_v2.h5
Model: "model_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_10 (InputLayer)       [(None, 224, 224, 3)]     0         
                                                                 
 MobilenetV3small (Functiona  (None, 7, 7, 576)        939120    
 l)                                                              
                                                                 
 global_average_pooling2d_4   (None, 576)              0         
 (GlobalAveragePooling2D)                                        
                                                                 
 dropout_4 (Dropout)         (None, 576)               0         
                                                                 
 dense_16 (Dense)            (None, 64)  
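
Since the base network is frozen, only the dense head defined in `MobileNet_model` is trained. A quick sketch of the parameter arithmetic for that head, assuming the 576 pooled features shown in the summary above. The helper name `dense_params` is hypothetical.

```python
def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units

# Pooled feature size followed by the four Dense layers (64, 32, 16, 1 units).
layer_sizes = [576, 64, 32, 16, 1]
head_params = sum(
    dense_params(n_in, n_out) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
)
print(head_params)
```

The pooling and dropout layers contribute no parameters, so the head is small compared with the 939,120 parameters listed for the frozen base.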

# **Create the federated environment based on the pretrained model (MobileNetV3Small here, ResNet50 commented out)**

In [109]:
def federated_model():
  #resnet=ResNet_model()
  mobilenet=MobileNet_model()
  return tff.learning.from_keras_model(
      #resnet,
      mobilenet,
      input_spec=preprocessed_example_dataset.element_spec,
      loss=tf.keras.losses.BinaryCrossentropy(),
      metrics=[tf.keras.metrics.BinaryAccuracy(),
               tf.keras.metrics.Precision(),
               tf.keras.metrics.TruePositives(),
               tf.keras.metrics.TrueNegatives(),
               tf.keras.metrics.FalsePositives(),
               tf.keras.metrics.FalseNegatives()])

In [110]:
iterative_process = tff.learning.build_federated_averaging_process(
    federated_model,
    client_optimizer_fn=lambda: tf.keras.optimizers.Adam(learning_rate=0.0001),
    server_optimizer_fn=lambda: tf.keras.optimizers.Adam(learning_rate=0.001))
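
As a conceptual illustration only (this is not TFF's internal code), the aggregation at the heart of federated averaging is a mean of the client model updates, weighted by how many examples each client trained on. The function `weighted_average` and the toy numbers below are assumptions made for this sketch.

```python
def weighted_average(client_updates, client_weights):
    """Average equally sized update vectors, weighting each by its example count."""
    total = sum(client_weights)
    size = len(client_updates[0])
    return [
        sum(update[i] * weight
            for update, weight in zip(client_updates, client_weights)) / total
        for i in range(size)
    ]

# Toy example: three clients, each reporting a two-parameter model update.
updates = [[0.10, -0.20], [0.30, 0.00], [-0.10, 0.40]]
num_examples = [1750, 1750, 3500]  # hypothetical per-client example counts
avg = weighted_average(updates, num_examples)
print(avg)
```

In the process built above, the server optimizer (Adam with a learning rate of 0.001) then applies the aggregated update to the global model.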

In [111]:
%load_ext tensorboard

The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard


In [112]:
%reload_ext tensorboard

In [113]:
na.apply()
logdir = "/tmp/logs/scalars/training/"
summary_writer = tf.summary.create_file_writer(logdir)
state=iterative_process.initialize()
NUM_ROUNDS=100
with summary_writer.as_default():
  for round_num in range(1, NUM_ROUNDS+1):
    federated_train_data = make_federated_data(federated_train_dataset, federated_train_dataset.client_ids,training=True)
    state, metrics = iterative_process.next(state, federated_train_data)
    train=metrics['train']
    # specificity = TN/(TN+FP), sensitivity (recall) = TP/(TP+FN)
    specificity=train['true_negatives']/(train['true_negatives']+train['false_positives'])
    sensitivity=train['true_positives']/(train['true_positives']+train['false_negatives'])
    tf.summary.scalar('Specificity', specificity, step=round_num)
    tf.summary.scalar('Sensitivity', sensitivity, step=round_num)
    print('round {:2d}, metrics={}'.format(round_num, metrics['train']),'(Specificity,',specificity,') (Sensitivity,',sensitivity,')')
    # log every training metric returned by the round to tensorboard
    for name, value in metrics['train'].items():
        tf.summary.scalar(name, value, step=round_num)

round  1, metrics=OrderedDict([('binary_accuracy', 0.6150857), ('precision', 0.6194412), ('true_positives', 5321.0), ('true_negatives', 61.0), ('false_positives', 3269.0), ('false_negatives', 99.0), ('loss', 0.69165295), ('num_examples', 8750), ('num_batches', 275)]) (Specificity, 0.018318318 ) (Sensitivity, 0.98173434 )
round  2, metrics=OrderedDict([('binary_accuracy', 0.64), ('precision', 0.64), ('true_positives', 2240.0), ('true_negatives', 0.0), ('false_positives', 1260.0), ('false_negatives', 0.0), ('loss', 0.68997616), ('num_examples', 3500), ('num_batches', 110)]) (Specificity, 0.0 ) (Sensitivity, 1.0 )
round  3, metrics=OrderedDict([('binary_accuracy', 0.64), ('precision', 0.64), ('true_positives', 1120.0), ('true_negatives', 0.0), ('false_positives', 630.0), ('false_negatives', 0.0), ('loss', 0.68911695), ('num_examples', 1750), ('num_batches', 55)]) (Specificity, 0.0 ) (Sensitivity, 1.0 )
round  4, metrics=OrderedDict([('binary_accuracy', 0.61904764), ('precision', 0.6190476

# **Output the results to graph visualisations**

In [114]:
!ls {logdir}
%tensorboard --logdir {logdir} --port=0

events.out.tfevents.1651831952.373e2ff7fdb6.5364.0.v2
events.out.tfevents.1651835340.373e2ff7fdb6.5364.1.v2
events.out.tfevents.1651840008.373e2ff7fdb6.5364.2.v2
events.out.tfevents.1651841478.373e2ff7fdb6.5364.3.v2
events.out.tfevents.1651848972.373e2ff7fdb6.5364.4.v2


Reusing TensorBoard on port 40581 (pid 2630), started 6:25:37 ago. (Use '!kill 2630' to kill it.)


# **Evaluation on the test set**

In [115]:
evaluation=tff.learning.build_federated_evaluation(federated_model)

In [116]:
federated_test_data = make_federated_data(federated_test_dataset, federated_test_dataset.client_ids,training=False)

In [117]:
test_metrics = evaluation(state.model, federated_test_data)

In [118]:
test=test_metrics['eval']
specificity=test['true_negatives']/(test['true_negatives']+test['false_positives'])
sensitivity=test['true_positives']/(test['true_positives']+test['false_negatives'])
accuracy=test['binary_accuracy']
print('Evaluation Metrics: (Accuracy:)',accuracy,'(Specificity:)',specificity,'(Sensitivity:)',sensitivity)

Evaluation Metrics: (Accuracy:) 0.61860466 (Specificity:) 0.0 (Sensitivity:) 1.0
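
The specificity and sensitivity above are computed by dividing confusion-matrix counts directly, which is undefined when a denominator is zero. Below is a small sketch with a guarded division. The helper names `safe_ratio` and `confusion_metrics` are hypothetical, and the counts are taken from rounds 1 and 2 of the training log.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

def confusion_metrics(tp, tn, fp, fn):
    """Compute accuracy, specificity, and sensitivity from confusion counts."""
    return {
        "accuracy": safe_ratio(tp + tn, tp + tn + fp + fn),
        "specificity": safe_ratio(tn, tn + fp),
        "sensitivity": safe_ratio(tp, tp + fn),
    }

# Counts reported for training round 1.
round_1 = confusion_metrics(tp=5321, tn=61, fp=3269, fn=99)
print(round_1)

# Counts reported for training round 2, where every image was labelled positive.
round_2 = confusion_metrics(tp=2240, tn=0, fp=1260, fn=0)
print(round_2)
```

A specificity of 0 together with a sensitivity of 1, as in the evaluation output, means the model predicts the positive class for every image, so the accuracy simply equals the fraction of positive examples in the data.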


In [119]:
# Uncomment and run this cell to remove old outputs from the directory so new results can be seen on tensorboard

#!rm -R /tmp/logs/scalars/*