# Histopathologic Cancer Detection

This submission, woefully late for the actual competition, is to satisfy Week 3 coursework for the Introduction to Deep Learning class, a portion of the Data Science Masters Degree at University of Colorado, Boulder.

My intention is to build CNN classifiers from scratch to detect cancer cells in images. The competition labels an image as positive when the centermost 32x32 square contains at least one pixel of tumor tissue. This subtle little point is what made the competition very difficult. Since we are not competing, these classifiers will score the whole image as malignant or benign. That should give lower scores, but CNN models that are easier to train (converge). Keep in mind throughout that these classifiers are built to this altered specification, not the competition's.
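
To make the "centermost 32x32 square" concrete, here is a minimal sketch of how that region could be cut out of a 96x96 patch. The `center_crop` helper is hypothetical and is not used anywhere else in this notebook; the models below see the whole image.

```python
import numpy as np

def center_crop(image: np.ndarray, size: int = 32) -> np.ndarray:
    """Return the central size x size window of an (H, W, C) image."""
    height, width = image.shape[:2]
    top = (height - size) // 2
    left = (width - size) // 2
    return image[top:top + size, left:left + size]

# Synthetic 96x96 RGB patch standing in for one training image.
patch = np.zeros((96, 96, 3), dtype=np.uint8)
patch[32, 32] = 255  # mark the first pixel of the central region

center = center_crop(patch)
print(center.shape)   # (32, 32, 3)
print(center[0, 0])   # the marked pixel lands in the crop's top-left corner
```

For a 96x96 input the crop covers rows and columns 32 through 63, so the outer 32-pixel border only provides context.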

This notebook will include several iterations of CNNs to try things:
1. The first CNN will be basic, and try a myriad of possibilities until I have a working classifier.
2. The second will be more complex than the first, and iterate on the success.
3. The third will be even more complex than the first two, and iterate once more.
4. The 4th through 9th models scored are transfer learning examples using Densenet 121, 169, & 201. 

The goal here is to iterate on what works three times to try new things, illustrate a transfer learning implementation, and then finally an ensemble implementation. Once we have the first one working, I'll try different activation functions, adding and subtracting layers, and dialing in the hyperparameters for the models.

I will use the Kaggle dataset located here: (https://www.kaggle.com/competitions/histopathologic-cancer-detection/data). This dataset has been curated and differs from the original PCam dataset because duplicate images were removed. Kaggle is hosting this curated dataset for the machine learning community to use for fun and practice. This dataset was provided by Bas Veeling, with additional input from Babak Ehteshami Bejnordi, Geert Litjens, and Jeroen van der Laak. You may view and download the official PCam dataset from GitHub (https://github.com/basveeling/pcam). The data is provided under the CC0 License, following the license of Camelyon16.

### TLDR Results

In [None]:
# TLDR Results
import pandas as pd
fin = pd.DataFrame({"Model":['DenseNet 201 (2 training epochs)','DenseNet 169 (12 training epochs)','DenseNet 121 (12 training epochs)','DenseNet 201 (12 training epochs)','Densenet169 (2 training epochs)','Densenet 121 (2 training Epochs)','Model 2 CNN','Model 3 CNN'],
"Private Score":[0.815,0.8199,0.815,0.7865,0.8133,0.817,0.773,0.7477],
'Public Score':[0.8396,0.8452,0.8425,0.8307,0.8253,0.8325,0.8336,0.801]})
fin.sort_values(by='Private Score',ascending=False)

Work to do:
1. EDA
2. Instantiate GPU/TPU Strategy
3. Data Preprocessing
4. Build Appropriate Model(s)
5. Train and Test Models 
6. Iterate 
7. Transfer Learning Example
8. Submit Each Iteration for Scoring 
9. Ensemble Results (second notebook here: https://www.kaggle.com/toddgardiner/binary-cancer-classifier-ensembler/ )
10. Conclusion

Let's begin by loading the libraries we'll need...

In [None]:
import numpy as np 
import pandas as pd 
from PIL import Image
import matplotlib.pyplot as plt
import tensorflow as tf 
import tensorflow_io as tfio
import keras
from keras.models import Sequential
from keras.layers import AvgPool2D,BatchNormalization, Conv2D, Dense, Flatten, Input, GlobalAveragePooling2D, Dropout 
from keras.layers import MaxPool2D, MaxPooling2D, ReLU, concatenate
import math, gc, copy
AUTOTUNE = tf.data.AUTOTUNE
import warnings
warnings.filterwarnings('ignore')
import os
# for dirname, _, filenames in os.walk('/kaggle/input'):
#     for filename in filenames:
#         print(os.path.join(dirname, filename))
print("Tensorflow Version In Use: ", tf.__version__ , " \nNotebook was built using tf version 2.13.0")

Next, we detect our hardware and light up GPUs or TPUs if we have them.

In [None]:
# Detect hardware and light up the GPUs/TPUs
try:
     # detect and init the TPU
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()

    # instantiate a distribution strategy
    tf.tpu.experimental.initialize_tpu_system(tpu)
    tpu_strategy = tf.distribute.TPUStrategy(tpu)
 
    # tell us what happened
    print('Running on TPU ', tpu.cluster_spec().as_dict())

except ValueError: # If TPU not found
    tpu = None
    tpu_strategy = tf.distribute.get_strategy() # Default strategy that works on CPU and single GPU
    print('No TPU found, running on the default strategy (CPU or single GPU)')

print("Number of accelerators: ", tpu_strategy.num_replicas_in_sync)
print("TPU: ", tpu)


# EDA: Exploratory Data Analysis

With our libraries and hardware set up, we begin the exploratory data analysis, or EDA. This process involves several steps:
1. Load the data
2. Characterize the data
3. Visualize the data
4. Develop (Initial) Strategy for Modeling


In [None]:
# Load the data into a dataframe (df)

df = pd.read_csv('/kaggle/input/histopathologic-cancer-detection/train_labels.csv')

# get basic df information
print(df.info())
print('')

# get basic df statistical information
print(df.describe())
print('')

#see the raw data
df.head()


From those lines of code we know that there are about 220 thousand rows of data, roughly 40% of the observations are labeled as cancer, the labels are binary (1, 0), and there are no null values in either column. We can be fairly certain that this maintained and curated dataset needs no cleaning.

There is only one thing left to check, are there 220,025 images in the training folder to match? Let's load them into a list and compare the lengths to find out.

In [None]:
imagelist = os.listdir('/kaggle/input/histopathologic-cancer-detection/train')
print("Length of Image List:", len(imagelist))
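
Equal counts are a good sign, but they don't strictly prove that every id has its own file. A stricter check compares the two sets of names. This is a minimal sketch on toy data; `compare_ids` is a hypothetical helper and the ids below are made up.

```python
import os

def compare_ids(csv_ids, filenames):
    """Return (ids without a file, files without an id) as two sets."""
    file_ids = {os.path.splitext(name)[0] for name in filenames}
    csv_ids = set(csv_ids)
    return csv_ids - file_ids, file_ids - csv_ids

# Toy data standing in for the id column and the directory listing.
toy_ids = ["aaa", "bbb", "ccc"]
toy_files = ["aaa.tif", "bbb.tif", "ddd.tif"]

missing_files, unlabeled_files = compare_ids(toy_ids, toy_files)
print("ids with no file:", missing_files)
print("files with no id:", unlabeled_files)
```

In this notebook the call would look like `compare_ids(df.id, imagelist)`, and both sets should come back empty if the folder and the CSV really line up.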

The number of rows in the CSV matches the number of images in the training folder. Preliminarily, the training data looks good. Let's move on to the test data folder and repeat the process. Since the test folder has no labels CSV, we will first check whether sample_submission.csv matches the image list.

In [None]:
# Load the data into a dataframe (dfva from testing)

dfva = pd.read_csv('/kaggle/input/histopathologic-cancer-detection/sample_submission.csv')

# get basic df information
print(dfva.info())
print('')

# get basic df statistical information
print(dfva.describe())
print('')

# get the length of files in the testing folder
imagevalist = os.listdir('/kaggle/input/histopathologic-cancer-detection/test/')
print("Length of Validation Image List:", len(imagevalist))

#see the raw data
dfva.head()

This data is indeed a held-out validation set. It has no labels, but otherwise conforms to a clean set of data (no nulls, images match CSV rows, etc.). Because it is unlabeled, we cannot score against it ourselves; Kaggle scores our predictions on it after the fact, so it acts as the final test of our models.

What we learned:
1. We will need to do a train/test split on the training data to build the model effectively.
2. We have a lot of data to work with (220,000+ observations).
3. The dataset isn't perfectly balanced (60/40) but it's not bad.
4. There are only two classes (0,1) so we are building a binary classifier.
5. The data in csv doesn't contain path information to the files (we'll have to add that).

What we still need to know:
1. What do the images look like? (format, channel construction, size)
2. Are the images uniform?
3. What are the implications on RAM, CPU, and GPU/TPU moving forward?

Let's look at some images and find out...

In [None]:
fig, axs = plt.subplots(3,3) 
h = 0
v = 0
for i in range(9):
    imid = df.id.sample(1).values[0]
    #print(imid)
    image = Image.open('/kaggle/input/histopathologic-cancer-detection/train/'+imid+'.tif')
    axs[h,v].imshow(image)
    if h == 2:
        v +=1
        h = 0
    else:
        h +=1
print("Last Image Specifications: Shape",image.size,"\nFormat:",image.info )
print(os.stat('/kaggle/input/histopathologic-cancer-detection/train/'+imid+'.tif').st_size, "bytes on disk")

I ran that cell multiple times to get a representative sample. These are consistently 96x96 TIFF images in the raw (uncompressed) format, weighing 27,935 bytes (about 27 KB). Let's see if the validation set is the same size.


In [None]:
fig, axs = plt.subplots(3,3) 
h = 0
v = 0
for i in range(9):
    imid = dfva.id.sample(1).values[0]
    #print(imid)
    image = Image.open('/kaggle/input/histopathologic-cancer-detection/test/'+imid+'.tif')
    axs[h,v].imshow(image)
    if h == 2:
        v +=1
        h = 0
    else:
        h +=1
print("Last Image Specifications: Shape",image.size,"\nFormat:",image.info )
print(os.stat('/kaggle/input/histopathologic-cancer-detection/test/'+imid+'.tif').st_size, "bytes on disk")

Great, let's check with the command line to confirm our intuitions.

In [None]:
# !ls -U -hal /kaggle/input/histopathologic-cancer-detection/train/ | head -10

# total 5.9G
# drwxr-xr-x 2 nobody nogroup   0 Feb 13  2023 .
# drwxr-xr-x 4 nobody nogroup   0 Feb 13  2023 ..
# -rw-r--r-- 1 nobody nogroup 28K Feb 13  2023 d43c081bafa286f9c1f7e921883f26ceafebc912.tif
# -rw-r--r-- 1 nobody nogroup 28K Feb 13  2023 092d0eedebce504847715ee046b6ad74b57599b4.tif
# -rw-r--r-- 1 nobody nogroup 28K Feb 13  2023 b0d2582c6218a8764323fc940b41312282b99bf4.tif
# -rw-r--r-- 1 nobody nogroup 28K Feb 13  2023 187c99df762f13f99818e5593d4bab4c6577e7e3.tif
# -rw-r--r-- 1 nobody nogroup 28K Feb 13  2023 7c5270c83837de5a5cbb2dca511559dc39d19d53.tif
# -rw-r--r-- 1 nobody nogroup 28K Feb 13  2023 5a32933e093185f5fc91d30fc83ad571c6818d25.tif
# -rw-r--r-- 1 nobody nogroup 28K Feb 13  2023 42e77d193e73811e0bb65a0cbd9b01c5c27900fa.tif

In [None]:
# !ls -U -hal /kaggle/input/histopathologic-cancer-detection/test/ | head -10

# total 1.6G
# drwxr-xr-x 2 nobody nogroup   0 Feb 13  2023 .
# drwxr-xr-x 4 nobody nogroup   0 Feb 13  2023 ..
# -rw-r--r-- 1 nobody nogroup 28K Feb 13  2023 a7ea26360815d8492433b14cd8318607bcf99d9e.tif
# -rw-r--r-- 1 nobody nogroup 28K Feb 13  2023 59d21133c845dff1ebc7a0c7cf40c145ea9e9664.tif
# -rw-r--r-- 1 nobody nogroup 28K Feb 13  2023 5fde41ce8c6048a5c2f38eca12d6528fa312cdbb.tif
# -rw-r--r-- 1 nobody nogroup 28K Feb 13  2023 bd953a3b1db1f7041ee95ff482594c4f46c73ed0.tif
# -rw-r--r-- 1 nobody nogroup 28K Feb 13  2023 523fc2efd7aba53e597ab0f69cc2cbded7a6ce62.tif
# -rw-r--r-- 1 nobody nogroup 28K Feb 13  2023 d23c66547f4a00555a174d2fcb860ae399b66edc.tif
# -rw-r--r-- 1 nobody nogroup 28K Feb 13  2023 fabf2fca23f71655974767e29eda86a9b2c97a72.tif


Everything with the files seems good. Each file is listed as 28K, consistent with the 27,935 bytes we measured above. Let's move on to making a pipeline for our data.
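
As a sanity check on that file size, the arithmetic below shows that 27,935 bytes is consistent with uncompressed 8-bit RGB pixels plus a small remainder. Reading that remainder as TIFF header and tag data is my assumption, not something measured in this notebook.

```python
# Observed file size from the EDA cells above.
file_bytes = 27935

# Assumption: 96 x 96 pixels, 3 channels (RGB), 1 byte per channel.
pixel_bytes = 96 * 96 * 3
remainder = file_bytes - pixel_bytes

print(pixel_bytes)  # 27648
print(remainder)    # 287
```

A four-channel (RGBA) image of the same size would need 36,864 bytes for pixels alone, which is more than the file holds, so three channels on disk is the better fit.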

# Data Preprocessing

In the last step we found that there are 5.9GB of image data. We could load all the images into memory (RAM) and build a model, but we'd be limiting the model's ability to use RAM in training. Loading batches of images will be better, and we'll let the system define the optimizations with tf.data.AUTOTUNE turned on.
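
A rough back-of-the-envelope estimate shows why batching matters. This sketch assumes every image ends up as a 96x96x3 array and uses the batch size of 64 chosen later in this notebook; real memory use will differ because of framework overhead.

```python
n_images = 220_025               # rows in train_labels.csv
values_per_image = 96 * 96 * 3   # assuming three colour channels per pixel

bytes_uint8 = n_images * values_per_image        # 1 byte per value
bytes_float32 = bytes_uint8 * 4                  # 4 bytes per value
batch_float32 = 64 * values_per_image * 4        # one batch of 64 images

print(f"{bytes_uint8 / 1e9:.1f} GB as uint8")            # 6.1 GB
print(f"{bytes_float32 / 1e9:.1f} GB as float32")        # 24.3 GB
print(f"{batch_float32 / 1e6:.1f} MB per float32 batch") # 7.1 MB
```

Holding the full training set as float32 would take several times more memory than the files occupy on disk, while a single batch is only a few megabytes.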

We also know that we need to add paths to the data, balance the classes, and select random training and test sets. We'll handle all that in this first step before we make a tensorflow dataset.

In [None]:
randomseed = 12

# Balance the classes as we pull out the training set.
print("How Many Class 1 Observations:",df.label.sum())
pull80 = int(df.label.sum() * .8)
pull20 = df.label.sum()-pull80
print("Define 80% of these Observations:",pull80)
print("Define 20% of these Observations:",pull20)

# Make dataframes of index positions for cancer and benign
cancerlist = pd.DataFrame(df.index[df['label'] ==1].tolist(),columns=['id'])
benignlist = pd.DataFrame(df.index[df['label'] ==0].tolist(),columns=['id'])

# Randomly sample pull80 index positions from each list
trainlistc = cancerlist['id'].sample(pull80,replace=False,random_state = randomseed)
trainlistb = benignlist['id'].sample(pull80,replace=False,random_state = randomseed)

print("Length of Cancer Training List:",len(trainlistc))
print("Length of Benign Training List:",len(trainlistb))

# Add Columns To Account For Training and Testing
cancerlist['train'] = 0
benignlist['train'] = 0
cancerlist['test'] = 0
benignlist['test'] = 0

# Populate the train column (vectorized, avoids chained assignment)
cancerlist.loc[cancerlist['id'].isin(trainlistc), 'train'] = 1
benignlist.loc[benignlist['id'].isin(trainlistb), 'train'] = 1

# Sample out test set from remainder
testlistc = cancerlist['id'].loc[cancerlist['train']==0].sample(pull20,
                                    replace=False,random_state=randomseed)
testlistb = benignlist['id'].loc[benignlist['train']==0].sample(pull20,
                                    replace=False,random_state=randomseed)

# Populate the test column
cancerlist.loc[cancerlist['id'].isin(testlistc), 'test'] = 1
benignlist.loc[benignlist['id'].isin(testlistb), 'test'] = 1

# Share Output Status
print("Length of Cancer Testing List:",len(testlistc))
print("Length of Benign Testing List:",len(testlistb))

# Ensure we didn't get rows in both
print("Number of Cancer rows in both Testing and Training Set", \
      cancerlist.id.loc[(cancerlist['train']==1)&(cancerlist['test']==1)].count())
print("Number of Benign rows in both Testing and Training Set", \
      benignlist.id.loc[(benignlist['train']==1)&(benignlist['test']==1)].count())
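
The same idea can be sketched in a few lines of NumPy on synthetic labels: take 80% of the positive count from each class for training, then the remaining 20% of that count from what is left of each class for testing. This is an illustration under my own assumptions (100 made-up labels, a NumPy random generator), not a replacement for the cell above.

```python
import numpy as np

rng = np.random.default_rng(12)

# Synthetic labels: 40 positives and 60 negatives.
labels = np.array([1] * 40 + [0] * 60)

n_pos = int(labels.sum())
n_train = int(n_pos * 0.8)   # per-class training size
n_test = n_pos - n_train     # per-class test size

train_idx, test_idx = [], []
for cls in (0, 1):
    # Shuffle the index positions of this class, then slice without overlap.
    idx = rng.permutation(np.flatnonzero(labels == cls))
    train_idx.extend(idx[:n_train])
    test_idx.extend(idx[n_train:n_train + n_test])

train_idx = np.array(train_idx)
test_idx = np.array(test_idx)

print(len(train_idx), len(test_idx))        # 64 16
print(labels[train_idx].mean())             # 0.5
print(set(train_idx) & set(test_idx))       # set()
```

Slicing one shuffled index array per class makes the train and test sets disjoint by construction, so the overlap check is a confirmation rather than a safeguard.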


Now we have index numbers to randomly select train/test sets with balanced classes.
We will build the dataframes for the train and test set, complete with image paths.

In [None]:
def addimginfo(id):
    return f"/kaggle/input/histopathologic-cancer-detection/train/{id}.tif"
    
#Build Training Dataframe and View
dftrc = df.loc[df.index[trainlistc.tolist()]]
dftrb = df.loc[df.index[trainlistb.tolist()]]
dftr = pd.concat([dftrc,dftrb]).sample(frac=1,random_state=randomseed).reset_index(drop=True)  # one seeded shuffle keeps the order reproducible
dftr['path'] = dftr.id.apply(addimginfo)
print("Length:",len(dftr.id)," Number of Cancer Obs:",dftr.label.sum())
dftr.head()

In [None]:
#Build Testing Dataframe and View
dftec = df.loc[df.index[testlistc.tolist()]]
dfteb = df.loc[df.index[testlistb.tolist()]]
dfte = pd.concat([dftec,dfteb]).sample(frac=1,random_state=randomseed).reset_index(drop=True)  # one seeded shuffle keeps the order reproducible
dfte['path'] = dfte.id.apply(addimginfo)
print("Length:",len(dfte.id)," Number of Cancer Obs:",dfte.label.sum())
dfte.head()

In [None]:
# Collect the garbage before moving on
del dftec
del dfteb
del dftrc
del dftrb
del imagelist, imagevalist
del pull20, pull80, fig, axs
gc.collect()

Next we turn these dataframes into tensorflow Datasets. We do so by making numpy arrays of the data, applying the dataset function, and then applying a map function to read and decode the binary image data on the fly. Notice that we have also set them to batch through at 64 images per iteration, and given tensorflow the ability to optimize the prefetching of batches for speed (AUTOTUNE).

In [None]:
# define function to open file, decode, randomly flip, drop the alpha channel, and normalize to 0-1 floats

@tf.function
def grab_images(path):
    file = tf.io.read_file(path)
    img = tfio.experimental.image.decode_tiff(file, index=0)
    img = tf.image.random_flip_left_right(img, seed=None)
    img = tf.image.random_flip_up_down(img, seed=None)
    img =img[:,:,0:-1]
    img = img/255
    img = tf.image.convert_image_dtype(img,dtype=tf.float32)
    return img

# test the function
tester = grab_images('/kaggle/input/histopathologic-cancer-detection/train/0001a2bc5d4aa55989f014bfad74a95ac3dfff54.tif')
plt.imshow(tester)
plt.show()
# ensure we are normalized between 0-1
print(tester[0:5,0:5,:])

# ensure we have the right shape: RGB only, since the alpha channel was dropped above
tester.shape
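
To make the shapes and value ranges explicit, here are the same steps mirrored in plain NumPy on a synthetic RGBA array. This is only an illustration; the `preprocess` function is hypothetical and the pipeline itself keeps using the TensorFlow function above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 96x96 RGBA image with 8-bit channels and a fully opaque alpha.
rgba = rng.integers(0, 256, size=(96, 96, 4), dtype=np.uint8)
rgba[..., 3] = 255

def preprocess(img, rng):
    """Random flips, drop the alpha channel, scale to floats in [0, 1]."""
    if rng.random() < 0.5:
        img = np.fliplr(img)   # mirror left/right
    if rng.random() < 0.5:
        img = np.flipud(img)   # mirror up/down
    img = img[:, :, :-1]       # keep R, G, B; drop the last (alpha) channel
    return img.astype(np.float32) / 255.0

out = preprocess(rgba, rng)
print(out.shape, out.dtype)
print(out.min() >= 0.0, out.max() <= 1.0)
```

The flips only reorder pixels, so they never change the shape or the value range; only the channel slice and the division do.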


In [None]:
# make both label datasets
trlabs = tf.data.Dataset.from_tensor_slices(np.array([np.array([0,1]) if i ==1 else np.array([1,0]) for i in dftr.label.values ]))
telabs = tf.data.Dataset.from_tensor_slices(np.array([np.array([0,1]) if i ==1 else np.array([1,0]) for i in dfte.label.values ]))

# make both path datasets
trpaths = tf.data.Dataset.from_tensor_slices(np.array([path for path in dftr.path.values]))
tepaths = tf.data.Dataset.from_tensor_slices(np.array([path for path in dfte.path.values]))

# create image datasets on the fly
trimgs = trpaths.map(grab_images)
teimgs = tepaths.map(grab_images)

# zip them together
trset = tf.data.Dataset.zip((trimgs,trlabs)).batch(64).prefetch(AUTOTUNE)
teset = tf.data.Dataset.zip((teimgs,telabs)).batch(64).prefetch(AUTOTUNE)
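
The label arrays above are two-column one-hot encodings: `[1, 0]` for benign and `[0, 1]` for cancer. A compact NumPy way to get the same encoding, shown here on four made-up labels, is to index an identity matrix.

```python
import numpy as np

# Made-up labels standing in for dftr.label.values.
labels = np.array([0, 1, 1, 0])

# Row i of the 2x2 identity matrix is the one-hot vector for class i.
one_hot = np.eye(2, dtype=np.int64)[labels]
print(one_hot)

# argmax along the columns recovers the original class ids.
print(one_hot.argmax(axis=1))
```

This is also why predictions are later read off with `np.argmax`: the index of the larger column is the predicted class.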

In [None]:
# # Ensure the datasets are working...
# for i in trset.take(1):
#     for element in i:
#         print(element)

In [None]:
checkpoint_filepath =''
#define the callbacks for upcoming models
earlyst = tf.keras.callbacks.EarlyStopping(monitor="binary_crossentropy", 
                                           patience = 5)
rlrop = tf.keras.callbacks.ReduceLROnPlateau(monitor="binary_crossentropy", 
                                             factor=.1,
                                             patience = 2,
                                             min_lr = 0)

With collated tf datasets in hand, it's time to do some modeling. 

# Build Models and Experiment Until Success

First, several things didn't work at all. What you'll see below is my 12th or 15th attempt at a working model. Each of these things was tried but led to an exploding gradient.
- adam optimizer
- nadam optimizer 
- categorical cross-entropy loss function
- momentum between .2 and .5
- learning rates above .00025
- softmax activation on final layer
- sigmoid activation on final layer


# Model 1: Basic CNN
The first thing I tried that worked is shown below: a fairly simple convolutional neural network built with Keras Sequential. I used five blocks, each a convolution followed by an average pooling layer, then three dense layers with a final activation of tanh and a Binary Cross Entropy Loss Function to make the model choose between our two classes. You'll notice I programmed the model inside the tpu_strategy scope so the accelerators all function seamlessly. The JIT compiler is also turned on for speed.

In [None]:
 
with tpu_strategy.scope():
    model = Sequential([
    Input(shape=(96, 96, 3)),  
   
    Conv2D(32, 3, padding='same', activation = 'relu'),
    AvgPool2D(pool_size=2, padding='same'),      
    
    Conv2D(32, 3, padding='same', activation = 'relu'),
    AvgPool2D(pool_size=2, padding='same'),
 
    Conv2D(64, 3, padding='same', activation = 'relu'),
    AvgPool2D(pool_size=2, padding='same'),
    
    Conv2D(64, 3, padding='same', activation = 'relu'),
    AvgPool2D(pool_size=2, padding='same'),
 
    Conv2D(32, 3, padding='same', activation = 'relu'),
    AvgPool2D(pool_size=2, padding='same'),
 
    # Transition to Neural Network
    Flatten(),
    Dense(288, activation='relu'),
    Dense(128, activation='relu'),
    Dense(2, activation='tanh')
    ])
    
    model.compile(
         
#         optimizer = tf.keras.optimizers.Adam(
#                         learning_rate=0.0025,
#                     # USE L1 Regularization?
#                       beta_1=0.9,
#                         # USE L2 Regularization?
#                         beta_2=0.2,
#                                 epsilon=1e-07,
#                                 amsgrad=False,
#                                 weight_decay=None,
#                                 clipnorm=None,
#                                 clipvalue=None,
#                                 global_clipnorm=None,
#                                 use_ema=False,
#                                 ema_momentum=0.97,
#                                 ema_overwrite_frequency=None,
#                                 jit_compile=True,
#                         ),
     optimizer =    tf.keras.optimizers.experimental.RMSprop(
            learning_rate=0.00025,
#             rho=0.9,
            momentum=0.15,
#             epsilon=1e-07,
#             centered=False,
#             weight_decay=None,
#             clipnorm=None,
#             clipvalue=None,
#             global_clipnorm=None,
#             use_ema=False,
#             ema_momentum=0.99,
#             ema_overwrite_frequency=100,
            jit_compile=True,
#             name='RMSprop',
#             **kwargs
        ),
        loss= 'BinaryCrossentropy',
        metrics=[ 'BinaryCrossentropy', 'accuracy']
    )

model.summary()
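
It can help to trace the feature-map size through the network by hand. With `padding='same'`, each convolution keeps the spatial size and each 2x2 pooling layer halves it (rounding up). The helper below is a hypothetical sketch of that arithmetic, not part of the model code.

```python
import math

def pooled_size(size, n_pools, pool=2):
    """Spatial size after n_pools 'same'-padded pooling layers."""
    for _ in range(n_pools):
        size = math.ceil(size / pool)
    return size

side = pooled_size(96, n_pools=5)   # 96 -> 48 -> 24 -> 12 -> 6 -> 3
flat = side * side * 32             # last convolution has 32 filters

print(side)  # 3
print(flat)  # 288
```

So the Flatten layer hands 288 values to the dense layers, the same number as the width of the first Dense layer. `model.summary()` should report the same shapes.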


In [None]:
# define model save location
checkpoint_filepath = '/kaggle/working/model1/'
!mkdir {checkpoint_filepath}
                                           
checkpoints = model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath,
    save_weights_only=False,
    monitor='val_accuracy',
    mode='max',
    save_best_only=True)

history = model.fit(
                    trset,
                    epochs=30,
                    callbacks=[rlrop,earlyst,checkpoints],
                    validation_data = teset
                    )

# summarize history for accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

print("Make Some Predictions")
x = model.predict(teset.take(1)) #batch of 64

print("Binary Decision Logits:\n",x[0:10])

print("Predictions:\n",[np.argmax(x) for x in x[0:30]],'\nTruth:\n',[x for x in dfte.label.values[0:30]])

We can see the logits for predictions are not all of the same class, manual inspection of predictions yields roughly the same result as the tensorflow output, and the graphs populate. This is a working model.
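
The manual inspection amounts to taking the larger of the two output columns as the predicted class and comparing against the truth. A minimal sketch with made-up outputs (these numbers are invented for illustration, not taken from the model):

```python
import numpy as np

# Hypothetical two-column outputs: column 0 = benign, column 1 = cancer.
outputs = np.array([[0.9, -0.2],
                    [0.1,  0.8],
                    [-0.5, 0.7],
                    [0.6,  0.4]])
truth = np.array([0, 1, 0, 0])

predicted = outputs.argmax(axis=1)
accuracy = (predicted == truth).mean()

print(predicted)  # [0 1 1 0]
print(accuracy)   # 0.75
```

Run over the whole test set instead of one batch, this number should land close to the `val_accuracy` Keras reports.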

This model is pretty good, with accuracy above 80%, but it is nowhere near optimal. There is a zero slope in the graphs for accuracy and loss, telling us the model is fully trained. The learning rate reduction on plateau callback works really well, as does the callback for early stopping. This model was not submitted for scoring.

The problem is that our accuracy is already receding in epoch 4 and the learning rate dives by an order of magnitude. This tells me that the initial momentum and learning rate are too high, and that we can improve.

# Model 2: CNN Iteration
V2 was scored at 83.36% on the public leaderboard and 77.3% on the private leaderboard.

Next I tried adding in batch normalization, reducing the learning rate, and lowering the momentum too. I also softened the learning rate reduction on plateaus (the rate is now halved instead of cut to a tenth) to see if we were getting too small too fast. This model is an improvement on the first successful model.
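
To see how differently the two plateau settings behave, the sketch below compares the learning rate after each reduction for Model 1 (start 0.00025, factor 0.1) and Model 2 (start 0.000025, factor 0.5). It only does the arithmetic; when the reductions actually fire depends on the training run, which this hypothetical helper does not simulate.

```python
def lr_after(initial_lr, factor, n_reductions):
    """Learning rate after a given number of plateau reductions."""
    return initial_lr * factor ** n_reductions

for k in range(4):
    model1_lr = lr_after(0.00025, 0.1, k)
    model2_lr = lr_after(0.000025, 0.5, k)
    print(f"after {k} reductions: model 1 = {model1_lr:.2e}, model 2 = {model2_lr:.2e}")
```

Model 2 starts ten times lower, but after two reductions its rate is already the larger of the two, so it keeps learning at a useful pace for longer.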

In [None]:
dropout_conv = 0.2
with tpu_strategy.scope():
    model2 = Sequential([
    Input(shape=(96, 96, 3)), 
    # Layer 1
    Conv2D(32, 3, padding='same', activation = 'relu'),
    AvgPool2D(pool_size=2, padding='same'),      
    Conv2D(32, 3, padding='same', activation = 'relu'),
    AvgPool2D(pool_size=2, padding='same'),
    BatchNormalization(), 
        
    # Layer 2
    Conv2D(64, 3, padding='same', activation = 'relu'),
    AvgPool2D(pool_size=2, padding='same'),
    Conv2D(64, 3, padding='same', activation = 'relu'),
    AvgPool2D(pool_size=2, padding='same'), 
    BatchNormalization(), 
        
    # Layer 3        
    Conv2D(32, 3, padding='same', activation = 'relu'),
    AvgPool2D(pool_size=2, padding='same'),
    BatchNormalization(),

    # Transition to Neural Network
    Flatten(),
    Dense(288, activation='relu'),
    Dense(128, activation='relu'),
    Dense(2, activation='tanh')
    ])
    
    model2.compile(

     optimizer =    tf.keras.optimizers.experimental.RMSprop(
            learning_rate=0.000025,
#             rho=0.9,
            momentum=0.025,
#             epsilon=1e-07,
#             centered=False,
#             weight_decay=None,
#             clipnorm=None,
#             clipvalue=None,
#             global_clipnorm=None,
#             use_ema=False,
#             ema_momentum=0.99,
#             ema_overwrite_frequency=100,
             jit_compile=True,
#             name='RMSprop',
#             **kwargs
        ),
        loss= 'BinaryCrossentropy',
        metrics=[ 'BinaryCrossentropy', 'accuracy']
    )

model2.summary()


In [None]:
# modify callbacks
earlyst = tf.keras.callbacks.EarlyStopping(monitor="binary_crossentropy", 
                                           patience = 5)
rlrop = tf.keras.callbacks.ReduceLROnPlateau(monitor="binary_crossentropy", 
                                             factor=.5,
                                             patience = 2,
                                             min_lr = 0)
checkpoint_filepath = '/kaggle/working/model2/'
!mkdir {checkpoint_filepath}
checkpoints = model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath,
    save_weights_only=False,
    monitor='val_accuracy',
    mode='max',
    save_best_only=True)

history2 = model2.fit(
                    trset,
                    epochs=100,
                    callbacks=[rlrop,earlyst,checkpoints],
                    validation_data = teset
                    )

# summarize history for accuracy
plt.plot(history2.history['accuracy'])
plt.plot(history2.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history2.history['loss'])
plt.plot(history2.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

Again, we have a slope of zero at the end of the training, logits that are depicting real predictions (not shown), and manual inspection confirms the model results. This model trained more evenly (consistent improvement through 20 epochs before a lr reduction, denoting a consistent learning rate), trained far longer, and is more accurate. The final epochs show me that this model is fairly topped out though. To improve further I'll need a change of architecture.

# Model 3: CNN with New Layout
Scored 80.1% on the public leaderboard and 74.77% on the private leaderboard.

The nature of this problem is cell structure recognition. For a final iteration I wanted to try stacking the convolutions so that they could work together in each layer. So, I added a few convolutional layers, and tried out the JIT compiler. Here I tried another callback, checkpoints, to save the best model automatically. Finally, I tried sigmoid but went back to tanh as the final activation. Although this model did better in training, it didn't generalize as well to the scoring system. This tells us that we in fact overfit the data here.

In [None]:
with tpu_strategy.scope():
    model3 = Sequential([
    Input(shape=(96, 96, 3)), 
    # Layer 1
    Conv2D(16, 3, padding='same', activation = 'relu'),
    Conv2D(16, 3, padding='same', activation = 'relu'),
    Conv2D(16, 3, padding='same', activation = 'relu'),
    AvgPool2D(pool_size=2, padding='same'),  
    BatchNormalization(), 
    
    # Layer 2
    Conv2D(32, 3, padding='same', activation = 'relu'),
    Conv2D(32, 3, padding='same', activation = 'relu'),
    Conv2D(32, 3, padding='same', activation = 'relu'),
    AvgPool2D(pool_size=2, padding='same'),
    BatchNormalization(), 
    
    # Layer 3
    Conv2D(64, 3, padding='same', activation = 'relu'),
    Conv2D(64, 3, padding='same', activation = 'relu'),
    Conv2D(64, 3, padding='same', activation = 'relu'),
    AvgPool2D(pool_size=2, padding='same'),
    BatchNormalization(),    
    
    # Layer 4
    Conv2D(128, 3, padding='same', activation = 'relu'),
    Conv2D(128, 3, padding='same', activation = 'relu'),
    Conv2D(128, 3, padding='same', activation = 'relu'),
    AvgPool2D(pool_size=2, padding='same'), 
    BatchNormalization(), 
        
    # Layer 5        
    Conv2D(64, 3, padding='same', activation = 'relu'),
    AvgPool2D(pool_size=2, padding='same'),
    

    # Transition to Neural Network
    Flatten(),
    Dense(576, activation='relu'),
    Dense(128, activation='relu'),
    Dense(2, activation='tanh')
    ])
    
    model3.compile(

     optimizer =    tf.keras.optimizers.experimental.RMSprop(
            learning_rate=0.000025,
#             rho=0.9,
            momentum=0.025,
#             epsilon=1e-07,
#             centered=False,
#             weight_decay=None,
#             clipnorm=None,
#             clipvalue=None,
#             global_clipnorm=None,
#             use_ema=False,
#             ema_momentum=0.99,
#             ema_overwrite_frequency=100,
            jit_compile=True,
#             name='RMSprop',
#             **kwargs
        ),
        loss= 'BinaryCrossentropy',
        metrics=[ 'BinaryCrossentropy', 'accuracy']
    )

model3.summary()


In [None]:
# !rm -rf {checkpoint_filepath}

In [None]:
checkpoint_filepath = '/kaggle/working/model3/'
!mkdir {checkpoint_filepath}

checkpoints = model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath,
    save_weights_only=False,
    monitor='val_accuracy',
    mode='max',
    save_best_only=True)

history3 = model3.fit(
                    trset,
                    epochs=100,
                    callbacks=[rlrop,earlyst,checkpoints],
                    validation_data = teset
                    )

print("Make Some Predictions")
x = model3.predict(teset.take(1)) #batch of 64

print("Binary Decision Logits:\n",x[0:10])

print("Predictions:\n",[np.argmax(x) for x in x[0:30]],'\nTruth:\n',[x for x in dfte.label.values[0:30]])
# summarize history for accuracy
plt.plot(history3.history['accuracy'])
plt.plot(history3.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history3.history['loss'])
plt.plot(history3.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

This was by far the best model built from scratch, and was used to build and submit our predictions for V1 (scores above). However, this model was overfit. The final results were not as good as Model 2.

# Models 4-5: Densenet 121
* (Densenet 121 - 2 Training Epochs) Private Leaderboard 81.7 Public Leaderboard 83.25
* (Densenet 121 - 12 Training Epochs) Private Leaderboard 81.5 Public Leaderboard 84.25

Here I transitioned away from models I built from scratch to models that are pretrained. I tried the Densenet 121, 169, and 201 models. The results above are better than the self-constructed models, but only marginally so. We'd expect the Densenet 201 model to score the best, but that's not what happened.



In [None]:
# Import Densenet
# Versions tried DenseNet121, DenseNet169, DenseNet201

densemodel = tf.keras.applications.densenet.DenseNet121(weights='imagenet', input_shape = (96,96,3), include_top=False)
# densemodel = tf.keras.applications.densenet.DenseNet169(weights='imagenet', input_shape = (96,96,3), include_top=False)

for layer in densemodel.layers:
    layer.trainable=False

In [None]:
with tpu_strategy.scope():
    model4 = Sequential([
        densemodel,
        AvgPool2D(pool_size=2, padding='same'),
        
        # Transition to Neural Network
        Flatten(),
        Dense(1920, activation='relu'),
        Dense(128, activation='relu'),
        Dense(2, activation='tanh')
        ])
    
    model4.compile(
        optimizer=tf.keras.optimizers.experimental.RMSprop(
            learning_rate=0.000025,
            momentum=0.025,
            jit_compile=True,
            # all other RMSprop arguments are left at their defaults
        ),
        loss='BinaryCrossentropy',
        metrics=['BinaryCrossentropy', 'accuracy']
    )

model4.summary()


In [None]:
checkpoint_filepath = '/kaggle/working/model4/'
!mkdir {checkpoint_filepath}
checkpoints = model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath,
    save_weights_only=False,
    monitor='val_accuracy',
    mode='max',
    save_best_only=True)


history4 = model4.fit(
                    trset,
                    epochs=12,
                    callbacks=[rlrop,earlyst,checkpoints],
                    validation_data = teset
                    )

print("Make Some Predictions")
preds4 = model4.predict(teset.take(1))  # one batch of 64

print("Binary Decision Logits:\n", preds4[0:10])

# Compare the predicted class (argmax over the two outputs) with the true labels
print("Predictions:\n", [np.argmax(p) for p in preds4[0:30]], '\nTruth:\n', list(dfte.label.values[0:30]))
# summarize history for accuracy
plt.plot(history4.history['accuracy'])
plt.plot(history4.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history4.history['loss'])
plt.plot(history4.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

# Models 6-7: DenseNet 169
* (Densenet 169 - 2 Training Epochs) Private Leaderboard 81.33 Public Leaderboard 84.52
* (Densenet 169 - 12 Training Epochs) Private Leaderboard 81.99 Public Leaderboard 82.53


In [None]:
densemodel = tf.keras.applications.densenet.DenseNet169(weights='imagenet', input_shape = (96,96,3), include_top=False)

for layer in densemodel.layers:
    layer.trainable=False

with tpu_strategy.scope():
    model5 = Sequential([
        densemodel,
        AvgPool2D(pool_size=2, padding='same'),
        
        # Transition to Neural Network
        Flatten(),
        Dense(1920, activation='relu'),
        Dense(128, activation='relu'),
        Dense(2, activation='tanh')
        ])
    
    model5.compile(
        optimizer=tf.keras.optimizers.experimental.RMSprop(
            learning_rate=0.000025,
            momentum=0.025,
            jit_compile=True
        ),
        loss='BinaryCrossentropy',
        metrics=['BinaryCrossentropy', 'accuracy']
    )

model5.summary()


In [None]:
checkpoint_filepath = '/kaggle/working/model5/'
!mkdir {checkpoint_filepath}
checkpoints = model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath,
    save_weights_only=False,
    monitor='val_accuracy',
    mode='max',
    save_best_only=True)


history5 = model5.fit(
                    trset,
                    epochs=12,
                    callbacks=[rlrop,earlyst,checkpoints],
                    validation_data = teset
                    )

print("Make Some Predictions")
preds5 = model5.predict(teset.take(1))  # one batch of 64

print("Binary Decision Logits:\n", preds5[0:10])

# Compare the predicted class (argmax over the two outputs) with the true labels
print("Predictions:\n", [np.argmax(p) for p in preds5[0:30]], '\nTruth:\n', list(dfte.label.values[0:30]))
# summarize history for accuracy
plt.plot(history5.history['accuracy'])
plt.plot(history5.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history5.history['loss'])
plt.plot(history5.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

# Models 8-9: DenseNet 201

* (Densenet 201 - 2 Training Epochs) Private Leaderboard 81.5 Public Leaderboard 83.96
* (Densenet 201 - 12 Training Epochs) Private Leaderboard 78.65 Public Leaderboard 83.07

In [None]:
densemodel = tf.keras.applications.densenet.DenseNet201(weights='imagenet', input_shape = (96,96,3), include_top=False)

for layer in densemodel.layers:
    layer.trainable=False

with tpu_strategy.scope():
    model6 = Sequential([
        densemodel,
        AvgPool2D(pool_size=2, padding='same'),
        
        # Transition to Neural Network
        Flatten(),
        Dense(1920, activation='relu'),
        Dense(128, activation='relu'),
        Dense(2, activation='tanh')
        ])
    
    model6.compile(
        optimizer=tf.keras.optimizers.experimental.RMSprop(
            learning_rate=0.000025,
            momentum=0.025,
            jit_compile=True
        ),
        loss='BinaryCrossentropy',
        metrics=['BinaryCrossentropy', 'accuracy']
    )

model6.summary()


In [None]:
checkpoint_filepath = '/kaggle/working/model6/'
!mkdir {checkpoint_filepath}
checkpoints = model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath,
    save_weights_only=False,
    monitor='val_accuracy',
    mode='max',
    save_best_only=True)


history6 = model6.fit(
                    trset,
                    epochs=12,
                    callbacks=[rlrop,earlyst,checkpoints],
                    validation_data = teset
                    )

print("Make Some Predictions")
preds6 = model6.predict(teset.take(1))  # one batch of 64

print("Binary Decision Logits:\n", preds6[0:10])

# Compare the predicted class (argmax over the two outputs) with the true labels
print("Predictions:\n", [np.argmax(p) for p in preds6[0:30]], '\nTruth:\n', list(dfte.label.values[0:30]))
# summarize history for accuracy
plt.plot(history6.history['accuracy'])
plt.plot(history6.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history6.history['loss'])
plt.plot(history6.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

The above results show the training for the DenseNet architectures. I trained each network for 2 and 12 epochs and scored both versions. The results are impressive, the training is short, and the amount of code needed is small. That it took only four lines of code to bring in hundreds of layers is cool. The only drawbacks are the time needed to train each epoch (10 minutes on DenseNet121, up to 40 minutes on DenseNet201) and slow scoring. The scores are listed under the headings above.

The big takeaways are that transfer learning is easy but not significantly better than building a custom CNN, and that I need to try everything. The big surprise is that DenseNet 169 won on the private leaderboard. Both DenseNet 121 and 201 scored best with 2 training epochs, not more. Very interesting results.
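
The comparison above can be checked directly from the private leaderboard scores listed in this notebook. This is a small sketch in plain Python; the `private_lb` dictionary is just a convenient container for those scores and is not part of the original pipeline.

```python
# Private leaderboard scores as reported in this notebook,
# keyed by (architecture, training epochs).
private_lb = {
    ('DenseNet121', 2): 81.7,
    ('DenseNet121', 12): 81.5,
    ('DenseNet169', 2): 81.33,
    ('DenseNet169', 12): 81.99,
    ('DenseNet201', 2): 81.5,
    ('DenseNet201', 12): 78.65,
}

# Overall winner on the private leaderboard.
best_run = max(private_lb, key=private_lb.get)

# Best epoch setting for each architecture.
best_epochs = {}
for (arch, epochs), score in private_lb.items():
    if arch not in best_epochs or score > private_lb[(arch, best_epochs[arch])]:
        best_epochs[arch] = epochs

print(best_run)     # ('DenseNet169', 12)
print(best_epochs)  # {'DenseNet121': 2, 'DenseNet169': 12, 'DenseNet201': 2}
```

The spread between the best and worst of the 2-epoch runs is well under one point, which is why none of these architectures stands out clearly.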

# Create Submission(s)

To create a submission, we need to: make a dataframe from the CSV; load the id column (complete with path references to the files) into a dataset; run the dataset through the `grab_images` function and then through the chosen model (after loading it); take the argmax of each row of predictions; repopulate the label column in the dataframe; and finally overwrite the CSV file with the new data.

In [None]:
# add path data to ids, switch to np.array, then make tf dataset
val = np.array(['/kaggle/input/histopathologic-cancer-detection/test/'+i+'.tif' for i in dfva.id.values])        
vapaths = tf.data.Dataset.from_tensor_slices(val)

# create image datasets on the fly
vaimgs = vapaths.map(grab_images).batch(64).prefetch(AUTOTUNE)

# load best model from iteration chosen model1, model2, model3, model4, model5, or model6
with tpu_strategy.scope():
    modelp = tf.keras.models.load_model('/kaggle/working/model5/')

# make predictions
predsraw = modelp.predict(vaimgs)

# get classifications from logits
preds = [np.argmax(x) for x in predsraw]

# post preds to dataframe
dfva.label = preds

# save the df to csv for submission
dfva.to_csv('submission.csv',index=False)
                
# Technical note: If you save a submission file to your computer, 
# you can upload the submission directly to the competition page and get a score.  
# This method was used on all transfer learning iterations to reduce GPU usage.

In [None]:
!head submission.csv 

# Table of Results

In [None]:
gridset = pd.DataFrame({
 'Architecture':['Model 1 Basic CNN','Model 1 Basic CNN','Model 1 Basic CNN','Model 1 Basic CNN',
                 'Model 1 Basic CNN','Model 1 Basic CNN','Model 1 Basic CNN', 'Model 2 Basic CNN',
                 'Model 3 Basic CNN','DenseNet 121 (2 epoch training)','DenseNet 121 (12 epoch training)','DenseNet 169 (2 epoch training)',
                 'DenseNet 169 (12 epoch training)','DenseNet 201 (2 epoch training)','DenseNet 201 (12 epoch training)'],
 'Learning Rate':[.001,.001,.001,.0025,
                  .0025,.00025,.00025,.00025,
                  .000025,.000025,.000025,.000025,
                  .000025,.000025,.000025],
 'Momentum':[.3,.25,.25,.2,
             .15,.15,.025,.025,
             .025,.025,.025,.025,
             .025,.025,.025],
 'Optimizer':['Adam','Nadam','Adam','Adam',
              'RMS Prop','RMS Prop','RMS Prop','RMS Prop',
              'RMS Prop','RMS Prop','RMS Prop','RMS Prop',
              'RMS Prop','RMS Prop','RMS Prop'],
 'C Activation':['Relu','Relu','Relu','Relu',
                 'Relu','Relu','Relu','Relu',
                 'Relu','Relu','Relu','Relu',
                 'Relu','Relu','Relu'],
 'NN Activation':['Sigmoid','Sigmoid','Tanh','Softmax',
                  'Sigmoid','Sigmoid','Tanh','Tanh',
                  'Tanh','Tanh','Tanh','Tanh',
                  'Tanh','Tanh','Tanh'],
 
 "Private LB":['Gradient Runaway','Gradient Runaway','Gradient Runaway','Gradient Runaway',
                'Gradient Runaway','Not Scored', 'Not Scored', 77.3, 
                74.77 , 81.7, 81.5 , 81.33 ,
                81.99 , 81.5 , 78.65 
          ],
"Public LB":['Gradient Runaway','Gradient Runaway','Gradient Runaway','Gradient Runaway',
             'Gradient Runaway','Not Scored','Not Scored',  83.36,
             80.1, 83.25, 84.25, 82.53,
             84.52, 83.96, 83.07
          ],
 "Rank":[None,None,None,None,
         None,None,None,7,
         8,2,3,5,
         1,3,6]

 })

gridset

# Conclusion:

The goal of this project was to detect cancer cells in images, and we found that a CNN is very capable of the task. Optimizers, loss functions, and learning rates are incredibly important to the construction of a useful CNN classifier. Get these wrong and even a basic classifier is incapable of running. Finding out that the Nadam and Adam optimizers were not capable here was helpful. SGD and RMSProp work very well, but SGD is considerably slower to converge than RMSProp. Settling on an adequate learning rate was equally important; without a proper learning rate the classifier was inoperable. Once I got past the runaway gradients problem, I could use callbacks to really dial in the learning rates and make changes to the basic architecture. Changing the learning rate to an appropriate value yielded significant gains in the basic classifier too. The final major learning point is the difference between the tanh and sigmoid activation functions on the neural network. Tanh, in my opinion, is the superior choice.
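
For reference, here is how the two activation functions relate. This sketch does not explain why tanh worked better in these experiments; it only shows that tanh is a rescaled, zero-centered sigmoid, with outputs in (-1, 1) rather than (0, 1). The helper names are my own and are not used elsewhere in the notebook.

```python
import math


def sigmoid(x):
    """Logistic sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_from_sigmoid(x):
    """tanh written as a rescaled sigmoid: tanh(x) = 2*sigmoid(2x) - 1."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.4f}  tanh={math.tanh(x):.4f}")
```

At zero the sigmoid outputs 0.5 with slope 0.25, while tanh outputs 0 with slope 1, so tanh passes larger gradients near the origin.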

When it comes to model architecture, more isn't always better, not only because of overfitting but also because of how long things take to run. DenseNet 169 won on the private leaderboard, beating the more complex 201 model. DenseNet 201 took longer to train, too. The smaller DenseNet 121 took less time per epoch to train and scored second with only 2 training epochs. None of the pretrained models was remarkably better: they scored within a few percentage points of the hand-constructed models. The same held true for the basic CNNs I built. Complexity doesn't help a CNN avoid overfitting. I was lucky to have over 140,000 training examples to help mitigate the overfitting issue, and I added random flips (horizontal and vertical) to the training. The more complex model outperformed the basic model and is the basis for the best submission from a CNN I built from scratch. Were I to try this again, I'd explore the transfer learning opportunities with EfficientNet, InceptionV3, Xception, and ConvNeXt. I think that doing so would expand upon what I've learned here.
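
The random flips mentioned above are simple to picture. The notebook's own augmentation code is not shown in this section, so the sketch below is only an illustration of the idea: it uses plain Python lists instead of image tensors, and the function names are hypothetical.

```python
import random


def flip_horizontal(image):
    """Mirror each row left-to-right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Reverse the order of the rows (top-to-bottom mirror)."""
    return image[::-1]


def random_flip(image, rng=random):
    """Apply each flip independently with probability 0.5."""
    if rng.random() < 0.5:
        image = flip_horizontal(image)
    if rng.random() < 0.5:
        image = flip_vertical(image)
    return image


# A tiny 2x3 "image" stands in for a 96x96x3 patch.
patch = [[1, 2, 3],
         [4, 5, 6]]

print(flip_horizontal(patch))  # [[3, 2, 1], [6, 5, 4]]
print(flip_vertical(patch))    # [[4, 5, 6], [1, 2, 3]]
```

Because a tissue patch has no natural "up" or "left", a flipped patch is just as valid a training example as the original, which is what makes this augmentation a cheap way to fight overfitting.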

TensorFlow is a very capable framework once you get the Datasets operating on the fly. Loading data this way keeps the RAM load down and lets you focus on the modeling. I was impressed with the JIT compiler, too, which sped up training considerably.
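
The memory benefit of loading data on the fly can be sketched with a plain Python generator. This is an analogy, not how `tf.data` is implemented: `lazy_batches` and `fake_load` are hypothetical names, and `fake_load` merely stands in for reading and decoding an image file.

```python
def lazy_batches(paths, load_fn, batch_size):
    """Yield batches of loaded items one at a time.

    Only the current batch is held in memory, which is the idea
    behind mapping a loader over a dataset of file paths.
    """
    batch = []
    for path in paths:
        batch.append(load_fn(path))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller batch
        yield batch


def fake_load(path):
    # Hypothetical loader: stands in for reading and decoding an image file.
    return f"pixels-of-{path}"


paths = [f"img_{i}.tif" for i in range(5)]
for batch in lazy_batches(paths, fake_load, batch_size=2):
    print(len(batch), batch[0])
```

The list of file paths is small enough to keep in memory, while the decoded images are produced only when a batch is requested.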

Thanks for taking the time to look at my workbook.