<a href="https://colab.research.google.com/github/thor4/neuralnets/blob/master/projects/1-CNN/step2-generate_predictions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Generate predictions from model
--- 

## 1: Set up the model
This model was created using the `step1_train_vanilla_CNN_v2.ipynb` Jupyter notebook.
Run the cell below to download a zip file from OSF and extract its contents into the newly created directory `content/18kim_range_vanilla/`.

In [1]:
# @title Download model

import requests, os
from zipfile import ZipFile

print("Start downloading and unzipping `Range model`...")
name = '18kim_range_vanilla'
fname = f"{name}.zip"
url = "https://osf.io/tycf6/download" #osf share link
r = requests.get(url, allow_redirects=True)
r.raise_for_status() #stop here if the download failed
with open(fname, 'wb') as fh:
  fh.write(r.content) #download file

with ZipFile(fname, 'r') as zfile:
  zfile.extractall() #extract contents

if os.path.exists(fname):
  os.remove(fname) #delete zip file
else:
  print(f"The file {fname} does not exist")

print("Download completed.")

Start downloading and unzipping `Range model`...
Download completed.


#### Load the model
Next, we load the model using TensorFlow.

In [2]:
import tensorflow as tf 
model = tf.keras.models.load_model('18kim_range_vanilla') 
model.summary() #verify architecture

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
rescaling_1 (Rescaling)      (None, 160, 160, 1)       0         
_________________________________________________________________
conv2d (Conv2D)              (None, 158, 158, 160)     1600      
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 79, 79, 160)       0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 77, 77, 80)        115280    
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 38, 38, 80)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 36, 36, 40)        28840     
_________________________________________________________________
flatten (Flatten)            (None, 51840)             0
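
The output shapes and parameter counts in the summary can be checked by hand. The sketch below assumes 3x3 kernels with no padding, stride 1, and 2x2 max pooling, which is what the printed shapes and counts imply; the helper names are illustrative and not part of the notebook.

```python
def conv_out(size, kernel=3):
    # an unpadded, stride-1 convolution shrinks each side by kernel - 1
    return size - kernel + 1

def pool_out(size, pool=2):
    # non-overlapping pooling floors the division
    return size // pool

def conv_params(kernel, in_ch, out_ch):
    # one kernel x kernel x in_ch filter plus one bias per output channel
    return (kernel * kernel * in_ch + 1) * out_ch

side = 160                       # input is 160 x 160 x 1
side = conv_out(side)            # conv2d
after_conv2d = side
side = pool_out(side)            # max_pooling2d
side = conv_out(side)            # conv2d_1
side = pool_out(side)            # max_pooling2d_1
side = conv_out(side)            # conv2d_2
flat_units = side * side * 40    # flatten over 40 channels

params = [conv_params(3, 1, 160), conv_params(3, 160, 80), conv_params(3, 80, 40)]
print(after_conv2d, side, flat_units, params)
```

The printed values should line up with the `Output Shape` and `Param #` columns above.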

## 2: Download & load the test datasets
Download the test datasets from OSF and extract the contents into the newly created directory: `content/datasets/`

In [3]:
# @title Download datasets

print("Start downloading and unzipping `18 test datasets`...")
name = 'model2_dset1-18_cond'
fname = f"{name}.zip"
url = "https://osf.io/jkryf/download" #osf share link
r = requests.get(url, allow_redirects=True)
r.raise_for_status() #stop here if the download failed
with open(fname, 'wb') as fh:
  fh.write(r.content) #download file

with ZipFile(fname, 'r') as zfile:
  zfile.extractall("datasets") #extract contents

if os.path.exists(fname):
  os.remove(fname) #delete zip file
else:
  print(f"The file {fname} does not exist")

print("Download completed.")

Start downloading and unzipping `18 test datasets`...
Download completed.
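
Both download cells follow the same pattern: fetch a zip, extract it, delete the archive. A minimal sketch of a shared helper is shown below. It is an illustration rather than the notebook's code: the name `fetch_and_extract` is hypothetical, and it swaps `requests` for the standard library's `urllib` so the response can be streamed to disk. It has not been tested against OSF.

```python
import os
import shutil
from urllib.request import urlopen
from zipfile import ZipFile

def fetch_and_extract(url, zip_name, extract_to="."):
    """Download a zip archive, extract it, then delete the archive."""
    # stream the response to disk instead of holding it all in memory
    with urlopen(url) as response, open(zip_name, "wb") as fh:
        shutil.copyfileobj(response, fh)
    with ZipFile(zip_name, "r") as zfile:
        zfile.extractall(extract_to)  # creates extract_to if needed
    os.remove(zip_name)  # the archive is no longer needed

# Possible usage with the values from the cell above (left commented out):
# fetch_and_extract("https://osf.io/jkryf/download", "model2_dset1-18_cond.zip", "datasets")
```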


Load all 18 sets and use `prefetch` so that upcoming batches are prepared while the current one is being processed.

In [4]:
# @title Load datasets into tensorflow

from tensorflow.keras.preprocessing import image_dataset_from_directory

BATCH_SIZE = 32 
IMG_SIZE = (160, 160) #resize the images from 170x170 to the 160x160 input size the model expects
AUTOTUNE = tf.data.AUTOTUNE #prompts the tf.data runtime to tune the value dynamically at runtime
def model2_init_sets(BATCH_SIZE, IMG_SIZE, AUTOTUNE):
    curr_dir = os.getcwd()
    #one folder per test set, in order from set 1 to set 18
    set_names = ['s1-t_0.1-c_0.3', 's2-t_0.1-c_0.45', 's3-t_0.1-c_1',
                 's4-t_0.2-c_0.3', 's5-t_0.2-c_0.45', 's6-t_0.2-c_1',
                 's7-t_0.4-c_0.3', 's8-t_0.4-c_0.45', 's9-t_0.4-c_1',
                 's10-t_0.8-c_0.3', 's11-t_0.8-c_0.45', 's12-t_0.8-c_1',
                 's13-t_1.6-c_0.3', 's14-t_1.6-c_0.45', 's15-t_1.6-c_1',
                 's16-t_3.2-c_0.3', 's17-t_3.2-c_0.45', 's18-t_3.2-c_1']
    sets = []
    for set_name in set_names:
        set_dir = os.path.join(curr_dir, 'datasets', set_name)
        #each folder holds 1000 images in 2 classes, loaded as grayscale
        dataset = image_dataset_from_directory(set_dir, shuffle=True, batch_size=BATCH_SIZE, image_size=IMG_SIZE, color_mode='grayscale')
        sets.append(dataset.prefetch(buffer_size=AUTOTUNE)) #prepare upcoming batches in the background
    return tuple(sets) #set1 through set18, in order

set1,set2,set3,set4,set5,set6,set7,set8,set9,set10,set11,set12,set13,set14,set15,set16,set17,set18 = model2_init_sets(BATCH_SIZE, IMG_SIZE, AUTOTUNE)

Found 1000 files belonging to 2 classes.
Found 1000 files belonging to 2 classes.
Found 1000 files belonging to 2 classes.
Found 1000 files belonging to 2 classes.
Found 1000 files belonging to 2 classes.
Found 1000 files belonging to 2 classes.
Found 1000 files belonging to 2 classes.
Found 1000 files belonging to 2 classes.
Found 1000 files belonging to 2 classes.
Found 1000 files belonging to 2 classes.
Found 1000 files belonging to 2 classes.
Found 1000 files belonging to 2 classes.
Found 1000 files belonging to 2 classes.
Found 1000 files belonging to 2 classes.
Found 1000 files belonging to 2 classes.
Found 1000 files belonging to 2 classes.
Found 1000 files belonging to 2 classes.
Found 1000 files belonging to 2 classes.


## **NEXT:** Bring over the code from VS Code that runs through all sets and merge it here. Goal: save the logits and actual labels for each dataset in a dataframe with 18 × 2 = 36 columns and one row per sample

Define a function that generates logits from a dataset:

In [5]:
def get_logits(dataset, model):
    all_pred=tf.zeros([], tf.float64) #initialize array to hold all prediction logits (single element)
    all_labels=tf.zeros([], tf.float64) #initialize array to hold all actual labels (single element)
    for image_batch, label_batch in dataset.as_numpy_iterator():
        predictions = model.predict_on_batch(image_batch).flatten() #run batch through model and return logits
        all_pred = tf.experimental.numpy.append(all_pred, predictions)
        all_labels = tf.experimental.numpy.append(all_labels, label_batch)
    #all_pred now holds one logit per image plus the placeholder 0 at the beginning, so drop the first element
    all_pred = all_pred[1:]
    all_labels = all_labels[1:]
    return all_pred,all_labels
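
The placeholder element can be avoided by collecting each batch in a list and concatenating once at the end. The NumPy sketch below illustrates that variant on toy data; `collect_logits`, `fake_predict`, and the random batches are hypothetical stand-ins (for `get_logits`, `model.predict_on_batch`, and a dataset iterator), and it returns NumPy arrays rather than tensors.

```python
import numpy as np

def collect_logits(batches, predict_fn):
    """Gather logits and labels from an iterable of (images, labels) batches."""
    preds, labels = [], []
    for image_batch, label_batch in batches:
        # flatten the (n, 1) model output to (n,) so batches line up
        preds.append(np.asarray(predict_fn(image_batch)).ravel())
        labels.append(np.asarray(label_batch).ravel())
    # a single concatenation at the end, no placeholder to strip
    return np.concatenate(preds), np.concatenate(labels)

def fake_predict(x):
    # stand-in for model.predict_on_batch: one value per row, shape (n, 1)
    return x.sum(axis=1, keepdims=True)

# toy data: three batches, the last one partial
rng = np.random.default_rng(0)
batches = [(rng.normal(size=(n, 4)), rng.integers(0, 2, size=n)) for n in (32, 32, 8)]

logits, labels = collect_logits(batches, fake_predict)
print(logits.shape, labels.shape)
```

Because logits and labels are gathered in the same pass, they stay paired even when the dataset is shuffled.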

NEXT: Set a random seed so the batches arrive in the same order each time, and check whether that reproduces the same logit predictions.

In [16]:
all_pred, all_labels = get_logits(set1,model)

In [17]:
all_pred.numpy()[:5] #first five logits, of 1000

array([-1.05096734,  2.50202107, -3.50221658, -1.16470253, -1.67171085])

In [15]:
all_pred.numpy().shape

(1000,)

In [None]:
#NOTE: get_conf_acc_dataset, logit_thres_min and logit_thres_max are not defined in this notebook yet (see the NEXT note above)
all_sets = [set1,set2,set3,set4,set5,set6,set7,set8,set9,set10,set11,set12,set13,set14,set15,set16,set17,set18]
for dataset in all_sets: #run for all sets:
    df, df_results, all_avg_acc, avg_accuracy = get_conf_acc_dataset(dataset, model, logit_thres_min, logit_thres_max)
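
One way the stated goal (a dataframe with two columns per dataset) could be assembled is sketched below with pandas. The function name `logits_to_frame`, the column naming scheme, and the random toy data are assumptions made for illustration; in the notebook, `results` would come from something like `[get_logits(ds, model) for ds in all_sets]`.

```python
import numpy as np
import pandas as pd

def logits_to_frame(results):
    """results: list of (logits, labels) pairs, one per dataset, in set order."""
    columns = {}
    for i, (logits, labels) in enumerate(results, start=1):
        columns[f"set{i}_logit"] = pd.Series(np.asarray(logits))
        columns[f"set{i}_label"] = pd.Series(np.asarray(labels))
    # building from Series pads shorter sets with NaN instead of raising
    return pd.DataFrame(columns)

# toy stand-in for 18 datasets with 5 samples each
rng = np.random.default_rng(0)
results = [(rng.normal(size=5), rng.integers(0, 2, size=5)) for _ in range(18)]

df = logits_to_frame(results)
print(df.shape)
```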

## 3: Test different datasets
Set up and run the test to compare TensorFlow's accuracy (from `model.evaluate`) with my naive calculation.

In [None]:
def test_dataset(current_set):
  all_acc=tf.zeros([], tf.float64) #initialize array to hold all accuracy indicators (single element)
  loss, acc = model.evaluate(current_set) #now test the model's performance on the test set
  for image_batch, label_batch in current_set.as_numpy_iterator():
      predictions = model.predict_on_batch(image_batch).flatten() #run batch through model and return logits
      predictions = tf.nn.sigmoid(predictions) #apply sigmoid activation function to transform logits to [0,1]
      predictions = tf.where(predictions < 0.5, 0, 1) #round down or up accordingly since it's a binary classifier
      accuracy = tf.where(tf.equal(predictions,label_batch),1,0) #correct is 1 and incorrect is 0
      all_acc = tf.experimental.numpy.append(all_acc, accuracy)
  all_acc = all_acc[1:]  #drop first placeholder element
  avg_acc = tf.reduce_mean(all_acc)
  print('My Accuracy:', avg_acc.numpy()) 
  print('Tf Accuracy:', acc) 
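
Since sigmoid(x) is below 0.5 exactly when x is below 0, the sigmoid-then-round step in `test_dataset` amounts to thresholding the raw logits at 0. The NumPy sketch below checks this on made-up logits and labels (not the notebook's data). It also shows how an accuracy that thresholds the raw logits at 0.5 would score logits between 0 and 0.5 differently. Whether the metric the model was compiled with behaves this way is an assumption that would need checking against the training notebook; it is not a confirmed explanation of the gap reported below.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

logits = np.array([-2.0, -0.3, 0.2, 0.4, 0.7, 3.0])  # made-up values
labels = np.array([0, 0, 1, 1, 1, 1])                # made-up values

# the notebook's route: sigmoid, then round at 0.5
via_sigmoid = np.where(sigmoid(logits) < 0.5, 0, 1)
# equivalent shortcut: threshold the logits at 0
via_zero = np.where(logits < 0, 0, 1)
same_predictions = np.array_equal(via_sigmoid, via_zero)

acc_zero = np.mean(via_zero == labels)
# hypothetical alternative: 0.5 threshold applied to the raw logits
via_half = np.where(logits < 0.5, 0, 1)
acc_half = np.mean(via_half == labels)

print(same_predictions, acc_zero, acc_half)
```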


In [None]:
test_dataset(set1)

My Accuracy: 0.543
Tf Accuracy: 0.503000020980835


My calculation gave 54.3% accuracy, while TensorFlow's gave 50.3%.

In [None]:
test_dataset(set4)

My Accuracy: 0.558
Tf Accuracy: 0.5040000081062317


My calculation gave 55.8% accuracy, while TensorFlow's gave 50.4%.

In [None]:
test_dataset(set10)

My Accuracy: 0.552
Tf Accuracy: 0.5040000081062317


My calculation gave 55.2% accuracy, while TensorFlow's gave 50.4%.

In [None]:
test_dataset(set15)

My Accuracy: 0.987
Tf Accuracy: 0.9660000205039978


My calculation gave 98.7% accuracy, while TensorFlow's gave 96.6%.

In [None]:
test_dataset(set16)

My Accuracy: 0.648
Tf Accuracy: 0.5199999809265137


My calculation gave 64.8% accuracy, while TensorFlow's gave ~52%.

In [None]:
test_dataset(set17)

My Accuracy: 0.832
Tf Accuracy: 0.675000011920929


My calculation gave 83.2% accuracy, while TensorFlow's gave 67.5%.

In [None]:
test_dataset(set18)

My Accuracy: 1.0
Tf Accuracy: 1.0


Both my calculation and TensorFlow's gave 100% accuracy.

Of the seven datasets tested, dataset 17 showed the biggest disparity (83.2% vs. 67.5%, a gap of 15.7 percentage points), and dataset 18 was the only one where both calculations gave the same accuracy.
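
The comparison can be reproduced from the printed outputs. The short sketch below only tabulates the numbers reported above; the dictionary layout and variable names are illustrative.

```python
# (my accuracy, TensorFlow accuracy) as printed by test_dataset above
results = {
    "set1": (0.543, 0.503000020980835),
    "set4": (0.558, 0.5040000081062317),
    "set10": (0.552, 0.5040000081062317),
    "set15": (0.987, 0.9660000205039978),
    "set16": (0.648, 0.5199999809265137),
    "set17": (0.832, 0.675000011920929),
    "set18": (1.0, 1.0),
}

# gap between the two calculations for each dataset
gaps = {name: mine - tf_acc for name, (mine, tf_acc) in results.items()}
largest = max(gaps, key=gaps.get)
equal = [name for name, gap in gaps.items() if abs(gap) < 1e-9]

for name, gap in gaps.items():
    print(f"{name}: {gap * 100:.1f} percentage points")
print("largest gap:", largest, "| identical:", equal)
```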