<a id='contents'></a>
## Contents

* [Setup](#setup)
* [Loading and processing metadata](#loading_and)
* [Class distribution](#class_distribution)
* [Train/val split](#trainval_split)
* [Balancing the training set](#balancing)
* [Expanding the validation set](#expanding)
* [Fine-tuning EfficientNet or ResNet18](#fine-tuning)
* [Small sample for testing code](#small_sample)
* [Model architecture and state dictionary](#model_architecture)
* [Inference: getting probabilities](#inference1)
* [Inference: combining probabilities](#inference2)
* [Inference: combining predictions](#inference3)
* [Evaluation](#evaluation)

<a id='setup'></a>
## Setup
↑↑ [Contents](#contents) ↓ [Loading and processing metadata](#loading_and)

The code cell below appears at the top of each notebook. 

1. **Environment Variable Setup:**
   - Checks whether the notebook is running on Google Colab (`COLAB_GPU` in `os.environ`).
   - If on Colab, mounts Google Drive and sets the `SKIN_LESION_CLASSIFICATION` environment variable.
   - If not on Colab, expects the `SKIN_LESION_CLASSIFICATION` environment variable to be set to the project root on the local system.

2. **Path Definitions:**
   - Defines `project_path` as a `Path` object representing the project root, read from the environment variable.

3. **Custom Module Import Setup:**
   - Defines `scripts_path` as the path to the `/scripts` directory within the project.

4. **Module Import Configuration:**
   - Adds `scripts_path` to `sys.path` so that Python searches it when importing modules.

5. **Importing Custom Modules:**
   - Imports `path_setup` from the custom `utils` module in the `/scripts` directory.
   - Calls `path_setup.subfolders(project_path)`, which returns a dictionary `path` containing the paths to the project root and all of its subdirectories (listed in the output below).

In [1]:
import os
from pathlib import Path
import sys

# If we're using Google Colab, we set the environment variable to point to the relevant folder in our Google Drive:
if 'COLAB_GPU' in os.environ:
    from google.colab import drive
    drive.mount('/content/drive')
    os.environ['SKIN_LESION_CLASSIFICATION'] = '/content/drive/MyDrive/Colab Notebooks/skin-lesion-classification'

# Otherwise, we use the environment variable on our local system:
project_environment_variable = "SKIN_LESION_CLASSIFICATION"
project_root = os.environ.get(project_environment_variable)
if project_root is None:
    # Path(None) would raise an unhelpful TypeError, so fail with a clear message instead.
    raise EnvironmentError(f"Set the {project_environment_variable} environment variable to the project root.")

# Path to the root directory of the project:
project_path = Path(project_root)

# Path to /scripts (from where custom modules will be imported):
scripts_path = project_path.joinpath("scripts")

# Add this path to sys.path so that Python will look there for modules:
sys.path.append(str(scripts_path))

# Now import path_setup from our custom utils module to create a dictionary of paths to all subdirectories of our root directory:
from utils import path_setup
path = path_setup.subfolders(project_path)

path['project'] : D:\projects\skin-lesion-classification
path['images'] : D:\projects\skin-lesion-classification\images
path['models'] : D:\projects\skin-lesion-classification\models
path['expository'] : D:\projects\skin-lesion-classification\expository
path['literature'] : D:\projects\skin-lesion-classification\literature
path['notebooks'] : D:\projects\skin-lesion-classification\notebooks
path['presentation'] : D:\projects\skin-lesion-classification\presentation
path['scripts'] : D:\projects\skin-lesion-classification\scripts
path['streamlit'] : D:\projects\skin-lesion-classification\streamlit
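The source of `utils.path_setup.subfolders` is not shown in this notebook. A minimal, hypothetical sketch of a function with the same observable behaviour (a dictionary keyed by `'project'` and by each subfolder name, printed in the format above) might look like this. Sorting the folders and skipping hidden folders such as `.git` are assumptions, not necessarily what the real function does.

```python
from pathlib import Path
import tempfile


def subfolders(project_path) -> dict:
    """Hypothetical sketch of utils.path_setup.subfolders.

    Returns {'project': root, '<name>': root/<name>, ...} for the immediate,
    non-hidden subdirectories of the project root, printing each entry.
    """
    root = Path(project_path)
    path = {"project": root}
    for child in sorted(root.iterdir()):
        # Skip plain files and hidden folders such as .git (an assumption).
        if child.is_dir() and not child.name.startswith("."):
            path[child.name] = child
    for key, value in path.items():
        print(f"path['{key}'] : {value}")
    return path


# Usage on a throwaway directory tree:
with tempfile.TemporaryDirectory() as tmp:
    for name in ["images", "models", "scripts"]:
        (Path(tmp) / name).mkdir()
    (Path(tmp) / "README.md").write_text("a file, not a folder")
    path = subfolders(tmp)
```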


<a id='loading_and'></a>
## Loading and processing metadata
↑↑ [Contents](#contents) ↑ [Setup](#setup) ↓ [Class distribution](#class_distribution)

We use the ```process``` class of our custom ```processing``` module to process the metadata (metadata for all images is contained in the ```metadata.csv``` file). Most of the attributes of the ```process``` class can be seen in the code cell below: we typically won't need to set all of them, but we list them to show the possibilities. 

For instance, we could specify ```restrict_to = {'dx' : ['mel', 'nv']}``` and ```to_classify = {'melanoma' : ['mel'], 'mole' : ['nv'] }```. This would restrict our dataset to all images of lesions whose ```dx``` class is either ```mel``` or ```nv```, and then set up a binary ```mel``` versus ```nv``` classification problem. (Indeed, this will be one of the things we do.)
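The filtering attributes can be pictured as plain pandas operations. The sketch below is a hypothetical standalone function (`filter_metadata` is not part of the `processing` module); its behaviour follows the comments in the next code cell: `restrict_to` keeps only rows whose column value is listed, `remove_if` drops rows whose column value is listed, and `drop_row_if_missing_value_in` drops rows with a missing value in any listed column. The toy lesion IDs are made up.

```python
from typing import Optional

import pandas as pd


def filter_metadata(df: pd.DataFrame,
                    restrict_to: Optional[dict] = None,
                    remove_if: Optional[dict] = None,
                    drop_row_if_missing_value_in: Optional[list] = None) -> pd.DataFrame:
    """Hypothetical sketch of the filtering attributes of the process class."""
    if restrict_to:
        # Keep a row only if column k takes one of the values in list v.
        for col, keep in restrict_to.items():
            df = df[df[col].isin(keep)]
    if remove_if:
        # Drop a row if column k takes one of the values in list v.
        for col, drop in remove_if.items():
            df = df[~df[col].isin(drop)]
    if drop_row_if_missing_value_in:
        df = df.dropna(subset=drop_row_if_missing_value_in)
    return df.reset_index(drop=True)


toy = pd.DataFrame({
    "lesion_id": ["HAM_1", "HAM_2", "HAM_3", "HAM_4"],
    "dx":        ["mel", "nv", "bkl", "mel"],
    "age":       [60.0, None, 45.0, 70.0],
})
print(filter_metadata(toy, restrict_to={"dx": ["mel", "nv"]}))  # HAM_1, HAM_2, HAM_4
```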

* ```tvr```: 'training set to validation set' ratio. This refers to the lesions in our dataset (7470 of them if we use the entire dataset). ```tvr = 3``` will result in approximately 75% of the lesions in our dataset being represented in the training set (that's 5601 lesions if we start with the whole dataset).

* ```seed```: sets a random seed to be used whenever randomness is used in a processing step, such as shuffling the records at the beginning of the train/val split.

* ```keep_first```: there are 7470 distinct lesions represented by 10015 different images, because some lesions are represented by multiple images. Thus, when performing the train/val split, for each lesion, we have a choice as to which image to select as the 'first' representative image of the lesion (which we'll label as ```t1```, but more on this later). If ```keep_first``` is ```True```, we select the first image corresponding to each lesion as its principal representative; if ```False```, we select a random image corresponding to each lesion as its principal representative.

* ```stratified```: if ```True```, we perform a stratified train/val split, meaning that we first select lesions within each class according to ```tvr```/```seed```/```keep_first``` logic, then combine them; if ```False```, we select lesions from the dataset globally according to ```tvr```/```seed```/```keep_first``` logic. Either way, the overall distribution of lesions by class will be more-or-less preserved in the training and validation sets, but it will be more exact if we perform a stratified split (as we always will).

* ```train_one_img_per_lesion```: if ```True```, each lesion will be represented by precisely one image in the training set, meaning that a model will only ever see one image of each lesion during training; if ```False```, each lesion may be represented by more than one image in the training set (if there is more than one image of the lesion in our original dataset), and a model therefore may see more than one image of the lesion. If we balance the training set by upsampling lesions (see below) and ```train_one_img_per_lesion``` is ```True```, the single image of each lesion is repeated as many times as needed. However, if ```train_one_img_per_lesion``` is ```False```, for any given lesion we use all the different images we have of it before sampling the same image again. We'll go into this in detail below.

* ```val_expansion_factor```: if a positive integer $n$, this causes each lesion in the validation set to be repeated $n$ times, meaning that $n$ images (not necessarily all different) of the lesion are passed to the model in the evaluation stage, yielding $n$ sets of class probabilities. If $n > 1$, we combine the $n$ sets of probabilities and predictions into a single prediction for each lesion. As with balancing the training set, there are two ways to 'expand' the validation set: we can repeat the exact same image of the lesion $n$ times, or we can use all available images before repeating any image. There's no attribute for choosing between these, because we always construct both. If ```val_expansion_factor``` is ```None```, the 'all images per lesion' validation set uses all available images of each lesion (the number varies from lesion to lesion), and the predictions are combined into a single prediction for each lesion. Note that if we apply a random transformation to an image before it is fed to our model, the model will give a different set of probabilities for each copy of the image, even if we're validating on the exact same image repeated $n$ times. We'll go into this in more detail below.

* ```sample_size```: specifies how many images from each lesion class to use in the training set; minority classes are upsampled and majority classes downsampled to reach these numbers. The two ways of upsampling lesions were discussed under ```train_one_img_per_lesion``` above.
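The `val_expansion_factor` bullet says the $n$ probability vectors obtained for a lesion are combined into one prediction. One common way to do this, assumed here rather than taken from the notebook's own inference code, is to average the probability vectors per lesion (soft voting) and predict the class with the highest mean probability. `combine_probabilities` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def combine_probabilities(lesion_ids, probs) -> pd.DataFrame:
    """Average the probability vectors of each lesion, then take the argmax.

    lesion_ids: one lesion_id per image passed to the model (length N).
    probs: array-like of shape (N, num_classes), one row of class
           probabilities per image.
    """
    probs = np.asarray(probs, dtype=float)
    df = pd.DataFrame(probs, columns=[f"p{k}" for k in range(probs.shape[1])])
    df["lesion_id"] = list(lesion_ids)
    # One row per lesion: the mean of its n probability vectors.
    combined = df.groupby("lesion_id", sort=False).mean()
    # Predicted label = class with the highest averaged probability.
    combined["pred"] = combined.to_numpy().argmax(axis=1)
    return combined


# val_expansion_factor = 3: three probability vectors per lesion, two classes.
ids = ["HAM_A"] * 3 + ["HAM_B"] * 3
probs = [[0.6, 0.4], [0.3, 0.7], [0.7, 0.3],
         [0.2, 0.8], [0.1, 0.9], [0.6, 0.4]]
print(combine_probabilities(ids, probs))  # HAM_A -> pred 0, HAM_B -> pred 1
```

Note that the second prediction for `HAM_A` on its own favours class 1, but the averaged probabilities favour class 0: combining smooths out the effect of any single (possibly randomly transformed) image.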

In [2]:
from typing import Type, Union      # For type hints
from processing import process      # Custom module for processing metadata

data_dir: Path = path["images"]     # Path to directory containing metadata.csv file
csv_filename: str = "metadata.csv"  # The filename
    
restrict_to: Union[dict, None] = None                   # Remove all records *unless* column k lies in list v, for k : v in restrict_to dictionary.    
remove_if: Union[dict, None] = None                     # Remove all records if column k lies in list v, for k : v in remove_if dictionary.    
drop_row_if_missing_value_in: Union[list, None] = None  # We drop all rows for which there is a missing value in a column from this list.   
                                    
tvr: int = 3              # Ratio of training set to validation set (by lesion). See the discussion of tvr above.
seed: int = 0             # Random seed for parts of the process where randomness is called for.
keep_first: bool = False  # If False, then, for each lesion, we choose a random image as its principal representative (t1 or v1).
stratified: bool = True   # If True, we stratify classes so that the proportions remain as stable as possible after train/val split. 
                          # If False, the proportions will be roughly similar.

to_classify: Union[list, dict] = ["mel",   # These are the lesion types we are interested in classifying. 
                                  "bcc",   # Any missing ones will be grouped together as the 0-label class: no need to write "other" here.
                                  "akiec", # If 'other' is not desired, use restrict_to attribute above
                                  "nv",]   # Can also be a dictionary, like { 'malignant' : ['mel', 'bcc'], 'benign' : ['nv', 'bkl']}

train_one_img_per_lesion: Union[bool, None] = False # If False, we take advantage of the (in some cases) multiple images of a lesion in our dataset
val_expansion_factor: Union[int, None] = 3          # A random transformation may be applied to an image before making a prediction.
                                                    # For a given lesion, we may make multiple predictions (as specified here), and combine them into a single prediction.
    
sample_size: Union[None, dict] = {"mel": 2000,     # Handling class imbalance by upsampling minority classes/downsampling majority classes     
                                  "bcc": 2000,     # Specify how many images of each lesion diagnosis we want in our training set.
                                  "akiec": 2000, 
                                  "nv": 2000,
                                  "other" : 2000,} # Could also leave out "other" here, and include e.g. "df: 2000" if we wanted to.    

Having set the attributes above, we can now create an instance of the ```process``` class, as below.

In [3]:
# Create an instance of the process class with attribute values as above.
demo = process(data_dir=data_dir,
               csv_filename=csv_filename,
               restrict_to=restrict_to,
               remove_if=remove_if,
               drop_row_if_missing_value_in=drop_row_if_missing_value_in,
               tvr=tvr,
               seed=seed,
               keep_first=keep_first,
               stratified=stratified,
               to_classify=to_classify,
               train_one_img_per_lesion=train_one_img_per_lesion,
               val_expansion_factor=val_expansion_factor,
               sample_size=sample_size,)

- Loaded file 'D:\projects\skin-lesion-classification\images\metadata.csv'.
- Inserted 'num_images' column in dataframe, to the right of 'lesion_id' column.
- Inserted 'label' column in dataframe, to the right of 'dx' column: 
  {'bkl': 0, 'df': 0, 'vasc': 0, 'akiec': 1, 'bcc': 2, 'mel': 3, 'nv': 4}
- Added 'set' column to dataframe, with values 't1', 'v1', 'ta', and 'va', to the right of 'localization' column.
- Basic, overall dataframe (pre-train/test split): self.df
- Balancing classes in training set.
- Balanced training set (uses as many different images per lesion as possible): self.df_train
- Expanding validation set: will combine 3 predictions into one, for each lesion in val set.
- Expanded validation set (one image per lesion, repeated 3 times): self.df_val1
- Expanded validation set (use up to 3 different images per lesion, if available): self.df_val_a
- Small sample dataframes for code testing: self._df_train_code_test, self._df_val1_code_test, self._df_val_a_code_test


At this stage, we just point out the mapping ```{'bkl': 0, 'df': 0, 'vasc': 0, 'akiec': 1, 'bcc': 2, 'mel': 3, 'nv': 4}``` (in this example). The three lesion classes ```bkl```, ```df``` and ```vasc``` are mapped to ```other``` in the classification task we've specified. It's important to keep the mapping consistent across the entire training and validation process, which is why we order the lesion classes alphabetically, regardless of the way a user specifies the ```to_classify``` attribute. Labels are encoded according to the ```label_codes``` attribute of the ```process``` class: if there is an ```other``` class, it is always encoded as ```0```, and the remaining class names are encoded in alphabetical order. 

In [5]:
demo.label_codes

{0: 'other', 1: 'akiec', 2: 'bcc', 3: 'mel', 4: 'nv'}
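The encoding rule just described (```other``` is ```0```, remaining class names in alphabetical order) can be reproduced with a few lines of plain Python. `make_label_codes` is a hypothetical helper, not the `process` class's code; starting the numbering at ```0``` when there is no ```other``` class (as in the ```mel``` versus ```nv``` example) is an assumption.

```python
from typing import Dict, List, Tuple, Union


def make_label_codes(to_classify: Union[List[str], Dict[str, List[str]]],
                     all_dx: List[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Hypothetical sketch of the label encoding rule.

    Returns (dx -> integer label, integer label -> class name).
    """
    # A plain list like ["mel", "nv"] means each dx is its own class.
    if isinstance(to_classify, list):
        to_classify = {dx: [dx] for dx in to_classify}
    covered = {dx for dxs in to_classify.values() for dx in dxs}
    has_other = any(dx not in covered for dx in all_dx)

    # 'other' (if present) is 0; named classes follow in alphabetical order.
    label_codes = {0: "other"} if has_other else {}
    for name in sorted(to_classify):
        label_codes[len(label_codes)] = name
    code_of = {name: code for code, name in label_codes.items()}

    dx_to_label = {dx: code_of[name] for name, dxs in to_classify.items() for dx in dxs}
    for dx in all_dx:
        dx_to_label.setdefault(dx, 0)  # any dx not listed falls into 'other'
    return dx_to_label, label_codes


all_dx = ["akiec", "bcc", "bkl", "df", "mel", "nv", "vasc"]
dx_to_label, label_codes = make_label_codes(["mel", "bcc", "akiec", "nv"], all_dx)
print(label_codes)  # {0: 'other', 1: 'akiec', 2: 'bcc', 3: 'mel', 4: 'nv'}
```

Because the codes depend only on the sorted class names, listing the classes in a different order gives the same mapping.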

<a id='class_distribution'></a>
## Class distribution
↑↑ [Contents](#contents) ↑ [Loading and processing metadata](#loading_and) ↓ [Train/val split](#trainval_split)

We can use the ```dx_dist``` method of the ```process``` class to tabulate the distribution of lesions by class, and also of images by class. Below, we can see that our stratified train/val split has preserved relative proportions of the five classes by lesion. We can also see the distribution of images by class: note that it is different from the distribution of lesions by class, and not necessarily preserved, because for certain classes (like melanoma) there are more likely to be multiple images per lesion. 

In [6]:
for across in ["lesions", "images"]:
    for subset in ["all", "train", "val"]:
        demo.dx_dist(subset=subset, across=across)


DISTRIBUTION OF LESIONS BY DIAGNOSIS: OVERALL

| dx | nv | other | mel | bcc | akiec |
|---|---|---|---|---|---|
| freq | 5403 | 898 | 614 | 327 | 228 |
| % | 72.33 | 12.02 | 8.22 | 4.38 | 3.05 |

Total lesions: 7470.

DISTRIBUTION OF LESIONS BY DIAGNOSIS: TRAIN

| dx | nv | other | mel | bcc | akiec |
|---|---|---|---|---|---|
| freq | 4052 | 673 | 460 | 245 | 171 |
| % | 72.34 | 12.02 | 8.21 | 4.37 | 3.05 |

Total lesions: 5601 (74.98% of all lesions).

DISTRIBUTION OF LESIONS BY DIAGNOSIS: VAL

| dx | nv | other | mel | bcc | akiec |
|---|---|---|---|---|---|
| freq | 1351 | 225 | 154 | 82 | 57 |
| % | 72.28 | 12.04 | 8.24 | 4.39 | 3.05 |

Total lesions: 1869 (25.02% of all lesions).

DISTRIBUTION OF IMAGES BY DIAGNOSIS: OVERALL

| dx | nv | other | mel | bcc | akiec |
|---|---|---|---|---|---|
| freq | 6705 | 1356 | 1113 | 514 | 327 |
| % | 66.95 | 13.54 | 11.11 | 5.13 | 3.27 |

Total images: 10015.

DISTRIBUTION OF IMAGES BY DIAGNOSIS: TRAIN

| dx | nv | other | mel | bcc | akiec |
|---|---|---|---|---|---|
| freq | 5007 | 1008 | 831 | 384 | 250 |
| % | 66.94 | 13.48 | 11.11 | 5.13 | 3.34 |

Total images: 7480 (74.69% of all images).

DISTRIBUTION OF IMAGES BY DIAGNOSIS: VAL

| dx | nv | other | mel | bcc | akiec |
|---|---|---|---|---|---|
| freq | 1698 | 348 | 282 | 130 | 77 |
| % | 66.98 | 13.73 | 11.12 | 5.13 | 3.04 |

Total images: 2535 (25.31% of all images).
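The difference between the two kinds of table comes down to whether we count one row per lesion or one row per image. A hypothetical stand-in for `dx_dist` (its real source isn't shown here) could tabulate both with pandas; the toy data, with one melanoma photographed three times, shows how the image distribution can over-represent classes with many images per lesion.

```python
import pandas as pd


def dx_distribution(df: pd.DataFrame, across: str = "lesions") -> pd.DataFrame:
    """Hypothetical stand-in for dx_dist: frequency and percentage by dx."""
    if across == "lesions":
        df = df.drop_duplicates(subset="lesion_id")  # one row per lesion
    elif across != "images":
        raise ValueError("across must be 'lesions' or 'images'")
    freq = df["dx"].value_counts()
    pct = (100 * freq / freq.sum()).round(2)
    return pd.DataFrame({"freq": freq, "%": pct}).T


# Toy data: one melanoma photographed three times, three moles photographed once.
toy = pd.DataFrame({
    "lesion_id": ["L1", "L1", "L1", "L2", "L3", "L4"],
    "dx":        ["mel", "mel", "mel", "nv", "nv", "nv"],
})
print(dx_distribution(toy, "lesions"))  # mel 25%, nv 75%
print(dx_distribution(toy, "images"))   # mel 50%, nv 50%
```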



<a id='trainval_split'></a>
## Train/val split
↑↑ [Contents](#contents) ↑ [Class distribution](#class_distribution) ↓ [Balancing the training set](#balancing)

    
We partition our dataset based on ```lesion_id```, **not** on ```image_id```: that way, every lesion will be represented in training or in validation, but not both.

For each classification task, we will train a model by making use of
* **exactly one** image for every lesion in our training set;
* **all** images of every lesion in our training set.

In both cases, we will validate our model by making use of 
* **exactly one** image for every lesion in our validation set;
* **all** images of every lesion in our validation set (at least, _potentially_ all of them). 

**However**, we will make only one prediction per lesion (```lesion_id```) in our validation set: if there are multiple images of a lesion in the validation set, we will combine the predictions for the multiple images into a single prediction for the lesion.

Accordingly, we proceed as follows. We'll explain by example, assuming the dataset is not filtered before splitting (if it is, the number of distinct lesions will be less than $7470$, and the proportions will be different).
1. Randomly select (without replacement) a proportion of our $7470$ distinct ```lesion_id```s and label them with ```t``` (train). 
2. Label the remaining ```lesion_id```s with ```v``` (validate).
3. For each ```lesion_id``` labeled with a ```t```:
    * Select an ```image_id``` and label it ```t1```.
    * Label all (if any) remaining ```image_id```s corresponding to this ```lesion_id``` with ```ta```.
4.  For each ```lesion_id``` labeled with a ```v```:
    * Select an ```image_id``` and label it ```v1```.
    * Label all (if any) remaining ```image_id```s corresponding to this ```lesion_id``` with ```va```.

In Step 1, the number of ```lesion_id```s randomly selected to be labeled ```t``` will be such that the ratio of ```t```s to ```v```s is as close as possible to a specified ratio ```tvr``` (we default to $3$, i.e. $\approx75\%$ of lesions are represented in training). In Steps 3 and 4, the first substep can be done randomly (our default choice), or we can simply choose the "first" image in our table that corresponds to the lesion (see ```keep_first``` attribute of the ```process``` class). 
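For a stratified split, flooring $n \cdot \mathtt{tvr}/(\mathtt{tvr}+1)$ within each class reproduces the lesion counts in the class distribution tables above exactly. Whether the `process` class uses precisely this rule is an assumption; the sketch only checks the arithmetic. (Python's `round`, which rounds halves to even, would instead give $674$ `other` lesions and $5602$ in total.)

```python
def stratified_train_sizes(class_counts: dict, tvr: int = 3) -> dict:
    """Training lesions per class: floor(n * tvr / (tvr + 1)).

    One rounding rule that reproduces the counts in the tables; the
    process class may use a different one.
    """
    return {dx: (n * tvr) // (tvr + 1) for dx, n in class_counts.items()}


lesion_counts = {"nv": 5403, "other": 898, "mel": 614, "bcc": 327, "akiec": 228}
train = stratified_train_sizes(lesion_counts, tvr=3)
val = {dx: n - train[dx] for dx, n in lesion_counts.items()}
print(train)                                   # {'nv': 4052, 'other': 673, 'mel': 460, 'bcc': 245, 'akiec': 171}
print(sum(train.values()), sum(val.values()))  # 5601 1869
```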

The four train/val scenarios we could consider are:
* ```t1v1```: train on precisely those images labeled ```t1``` and validate on precisely those labeled ```v1```.
* ```t1va```: train on precisely those images labeled ```t1``` and validate on precisely those labeled ```v1``` **or** ```va```.
* ```tav1```: train on precisely those images labeled ```t1``` **or** ```ta``` and validate on precisely those labeled ```v1```.
* ```tava```: train on precisely those images labeled ```t1``` **or** ```ta``` and validate on precisely those labeled ```v1``` **or** ```va```.

The mnemonic is ```t``` for training, ```v``` for validation, ```1``` for one-image-per-lesion, and ```a``` for all images.
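Steps 1 to 4 can be sketched in pandas as below. `assign_sets` is a hypothetical function, not the `process` class's implementation: it stratifies by `dx`, sends $\lfloor n \cdot \mathtt{tvr}/(\mathtt{tvr}+1) \rfloor$ lesions of each class to training (an assumed rounding rule), and picks each lesion's principal image either as its first row (`keep_first=True`) or at random. The toy IDs are made up.

```python
import numpy as np
import pandas as pd


def assign_sets(df: pd.DataFrame, tvr: int = 3, seed: int = 0,
                keep_first: bool = False) -> pd.DataFrame:
    """Hypothetical sketch of Steps 1-4: split on lesion_id (stratified by dx),
    then label each image t1/ta (training) or v1/va (validation)."""
    rng = np.random.default_rng(seed)
    df = df.reset_index(drop=True).copy()
    df["num_images"] = df.groupby("lesion_id")["image_id"].transform("count")

    # Steps 1-2: within each dx class, shuffle the lesions and send a share to training.
    lesion_dx = df.drop_duplicates("lesion_id").set_index("lesion_id")["dx"]
    train_lesions = set()
    for _, group in lesion_dx.groupby(lesion_dx):
        ids = rng.permutation(group.index.to_numpy())
        train_lesions.update(ids[: (len(ids) * tvr) // (tvr + 1)])
    in_train = df["lesion_id"].isin(train_lesions).to_numpy()

    # Steps 3-4: one principal image per lesion (first row, or a random one).
    if keep_first:
        rep_index = df.groupby("lesion_id").head(1).index
    else:
        rep_index = df.groupby("lesion_id").sample(n=1, random_state=seed).index
    is_rep = df.index.isin(rep_index)

    df["set"] = np.select([in_train & is_rep, in_train & ~is_rep, ~in_train & is_rep],
                          ["t1", "ta", "v1"], default="va")
    return df


toy = pd.DataFrame({
    "lesion_id": ["L1", "L1", "L2", "L3", "L3", "L3", "L4", "L5", "L6", "L7", "L8"],
    "image_id":  [f"IMG_{i}" for i in range(11)],
    "dx":        ["mel"] * 7 + ["nv"] * 4,
})
print(assign_sets(toy, tvr=3, seed=0))
```

Because the split is made on `lesion_id` before any image is labeled, no lesion can end up with images in both training and validation.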

In [7]:
# Let's have a look at our metadata dataframe, which is now the df attribute of demo, our instance of the process class.
from utils import print_header

instance = demo
df = instance.df

print_header("First five rows of metadata table")

to_print = ["Added columns\n".upper(), 
            "\'num_images\': number of images of lesion in dataset", 
            "\'label\': class to which lesion belongs",
            "\'set\': train/val assignment",
            "\'t*\': lesion is in the training set",
            "\'v*\': lesion is in the validation set",
            "\'t1\': we would train on this image if training a model on exactly one, or on all, image(s) per lesion in the training set",
            "If training set is balanced using one image per lesion, this one image would be re-used as many times as necessary.",
            "\'ta\': we would train on this image if training a model on all images of each lesion in the training set",
            "If training set is balanced using all images per lesion, images labeled ta would all be used before any image is repeated.",
            "\'v1': we\'d use this image if validating a model on exactly one, or on all, image(s) per lesion in the validation set",
            "If a validation expansion factor is given, this one image would be re-used that many times",
            "\'va': we\'d use this image if validating on all images of each lesion in the validation set" ,
            "If a validation expansion factor is given, images labeled va would all be used before any image is repeated.",
            "NB: if more than one image is used for any lesion in validation, the predictions will be combined into a single prediction"
           ]

print("\n- ".join(to_print))
display(df.head())


FIRST FIVE ROWS OF METADATA TABLE

ADDED COLUMNS

- 'num_images': number of images of lesion in dataset
- 'label': class to which lesion belongs
- 'set': train/val assignment
- 't*': lesion is in the training set
- 'v*': lesion is in the validation set
- 't1': we would train on this image if training a model on exactly one, or on all, image(s) per lesion in the training set
- If training set is balanced using one image per lesion, this one image would be re-used as many times as necessary.
- 'ta': we would train on this image if training a model on all images of each lesion in the training set
- If training set is balanced using all images per lesion, images labeled ta would all be used before any image is repeated.
- 'v1': we'd use this image if validating a model on exactly one, or on all, image(s) per lesion in the validation set
- If a validation expansion factor is given, this one image would be re-used that many times
- 'va': we'd use this image if validating on all images of ea

|  | lesion_id | num_images | image_id | dx | label | dx_type | age | sex | localization | set |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | HAM_0000118 | 2 | ISIC_0027419 | bkl | 0 | histo | 80.0 | male | scalp | ta |
| 1 | HAM_0000118 | 2 | ISIC_0025030 | bkl | 0 | histo | 80.0 | male | scalp | t1 |
| 2 | HAM_0002730 | 2 | ISIC_0026769 | bkl | 0 | histo | 80.0 | male | scalp | va |
| 3 | HAM_0002730 | 2 | ISIC_0025661 | bkl | 0 | histo | 80.0 | male | scalp | v1 |
| 4 | HAM_0001466 | 2 | ISIC_0031633 | bkl | 0 | histo | 75.0 | male | ear | va |


<a id='balancing'></a>
## Balancing the training set
↑↑ [Contents](#contents) ↑ [Train/val split](#trainval_split) ↓ [Expanding the validation set](#expanding)


We explain the balancing procedure by way of example. (It is performed by the ```balance``` method of the ```process``` class in our ```processing``` module.) We assume the dataset has not been filtered, the training-to-validation ratio is $3$, and so on. There are $460$ distinct melanoma lesions represented in our training set. As most melanoma lesions are represented by multiple distinct images, there are a total of $831$ distinct images of melanoma lesions in our training set. Suppose we want our training set to contain $2000$ melanoma images: each of the $460$ distinct melanoma lesions will then be represented by $2000/460 \approx 4.35$ images on average. We do not merely sample with replacement.

The goal is to (a) have as little variance as possible in the number of times a lesion is represented, and (b) use as many distinct images as possible (taking advantage of the fact that there are multiple _distinct_ images of most melanoma lesions). Thus, we note that $2000 = 4\times 460 + 160$, so we use each of the $460$ distinct melanoma lesions four times, and make up the remainder by randomly sampling $160$ distinct lesions from the $460$. In other words, exactly $300$ distinct lesions will each be represented by exactly four images, and exactly $160$ distinct lesions will each be represented by exactly five images: $2000 = 300 \times 4 + 160 \times 5$. 

How do we select the four images of each distinct melanoma lesion (plus another one image for $160$ of them)? Consider lesion id ```HAM_0000871``` for example: there are three distinct images of this lesion in our data set. Thus, if ```train_one_img_per_lesion``` is ```False```, we will use all three of them, and then randomly select one more (or two more if this particular lesion were to be one of the $160$ that are represented five times). See below. On the other hand, if ```train_one_img_per_lesion``` is ```True```, we have no choice but to use the one image (label ```t1```) four times.
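Both levels of the procedure use the same "spread as evenly as possible" step: each item gets $\lfloor \text{total}/n \rfloor$ uses, and $\text{total} \bmod n$ randomly chosen distinct items get one more. Applied to lesions it gives the $300 \times 4 + 160 \times 5$ split; applied to a lesion's images it uses every distinct image before any repeat. The sketch below is hypothetical (`spread` and `balance_class` are not the `balance` method's code), and the toy lesion `HAM_X` is made up.

```python
import numpy as np
import pandas as pd


def spread(total: int, n_items: int, rng) -> np.ndarray:
    """Split `total` uses over n_items as evenly as possible: each item gets
    total // n_items, and total % n_items randomly chosen items get one more."""
    q, r = divmod(total, n_items)
    counts = np.full(n_items, q)
    counts[rng.choice(n_items, size=r, replace=False)] += 1
    return counts


def balance_class(df_class: pd.DataFrame, target: int,
                  one_img_per_lesion: bool = False, seed: int = 0) -> pd.DataFrame:
    """Hypothetical sketch: resample one dx class of the training set to `target` images."""
    rng = np.random.default_rng(seed)
    lesions = df_class["lesion_id"].unique()
    rows = []
    for lesion, lesion_mult in zip(lesions, spread(target, len(lesions), rng)):
        imgs = df_class[df_class["lesion_id"] == lesion]
        if one_img_per_lesion:
            imgs = imgs[imgs["set"] == "t1"]  # only the principal image is reused
        # Use every distinct image of the lesion before any image is used again.
        for (_, row), img_mult in zip(imgs.iterrows(), spread(int(lesion_mult), len(imgs), rng)):
            rows += [{**row.to_dict(), "lesion_mult": int(lesion_mult),
                      "img_mult": int(img_mult)}] * int(img_mult)
    return pd.DataFrame(rows)


# The melanoma arithmetic: 2000 images spread over 460 lesions.
mult = spread(2000, 460, np.random.default_rng(0))
print(np.unique(mult, return_counts=True))  # (array([4, 5]), array([300, 160]))

# A lesion with three distinct images, represented four times (like HAM_0000871).
toy = pd.DataFrame({"lesion_id": ["HAM_X"] * 3,
                    "image_id": ["IMG_1", "IMG_2", "IMG_3"],
                    "set": ["t1", "ta", "ta"]})
print(balance_class(toy, target=4))
```

With `one_img_per_lesion=True` the same call would return the `t1` image four times, matching the description above.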
    

In [8]:
# from utils import print_header

# The specific numbers in this example assume a certain choice for the attributes, including 
# sample_size: Union[None, dict] = {"mel": 2000,         
#                                   "bcc": 2000, 
#                                   "akiec": 2000, 
#                                   "nv": 2000,
#                                   "other" : 2000,}

instance = demo
df = demo.df_train

print_header("Eg: Representations of lesion HAM_0000871 in balanced training set")

to_print = ["HAM_0000871 represented by four images\n".upper(),
            "Three distinct images of this lesion to choose from: ISIC_0025964, ISIC_0030623, and ISIC_0026506",
            "Use ISIC_0025964 twice, ISIC_0030623 once, and ISIC_0026506 once",]

print("\n- ".join(to_print))

display(df[df['lesion_id'] == 'HAM_0000871'])


EG: REPRESENTATIONS OF LESION HAM_0000871 IN BALANCED TRAINING SET

HAM_0000871 REPRESENTED BY FOUR IMAGES

- Three distinct images of this lesion to choose from: ISIC_0025964, ISIC_0030623, and ISIC_0026506
- Use ISIC_0025964 twice, ISIC_0030623 once, and ISIC_0026506 once


|  | lesion_id | lesion_mult | num_images | image_id | img_mult | dx | label | dx_type | age | sex | localization | set |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1773 | HAM_0000871 | 4 | 3 | ISIC_0025964 | 2 | mel | 3 | histo | 40.0 | female | chest | ta |
| 1774 | HAM_0000871 | 4 | 3 | ISIC_0025964 | 2 | mel | 3 | histo | 40.0 | female | chest | ta |
| 1775 | HAM_0000871 | 4 | 3 | ISIC_0030623 | 1 | mel | 3 | histo | 40.0 | female | chest | t1 |
| 3088 | HAM_0000871 | 4 | 3 | ISIC_0026506 | 1 | mel | 3 | histo | 40.0 | female | trunk | ta |


In [9]:
# from utils import print_header
# The specific numbers given in this example assume a certain choice for the attributes, including 
# sample_size: Union[None, dict] = {"mel": 2000,         
#                                   "bcc": 2000, 
#                                   "akiec": 2000, 
#                                   "nv": 2000,
#                                   "other" : 2000,}

instance = demo
df = demo.df_train
df = df[df['set'].isin(["ta", "t1"]) & (df['dx'] == 'mel')]

print_header("Eg: Melanoma in balanced training set")

to_print = ["Value counts for \'lesion_mult\' column\n".upper(),
            "300 distinct melanoma lesions each represented by four images: 300*4 = 1200",
            "160 distinct melanoma lesions each represented by five images: 160*5 = 800",]

print("\n- ".join(to_print))
display(df['lesion_mult'].value_counts())  # counts rows (images): 300 lesions * 4 = 1200, 160 lesions * 5 = 800

display(df)


EG: MELANOMA IN BALANCED TRAINING SET

VALUE COUNTS FOR 'LESION_MULT' COLUMN

- 300 distinct melanoma lesions each represented by four images: 300*4 = 1200
- 160 distinct melanoma lesions each represented by five images: 160*5 = 800


4    1200
5     800
Name: lesion_mult, dtype: int64




|  | lesion_id | lesion_mult | num_images | image_id | img_mult | dx | label | dx_type | age | sex | localization | set |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1773 | HAM_0000871 | 4 | 3 | ISIC_0025964 | 2 | mel | 3 | histo | 40.0 | female | chest | ta |
| 1774 | HAM_0000871 | 4 | 3 | ISIC_0025964 | 2 | mel | 3 | histo | 40.0 | female | chest | ta |
| 1775 | HAM_0000871 | 4 | 3 | ISIC_0030623 | 1 | mel | 3 | histo | 40.0 | female | chest | t1 |
| 1776 | HAM_0000040 | 5 | 1 | ISIC_0027190 | 5 | mel | 3 | histo | 80.0 | male | upper extremity | t1 |
| 1777 | HAM_0000040 | 5 | 1 | ISIC_0027190 | 5 | mel | 3 | histo | 80.0 | male | upper extremity | t1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7773 | HAM_0002552 | 5 | 3 | ISIC_0032936 | 2 | mel | 3 | histo | 25.0 | male | upper extremity | ta |
| 7774 | HAM_0002552 | 5 | 3 | ISIC_0032936 | 2 | mel | 3 | histo | 25.0 | male | upper extremity | ta |
| 7780 | HAM_0002552 | 5 | 3 | ISIC_0033232 | 1 | mel | 3 | histo | 25.0 | male | upper extremity | ta |
| 9998 | HAM_0003521 | 5 | 2 | ISIC_0032258 | 2 | mel | 3 | histo | 70.0 | female | back | ta |


<a id='expanding'></a>
## Expanding the validation set
↑↑ [Contents](#contents) ↑ [Balancing the training set](#balancing) ↓ [Fine-tuning EfficientNet or ResNet18](#fine-tuning)

As mentioned already, we make one prediction per lesion. However, we may have multiple images of a given lesion at our disposal: we could make a prediction for each of them and combine these into a single prediction for the lesion. Even if there is only one image of a lesion, we could make multiple predictions on it: if a random transformation is applied to an image before our model makes a prediction on it, each prediction yields a different array of probabilities. Again, we could combine the results into a single prediction.

This is what the ```val_expansion_factor``` attribute of the ```process``` class is concerned with. Similarly to the way we balance the training set, we can replicate a single image per lesion in the validation set as many times as specified by ```val_expansion_factor```, as in ```self.df_val1```, or we can take advantage of other images of the lesion (if available), as in ```self.df_val_a```.

Note that if ```val_expansion_factor``` is set to a positive integer $n$, both validation sets ```self.df_val1``` and ```self.df_val_a``` will have the same number of records ($n$ times the number of lesions in the validation set), because each lesion will be represented by $n$ images. However, if ```val_expansion_factor``` is ```None``` (which is the default), then ```self.df_val_a``` will just contain all images of each lesion in the validation set (and the number of images of a given lesion may vary from one to six).
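Both expanded validation sets can be sketched in pandas. `expand_validation` is a hypothetical helper, not the `process` class's code: `df_val1` repeats each lesion's `v1` image $n$ times, while `df_val_a` spreads $n$ copies over the lesion's images so every distinct image is used before any repeat (which random images get the extra copies is an assumption). With `n=None`, it returns one image per lesion and all images per lesion, respectively. The toy IDs are made up.

```python
import numpy as np
import pandas as pd


def expand_validation(df_val: pd.DataFrame, n=None, seed: int = 0):
    """Hypothetical sketch: build (df_val1, df_val_a) from validation images
    labeled v1/va, with n = val_expansion_factor (or None)."""
    rng = np.random.default_rng(seed)
    df_val = df_val.reset_index(drop=True)
    v1 = df_val[df_val["set"] == "v1"]
    if n is None:
        # No expansion: one image per lesion vs. every available image.
        return v1.reset_index(drop=True), df_val
    # df_val1: the principal (v1) image of each lesion, repeated n times.
    df_val1 = v1.loc[v1.index.repeat(n)].reset_index(drop=True)
    # df_val_a: n images per lesion, using every distinct image before repeating one.
    parts = []
    for _, imgs in df_val.groupby("lesion_id", sort=False):
        q, r = divmod(n, len(imgs))
        reps = np.full(len(imgs), q)
        reps[rng.choice(len(imgs), size=r, replace=False)] += 1
        parts.append(imgs.loc[imgs.index.repeat(reps)])
    df_val_a = pd.concat(parts).reset_index(drop=True)
    return df_val1, df_val_a


toy = pd.DataFrame({
    "lesion_id": ["HAM_A", "HAM_A", "HAM_B"],
    "image_id":  ["IMG_1", "IMG_2", "IMG_3"],
    "set":       ["v1", "va", "v1"],
})
df_val1, df_val_a = expand_validation(toy, n=3)
print(df_val1, df_val_a, sep="\n\n")
```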

In [10]:
# from utils import print_header
# The specific numbers given in this example assume a certain choice for the attributes  

instance = demo

df = demo.df_val1
df = df[df['set'].isin(["va", "v1"]) & (df['dx'] == 'mel')]

print_header("Eg: Melanoma in expanded validation set (only one image per lesion used)")

to_print = [f"- Note that \'lesion_mult\' is always {instance.val_expansion_factor}",
            "HAM_0005678 represented by three images",
            "Two distinct images of this lesion: ISIC_0031023 and ISIC_0028086",
            f"However, only use ISIC_0031023 ({instance.val_expansion_factor} times)",]

print("\n- ".join(to_print))

display(df)

df = demo.df_val_a
df = df[df['set'].isin(["va", "v1"]) & (df['dx'] == 'mel')]

print_header("Eg: Melanoma in expanded validation set (all images used)")

to_print = [f"- Note that \'lesion_mult\' is always {instance.val_expansion_factor}",
            "HAM_0005678 represented by three images",
            "Two distinct images of this lesion to choose from: ISIC_0031023 and ISIC_0028086",
            "Use ISIC_0031023 once, and ISIC_0028086 twice",]

print("\n- ".join(to_print))

display(df)


EG: MELANOMA IN EXPANDED VALIDATION SET (ONLY ONE IMAGE PER LESION USED)

- Note that 'lesion_mult' is always 3
- HAM_0005678 represented by three images
- Two distinct images of this lesion: ISIC_0031023 and ISIC_0028086
- However, only use ISIC_0031023 (3 times)


|  | lesion_id | lesion_mult | num_images | image_id | img_mult | dx | label | dx_type | age | sex | localization | set |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 603 | HAM_0005678 | 3 | 2 | ISIC_0031023 | 3 | mel | 3 | histo | 60.0 | male | chest | v1 |
| 604 | HAM_0005678 | 3 | 2 | ISIC_0031023 | 3 | mel | 3 | histo | 60.0 | male | chest | v1 |
| 605 | HAM_0005678 | 3 | 2 | ISIC_0031023 | 3 | mel | 3 | histo | 60.0 | male | chest | v1 |
| 606 | HAM_0006722 | 3 | 2 | ISIC_0031499 | 3 | mel | 3 | histo | 85.0 | female | lower extremity | v1 |
| 607 | HAM_0006722 | 3 | 2 | ISIC_0031499 | 3 | mel | 3 | histo | 85.0 | female | lower extremity | v1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1060 | HAM_0004081 | 3 | 1 | ISIC_0031957 | 3 | mel | 3 | histo | 70.0 | female | lower extremity | v1 |
| 1061 | HAM_0004081 | 3 | 1 | ISIC_0031957 | 3 | mel | 3 | histo | 70.0 | female | lower extremity | v1 |
| 1062 | HAM_0004746 | 3 | 2 | ISIC_0028764 | 3 | mel | 3 | histo | 65.0 | female | back | v1 |
| 1063 | HAM_0004746 | 3 | 2 | ISIC_0028764 | 3 | mel | 3 | histo | 65.0 | female | back | v1 |



EG: MELANOMA IN EXPANDED VALIDATION SET (ALL IMAGES USED)

- Note that 'lesion_mult' is always 3
- HAM_0005678 represented by three images
- Two distinct images of this lesion to choose from: ISIC_0031023 and ISIC_0028086
- Use ISIC_0031023 once, and ISIC_0028086 twice


| | lesion_id | lesion_mult | num_images | image_id | img_mult | dx | label | dx_type | age | sex | localization | set |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 603 | HAM_0005678 | 3 | 2 | ISIC_0031023 | 1 | mel | 3 | histo | 60.0 | male | chest | v1 |
| 604 | HAM_0005678 | 3 | 2 | ISIC_0028086 | 2 | mel | 3 | histo | 60.0 | male | chest | va |
| 605 | HAM_0005678 | 3 | 2 | ISIC_0028086 | 2 | mel | 3 | histo | 60.0 | male | chest | va |
| 606 | HAM_0006722 | 3 | 2 | ISIC_0030443 | 1 | mel | 3 | histo | 85.0 | female | lower extremity | va |
| 607 | HAM_0006722 | 3 | 2 | ISIC_0031499 | 2 | mel | 3 | histo | 85.0 | female | lower extremity | v1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1060 | HAM_0004746 | 3 | 2 | ISIC_0029021 | 1 | mel | 3 | histo | 65.0 | female | back | va |
| 1061 | HAM_0002525 | 3 | 2 | ISIC_0025188 | 2 | mel | 3 | histo | 55.0 | male | face | va |
| 1062 | HAM_0002525 | 3 | 2 | ISIC_0025188 | 2 | mel | 3 | histo | 55.0 | male | face | va |
| 1063 | HAM_0001953 | 3 | 2 | ISIC_0025611 | 2 | mel | 3 | histo | 65.0 | male | back | va |
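The tables above show how a lesion's multiplicity (```lesion_mult```) is shared among its images (```img_mult```). With three slots and two images, one image is used once and the other twice. Below is a minimal sketch of that split. It assumes the copies are spread as evenly as possible over a lesion's images. The helpers ```split_multiplicity``` and ```expand_rows``` are hypothetical and are not part of our modules. Giving the extra copy to the later image is also an assumption; the notebook's own code may choose differently (e.g. at random).

```python
def split_multiplicity(image_ids, lesion_mult):
    """Spread lesion_mult copies as evenly as possible over the given image_ids.

    Returns a dict image_id -> img_mult. Any remainder goes to the images at the
    end of the list (an assumption made for this sketch).
    """
    n = len(image_ids)
    base, remainder = divmod(lesion_mult, n)
    counts = {}
    for i, image_id in enumerate(image_ids):
        # The last `remainder` images each get one extra copy.
        counts[image_id] = base + (1 if i >= n - remainder else 0)
    return counts


def expand_rows(image_ids, lesion_mult):
    """List one image_id per row of the expanded validation set."""
    rows = []
    for image_id, mult in split_multiplicity(image_ids, lesion_mult).items():
        rows.extend([image_id] * mult)
    return rows


# HAM_0005678 has two images and lesion_mult = 3
print(split_multiplicity(["ISIC_0031023", "ISIC_0028086"], 3))
print(expand_rows(["ISIC_0031023", "ISIC_0028086"], 3))
```

Passing a single image (the one-image-per-lesion variant) gives that image all three copies, as in the first table.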


<a id='fine-tuning'></a>
## Fine-tuning EfficientNet or ResNet18
↑↑ [Contents](#contents) ↑ [Expanding the validation set](#expanding) ↓ [Small sample for testing code](#small_sample)

At this point we are ready to train a model on our processed data. We define a class called ```cnn``` in our custom ```multiclass_models``` module; some of its attributes are described below. The first attribute is ```source```, which is an instance of the ```process``` class. The processed data is fed into the model, and the instance of the ```cnn``` class effectively inherits all of the attributes of the ```process``` class. (It's also possible to specify a dataframe as the source, but we won't discuss this.)

The ```model``` attribute specifies whether to use the ResNet architecture or the EfficientNet architecture. The ```transform``` attribute specifies any transformation we wish to apply to each image before it is passed to the model. If ```transform``` has a random element, the model will 'see' a different version of a lesion each time an image is passed through it, even if the exact same image is used multiple times.
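The following sketch shows why a random transform lets the model see a different version of the same image on each pass. It uses NumPy only: a random crop followed by ImageNet-style normalization. It mirrors the idea behind ```RandomCrop``` and ```Normalize``` in the transform cell below, but it is not torchvision's implementation. The helper names and the fake 450 x 600 image are illustrative assumptions.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])


def random_crop(image, size, rng):
    """Return a random size x size crop of an (H, W, C) array."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return image[top:top + size, left:left + size]


def normalize(image):
    """Scale uint8 pixels to [0, 1], then standardize each channel with ImageNet stats."""
    return (image / 255.0 - IMAGENET_MEAN) / IMAGENET_STD


rng = np.random.default_rng(0)
# A fake 450 x 600 RGB image standing in for a dermoscopic image
image = rng.integers(0, 256, size=(450, 600, 3), dtype=np.uint8)

# Two passes over the same image: same shape, but (almost surely) different crops
crop_a = normalize(random_crop(image, 300, rng))
crop_b = normalize(random_crop(image, 300, rng))
print(crop_a.shape, np.allclose(crop_a, crop_b))
```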

Other attributes, such as ```batch_size``` and ```base_learning_rate```, are self-explanatory. We always use the Adam optimizer. Attributes like ```filename_suffix``` relate to the files that store the model's output (and records such as the model's attributes), but we won't go into this here.

If ```unfreeze_all``` is ```True```, all layers of the model are unfrozen for fine-tuning. Otherwise, if ```unfreeze_last``` is ```True```, only the last block and the fully connected layer are unfrozen. When ```unfreeze_all``` is ```True```, all layers are unfrozen regardless of the value of ```unfreeze_last```, which is overwritten. If not specified, ```unfreeze_all``` defaults to ```True``` (and ```unfreeze_last``` to ```False```).

The ```code_test``` attribute, if ```True```, switches to a mode that uses a small number of images for code-testing purposes, which is precisely what we're doing here.

In [11]:
# Now let's set values for the attributes of our cnn class (the model we will use with our processed data).
# One of the attributes has to do with image transformations.

import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.RandomCrop((300, 300)),  # Random 300x300 crop: a slightly different view of the image on each pass
    transforms.Resize((224, 224)),      # Resize images to fit ResNet input size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),  # Normalize with ImageNet stats
])

In [13]:
import pandas as pd
from typing import Union, List, Callable
import torchvision.models as models

source: Union[process, pd.DataFrame] = demo      # Processed data to be fed into model for training.
                                                 # Must either be an instance of the process class, or a dataframe of the same format as source.df if source were an instance of the process class.
model_dir: Path = path["models"]                 # Path to directory where models/model info/model results are stored.
transform: Union[None, 
                 transforms.Compose, 
                 List[Callable]] = transform     # Transform to be applied to images before feeding into neural network.
batch_size: Union[None, int] = 32                # Mini-batch size: default 32.
epochs: Union[None, int] = 10                    # Number of epochs (all layers unfrozen from the start): default 10.
base_learning_rate: Union[None, float] = 1/1000  # Learning rate to start with: default 1/1000. Using Adam optimizer.
filename_stem: Union[None, str] = "rn18"         # For saving model and related files. Default "rn18" (if ResNet model) or "EffNet" (if EfficientNet), or "cnn".
filename_suffix: Union[None, str] = "demo"       # Something descriptive and unique for future reference. Default empty string "".
overwrite: Union[None, bool] = True              # If False, an unused filename is generated for saving .pth, .csv files etc. by appending a two-digit number.
                                                 # If None, defaults to False. Only set to True if confident that files from previous instances with the same filename stem and suffix can be overwritten.
code_test: Union[None, bool] = True
# model: Union[None, models.ResNet, models.EfficientNet] = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.DEFAULT) # Pre-trained model. Default: ResNet18.   
model: Union[None, models.ResNet, models.EfficientNet] = models.resnet18(weights="ResNet18_Weights.DEFAULT")  
unfreeze_all: Union[None, bool] = None           # If None, defaults to True (all layers unfrozen).
unfreeze_last: Union[None, bool] = None          # Only used if unfreeze_all is False.

In [14]:
# Create an instance of the cnn class with attribute values as above.
from multiclass_models import cnn

resnet_demo = cnn(                                   
    source=source,                                           
    model_dir=model_dir,
    transform=transform,
    batch_size=batch_size,
    epochs=epochs,                                          
    base_learning_rate=base_learning_rate,
    filename_stem=filename_stem,
    filename_suffix=filename_suffix,                         
    overwrite=overwrite,
    code_test=code_test,    
    model=model, 
    unfreeze_all=unfreeze_all,
    unfreeze_last=unfreeze_last,
)


CODE TEST MODE

- self.epochs set to 1
- self.Print set to True
- self.filename_suffix set to 'test'
- self.overwrite set to True
- self.df_train, self.df_val1, self.df_val_a replaced with a small number of records
- Change code_test attribute to False and re-create/create new cnn instance after testing is done.

Existing files will be overwritten. 
Base filename: rn18_ta_bal_ufall_1e_demo_test_00
Attributes saved to file: D:\projects\skin-lesion-classification\models\rn18_ta_bal_ufall_1e_demo_test_00_attributes.json


<a id='small_sample'></a>
## Small sample for testing code
↑↑ [Contents](#contents) ↑ [Fine-tuning EfficientNet or ResNet18](#fine-tuning) ↓ [Model architecture and state dictionary](#model_architecture)

We can look at the records in the 'training' and 'validation' sets of our code-testing data. These are created by randomly sampling 10 (respectively, two) records for each lesion class. Consequently, in code-test mode the two variants of the validation set (one image per lesion versus all images per lesion) consist of different lesions. Outside code-test mode, both validation sets represent exactly the same lesions; the only difference lies in the images used to represent them.
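Per-class random sampling of this kind can be sketched with pandas' ```groupby(...).sample(...)```, which is available from pandas 1.1. The toy dataframe and the helper ```code_test_sample``` are hypothetical. The sketch only shows why two independently drawn validation samples need not contain the same lesions.

```python
import pandas as pd


def code_test_sample(df, n_per_class, seed=0):
    """Randomly draw n_per_class rows from each diagnosis ('dx') class."""
    return (df.groupby("dx")
              .sample(n=n_per_class, random_state=seed)
              .reset_index(drop=True))


# Toy metadata: 30 records spread over three classes
toy = pd.DataFrame({
    "image_id": [f"img_{i:02d}" for i in range(30)],
    "dx": ["bkl"] * 12 + ["mel"] * 10 + ["nv"] * 8,
})

small_train = code_test_sample(toy, n_per_class=5)
print(small_train["dx"].value_counts())  # 5 records per class

# Two independent draws, as for the two code-test validation sets:
# nothing forces them to contain the same records
val1 = code_test_sample(toy, n_per_class=2, seed=1)
val_a = code_test_sample(toy, n_per_class=2, seed=2)
print(sorted(set(val1["image_id"]) & set(val_a["image_id"])))
```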

In [15]:
# from utils import print_header

instance = resnet_demo

print_header("Code test: training set")
print(f"{instance.df_train.shape[0]} images".upper())
display(instance.df_train.head())

print_header(f"Code test: validation set (one image per lesion, repeated {instance.source.val_expansion_factor} times)")
print(f"{instance.df_val1.shape[0]} images".upper())
display(instance.df_val1.head())

print_header(f"Code test: validation set ({instance.source.val_expansion_factor} possibly different images per lesion)")
print(f"{instance.df_val_a.shape[0]} images".upper())
display(instance.df_val_a.head())


CODE TEST: TRAINING SET

169 IMAGES


| | lesion_id | lesion_mult | num_images | image_id | img_mult | dx | label | dx_type | age | sex | localization | set |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | HAM_0003768 | 3 | 2 | ISIC_0029929 | 1 | bkl | 0 | histo | 80.0 | male | upper extremity | t1 |
| 1 | HAM_0004928 | 3 | 2 | ISIC_0031424 | 1 | bkl | 0 | histo | 65.0 | male | neck | t1 |
| 2 | HAM_0004928 | 3 | 2 | ISIC_0029770 | 2 | bkl | 0 | histo | 65.0 | male | neck | ta |
| 3 | HAM_0004928 | 3 | 2 | ISIC_0029770 | 2 | bkl | 0 | histo | 65.0 | male | neck | ta |
| 4 | HAM_0003768 | 3 | 2 | ISIC_0026634 | 2 | bkl | 0 | histo | 80.0 | male | upper extremity | ta |



CODE TEST: VALIDATION SET (ONE IMAGE PER LESION, REPEATED 3 TIMES)

42 IMAGES


| | lesion_id | lesion_mult | num_images | image_id | img_mult | dx | label | dx_type | age | sex | localization | set |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | HAM_0003218 | 3 | 1 | ISIC_0033305 | 3 | bkl | 0 | consensus | 75.0 | male | back | v1 |
| 1 | HAM_0003218 | 3 | 1 | ISIC_0033305 | 3 | bkl | 0 | consensus | 75.0 | male | back | v1 |
| 2 | HAM_0003218 | 3 | 1 | ISIC_0033305 | 3 | bkl | 0 | consensus | 75.0 | male | back | v1 |
| 3 | HAM_0000983 | 3 | 1 | ISIC_0033490 | 3 | bkl | 0 | consensus | | unknown | unknown | v1 |
| 4 | HAM_0000983 | 3 | 1 | ISIC_0033490 | 3 | bkl | 0 | consensus | | unknown | unknown | v1 |



CODE TEST: VALIDATION SET (3 POSSIBLY DIFFERENT IMAGES PER LESION)

42 IMAGES


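| | lesion_id | lesion_mult | num_images | image_id | img_mult | dx | label | dx_type | age | sex | localization | set |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | HAM_0004406 | 3 | 2 | ISIC_0034125 | 1 | bkl | 0 | histo | 80.0 | male | back | va |
| 1 | HAM_0004406 | 3 | 2 | ISIC_0033060 | 2 | bkl | 0 | histo | 80.0 | male | back | v1 |
| 2 | HAM_0004406 | 3 | 2 | ISIC_0033060 | 2 | bkl | 0 | histo | 80.0 | male | back | v1 |
| 3 | HAM_0001200 | 3 | 1 | ISIC_0025716 | 3 | bkl | 0 | histo | 85.0 | male | face | v1 |
| 4 | HAM_0001200 | 3 | 1 | ISIC_0025716 | 3 | bkl | 0 | histo | 85.0 | male | face | v1 |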


The ```cnn``` class of the ```multiclass_models``` module revolves around the ```train``` method, which feeds our training set into a model for fine-tuning. Below, we do this for the small set of images in our code-test training set. For a large number of images, a GPU should be used.

Because we're testing code, the ```train``` method prints out what passes through the dataloader. This feature is controlled by the ```Print``` boolean attribute of the ```cnn``` class. It can be turned off when training on a large number of images.
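The printout below pairs each integer ```label``` with a one-hot encoded tensor. The sketch below reproduces that encoding with ```torch.nn.functional.one_hot```. The loss used inside ```train``` is not shown here, so the sketch assumes a cross-entropy loss. Under that assumption, it checks that one-hot targets and integer targets give the same loss. It then takes a single Adam step at the default ```base_learning_rate``` of 1/1000. The random logits stand in for model outputs. Class-probability targets in ```cross_entropy``` need PyTorch 1.10 or later.

```python
import torch
import torch.nn.functional as F

NUM_CLASSES = 5
labels = torch.tensor([0, 4, 2])  # integer labels, as in the printout below
ohe = F.one_hot(labels, num_classes=NUM_CLASSES).float()
print(ohe[0])  # tensor([1., 0., 0., 0., 0.])

torch.manual_seed(0)
logits = torch.randn(3, NUM_CLASSES, requires_grad=True)  # stand-in for model outputs
loss_from_ohe = F.cross_entropy(logits, ohe)     # class-probability (one-hot) targets
loss_from_idx = F.cross_entropy(logits, labels)  # integer class targets
print(torch.allclose(loss_from_ohe, loss_from_idx))  # True

# One Adam update at the default base learning rate
# (here the "parameters" being updated are the logits themselves)
optimizer = torch.optim.Adam([logits], lr=1e-3)
optimizer.zero_grad()
loss_from_ohe.backward()
optimizer.step()
```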

In [13]:
# Train the model on the specified training data by calling the train method:
# from utils import print_header

instance = resnet_demo

print_header("Code test: training and validation")
instance.train()


CODE TEST: TRAINING AND VALIDATION

image_id, label, ohe-label: ISIC_0029770, 0, tensor([1., 0., 0., 0., 0.])
image_id, label, ohe-label: ISIC_0027149, 4, tensor([0., 0., 0., 0., 1.])
image_id, label, ohe-label: ISIC_0030230, 2, tensor([0., 0., 1., 0., 0.])
image_id, label, ohe-label: ISIC_0032114, 0, tensor([1., 0., 0., 0., 0.])
image_id, label, ohe-label: ISIC_0029099, 0, tensor([1., 0., 0., 0., 0.])
image_id, label, ohe-label: ISIC_0029278, 2, tensor([0., 0., 1., 0., 0.])
image_id, label, ohe-label: ISIC_0027598, 0, tensor([1., 0., 0., 0., 0.])
image_id, label, ohe-label: ISIC_0030344, 1, tensor([0., 1., 0., 0., 0.])
image_id, label, ohe-label: ISIC_0031831, 0, tensor([1., 0., 0., 0., 0.])
image_id, label, ohe-label: ISIC_0029713, 1, tensor([0., 1., 0., 0., 0.])
image_id, label, ohe-label: ISIC_0027598, 0, tensor([1., 0., 0., 0., 0.])
image_id, label, ohe-label: ISIC_0032652, 2, tensor([0., 0., 1., 0., 0.])
image_id, label, ohe-label: ISIC_0032652, 2, tensor([0., 0., 1., 0., 0.])
i

image_id, label, ohe-label: ISIC_0024825, 0, tensor([1., 0., 0., 0., 0.])
image_id, label, ohe-label: ISIC_0028994, 2, tensor([0., 0., 1., 0., 0.])
image_id, label, ohe-label: ISIC_0033860, 0, tensor([1., 0., 0., 0., 0.])
image_id, label, ohe-label: ISIC_0028994, 2, tensor([0., 0., 1., 0., 0.])
image_id, label, ohe-label: ISIC_0033571, 2, tensor([0., 0., 1., 0., 0.])
image_id, label, ohe-label: ISIC_0030344, 1, tensor([0., 1., 0., 0., 0.])
image_id, label, ohe-label: ISIC_0029713, 1, tensor([0., 1., 0., 0., 0.])
image_id, label, ohe-label: ISIC_0030344, 1, tensor([0., 1., 0., 0., 0.])
image_id, label, ohe-label: ISIC_0025350, 1, tensor([0., 1., 0., 0., 0.])
image_id, label, ohe-label: ISIC_0029713, 1, tensor([0., 1., 0., 0., 0.])
image_id, label, ohe-label: ISIC_0033065, 3, tensor([0., 0., 0., 1., 0.])
image_id, label, ohe-label: ISIC_0028314, 1, tensor([0., 1., 0., 0., 0.])
image_id, label, ohe-label: ISIC_0030953, 1, tensor([0., 1., 0., 0., 0.])
image_id, label, ohe-label: ISIC_00339

image_id, label, ohe-label: ISIC_0026598, 3, tensor([0., 0., 0., 1., 0.])
image_id, label, ohe-label: ISIC_0026598, 3, tensor([0., 0., 0., 1., 0.])
image_id, label, ohe-label: ISIC_0026598, 3, tensor([0., 0., 0., 1., 0.])
image_id, label, ohe-label: ISIC_0031372, 0, tensor([1., 0., 0., 0., 0.])
image_id, label, ohe-label: ISIC_0031372, 0, tensor([1., 0., 0., 0., 0.])
image_id, label, ohe-label: ISIC_0026313, 0, tensor([1., 0., 0., 0., 0.])
image_id, label, ohe-label: ISIC_0026629, 0, tensor([1., 0., 0., 0., 0.])
image_id, label, ohe-label: ISIC_0026629, 0, tensor([1., 0., 0., 0., 0.])
image_id, label, ohe-label: ISIC_0026629, 0, tensor([1., 0., 0., 0., 0.])
image_id, label, ohe-label: ISIC_0030443, 4, tensor([0., 0., 0., 0., 1.])
image_id, label, ohe-label: ISIC_0031499, 4, tensor([0., 0., 0., 0., 1.])
image_id, label, ohe-label: ISIC_0031499, 4, tensor([0., 0., 0., 0., 1.])
image_id, label, ohe-label: ISIC_0033299, 4, tensor([0., 0., 0., 0., 1.])
image_id, label, ohe-label: ISIC_00332

Epoch losses are stored in a ```.json``` file, which can be loaded as below. This requires that the ```.json``` file with the exact filename created by the instance of the ```cnn``` class exists. If it doesn't, run everything again from the start, including the code-test training.
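In the outputs below, the loss dictionary in memory holds NumPy arrays, while the one loaded from file holds plain lists. That is what you'd expect if the arrays are converted with ```.tolist()``` before being written to JSON, since the ```json``` module cannot serialize NumPy arrays. Here is a minimal sketch of such a round trip, assuming that convention. ```save_losses``` and ```load_losses``` are hypothetical stand-ins, not our ```load_dict``` function. The loss values are copied from the code-test output.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def save_losses(losses, path):
    """Write a dict of per-epoch loss arrays to JSON (arrays become plain lists)."""
    serializable = {key: np.asarray(values).tolist() for key, values in losses.items()}
    Path(path).write_text(json.dumps(serializable))


def load_losses(path):
    """Read the loss dict back; values come back as lists, not NumPy arrays."""
    return json.loads(Path(path).read_text())


epoch_losses = {"train_loss": np.array([0.79855193]),
                "val1_loss": np.array([6.1903882]),
                "val_a_loss": np.array([8.67319012])}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo_epoch_losses.json"
    save_losses(epoch_losses, path)
    restored = load_losses(path)

print(restored["train_loss"])  # [0.79855193]
```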

In [14]:
# from utils import print_header
from multiclass_models import load_dict

# Let's look at the training and validation loss for each epoch:
instance = resnet_demo

print_header("Code test")
print("Loss dictionary (training and validation loss from each epoch)".upper())
to_print = ["- Key \'val1_loss\' refers to validation set in which one image per lesion is used.",
            "Key \'val_a_loss\' refers to validation set in which more than one image per lesion is potentially used."]
print("\n- ".join(to_print))

# Use the losses in memory if available; otherwise fall back to the file saved during training.
epoch_losses = getattr(instance, "epoch_losses", None)
if epoch_losses is None:
    epoch_losses = load_dict(instance.model_dir, instance._filename + "_epoch_losses")
display(epoch_losses)


CODE TEST

LOSS DICTIONARY (TRAINING AND VALIDATION LOSS FROM EACH EPOCH)
- Key 'val1_loss' refers to validation set in which one image per lesion is used.
- Key 'val_a_loss' refers to validation set in which more than one image per lesion is potentially used.


{'train_loss': array([0.79855193]),
 'val1_loss': array([6.1903882]),
 'val_a_loss': array([8.67319012])}

In [15]:
# from utils import print_header
from multiclass_models import load_dict

# Let's look at the training and validation loss for each epoch:
instance = resnet_demo

print_header("Code test")
print("If we didn't just train the model and the epoch losses dictionary is not in memory, we can load it from the file saved during training.")

display(load_dict(instance.model_dir, instance._filename + "_epoch_losses"))


CODE TEST

If we didn't just train the model and the epoch losses dictionary is not in memory, we can load it from the file saved during training.


{'train_loss': [0.798551931977272],
 'val1_loss': [6.190388202667236],
 'val_a_loss': [8.673190116882324]}

<a id='model_architecture'></a>
## Model architecture and state dictionary
↑↑ [Contents](#contents) ↑ [Small sample for testing code](#small_sample) ↓ [Inference: getting probabilities](#inference1)

We can examine our model architecture and parameters before or after training. Our ```cnn``` class modifies the final fully connected layer of ResNet-18 (```fc```) or EfficientNet-B0 (```classifier[1]```) so that the number of output features equals the number of classes. After training, it saves the model parameters in a ```.pth``` file named after the ```_filename``` attribute of the ```cnn``` instance. To load the trained model for inference or further fine-tuning, we create a fresh model (e.g. ResNet-18), modify its final layer in the same way, and load our ```.pth``` file, as in the code cell below.

In [16]:
# from utils import print_header
import torch
import torch.nn as nn

instance = resnet_demo

if instance.state_dict is None:
    print("Loading model and state dictionary from file\n".upper())
    file_path_pth = instance.model_dir.joinpath(instance._filename + ".pth")

    # model = models.efficientnet_b0()  
    model = models.resnet18()  # Architecture only: the trained weights come from the .pth file
    # Replace the final layer so its output size matches the number of classes used in training
    if isinstance(model, models.ResNet):
        num_ftrs = model.fc.in_features
        model.fc = nn.Linear(num_ftrs, len(instance.label_codes))
    elif isinstance(model, models.EfficientNet):
        num_ftrs = model.classifier[1].in_features
        model.classifier[1] = nn.Linear(num_ftrs, len(instance.label_codes))

    # Load the state dictionary into the model (map_location="cpu" lets a GPU-trained file load on a CPU-only machine)
    state_dict = torch.load(file_path_pth, map_location="cpu")
    model.load_state_dict(state_dict)

    instance.model = model
    instance.state_dict = state_dict
    
print_header("Code test: model architecture")
print(f"Note: \'out_features = {len(instance.label_codes)}\' at the end".upper())
display(instance.model)


CODE TEST: MODEL ARCHITECTURE

NOTE: 'OUT_FEATURES = 5' AT THE END


ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

In [17]:
print_header("Code test: model state dictionary")
print(str(instance.state_dict)[:1000], "\n ... LOTS OF PARAMETERS ...\n", str(instance.state_dict)[-1000:])


CODE TEST: MODEL STATE DICTIONARY

OrderedDict([('conv1.weight', tensor([[[[-1.1087e-02, -6.7593e-03, -2.4926e-03,  ...,  5.6792e-02,
            1.7313e-02, -1.2530e-02],
          [ 1.0138e-02,  8.7479e-03, -1.1064e-01,  ..., -2.7141e-01,
           -1.2938e-01,  3.3370e-03],
          [-7.8421e-03,  5.8271e-02,  2.9468e-01,  ...,  5.1975e-01,
            2.5613e-01,  6.3238e-02],
          ...,
          [-2.8438e-02,  1.5150e-02,  7.1715e-02,  ..., -3.3331e-01,
           -4.2086e-01, -2.5812e-01],
          [ 2.9627e-02,  4.0148e-02,  6.2112e-02,  ...,  4.1335e-01,
            3.9321e-01,  1.6578e-01],
          [-1.4571e-02, -4.3363e-03, -2.4791e-02,  ..., -1.5096e-01,
           -8.2610e-02, -6.0161e-03]],

         [[-1.1167e-02, -2.6290e-02, -3.4528e-02,  ...,  3.2922e-02,
            9.4665e-04, -2.5523e-02],
          [ 4.5769e-02,  3.4000e-02, -1.0401e-01,  ..., -3.1181e-01,
           -1.6012e-01, -1.1553e-03],
          [-5.1002e-04,  9.8475e-02,  4.0238e-01,  ...,  7.08

<a id='inference1'></a>
## Inference: getting probabilities
↑↑ [Contents](#contents) ↑ [Model architecture and state dictionary](#model_architecture) ↓ [Inference: combining probabilities](#inference2)

Once we have trained a model, we want to evaluate it by making inferences on lesions/images from our validation set(s), which the model has not been trained on. The ```get_probabilities``` function (in our ```multiclass_models``` module) is the first step towards classifying an image: it returns, for each image, the model's probability for each class.
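The internals of ```get_probabilities``` are not shown here. A common way to turn a classifier's raw scores (logits) into class probabilities is a softmax. The sketch below assumes that step and arranges the result into ```prob_<class>``` columns like those in the outputs below. The helper names and the logits are made up; the class order is copied from the output columns.

```python
import numpy as np
import pandas as pd


def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def probabilities_frame(image_ids, logits, class_names):
    """One row per image, one prob_<class> column per class."""
    probs = softmax(np.asarray(logits, dtype=float))
    frame = pd.DataFrame(probs, columns=[f"prob_{name}" for name in class_names])
    frame.insert(0, "image_id", image_ids)
    return frame


class_names = ["other", "akiec", "bcc", "nv", "mel"]  # column order as in the outputs below
logits = [[2.0, -1.0, 0.5, -0.5, -2.0],  # made-up model outputs
          [0.0, 0.0, 0.0, 0.0, 0.0]]
print(probabilities_frame(["img_a", "img_b"], logits, class_names))
```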

In [65]:
# from utils import print_header
from multiclass_models import get_probabilities

instance = resnet_demo

# model = models.efficientnet_b0()  
# model = models.resnet18() 
# if isinstance(model,models.ResNet):
#     num_ftrs = model.fc.in_features
#     model.fc = nn.Linear(num_ftrs, len(instance.label_codes))
# elif isinstance(model,models.EfficientNet):
#     num_ftrs = model.classifier[1].in_features
#     model.classifier[1] = nn.Linear(num_ftrs, len(instance.label_codes))

instance.df_probabilities_val1 = get_probabilities(df=instance.df_val1,
                                                   data_dir=instance.data_dir,
                                                   model_dir=instance.model_dir,
                                                   model=instance.model,
                                                   filename=instance._filename,
                                                   label_codes=instance.label_codes,
                                                   transform=instance.transform,
                                                   batch_size=instance.batch_size,
                                                   Print=False,
                                                   save_as=instance._filename + "_val1",)

instance.df_probabilities_val_a = get_probabilities(df=instance.df_val_a,
                                                    data_dir=instance.data_dir,
                                                    model_dir=instance.model_dir,
                                                    model=instance.model,
                                                    filename=instance._filename,
                                                    label_codes=instance.label_codes,
                                                    transform=instance.transform,
                                                    batch_size=instance.batch_size,
                                                    Print=False,
                                                    save_as=instance._filename + "_val_a",)

print_header("Code test: probabilities, validation set (one image per lesion)")
display_columns = ['lesion_id', 'image_id', 'dx'] + [col for col in instance.df_probabilities_val1.columns if col.startswith('prob')]
display(instance.df_probabilities_val1[display_columns].head())

print_header("Code test: probabilities, validation set (more than one image per lesion)")
display(instance.df_probabilities_val_a[display_columns].head())

Saving probabilities: D:\projects\skin-lesion-classification\models\rn18_ta_bal_test_1e_test_00_val1_probabilities.csv
Saving probabilities: D:\projects\skin-lesion-classification\models\rn18_ta_bal_test_1e_test_00_val_a_probabilities.csv

CODE TEST: PROBABILITIES, VALIDATION SET (ONE IMAGE PER LESION)



| | lesion_id | image_id | dx | prob_other | prob_akiec | prob_bcc | prob_nv | prob_mel |
|---|---|---|---|---|---|---|---|---|
| 0 | HAM_0003218 | ISIC_0033305 | bkl | 0.9572 | 0.008803 | 0.024734 | 0.005357 | 0.003905761 |
| 1 | HAM_0003218 | ISIC_0033305 | bkl | 0.857012 | 0.002237 | 0.138892 | 0.001512 | 0.0003466143 |
| 2 | HAM_0003218 | ISIC_0033305 | bkl | 0.939657 | 0.020345 | 0.023048 | 0.00929 | 0.007660649 |
| 3 | HAM_0000983 | ISIC_0033490 | bkl | 0.160004 | 0.00034 | 0.839545 | 0.000103 | 8.625674e-06 |
| 4 | HAM_0000983 | ISIC_0033490 | bkl | 0.051505 | 3.3e-05 | 0.948446 | 1.6e-05 | 6.859188e-07 |



CODE TEST: PROBABILITIES, VALIDATION SET (MORE THAN ONE IMAGE PER LESION)



| | lesion_id | image_id | dx | prob_other | prob_akiec | prob_bcc | prob_nv | prob_mel |
|---|---|---|---|---|---|---|---|---|
| 0 | HAM_0004406 | ISIC_0034125 | bkl | 1.0 | 1.378271e-11 | 2.201661e-10 | 7.232391e-10 | 7.420231e-10 |
| 1 | HAM_0004406 | ISIC_0033060 | bkl | 0.866039 | 0.0005915513 | 0.1323435 | 0.0009058156 | 0.0001198957 |
| 2 | HAM_0004406 | ISIC_0033060 | bkl | 0.997122 | 0.000177621 | 0.002428613 | 0.0002323076 | 3.960762e-05 |
| 3 | HAM_0001200 | ISIC_0025716 | bkl | 0.006237 | 0.003132195 | 0.9886382 | 0.001502147 | 0.0004908901 |
| 4 | HAM_0001200 | ISIC_0025716 | bkl | 0.699566 | 0.05104385 | 0.2367935 | 0.01172803 | 0.000868846 |


We can also make predictions for any lesion/image from our original dataset, as well as for any image from outside our dataset (even, say, a picture of a cat). The output of the code cell below explains how.
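For an image outside the dataset, all ```get_probabilities``` needs is a dataframe with an ```image_id``` column. The sketch below is a hypothetical minimal version of that construction. Unlike ```df_from_ids```, it does not look up lesion ids or any other metadata.

```python
import pandas as pd


def ids_to_frame(image_ids, multiplicity=1):
    """Minimal dataframe with an 'image_id' column, each id repeated `multiplicity` times."""
    if isinstance(image_ids, str):  # accept a single id or a list of ids
        image_ids = [image_ids]
    rows = [image_id for image_id in image_ids for _ in range(multiplicity)]
    return pd.DataFrame({"image_id": rows})


print(ids_to_frame("image_from_somewhere", multiplicity=3))
print(ids_to_frame(["ISIC_0033305", "ISIC_0025661"]))
```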

In [66]:
# from utils import print_header
from multiclass_models import df_from_ids, get_probabilities     

instance = resnet_demo
df = instance.df

print_header("Code test: prediction on individual images or lesions")

to_print = ["- We can make predictions for individual images or lesions.",
            "We only require a dataframe with an \'image_id\' column.",
            "Given the filename of an image, the df_from_ids function will construct such a dataframe.",
            "We can then feed this dataframe into the get_probabilities function.",
            "Here is the result of passing \'filenames = [\'ISIC_0033305\',\'ISIC_0025661\']\' to df_from_ids:",
            "- And here are the corresponding probabilities:",
            "- If we have a lesion_id with associated image_ids, we can also construct a small dataframe.",
            "The df_from_ids function also takes arguments for the number of predictions we want to make for a given image/lesion.",
            "Here is the result of passing \'lesion_ids = \'HAM_0000118\', \'multiplicity = 3\', and \'one_img_per_lesion = False\' to df_from_ids:",
            "- We have filtered all columns except for lesion_id and image_id (knowing the diagnosis defeats the purpose).",
            "- Here are the associated probabilities:",
            "- Notice that the probabilities may vary with each execution of a prediction.",
            "This is because a random transformation may be applied to each image before our model makes a prediction on it."]

df_2img = df_from_ids(filenames=['ISIC_0033305','ISIC_0025661'], # can be a string or a list of strings
                       multiplicity=None,
                       lesion_ids=None,
                       df=df,
                       one_img_per_lesion=None,)

print("\n- ".join(to_print[:5]))

display(df_2img)

df_2img_prob = get_probabilities(df=df_2img,
                  data_dir=instance.data_dir,
                  model_dir=instance.model_dir,
                  model=instance.model,
                  filename=instance._filename,
                  label_codes=instance.label_codes,
                  transform=instance.transform,
                  batch_size=instance.batch_size,
                  Print=False,
                  save_as=None,)   

print(to_print[5])
display(df_2img_prob)

print("\n- ".join(to_print[6:9]))

df_1les = df_from_ids(filenames=None,
                       multiplicity=3,
                       lesion_ids='HAM_0000118', # can be a string or a list of strings
                       df=df,
                       one_img_per_lesion=False,)

display_columns = ['lesion_id', 'image_id'] 
display(df_1les[display_columns])

print("\n- ".join(to_print[9:10]))

df_1les_prob = get_probabilities(df=df_1les,
                  data_dir=instance.data_dir,
                  model_dir=instance.model_dir,
                  model=instance.model,
                  filename=instance._filename,
                  label_codes=instance.label_codes,
                  transform=instance.transform,
                  batch_size=instance.batch_size,
                  Print=False,
                  save_as=None,)   

print(to_print[10])
display_columns = ['lesion_id', 'image_id'] + [col for col in df_1les_prob if col.startswith('prob')]
display(df_1les_prob[display_columns])

print("\n- ".join(to_print[11:]))


CODE TEST: PREDICTION ON INDIVIDUAL IMAGES OR LESIONS

- We can make predictions for individual images or lesions.
- We only require a dataframe with an 'image_id' column.
- Given the filename of an image, the df_from_ids function will construct such a dataframe.
- We can then feed this dataframe into the get_probabilities function.
- Here is the result of passing 'filenames = ['ISIC_0033305','ISIC_0025661']' to df_from_ids:


| | image_id |
|---|---|
| 0 | ISIC_0033305 |
| 1 | ISIC_0025661 |


- And here are the corresponding probabilities:


| | image_id | prob_other | prob_akiec | prob_bcc | prob_nv | prob_mel |
|---|---|---|---|---|---|---|
| 0 | ISIC_0033305 | 0.965367 | 7.1e-05 | 0.034221 | 0.0003 | 4.1e-05 |
| 1 | ISIC_0025661 | 0.584027 | 0.408331 | 0.000895 | 0.002948 | 0.003799 |


- If we have a lesion_id with associated image_ids, we can also construct a small dataframe.
- The df_from_ids function also takes arguments for the number of predictions we want to make for a given image/lesion.
- Here is the result of passing 'lesion_ids = 'HAM_0000118', 'multiplicity = 3', and 'one_img_per_lesion = False' to df_from_ids:


| | lesion_id | image_id |
|---|---|---|
| 0 | HAM_0000118 | ISIC_0027419 |
| 1 | HAM_0000118 | ISIC_0027419 |
| 2 | HAM_0000118 | ISIC_0025030 |


- We have filtered all columns except for lesion_id and image_id (knowing the diagnosis defeats the purpose).
- Here are the associated probabilities:


| | lesion_id | image_id | prob_other | prob_akiec | prob_bcc | prob_nv | prob_mel |
|---|---|---|---|---|---|---|---|
| 0 | HAM_0000118 | ISIC_0027419 | 0.889041 | 0.11074 | 8.4e-05 | 7.6e-05 | 5.9e-05 |
| 1 | HAM_0000118 | ISIC_0027419 | 0.750358 | 0.249013 | 3.4e-05 | 0.000402 | 0.000192 |
| 2 | HAM_0000118 | ISIC_0025030 | 0.372584 | 0.618265 | 0.003361 | 0.004041 | 0.001749 |


- Notice that the probabilities may vary with each execution of a prediction.
- This is because a random transformation may be applied to each image before our model makes a prediction on it.


In [67]:
print("Here's another example of how we'd use df_from_ids on an arbitrary image from outside our dataset.")

display(df_from_ids(filenames='image_from_somewhere', multiplicity=3,))

print("If it were an actual image in the data_dir folder, we could feed this dataframe into the get_probabilities function.")

Here's another example of how we'd use df_from_ids on an arbitrary image from outside our dataset.


| | image_id |
|---|---|
| 0 | image_from_somewhere |
| 1 | image_from_somewhere |
| 2 | image_from_somewhere |


If it were an actual image in the data_dir folder, we could feed this dataframe into the get_probabilities function.


<a id='inference2'></a>
## Inference: combining probabilities
↑↑ [Contents](#contents) ↑ [Inference: getting probabilities](#inference1) ↓ [Inference: combining predictions](#inference3)

Once we have probabilities for an image, we can use them to make a prediction. If we have multiple sets of probabilities for a given image or lesion, there are various ways to combine them. This is done by the ```aggregate_probabilities``` function of our ```multiclass_models``` module. Its ```method``` argument specifies how multiple probabilities for a given image/lesion are combined: maximum, minimum or mean, chosen per class. See the output of the code cell below for a detailed example.
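Per-class aggregation of this kind maps naturally onto pandas' ```groupby(...).transform(...)```, which broadcasts each group's max, min or mean back onto every row. The sketch below assumes that approach and uses the probabilities from the example output below. ```combine_probabilities``` is hypothetical. When there is no ```lesion_id``` column, the sketch returns the frame unchanged. That is a simplification which happens to match the second example; the real function may behave differently.

```python
import pandas as pd


def combine_probabilities(df, method, group_col="lesion_id"):
    """Replace prob_<class> columns by a per-group max/min/mean; unlisted classes are left alone."""
    out = df.copy()
    if group_col not in out.columns:
        return out  # nothing to group by: each row keeps its own probabilities
    for how, classes in method.items():
        for cls in classes:
            col = f"prob_{cls}"
            out[col] = out.groupby(group_col)[col].transform(how)
    return out


probs = pd.DataFrame({
    "lesion_id": ["HAM_0000118"] * 3,
    "image_id": ["ISIC_0027419", "ISIC_0027419", "ISIC_0025030"],
    "prob_other": [0.889041, 0.750358, 0.372584],
    "prob_akiec": [0.11074, 0.249013, 0.618265],
    "prob_bcc": [8.4e-05, 3.4e-05, 0.003361],
    "prob_nv": [7.6e-05, 0.000402, 0.004041],
    "prob_mel": [5.9e-05, 0.000192, 0.001749],
})
method = {"max": ["mel"], "min": ["nv"], "mean": ["akiec", "bcc"]}
print(combine_probabilities(probs, method).round(6))
```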

In [70]:
# from utils import print_header
from multiclass_models import aggregate_probabilities

print_header("Code test: combining probabilities")

method = { 'max' : ['mel'], 'min' : ['nv'], 'mean' : ['akiec', 'bcc'] }

print("- We can combine multiple probabilities for a single lesion, if available, by taking the maximum, minimum, or mean.")
print("- Here is the original dataframe:")
display(df_1les_prob)
print("- Here is the dataframe with max mel probability, minimum nv probability, mean akiec and bcc prob\'s, and \'other\' left alone:")
display(aggregate_probabilities(df_1les_prob, method=method))

print("- Here's another example (no lesion_id in this case):")
display(df_2img_prob)
display(aggregate_probabilities(df_2img_prob))


CODE TEST: COMBINING PROBABILITIES

- We can combine multiple probabilities for a single lesion, if available, by taking the maximum, minimum, or mean.
- Here is the original dataframe:


| | lesion_id | num_images | image_id | dx | label | dx_type | age | sex | localization | set | prob_other | prob_akiec | prob_bcc | prob_nv | prob_mel |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | HAM_0000118 | 2 | ISIC_0027419 | bkl | 0 | histo | 80.0 | male | scalp | ta | 0.889041 | 0.11074 | 8.4e-05 | 7.6e-05 | 5.9e-05 |
| 1 | HAM_0000118 | 2 | ISIC_0027419 | bkl | 0 | histo | 80.0 | male | scalp | ta | 0.750358 | 0.249013 | 3.4e-05 | 0.000402 | 0.000192 |
| 2 | HAM_0000118 | 2 | ISIC_0025030 | bkl | 0 | histo | 80.0 | male | scalp | t1 | 0.372584 | 0.618265 | 0.003361 | 0.004041 | 0.001749 |


- Here is the dataframe with max mel probability, minimum nv probability, mean akiec and bcc prob's, and 'other' left alone:


| | lesion_id | num_images | image_id | dx | label | dx_type | age | sex | localization | set | prob_other | prob_akiec | prob_bcc | prob_nv | prob_mel |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | HAM_0000118 | 2 | ISIC_0027419 | bkl | 0 | histo | 80.0 | male | scalp | ta | 0.889041 | 0.326006 | 0.00116 | 7.6e-05 | 0.001749 |
| 1 | HAM_0000118 | 2 | ISIC_0027419 | bkl | 0 | histo | 80.0 | male | scalp | ta | 0.750358 | 0.326006 | 0.00116 | 7.6e-05 | 0.001749 |
| 2 | HAM_0000118 | 2 | ISIC_0025030 | bkl | 0 | histo | 80.0 | male | scalp | t1 | 0.372584 | 0.326006 | 0.00116 | 7.6e-05 | 0.001749 |


- Here's another example (no lesion_id in this case):


| | image_id | prob_other | prob_akiec | prob_bcc | prob_nv | prob_mel |
|---|---|---|---|---|---|---|
| 0 | ISIC_0033305 | 0.965367 | 7.1e-05 | 0.034221 | 0.0003 | 4.1e-05 |
| 1 | ISIC_0025661 | 0.584027 | 0.408331 | 0.000895 | 0.002948 | 0.003799 |


| | image_id | prob_other | prob_akiec | prob_bcc | prob_nv | prob_mel |
|---|---|---|---|---|---|---|
| 0 | ISIC_0033305 | 0.965367 | 7.1e-05 | 0.034221 | 0.0003 | 4.1e-05 |
| 1 | ISIC_0025661 | 0.584027 | 0.408331 | 0.000895 | 0.002948 | 0.003799 |


<a id='inference3'></a>
## Inference: combining predictions
↑↑ [Contents](#contents) ↑ [Inference: combining probabilities](#inference2) ↓ [Evaluation](#evaluation)

Once any multiple probabilities for a single image/lesion have been combined into one set (by taking the mean, minimum, or maximum), we can predict the lesion's class. But again, there are options here. By default, the ```final_prediction``` function (```multiclass_models```) predicts the class with the highest probability: e.g. if the probability for ```nv``` is 0.8 and this is larger than the probabilities for all other classes, ```final_prediction``` yields ```nv``` (or rather the label code for ```nv```). However, we can stipulate that ```mel``` be predicted whenever the probability for ```mel``` is, say, greater than 0.4. If that is not the case, we can stipulate that ```bcc``` be predicted if its probability exceeds 0.4, and so on. Likewise, we can make an ```nv``` prediction harder to obtain by requiring the probability for ```nv``` to be at least 0.6. This is the purpose of the ```threshold_dict_help``` and ```threshold_dict_hinder``` arguments of ```final_prediction```, both of which are ordered dictionaries.
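The threshold logic for a single row of probabilities can be sketched as follows. This is our reading of the description above, not the code of ```final_prediction```: the function name ```predict_with_thresholds```, the use of ```>=``` for every threshold, and the choice to fall back to the next most probable class when a "hindered" class misses its threshold are all assumptions. The threshold values mirror the ones used later in this section.

```python
from collections import OrderedDict
from typing import Mapping, Optional


def predict_with_thresholds(probs: Mapping[str, float],
                            help_thresholds: Optional[Mapping[str, float]] = None,
                            hinder_thresholds: Optional[Mapping[str, float]] = None) -> str:
    """Hypothetical sketch of threshold-based prediction for one image."""
    # 1. Walk the ordered "help" list: the first class whose probability
    #    reaches its threshold is predicted straight away.
    for cls, threshold in (help_thresholds or {}).items():
        if probs[cls] >= threshold:
            return cls
    # 2. Otherwise predict the most probable class, skipping any "hindered"
    #    class whose probability falls short of its (higher) threshold.
    hinder_thresholds = hinder_thresholds or {}
    candidates = {cls: p for cls, p in probs.items()
                  if p >= hinder_thresholds.get(cls, 0.0)}
    if not candidates:  # every class was hindered: use the plain argmax
        candidates = dict(probs)
    return max(candidates, key=candidates.get)


help_thresholds = OrderedDict([("mel", 0.4), ("bcc", 0.45)])
hinder_thresholds = OrderedDict([("nv", 0.6)])

probs_a = {"other": 0.25, "akiec": 0.05, "bcc": 0.10, "nv": 0.55, "mel": 0.05}
probs_b = {"other": 0.03, "akiec": 0.02, "bcc": 0.03, "nv": 0.50, "mel": 0.42}

print(predict_with_thresholds(probs_a))                                       # plain argmax: nv
print(predict_with_thresholds(probs_a, help_thresholds, hinder_thresholds))  # nv < 0.6, so other
print(predict_with_thresholds(probs_b, help_thresholds, hinder_thresholds))  # mel >= 0.4, so mel
```

Since the "help" dictionary is ordered, the order of its entries matters: a row that clears both the ```mel``` and ```bcc``` thresholds is predicted as whichever class is listed first.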

There are still further choices to make. Once we have a prediction for each image of a lesion, we can combine the predictions according to various 'voting' methods: for instance, we might want to declare ```mel``` as soon as one of the predictions is for that class. By default, the class predicted most often becomes the final prediction, with ties broken by random selection.
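The voting step can be sketched in the same spirit, assuming the per-image predictions for one lesion have been collected in a list. The function name ```vote``` and its arguments are hypothetical; ```votes_to_win``` plays the role we understand ```votes_to_win_dict``` to play (minimum number of votes for a class to win outright, checked in order), and ties in the majority vote are broken at random as described above.

```python
import random
from collections import Counter, OrderedDict
from typing import List, Mapping, Optional


def vote(predictions: List[str],
         votes_to_win: Optional[Mapping[str, int]] = None,
         rng: Optional[random.Random] = None) -> str:
    """Hypothetical sketch of combining per-image predictions for one lesion."""
    counts = Counter(predictions)
    # Priority list first: e.g. {"mel": 1} means a single mel vote wins.
    for cls, needed in (votes_to_win or {}).items():
        if counts[cls] >= needed:
            return cls
    # Otherwise majority vote, breaking ties uniformly at random.
    top = max(counts.values())
    tied = sorted(cls for cls, n in counts.items() if n == top)
    return (rng or random).choice(tied)


votes_to_win = OrderedDict([("mel", 1)])
print(vote(["other", "other", "akiec"], votes_to_win))  # majority: other
print(vote(["other", "other", "mel"], votes_to_win))    # one mel vote suffices: mel
print(vote(["bcc", "other"], rng=random.Random(0)))      # tie: bcc or other at random
```

Passing a seeded ```random.Random``` makes the tie-break reproducible, which is convenient when comparing runs.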

Detailed examples are explained in the output of the code cell below.

In [71]:
# from utils import print_header
from collections import OrderedDict
from typing import Dict, List, Union

import pandas as pd
from multiclass_models import final_prediction

print_header("Code test: making predictions")

raw_probabilities_df: pd.DataFrame = df_1les_prob
raw_probabilities_df_a: pd.DataFrame = instance.df_probabilities_val_a
# Note: aggregate_method and prefix are not passed to final_prediction below.
aggregate_method: Union[None, Dict[str, List[str]]] = {'max': ['mel'], 'min': ['nv'], 'mean': ['akiec', 'bcc']}
threshold_dict_help: Union[None, OrderedDict[str, float]] = OrderedDict([('bcc', 0.01), ('mel', 0.4)])
threshold_dict_hinder: Union[None, OrderedDict[str, float]] = OrderedDict([('nv', 0.6)])
votes_to_win_dict: Union[None, OrderedDict[str, int]] = OrderedDict([('mel', 1)])
label_codes: Dict[int, str] = instance.label_codes
prefix: Union[None, str] = 'prob_'

to_print = ["- There are various ways we can form a prediction based on the combined probabilities.",
            "For instance, we can immediately predict mel if the probability of mel is greater than 0.4.",
            "If that's not the case, we can continue down a list, e.g. predicting bcc if probability of bcc is at least 0.45.",
            "We can also require, e.g., the probability of nv to be at least 0.6 before predicting nv.",
            "Once we have predicted classes for each image of a lesion, we can then combine the predictions in various ways.",
            "For instance, we might want to make a final prediction of mel if that is a prediction for at least one of the images.",
            "If not, we might again go through an ordered list, voting e.g. for bcc if at least one image is predicted as bcc.",
            "It doesn't have to be 'at least one prediction': we could say 'if at least two predictions are for bcc, then...'.",
            "If we reach the end of this list, we'd proceed to select the most popular prediction as the final one for the lesion.",
            "We don't have to specify a priority list at all: in that case we just take the most popular prediction as the final one.",
            "We could do something similar if there were no lesion_id, only image_ids repeated a number of times.",
            "Below, by way of illustration, we've stated that we want to predict bcc if the probability for bcc is at least 0.01.",
            "But then, we've stated that we want to make a final prediction of mel if at least one prediction is for mel.",
            f"The label codes are as follows: {instance.label_codes}",
            ]

print("\n- ".join(to_print))

display(final_prediction(raw_probabilities_df=raw_probabilities_df,
                         threshold_dict_help=threshold_dict_help,
                         threshold_dict_hinder=threshold_dict_hinder,
                         votes_to_win_dict=votes_to_win_dict,
                         label_codes=label_codes))


CODE TEST: MAKING PREDICTIONS

- There are various ways we can form a prediction based on the combined probabilities.
- For instance, we can immediately predict mel if the probability of mel is greater than 0.4.
- If that's not the case, we can continue down a list, e.g. predicting bcc if probability of bcc is at least 0.45.
- We can also require, e.g., the probability of nv to be at least 0.6 before predicting nv.
- Once we have predicted classes for each image of a lesion, we can then combine the predictions in various ways.
- For instance, we might want to make a final prediction of mel if that is a prediction for at least one of the images.
- If not, we might again go through an ordered list, voting e.g. for bcc if at least one image is predicted as bcc.
- It doesn't have to be 'at least one prediction': we could say 'if at least two predictions are for bcc, then...'.
- If we reach the end of this list, we'd proceed to select the most popular prediction as the final one for the lesion.
- We don't have to specify a priority list at all: in that case we just take the most popular prediction as the final one.
- We could do something similar if there were no lesion_id, only image_ids repeated a number of times.
- Below, by way of illustration, we've stated that we want to predict bcc if the probability for bcc is at least 0.01.
- But then, we've stated that we want to make a final prediction of mel if at least one prediction is for mel.

| | lesion_id | num_images | image_id | dx | label | dx_type | age | sex | localization | set | prob_other | prob_akiec | prob_bcc | prob_nv | prob_mel | pred | pred_final |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | HAM_0000118 | 2 | ISIC_0027419 | bkl | 0 | histo | 80.0 | male | scalp | ta | 0.889041 | 0.11074 | 8.4e-05 | 7.6e-05 | 5.9e-05 | 0 | 0 |
| 1 | HAM_0000118 | 2 | ISIC_0027419 | bkl | 0 | histo | 80.0 | male | scalp | ta | 0.750358 | 0.249013 | 3.4e-05 | 0.000402 | 0.000192 | 0 | 0 |
| 2 | HAM_0000118 | 2 | ISIC_0025030 | bkl | 0 | histo | 80.0 | male | scalp | t1 | 0.372584 | 0.618265 | 0.003361 | 0.004041 | 0.001749 | 1 | 0 |


We apply the ```final_prediction``` function to the entirety of both variants of our validation set. Both variants contain several rows per lesion if ```val_expansion_factor``` is an integer greater than one (in the one-image-per-lesion variant, the same image is simply repeated). Otherwise, the features of ```final_prediction``` for combining probabilities and predictions are relevant only to the all-images-per-lesion variant of the validation set, where a lesion can have several distinct images.

In [72]:
# Let's now apply this to our code test validation sets.
# Reload the probability dataframes from the CSV files saved earlier (useful if they are no longer in memory),
# storing them under the attribute names used in the cells below.
file_path_val1 = instance.model_dir.joinpath(instance._filename + "_val1_probabilities.csv")
file_path_val_a = instance.model_dir.joinpath(instance._filename + "_val_a_probabilities.csv")
instance.df_probabilities_val1 = pd.read_csv(file_path_val1, index_col=0)
instance.df_probabilities_val_a = pd.read_csv(file_path_val_a, index_col=0)

In [73]:
# from utils import print_header
from collections import OrderedDict
from typing import Dict, List, Union

import pandas as pd
from multiclass_models import final_prediction

instance = resnet_demo

raw_probabilities_df1: pd.DataFrame = instance.df_probabilities_val1
raw_probabilities_df_a: pd.DataFrame = instance.df_probabilities_val_a
# Note: aggregate_method and prefix are not passed to final_prediction below.
aggregate_method: Union[None, Dict[str, List[str]]] = {'max': ['mel'], 'min': ['nv'], 'mean': ['bcc']}
threshold_dict_help: Union[None, OrderedDict[str, float]] = OrderedDict([('mel', 0.4), ('bcc', 0.45)])
threshold_dict_hinder: Union[None, OrderedDict[str, float]] = OrderedDict([('nv', 0.6)])
votes_to_win_dict: Union[None, OrderedDict[str, int]] = OrderedDict([('mel', 1)])
label_codes: Dict[int, str] = instance.label_codes
prefix: Union[None, str] = 'prob_'

print_header("Code test: combining probabilities and making predictions")

print(f"Validation set: one image per lesion, repeated {instance.source.val_expansion_factor} times".upper())

instance.df_pred_val1 = final_prediction(raw_probabilities_df=raw_probabilities_df1, 
                                         threshold_dict_help=threshold_dict_help,
                                         threshold_dict_hinder=threshold_dict_hinder,
                                         votes_to_win_dict=votes_to_win_dict,
                                         label_codes=label_codes,)

display(instance.df_pred_val1.head())

print(f"\nValidation set: {instance.source.val_expansion_factor} images per lesion, using all images before repeating".upper())

instance.df_pred_val_a = final_prediction(raw_probabilities_df=raw_probabilities_df_a,
                                          threshold_dict_help=threshold_dict_help,
                                          threshold_dict_hinder=threshold_dict_hinder,
                                          votes_to_win_dict=votes_to_win_dict,
                                          label_codes=label_codes)

display(instance.df_pred_val_a.head())

print("\nNow we simply drop duplicates (lesion_id)...".upper())

display(instance.df_pred_val1.drop_duplicates(subset='lesion_id')[['lesion_id','label','pred_final']])

display(instance.df_pred_val_a.drop_duplicates(subset='lesion_id')[['lesion_id','label','pred_final']])


CODE TEST: COMBINING PROBABILITIES AND MAKING PREDICTIONS

VALIDATION SET: ONE IMAGE PER LESION, REPEATED 3 TIMES


| | lesion_id | lesion_mult | num_images | image_id | img_mult | dx | label | dx_type | age | sex | localization | set | prob_other | prob_akiec | prob_bcc | prob_nv | prob_mel | pred | pred_final |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | HAM_0003218 | 3 | 1 | ISIC_0033305 | 3 | bkl | 0 | consensus | 75.0 | male | back | v1 | 0.9572 | 0.008803 | 0.024734 | 0.005357 | 0.003905761 | 0 | 0 |
| 1 | HAM_0003218 | 3 | 1 | ISIC_0033305 | 3 | bkl | 0 | consensus | 75.0 | male | back | v1 | 0.857012 | 0.002237 | 0.138892 | 0.001512 | 0.0003466143 | 0 | 0 |
| 2 | HAM_0003218 | 3 | 1 | ISIC_0033305 | 3 | bkl | 0 | consensus | 75.0 | male | back | v1 | 0.939657 | 0.020345 | 0.023048 | 0.00929 | 0.007660649 | 0 | 0 |
| 3 | HAM_0000983 | 3 | 1 | ISIC_0033490 | 3 | bkl | 0 | consensus | | unknown | unknown | v1 | 0.160004 | 0.00034 | 0.839545 | 0.000103 | 8.625674e-06 | 2 | 2 |
| 4 | HAM_0000983 | 3 | 1 | ISIC_0033490 | 3 | bkl | 0 | consensus | | unknown | unknown | v1 | 0.051505 | 3.3e-05 | 0.948446 | 1.6e-05 | 6.859188e-07 | 2 | 2 |



VALIDATION SET: 3 IMAGES PER LESION, USING ALL IMAGES BEFORE REPEATING


| | lesion_id | lesion_mult | num_images | image_id | img_mult | dx | label | dx_type | age | sex | localization | set | prob_other | prob_akiec | prob_bcc | prob_nv | prob_mel | pred | pred_final |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | HAM_0004406 | 3 | 2 | ISIC_0034125 | 1 | bkl | 0 | histo | 80.0 | male | back | va | 1.0 | 1.378271e-11 | 2.201661e-10 | 7.232391e-10 | 7.420231e-10 | 0 | 0 |
| 1 | HAM_0004406 | 3 | 2 | ISIC_0033060 | 2 | bkl | 0 | histo | 80.0 | male | back | v1 | 0.866039 | 0.0005915513 | 0.1323435 | 0.0009058156 | 0.0001198957 | 0 | 0 |
| 2 | HAM_0004406 | 3 | 2 | ISIC_0033060 | 2 | bkl | 0 | histo | 80.0 | male | back | v1 | 0.997122 | 0.000177621 | 0.002428613 | 0.0002323076 | 3.960762e-05 | 0 | 0 |
| 3 | HAM_0001200 | 3 | 1 | ISIC_0025716 | 3 | bkl | 0 | histo | 85.0 | male | face | v1 | 0.006237 | 0.003132195 | 0.9886382 | 0.001502147 | 0.0004908901 | 2 | 2 |
| 4 | HAM_0001200 | 3 | 1 | ISIC_0025716 | 3 | bkl | 0 | histo | 85.0 | male | face | v1 | 0.699566 | 0.05104385 | 0.2367935 | 0.01172803 | 0.000868846 | 0 | 2 |



NOW WE SIMPLY DROP DUPLICATES (LESION_ID)...


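| | lesion_id | label | pred_final |
|---|---|---|---|
| 0 | HAM_0003218 | 0 | 0 |
| 3 | HAM_0000983 | 0 | 2 |
| 6 | HAM_0003267 | 3 | 0 |
| 9 | HAM_0006318 | 3 | 0 |
| 12 | HAM_0005518 | 0 | 2 |
| 15 | HAM_0005663 | 0 | 0 |
| 18 | HAM_0001953 | 4 | 0 |
| 21 | HAM_0002591 | 4 | 2 |
| 24 | HAM_0005713 | 0 | 0 |
| 27 | HAM_0007568 | 0 | 0 |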


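| | lesion_id | label | pred_final |
|---|---|---|---|
| 0 | HAM_0004406 | 0 | 0 |
| 3 | HAM_0001200 | 0 | 2 |
| 6 | HAM_0001756 | 3 | 0 |
| 9 | HAM_0006602 | 3 | 0 |
| 12 | HAM_0007418 | 0 | 0 |
| 15 | HAM_0004065 | 0 | 0 |
| 18 | HAM_0006722 | 4 | 0 |
| 21 | HAM_0000107 | 4 | 0 |
| 24 | HAM_0000940 | 0 | 0 |
| 27 | HAM_0007150 | 0 | 0 |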


<a id='evaluation'></a>
## Evaluation
↑↑ [Contents](#contents) ↑ [Inference: combining predictions](#inference3) ↓ [Trivial models](#trivial_models)

With a single final prediction for each lesion in our validation set, we can evaluate our model. Our custom ```evaluation``` module contains functions for displaying confusion matrices annotated with class-wise precision and recall, along with other metric-related functions. Details are explained, with examples, in the output of the code cells below.
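As a rough illustration of what a confusion matrix with class-wise precision and recall contains, here is a numpy sketch. It is not ```confusion_matrix_with_metric``` from the ```evaluation``` module: the name ```confusion_with_precision_recall``` is hypothetical, rows are actual classes and columns predicted classes (as in the tables in this section), and NaN marks a ratio that is undefined because a class was never predicted (shown as ```_``` in the tables). The example pairs are reconstructed from the one-image-per-lesion confusion matrix reported in this section, with label codes assumed to be 0 other, 1 akiec, 2 bcc, 3 nv, 4 mel.

```python
import numpy as np


def confusion_with_precision_recall(y_true, y_pred, n_classes):
    """Hypothetical helper: confusion matrix plus per-class recall and precision."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1  # rows: actual class, columns: predicted class
    correct = np.diag(cm).astype(float)
    actual_totals = cm.sum(axis=1)     # lesions truly in each class
    predicted_totals = cm.sum(axis=0)  # lesions predicted as each class
    with np.errstate(divide="ignore", invalid="ignore"):
        # Recall divides by row totals, precision by column totals; 0/0 -> NaN.
        recall = np.where(actual_totals > 0, correct / actual_totals, np.nan)
        precision = np.where(predicted_totals > 0, correct / predicted_totals, np.nan)
    return cm, recall, precision


# (actual, predicted) label-code pairs for the 14 lesions of the code-test validation set.
pairs = ([(0, 0)] * 4 + [(0, 2)] * 2 + [(1, 0), (1, 2)] + [(2, 0)] * 2
         + [(3, 0)] * 2 + [(4, 0), (4, 2)])
y_true = [a for a, _ in pairs]
y_pred = [p for _, p in pairs]
cm, recall, precision = confusion_with_precision_recall(y_true, y_pred, n_classes=5)
print(cm)
print("recall:   ", recall.round(6))
print("precision:", precision.round(6))
```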

We have chosen balanced accuracy as our overall metric for evaluating model performance. However, we also display several other metrics, such as the Matthews correlation coefficient.
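Both headline metrics can be computed directly from a confusion matrix. The sketch below is our own (function names hypothetical, not taken from the ```evaluation``` module): balanced accuracy is the mean per-class recall over classes present in the ground truth, and the multiclass Matthews correlation coefficient uses the same confusion-matrix formula as scikit-learn's ```matthews_corrcoef```, returning 0 when the denominator vanishes. Applied to the one-image-per-lesion confusion matrix from this section, it reproduces the BACC and MCC values reported for that set.

```python
import numpy as np


def balanced_accuracy(cm):
    """Mean per-class recall, over classes present in the ground truth."""
    cm = np.asarray(cm, dtype=float)  # rows: actual, columns: predicted
    support = cm.sum(axis=1)
    present = support > 0
    return float(np.mean(np.diag(cm)[present] / support[present]))


def matthews_corrcoef_multiclass(cm):
    """Multiclass MCC computed from a confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    t = cm.sum(axis=1)  # number of true samples per class
    p = cm.sum(axis=0)  # number of predicted samples per class
    c = np.trace(cm)    # correctly classified samples
    s = cm.sum()        # all samples
    numerator = c * s - np.dot(p, t)
    denominator = np.sqrt((s ** 2 - np.dot(p, p)) * (s ** 2 - np.dot(t, t)))
    return 0.0 if denominator == 0 else float(numerator / denominator)


# Lesion-level confusion matrix for the one-image-per-lesion code-test set
# (rows/columns: other, akiec, bcc, nv, mel).
cm_val1 = [[4, 0, 2, 0, 0],
           [1, 0, 1, 0, 0],
           [2, 0, 0, 0, 0],
           [2, 0, 0, 0, 0],
           [1, 0, 1, 0, 0]]
print(round(balanced_accuracy(cm_val1), 6))             # 0.133333
print(round(matthews_corrcoef_multiclass(cm_val1), 6))  # -0.111803
```

Balanced accuracy is insensitive to class imbalance in a way plain accuracy is not: here plain accuracy is 4/14 because most lesions are ```other```, while balanced accuracy averages the recall of 2/3 for ```other``` with four recalls of zero.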

In [74]:
# from utils import print_header
from evaluation import weighted_average_f, confusion_matrix_with_metric

instance = resnet_demo
map_labels = instance.label_codes

target1 = instance.df_pred_val1.drop_duplicates(subset='lesion_id')['label'] 
prediction1 = instance.df_pred_val1.drop_duplicates(subset='lesion_id')['pred_final'] 

target_a = instance.df_pred_val_a.drop_duplicates(subset='lesion_id')['label']  
prediction_a = instance.df_pred_val_a.drop_duplicates(subset='lesion_id')['pred_final']  

txp1 = pd.crosstab(target1,prediction1,margins=True,dropna=False)
txp_a = pd.crosstab(target_a,prediction_a,margins=True,dropna=False)

beta = 1
# Weights inversely proportional to relative class size in the training set, giving more importance to smaller classes.
# (Set weights = None for equal class weights.)
weights = 1/instance.df_train['label'].value_counts(normalize=True).sort_index().values

instance.cm1 = confusion_matrix_with_metric(AxB=txp1,
                                            lst=None,
                                            full_pad=True,
                                            func=weighted_average_f,
                                            beta=beta,
                                            weights=weights,
                                            percentage=False,
                                            map_labels=map_labels)

instance.cm_a = confusion_matrix_with_metric(AxB=txp_a,
                                            lst=None,
                                            full_pad=True,
                                            func=weighted_average_f,
                                            beta=beta,
                                            weights=weights,
                                            percentage=False,
                                            map_labels=map_labels)

print_header("Code test: confusion matrix (validation set)")

to_print = ["- The overall evaluation metric would appear at the bottom right, if it were defined (this code test set is too small).",
            "It would be a class-wise weighted average fbeta score, beta and weights as specified (default values 1).",
            "One could also pass None, a float, or a different function to the func parameter in confusion_matrix_with_metric."]

print("\n- ".join(to_print))

print(f"\nOne image per lesion, repeated {instance.source.val_expansion_factor} times".upper())
display(instance.cm1.fillna('_'))

print(f"\n{instance.source.val_expansion_factor} images per lesion, using all available images before repeating".upper())
display(instance.cm_a.fillna('_'))


CODE TEST: CONFUSION MATRIX (VALIDATION SET)

- The overall evaluation metric would appear at the bottom right, if it were defined (this code test set is too small).
- It would be a class-wise weighted average fbeta score, beta and weights as specified (default values 1).
- One could also pass None, a float, or a different function to the func parameter in confusion_matrix_with_metric.

ONE IMAGE PER LESION, REPEATED 3 TIMES


| actual \ predicted | other | akiec | bcc | nv | mel | All | recall |
|---|---|---|---|---|---|---|---|
| other | 4.0 | 0 | 2.0 | 0 | 0 | 6 | 0.666667 |
| akiec | 1.0 | 0 | 1.0 | 0 | 0 | 2 | 0.0 |
| bcc | 2.0 | 0 | 0.0 | 0 | 0 | 2 | 0.0 |
| nv | 2.0 | 0 | 0.0 | 0 | 0 | 2 | 0.0 |
| mel | 1.0 | 0 | 1.0 | 0 | 0 | 2 | 0.0 |
| All | 10.0 | 0 | 4.0 | 0 | 0 | 14 | _ |
| precision | 0.4 | _ | 0.0 | _ | _ | _ | _ |



3 IMAGES PER LESION, USING ALL AVAILABLE IMAGES BEFORE REPEATING


| actual \ predicted | other | akiec | bcc | nv | mel | All | recall |
|---|---|---|---|---|---|---|---|
| other | 5.0 | 0 | 1.0 | 0 | 0 | 6 | 0.833333 |
| akiec | 2.0 | 0 | 0.0 | 0 | 0 | 2 | 0.0 |
| bcc | 2.0 | 0 | 0.0 | 0 | 0 | 2 | 0.0 |
| nv | 2.0 | 0 | 0.0 | 0 | 0 | 2 | 0.0 |
| mel | 2.0 | 0 | 0.0 | 0 | 0 | 2 | 0.0 |
| All | 13.0 | 0 | 1.0 | 0 | 0 | 14 | _ |
| precision | 0.384615 | _ | 0.0 | _ | _ | _ | _ |
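The printed notes mention a class-wise weighted average F-beta score for the bottom-right cell. A minimal sketch of such a score, under our own assumptions rather than the code of ```weighted_average_f```: per-class F_beta = (1 + beta^2) P R / (beta^2 P + R), combined as sum_k w_k F_k / sum_k w_k, and NaN whenever some class has no true or no predicted lesions (so some precision or recall is 0/0), which is consistent with the blank cells above. The name ```weighted_average_fbeta``` is hypothetical.

```python
import numpy as np


def weighted_average_fbeta(cm, beta=1.0, weights=None):
    """Hypothetical class-wise weighted average F_beta from a confusion matrix."""
    cm = np.asarray(cm, dtype=float)  # rows: actual, columns: predicted
    tp = np.diag(cm)
    actual = cm.sum(axis=1)
    predicted = cm.sum(axis=0)
    if (actual == 0).any() or (predicted == 0).any():
        return float("nan")  # some recall or precision is 0/0, so undefined
    precision = tp / predicted
    recall = tp / actual
    b2 = beta ** 2
    numerator = (1 + b2) * precision * recall
    denominator = b2 * precision + recall
    # F_beta is taken to be 0 when precision and recall are both 0
    f = np.divide(numerator, denominator,
                  out=np.zeros_like(numerator), where=denominator > 0)
    w = np.ones_like(f) if weights is None else np.asarray(weights, dtype=float)
    return float(np.sum(w * f) / np.sum(w))


cm_val1 = [[4, 0, 2, 0, 0],
           [1, 0, 1, 0, 0],
           [2, 0, 0, 0, 0],
           [2, 0, 0, 0, 0],
           [1, 0, 1, 0, 0]]
print(weighted_average_fbeta(cm_val1))  # nan: akiec, nv and mel are never predicted

cm_toy = [[3, 0],
          [1, 2]]
print(weighted_average_fbeta(cm_toy))                  # plain mean of per-class F1
print(weighted_average_fbeta(cm_toy, weights=[1, 3]))  # class 1 counts three times as much
```

With weights inversely proportional to class frequency, as in the cell above, the small classes dominate the average, which is the intended effect for an imbalanced dataset.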


In [75]:
# from utils import print_header
from evaluation import metric_dictionary
# import pandas as pd

instance = resnet_demo

target1 = instance.df_pred_val1.drop_duplicates(subset='lesion_id')['label'] 
prediction1 = instance.df_pred_val1.drop_duplicates(subset='lesion_id')['pred_final'] 
probabilities1 = instance.df_probabilities_val1.drop_duplicates(subset='lesion_id').filter(regex=r'^prob_')
agg_probabilities1 = instance.df_pred_val1.drop_duplicates(subset='lesion_id').filter(regex=r'^prob_') 

target_a = instance.df_pred_val_a.drop_duplicates(subset='lesion_id')['label']  
prediction_a = instance.df_pred_val_a.drop_duplicates(subset='lesion_id')['pred_final']  
probabilities_a = instance.df_probabilities_val_a.drop_duplicates(subset='lesion_id').filter(regex=r'^prob_')
agg_probabilities_a = instance.df_pred_val_a.drop_duplicates(subset='lesion_id').filter(regex=r'^prob_') 

beta = 1
# Weights inversely proportional to relative class size, giving more importance to smaller classes.
# (beta and weights are not passed to metric_dictionary below.)
weights = 1/instance.df_train['label'].value_counts(normalize=True).sort_index().values

print_header("Code test: other metrics")

to_print = ["- ACC: accuracy",
            "BACC: balanced accuracy",
            "precision: macro-averaged precision (equal weight to each class)",
            "recall: macro-averaged recall (equal weight to each class)",
            "Fbeta: macro-averaged F_beta score (equal weight to each class)",
            "MCC: Matthews correlation coefficient",
            "ROC-AUC mac: macro-averaged ROC-AUC (equal weight to each class)",
            "ROC-AUC wt: weighted-average ROC-AUC (larger class -> more weight)",
            "ROC-AUC wt*: weighted-average ROC-AUC (larger class -> *less weight)",
            ]

instance.metric_dict1 = metric_dictionary(target=target1,
                                          prediction=prediction1,
                                          probabilities=probabilities1)

instance.metric_dict_a = metric_dictionary(target=target_a,
                                           prediction=prediction_a,
                                           probabilities=probabilities_a)

print("\n- ".join(to_print))

print("\n One image per lesion".upper())
display(pd.DataFrame(instance.metric_dict1))

print("\n Possibly more than one image per lesion".upper())
display(pd.DataFrame(instance.metric_dict_a))


CODE TEST: OTHER METRICS

- ACC: accuracy
- BACC: balanced accuracy
- precision: macro-averaged precision (equal weight to each class)
- recall: macro-averaged recall (equal weight to each class)
- Fbeta: macro-averaged F_beta score (equal weight to each class)
- MCC: Matthews correlation coefficient
- ROC-AUC mac: macro-averaged ROC-AUC (equal weight to each class)
- ROC-AUC wt: weighted-average ROC-AUC (larger class -> more weight)
- ROC-AUC wt*: weighted-average ROC-AUC (larger class -> *less weight)

 ONE IMAGE PER LESION


| | ACC | BACC | precision | recall | F1/2 | F1 | F2 | MCC | ROC-AUC mac | ROC-AUC wt | ROC-AUC wt* |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.285714 | 0.133333 | 0.2 | 0.133333 | 0.086957 | 0.1 | 0.117647 | -0.111803 | 0.525 | 0.505952 | 0.508333 |



 POSSIBLY MORE THAN ONE IMAGE PER LESION


| | ACC | BACC | precision | recall | F1/2 | F1 | F2 | MCC | ROC-AUC mac | ROC-AUC wt | ROC-AUC wt* |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.357143 | 0.166667 | 0.192308 | 0.166667 | 0.086207 | 0.105263 | 0.135135 | -0.16343 | 0.5125 | 0.491071 | 0.508333 |
