# Model evaluation summary

<a id='contents'></a>
## Contents

* [Overview](#overview)
* [Trivial models](#trivial_models)
* [ResNet-18 default weights](#resnet-18)
* [ResNet-18 unfreeze all layers, train one image/lesion](#resnet-18b)
* [ResNet-18 unfreeze last layers only, train one image/lesion](#resnet-18c)
* [Balanced training set, one image per lesion](#balanced_training)
* [Balanced training set, all images per lesion, random crop](#balanced_training)
* [Balanced training set, all images per lesion, random crop plus color jitter](#balanced_training)
* [Binary classification: mel versus nv](#binary_classification:)

<a id='overview'></a>
## Overview
↑↑ [Contents](#contents) ↓ [Trivial models](#trivial_models)

Metadata processing is the same for every model: we partition the 7470 distinct lesions into a training set and a validation set in a three-to-one ratio. The split is stratified, preserving the relative frequencies of the five lesion classes. When we balance the training set, we downsample ```nv``` and upsample the other four classes so that each class is represented by 2000 images.

The dataset contains multiple images of some lesions, and we can take advantage of this to increase variety: when sampling, we can use every image of a lesion before any image is sampled again. We refer to this as using 'all images per lesion'. The alternative is to use 'one image per lesion' only (sampled multiple times).

Similarly, in the validation set, we may use a model to obtain predictions for all images and, for lesions with multiple images, combine the probabilities into a single prediction per lesion. Alternatively, we may simply obtain probabilities for one image per lesion. Lastly, we may 'expand' the validation set by sampling each lesion three times, either using all available images to reduce repetition or using one image per lesion. (The idea behind expanding the validation set is to apply a random transformation to each sampled image and combine the three predictions into a single prediction for the lesion.)
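The split and the balancing step can be illustrated with a small, standard-library sketch. It is not the code of our `process` class: the function names, the toy class counts, and the target of 100 per class (standing in for 2000) are assumptions made for illustration, and it works on lesion identifiers rather than images.

```python
import random
from collections import Counter, defaultdict


def group_by_class(lesions):
    """Map each label to the list of lesion ids carrying that label."""
    by_class = defaultdict(list)
    for lesion_id, label in lesions:
        by_class[label].append(lesion_id)
    return by_class


def stratified_split(lesions, ratio=3, seed=0):
    """Split (lesion_id, label) pairs ratio:1 into train/val, class by class."""
    rng = random.Random(seed)
    train, val = [], []
    for label, ids in sorted(group_by_class(lesions).items()):
        rng.shuffle(ids)
        n_val = round(len(ids) / (ratio + 1))
        val += [(i, label) for i in ids[:n_val]]
        train += [(i, label) for i in ids[n_val:]]
    return train, val


def balance(train, per_class, seed=0):
    """Down- or upsample every class to exactly `per_class` items."""
    rng = random.Random(seed)
    balanced = []
    for label, ids in sorted(group_by_class(train).items()):
        # Use every item `full` times, then top up with a sample without
        # replacement; for large classes full == 0, so this is a downsample.
        full, rest = divmod(per_class, len(ids))
        chosen = ids * full + rng.sample(ids, rest)
        balanced += [(i, label) for i in chosen]
    return balanced


# Hypothetical toy data: 400 'nv', 60 'mel' and 40 'other' lesions.
toy_counts = {"nv": 400, "mel": 60, "other": 40}
lesions = [(f"{label}_{n}", label)
           for label, count in toy_counts.items() for n in range(count)]

train, val = stratified_split(lesions)
balanced = balance(train, per_class=100)
print(Counter(label for _, label in val))
print(Counter(label for _, label in balanced))
```

In this toy run the validation set keeps the class proportions (100, 15 and 10 lesions), and every class in the balanced training set has exactly 100 entries.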

See our [pipeline walkthrough with code demonstration](./01_pipeline_walkthrough_with_code_demonstration.ipynb) for more details.
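As a rough illustration of 'expanding' the validation set with all available images, the sketch below draws three images for a lesion and repeats an image only once every image has been used. The function name `sample_images` and the single-image lesion are hypothetical; the two image ids belong to lesion `HAM_0002730` in the outputs further down.

```python
import random


def sample_images(image_ids, k=3, seed=0):
    """Pick k images of one lesion, cycling through all images before repeating."""
    rng = random.Random(seed)
    picks = []
    while len(picks) < k:
        batch = list(image_ids)
        rng.shuffle(batch)
        # Take only as many images as are still needed from this pass.
        picks.extend(batch[: k - len(picks)])
    return picks


two_images = sample_images(["ISIC_0026769", "ISIC_0025661"])
one_image = sample_images(["ISIC_HYPOTHETICAL"])  # hypothetical single-image lesion
print(two_images)
print(one_image)
```

With two images available, both appear among the three picks; with a single image, the same image is simply used three times, which corresponds to the 'one image per lesion' case.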

Below, we summarize the performance of various models, comparing metrics for different choices such as balancing versus not balancing, unfreezing all layers versus unfreezing only the last few layers for fine-tuning, etc. 
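To make the idea of combining per-image probabilities into one lesion-level prediction concrete, here is a minimal sketch that averages the probabilities over the images of a lesion and takes the most probable class. Averaging is an assumption made for this sketch (other rules, such as taking a maximum, are possible), and `aggregate_by_lesion` is a hypothetical helper, not a function from our modules. The two rows are copied from the 'all images per lesion' output of the ResNet-18 model with all layers unfrozen.

```python
from collections import defaultdict

CLASSES = ["other", "akiec", "bcc", "mel", "nv"]

# (lesion_id, probabilities in CLASSES order) for two images of one lesion.
rows = [
    ("HAM_0001466", [0.299188, 0.120075, 0.310670, 0.068941, 0.201126]),
    ("HAM_0001466", [0.326983, 0.072613, 0.188737, 0.121260, 0.290408]),
]


def argmax_class(probs):
    """Name of the class with the highest probability."""
    return CLASSES[probs.index(max(probs))]


def aggregate_by_lesion(rows):
    """Average probabilities over each lesion's images, then pick the top class."""
    grouped = defaultdict(list)
    for lesion_id, probs in rows:
        grouped[lesion_id].append(probs)
    result = {}
    for lesion_id, prob_rows in grouped.items():
        mean = [sum(column) / len(prob_rows) for column in zip(*prob_rows)]
        result[lesion_id] = (mean, argmax_class(mean))
    return result


image_predictions = [argmax_class(probs) for _, probs in rows]
lesion_predictions = aggregate_by_lesion(rows)
print(image_predictions)
print(lesion_predictions["HAM_0001466"][1])
```

Here the two images disagree (`bcc` versus `other`), and the averaged probabilities favour `other`.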

In [1]:
# SETUP

import os
from pathlib import Path
import sys

# Name of the environment variable that holds the path to the project root:
project_environment_variable = "SKIN_LESION_CLASSIFICATION"

# If we're using Google Colab, we set the environment variable to point to the relevant folder in our Google Drive:
if 'COLAB_GPU' in os.environ:
    from google.colab import drive
    drive.mount('/content/drive')
    os.environ[project_environment_variable] = '/content/drive/MyDrive/Colab Notebooks/skin-lesion-classification'

# Otherwise, we rely on the environment variable already set on our local system.
# Path to the root directory of the project (raises a KeyError if the variable is not set):
project_path = Path(os.environ[project_environment_variable])

# Relative path to /scripts (from where custom modules will be imported):
scripts_path = project_path.joinpath("scripts")

# Add this path to sys.path so that Python will look there for modules:
sys.path.append(str(scripts_path))

# Now import path_setup from our custom utils module to create a dictionary of paths to all subdirectories in our root directory:
from utils import path_setup
path = path_setup.subfolders(project_path, Print=False)

<a id='trivial_models'></a>
## Trivial models
↑↑ [Contents](#contents) ↑ [Overview](#overview) ↓ [ResNet-18 default weights](#resnet-18)

By trivial model, we mean one that classifies all lesions as belonging to one class. A model that performs no better than any trivial model is otiose.

For a trivial model, the recall for the predicted class is perfect, and its precision equals the relative size of that class in the validation set (which contains 1869 images, one per lesion). Among trivial models, precision is thus highest when predicting the majority class, ```nv```, of which there are 1351 in the validation set: 1351/1869 = 72.28%. Accuracy equals this precision, and balanced accuracy is the reciprocal of the number of classes: 0.2 in this case.
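These values can be checked by hand with a few lines of plain Python. The sketch below is independent of our `evaluation` module; `trivial_metrics` is a hypothetical helper, and the macro-averaged F1 assumes that classes which are never predicted contribute an F1 of zero.

```python
# Validation-set class counts (one image per lesion).
counts = {"other": 225, "akiec": 57, "bcc": 82, "mel": 154, "nv": 1351}


def trivial_metrics(counts, predicted):
    """Metrics for a model that assigns every lesion to the class `predicted`."""
    total = sum(counts.values())
    # Every prediction is `predicted`, so precision is that class's share.
    precision = counts[predicted] / total
    accuracy = precision
    # Recall is 1 for the predicted class and 0 for all others.
    recalls = [1.0 if label == predicted else 0.0 for label in counts]
    balanced_accuracy = sum(recalls) / len(recalls)
    # F1 of the predicted class (its recall is 1); other classes count as 0.
    f1_predicted = 2 * precision / (precision + 1.0)
    macro_f1 = f1_predicted / len(counts)
    return {
        "precision": precision,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "macro_f1": macro_f1,
    }


metrics = trivial_metrics(counts, "nv")
print(metrics)
```

For the all-```nv``` model this gives an accuracy of about 0.7228, a balanced accuracy of 0.2, and a macro F1 of about 0.1678.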

Even though the metrics are trivial to compute, let us produce the confusion matrices etc. from scratch in the code cells below.

In [2]:
from typing import Type, Union      # For type hints
from processing import process      # Custom module for processing metadata

data_dir: Path = path["images"]     # Path to directory containing metadata.csv file
csv_filename: str = "metadata.csv"  # The filename
    
tvr: int = 3              # Ratio of training set to validation set. See discussion below for explanation.
seed: int = 0             # Random seed for parts of the process where randomness is called for.
keep_first: bool = False  # If False, then, for each lesion, we choose a random image to assign to our training set. 
stratified: bool = True   # If True, we stratify classes so that the proportions remain as stable as possible after train/val split. 
                          # If False, the proportions will be roughly similar.

to_classify: Union[list, dict] = ["mel",   # These are the lesion types we are interested in classifying. 
                                  "bcc",   # Any missing ones will be grouped together as the 0-label class: no need to write "other" here.
                                  "akiec", # If 'other' is not desired, use restrict_to attribute above
                                  "nv",]   # Can also be a dictionary, like { 'malignant' : ['mel', 'bcc'], 'benign' : ['nv', 'bkl']}

In [3]:
# Create an instance of the process class with attribute values as above.
trivial = process(data_dir=data_dir,
               csv_filename=csv_filename,
               tvr=tvr,
               seed=seed,
               keep_first=keep_first,
               stratified=stratified,
               to_classify=to_classify,)

- Loaded file 'D:\projects\skin-lesion-classification\images\metadata.csv'.
- Inserted 'num_images' column in dataframe, to the right of 'lesion_id' column.
- Inserted 'label' column in dataframe, to the right of 'dx' column: 
  {'bkl': 0, 'df': 0, 'vasc': 0, 'akiec': 1, 'bcc': 2, 'mel': 3, 'nv': 4}
- Added 'set' column to dataframe, with values 't1', 'v1', 'ta', and 'va', to the right of 'localization' column.
- Basic, overall dataframe (pre-train/test split): self.df
- Training set (not balanced, all images per lesion): self.df_train
- Validation set (not expanded, one image per lesion): self.df_val1
- Validation set (not expanded, use all images of each lesion): self.df_val_a
- Small sample dataframes for code testing: self._df_train_code_test, self._df_val1_code_test, self._df_val_a_code_test


In [4]:
from utils import print_header
import pandas as pd
from multiclass_models import trivial_prediction, final_prediction
from evaluation import weighted_average_f, confusion_matrix_with_metric, metric_dictionary

y_train = trivial.df_train['label']
y_val1 = trivial.df_val1['label']
y_val_a = trivial.df_val_a['label']
label_codes = trivial.label_codes

for label, lesion in label_codes.items():

    _, prediction1, probabilities1 = trivial_prediction(y_train, 
                                                        num_preds=y_val1.shape[0],
                                                        label_codes=label_codes,
                                                        pos_label_code=label,)

    trivial.df_probabilities_val1 = trivial.df_val1.copy()

    for i, dx in label_codes.items():
        trivial.df_probabilities_val1['prob_' + dx] = probabilities1[:,i]

    raw_probabilities_df1 = trivial.df_probabilities_val1 

    trivial.df_pred_val1 = final_prediction(raw_probabilities_df=raw_probabilities_df1, 
                                             label_codes=label_codes,)   

    map_labels = label_codes

    target1 = trivial.df_pred_val1.drop_duplicates(subset='lesion_id')['label'] 
    prediction1 = trivial.df_pred_val1.drop_duplicates(subset='lesion_id')['pred_final'] 

    txp1 = pd.crosstab(target1,prediction1,margins=True,dropna=False)

    beta = 2
    # Weights inversely proportional to relative class size in the training set, giving more importance to smaller classes.
    weights = 1/trivial.df_train['label'].value_counts(normalize=True).sort_index().values # None

    trivial.cm1 = confusion_matrix_with_metric(AxB=txp1,
                                                lst=None,
                                                full_pad=True,
                                                func=weighted_average_f,
                                                beta=beta,
                                                weights=weights,
                                                percentage=False,
                                                map_labels=map_labels)

    # Lesion-level probabilities (one row per lesion) for the probability-based metrics;
    # target1 and prediction1 were already computed above.
    probabilities1 = trivial.df_probabilities_val1.drop_duplicates(subset='lesion_id').filter(regex=r'^prob_')

    trivial.metric_dict1 = metric_dictionary(target=target1, 
                                              prediction=prediction1, 
                                              probabilities=probabilities1)

    print_header(f"Trivial prediction: all labels {lesion}")

    display(trivial.cm1.fillna('_'))
    display(pd.DataFrame(trivial.metric_dict1))


TRIVIAL PREDICTION: ALL LABELS OTHER



actual \ predicted,other,akiec,bcc,mel,nv,All,recall
other,225.0,0,0,0,0,225,1.0
akiec,57.0,0,0,0,0,57,0.0
bcc,82.0,0,0,0,0,82,0.0
mel,154.0,0,0,0,0,154,0.0
nv,1351.0,0,0,0,0,1351,0.0
All,1869.0,0,0,0,0,1869,_
precision,0.120385,_,_,_,_,_,_


,ACC,BACC,precision,recall,F1/2,F1,F2,MCC,ROC-AUC mac,ROC-AUC wt,ROC-AUC wt*
0,0.120385,0.2,0.120385,0.2,0.029217,0.04298,0.081257,0.0,0.5,0.5,0.5



TRIVIAL PREDICTION: ALL LABELS AKIEC



actual \ predicted,other,akiec,bcc,mel,nv,All,recall
other,0,225.0,0,0,0,225,0.0
akiec,0,57.0,0,0,0,57,1.0
bcc,0,82.0,0,0,0,82,0.0
mel,0,154.0,0,0,0,154,0.0
nv,0,1351.0,0,0,0,1351,0.0
All,0,1869.0,0,0,0,1869,_
precision,_,0.030498,_,_,_,_,_


,ACC,BACC,precision,recall,F1/2,F1,F2,MCC,ROC-AUC mac,ROC-AUC wt,ROC-AUC wt*
0,0.030498,0.2,0.030498,0.2,0.007567,0.011838,0.027182,0.0,0.5,0.5,0.5



TRIVIAL PREDICTION: ALL LABELS BCC



actual \ predicted,other,akiec,bcc,mel,nv,All,recall
other,0,0,225.0,0,0,225,0.0
akiec,0,0,57.0,0,0,57,0.0
bcc,0,0,82.0,0,0,82,1.0
mel,0,0,154.0,0,0,154,0.0
nv,0,0,1351.0,0,0,1351,0.0
All,0,0,1869.0,0,0,1869,_
precision,_,_,0.043874,_,_,_,_


,ACC,BACC,precision,recall,F1/2,F1,F2,MCC,ROC-AUC mac,ROC-AUC wt,ROC-AUC wt*
0,0.043874,0.2,0.043874,0.2,0.010849,0.016812,0.037324,0.0,0.5,0.5,0.5



TRIVIAL PREDICTION: ALL LABELS MEL



actual \ predicted,other,akiec,bcc,mel,nv,All,recall
other,0,0,0,225.0,0,225,0.0
akiec,0,0,0,57.0,0,57,0.0
bcc,0,0,0,82.0,0,82,0.0
mel,0,0,0,154.0,0,154,1.0
nv,0,0,0,1351.0,0,1351,0.0
All,0,0,0,1869.0,0,1869,_
precision,_,_,_,0.082397,_,_,_


,ACC,BACC,precision,recall,F1/2,F1,F2,MCC,ROC-AUC mac,ROC-AUC wt,ROC-AUC wt*
0,0.082397,0.2,0.082397,0.2,0.020183,0.03045,0.061972,0.0,0.5,0.5,0.5



TRIVIAL PREDICTION: ALL LABELS NV



actual \ predicted,other,akiec,bcc,mel,nv,All,recall
other,0,0,0,0,225.0,225,0.0
akiec,0,0,0,0,57.0,57,0.0
bcc,0,0,0,0,82.0,82,0.0
mel,0,0,0,0,154.0,154,0.0
nv,0,0,0,0,1351.0,1351,1.0
All,0,0,0,0,1869.0,1869,_
precision,_,_,_,_,0.722846,_,_


,ACC,BACC,precision,recall,F1/2,F1,F2,MCC,ROC-AUC mac,ROC-AUC wt,ROC-AUC wt*
0,0.722846,0.2,0.722846,0.2,0.153053,0.167826,0.185756,0.0,0.5,0.5,0.5


<a id='resnet-18'></a>
## ResNet-18 default weights
↑↑ [Contents](#contents) ↑ [Trivial models](#trivial_models) ↓ [ResNet-18 unfreeze all layers, train one image/lesion](#resnet-18b)

We adapted a ResNet-18 model for our specific classification task by adjusting its output layer to match the number of classes in the problem (five in this case). Out of curiosity more than anything, we evaluated the model with its default weights, i.e. pre-trained on the ImageNet dataset, without any fine-tuning or training on our dataset. As it turns out, the probabilities that arise are always highest for the ```mel``` class, so this is just equivalent to the trivial prediction in which all lesions are classified as ```mel```.
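The equivalence with the trivial ```mel``` model can be seen directly from the probabilities. The sketch below applies a plain argmax to the first five rows of the one-image-per-lesion probabilities printed in the output of the next cell; that the pipeline reduces to an argmax when no thresholds or voting rules are set is an assumption of this sketch, and `predict` is a hypothetical helper.

```python
CLASSES = ["other", "akiec", "bcc", "mel", "nv"]

# First five probability rows (columns in CLASSES order) for the default-weights model.
probabilities = [
    [0.033951, 0.213071, 0.017842, 0.631357, 0.103778],
    [0.024454, 0.178248, 0.011521, 0.693197, 0.092580],
    [0.027908, 0.197791, 0.014437, 0.671389, 0.088475],
    [0.044190, 0.230373, 0.025773, 0.586212, 0.113452],
    [0.033891, 0.218806, 0.017048, 0.636273, 0.093982],
]


def predict(row):
    """Return the name of the class with the highest probability."""
    best_index = max(range(len(row)), key=row.__getitem__)
    return CLASSES[best_index]


predictions = [predict(row) for row in probabilities]
print(predictions)
```

All five rows are assigned to ```mel```, even though all five lesions are labelled ```bkl``` and therefore belong to the ```other``` class.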

In [56]:
from collections import OrderedDict
from typing import Union, Dict, List
from multiclass_models import final_prediction
import numpy as np

from evaluation import print_model_evaluation

model_name: Union[None, str] = "ResNet-18 default weights"

file_path1: Union[None,Path] = path['models'].joinpath("rn18_t1_10e_defaults_00_val1_probabilities.csv")
file_path_a: Union[None,Path] = path['models'].joinpath("rn18_t1_10e_defaults_00_val_a_probabilities.csv")

aggregate_method: Union[None, Dict[str, List[str]]] = None# { 'max' : ['mel', 'bcc', 'akiec'], 'min' : ['nv'], 'mean' : ['other']}
threshold_dict_help: Union[None, OrderedDict[str, float]] = None#OrderedDict([('mel',0.4), ('bcc', 0.4), ('akiec', 0.4)])
threshold_dict_hinder: Union[None, OrderedDict[str, float]] = None#OrderedDict([('nv',0.6)])
votes_to_win_dict: Union[None, OrderedDict[str, int]] = None #OrderedDict([('mel',1), ('bcc',1), ('akiec',1)])
label_codes: Dict[int, str] = {0: 'other', 1: 'akiec', 2: 'bcc', 3: 'mel', 4: 'nv'}
prefix: Union[None, str] = 'prob_'
# Weights inversely proportional to relative class size in the training set, giving more importance to smaller classes.
# weights = 1/df_train['label'].value_counts(normalize=True).sort_index().values # None
weights: Union[None, np.ndarray] = np.array([ 7.42063492, 29.92      , 19.47916667,  9.00120337,  1.49390853])
    
print_model_evaluation(model_name=model_name,
                       file_path1=file_path1, 
                       file_path_a=file_path_a,
                       aggregate_method=aggregate_method,
                       threshold_dict_help=threshold_dict_help,
                       threshold_dict_hinder=threshold_dict_hinder,
                       votes_to_win_dict=votes_to_win_dict, 
                       label_codes=label_codes,
                       prefix=prefix,
                       weights=weights,)


RESNET-18 DEFAULT WEIGHTS: PROBABILITIES

VALIDATION SET: ONE IMAGE PER LESION

Header: full dataframe has 1869 rows. Columns are also restricted for display purposes.


,lesion_id,image_id,dx,prob_other,prob_akiec,prob_bcc,prob_mel,prob_nv
0,HAM_0002730,ISIC_0025661,bkl,0.033951,0.213071,0.017842,0.631357,0.103778
1,HAM_0001466,ISIC_0027850,bkl,0.024454,0.178248,0.011521,0.693197,0.09258
2,HAM_0002761,ISIC_0029068,bkl,0.027908,0.197791,0.014437,0.671389,0.088475
3,HAM_0004234,ISIC_0029396,bkl,0.04419,0.230373,0.025773,0.586212,0.113452
4,HAM_0001949,ISIC_0025767,bkl,0.033891,0.218806,0.017048,0.636273,0.093982



VALIDATION SET: ALL IMAGES PER LESION

Header: full dataframe has 2535 rows. Columns are also restricted for display purposes.


,lesion_id,image_id,dx,prob_other,prob_akiec,prob_bcc,prob_mel,prob_nv
0,HAM_0002730,ISIC_0026769,bkl,0.025068,0.187141,0.012176,0.69197,0.083645
1,HAM_0002730,ISIC_0025661,bkl,0.033951,0.213071,0.017842,0.631357,0.103778
2,HAM_0001466,ISIC_0031633,bkl,0.023868,0.18089,0.012229,0.703247,0.079767
3,HAM_0001466,ISIC_0027850,bkl,0.024454,0.178248,0.011521,0.693197,0.09258
4,HAM_0002761,ISIC_0029176,bkl,0.029747,0.20106,0.01666,0.661379,0.091154



RESNET-18 DEFAULT WEIGHTS: PREDICTIONS

VALIDATION SET, ONE IMAGE PER LESION: COMBINING PROBABILITIES, MAKING PREDICTIONS


,lesion_id,image_id,dx,prob_other,prob_akiec,prob_bcc,prob_mel,prob_nv,pred,pred_final
0,HAM_0002730,ISIC_0025661,bkl,0.033951,0.213071,0.017842,0.631357,0.103778,3,3
1,HAM_0001466,ISIC_0027850,bkl,0.024454,0.178248,0.011521,0.693197,0.09258,3,3
2,HAM_0002761,ISIC_0029068,bkl,0.027908,0.197791,0.014437,0.671389,0.088475,3,3
3,HAM_0004234,ISIC_0029396,bkl,0.04419,0.230373,0.025773,0.586212,0.113452,3,3
4,HAM_0001949,ISIC_0025767,bkl,0.033891,0.218806,0.017048,0.636273,0.093982,3,3


VALIDATION SET, ALL IMAGES PER LESION: COMBINING PROBABILITIES, MAKING PREDICTIONS, COMBINING PREDICTIONS


,lesion_id,image_id,dx,prob_other,prob_akiec,prob_bcc,prob_mel,prob_nv,pred,pred_final
0,HAM_0002730,ISIC_0026769,bkl,0.025068,0.187141,0.012176,0.69197,0.083645,3,3
1,HAM_0002730,ISIC_0025661,bkl,0.033951,0.213071,0.017842,0.631357,0.103778,3,3
2,HAM_0001466,ISIC_0031633,bkl,0.023868,0.18089,0.012229,0.703247,0.079767,3,3
3,HAM_0001466,ISIC_0027850,bkl,0.024454,0.178248,0.011521,0.693197,0.09258,3,3
4,HAM_0002761,ISIC_0029176,bkl,0.029747,0.20106,0.01666,0.661379,0.091154,3,3



RESNET-18 DEFAULT WEIGHTS: CONFUSION MATRICES

CONFUSION MATRIX: VALIDATION SET, ONE IMAGE PER LESION


actual \ predicted,other,akiec,bcc,mel,nv,All,recall
other,0,0,0,225.0,0,225,0.0
akiec,0,0,0,57.0,0,57,0.0
bcc,0,0,0,82.0,0,82,0.0
mel,0,0,0,154.0,0,154,1.0
nv,0,0,0,1351.0,0,1351,0.0
All,0,0,0,1869.0,0,1869,_
precision,_,_,_,0.082397,_,_,_


CONFUSION MATRIX: VALIDATION SET, ALL IMAGES PER LESION


actual \ predicted,other,akiec,bcc,mel,nv,All,recall
other,0,0,0,225.0,0,225,0.0
akiec,0,0,0,57.0,0,57,0.0
bcc,0,0,0,82.0,0,82,0.0
mel,0,0,0,154.0,0,154,1.0
nv,0,0,0,1351.0,0,1351,0.0
All,0,0,0,1869.0,0,1869,_
precision,_,_,_,0.082397,_,_,_



RESNET-18 DEFAULT WEIGHTS: METRICS


ONE IMAGE PER LESION


,ACC,BACC,precision,recall,F1/2,F1,F2,MCC,ROC-AUC mac,ROC-AUC wt,ROC-AUC wt*
0,0.082397,0.2,0.082397,0.2,0.020183,0.03045,0.061972,0.0,0.455371,0.483577,0.471899



ALL IMAGES PER LESION


,ACC,BACC,precision,recall,F1/2,F1,F2,MCC,ROC-AUC mac,ROC-AUC wt,ROC-AUC wt*
0,0.082397,0.2,0.082397,0.2,0.020183,0.03045,0.061972,0.0,0.451354,0.483768,0.469588


<a id='resnet-18b'></a>
## ResNet-18 unfreeze all layers, train one image/lesion
↑↑ [Contents](#contents) ↑ [ResNet-18 default weights](#resnet-18) ↓ [ResNet-18 unfreeze last layers only, train one image/lesion](#resnet-18c)

For our first real, non-trivial model, we unfroze all layers of ResNet-18 and trained it on our training set with one image per lesion. We did not apply any data augmentation; images were simply resized to 224x224. We trained for 10 epochs.

In [106]:
loss_dict = {"train_loss": [0.9403468154916316, 0.8124544766137816, 0.8684350887276444, 0.8011271460987204, 0.7148915472948416, 0.6817877910957164, 0.6707390202156437, 0.6422293554086619, 0.6390848159525458, 0.6242188617880774], "val1_loss": [1.0525192688140323, 0.9003291478601553, 0.8269837388800363, 0.7788394413793743, 0.7362728248965942, 0.7758881139048075, 0.6938714837592285, 0.6873626185233815, 0.6947621543086686, 0.6788981668414327], "val_a_loss": [1.1254899964667857, 1.0317802450619638, 0.9304099124856293, 0.9027771518449299, 0.8541980290785431, 0.8867944296449423, 0.8221577784133842, 0.8252180779352785, 0.8074154453817755, 0.797820007125847]}
# Print the training and validation losses epoch by epoch.
for idx in range(len(loss_dict["train_loss"])):
    print(f"Epoch {idx + 1}: ", end='')
    for key, values in loss_dict.items():
        print(f"{key}, {values[idx]}", end=' ')
    print()

Epoch 1: train_loss, 0.9403468154916316 val1_loss, 1.0525192688140323 val_a_loss, 1.1254899964667857 
Epoch 2: train_loss, 0.8124544766137816 val1_loss, 0.9003291478601553 val_a_loss, 1.0317802450619638 
Epoch 3: train_loss, 0.8684350887276444 val1_loss, 0.8269837388800363 val_a_loss, 0.9304099124856293 
Epoch 4: train_loss, 0.8011271460987204 val1_loss, 0.7788394413793743 val_a_loss, 0.9027771518449299 
Epoch 5: train_loss, 0.7148915472948416 val1_loss, 0.7362728248965942 val_a_loss, 0.8541980290785431 
Epoch 6: train_loss, 0.6817877910957164 val1_loss, 0.7758881139048075 val_a_loss, 0.8867944296449423 
Epoch 7: train_loss, 0.6707390202156437 val1_loss, 0.6938714837592285 val_a_loss, 0.8221577784133842 
Epoch 8: train_loss, 0.6422293554086619 val1_loss, 0.6873626185233815 val_a_loss, 0.8252180779352785 
Epoch 9: train_loss, 0.6390848159525458 val1_loss, 0.6947621543086686 val_a_loss, 0.8074154453817755 
Epoch 10: train_loss, 0.6242188617880774 val1_loss, 0.6788981668414327 val_a_loss, 0.797820007125847 

In [107]:
model_name: Union[None, str] = "ResNet-18 unfreeze all layers, train one image/lesion"

file_path1: Union[None,Path] = path['models'].joinpath("rn18_t1_ufall_10e_base_00_val1_probabilities.csv")
file_path_a: Union[None,Path] = path['models'].joinpath("rn18_t1_ufall_10e_base_00_val_a_probabilities.csv")

aggregate_method: Union[None, Dict[str, List[str]]] = None# { 'max' : ['mel', 'bcc', 'akiec'], 'min' : ['nv'], 'mean' : ['other']}
threshold_dict_help: Union[None, OrderedDict[str, float]] = None#OrderedDict([('mel',0.4), ('bcc', 0.4), ('akiec', 0.4)])
threshold_dict_hinder: Union[None, OrderedDict[str, float]] = None#OrderedDict([('nv',0.6)])
votes_to_win_dict: Union[None, OrderedDict[str, int]] = None #OrderedDict([('mel',1), ('bcc',1), ('akiec',1)])
label_codes: Dict[int, str] = {0: 'other', 1: 'akiec', 2: 'bcc', 3: 'mel', 4: 'nv'}
prefix: Union[None, str] = 'prob_'
# Weights inversely proportional to relative class size in the training set, giving more importance to smaller classes.
# weights = 1/df_train['label'].value_counts(normalize=True).sort_index().values # None
weights: Union[None, np.ndarray] = np.array([ 7.42063492, 29.92      , 19.47916667,  9.00120337,  1.49390853])
    
print_model_evaluation(model_name=model_name,
                       file_path1=file_path1, 
                       file_path_a=file_path_a,
                       aggregate_method=aggregate_method,
                       threshold_dict_help=threshold_dict_help,
                       threshold_dict_hinder=threshold_dict_hinder,
                       votes_to_win_dict=votes_to_win_dict, 
                       label_codes=label_codes,
                       prefix=prefix,
                       weights=weights,)


RESNET-18 UNFREEZE ALL LAYERS, TRAIN ONE IMAGE/LESION: PROBABILITIES

VALIDATION SET: ONE IMAGE PER LESION

Header: full dataframe has 1869 rows. Columns are also restricted for display purposes.


,lesion_id,image_id,dx,prob_other,prob_akiec,prob_bcc,prob_mel,prob_nv
0,HAM_0002730,ISIC_0025661,bkl,0.300796,0.18604,0.319144,0.048594,0.145426
1,HAM_0001466,ISIC_0027850,bkl,0.326983,0.072613,0.188737,0.12126,0.290408
2,HAM_0002761,ISIC_0029068,bkl,0.232673,0.038185,0.402253,0.029989,0.2969
3,HAM_0004234,ISIC_0029396,bkl,0.211202,0.0221,0.241529,0.052625,0.472543
4,HAM_0001949,ISIC_0025767,bkl,0.221634,0.028147,0.39415,0.032521,0.323548



VALIDATION SET: ALL IMAGES PER LESION

Header: full dataframe has 2535 rows. Columns are also restricted for display purposes.


,lesion_id,image_id,dx,prob_other,prob_akiec,prob_bcc,prob_mel,prob_nv
0,HAM_0002730,ISIC_0026769,bkl,0.241036,0.102773,0.449137,0.017458,0.189595
1,HAM_0002730,ISIC_0025661,bkl,0.300796,0.18604,0.319144,0.048594,0.145426
2,HAM_0001466,ISIC_0031633,bkl,0.299188,0.120075,0.31067,0.068941,0.201126
3,HAM_0001466,ISIC_0027850,bkl,0.326983,0.072613,0.188737,0.12126,0.290408
4,HAM_0002761,ISIC_0029176,bkl,0.200654,0.054204,0.510056,0.018157,0.216928



RESNET-18 UNFREEZE ALL LAYERS, TRAIN ONE IMAGE/LESION: PREDICTIONS

VALIDATION SET, ONE IMAGE PER LESION: COMBINING PROBABILITIES, MAKING PREDICTIONS


,lesion_id,image_id,dx,prob_other,prob_akiec,prob_bcc,prob_mel,prob_nv,pred,pred_final
0,HAM_0002730,ISIC_0025661,bkl,0.300796,0.18604,0.319144,0.048594,0.145426,2,2
1,HAM_0001466,ISIC_0027850,bkl,0.326983,0.072613,0.188737,0.12126,0.290408,0,0
2,HAM_0002761,ISIC_0029068,bkl,0.232673,0.038185,0.402253,0.029989,0.2969,2,2
3,HAM_0004234,ISIC_0029396,bkl,0.211202,0.0221,0.241529,0.052625,0.472543,4,4
4,HAM_0001949,ISIC_0025767,bkl,0.221634,0.028147,0.39415,0.032521,0.323548,2,2


VALIDATION SET, ALL IMAGES PER LESION: COMBINING PROBABILITIES, MAKING PREDICTIONS, COMBINING PREDICTIONS


,lesion_id,image_id,dx,prob_other,prob_akiec,prob_bcc,prob_mel,prob_nv,pred,pred_final
0,HAM_0002730,ISIC_0026769,bkl,0.241036,0.102773,0.449137,0.017458,0.189595,2,2
1,HAM_0002730,ISIC_0025661,bkl,0.300796,0.18604,0.319144,0.048594,0.145426,2,2
2,HAM_0001466,ISIC_0031633,bkl,0.299188,0.120075,0.31067,0.068941,0.201126,2,0
3,HAM_0001466,ISIC_0027850,bkl,0.326983,0.072613,0.188737,0.12126,0.290408,0,0
4,HAM_0002761,ISIC_0029176,bkl,0.200654,0.054204,0.510056,0.018157,0.216928,2,2



RESNET-18 UNFREEZE ALL LAYERS, TRAIN ONE IMAGE/LESION: CONFUSION MATRICES

CONFUSION MATRIX: VALIDATION SET, ONE IMAGE PER LESION


actual \ predicted,other,akiec,bcc,mel,nv,All,recall
other,50.0,1.0,32.0,6.0,136.0,225,0.222222
akiec,10.0,3.0,27.0,0.0,17.0,57,0.052632
bcc,12.0,1.0,37.0,0.0,32.0,82,0.45122
mel,8.0,0.0,3.0,25.0,118.0,154,0.162338
nv,14.0,0.0,24.0,22.0,1291.0,1351,0.955588
All,94.0,5.0,123.0,53.0,1594.0,1869,_
precision,0.531915,0.6,0.300813,0.471698,0.809912,_,0.220497


CONFUSION MATRIX: VALIDATION SET, ALL IMAGES PER LESION


actual \ predicted,other,akiec,bcc,mel,nv,All,recall
other,56.0,1.0,35.0,6.0,127.0,225,0.248889
akiec,11.0,3.0,29.0,0.0,14.0,57,0.052632
bcc,14.0,1.0,40.0,0.0,27.0,82,0.487805
mel,12.0,0.0,3.0,26.0,113.0,154,0.168831
nv,16.0,1.0,30.0,25.0,1279.0,1351,0.946706
All,109.0,6.0,137.0,57.0,1560.0,1869,_
precision,0.513761,0.5,0.291971,0.45614,0.819872,_,0.229754



RESNET-18 UNFREEZE ALL LAYERS, TRAIN ONE IMAGE/LESION: METRICS


ONE IMAGE PER LESION


,ACC,BACC,precision,recall,F1/2,F1,F2,MCC,ROC-AUC mac,ROC-AUC wt,ROC-AUC wt*
0,0.752274,0.3688,0.542868,0.3688,0.421998,0.377903,0.367068,0.359007,0.885184,0.88733,0.797379



ALL IMAGES PER LESION


,ACC,BACC,precision,recall,F1/2,F1,F2,MCC,ROC-AUC mac,ROC-AUC wt,ROC-AUC wt*
0,0.751204,0.380973,0.516349,0.380973,0.421801,0.384209,0.376634,0.37007,0.887617,0.889967,0.800468


<a id='resnet-18c'></a>
## ResNet-18 unfreeze last layers only, train one image/lesion
↑↑ [Contents](#contents) ↑ [ResNet-18 unfreeze all layers, train one image/lesion](#resnet-18b) ↓ [Balanced training set, one image per lesion](#balanced_training)

We unfroze only the last layers (the final block and the fully-connected layer) of ResNet-18 and trained it on our training set with one image per lesion. We did not apply any data augmentation; images were simply resized to 224x224. We trained for 10 epochs.

In [93]:
loss_dict = {"train_loss": [0.6841942672617733, 0.5952099385904148, 0.5661855285923759, 0.5263236108849841, 0.49988894537885, 0.48281896449043415, 0.49773296764628455, 0.48185406921600754, 0.40533364527006843, 0.41797827781093394], "val1_loss": [0.7394442644554301, 0.9502710474402717, 0.6091168883682813, 1.0742951778032013, 0.5876870297814988, 1.0085261004456019, 0.633888594230777, 0.5645148322988555, 0.5903375812557573, 0.644928813738338], "val_a_loss": [0.8377442672732286, 1.16436840054007, 0.7141767045832239, 1.189212791621685, 0.7025116890552454, 1.0914706110954284, 0.7311860972779414, 0.6591545441769995, 0.7012383969638905, 0.7836364168790169]}
# Print the training and validation losses epoch by epoch.
for idx in range(len(loss_dict["train_loss"])):
    print(f"Epoch {idx + 1}: ", end='')
    for key, values in loss_dict.items():
        print(f"{key}, {values[idx]}", end=' ')
    print()

Epoch 1: train_loss, 0.6841942672617733 val1_loss, 0.7394442644554301 val_a_loss, 0.8377442672732286 
Epoch 2: train_loss, 0.5952099385904148 val1_loss, 0.9502710474402717 val_a_loss, 1.16436840054007 
Epoch 3: train_loss, 0.5661855285923759 val1_loss, 0.6091168883682813 val_a_loss, 0.7141767045832239 
Epoch 4: train_loss, 0.5263236108849841 val1_loss, 1.0742951778032013 val_a_loss, 1.189212791621685 
Epoch 5: train_loss, 0.49988894537885 val1_loss, 0.5876870297814988 val_a_loss, 0.7025116890552454 
Epoch 6: train_loss, 0.48281896449043415 val1_loss, 1.0085261004456019 val_a_loss, 1.0914706110954284 
Epoch 7: train_loss, 0.49773296764628455 val1_loss, 0.633888594230777 val_a_loss, 0.7311860972779414 
Epoch 8: train_loss, 0.48185406921600754 val1_loss, 0.5645148322988555 val_a_loss, 0.6591545441769995 
Epoch 9: train_loss, 0.40533364527006843 val1_loss, 0.5903375812557573 val_a_loss, 0.7012383969638905 
Epoch 10: train_loss, 0.41797827781093394 val1_loss, 0.644928813738338 val_a_loss, 0.7836364168790169 

In [61]:
model_name: Union[None, str] = "ResNet-18 unfreeze last layers, train one image/lesion"

file_path1: Union[None,Path] = path['models'].joinpath("rn18_t1_uflast_10e_base_00_val1_probabilities.csv")
file_path_a: Union[None,Path] = path['models'].joinpath("rn18_t1_uflast_10e_base_00_val_a_probabilities.csv")

aggregate_method: Union[None, Dict[str, List[str]]] = None# { 'max' : ['mel', 'bcc', 'akiec'], 'min' : ['nv'], 'mean' : ['other']}
threshold_dict_help: Union[None, OrderedDict[str, float]] = None#OrderedDict([('mel',0.4), ('bcc', 0.4), ('akiec', 0.4)])
threshold_dict_hinder: Union[None, OrderedDict[str, float]] = None#OrderedDict([('nv',0.6)])
votes_to_win_dict: Union[None, OrderedDict[str, int]] = None #OrderedDict([('mel',1), ('bcc',1), ('akiec',1)])
label_codes: Dict[int, str] = {0: 'other', 1: 'akiec', 2: 'bcc', 3: 'mel', 4: 'nv'}
prefix: Union[None, str] = 'prob_'
# Weights inversely proportional to relative class size in the training set, giving more importance to smaller classes.
# weights = 1/df_train['label'].value_counts(normalize=True).sort_index().values # None
weights: Union[None, np.ndarray] = np.array([ 7.42063492, 29.92      , 19.47916667,  9.00120337,  1.49390853])
    
print_model_evaluation(model_name=model_name,
                       file_path1=file_path1, 
                       file_path_a=file_path_a,
                       aggregate_method=aggregate_method,
                       threshold_dict_help=threshold_dict_help,
                       threshold_dict_hinder=threshold_dict_hinder,
                       votes_to_win_dict=votes_to_win_dict, 
                       label_codes=label_codes,
                       prefix=prefix,
                       weights=weights,)


RESNET-18 UNFREEZE LAST LAYERS, TRAIN ONE IMAGE/LESION: PROBABILITIES

VALIDATION SET: ONE IMAGE PER LESION

Header: full dataframe has 1869 rows. Columns are also restricted for display purposes.


,lesion_id,image_id,dx,prob_other,prob_akiec,prob_bcc,prob_mel,prob_nv
0,HAM_0002730,ISIC_0025661,bkl,0.050186,0.551444,0.383338,0.008397,0.006635
1,HAM_0001466,ISIC_0027850,bkl,0.561856,0.219038,0.027982,0.184205,0.006919
2,HAM_0002761,ISIC_0029068,bkl,0.104806,0.117343,0.64573,0.003072,0.129048
3,HAM_0004234,ISIC_0029396,bkl,0.047635,0.064842,0.153847,0.06339,0.670287
4,HAM_0001949,ISIC_0025767,bkl,0.951744,8e-05,9.7e-05,0.004153,0.043926



VALIDATION SET: ALL IMAGES PER LESION

Header: full dataframe has 2535 rows. Columns are also restricted for display purposes.


Unnamed: 0,lesion_id,image_id,dx,prob_other,prob_akiec,prob_bcc,prob_mel,prob_nv
0,HAM_0002730,ISIC_0026769,bkl,0.013689,0.680603,0.304388,0.000793,0.000527
1,HAM_0002730,ISIC_0025661,bkl,0.050186,0.551444,0.383338,0.008397,0.006635
2,HAM_0001466,ISIC_0031633,bkl,0.719853,0.159101,0.029969,0.081175,0.009901
3,HAM_0001466,ISIC_0027850,bkl,0.561856,0.219038,0.027982,0.184205,0.006919
4,HAM_0002761,ISIC_0029176,bkl,0.159595,0.349277,0.472283,0.002441,0.016404



RESNET-18 UNFREEZE LAST LAYERS, TRAIN ONE IMAGE/LESION: PREDICTIONS

VALIDATION SET, ONE IMAGE PER LESION: COMBINING PROBABILITIES, MAKING PREDICTIONS


Unnamed: 0,lesion_id,image_id,dx,prob_other,prob_akiec,prob_bcc,prob_mel,prob_nv,pred,pred_final
0,HAM_0002730,ISIC_0025661,bkl,0.050186,0.551444,0.383338,0.008397,0.006635,1,1
1,HAM_0001466,ISIC_0027850,bkl,0.561856,0.219038,0.027982,0.184205,0.006919,0,0
2,HAM_0002761,ISIC_0029068,bkl,0.104806,0.117343,0.64573,0.003072,0.129048,2,2
3,HAM_0004234,ISIC_0029396,bkl,0.047635,0.064842,0.153847,0.06339,0.670287,4,4
4,HAM_0001949,ISIC_0025767,bkl,0.951744,8e-05,9.7e-05,0.004153,0.043926,0,0


VALIDATION SET, ALL IMAGES PER LESION: COMBINING PROBABILITIES, MAKING PREDICTIONS, COMBINING PREDICTIONS


Unnamed: 0,lesion_id,image_id,dx,prob_other,prob_akiec,prob_bcc,prob_mel,prob_nv,pred,pred_final
0,HAM_0002730,ISIC_0026769,bkl,0.013689,0.680603,0.304388,0.000793,0.000527,1,1
1,HAM_0002730,ISIC_0025661,bkl,0.050186,0.551444,0.383338,0.008397,0.006635,1,1
2,HAM_0001466,ISIC_0031633,bkl,0.719853,0.159101,0.029969,0.081175,0.009901,0,0
3,HAM_0001466,ISIC_0027850,bkl,0.561856,0.219038,0.027982,0.184205,0.006919,0,0
4,HAM_0002761,ISIC_0029176,bkl,0.159595,0.349277,0.472283,0.002441,0.016404,2,2



RESNET-18 UNFREEZE LAST LAYERS, TRAIN ONE IMAGE/LESION: CONFUSION MATRICES

CONFUSION MATRIX: VALIDATION SET, ONE IMAGE PER LESION


actual \ predicted,other,akiec,bcc,mel,nv,All,recall
other,170.0,8.0,3.0,8.0,36.0,225,0.755556
akiec,29.0,19.0,3.0,0.0,6.0,57,0.333333
bcc,22.0,23.0,19.0,3.0,15.0,82,0.231707
mel,63.0,1.0,2.0,42.0,46.0,154,0.272727
nv,111.0,3.0,3.0,22.0,1212.0,1351,0.897113
All,395.0,54.0,30.0,75.0,1315.0,1869,_
precision,0.43038,0.351852,0.633333,0.56,0.921673,_,0.359535


CONFUSION MATRIX: VALIDATION SET, ALL IMAGES PER LESION


actual \ predicted,other,akiec,bcc,mel,nv,All,recall
other,177.0,5.0,2.0,6.0,35.0,225,0.786667
akiec,33.0,15.0,4.0,0.0,5.0,57,0.263158
bcc,31.0,21.0,18.0,2.0,10.0,82,0.219512
mel,75.0,0.0,1.0,38.0,40.0,154,0.246753
nv,117.0,4.0,3.0,26.0,1201.0,1351,0.888971
All,433.0,45.0,28.0,72.0,1291.0,1869,_
precision,0.408776,0.333333,0.642857,0.527778,0.930287,_,0.325288



RESNET-18 UNFREEZE LAST LAYERS, TRAIN ONE IMAGE/LESION: METRICS


ONE IMAGE PER LESION


Unnamed: 0,ACC,BACC,precision,recall,F1/2,F1,F2,MCC,ROC-AUC mac,ROC-AUC wt,ROC-AUC wt*
0,0.782236,0.498087,0.579448,0.498087,0.533681,0.501211,0.492888,0.53421,0.921955,0.925413,0.860778



ALL IMAGES PER LESION


Unnamed: 0,ACC,BACC,precision,recall,F1/2,F1,F2,MCC,ROC-AUC mac,ROC-AUC wt,ROC-AUC wt*
0,0.775281,0.481012,0.568606,0.481012,0.516836,0.480965,0.47291,0.529302,0.921939,0.927203,0.859617
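
As a sanity check on the table above, ACC and BACC can be recomputed from the 'all images per lesion' confusion matrix. The sketch below assumes that ACC is the fraction of correctly classified lesions and that BACC is the unweighted mean of the per-class recalls; the matrix entries are copied from the output above, and the variable names are illustrative only.

```python
# Rows are actual classes, columns are predicted classes,
# both in the order: other, akiec, bcc, mel, nv.
LABELS = ["other", "akiec", "bcc", "mel", "nv"]
confusion = [
    [177, 5, 2, 6, 35],
    [33, 15, 4, 0, 5],
    [31, 21, 18, 2, 10],
    [75, 0, 1, 38, 40],
    [117, 4, 3, 26, 1201],
]

n_classes = len(LABELS)
total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(n_classes))

# Accuracy: share of lesions on the diagonal.
accuracy = correct / total

# Per-class recall: diagonal entry divided by the row total.
recalls = [confusion[i][i] / sum(confusion[i]) for i in range(n_classes)]

# Balanced accuracy: unweighted mean of the per-class recalls.
balanced_accuracy = sum(recalls) / n_classes

print(f"ACC  = {accuracy:.6f}")
print(f"BACC = {balanced_accuracy:.6f}")
```

Both values agree with the ACC and BACC columns reported above to six decimal places. Because BACC weights every class equally, the low recalls for ```akiec```, ```bcc``` and ```mel``` pull it far below ACC, which is dominated by ```nv```.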


<a id='balanced_training'></a>
## Balanced training set, one image per lesion
↑↑ [Contents](#contents) ↑ [ResNet-18 unfreeze last layers only, train one image/lesion](#resnet-18c) ↓ [Balanced training set, all images per lesion, random crop](#balanced_training)

We balanced the training set so that each class was represented by 2000 images, using one image per lesion (i.e. sampling the same image multiple times if necessary). We unfroze the last layers of ResNet-18 and trained it on the resulting balanced dataset of 10000 images. No data augmentation took place, other than re-sizing to 224x224. We trained for 10 epochs.
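
The balancing step described above can be sketched as follows. This is a minimal illustration in plain Python rather than the notebook's actual code: the function name, the `(lesion_id, image_id, label)` row layout, and the choice of the first image seen as each lesion's representative are all assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def balance_one_image_per_lesion(rows, per_class, seed=0):
    """Return a class-balanced list of (lesion_id, image_id, label) rows."""
    rng = random.Random(seed)

    # Keep a single representative image per lesion (here: the first one seen).
    one_per_lesion = {}
    for lesion_id, image_id, label in rows:
        one_per_lesion.setdefault(lesion_id, (lesion_id, image_id, label))

    by_class = defaultdict(list)
    for row in one_per_lesion.values():
        by_class[row[2]].append(row)

    balanced = []
    for members in by_class.values():
        if len(members) >= per_class:
            # Large class (e.g. nv): downsample without replacement.
            balanced.extend(rng.sample(members, per_class))
        else:
            # Small class: keep every lesion once, then repeat images at random.
            balanced.extend(members)
            balanced.extend(rng.choices(members, k=per_class - len(members)))
    rng.shuffle(balanced)
    return balanced


# Toy data: 50 'nv' lesions and 5 'mel' lesions, two images each.
toy_rows = [(f"L{i}", f"img{i}_{j}", "nv" if i < 50 else "mel")
            for i in range(55) for j in range(2)]
balanced = balance_one_image_per_lesion(toy_rows, per_class=10)
class_counts = Counter(label for _, _, label in balanced)
print(class_counts)
```

With `per_class=2000` and five classes, the same procedure would yield the 10000-image training set mentioned above. Since upsampled classes repeat identical images, the absence of augmentation means the model sees exact duplicates.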

In [105]:
loss_dict = {"train_loss": [0.7723178447435458, 0.40661499330315726, 0.23246921462039596, 0.14241893370204364, 0.12419252090911848, 0.09358728907559032, 0.03640836093071908, 0.07291126565281718, 0.043546794637362246, 0.05563008315978756], "val1_loss": [0.9402542763855308, 1.072058737088563, 0.8995860062373834, 0.9614278782046621, 1.0363722004465485, 0.9758289296769979, 1.2033462472855887, 1.2390729828729783, 0.9907997921158528, 1.5191293485734958], "val_a_loss": [1.1411655094707385, 1.3135711280163378, 1.0714984280231874, 1.1732232338457833, 1.261856978644937, 1.1758127134933603, 1.4809589933232927, 1.5441848695540101, 1.231908158864826, 1.8223617057192314]}

# Print the training and validation losses for each epoch on one line.
for idx in range(len(loss_dict["train_loss"])):
    print(f"Epoch {idx + 1}: ", end='')
    for key, values in loss_dict.items():
        print(f"{key}, {values[idx]}", end=' ')
    print("")

Epoch 1: train_loss, 0.7723178447435458 val1_loss, 0.9402542763855308 val_a_loss, 1.1411655094707385 
Epoch 2: train_loss, 0.40661499330315726 val1_loss, 1.072058737088563 val_a_loss, 1.3135711280163378 
Epoch 3: train_loss, 0.23246921462039596 val1_loss, 0.8995860062373834 val_a_loss, 1.0714984280231874 
Epoch 4: train_loss, 0.14241893370204364 val1_loss, 0.9614278782046621 val_a_loss, 1.1732232338457833 
Epoch 5: train_loss, 0.12419252090911848 val1_loss, 1.0363722004465485 val_a_loss, 1.261856978644937 
Epoch 6: train_loss, 0.09358728907559032 val1_loss, 0.9758289296769979 val_a_loss, 1.1758127134933603 
Epoch 7: train_loss, 0.03640836093071908 val1_loss, 1.2033462472855887 val_a_loss, 1.4809589933232927 
Epoch 8: train_loss, 0.07291126565281718 val1_loss, 1.2390729828729783 val_a_loss, 1.5441848695540101 
Epoch 9: train_loss, 0.043546794637362246 val1_loss, 0.9907997921158528 val_a_loss, 1.231908158864826 
Epoch 10: train_loss, 0.05563008315978756 val1_loss, 1.5191293485734958 val_a_loss, 1.8223617057192314
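
One way to read the losses above is to look for the epoch at which each validation loss is smallest. The sketch below does that with the validation losses from this run, rounded to four decimals; the helper `best_epoch` is a hypothetical name, not part of the notebook.

```python
# Validation losses from the run above, rounded to four decimal places.
val1_loss = [0.9403, 1.0721, 0.8996, 0.9614, 1.0364,
             0.9758, 1.2033, 1.2391, 0.9908, 1.5191]
val_a_loss = [1.1412, 1.3136, 1.0715, 1.1732, 1.2619,
              1.1758, 1.4810, 1.5442, 1.2319, 1.8224]


def best_epoch(losses):
    """Return the 1-based epoch with the smallest loss."""
    return min(range(len(losses)), key=losses.__getitem__) + 1


best_val1 = best_epoch(val1_loss)
best_val_a = best_epoch(val_a_loss)
print(f"Lowest val1_loss at epoch {best_val1}, lowest val_a_loss at epoch {best_val_a}")
```

Both validation losses bottom out early and end at their highest values, while the training loss keeps shrinking towards zero. That pattern suggests the model may be overfitting the repeated, unaugmented images.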

In [67]:
model_name: Union[None, str] = "Balanced, one image/lesion: ResNet-18 last layers unfrozen, no transformation"

file_path1: Union[None,Path] = path['models'].joinpath("rn18_t1_bal_uflast_10e_notfm_00_val1_probabilities.csv")
file_path_a: Union[None,Path] = path['models'].joinpath("rn18_t1_bal_uflast_10e_notfm_00_val_a_probabilities.csv")

aggregate_method: Union[None, Dict[str, List[str]]] = None# { 'max' : ['mel', 'bcc', 'akiec'], 'min' : ['nv'], 'mean' : ['other']}
threshold_dict_help: Union[None, OrderedDict[str, float]] = None#OrderedDict([('mel',0.4), ('bcc', 0.4), ('akiec', 0.4)])
threshold_dict_hinder: Union[None, OrderedDict[str, float]] = None#OrderedDict([('nv',0.6)])
votes_to_win_dict: Union[None, OrderedDict[str, int]] = None #OrderedDict([('mel',1), ('bcc',1), ('akiec',1)])
label_codes: Dict[int, str] = {0: 'other', 1: 'akiec', 2: 'bcc', 3: 'mel', 4: 'nv'}
prefix: Union[None, str] = 'prob_'
# Weights inversely proportional to relative class size in the training set, giving more importance to smaller classes.
# weights = 1/df_train['label'].value_counts(normalize=True).sort_index().values # None
weights: Union[None, np.ndarray] = np.array([ 7.42063492, 29.92, 19.47916667,  9.00120337,  1.49390853])
    
print_model_evaluation(model_name=model_name,
                       file_path1=file_path1, 
                       file_path_a=file_path_a,
                       aggregate_method=aggregate_method,
                       threshold_dict_help=threshold_dict_help,
                       threshold_dict_hinder=threshold_dict_hinder,
                       votes_to_win_dict=votes_to_win_dict, 
                       label_codes=label_codes,
                       prefix=prefix,
                       weights=weights,)


BALANCED, ONE IMAGE/LESION: RESNET-18 LAST LAYERS UNFROZEN, NO TRANSFORMATION: PROBABILITIES

VALIDATION SET: ONE IMAGE PER LESION

Header: full dataframe has 1869 rows. Columns are also restricted for display purposes.


Unnamed: 0,lesion_id,image_id,dx,prob_other,prob_akiec,prob_bcc,prob_mel,prob_nv
0,HAM_0002730,ISIC_0025661,bkl,0.000862,0.8549663,0.1417131,2.421762e-05,0.002435
1,HAM_0001466,ISIC_0027850,bkl,0.045447,0.007546793,0.9458811,0.0009090114,0.000216
2,HAM_0002761,ISIC_0029068,bkl,0.22075,0.09414385,0.6730878,9.995591e-06,0.012008
3,HAM_0004234,ISIC_0029396,bkl,1.9e-05,6.780808e-05,0.001232119,0.00789017,0.990791
4,HAM_0001949,ISIC_0025767,bkl,0.999964,1.919044e-07,7.892933e-07,8.280515e-07,3.4e-05



VALIDATION SET: ALL IMAGES PER LESION

Header: full dataframe has 2535 rows. Columns are also restricted for display purposes.


Unnamed: 0,lesion_id,image_id,dx,prob_other,prob_akiec,prob_bcc,prob_mel,prob_nv
0,HAM_0002730,ISIC_0026769,bkl,2.9e-05,0.000303,0.999664,3.54499e-07,3.576648e-06
1,HAM_0002730,ISIC_0025661,bkl,0.000862,0.854966,0.141713,2.421762e-05,0.002434572
2,HAM_0001466,ISIC_0031633,bkl,0.340047,0.010757,0.639567,0.004271342,0.00535823
3,HAM_0001466,ISIC_0027850,bkl,0.045447,0.007547,0.945881,0.0009090114,0.0002160815
4,HAM_0002761,ISIC_0029176,bkl,0.00017,0.002487,0.997342,1.752473e-09,8.264573e-07



BALANCED, ONE IMAGE/LESION: RESNET-18 LAST LAYERS UNFROZEN, NO TRANSFORMATION: PREDICTIONS

VALIDATION SET, ONE IMAGE PER LESION: COMBINING PROBABILITIES, MAKING PREDICTIONS


Unnamed: 0,lesion_id,image_id,dx,prob_other,prob_akiec,prob_bcc,prob_mel,prob_nv,pred,pred_final
0,HAM_0002730,ISIC_0025661,bkl,0.000862,0.8549663,0.1417131,2.421762e-05,0.002435,1,1
1,HAM_0001466,ISIC_0027850,bkl,0.045447,0.007546793,0.9458811,0.0009090114,0.000216,2,2
2,HAM_0002761,ISIC_0029068,bkl,0.22075,0.09414385,0.6730878,9.995591e-06,0.012008,2,2
3,HAM_0004234,ISIC_0029396,bkl,1.9e-05,6.780808e-05,0.001232119,0.00789017,0.990791,4,4
4,HAM_0001949,ISIC_0025767,bkl,0.999964,1.919044e-07,7.892933e-07,8.280515e-07,3.4e-05,0,0


VALIDATION SET, ALL IMAGES PER LESION: COMBINING PROBABILITIES, MAKING PREDICTIONS, COMBINING PREDICTIONS


Unnamed: 0,lesion_id,image_id,dx,prob_other,prob_akiec,prob_bcc,prob_mel,prob_nv,pred,pred_final
0,HAM_0002730,ISIC_0026769,bkl,2.9e-05,0.000303,0.999664,3.54499e-07,3.576648e-06,2,1
1,HAM_0002730,ISIC_0025661,bkl,0.000862,0.854966,0.141713,2.421762e-05,0.002434572,1,1
2,HAM_0001466,ISIC_0031633,bkl,0.340047,0.010757,0.639567,0.004271342,0.00535823,2,2
3,HAM_0001466,ISIC_0027850,bkl,0.045447,0.007547,0.945881,0.0009090114,0.0002160815,2,2
4,HAM_0002761,ISIC_0029176,bkl,0.00017,0.002487,0.997342,1.752473e-09,8.264573e-07,2,2



BALANCED, ONE IMAGE/LESION: RESNET-18 LAST LAYERS UNFROZEN, NO TRANSFORMATION: CONFUSION MATRICES

CONFUSION MATRIX: VALIDATION SET, ONE IMAGE PER LESION


actual \ predicted,other,akiec,bcc,mel,nv,All,recall
other,98.0,4.0,33.0,11.0,79.0,225,0.435556
akiec,13.0,13.0,23.0,3.0,5.0,57,0.22807
bcc,9.0,2.0,64.0,0.0,7.0,82,0.780488
mel,13.0,2.0,14.0,63.0,62.0,154,0.409091
nv,35.0,3.0,102.0,33.0,1178.0,1351,0.871947
All,168.0,24.0,236.0,110.0,1331.0,1869,_
precision,0.583333,0.541667,0.271186,0.572727,0.885049,_,0.406834


CONFUSION MATRIX: VALIDATION SET, ALL IMAGES PER LESION


actual \ predicted,other,akiec,bcc,mel,nv,All,recall
other,116.0,2.0,27.0,12.0,68.0,225,0.515556
akiec,17.0,11.0,24.0,1.0,4.0,57,0.192982
bcc,10.0,2.0,66.0,0.0,4.0,82,0.804878
mel,17.0,3.0,21.0,62.0,51.0,154,0.402597
nv,41.0,6.0,113.0,39.0,1152.0,1351,0.852702
All,201.0,24.0,251.0,114.0,1279.0,1869,_
precision,0.577114,0.458333,0.262948,0.54386,0.900704,_,0.395922



BALANCED, ONE IMAGE/LESION: RESNET-18 LAST LAYERS UNFROZEN, NO TRANSFORMATION: METRICS


ONE IMAGE PER LESION


Unnamed: 0,ACC,BACC,precision,recall,F1/2,F1,F2,MCC,ROC-AUC mac,ROC-AUC wt,ROC-AUC wt*
0,0.757624,0.54503,0.570793,0.54503,0.539139,0.515591,0.518507,0.481777,0.891457,0.88959,0.84957



ALL IMAGES PER LESION


Unnamed: 0,ACC,BACC,precision,recall,F1/2,F1,F2,MCC,ROC-AUC mac,ROC-AUC wt,ROC-AUC wt*
0,0.752809,0.553743,0.548592,0.553743,0.525174,0.510267,0.520308,0.492139,0.89114,0.893283,0.84759


<a id='balanced_training'></a>
## Balanced training set, all images per lesion, random crop
↑↑ [Contents](#contents) ↑ [Balanced training set, one image per lesion](#balanced_training) ↓ [Balanced training set, all images per lesion, random crop plus color jitter](#balanced_training)

We balanced the training set so that each class was represented by 2000 images, using all images per lesion before repeating an image. We unfroze the last layers of ResNet-18 and trained it on the resulting balanced dataset of 10000 images. We applied a random 300x300 crop before resizing images to 224x224. The validation set was also "expanded": each lesion was represented by three images, and the model's probabilities for the three images were combined into a single prediction for the lesion. (A random crop was applied to each of the three images before the model output probabilities.) As with the training set, we could either choose the same image three times (one image per lesion) or use all available images (all images per lesion) before repeating one.

We trained for 10 epochs.
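
The idea of collapsing the three per-image outputs of the expanded validation set into one prediction per lesion can be sketched as below. This is only an illustration: it assumes a plain mean of the probabilities followed by an argmax, which is not necessarily what `print_model_evaluation` does by default, and the function name is hypothetical. The example rows are taken from the 'one image per lesion' output further down; only two of the three rows for `HAM_0001466` appear in that preview, so only two are used here.

```python
from collections import defaultdict
from statistics import fmean

LABELS = ["other", "akiec", "bcc", "mel", "nv"]


def predict_per_lesion(rows):
    """rows: iterable of (lesion_id, [p_other, p_akiec, p_bcc, p_mel, p_nv])."""
    grouped = defaultdict(list)
    for lesion_id, probs in rows:
        grouped[lesion_id].append(probs)

    predictions = {}
    for lesion_id, prob_rows in grouped.items():
        # Average each class probability over the lesion's (cropped) images.
        mean_probs = [fmean(column) for column in zip(*prob_rows)]
        # Predict the class with the highest mean probability.
        predictions[lesion_id] = LABELS[mean_probs.index(max(mean_probs))]
    return predictions


rows = [
    ("HAM_0002730", [0.196147, 0.057795, 0.016602, 0.724161, 0.005296]),
    ("HAM_0002730", [0.247946, 0.015202, 0.012981, 0.716585, 0.007287]),
    ("HAM_0002730", [0.365886, 0.045827, 0.013421, 0.569576, 0.005289]),
    ("HAM_0001466", [0.582575, 0.007791, 0.193693, 0.185775, 0.030167]),
    ("HAM_0001466", [0.000759, 0.000004, 0.000032, 0.999000, 0.000204]),
]
predictions = predict_per_lesion(rows)
print(predictions)
```

Averaging over several random crops of the same lesion is a simple form of test-time augmentation: a single unlucky crop (such as the fourth row above, where ```other``` wins) has less influence on the final prediction.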

In [92]:
loss_dict = {"train_loss": [0.9991463460861304, 0.7889880284714622, 0.6868163041603832, 0.6122396044647351, 0.5362694739057614, 0.48168310427818056, 0.4359226829042069, 0.40662133136686807, 0.37536838638801545, 0.3589892059136123], "val1_loss": [1.156947471614165, 0.7474318763812665, 0.8180025992597538, 0.8908278138604312, 0.746369184490959, 0.6103228253811143, 0.8802814177046954, 0.6874399251801978, 0.8312929767598689, 0.6630872393130134], "val_a_loss": [1.153050551667217, 0.7414343407594557, 0.8258629115351307, 0.9112645055505378, 0.7416439510333145, 0.6165547028543766, 0.9021512844536285, 0.6956000020727515, 0.840061212781703, 0.6566807949037122]}
# Print the training and validation losses for each epoch on one line.
for idx in range(len(loss_dict["train_loss"])):
    print(f"Epoch {idx + 1}: ", end='')
    for key, values in loss_dict.items():
        print(f"{key}, {values[idx]}", end=' ')
    print("")

Epoch 1: train_loss, 0.9991463460861304 val1_loss, 1.156947471614165 val_a_loss, 1.153050551667217 
Epoch 2: train_loss, 0.7889880284714622 val1_loss, 0.7474318763812665 val_a_loss, 0.7414343407594557 
Epoch 3: train_loss, 0.6868163041603832 val1_loss, 0.8180025992597538 val_a_loss, 0.8258629115351307 
Epoch 4: train_loss, 0.6122396044647351 val1_loss, 0.8908278138604312 val_a_loss, 0.9112645055505378 
Epoch 5: train_loss, 0.5362694739057614 val1_loss, 0.746369184490959 val_a_loss, 0.7416439510333145 
Epoch 6: train_loss, 0.48168310427818056 val1_loss, 0.6103228253811143 val_a_loss, 0.6165547028543766 
Epoch 7: train_loss, 0.4359226829042069 val1_loss, 0.8802814177046954 val_a_loss, 0.9021512844536285 
Epoch 8: train_loss, 0.40662133136686807 val1_loss, 0.6874399251801978 val_a_loss, 0.6956000020727515 
Epoch 9: train_loss, 0.37536838638801545 val1_loss, 0.8312929767598689 val_a_loss, 0.840061212781703 
Epoch 10: train_loss, 0.3589892059136123 val1_loss, 0.6630872393130134 val_a_loss, 0.6566807949037122

In [73]:
model_name: Union[None, str] = "Balanced, validation expanded 3-fold, all images/lesion: ResNet-18 last layers unfrozen, random crop"

file_path1: Union[None,Path] = path['models'].joinpath("rn18_ta_bal_uflast_10e_rndcrop_00_val1_probabilities.csv")
file_path_a: Union[None,Path] = path['models'].joinpath("rn18_ta_bal_uflast_10e_rndcrop_00_val_a_probabilities.csv")

aggregate_method: Union[None, Dict[str, List[str]]] = None# { 'max' : ['mel', 'bcc', 'akiec'], 'min' : ['nv'], 'mean' : ['other']}
threshold_dict_help: Union[None, OrderedDict[str, float]] = None#OrderedDict([('mel',0.4), ('bcc', 0.4), ('akiec', 0.4)])
threshold_dict_hinder: Union[None, OrderedDict[str, float]] = None#OrderedDict([('nv',0.6)])
votes_to_win_dict: Union[None, OrderedDict[str, int]] = None #OrderedDict([('mel',1), ('bcc',1), ('akiec',1)])
label_codes: Dict[int, str] = {0: 'other', 1: 'akiec', 2: 'bcc', 3: 'mel', 4: 'nv'}
prefix: Union[None, str] = 'prob_'
# Weights inversely proportional to relative class size in the training set, giving more importance to smaller classes.
# weights = 1/df_train['label'].value_counts(normalize=True).sort_index().values # None
weights: Union[None, np.ndarray] = np.array([ 7.42063492, 29.92      , 19.47916667,  9.00120337,  1.49390853])
    
print_model_evaluation(model_name=model_name,
                       file_path1=file_path1, 
                       file_path_a=file_path_a,
                       aggregate_method=aggregate_method,
                       threshold_dict_help=threshold_dict_help,
                       threshold_dict_hinder=threshold_dict_hinder,
                       votes_to_win_dict=votes_to_win_dict, 
                       label_codes=label_codes,
                       prefix=prefix,
                       weights=weights,)


BALANCED, VALIDATION EXPANDED 3-FOLD, ALL IMAGES/LESION: RESNET-18 LAST LAYERS UNFROZEN, RANDOM CROP: PROBABILITIES

VALIDATION SET: ONE IMAGE PER LESION

Header: full dataframe has 5607 rows. Columns are also restricted for display purposes.


Unnamed: 0,lesion_id,image_id,dx,prob_other,prob_akiec,prob_bcc,prob_mel,prob_nv
0,HAM_0002730,ISIC_0025661,bkl,0.196147,0.057795,0.016602,0.724161,0.005296
1,HAM_0002730,ISIC_0025661,bkl,0.247946,0.015202,0.012981,0.716585,0.007287
2,HAM_0002730,ISIC_0025661,bkl,0.365886,0.045827,0.013421,0.569576,0.005289
3,HAM_0001466,ISIC_0027850,bkl,0.582575,0.007791,0.193693,0.185775,0.030167
4,HAM_0001466,ISIC_0027850,bkl,0.000759,4e-06,3.2e-05,0.999,0.000204



VALIDATION SET: ALL IMAGES PER LESION

Header: full dataframe has 5607 rows. Columns are also restricted for display purposes.


Unnamed: 0,lesion_id,image_id,dx,prob_other,prob_akiec,prob_bcc,prob_mel,prob_nv
0,HAM_0002730,ISIC_0026769,bkl,0.036362,0.9617091,0.00078,0.000977,0.000171
1,HAM_0002730,ISIC_0026769,bkl,0.035796,0.9373081,0.00095,0.025693,0.000254
2,HAM_0002730,ISIC_0025661,bkl,0.552345,0.008390885,0.031259,0.397282,0.010724
3,HAM_0001466,ISIC_0031633,bkl,0.015292,0.0001135414,0.002,0.981527,0.001068
4,HAM_0001466,ISIC_0027850,bkl,0.000478,5.629219e-07,0.000431,0.998866,0.000225



BALANCED, VALIDATION EXPANDED 3-FOLD, ALL IMAGES/LESION: RESNET-18 LAST LAYERS UNFROZEN, RANDOM CROP: PREDICTIONS

VALIDATION SET, ONE IMAGE PER LESION: COMBINING PROBABILITIES, MAKING PREDICTIONS


Unnamed: 0,lesion_id,image_id,dx,prob_other,prob_akiec,prob_bcc,prob_mel,prob_nv,pred,pred_final
0,HAM_0002730,ISIC_0025661,bkl,0.196147,0.057795,0.016602,0.724161,0.005296,3,3
1,HAM_0002730,ISIC_0025661,bkl,0.247946,0.015202,0.012981,0.716585,0.007287,3,3
2,HAM_0002730,ISIC_0025661,bkl,0.365886,0.045827,0.013421,0.569576,0.005289,3,3
3,HAM_0001466,ISIC_0027850,bkl,0.582575,0.007791,0.193693,0.185775,0.030167,0,3
4,HAM_0001466,ISIC_0027850,bkl,0.000759,4e-06,3.2e-05,0.999,0.000204,3,3


VALIDATION SET, ALL IMAGES PER LESION: COMBINING PROBABILITIES, MAKING PREDICTIONS, COMBINING PREDICTIONS


Unnamed: 0,lesion_id,image_id,dx,prob_other,prob_akiec,prob_bcc,prob_mel,prob_nv,pred,pred_final
0,HAM_0002730,ISIC_0026769,bkl,0.036362,0.9617091,0.00078,0.000977,0.000171,1,1
1,HAM_0002730,ISIC_0026769,bkl,0.035796,0.9373081,0.00095,0.025693,0.000254,1,1
2,HAM_0002730,ISIC_0025661,bkl,0.552345,0.008390885,0.031259,0.397282,0.010724,0,1
3,HAM_0001466,ISIC_0031633,bkl,0.015292,0.0001135414,0.002,0.981527,0.001068,3,3
4,HAM_0001466,ISIC_0027850,bkl,0.000478,5.629219e-07,0.000431,0.998866,0.000225,3,3



BALANCED, VALIDATION EXPANDED 3-FOLD, ALL IMAGES/LESION: RESNET-18 LAST LAYERS UNFROZEN, RANDOM CROP: CONFUSION MATRICES

CONFUSION MATRIX: VALIDATION SET, ONE IMAGE PER LESION


actual \ predicted,other,akiec,bcc,mel,nv,All,recall
other,145.0,17.0,2.0,45.0,16.0,225,0.644444
akiec,6.0,39.0,7.0,2.0,3.0,57,0.684211
bcc,8.0,15.0,46.0,9.0,4.0,82,0.560976
mel,19.0,5.0,1.0,109.0,20.0,154,0.707792
nv,69.0,18.0,12.0,150.0,1102.0,1351,0.815692
All,247.0,94.0,68.0,315.0,1145.0,1869,_
precision,0.587045,0.414894,0.676471,0.346032,0.962445,_,0.603871


CONFUSION MATRIX: VALIDATION SET, ALL IMAGES PER LESION


actual \ predicted,other,akiec,bcc,mel,nv,All,recall
other,153.0,16.0,1.0,39.0,16.0,225,0.68
akiec,6.0,37.0,9.0,2.0,3.0,57,0.649123
bcc,8.0,10.0,49.0,10.0,5.0,82,0.597561
mel,18.0,5.0,1.0,114.0,16.0,154,0.74026
nv,74.0,15.0,10.0,158.0,1094.0,1351,0.809771
All,259.0,83.0,70.0,323.0,1134.0,1869,_
precision,0.590734,0.445783,0.7,0.352941,0.964727,_,0.615033



BALANCED, VALIDATION EXPANDED 3-FOLD, ALL IMAGES/LESION: RESNET-18 LAST LAYERS UNFROZEN, RANDOM CROP: METRICS


ONE IMAGE PER LESION


Unnamed: 0,ACC,BACC,precision,recall,F1/2,F1,F2,MCC,ROC-AUC mac,ROC-AUC wt,ROC-AUC wt*
0,0.771001,0.682623,0.597377,0.682623,0.602441,0.618426,0.649045,0.578273,0.933157,0.934257,0.913871



ALL IMAGES PER LESION


Unnamed: 0,ACC,BACC,precision,recall,F1/2,F1,F2,MCC,ROC-AUC mac,ROC-AUC wt,ROC-AUC wt*
0,0.774211,0.695343,0.610837,0.695343,0.616478,0.632802,0.662832,0.588672,0.931156,0.937441,0.910373


Melanoma recall can be improved from 74% to 77%, and overall balanced accuracy pushed just above 70%, by combining probabilities and predictions in a way that favours a melanoma diagnosis. The price, visible in the tables below, is lower overall accuracy (about 74%, down from 77%) and lower ```nv``` recall (about 75%, down from 81%).

When there are multiple images per lesion, we first combine the probabilities by taking the maximum probability for cancerous lesions (```mel```, ```bcc```, ```akiec```), the minimum probability for ```nv```, and the mean for ```other```. Then we predict ```mel``` if the maximum ```mel``` probability is at least 0.4; failing that, ```bcc``` if the maximum ```bcc``` probability is at least 0.4; and failing that, we apply the same rule to ```akiec```. Moreover, we require the minimum ```nv``` probability to be at least 0.6 before predicting that diagnosis.

Once we have predictions for the multiple images of a given lesion, we combine those in a way that favours ```mel``` as well: if ```mel``` is predicted for even one image, the final prediction is ```mel```. Likewise for ```bcc``` (if there are no ```mel``` predictions) and ```akiec``` (if there are no ```mel``` or ```bcc``` predictions). Failing all of that, we take the most frequent prediction.
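
The rules just described can be sketched in plain Python as below. This is a hedged illustration, not the notebook's `print_model_evaluation` implementation: the function names are hypothetical, the constants mirror the `aggregate_method`, `threshold_dict_help`, `threshold_dict_hinder` and `votes_to_win_dict` arguments in the next cell, and falling back to the next most probable class when ```nv``` misses its 0.6 bar is an assumption.

```python
from collections import Counter
from statistics import fmean

AGGREGATE = {"max": ["mel", "bcc", "akiec"], "min": ["nv"], "mean": ["other"]}
HELP = [("mel", 0.4), ("bcc", 0.4), ("akiec", 0.4)]  # checked in this order
HINDER = {"nv": 0.6}
VOTES_TO_WIN = [("mel", 1), ("bcc", 1), ("akiec", 1)]  # checked in this order


def combine_probabilities(prob_rows):
    """Combine per-image {label: probability} dicts for one lesion."""
    reducers = {"max": max, "min": min, "mean": fmean}
    return {label: reducers[how]([row[label] for row in prob_rows])
            for how, labels in AGGREGATE.items() for label in labels}


def predict(probs):
    """Apply the 'help' thresholds first, then the 'hinder' thresholds."""
    for label, threshold in HELP:
        if probs[label] >= threshold:
            return label
    # Assumption: otherwise take the most probable class that clears its bar.
    for label in sorted(probs, key=probs.get, reverse=True):
        if probs[label] >= HINDER.get(label, 0.0):
            return label
    # Only reached if every class is hindered and below its threshold.
    return max(probs, key=probs.get)


def combine_votes(votes):
    """One vote is enough for mel, then bcc, then akiec; else majority."""
    counts = Counter(votes)
    for label, needed in VOTES_TO_WIN:
        if counts[label] >= needed:
            return label
    return counts.most_common(1)[0][0]


# The three 'all images per lesion' rows shown for HAM_0002730.
keys = ["other", "akiec", "bcc", "mel", "nv"]
rows = [dict(zip(keys, values)) for values in (
    [0.036362, 0.9617091, 0.00078, 0.000977, 0.000171],
    [0.035796, 0.9373081, 0.00095, 0.025693, 0.000254],
    [0.552345, 0.008390885, 0.031259, 0.397282, 0.010724],
)]
combined = combine_probabilities(rows)
lesion_prediction = predict(combined)
print(lesion_prediction)
print(combine_votes(["nv", "nv", "mel"]))
```

For this lesion the maximum ```mel``` probability (0.397) falls just short of 0.4 and ```bcc``` is far below it, so the cascade reaches ```akiec```, which agrees with the `pred_final` value of 1 in the output below. The order of the thresholds matters: a lesion that clears both the ```mel``` and ```akiec``` bars is labelled ```mel```.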

In [76]:
model_name: Union[None, str] = "As above, but combining probabilities/predictions with a bias towards malignant lesions"

aggregate_method: Union[None, Dict[str, List[str]]] = { 'max' : ['mel', 'bcc', 'akiec'], 'min' : ['nv'], 'mean' : ['other']}
threshold_dict_help: Union[None, OrderedDict[str, float]] = OrderedDict([('mel',0.4), ('bcc', 0.4), ('akiec', 0.4)])
threshold_dict_hinder: Union[None, OrderedDict[str, float]] = OrderedDict([('nv',0.6)])
votes_to_win_dict: Union[None, OrderedDict[str, int]] = OrderedDict([('mel',1), ('bcc',1), ('akiec',1)])
label_codes: Dict[int, str] = {0: 'other', 1: 'akiec', 2: 'bcc', 3: 'mel', 4: 'nv'}
prefix: Union[None, str] = 'prob_'
# Weights inversely proportional to relative class size in the training set, giving more importance to smaller classes.
# weights = 1/df_train['label'].value_counts(normalize=True).sort_index().values # None
weights: Union[None, np.ndarray] = np.array([ 7.42063492, 29.92      , 19.47916667,  9.00120337,  1.49390853])
    
print_model_evaluation(model_name=model_name,
                       file_path1=file_path1, 
                       file_path_a=file_path_a,
                       aggregate_method=aggregate_method,
                       threshold_dict_help=threshold_dict_help,
                       threshold_dict_hinder=threshold_dict_hinder,
                       votes_to_win_dict=votes_to_win_dict, 
                       label_codes=label_codes,
                       prefix=prefix,
                       weights=weights,)


AS ABOVE, BUT COMBINING PROBABILITIES/PREDICTIONS WITH A BIAS TOWARDS MALIGNANT LESIONS: PROBABILITIES

VALIDATION SET: ONE IMAGE PER LESION

Header: full dataframe has 5607 rows. Columns are also restricted for display purposes.


Unnamed: 0,lesion_id,image_id,dx,prob_other,prob_akiec,prob_bcc,prob_mel,prob_nv
0,HAM_0002730,ISIC_0025661,bkl,0.196147,0.057795,0.016602,0.724161,0.005296
1,HAM_0002730,ISIC_0025661,bkl,0.247946,0.015202,0.012981,0.716585,0.007287
2,HAM_0002730,ISIC_0025661,bkl,0.365886,0.045827,0.013421,0.569576,0.005289
3,HAM_0001466,ISIC_0027850,bkl,0.582575,0.007791,0.193693,0.185775,0.030167
4,HAM_0001466,ISIC_0027850,bkl,0.000759,4e-06,3.2e-05,0.999,0.000204



VALIDATION SET: ALL IMAGES PER LESION

Header: full dataframe has 5607 rows. Columns are also restricted for display purposes.


Unnamed: 0,lesion_id,image_id,dx,prob_other,prob_akiec,prob_bcc,prob_mel,prob_nv
0,HAM_0002730,ISIC_0026769,bkl,0.036362,0.9617091,0.00078,0.000977,0.000171
1,HAM_0002730,ISIC_0026769,bkl,0.035796,0.9373081,0.00095,0.025693,0.000254
2,HAM_0002730,ISIC_0025661,bkl,0.552345,0.008390885,0.031259,0.397282,0.010724
3,HAM_0001466,ISIC_0031633,bkl,0.015292,0.0001135414,0.002,0.981527,0.001068
4,HAM_0001466,ISIC_0027850,bkl,0.000478,5.629219e-07,0.000431,0.998866,0.000225



AS ABOVE, BUT COMBINING PROBABILITIES/PREDICTIONS WITH A BIAS TOWARDS MALIGNANT LESIONS: PREDICTIONS

VALIDATION SET, ONE IMAGE PER LESION: COMBINING PROBABILITIES, MAKING PREDICTIONS


Unnamed: 0,lesion_id,image_id,dx,prob_other,prob_akiec,prob_bcc,prob_mel,prob_nv,pred,pred_final
0,HAM_0002730,ISIC_0025661,bkl,0.196147,0.057795,0.016602,0.724161,0.005296,3,3
1,HAM_0002730,ISIC_0025661,bkl,0.247946,0.015202,0.012981,0.716585,0.007287,3,3
2,HAM_0002730,ISIC_0025661,bkl,0.365886,0.045827,0.013421,0.569576,0.005289,3,3
3,HAM_0001466,ISIC_0027850,bkl,0.582575,0.007791,0.193693,0.185775,0.030167,0,3
4,HAM_0001466,ISIC_0027850,bkl,0.000759,4e-06,3.2e-05,0.999,0.000204,3,3


VALIDATION SET, ALL IMAGES PER LESION: COMBINING PROBABILITIES, MAKING PREDICTIONS, COMBINING PREDICTIONS


Unnamed: 0,lesion_id,image_id,dx,prob_other,prob_akiec,prob_bcc,prob_mel,prob_nv,pred,pred_final
0,HAM_0002730,ISIC_0026769,bkl,0.036362,0.9617091,0.00078,0.000977,0.000171,1,1
1,HAM_0002730,ISIC_0026769,bkl,0.035796,0.9373081,0.00095,0.025693,0.000254,1,1
2,HAM_0002730,ISIC_0025661,bkl,0.552345,0.008390885,0.031259,0.397282,0.010724,0,1
3,HAM_0001466,ISIC_0031633,bkl,0.015292,0.0001135414,0.002,0.981527,0.001068,3,3
4,HAM_0001466,ISIC_0027850,bkl,0.000478,5.629219e-07,0.000431,0.998866,0.000225,3,3



AS ABOVE, BUT COMBINING PROBABILITIES/PREDICTIONS WITH A BIAS TOWARDS MALIGNANT LESIONS: CONFUSION MATRICES

CONFUSION MATRIX: VALIDATION SET, ONE IMAGE PER LESION


actual \ predicted,other,akiec,bcc,mel,nv,All,recall
other,144.0,18.0,2.0,51.0,10.0,225,0.64
akiec,6.0,39.0,7.0,3.0,2.0,57,0.684211
bcc,7.0,15.0,47.0,10.0,3.0,82,0.573171
mel,18.0,5.0,1.0,119.0,11.0,154,0.772727
nv,97.0,18.0,14.0,205.0,1017.0,1351,0.752776
All,272.0,95.0,71.0,388.0,1043.0,1869,_
precision,0.529412,0.410526,0.661972,0.306701,0.975072,_,0.603244


CONFUSION MATRIX: VALIDATION SET, ALL IMAGES PER LESION


actual \ predicted,other,akiec,bcc,mel,nv,All,recall
other,153.0,16.0,1.0,45.0,10.0,225,0.68
akiec,5.0,39.0,9.0,2.0,2.0,57,0.684211
bcc,10.0,10.0,50.0,11.0,1.0,82,0.609756
mel,16.0,6.0,1.0,119.0,12.0,154,0.772727
nv,103.0,17.0,12.0,199.0,1020.0,1351,0.754996
All,287.0,88.0,73.0,376.0,1045.0,1869,_
precision,0.533101,0.443182,0.684932,0.316489,0.976077,_,0.62349



AS ABOVE, BUT COMBINING PROBABILITIES/PREDICTIONS WITH A BIAS TOWARDS MALIGNANT LESIONS: METRICS


ONE IMAGE PER LESION


Unnamed: 0,ACC,BACC,precision,recall,F1/2,F1,F2,MCC,ROC-AUC mac,ROC-AUC wt,ROC-AUC wt*
0,0.730872,0.684577,0.576737,0.684577,0.581226,0.59915,0.637678,0.546201,0.933157,0.934257,0.913871



ALL IMAGES PER LESION


Unnamed: 0,ACC,BACC,precision,recall,F1/2,F1,F2,MCC,ROC-AUC mac,ROC-AUC wt,ROC-AUC wt*
0,0.738898,0.700338,0.590756,0.700338,0.596668,0.616245,0.655126,0.559489,0.931156,0.937441,0.910373


<a id='balanced_training'></a>
## Balanced training set, all images per lesion, random crop plus color jitter
↑↑ [Contents](#contents) ↑ [Balanced training set, all images per lesion, random crop](#balanced_training) ↓ [Binary classification: mel versus nv](#binary_classification:)

We balanced the training set so that each class was represented by 2000 images, using all images per lesion before repeating an image. We unfroze the last layers of ResNet-18 and trained it on the resulting balanced dataset of 10000 images. We applied a random 300x300 crop _and_ color jitter (in that order) before resizing images to 224x224. The validation set was also "expanded": each lesion was represented by three images, and the model's probabilities for the three images were combined into a single prediction for the lesion. (A random crop was applied to each of the three images before the model output probabilities.) As with the training set, we could either choose the same image three times (one image per lesion) or use all available images (all images per lesion) before repeating one.

We trained for 10 epochs (losses below).

In [91]:
loss_dict = {"train_loss": [1.2261508910800702, 1.0288800161105756, 0.941559516964629, 0.8260181626191916, 0.7697516194166848, 0.7009428896652624, 0.6600369117892207, 0.6047064300162343, 0.5590740442752077, 0.5345129312132113], "val1_loss": [0.8239476666785777, 0.7713268345475874, 0.7187916752882302, 0.7436824197118933, 0.8322421129030938, 0.7300132516538724, 0.813856929582967, 1.2769657155414196, 1.1211247820703483, 0.7138704877791249], "val_a_loss": [0.8209363176826049, 0.7629849120381881, 0.7203778320584785, 0.7518738603050058, 0.8353697318075732, 0.7155189809114249, 0.8249124030540274, 1.2862131875855, 1.0868709728291088, 0.7436435564145953]}
# Print the training and validation losses for each epoch on one line.
for idx in range(len(loss_dict["train_loss"])):
    print(f"Epoch {idx + 1}: ", end='')
    for key, values in loss_dict.items():
        print(f"{key}, {values[idx]}", end=' ')
    print("")

Epoch 1: train_loss, 1.2261508910800702 val1_loss, 0.8239476666785777 val_a_loss, 0.8209363176826049 
Epoch 2: train_loss, 1.0288800161105756 val1_loss, 0.7713268345475874 val_a_loss, 0.7629849120381881 
Epoch 3: train_loss, 0.941559516964629 val1_loss, 0.7187916752882302 val_a_loss, 0.7203778320584785 
Epoch 4: train_loss, 0.8260181626191916 val1_loss, 0.7436824197118933 val_a_loss, 0.7518738603050058 
Epoch 5: train_loss, 0.7697516194166848 val1_loss, 0.8322421129030938 val_a_loss, 0.8353697318075732 
Epoch 6: train_loss, 0.7009428896652624 val1_loss, 0.7300132516538724 val_a_loss, 0.7155189809114249 
Epoch 7: train_loss, 0.6600369117892207 val1_loss, 0.813856929582967 val_a_loss, 0.8249124030540274 
Epoch 8: train_loss, 0.6047064300162343 val1_loss, 1.2769657155414196 val_a_loss, 1.2862131875855 
Epoch 9: train_loss, 0.5590740442752077 val1_loss, 1.1211247820703483 val_a_loss, 1.0868709728291088 
Epoch 10: train_loss, 0.5345129312132113 val1_loss, 0.7138704877791249 val_a_loss, 0.7436435564145953

In [77]:
model_name: Union[None, str] = "Balanced, validation expanded 3-fold, all images/lesion: ResNet-18 last layers unfrozen, random crop plus color jitter"

file_path1: Union[None,Path] = path['models'].joinpath("rn18_ta_bal_uflast_10e_rndcropjit_01_val1_probabilities.csv")
file_path_a: Union[None,Path] = path['models'].joinpath("rn18_ta_bal_uflast_10e_rndcropjit_01_val_a_probabilities.csv")

aggregate_method: Union[None, Dict[str, List[str]]] = None# { 'max' : ['mel', 'bcc', 'akiec'], 'min' : ['nv'], 'mean' : ['other']}
threshold_dict_help: Union[None, OrderedDict[str, float]] = None#OrderedDict([('mel',0.4), ('bcc', 0.4), ('akiec', 0.4)])
threshold_dict_hinder: Union[None, OrderedDict[str, float]] = None#OrderedDict([('nv',0.6)])
votes_to_win_dict: Union[None, OrderedDict[str, int]] = None #OrderedDict([('mel',1), ('bcc',1), ('akiec',1)])
label_codes: Dict[int, str] = {0: 'other', 1: 'akiec', 2: 'bcc', 3: 'mel', 4: 'nv'}
prefix: Union[None, str] = 'prob_'
# Weights inversely proportional to relative class size in the training set, giving more importance to smaller classes.
# weights = 1/df_train['label'].value_counts(normalize=True).sort_index().values # None
weights: Union[None, np.ndarray] = np.array([ 7.42063492, 29.92      , 19.47916667,  9.00120337,  1.49390853])
    
print_model_evaluation(model_name=model_name,
                       file_path1=file_path1, 
                       file_path_a=file_path_a,
                       aggregate_method=aggregate_method,
                       threshold_dict_help=threshold_dict_help,
                       threshold_dict_hinder=threshold_dict_hinder,
                       votes_to_win_dict=votes_to_win_dict, 
                       label_codes=label_codes,
                       prefix=prefix,
                       weights=weights,)


BALANCED, VALIDATION EXPANDED 3-FOLD, ALL IMAGES/LESION: RESNET-18 LAST LAYERS UNFROZEN, RANDOM CROP: PROBABILITIES

VALIDATION SET: ONE IMAGE PER LESION

Header: full dataframe has 5607 rows. Columns are also restricted for display purposes.


| | lesion_id | image_id | dx | prob_other | prob_akiec | prob_bcc | prob_mel | prob_nv |
|---|---|---|---|---|---|---|---|---|
| 0 | HAM_0002730 | ISIC_0025661 | bkl | 0.047352 | 0.000851 | 0.000754 | 0.943291 | 0.007752 |
| 1 | HAM_0002730 | ISIC_0025661 | bkl | 0.33462 | 0.106245 | 0.034429 | 0.474315 | 0.050391 |
| 2 | HAM_0002730 | ISIC_0025661 | bkl | 0.646562 | 0.048694 | 0.043296 | 0.215025 | 0.046423 |
| 3 | HAM_0001466 | ISIC_0027850 | bkl | 0.638631 | 0.105183 | 0.008891 | 0.174365 | 0.072931 |
| 4 | HAM_0001466 | ISIC_0027850 | bkl | 0.467379 | 0.00014 | 0.000118 | 0.519765 | 0.012598 |



VALIDATION SET: ALL IMAGES PER LESION

Header: full dataframe has 5607 rows. Columns are also restricted for display purposes.


| | lesion_id | image_id | dx | prob_other | prob_akiec | prob_bcc | prob_mel | prob_nv |
|---|---|---|---|---|---|---|---|---|
| 0 | HAM_0002730 | ISIC_0026769 | bkl | 0.319865 | 0.489471 | 0.005783 | 0.173567 | 0.011314 |
| 1 | HAM_0002730 | ISIC_0026769 | bkl | 0.274868 | 0.546251 | 0.006499 | 0.165906 | 0.006476 |
| 2 | HAM_0002730 | ISIC_0025661 | bkl | 0.133562 | 0.000708 | 0.002278 | 0.837677 | 0.025775 |
| 3 | HAM_0001466 | ISIC_0031633 | bkl | 0.186304 | 0.084551 | 0.000807 | 0.72102 | 0.007318 |
| 4 | HAM_0001466 | ISIC_0027850 | bkl | 0.067095 | 1.8e-05 | 3.2e-05 | 0.894343 | 0.038511 |



BALANCED, VALIDATION EXPANDED 3-FOLD, ALL IMAGES/LESION: RESNET-18 LAST LAYERS UNFROZEN, RANDOM CROP: PREDICTIONS

VALIDATION SET, ONE IMAGE PER LESION: COMBINING PROBABILITIES, MAKING PREDICTIONS


| | lesion_id | image_id | dx | prob_other | prob_akiec | prob_bcc | prob_mel | prob_nv | pred | pred_final |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | HAM_0002730 | ISIC_0025661 | bkl | 0.047352 | 0.000851 | 0.000754 | 0.943291 | 0.007752 | 3 | 3 |
| 1 | HAM_0002730 | ISIC_0025661 | bkl | 0.33462 | 0.106245 | 0.034429 | 0.474315 | 0.050391 | 3 | 3 |
| 2 | HAM_0002730 | ISIC_0025661 | bkl | 0.646562 | 0.048694 | 0.043296 | 0.215025 | 0.046423 | 0 | 3 |
| 3 | HAM_0001466 | ISIC_0027850 | bkl | 0.638631 | 0.105183 | 0.008891 | 0.174365 | 0.072931 | 0 | 3 |
| 4 | HAM_0001466 | ISIC_0027850 | bkl | 0.467379 | 0.00014 | 0.000118 | 0.519765 | 0.012598 | 3 | 3 |


VALIDATION SET, ALL IMAGES PER LESION: COMBINING PROBABILITIES, MAKING PREDICTIONS, COMBINING PREDICTIONS


| | lesion_id | image_id | dx | prob_other | prob_akiec | prob_bcc | prob_mel | prob_nv | pred | pred_final |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | HAM_0002730 | ISIC_0026769 | bkl | 0.319865 | 0.489471 | 0.005783 | 0.173567 | 0.011314 | 1 | 1 |
| 1 | HAM_0002730 | ISIC_0026769 | bkl | 0.274868 | 0.546251 | 0.006499 | 0.165906 | 0.006476 | 1 | 1 |
| 2 | HAM_0002730 | ISIC_0025661 | bkl | 0.133562 | 0.000708 | 0.002278 | 0.837677 | 0.025775 | 3 | 1 |
| 3 | HAM_0001466 | ISIC_0031633 | bkl | 0.186304 | 0.084551 | 0.000807 | 0.72102 | 0.007318 | 3 | 3 |
| 4 | HAM_0001466 | ISIC_0027850 | bkl | 0.067095 | 1.8e-05 | 3.2e-05 | 0.894343 | 0.038511 | 3 | 3 |



BALANCED, VALIDATION EXPANDED 3-FOLD, ALL IMAGES/LESION: RESNET-18 LAST LAYERS UNFROZEN, RANDOM CROP: CONFUSION MATRICES

CONFUSION MATRIX: VALIDATION SET, ONE IMAGE PER LESION


| actual / predicted | other | akiec | bcc | mel | nv | All | recall |
|---|---|---|---|---|---|---|---|
| other | 103.0 | 22.0 | 6.0 | 61.0 | 33.0 | 225 | 0.457778 |
| akiec | 7.0 | 35.0 | 3.0 | 7.0 | 5.0 | 57 | 0.614035 |
| bcc | 12.0 | 18.0 | 34.0 | 12.0 | 6.0 | 82 | 0.414634 |
| mel | 16.0 | 5.0 | 3.0 | 102.0 | 28.0 | 154 | 0.662338 |
| nv | 44.0 | 22.0 | 5.0 | 154.0 | 1126.0 | 1351 | 0.833457 |
| All | 182.0 | 102.0 | 51.0 | 336.0 | 1198.0 | 1869 | _ |
| precision | 0.565934 | 0.343137 | 0.666667 | 0.303571 | 0.9399 | _ | 0.508536 |


CONFUSION MATRIX: VALIDATION SET, ALL IMAGES PER LESION


| actual / predicted | other | akiec | bcc | mel | nv | All | recall |
|---|---|---|---|---|---|---|---|
| other | 96.0 | 25.0 | 7.0 | 71.0 | 26.0 | 225 | 0.426667 |
| akiec | 9.0 | 35.0 | 3.0 | 5.0 | 5.0 | 57 | 0.614035 |
| bcc | 17.0 | 16.0 | 35.0 | 7.0 | 7.0 | 82 | 0.426829 |
| mel | 13.0 | 6.0 | 4.0 | 104.0 | 27.0 | 154 | 0.675325 |
| nv | 55.0 | 19.0 | 7.0 | 145.0 | 1125.0 | 1351 | 0.832717 |
| All | 190.0 | 101.0 | 56.0 | 332.0 | 1190.0 | 1869 | _ |
| precision | 0.505263 | 0.346535 | 0.625 | 0.313253 | 0.945378 | _ | 0.509124 |



BALANCED, VALIDATION EXPANDED 3-FOLD, ALL IMAGES/LESION: RESNET-18 LAST LAYERS UNFROZEN, RANDOM CROP: METRICS


ONE IMAGE PER LESION


| | ACC | BACC | precision | recall | F1/2 | F1 | F2 | MCC | ROC-AUC mac | ROC-AUC wt | ROC-AUC wt* |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.749064 | 0.596448 | 0.563842 | 0.596448 | 0.553619 | 0.551497 | 0.568662 | 0.516344 | 0.898182 | 0.908291 | 0.861048 |



ALL IMAGES PER LESION


| | ACC | BACC | precision | recall | F1/2 | F1 | F2 | MCC | ROC-AUC mac | ROC-AUC wt | ROC-AUC wt* |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.746388 | 0.595114 | 0.547086 | 0.595114 | 0.542033 | 0.545279 | 0.565916 | 0.513809 | 0.892117 | 0.903874 | 0.853041 |
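
As a cross-check, the headline metrics for one image per lesion can be recomputed from the first confusion matrix above. The sketch below is plain Python with hypothetical variable names. It assumes that BACC is the unweighted mean of per-class recall and that the reported precision is the unweighted mean of per-class precision; under those assumptions the results agree with the printed ACC, BACC and precision values.

```python
# Confusion matrix for one image per lesion, copied from the output above.
# Rows are actual classes, columns are predicted classes.
labels = ["other", "akiec", "bcc", "mel", "nv"]
cm = [
    [103, 22, 6, 61, 33],
    [7, 35, 3, 7, 5],
    [12, 18, 34, 12, 6],
    [16, 5, 3, 102, 28],
    [44, 22, 5, 154, 1126],
]

n = len(labels)
total = sum(sum(row) for row in cm)

# Accuracy: correctly classified lesions over all lesions.
accuracy = sum(cm[i][i] for i in range(n)) / total

# Recall per class: diagonal entry over its row sum (actual count).
recall = [cm[i][i] / sum(cm[i]) for i in range(n)]

# Precision per class: diagonal entry over its column sum (predicted count).
precision = [cm[j][j] / sum(cm[i][j] for i in range(n)) for j in range(n)]

# Assumed definitions: unweighted (macro) means over the five classes.
balanced_accuracy = sum(recall) / n
macro_precision = sum(precision) / n

print(f"ACC {accuracy:.6f}  BACC {balanced_accuracy:.6f}  precision {macro_precision:.6f}")
```

Because every class contributes equally to the macro averages, the weak recall on ```bcc``` and ```other``` pulls BACC well below ACC, which is dominated by the large ```nv``` class.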


<a id='binary_classification:'></a>
## Binary classification: mel versus nv
↑↑ [Contents](#contents) ↑ [Balanced training set, all images per lesion, random crop plus color jitter](#balanced_training)

We restricted the dataset to records with ```dx``` equal to ```mel``` or ```nv```, then trained ResNet-18 (all layers unfrozen) for 10 epochs on a balanced training set (2000 images of each class, using one image per lesion). No transformations were applied to the images other than resizing and normalization. With the default decision rule, the model assigned ```nv``` a probability of more than 50% for every image in the validation set, so it never predicted ```mel``` and the balanced accuracy was 50%. However, when we lower the threshold for ```mel``` classification to 30%, the balanced accuracy rises to about 79% (one image per lesion) and 81% (all images per lesion).
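
The effect of lowering the ```mel``` threshold can be illustrated with a minimal sketch. It assumes a simple rule (predict ```mel``` whenever ```prob_mel``` reaches the threshold) and the helper ```predict_label``` is hypothetical; the actual logic inside ```print_model_evaluation``` is not shown in this document and may differ. The five probabilities are copied from the first rows of the 'one image per lesion' validation output in this section.

```python
# (dx, prob_mel) pairs copied from the first five validation rows.
rows = [
    ("nv", 0.218357),
    ("mel", 0.411312),
    ("mel", 0.370893),
    ("mel", 0.176949),
    ("mel", 0.289566),
]


def predict_label(prob_mel: float, mel_threshold: float = 0.5) -> str:
    """Hypothetical rule: predict 'mel' once prob_mel reaches the threshold."""
    return "mel" if prob_mel >= mel_threshold else "nv"


# Default rule: every image has prob_nv > 0.5, so nothing is labelled 'mel'.
default_preds = [predict_label(p) for _, p in rows]

# Lowered threshold: images with prob_mel >= 0.3 are now labelled 'mel'.
lowered_preds = [predict_label(p, mel_threshold=0.3) for _, p in rows]

print(default_preds)
print(lowered_preds)
```

On these five rows the lowered threshold recovers two of the four melanomas, while the two with ```prob_mel``` below 0.3 are still missed.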

In [98]:
loss_dict = {"train_loss": [0.3069790564213239, 0.2560142851467674, 0.24186766565057402, 0.23501715449788046, 0.23697822678385053, 0.23798765210395165, 0.22586538283326102, 0.24267132374200415, 0.2264327605565389, 0.22700351735590196], "val1_loss": [0.26597966835834086, 0.2674321634719187, 0.24957667954731733, 0.257772229379043, 0.24050388250422353, 0.2595634488000845, 0.2994754124956671, 0.5925716940111366, 0.34239149932788376, 0.2619993652527531], "val_a_loss": [0.3287006087465993, 0.3436150558747261, 0.31422101012280873, 0.31543604032166545, 0.2923741217067976, 0.307068191799185, 0.3987054240874826, 0.5768244018480829, 0.3932816789226396, 0.2884001349917643]}

# Print one line per epoch, listing every tracked loss for that epoch.
for idx in range(len(loss_dict["train_loss"])):
    print(f"Epoch {idx + 1}: ", end='')
    for key, values in loss_dict.items():
        print(f"{key}, {values[idx]}", end=' ')
    print("")

Epoch 1: train_loss, 0.3069790564213239 val1_loss, 0.26597966835834086 val_a_loss, 0.3287006087465993 
Epoch 2: train_loss, 0.2560142851467674 val1_loss, 0.2674321634719187 val_a_loss, 0.3436150558747261 
Epoch 3: train_loss, 0.24186766565057402 val1_loss, 0.24957667954731733 val_a_loss, 0.31422101012280873 
Epoch 4: train_loss, 0.23501715449788046 val1_loss, 0.257772229379043 val_a_loss, 0.31543604032166545 
Epoch 5: train_loss, 0.23697822678385053 val1_loss, 0.24050388250422353 val_a_loss, 0.2923741217067976 
Epoch 6: train_loss, 0.23798765210395165 val1_loss, 0.2595634488000845 val_a_loss, 0.307068191799185 
Epoch 7: train_loss, 0.22586538283326102 val1_loss, 0.2994754124956671 val_a_loss, 0.3987054240874826 
Epoch 8: train_loss, 0.24267132374200415 val1_loss, 0.5925716940111366 val_a_loss, 0.5768244018480829 
Epoch 9: train_loss, 0.2264327605565389 val1_loss, 0.34239149932788376 val_a_loss, 0.3932816789226396 
Epoch 10: train_loss, 0.22700351735590196 val1_loss, 0.2619993652527531 val_a_loss, 0.2884001349917643 

In [103]:
model_name: Union[None, str] = "Binary classification: mel versus nv"

file_path1: Union[None,Path] = path['models'].joinpath("rn18_t1_ufall_10e_melnv_base_00_val1_probabilities.csv")
file_path_a: Union[None,Path] = path['models'].joinpath("rn18_t1_ufall_10e_melnv_base_00_val_a_probabilities.csv")

aggregate_method: Union[None, Dict[str, List[str]]] = None#{ 'max' : ['mel'], 'min' : ['nv'],}
threshold_dict_help: Union[None, OrderedDict[str, float]] = None#OrderedDict([('mel',0.3),])
threshold_dict_hinder: Union[None, OrderedDict[str, float]] = None#OrderedDict([('nv',0.7)])
votes_to_win_dict: Union[None, OrderedDict[str, int]] = None#OrderedDict([('mel',1),])
label_codes: Dict[int, str] = {0: 'mel', 1: 'nv'}
prefix: Union[None, str] = 'prob_'
# Weights inversely proportional to relative class size in the training set, giving more importance to smaller classes.
# weights = 1/.df['label'].value_counts(normalize=True).sort_index().values # None
weights: Union[None, np.ndarray] = np.array([7.02425876, 1.16599553])
    
print_model_evaluation(model_name=model_name,
                       file_path1=file_path1, 
                       file_path_a=file_path_a,
                       aggregate_method=aggregate_method,
                       threshold_dict_help=threshold_dict_help,
                       threshold_dict_hinder=threshold_dict_hinder,
                       votes_to_win_dict=votes_to_win_dict, 
                       label_codes=label_codes,
                       prefix=prefix,
                       weights=weights,)


BINARY CLASSIFICATION: MEL VERSUS NV: PROBABILITIES

VALIDATION SET: ONE IMAGE PER LESION

Header: full dataframe has 1505 rows. Columns are also restricted for display purposes.


| | lesion_id | image_id | dx | prob_mel | prob_nv |
|---|---|---|---|---|---|
| 0 | HAM_0001751 | ISIC_0024698 | nv | 0.218357 | 0.781643 |
| 1 | HAM_0005678 | ISIC_0031023 | mel | 0.411312 | 0.588688 |
| 2 | HAM_0005191 | ISIC_0031177 | mel | 0.370893 | 0.629107 |
| 3 | HAM_0004476 | ISIC_0030417 | mel | 0.176949 | 0.823051 |
| 4 | HAM_0000876 | ISIC_0032396 | mel | 0.289566 | 0.710434 |



VALIDATION SET: ALL IMAGES PER LESION

Header: full dataframe has 1970 rows. Columns are also restricted for display purposes.


| | lesion_id | image_id | dx | prob_mel | prob_nv |
|---|---|---|---|---|---|
| 0 | HAM_0001751 | ISIC_0024698 | nv | 0.218357 | 0.781643 |
| 1 | HAM_0005678 | ISIC_0031023 | mel | 0.411312 | 0.588688 |
| 2 | HAM_0005678 | ISIC_0028086 | mel | 0.435894 | 0.564106 |
| 3 | HAM_0005191 | ISIC_0031177 | mel | 0.370893 | 0.629107 |
| 4 | HAM_0004476 | ISIC_0030417 | mel | 0.176949 | 0.823051 |



BINARY CLASSIFICATION: MEL VERSUS NV: PREDICTIONS

VALIDATION SET, ONE IMAGE PER LESION: COMBINING PROBABILITIES, MAKING PREDICTIONS


| | lesion_id | image_id | dx | prob_mel | prob_nv | pred | pred_final |
|---|---|---|---|---|---|---|---|
| 0 | HAM_0001751 | ISIC_0024698 | nv | 0.218357 | 0.781643 | 1 | 1 |
| 1 | HAM_0005678 | ISIC_0031023 | mel | 0.411312 | 0.588688 | 1 | 1 |
| 2 | HAM_0005191 | ISIC_0031177 | mel | 0.370893 | 0.629107 | 1 | 1 |
| 3 | HAM_0004476 | ISIC_0030417 | mel | 0.176949 | 0.823051 | 1 | 1 |
| 4 | HAM_0000876 | ISIC_0032396 | mel | 0.289566 | 0.710434 | 1 | 1 |


VALIDATION SET, ALL IMAGES PER LESION: COMBINING PROBABILITIES, MAKING PREDICTIONS, COMBINING PREDICTIONS


| | lesion_id | image_id | dx | prob_mel | prob_nv | pred | pred_final |
|---|---|---|---|---|---|---|---|
| 0 | HAM_0001751 | ISIC_0024698 | nv | 0.218357 | 0.781643 | 1 | 1 |
| 1 | HAM_0005678 | ISIC_0031023 | mel | 0.411312 | 0.588688 | 1 | 1 |
| 2 | HAM_0005678 | ISIC_0028086 | mel | 0.435894 | 0.564106 | 1 | 1 |
| 3 | HAM_0005191 | ISIC_0031177 | mel | 0.370893 | 0.629107 | 1 | 1 |
| 4 | HAM_0004476 | ISIC_0030417 | mel | 0.176949 | 0.823051 | 1 | 1 |



BINARY CLASSIFICATION: MEL VERSUS NV: CONFUSION MATRICES

CONFUSION MATRIX: VALIDATION SET, ONE IMAGE PER LESION


| actual / predicted | mel | nv | All | recall |
|---|---|---|---|---|
| mel | 0 | 154.0 | 154 | 0.0 |
| nv | 0 | 1351.0 | 1351 | 1.0 |
| All | 0 | 1505.0 | 1505 | _ |
| precision | _ | 0.897674 | _ | _ |


CONFUSION MATRIX: VALIDATION SET, ALL IMAGES PER LESION


| actual / predicted | mel | nv | All | recall |
|---|---|---|---|---|
| mel | 0 | 154.0 | 154 | 0.0 |
| nv | 0 | 1351.0 | 1351 | 1.0 |
| All | 0 | 1505.0 | 1505 | _ |
| precision | _ | 0.897674 | _ | _ |



BINARY CLASSIFICATION: MEL VERSUS NV: METRICS


ONE IMAGE PER LESION


| | ACC | BACC | precision | recall | F1/2 | F1 | F2 | MCC | ROC-AUC mac | ROC-AUC wt | ROC-AUC wt* |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.897674 | 0.5 | 0.897674 | 0.5 | 0.458215 | 0.473039 | 0.488855 | 0.0 | | | |



ALL IMAGES PER LESION


| | ACC | BACC | precision | recall | F1/2 | F1 | F2 | MCC | ROC-AUC mac | ROC-AUC wt | ROC-AUC wt* |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.897674 | 0.5 | 0.897674 | 0.5 | 0.458215 | 0.473039 | 0.488855 | 0.0 | | | |


In [99]:
model_name: Union[None, str] = "Binary classification: mel versus nv"

aggregate_method: Union[None, Dict[str, List[str]]] = { 'max' : ['mel'], 'min' : ['nv'],}
threshold_dict_help: Union[None, OrderedDict[str, float]] = OrderedDict([('mel',0.3),])
threshold_dict_hinder: Union[None, OrderedDict[str, float]] = OrderedDict([('nv',0.7)])
votes_to_win_dict: Union[None, OrderedDict[str, int]] = OrderedDict([('mel',1),])
label_codes: Dict[int, str] = {0: 'mel', 1: 'nv'}
prefix: Union[None, str] = 'prob_'
# Weights inversely proportional to relative class size in the training set, giving more importance to smaller classes.
# weights = 1/.df['label'].value_counts(normalize=True).sort_index().values # None
weights: Union[None, np.ndarray] = np.array([7.02425876, 1.16599553])
    
print_model_evaluation(model_name=model_name,
                       file_path1=file_path1, 
                       file_path_a=file_path_a,
                       aggregate_method=aggregate_method,
                       threshold_dict_help=threshold_dict_help,
                       threshold_dict_hinder=threshold_dict_hinder,
                       votes_to_win_dict=votes_to_win_dict, 
                       label_codes=label_codes,
                       prefix=prefix,
                       weights=weights,)


BINARY CLASSIFICATION: MEL VERSUS NV: PROBABILITIES

VALIDATION SET: ONE IMAGE PER LESION

Header: full dataframe has 1505 rows. Columns are also restricted for display purposes.


| | lesion_id | image_id | dx | prob_mel | prob_nv |
|---|---|---|---|---|---|
| 0 | HAM_0001751 | ISIC_0024698 | nv | 0.218357 | 0.781643 |
| 1 | HAM_0005678 | ISIC_0031023 | mel | 0.411312 | 0.588688 |
| 2 | HAM_0005191 | ISIC_0031177 | mel | 0.370893 | 0.629107 |
| 3 | HAM_0004476 | ISIC_0030417 | mel | 0.176949 | 0.823051 |
| 4 | HAM_0000876 | ISIC_0032396 | mel | 0.289566 | 0.710434 |



VALIDATION SET: ALL IMAGES PER LESION

Header: full dataframe has 1970 rows. Columns are also restricted for display purposes.


| | lesion_id | image_id | dx | prob_mel | prob_nv |
|---|---|---|---|---|---|
| 0 | HAM_0001751 | ISIC_0024698 | nv | 0.218357 | 0.781643 |
| 1 | HAM_0005678 | ISIC_0031023 | mel | 0.411312 | 0.588688 |
| 2 | HAM_0005678 | ISIC_0028086 | mel | 0.435894 | 0.564106 |
| 3 | HAM_0005191 | ISIC_0031177 | mel | 0.370893 | 0.629107 |
| 4 | HAM_0004476 | ISIC_0030417 | mel | 0.176949 | 0.823051 |



BINARY CLASSIFICATION: MEL VERSUS NV: PREDICTIONS

VALIDATION SET, ONE IMAGE PER LESION: COMBINING PROBABILITIES, MAKING PREDICTIONS


| | lesion_id | image_id | dx | prob_mel | prob_nv | pred | pred_final |
|---|---|---|---|---|---|---|---|
| 0 | HAM_0001751 | ISIC_0024698 | nv | 0.218357 | 0.781643 | 1 | 1 |
| 1 | HAM_0005678 | ISIC_0031023 | mel | 0.411312 | 0.588688 | 0 | 0 |
| 2 | HAM_0005191 | ISIC_0031177 | mel | 0.370893 | 0.629107 | 0 | 0 |
| 3 | HAM_0004476 | ISIC_0030417 | mel | 0.176949 | 0.823051 | 1 | 1 |
| 4 | HAM_0000876 | ISIC_0032396 | mel | 0.289566 | 0.710434 | 1 | 1 |


VALIDATION SET, ALL IMAGES PER LESION: COMBINING PROBABILITIES, MAKING PREDICTIONS, COMBINING PREDICTIONS


| | lesion_id | image_id | dx | prob_mel | prob_nv | pred | pred_final |
|---|---|---|---|---|---|---|---|
| 0 | HAM_0001751 | ISIC_0024698 | nv | 0.218357 | 0.781643 | 1 | 1 |
| 1 | HAM_0005678 | ISIC_0031023 | mel | 0.411312 | 0.588688 | 0 | 0 |
| 2 | HAM_0005678 | ISIC_0028086 | mel | 0.435894 | 0.564106 | 0 | 0 |
| 3 | HAM_0005191 | ISIC_0031177 | mel | 0.370893 | 0.629107 | 0 | 0 |
| 4 | HAM_0004476 | ISIC_0030417 | mel | 0.176949 | 0.823051 | 1 | 1 |



BINARY CLASSIFICATION: MEL VERSUS NV: CONFUSION MATRICES

CONFUSION MATRIX: VALIDATION SET, ONE IMAGE PER LESION


| actual / predicted | mel | nv | All | recall |
|---|---|---|---|---|
| mel | 107.0 | 47.0 | 154 | 0.694805 |
| nv | 155.0 | 1196.0 | 1351 | 0.88527 |
| All | 262.0 | 1243.0 | 1505 | _ |
| precision | 0.408397 | 0.962188 | _ | 0.65067 |


CONFUSION MATRIX: VALIDATION SET, ALL IMAGES PER LESION


| actual / predicted | mel | nv | All | recall |
|---|---|---|---|---|
| mel | 115.0 | 39.0 | 154 | 0.746753 |
| nv | 184.0 | 1167.0 | 1351 | 0.863805 |
| All | 299.0 | 1206.0 | 1505 | _ |
| precision | 0.384615 | 0.967662 | _ | 0.664624 |



BINARY CLASSIFICATION: MEL VERSUS NV: METRICS


ONE IMAGE PER LESION


| | ACC | BACC | precision | recall | F1/2 | F1 | F2 | MCC | ROC-AUC mac | ROC-AUC wt | ROC-AUC wt* |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.865781 | 0.790038 | 0.685293 | 0.790038 | 0.695423 | 0.718276 | 0.754497 | 0.463646 | | | |



ALL IMAGES PER LESION


| | ACC | BACC | precision | recall | F1/2 | F1 | F2 | MCC | ROC-AUC mac | ROC-AUC wt | ROC-AUC wt* |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.851827 | 0.805279 | 0.676139 | 0.805279 | 0.685433 | 0.710257 | 0.755584 | 0.463773 | | | |
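
One possible reading of the ```aggregate_method``` and ```threshold_dict_help``` settings used in the second run is sketched here: for each lesion, take the maximum ```prob_mel``` over its images, then predict ```mel``` if that maximum reaches 0.3. This is an assumption about the intent of those settings, not the implementation of ```print_model_evaluation```, which is not shown in this document; the variable names are hypothetical. The rows are the first five of the 'all images per lesion' output above.

```python
from collections import defaultdict

# (lesion_id, prob_mel) pairs copied from the 'all images per lesion' output.
rows = [
    ("HAM_0001751", 0.218357),
    ("HAM_0005678", 0.411312),
    ("HAM_0005678", 0.435894),
    ("HAM_0005191", 0.370893),
    ("HAM_0004476", 0.176949),
]

MEL_THRESHOLD = 0.3  # value taken from threshold_dict_help in the cell above

# Assumed aggregation: keep the largest prob_mel seen for each lesion.
max_prob_mel = defaultdict(float)
for lesion_id, prob_mel in rows:
    max_prob_mel[lesion_id] = max(max_prob_mel[lesion_id], prob_mel)

# One prediction per lesion, based on the aggregated probability.
lesion_pred = {
    lesion_id: ("mel" if prob >= MEL_THRESHOLD else "nv")
    for lesion_id, prob in max_prob_mel.items()
}

for lesion_id, label in lesion_pred.items():
    print(lesion_id, round(max_prob_mel[lesion_id], 6), label)
```

Taking the maximum favours sensitivity: a single image with a high ```prob_mel``` is enough to flag the lesion, which is consistent with the higher ```mel``` recall and lower ```mel``` precision reported for all images per lesion.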
