# Using Deep Transfer Learning to detect COVID-19

Can we use Deep Learning to analyze a set of chest XRays of known COVID-19 patients to detect when a patient has COVID-19?

The work and information in this notebook are inspired by a [blog post](https://www.pyimagesearch.com/2020/03/16/detecting-covid-19-in-x-ray-images-with-keras-tensorflow-and-deep-learning/) by Adrian Rosebrock on his website [PyImageSearch](www.pyimagesearch.com).

In that blog post, Adrian shows how to apply transfer learning to the [VGG16 Convolutional Neural Network](https://neurohive.io/en/popular-networks/vgg16/), starting from network weights trained on the ImageNet dataset.  

My contribution to that great blog post is to go into some detail on what transfer learning is and why it is so powerful in this application.  I will then show how we can use other Deep Learning convolutional networks for transfer learning and compare their performance on the same dataset.

Lastly, we will look at how a model trained on the COVID-19 X-Ray images does against images of normal chest X-Rays and chest X-Rays of those with Pneumonia, to see if the model can accurately detect that a chest X-Ray is NOT COVID-19.


<font color="red" size="5">Like Adrian, I want to be very clear that the information here, including what is derived from [Adrian's](https://www.pyimagesearch.com/2020/03/16/detecting-covid-19-in-x-ray-images-with-keras-tensorflow-and-deep-learning/) blog post, is meant for Deep Learning educational purposes only.  In no way is it implied that this technique should be used for the identification of COVID-19.</font>

## DataSets

Two datasets will be used for this analysis.  One is an open source collection of chest X-Rays of COVID-19 patients hosted on GitHub.  The other is from the Kaggle site and contains chest X-Rays of normal lungs and of those with pneumonia.

### COVID-19 Image Dataset

Joseph Paul Cohen has created a GitHub repo called [covid-chestxray-dataset](https://github.com/ieee8023/covid-chestxray-dataset) to collect chest X-Rays of anonymized COVID-19 patients.  He is also collecting X-Rays of other respiratory illnesses such as MERS and SARS.

![Covid19Share](./notebook_images/covid19-share-image.png)

From the README.md file in that Github repo:

`We are building a database of COVID-19 cases with chest X-ray or CT images. We are looking for COVID-19 cases as well as MERS, SARS, and ARDS.`

We will use this dataset to have a model learn what a chest X-Ray of a COVID-19 patient looks like.

This dataset is updated frequently.  New images were added as I was working with the dataset so check back often.

### Kaggle Chest X-Ray Images (Pneumonia)

[Kaggle](www.kaggle.com) is an online community of people interested in data science.  It allows users to find, publish, explore and build models around datasets made available to the public.  

The dataset we will use is the [Chest X-Ray Images (Pneumonia)](https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia) dataset.  This dataset has chest X-Ray images of normal lungs as well as chest X-Ray images of lungs with Pneumonia.

We can use this dataset to train the model on what normal chest X-Rays look like.

### Visual Inspection of the Datasets

<br>
<table>
  <tr style="text-align:center; background-color:white">
    <td> <img src="./notebook_images/covid19-collage.png" alt="COVID" style="width: 400px;"/></td>
    <td> <img src="./notebook_images/normal-collage.png" alt="NORMAL" style="width: 400px;"/> </td>
    <td> <img src="./notebook_images/pneumonia-collage.png" alt="PNEU" style="width: 400px;"/> </td>
  </tr>
  <tr style="width:100%;">
    <td style="width:20%;"> <p style="font-family:overpass;font-size:16px;text-align:center;color:#303030;font-weight:300;">COVID-19</p> </td>
    <td style="width:20%;"> <p style="font-family:overpass;font-size:16px;text-align:center;color:#303030;font-weight:300;">NORMAL</p> </td>
    <td style="width:20%;"> <p style="font-family:overpass;font-size:16px;text-align:center;color:#303030;font-weight:300;">PNEUMONIA</p> </td>

  </tr>
</table>

<br>


### Caveats

I would like to stress that the datasets being used are not vetted by myself or, as far as I know, anyone with expertise in the field.  We are using datasets from disparate sources, collected at different times with different procedures.  I have no way of knowing if an image is really a COVID-19 Chest X-Ray, or shows some other ailment that resembles COVID-19.

So take this exercise as an interesting use case of applying Deep Transfer Learning to a set of images for classification.  

The other aspect to consider is that there are not many examples of chest X-Rays of COVID-19 patients.  The lack of data will impact the degree of trust we can have in the results.  The more data we can collect, the better the training and the higher the degree of confidence we can have in the models.  Until then, we can only work with what we have.

## What is Deep Transfer Learning - Fine Tuning

There are two types of transfer learning: feature extraction and fine tuning.

Fine tuning, as used in this notebook, is when the fully connected network (FCN) layer of a convolutional neural network (CNN) is removed and replaced with a new FCN layer that is then trained.  But first, let's talk about CNNs.

Convolutional Neural Networks (CNNs) are commonly used when working with images and Deep Learning.

Think of a convolution as a small mathematical matrix (3x3, 5x5, etc.) that is applied to an image to alter the image.  We use convolutions all the time when using image processing software to sharpen or blur an image.  The goal of training a CNN is to determine the values of the convolution matrices that will alter the image in such a way as to expose features of the image that the FCN layer can use for classification.
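To make the idea concrete, here is a minimal sketch of applying one 3x3 kernel to a tiny image, written with plain Python lists so the arithmetic stays visible. The image and the kernel values are made up for illustration; in a CNN the kernel values are what training learns. Like most deep learning libraries, the sketch slides the kernel without flipping it.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (valid positions only) and return the result."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            # Multiply the kernel element-wise with the patch under it and sum.
            total = 0.0
            for i in range(k):
                for j in range(k):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


# Hypothetical 5x5 grayscale image: dark on the left, bright on the right.
image = [[0, 0, 10, 10, 10] for _ in range(5)]

# Hand-written vertical-edge kernel (illustrative values, not learned ones).
edge_kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

edges = convolve2d(image, edge_kernel)
for row in edges:
    print(row)  # each row is [30.0, 30.0, 0.0]
```

The large values mark where the dark-to-bright edge sits, and the zero marks the flat bright region. That is the sense in which a kernel "exposes a feature" of the image.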

A CNN model will be made up of some number of convolutional layers, each with a different number of kernels (the small matrices we talked about), and a final fully connected network (FCN) layer that will be used to perform the actual classification step.

The initial convolutional layers, also called the convolutional base, act as a feature extraction layer.  This layer attempts to identify the parts of an image that might be interesting features. According to Francois Chollet, the creator of Keras, in his book Deep Learning with Python:

`... the representations learned by the convolutional base are likely to be more generic [than the fully connected layer] and therefore more reusable; the feature maps of the convnet are presence maps of generic concepts over a picture, which is likely to be useful regardless of the computer-vision problem at hand.`

This means that the convolution layers can be trained to identify interesting features based on how the model was trained.  This does imply that the model was trained on images with some commonality to the new problem.

`So if your new dataset differs a lot from the dataset on which the original model was trained, you may be better off using only the first few layers of the model to do feature extraction, rather than using the entire convolutional base`.

The representation learned by the fully connected network layer will be specific to the new dataset that the model is trained on, as it will only contain information about the set of possible outcomes.





![CNN Arch](notebook_images/A-convolutional-neural-networks-CNN.png)

<br>
<br>

<span><font size="2">
    A New Method for Face Recognition Using Convolutional Neural Network - Scientific Figure on ResearchGate. Available from: https://www.researchgate.net/figure/A-convolutional-neural-networks-CNN_fig6_321286547 [accessed 18 Mar, 2020]
    </font></span>

This means we can use a trained model leveraging all of the time and data it took for training the convolutional part, and just remove the FCN layer.  

A properly trained CNN requires a lot of data and CPU/GPU time.  If our dataset is small, we cannot start from a fresh CNN model and try to train it and hope for good results. 

![cnnlayer](notebook_images/ccn_layer.png)

Leveraging the frozen CNN layer, we just have to train a new fully connected layer.  This requires much less CPU and while we would like as much data as is possible, we can obtain very good results with less data on the fully connected layer.

![cnnlayer](notebook_images/cnn_new_fcn.png)

## Applying Deep Transfer Learning

We will be using Tensorflow and Keras to build out a model for the COVID-19 Chest X-Ray detection.  

Keras comes with a number of models with pre-configured architectures.  You can find out more about the Keras models [here](https://keras.io/applications/).  Pre-configured means that the exact architecture, in terms of the number of layers and number of kernels, is already defined.  On their own, these architectures have no trained weights; it is the training process that determines the weights.  However, we can use weights that have already been determined to work well using large amounts of image data.  For that we will use the ImageNet weights for each of the models.

[ImageNet](http://imagenet.stanford.edu) is an image database of millions of images.  These images were used to train the model architectures, and the resulting weights are made available so that we do not have to train a model from scratch.



We are going to look at 4 Keras models:

* VGG16 This was also the model that Adrian used in his [blog post](https://www.pyimagesearch.com/2020/03/16/detecting-covid-19-in-x-ray-images-with-keras-tensorflow-and-deep-learning/).  More information can be found [here](https://arxiv.org/abs/1409.1556)

* VGG19  More information can be found [here](https://arxiv.org/abs/1409.1556)

* ResNet50  More information can be found [here](https://arxiv.org/abs/1512.03385)

* ResNet50V2  More information can be found [here](https://arxiv.org/abs/1603.05027)

The exact details of the models can be found from the links above.

We are going to perform the following on each of the 4 models:

* Train the model on COVID-19 images and the same number of NORMAL Chest X-Ray images from the Kaggle dataset

* Display the Confusion Matrix

* Display the Accuracy

* Sensitivity, aka Recall or True Positive Rate.  This is calculated as: TP/(TP + FN)

Sensitivity is the model's ability to find the true positives of each category. 


Sensitivity states, of all the chest X-Rays that really are from COVID-19 patients, how many did we correctly identify.  For example, if this number is 93%, this means we correctly flagged 93% of the actual COVID-19 X-Rays.  This also means that we missed 7% of them and said they were NOT COVID-19 when they really were.  This scenario is the more problematic one, since someone who was told they were free from COVID-19, but in fact had it, could inadvertently spread the virus.


* Specificity, aka True Negative Rate.  This is calculated as: TN/(TN+FP)


Specificity is the metric that evaluates the model's ability to find the true negatives of each category, in this case NORMAL Chest X-Rays.


For example, if the specificity is 93%, then we correctly predicted NOT COVID-19 for 93% of the actual NORMAL X-Rays.  This also means that 7% of the time we falsely said a NORMAL X-Ray was COVID-19.  For this scenario, we would rather err on the side that the person had COVID-19 and quarantine even if it turns out they did not.
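As a quick check on the two formulas, here is a small sketch using made-up counts: 100 X-Rays from COVID-19 patients and 100 NORMAL X-Rays, with 7 mistakes in each group. The function names and the counts are illustrative only and are not taken from the training script.

```python
def sensitivity_rate(tp, fn):
    """True Positive Rate: share of actual positives that were caught."""
    return tp / (tp + fn)


def specificity_rate(tn, fp):
    """True Negative Rate: share of actual negatives that were cleared."""
    return tn / (tn + fp)


# Hypothetical counts, not results from this notebook.
tp, fn = 93, 7   # 100 actual COVID-19 X-Rays: 93 caught, 7 missed
tn, fp = 93, 7   # 100 actual NORMAL X-Rays: 93 cleared, 7 falsely flagged

print(f"sensitivity: {sensitivity_rate(tp, fn):.2f}")  # 0.93
print(f"specificity: {specificity_rate(tn, fp):.2f}")  # 0.93
```

Note that the denominators are the actual class totals (100 in each case), not the number of predictions made for a class.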





## Implementation Details

In [None]:
from build_covid_dataset import create_covid_dataset
from sample_kaggle_dataset import create_kaggle_dataset
dataset_root = './dataset/0318'  # because the dataset is being added to all of the time make it easy to change

### COVID-19 Dataset

Go to github and clone the repository:

https://github.com/ieee8023/covid-chestxray-dataset

The function `create_covid_dataset` will read the manifest file, pull out all of the COVID-19 images and copy those to the specified output directory.


In [None]:
covid_github_dir = '/Volumes/MacBackup/covid-chestxray-dataset'
covid_output_dir = f'{dataset_root}/covid'

In [None]:
covid_file_count = create_covid_dataset(covid_github_dir, covid_output_dir)
covid_file_count
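For readers who want a feel for what such a helper might do, here is a rough sketch. It is not the author's `create_covid_dataset`. The function name `copy_covid_images`, the manifest name `metadata.csv`, the `images` sub-directory and the `finding` / `filename` column names are all assumptions about the repository layout, so check them against your clone.

```python
import csv
import shutil
from pathlib import Path


def copy_covid_images(repo_dir, output_dir, manifest_name="metadata.csv",
                      image_subdir="images", finding="COVID-19"):
    """Copy every image whose manifest row matches `finding`; return the count."""
    repo_dir, output_dir = Path(repo_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    copied = 0
    with open(repo_dir / manifest_name, newline="") as handle:
        for row in csv.DictReader(handle):
            # Assumed column names: "finding" and "filename".
            if row["finding"] != finding:
                continue
            source = repo_dir / image_subdir / row["filename"]
            if source.is_file():  # skip rows whose image is not in the clone
                shutil.copy2(source, output_dir / source.name)
                copied += 1
    return copied
```

Returning the number of copied files is what lets the next step draw the same number of NORMAL images.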

### Kaggle Dataset

You will have to have a Kaggle account to download the dataset.  

The function `create_kaggle_dataset` will take a random sample from the dataset directory specified and put those images into the output directory.  For this experiment we are only taking images from the NORMAL training set.

In [None]:
kaggle_dataset_dir = '/Volumes/MacBackup/kaggle-chest-x-ray-images/chest_xray/train/NORMAL'
normal_output_dir = f'{dataset_root}/normal'

In [None]:
create_kaggle_dataset(kaggle_dataset_dir, normal_output_dir, covid_file_count)
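The sampling step can be sketched in a few lines of standard-library Python. This is an illustration of the idea, not the author's `create_kaggle_dataset`; the name `sample_images` and the fixed `seed` are assumptions added so the sketch is reproducible.

```python
import random
import shutil
from pathlib import Path


def sample_images(source_dir, output_dir, sample_size, seed=42):
    """Copy a random sample of files from source_dir; return how many were copied."""
    source_dir, output_dir = Path(source_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    # Sort first so the same seed always selects the same files.
    candidates = sorted(p for p in source_dir.iterdir() if p.is_file())
    count = min(sample_size, len(candidates))
    chosen = random.Random(seed).sample(candidates, count)
    for path in chosen:
        shutil.copy2(path, output_dir / path.name)
    return len(chosen)
```

Passing the COVID-19 file count as `sample_size` keeps the two classes the same size, which is why accuracy is a meaningful metric later on.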

### Setting up VGG16 for transfer learning

To set up the Deep Learning models for transfer learning, we have to remove the fully connected layer.  Create the VGG16 model and specify the weights as 'imagenet' weights and include_top=False.  This will pre-initialize all of the weights to be those trained on the ImageNet dataset, and remove the top FCN layer.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Input
baseModel = VGG16(weights="imagenet", include_top=False,
                  input_tensor=Input(shape=(224, 224, 3)))
```

The first parameter, `weights`, is set to imagenet.  This means we want to use the kernel values for all of the convolutional matrices that were learned by training on the very large ImageNet dataset.  Doing so means we can leverage all of the training done previously on a huge dataset so we do not have to do that ourselves.

The second parameter, `include_top`, is set to False.  This removes the FCN layer from the VGG16 convolutional neural network.

The third parameter, `input_tensor`, is set to the Input shape of the images.

Now that we have removed the FCN layer, we need to add our own FCN layer that won't have any trained weights.  The new weights for the FCN layer will need to be learned by training the model on the new Chest X-Ray data.

```python
from tensorflow.keras.layers import AveragePooling2D, Dense, Dropout, Flatten
from tensorflow.keras.models import Model

# construct the head of the model that will be placed on top of
# the base model
headModel = baseModel.output
headModel = AveragePooling2D(pool_size=(4, 4))(headModel)
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(64, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(2, activation="softmax")(headModel)
# place the head FC model on top of the base model (this will become
# the actual model we will train)
model = Model(inputs=baseModel.input, outputs=headModel)
```

Here we start with the baseModel, which is the VGG16 model architecture initialized with the 'imagenet' weights.  After pooling and flattening its output, we add a new fully connected network (FCN) layer of 64 nodes, followed by a dropout layer that randomly drops 1/2 of the nodes during training to reduce overfitting, and feed that into a 2 node output layer.

The 2 nodes represent the probability of a COVID-19 X-Ray or a Normal X-Ray.
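The `softmax` activation is what turns the two raw output values into probabilities that add up to 1. A minimal sketch of that calculation, using made-up raw scores rather than anything produced by the model:

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    # Subtracting the max does not change the result but avoids overflow.
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical raw scores for the two output nodes, ordered [COVID, NORMAL].
probs = softmax([2.0, 0.0])
print([round(p, 2) for p in probs])  # [0.88, 0.12]
```

The class with the larger probability is taken as the prediction, so this example would be read as COVID-19.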

At this point we have reconstructed the VGG16 and we are ready to retrain.


The FCN defined above is taken directly from the blog post, but there is nothing particularly special about that configuration.  This is an area for research to determine the optimal FCN.

Finally we create a model which is the combination of the baseModel which is the CNN and the new FCN model for the outputs.

Recall however that when we train the model, we do <b>NOT</b> want to re-train the weights for the baseModel.  We want to use the imagenet weights.  To do that we need to freeze the baseModel.

```python
    for layer in baseModel.layers:
        layer.trainable = False

```
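To get a feel for how little is left to train once the base is frozen, the parameter counts can be worked out by hand. The sketch below assumes the 224x224x3 input and the head shown above, and uses the published parameter count for VGG16 without its top layers; treat it as back-of-the-envelope arithmetic rather than output from the training script.

```python
# VGG16 halves the image size five times: 224 / 2**5 = 7, with 512 channels.
base_output = (7, 7, 512)

# AveragePooling2D(pool_size=(4, 4)) with default stride and no padding.
pooled = (base_output[0] // 4, base_output[1] // 4, base_output[2])
flattened = pooled[0] * pooled[1] * pooled[2]

dense_64 = flattened * 64 + 64   # weights + biases of Dense(64)
dense_2 = 64 * 2 + 2             # weights + biases of Dense(2)
head_params = dense_64 + dense_2

vgg16_base_params = 14_714_688   # VGG16 with include_top=False (frozen)

share = head_params / (head_params + vgg16_base_params)
print(f"trainable head parameters: {head_params}")   # 32962
print(f"share of all parameters:   {share:.2%}")     # 0.22%
```

Dropout, pooling and flatten layers have no weights, so they do not appear in the count.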

Keras has a number of models, and it becomes very easy for us to try many different models to see which will perform best.  

To evaluate multiple models, I have refactored the original implementation to take a collection of models.

```python
        MODELS = [
            {
                "base_model": VGG16(weights="imagenet", include_top=False,
                                    input_tensor=Input(shape=(224, 224, 3))),
                "name": "vgg16"
            },
            {
                "base_model": VGG19(weights="imagenet", include_top=False,
                                    input_tensor=Input(shape=(224, 224, 3))),
                "name": "vgg19"
            },
            {
                "base_model": ResNet50(weights="imagenet", include_top=False,
                                       input_tensor=Input(shape=(224, 224, 3))),
                "name": "resnet50"

            },
            {
                "base_model": ResNet50V2(weights="imagenet", include_top=False,
                                       input_tensor=Input(shape=(224, 224, 3))),
                "name": "resnet50v2"

            }
        ]

```

We are going to evaluate 4 models, using the same FCN to see how each model performs against the dataset.

During testing of the models, some were not well behaved, so the model was fit with a ModelCheckpoint to save the 'best' model.  The results shown below are for the 'best' model, not the predictions after the last epoch.

You can find the full implementation of my version of the training script in my Github repository.

For a full explanation of Adrian's approach please see his [blog post](https://www.pyimagesearch.com/2020/03/16/detecting-covid-19-in-x-ray-images-with-keras-tensorflow-and-deep-learning/).  

### Train the models

Even though we are only training the small FCN layers, it can take some time.  If you do not want to train all 4 models, then I recommend you pass a models parameter with a single entry to the `train_covid_models` function.  For example:

```python
MODELS = [
    {
        "base_model": ResNet50V2(weights="imagenet", include_top=False,
                                 input_tensor=Input(shape=(224, 224, 3))),
        "name": "resnet50v2"
    }
]

train_covid_models('./dataset/0318', MODELS)
```

To train the models, see the file `train_covid19.py`.  This can be run from the command line, or you can import it and call the `train_covid_models` function.


In [None]:
from train_covid19 import train_covid_models

In [None]:

# NOTE: calling this train method will attempt to train 4 different models on the dataset
#       specified by the dataset_root variable.  This could take some time to run, so only uncomment
#       if you really need to

#train_covid_models(f'{dataset_root}')

The output of the training will be the following:

* The best model as determined by the lowest validation loss.

* The last model executed at the end of the Epochs.  Keep in mind that the best model is not always the last one trained

* A chart showing loss and accuracy over the training Epochs.
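The difference between the "best" and the "last" model can be sketched in plain Python. The validation losses below are made up for illustration and do not come from the runs reported in this notebook; the loop mirrors the rule of keeping a checkpoint only when the validation loss improves.

```python
# Hypothetical validation losses, one per epoch.
val_losses = [0.69, 0.41, 0.22, 0.15, 0.31, 0.27]

best_epoch, best_loss = None, float("inf")
for epoch, val_loss in enumerate(val_losses, start=1):
    if val_loss < best_loss:
        # This is where a checkpoint would overwrite the saved "best" model.
        best_epoch, best_loss = epoch, val_loss

print(f"best epoch: {best_epoch} (val_loss={best_loss})")
print(f"last epoch: {len(val_losses)} (val_loss={val_losses[-1]})")
```

Here the best model comes from epoch 4, while the last epoch ends with a noticeably higher loss, which is why the two saved models can behave differently.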

## Results

#### Precision

What percentage of the positive (COVID-19) predictions were correct?  

precision = TP/(TP+FP)

#### Recall

What percentage of the positive cases (actual COVID-19) did you catch?

recall = TP/(TP+FN)

#### f1-score

How well does the model balance precision and recall?

The F1 score is the harmonic mean of precision and recall such that the best score is 1.0 and the worst is 0.0.

f1-score = (2 * precision * recall)/(precision + recall)

#### support

The number of instances of each class. 

#### Confusion Matrix

Matrix of Actual outcomes vs Predicted outcomes.

As a side note, the 0 and 1 in the confusion matrix are not indexes, but the actual or predicted value of COVID (1) or not (0).

```text
                Covid
              Predicted
               1    0
Covid    1 [[ 14    0]
Actual   0  [  1   13]]


                Covid
              Predicted
               1    0
Covid    1 [[ TP   FN]
Actual   0  [ FP   TN]]
```

Using VGG16 numbers for COVID:

- precision = TP/(TP+FP) = 14/(14+1) = 0.93

- recall = TP/(TP+FN) = 14/(14+0) = 1.0 
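The same arithmetic, including the F1 score and accuracy, can be checked with a few lines of Python. The counts are the VGG16 confusion matrix shown above; the variable and dictionary names are only illustrative.

```python
# VGG16 confusion matrix from above: rows are actual, columns are predicted.
matrix = [[14, 0],
          [1, 13]]
(tp, fn), (fp, tn) = matrix

precision = tp / (tp + fp)
recall = tp / (tp + fn)

report = {
    "precision": precision,
    "recall": recall,
    "f1-score": 2 * precision * recall / (precision + recall),
    "accuracy": (tp + tn) / (tp + fn + fp + tn),
}

for name, value in report.items():
    print(f"{name:>9}: {value:.4f}")
```

Rounded to two decimals these give 0.93, 1.00, 0.97 and 0.96, which is what the VGG16 classification report lists for the `covid` row and the overall accuracy.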


#### sensitivity, aka Recall or True Positive Rate ( COVID-19 Positive )

sensitivity = TP/(TP + FN)

How many of the actual COVID-19 X-Rays did the model catch?

The percentage of positive COVID-19 cases that are correctly identified.  For example, if the sensitivity is 80%, then for 20% of the actual COVID-19 X-Rays the model predicted NOT COVID-19.

20% of the time the model falsely predicted Healthy and was actually COVID-19.


#### specificity, True Negative Rate ( COVID-19 Negative )

specificity = TN/(TN+FP)

Measures the proportion of actual negatives that are correctly identified.  In other words the percentage of healthy people that the model accurately stated did not have COVID-19.  For example, if the specificity is 80%, that means for 20% of the healthy X-Rays the model was wrong to classify the X-Ray as positive, when in fact it was negative.  

20% of the time the model falsely predicted COVID-19 and was actually NOT COVID-19.



### VGG16

```text
Epoch 00025: val_loss improved from 0.10472 to 0.10365, saving model to ./models/best-vgg16-0319-model.h5
13/13 - 44s - loss: 0.1071 - accuracy: 0.9709 - val_loss: 0.1037 - val_accuracy: 0.9643

[INFO] evaluating network...
              precision    recall  f1-score   support

       covid       0.93      1.00      0.97        14
      normal       1.00      0.93      0.96        14

    accuracy                           0.96        28
   macro avg       0.97      0.96      0.96        28
weighted avg       0.97      0.96      0.96        28

[[14  0]
 [ 1 13]]
acc: 0.9643
sensitivity: 1.0000
specificity: 0.9286

Finished Model: vgg16 took 1107.2530229091644 seconds

```

![vgg16](./notebook_images/vgg16-0319-plot.png)

### VGG19


```text
Epoch 00025: val_loss did not improve from 0.14942
13/13 - 54s - loss: 0.1701 - accuracy: 0.9612 - val_loss: 0.2053 - val_accuracy: 0.9643

[INFO] evaluating network...
              precision    recall  f1-score   support

       covid       0.93      0.93      0.93        14
      normal       0.93      0.93      0.93        14

    accuracy                           0.93        28
   macro avg       0.93      0.93      0.93        28
weighted avg       0.93      0.93      0.93        28

[[13  1]
 [ 1 13]]
acc: 0.9286
sensitivity: 0.9286
specificity: 0.9286

Finished Model: vgg19 took 1340.9297320842743 seconds

```

![vgg19](./notebook_images/vgg19-0319-plot.png)

### ResNet50


```text
Epoch 00025: val_loss did not improve from 0.79014
13/13 - 27s - loss: 0.0108 - accuracy: 1.0000 - val_loss: 0.8588 - val_accuracy: 0.5000
  'precision', 'predicted', average, warn_for)
              precision    recall  f1-score   support

       covid       0.50      1.00      0.67        14
      normal       0.00      0.00      0.00        14

    accuracy                           0.50        28
   macro avg       0.25      0.50      0.33        28
weighted avg       0.25      0.50      0.33        28

[[14  0]
 [14  0]]
acc: 0.5000
sensitivity: 1.0000
specificity: 0.0000

Finished Model: resnet50 took 677.8684051036835 seconds

```

![rn](./notebook_images/resnet50-0319-plot.png)

### ResNet50V2


```text
Epoch 00025: val_loss did not improve from 0.07534
13/13 - 23s - loss: 0.1394 - accuracy: 0.9417 - val_loss: 0.7662 - val_accuracy: 0.8214
[INFO] evaluating network...
              precision    recall  f1-score   support

       covid       1.00      0.93      0.96        14
      normal       0.93      1.00      0.97        14

    accuracy                           0.96        28
   macro avg       0.97      0.96      0.96        28
weighted avg       0.97      0.96      0.96        28

[[13  1]
 [ 0 14]]
acc: 0.9643
sensitivity: 0.9286
specificity: 1.0000

Finished Model: resnet50v2 took 584.7564489841461 seconds
```

![rnv2](./notebook_images/resnet50v2-0319-plot.png)

### Results Summary

In [7]:
import pandas as pd
results_df = pd.DataFrame(data=[[0.9642857142857143, 1.0, 0.9285714285714286, 'vgg16'],
                    [0.9285714285714286, 0.9285714285714286, 0.9285714285714286, 'vgg19'],
                    [0.5, 1.0, 0.0, 'resnet50'],
                    [0.9642857142857143, 0.9285714285714286, 1.0, 'resnet50v2']],
 columns=["accuracy", "sensitivity","specificity","model name"]

)

In [8]:
results_df.head()

| | accuracy | sensitivity | specificity | model name |
|---|---|---|---|---|
| 0 | 0.964286 | 1.0 | 0.928571 | vgg16 |
| 1 | 0.928571 | 0.928571 | 0.928571 | vgg19 |
| 2 | 0.5 | 1.0 | 0.0 | resnet50 |
| 3 | 0.964286 | 0.928571 | 1.0 | resnet50v2 |


Reviewing the results, it is clear we can remove `resnet50`, but `vgg16` and `resnet50v2` are good candidates.


### Predicting on the Kaggle Test Dataset

Understanding how the model works against the training/testing set is the first step but how it performs on unseen data is the real test.

Unfortunately, we do not have any unseen COVID-19 Chest X-Ray images, because the collection is small and we had to use all of them for training.  When the Github repo can collect 100 COVID-19 images, it would be good to hold some back for final validation.

Instead, what we can do is test that the model will correctly predict the normal and pneumonia X-Rays from Kaggle as NOT COVID-19.

The `predict_with_model` function takes the path to a single model, and the path to the non-COVID-19 chest X-Ray images and will make predictions and show the overall results.

Below is an example of running the best `resnet50v2` model against the NORMAL Kaggle Chest X-Rays.  You can do this for each of the models, for both NORMAL and PNEUMONIA.

In [9]:
from predict_covid19 import predict_with_model

In [10]:
model_path = './models/best-resnet50v2-0319-model.h5'
normal_data_path = '/Volumes/MacBackup/kaggle-chest-x-ray-images/chest_xray/test/NORMAL'

In [11]:
predict_with_model(model_path, normal_data_path)

[COVID, NORMAL]
[[0.7 0.3]
 [0.  1. ]
 [0.2 0.8]
 [0.8 0.2]
 [0.  1. ]
 [0.  1. ]
 [0.7 0.3]
 [0.9 0.1]
 [1.  0. ]
 [0.9 0.1]
 [0.9 0.1]
 [0.9 0.1]
 [0.9 0.1]
 [0.7 0.3]
 [0.  1. ]
 [0.9 0.1]
 [0.9 0.1]
 [0.1 0.9]
 [0.3 0.7]
 [0.9 0.1]
 [0.  1. ]
 [0.8 0.2]
 [0.3 0.7]
 [0.3 0.7]
 [0.8 0.2]
 [1.  0. ]
 [0.1 0.9]
 [0.9 0.1]
 [0.1 0.9]
 [1.  0. ]
 [0.9 0.1]
 [0.4 0.6]
 [0.9 0.1]
 [1.  0. ]
 [1.  0. ]
 [0.  1. ]
 [0.1 0.9]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.3 0.7]
 [0.1 0.9]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.1 0.9]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.3 0.7]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.6 0.4]
 [0.  1. ]
 [0.1 0.9]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.6 0.4]
 [0.  1. ]
 [0.1 0.9]
 [0.  1. ]
 [0.  1. ]
 [0.  1. ]
 [0.7

#### Summary

```text
PREDICT NORMAL

Model: ./models/best-vgg16-0319-model.h5
Dataset: /Volumes/MacBackup/kaggle-chest-x-ray-images/chest_xray/test/NORMAL
Normal Accuracy: 0.7649572649572649

Model: ./models/best-vgg19-0319-model.h5
Dataset: /Volumes/MacBackup/kaggle-chest-x-ray-images/chest_xray/test/NORMAL
Normal Accuracy: 0.7991452991452992

Model: ./models/best-resnet50-0319-model.h5
Dataset: /Volumes/MacBackup/kaggle-chest-x-ray-images/chest_xray/test/NORMAL
Normal Accuracy: 0.0

Model: ./models/best-resnet50v2-0319-model.h5
Dataset: /Volumes/MacBackup/kaggle-chest-x-ray-images/chest_xray/test/NORMAL
Normal Accuracy: 0.8974358974358975

PREDICT PNEUMONIA

Model: ./models/best-vgg16-0319-model.h5
Dataset: /Volumes/MacBackup/kaggle-chest-x-ray-images/chest_xray/test/PNEUMONIA
PNEUMONIA Accuracy: 0.5615384615384615

Model: ./models/best-vgg19-0319-model.h5
Dataset: /Volumes/MacBackup/kaggle-chest-x-ray-images/chest_xray/test/PNEUMONIA
PNEUMONIA Accuracy: 0.382051282051282

Model: ./models/best-resnet50-0319-model.h5
Dataset: /Volumes/MacBackup/kaggle-chest-x-ray-images/chest_xray/test/PNEUMONIA
PNEUMONIA Accuracy: 0.0

Model: ./models/best-resnet50v2-0319-model.h5
Dataset: /Volumes/MacBackup/kaggle-chest-x-ray-images/chest_xray/test/PNEUMONIA
PNEUMONIA Accuracy: 0.9179487179487179

```

Looking at the results above, you can see that the best overall model was the `resnet50v2`.  

This is a little surprising if you only look at the train/test validation metrics, where `vgg16` and `resnet50v2` had the same accuracy.
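For reference, a "Normal Accuracy" figure like the ones in the summary can be derived from the `[COVID, NORMAL]` probability rows. The sketch below assumes that is how the number is computed, which the notebook does not show, and it uses only the first five rows of the NORMAL output printed earlier as a small excerpt.

```python
# First five [COVID, NORMAL] rows from the NORMAL prediction output above.
predictions = [
    [0.7, 0.3],
    [0.0, 1.0],
    [0.2, 0.8],
    [0.8, 0.2],
    [0.0, 1.0],
]
NORMAL = 1  # column index of the NORMAL class

# Pick the column with the highest probability; ties go to the first column.
predicted = [row.index(max(row)) for row in predictions]
normal_accuracy = sum(label == NORMAL for label in predicted) / len(predicted)

print(predicted)        # [0, 1, 1, 0, 1]
print(normal_accuracy)  # 0.6
```

Because every image in this folder is NORMAL, the accuracy here is simply the fraction of rows where the NORMAL probability wins.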