# Notebook to train a model to diagnose thoracic pathology from chest X-rays

The purpose of this Jupyter notebook is to demonstrate how to build an AI-based radiologist system using Apache Spark and Analytics Zoo to detect pneumonia and other diseases from chest X-ray images. The X-rays are made available by the United States’ National Institutes of Health (NIH). The dataset contains over 112,000 frontal chest X-ray images, each potentially labeled with one or more of fourteen different thoracic pathologies. We show how to build a multi-label image classification model on a distributed Apache Spark infrastructure, and how to build complex image transformations and deep learning pipelines with Analytics Zoo in a scalable and easy-to-use way.

For instructions on prerequisites for this notebook, refer to the GitHub readme.

## Import the required packages
The following modules are required for this notebook.

In [1]:
import warnings
# Ignoring the warnings to improve readability of the notebook
warnings.filterwarnings("ignore", message="numpy.dtype size changed")

import os
import random
import time
import numpy as np
from math import ceil
# EveryEpoch is the validation trigger passed to NNEstimator.setValidation further below
from bigdl.dllib.optim.optimizer import EveryEpoch
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.sql import SparkSession, SQLContext
from pyspark.sql.functions import col, udf
from pyspark.sql.types import DoubleType
from pyspark.storagelevel import StorageLevel
from bigdl.dllib.nncontext import *
from bigdl.dllib.feature.image.imagePreprocessing import *
from bigdl.dllib.feature.common import ChainedPreprocessing
from bigdl.dllib.keras.layers import Input, Flatten, Dense, GlobalAveragePooling2D, Dropout
from bigdl.dllib.keras.metrics import AUC
from bigdl.dllib.keras.optimizers import Adam
from bigdl.dllib.keras.models import Model
from bigdl.dllib.net.net_load import Net
from bigdl.dllib.nnframes import NNEstimator, NNImageReader
from bigdl.dllib.keras.objectives import BinaryCrossEntropy
from pyspark.sql.types import StringType, ArrayType
import matplotlib.pyplot as plt



## Transfer learning and loading pre-trained models
We use transfer learning to train the model. The original example shows how to load pre-trained Inception, ResNet-50, VGG, and DenseNet models. These models are pre-trained on the ImageNet dataset and are available [here](https://analytics-zoo.github.io/0.4.0/#ProgrammingGuide/image-classification/). Only one of the models is used in the actual training, and you can switch between them by calling the appropriate function to see how they perform. In this copy of the notebook, the cell below instead defines *build_model*, a small convolutional network trained from scratch, and the call to *get_resnet_model* is commented out where the model is created.

The *get_resnet_model* function loads a __ResNet-50__ model. It accepts two parameters:
- *model_path* - the path in your HDFS where the pre-trained model is located
- *label_length* - the number of labels for a given task. For the NIH data the X-rays can show 14 diseases, so *label_length* is 14.

The function does the following:
-  *Net.load_bigdl()* - loads a BigDL model and returns a _Model_. The _Net_ package can also load models from other frameworks such as Caffe, Torch and TensorFlow.
- *new_graph()* - removes the layers after "pool5"
- *Input()* - creates a new input layer for the X-ray images. The images are resized to 224x224 and have three channels
- The input layer is added to the model using *to_keras*
- We then flatten the neural network, add dropout and apply regularization

    

In [2]:
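from bigdl.dllib.keras.layers import *
from bigdl.dllib.keras.models import Sequential

# Build a small CNN from scratch: three conv/pool stages followed by a sigmoid output per label.
# It is used below in place of a pre-trained ResNet-50.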
def build_model(label_length):
    model = Sequential()
    model.add(Conv2D(32, 3, 3, input_shape=(3, 224, 224)))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=[2, 2]))

    model.add(Conv2D(32, 3, 3))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=[2, 2]))

    model.add(Conv2D(64, 3, 3))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=[2, 2]))

    model.add(Flatten())
    model.add(Dense(64))
    model.add(Activation("relu"))
    model.add(Dropout(0.5))
    model.add(Dense(label_length, activation="sigmoid"))
    #model.add(Activation("sigmoid"))

    return model
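
To see how large the flattened feature vector in *build_model* becomes, the layer shapes can be traced by hand. The sketch below is plain Python arithmetic, not BigDL code. It assumes unpadded ("valid") 3x3 convolutions with stride 1 and non-overlapping 2x2 max pooling with floor rounding; the helper names `conv_out`, `pool_out` and `trace_shapes` are made up for this illustration.

```python
def conv_out(size, kernel=3, stride=1):
    """Output length of an unpadded ('valid') convolution along one axis."""
    return (size - kernel) // stride + 1


def pool_out(size, pool=2):
    """Output length of non-overlapping max pooling with floor rounding."""
    return size // pool


def trace_shapes(side=224, filters=(32, 32, 64)):
    """Return (channels, height, width) after each conv + pool stage."""
    shapes = []
    for n_filters in filters:
        side = pool_out(conv_out(side))
        shapes.append((n_filters, side, side))
    return shapes


shapes = trace_shapes()
channels, height, width = shapes[-1]
flat_features = channels * height * width
dense_params = flat_features * 64 + 64  # weights plus biases of Dense(64)

print(shapes)         # [(32, 111, 111), (32, 54, 54), (64, 26, 26)]
print(flat_features)  # 43264
print(dense_params)   # 2768960
```

Under these assumptions, almost all of the trainable parameters sit in the first dense layer.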


## Calculate the AUC-ROC for a disease

The following function calculates the area under the ROC curve (AUC) for disease *k*. We use the Spark ML *BinaryClassificationEvaluator* for this.

In [3]:
def get_auc_for_kth_class(k, df, label_col="label", prediction_col="prediction"):
    get_Kth = udf(lambda a: a[k], DoubleType())
    extracted_df = df.withColumn("kth_label", get_Kth(col(label_col))) \
        .withColumn("kth_prediction", get_Kth(col(prediction_col))) \
        .select('kth_label', 'kth_prediction')
    roc_score = BinaryClassificationEvaluator(rawPredictionCol='kth_prediction',
                                              labelCol='kth_label', metricName="areaUnderROC").evaluate(extracted_df)
    return roc_score
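
As a reference for reading the scores, the area under the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes that quantity directly in plain Python. The function `pairwise_auc` is a hypothetical helper for illustration only; it is not how Spark computes the metric, and Spark may treat degenerate cases (for example a class with no positive examples) differently.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Returns None when the class has
    no positive or no negative examples, because AUC is undefined there.
    """
    positives = [s for y, s in zip(labels, scores) if y > 0.5]
    negatives = [s for y, s in zip(labels, scores) if y <= 0.5]
    if not positives or not negatives:
        return None
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


print(pairwise_auc([0.0, 0.0, 1.0, 1.0], [0.1, 0.4, 0.35, 0.8]))  # 0.75
print(pairwise_auc([0.0, 1.0, 0.0, 1.0], [0.5, 0.5, 0.5, 0.5]))   # 0.5
```

A model that outputs the same score for every image therefore gets an AUC of 0.5, the same as random guessing.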

## Evaluate the model and print the AUC

In [4]:
def evaluate_and_plot(testDF):
    predictionDF = nnModel.transform(testDF).persist(storageLevel=StorageLevel.DISK_ONLY)
    label_texts= ["Atelectasis", "Cardiomegaly", "Effusion", "Infiltration", "Mass", "Nodule", "Pneumonia",
                   "Pneumothorax", "Consolidation", "Edema", "Emphysema", "Fibrosis", "Pleural_Thickening", "Hernia"]
    label_map = {k: v for v, k in enumerate(label_texts)}
    chexnet_order = ["Atelectasis", "Cardiomegaly", "Effusion", "Infiltration", "Mass", "Nodule", "Pneumonia", "Pneumothorax", "Consolidation",
     "Edema", "Emphysema", "Fibrosis", "Pleural_Thickening", "Hernia"]
    total_auc = 0.0
    roc_auc_label =dict()
    for i in chexnet_order:
        roc_score = get_auc_for_kth_class(label_map[i], predictionDF)
        total_auc += roc_score
        print('{:>12} {:>25} {:>5} {:<20}'.format('ROC score for ', i, ' is: ', roc_score))
        roc_auc_label[i]=(roc_score)
    print("Average AUC: ", total_auc / float(label_length))

## Main program


In [5]:
random.seed(1234)
batch_size = 12 #1024 
num_epoch = 15
#
#    model_path  - Path for the pre-trained model file and the location to save the model after training.
#                  The model path must match the function you are calling (ResNet-50, VGG or DenseNet)
#    image_path  - Path to all images
#    label_path  - Path to the folder holding the label file (Data_Entry_2017_v2020.csv) available from NIH
#    kaggle_path - Path to the second image set, organised into Train, Val and Test folders
#    save_path   - Path to save the model and intermediate results (defined at the end of the notebook)
image_path = "/opt/application/data/output"
kaggle_path = "/opt/application/data_kaggle"
label_path = "/opt/application/data"
model_path = "/opt/application/data/model" 

# Get Spark Context
sparkConf = create_spark_conf().setAppName("Chest X-ray Training")
sc = init_nncontext(sparkConf)
spark = SparkSession.builder.config(conf=sparkConf).getOrCreate()

# Make sure the batchsize is a multiple of (Number of executors * Number of cores)
numexecutors = len(sc._jsc.sc().statusTracker().getExecutorInfos()) - 1
numcores = int(sc.getConf().get('spark.executor.cores','1'))

print("Number of Executors = " +str(numexecutors))
print("Number of Cores = " + str(numcores))
print("Batch Size = " + str(batch_size))
#if batch_size%(numexecutors*numcores)==0:
#    print("Batchsize is a multiple of (Number of Executors * Number of cores. Good to proceed")
#else:
#    print("Batchsize is NOT a multiple of (Number of Executors * Number of cores). Do not proceed !")

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).


2022-04-08 11:14:17 WARN  NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2022-04-08 11:14:18,781 Thread-4 WARN The bufferSize is set to 4000 but bufferedIo is false: false
22-04-08 11:14:18 [Thread-4] INFO  Engine$:121 - Auto detect executor number and executor cores number
22-04-08 11:14:18 [Thread-4] INFO  Engine$:123 - Executor number is 1 and executor cores number is 6
22-04-08 11:14:19 [Thread-4] INFO  ThreadPool$:95 - Set mkl threads to 1 on thread 15
2022-04-08 11:14:19 WARN  SparkContext:69 - Using an existing SparkContext; some configuration may not take effect.
22-04-08 11:14:19 [Thread-4] INFO  Engine$:446 -
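
The batch-size rule stated in the cell above (a multiple of executors times cores) can be checked before training starts. The following is a minimal sketch in plain Python; the helper names are hypothetical, and treating an executor count below one as a single local executor is an assumption of this sketch rather than documented BigDL behaviour.

```python
from math import ceil


def total_cores(num_executors, cores_per_executor):
    """Number of parallel training slots; counts below one are treated as one."""
    return max(1, num_executors) * max(1, cores_per_executor)


def is_valid_batch_size(batch_size, num_executors, cores_per_executor):
    """True when batch_size is a positive multiple of executors * cores."""
    cores = total_cores(num_executors, cores_per_executor)
    return batch_size > 0 and batch_size % cores == 0


def round_up_batch_size(batch_size, num_executors, cores_per_executor):
    """Smallest valid batch size that is not below the requested one."""
    cores = total_cores(num_executors, cores_per_executor)
    return max(1, ceil(batch_size / cores)) * cores


# The log above reports 1 executor with 6 cores.
print(is_valid_batch_size(12, 1, 6))   # True
print(is_valid_batch_size(10, 1, 6))   # False
print(round_up_batch_size(10, 1, 6))   # 12
```

With the values from this run, `numexecutors` and `numcores` could be passed in directly in place of the literals.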

## Load the data
We then load the dataset. NIH has __[released](https://www.nih.gov/news-events/news-releases/nih-clinical-center-provides-one-largest-publicly-available-chest-x-ray-datasets-scientific-community)__ the chest X-ray images as two sets (training and test). We have created a [notebook](ConvertXray-ConvertImages.ipynb) that reads the X-ray images from NIH and saves them as training and test datasets (in two folders, /trainingDF and /testDF). In the code below, we read the images with *NNImageReader*, build a 14-element label vector for each image from the *Finding_Labels* column of the NIH label file, and join the two into a single Spark DataFrame on *Image_Index*. The DataFrame would normally be split into training and validation DataFrames with *randomSplit*; that call is commented out in this run, so the same DataFrame is used for both.



In [6]:

label_texts = ["Atelectasis", "Cardiomegaly", "Effusion", "Infiltration", "Mass", "Nodule", "Pneumonia", "Pneumothorax","Consolidation", "Edema", "Emphysema", "Fibrosis", "Pleural_Thickening", "Hernia"]


label_map = {k: v for v, k in enumerate(label_texts)}

label_length = len(label_texts)

def text_to_label(text):
    arr = [0.0] * len(label_texts)
    for l in text.split("|"):
        if l != "No Finding":
            arr[label_map[l]] = 1.0
    return arr

getLabel = udf(lambda x: text_to_label(x), ArrayType(DoubleType()))
getName = udf(lambda row: os.path.basename(row[0]), StringType())
imageDF = NNImageReader.readImages(image_path, sc, resizeH=256, resizeW=256, image_codec=1) \
    .withColumn("Image_Index", getName(col('image')))

labelDF = spark.read.load(label_path + "/Data_Entry_2017_v2020.csv", format="csv", sep=",", inferSchema="true", header="true") \
        .select("Image_Index", "Finding_Labels") \
        .withColumn("label", getLabel(col('Finding_Labels')))

totalDF = imageDF.join(labelDF, on="Image_Index", how="inner")
#.withColumnRenamed("Finding Labels", "Finding_Labels")

#(trainingDF, validationDF) = totalDF.randomSplit([0.8, 0.2])
trainingDF=totalDF
validationDF=totalDF
print("Number of training images: ", trainingDF.count())
print("Number of validation images: ", validationDF.count())

                                                                                

Number of training images:  12
Number of validation images:  12
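
To make the label encoding concrete, the sketch below repeats *text_to_label* from the cell above outside Spark and applies it to two example strings. The input strings are illustrative values in the pipe-separated format that the function expects; they are not rows taken from the label file.

```python
label_texts = ["Atelectasis", "Cardiomegaly", "Effusion", "Infiltration", "Mass",
               "Nodule", "Pneumonia", "Pneumothorax", "Consolidation", "Edema",
               "Emphysema", "Fibrosis", "Pleural_Thickening", "Hernia"]
label_map = {name: index for index, name in enumerate(label_texts)}


def text_to_label(text):
    """Turn 'A|B' into a multi-hot vector; 'No Finding' sets no position."""
    arr = [0.0] * len(label_texts)
    for name in text.split("|"):
        if name != "No Finding":
            arr[label_map[name]] = 1.0
    return arr


encoded = text_to_label("Cardiomegaly|Effusion")
print(encoded[:4])                       # [0.0, 1.0, 1.0, 0.0]
print(sum(text_to_label("No Finding")))  # 0.0
```

Because an image can carry several findings at once, more than one position can be 1.0, which is why the model ends in one sigmoid output per label.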


In [None]:
from pyspark.sql.functions import lit,array

label_texts = ["bacteria", "Normal", "virus"]

label_map = {k: v for v, k in enumerate(label_texts)}

label_length = len(label_texts)

test_path = kaggle_path + "/Test"
train_path = kaggle_path + "/Train"
val_path = kaggle_path + "/Val"

trainingDF_bacteria=NNImageReader.readImages(train_path+ "/bacteria", sc, resizeH=256, resizeW=256, image_codec=1) \
                    .withColumn('label', array(lit(1.0), lit(0.0), lit(0.0)))
trainingDF_normal=NNImageReader.readImages(train_path+ "/Normal", sc, resizeH=256, resizeW=256, image_codec=1) \
                    .withColumn('label', array(lit(0.0), lit(1.0), lit(0.0)))
trainingDF_virus=NNImageReader.readImages(train_path+ "/virus", sc, resizeH=256, resizeW=256, image_codec=1) \
                    .withColumn('label', array(lit(0.0), lit(0.0), lit(1.0)))


trainingDF = trainingDF_bacteria.union(trainingDF_normal).union(trainingDF_virus)

validationDF_bacteria=NNImageReader.readImages(val_path+ "/bacteria", sc, resizeH=256, resizeW=256, image_codec=1) \
                        .withColumn('label', array(lit(1.0), lit(0.0), lit(0.0)))
validationDF_normal=NNImageReader.readImages(val_path+ "/Normal", sc, resizeH=256, resizeW=256, image_codec=1) \
                    .withColumn('label', array(lit(0.0), lit(1.0), lit(0.0)))
validationDF_virus=NNImageReader.readImages(val_path+ "/virus", sc, resizeH=256, resizeW=256, image_codec=1) \
                    .withColumn('label', array(lit(0.0), lit(0.0), lit(1.0)))

validationDF = validationDF_bacteria.union(validationDF_normal).union(validationDF_virus)

print("Number of training images: ", trainingDF.count())
print("Number of validation images: ", validationDF.count())


In [None]:
label_texts

def evaluate_and_plot(testDF):
    predictionDF = nnModel.transform(testDF).persist(storageLevel=StorageLevel.DISK_ONLY)
    total_auc = 0.0
    roc_auc_label =dict()
    for i in label_texts:
        roc_score = get_auc_for_kth_class(label_map[i], predictionDF)
        total_auc += roc_score
        print('{:>12} {:>25} {:>5} {:<20}'.format('ROC score for ', i, ' is: ', roc_score))
        roc_auc_label[i]=(roc_score)
    print("Average AUC: ", total_auc / float(label_length))

## Load the pre-trained model and optimiser
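We first create the model. The original example loads a pre-trained ResNet-50 here; in the cell below that call is commented out and the small CNN returned by *build_model* is used instead. The optimiser (Adam) is set later, when the classifier is defined.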

In [None]:
# Load the pretrained model
#xray_model = get_resnet_model(model_path, label_length)
xray_model = build_model(label_length)


## Image pre-processing

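We build a *ChainedPreprocessing* that chains the following preprocessing steps.
- *RowToImageFeature* - converts a Spark row to a BigDL ImageFeature
- *ImageCenterCrop* - crops the central 224 x 224 region of the image
- *ImageHFlip* - flips the image horizontally; wrapped in *ImageRandomPreprocessing* so that it is applied with probability 0.5
- *ImageBrightness* - adjusts the brightness of the image; also applied with probability 0.5
- *ImageChannelNormalize* - normalizes the image by subtracting the per-channel mean values of the ImageNet images
- *ImageMatToTensor* and *ImageFeatureToTensor* - convert the image matrix to the tensor that is fed to the model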

In [None]:
transformer = ChainedPreprocessing(
            [RowToImageFeature(), ImageCenterCrop(224, 224), ImageRandomPreprocessing(ImageHFlip(), 0.5),
             ImageRandomPreprocessing(ImageBrightness(0.0, 32.0), 0.5),
             ImageChannelNormalize(123.68, 116.779, 103.939), ImageMatToTensor(), ImageFeatureToTensor()])
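
Two of the steps in the transformer are simple arithmetic and can be illustrated without BigDL. The sketch below is plain Python, not the library implementation: `center_crop_box` and `subtract_channel_means` are hypothetical helpers, the crop offset is assumed to be rounded down, and the pixel is assumed to be given in the same channel order as the three mean values.

```python
def center_crop_box(height, width, crop_h, crop_w):
    """Return (top, left, bottom, right) of a centred crop window."""
    top = (height - crop_h) // 2
    left = (width - crop_w) // 2
    return top, left, top + crop_h, left + crop_w


def subtract_channel_means(pixel, means=(123.68, 116.779, 103.939)):
    """Subtract one mean per channel from a single pixel."""
    return tuple(value - mean for value, mean in zip(pixel, means))


# Images are read at 256 x 256 and cropped to 224 x 224.
print(center_crop_box(256, 256, 224, 224))  # (16, 16, 240, 240)

# A pixel equal to the channel means maps to zero in every channel.
print(subtract_channel_means((123.68, 116.779, 103.939)))  # (0.0, 0.0, 0.0)
```

Note that mean subtraction alone leaves the values on the original 0 to 255 scale, only shifted; no division by a standard deviation is configured in the cell above.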

### Define the Classifier

In [12]:
batch_size = 24 #1024 
num_epoch = 1

classifier = NNEstimator(xray_model, BinaryCrossEntropy(), transformer) \
            .setBatchSize(batch_size) \
            .setMaxEpoch(num_epoch) \
            .setFeaturesCol("image") \
            .setCachingSample(False) \
            .setValidation(EveryEpoch(), validationDF, [AUC()], batch_size) \
            .setOptimMethod(Adam())


creating: createZooKerasBinaryCrossEntropy
creating: createSeqToTensor
creating: createFeatureLabelPreprocessing
creating: createNNEstimator
creating: createEveryEpoch
creating: createAUC
creating: createAdam
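
The classifier is trained with a binary cross-entropy loss, which scores every label position independently. The sketch below shows the textbook formula for a single example in plain Python. It is an illustration under stated assumptions: the clipping constant `eps` and the averaging over labels are choices of this sketch, and BigDL's *BinaryCrossEntropy* may clip or reduce differently.

```python
from math import log


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over the label positions of one example."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total -= y * log(p) + (1.0 - y) * log(1.0 - p)
    return total / len(y_true)


target = [1.0, 0.0, 0.0]
undecided = binary_cross_entropy(target, [0.5, 0.5, 0.5])
good = binary_cross_entropy(target, [0.9, 0.1, 0.1])
bad = binary_cross_entropy(target, [0.1, 0.9, 0.9])

print(abs(undecided - log(2)) < 1e-12)  # True: 0.5 everywhere costs ln 2
print(good < undecided < bad)           # True
```

Confident wrong predictions are penalised heavily, so outputs that saturate near 0 or 1 on the wrong side produce large loss values.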


### Train the model

In [13]:
%%time
nnModel = classifier.fit(trainingDF)

22-04-08 11:03:06 [Thread-4] INFO  InternalDistriOptimizer$:987 - Sequential[cf42e75] isTorch is false
22-04-08 11:03:06 [Thread-4] INFO  DistriOptimizer$:826 - caching training rdd ...


                                                                                

22-04-08 11:05:12 [Thread-4] INFO  DistriOptimizer$:652 - Cache thread models...
22-04-08 11:05:12 [Executor task launch worker for task 0.0 in stage 21.0 (TID 42)] INFO  ThreadPool$:95 - Set mkl threads to 1 on thread 68

                                                                                

22-04-08 11:05:18 [Thread-4] INFO  DistriOptimizer$:168 - Count dataset complete. Time elapsed: 5.9276869s
22-04-08 11:05:18 [Thread-4] INFO  DistriOptimizer$:176 - config  {
	computeThresholdbatchSize: 100
	maxDropPercentage: 0.0
	warmupIterationNum: 200
	isLayerwiseScaled: false
	dropPercentage: 0.0
 }
22-04-08 11:05:18 [Thread-4] INFO  DistriOptimizer$:180 - Shuffle data
22-04-08 11:05:18 [Thread-4] INFO  DistriOptimizer$:183 - Shuffle data complete. Takes 0.0064058s


                                                                                

22-04-08 11:05:20 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 12/3900][Iteration 1][Wall Clock 1.7215938s] Trained 12.0 records in 1.7215938 seconds. Throughput is 6.970285 records/second. Loss is 9.7834425. 
22-04-08 11:05:20 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 24/3900][Iteration 2][Wall Clock 2.062314s] Trained 12.0 records in 0.3407202 seconds. Throughput is 35.219513 records/second. Loss is 16.885624. 


                                                                                

22-04-08 11:05:21 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 36/3900][Iteration 3][Wall Clock 2.8241109s] Trained 12.0 records in 0.7617969 seconds. Throughput is 15.752231 records/second. Loss is 10.745397. 


                                                                                

22-04-08 11:05:49 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 48/3900][Iteration 4][Wall Clock 31.1618067s] Trained 12.0 records in 28.3376958 seconds. Throughput is 0.4234642 records/second. Loss is 10.745397. 
22-04-08 11:05:50 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 60/3900][Iteration 5][Wall Clock 31.4708014s] Trained 12.0 records in 0.3089947 seconds. Throughput is 38.835617 records/second. Loss is 16.118095. 
22-04-08 11:05:50 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 72/3900][Iteration 6][Wall Clock 31.7597966s] Trained 12.0 records in 0.2889952 seconds. Throughput is 41.523182 records/second. Loss is 17.653152. 
22-04-08 11:05:50 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 84/3900][Iteration 7][Wall Clock 32.074583s] Trained 12.0 records in 0.3147864 seconds. Throughput is 38.121086 records/second. Loss is 11.512925. 
22-04-08 11:05:51 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 96/3900][Iteration 8][Wall Clock 32.5133858s] Trained 12.0 records in

                                                                                

22-04-08 11:05:53 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 168/3900][Iteration 14][Wall Clock 34.7891522s] Trained 12.0 records in 0.7996682 seconds. Throughput is 15.006224 records/second. Loss is 14.583039. 
22-04-08 11:05:53 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 180/3900][Iteration 15][Wall Clock 35.0875588s] Trained 12.0 records in 0.2984066 seconds. Throughput is 40.21359 records/second. Loss is 16.339521. 
22-04-08 11:05:54 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 192/3900][Iteration 16][Wall Clock 35.3626013s] Trained 12.0 records in 0.2750425 seconds. Throughput is 43.62962 records/second. Loss is 16.885624. 
22-04-08 11:05:54 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 204/3900][Iteration 17][Wall Clock 35.6648523s] Trained 12.0 records in 0.302251 seconds. Throughput is 39.7021 records/second. Loss is 16.182451. 
22-04-08 11:05:54 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 216/3900][Iteration 18][Wall Clock 35.940178s] Trained 12.0 record

                                                                                

22-04-08 11:05:56 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 288/3900][Iteration 24][Wall Clock 38.0497952s] Trained 12.0 records in 0.7597751 seconds. Throughput is 15.7941475 records/second. Loss is 13.815511. 
22-04-08 11:05:57 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 300/3900][Iteration 25][Wall Clock 38.316356s] Trained 12.0 records in 0.2665608 seconds. Throughput is 45.01787 records/second. Loss is 14.583039. 
22-04-08 11:05:57 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 312/3900][Iteration 26][Wall Clock 38.5764653s] Trained 12.0 records in 0.2601093 seconds. Throughput is 46.134453 records/second. Loss is 22.492996. 
22-04-08 11:05:57 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 324/3900][Iteration 27][Wall Clock 38.8615456s] Trained 12.0 records in 0.2850803 seconds. Throughput is 42.093403 records/second. Loss is 16.885626. 
22-04-08 11:05:57 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 336/3900][Iteration 28][Wall Clock 39.1327066s] Trained 12.0 r

                                                                                

22-04-08 11:06:00 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 348/3900][Iteration 29][Wall Clock 41.8896665s] Trained 12.0 records in 2.7569599 seconds. Throughput is 4.3526206 records/second. Loss is 7.675284. 
22-04-08 11:06:00 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 360/3900][Iteration 30][Wall Clock 42.1624704s] Trained 12.0 records in 0.2728039 seconds. Throughput is 43.98764 records/second. Loss is 17.653152. 
22-04-08 11:06:01 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 372/3900][Iteration 31][Wall Clock 42.4720106s] Trained 12.0 records in 0.3095402 seconds. Throughput is 38.767178 records/second. Loss is 14.583039. 
22-04-08 11:06:01 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 384/3900][Iteration 32][Wall Clock 42.737019s] Trained 12.0 records in 0.2650084 seconds. Throughput is 45.281586 records/second. Loss is 12.280454. 
22-04-08 11:06:01 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 396/3900][Iteration 33][Wall Clock 43.0410556s] Trained 12.0 rec

                                                                                

22-04-08 11:07:01 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 2748/3900][Iteration 229][Wall Clock 102.1760339s] Trained 12.0 records in 3.2080689 seconds. Throughput is 3.7405682 records/second. Loss is 15.350568. 
22-04-08 11:07:01 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 2760/3900][Iteration 230][Wall Clock 102.4503546s] Trained 12.0 records in 0.2743207 seconds. Throughput is 43.744423 records/second. Loss is 16.885624. 
22-04-08 11:07:01 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 2772/3900][Iteration 231][Wall Clock 102.7226547s] Trained 12.0 records in 0.2723001 seconds. Throughput is 44.069027 records/second. Loss is 16.885626. 
22-04-08 11:07:01 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 2784/3900][Iteration 232][Wall Clock 103.0156206s] Trained 12.0 records in 0.2929659 seconds. Throughput is 40.9604 records/second. Loss is 14.583039. 
22-04-08 11:07:02 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 2796/3900][Iteration 233][Wall Clock 103.3213831s] 

                                                                                

22-04-08 11:07:15 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 3264/3900][Iteration 272][Wall Clock 116.5966883s] Trained 12.0 records in 2.2240936 seconds. Throughput is 5.3954563 records/second. Loss is 12.280454. 
22-04-08 11:07:15 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 3276/3900][Iteration 273][Wall Clock 116.8511102s] Trained 12.0 records in 0.2544219 seconds. Throughput is 47.165752 records/second. Loss is 13.105987. 
22-04-08 11:07:16 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 3288/3900][Iteration 274][Wall Clock 117.1280989s] Trained 12.0 records in 0.2769887 seconds. Throughput is 43.32307 records/second. Loss is 13.815511. 
22-04-08 11:07:16 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 3300/3900][Iteration 275][Wall Clock 117.3959143s] Trained 12.0 records in 0.2678154 seconds. Throughput is 44.80698 records/second. Loss is 10.745397. 
22-04-08 11:07:16 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 3312/3900][Iteration 276][Wall Clock 117.6792156s] 

                                                                                

22-04-08 11:07:19 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 3336/3900][Iteration 278][Wall Clock 120.3311104s] Trained 12.0 records in 2.2566261 seconds. Throughput is 5.317673 records/second. Loss is 12.280454. 
22-04-08 11:07:19 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 3348/3900][Iteration 279][Wall Clock 120.6034611s] Trained 12.0 records in 0.2723507 seconds. Throughput is 44.060837 records/second. Loss is 14.583039. 
22-04-08 11:07:19 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 3360/3900][Iteration 280][Wall Clock 120.8654045s] Trained 12.0 records in 0.2619434 seconds. Throughput is 45.811424 records/second. Loss is 13.815511. 
22-04-08 11:07:20 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 3372/3900][Iteration 281][Wall Clock 121.1064084s] Trained 12.0 records in 0.2410039 seconds. Throughput is 49.791725 records/second. Loss is 13.815511. 
22-04-08 11:07:20 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 1 3384/3900][Iteration 282][Wall Clock 121.3504644s]

                                                                                

22-04-08 11:07:34 [Thread-4] INFO  DistriOptimizer$:178 - [Epoch 1 3900/3900][Iteration 325][Wall Clock 133.24563s] validate model throughput is 116.71296 records/second
22-04-08 11:07:34 [Thread-4] INFO  DistriOptimizer$:181 - [Epoch 1 3900/3900][Iteration 325][Wall Clock 133.24563s] AucScore is (Average score: 0.5, count: 279)
score for each class is:
0.5 
0.5 
0.5 

22-04-08 11:07:34 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 2 12/3900][Iteration 326][Wall Clock 133.6965396s] Trained 12.0 records in 0.2743947 seconds. Throughput is 43.732624 records/second. Loss is 9.977869. 
22-04-08 11:07:35 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 2 24/3900][Iteration 327][Wall Clock 133.9783702s] Trained 12.0 records in 0.2818306 seconds. Throughput is 42.578766 records/second. Loss is 8.9369135. 
22-04-08 11:07:35 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 2 36/3900][Iteration 328][Wall Clock 134.2820797s] Trained 12.0 records in 0.3037095 seconds. Throughput is 39.51144 records/s

                                                                                

22-04-08 11:08:24 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 2 2148/3900][Iteration 504][Wall Clock 183.3708562s] Trained 12.0 records in 2.2884894 seconds. Throughput is 5.2436337 records/second. Loss is 15.350568. 
22-04-08 11:08:24 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 2 2160/3900][Iteration 505][Wall Clock 183.6351102s] Trained 12.0 records in 0.264254 seconds. Throughput is 45.410854 records/second. Loss is 17.653152. 
22-04-08 11:08:25 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 2 2172/3900][Iteration 506][Wall Clock 183.9137321s] Trained 12.0 records in 0.2786219 seconds. Throughput is 43.06912 records/second. Loss is 14.583039. 
22-04-08 11:08:25 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 2 2184/3900][Iteration 507][Wall Clock 184.1692915s] Trained 12.0 records in 0.2555594 seconds. Throughput is 46.955814 records/second. Loss is 16.118095. 
22-04-08 11:08:25 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 2 2196/3900][Iteration 508][Wall Clock 184.4339325s] 

                                                                                

22-04-08 11:09:05 [Thread-4] INFO  DistriOptimizer$:178 - [Epoch 2 3900/3900][Iteration 650][Wall Clock 222.2850476s] validate model throughput is 125.376015 records/second
22-04-08 11:09:05 [Thread-4] INFO  DistriOptimizer$:181 - [Epoch 2 3900/3900][Iteration 650][Wall Clock 222.2850476s] AucScore is (Average score: 0.5, count: 279)
score for each class is:
0.5 
0.5 
0.5 

22-04-08 11:09:06 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 3 12/3900][Iteration 651][Wall Clock 225.0576635s] Trained 12.0 records in 0.2632454 seconds. Throughput is 45.584843 records/second. Loss is 10.745397. 
22-04-08 11:09:06 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 3 24/3900][Iteration 652][Wall Clock 225.327215s] Trained 12.0 records in 0.2695515 seconds. Throughput is 44.5184 records/second. Loss is 11.512925. 
22-04-08 11:09:06 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 3 36/3900][Iteration 653][Wall Clock 225.6219706s] Trained 12.0 records in 0.2947556 seconds. Throughput is 40.711693 recor

                                                                                

22-04-08 11:11:04 [Thread-4] INFO  DistriOptimizer$:178 - [Epoch 3 3900/3900][Iteration 975][Wall Clock 340.6279669s] validate model throughput is 121.763504 records/second
22-04-08 11:11:04 [Thread-4] INFO  DistriOptimizer$:181 - [Epoch 3 3900/3900][Iteration 975][Wall Clock 340.6279669s] AucScore is (Average score: 0.5, count: 279)
score for each class is:
0.5 
0.5 
0.5 

22-04-08 11:11:04 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 4 12/3900][Iteration 976][Wall Clock 343.3506747s] Trained 12.0 records in 0.3887617 seconds. Throughput is 30.867239 records/second. Loss is 13.815511. 
22-04-08 11:11:04 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 4 24/3900][Iteration 977][Wall Clock 343.7445055s] Trained 12.0 records in 0.3938308 seconds. Throughput is 30.469938 records/second. Loss is 8.442812. 
22-04-08 11:11:05 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 4 36/3900][Iteration 978][Wall Clock 344.1341199s] Trained 12.0 records in 0.3896144 seconds. Throughput is 30.799683 rec

KeyboardInterrupt: 

22-04-08 11:12:59 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 4 3804/3900][Iteration 1292][Wall Clock 457.9159226s] Trained 12.0 records in 0.3901018 seconds. Throughput is 30.761202 records/second. Loss is 13.815511. 
22-04-08 11:12:59 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 4 3816/3900][Iteration 1293][Wall Clock 458.2759996s] Trained 12.0 records in 0.360077 seconds. Throughput is 33.326206 records/second. Loss is 13.815511. 
22-04-08 11:12:59 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 4 3828/3900][Iteration 1294][Wall Clock 458.6269339s] Trained 12.0 records in 0.3509343 seconds. Throughput is 34.194435 records/second. Loss is 15.350568. 
22-04-08 11:13:00 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 4 3840/3900][Iteration 1295][Wall Clock 459.0125246s] Trained 12.0 records in 0.3855907 seconds. Throughput is 31.121082 records/second. Loss is 16.118095. 
22-04-08 11:13:00 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 4 3852/3900][Iteration 1296][Wall Clock 459.3737

                                                                                

22-04-08 11:13:04 [Thread-4] INFO  DistriOptimizer$:178 - [Epoch 4 3900/3900][Iteration 1300][Wall Clock 460.859985s] validate model throughput is 126.53436 records/second
22-04-08 11:13:04 [Thread-4] INFO  DistriOptimizer$:181 - [Epoch 4 3900/3900][Iteration 1300][Wall Clock 460.859985s] AucScore is (Average score: 0.5, count: 279)
score for each class is:
0.5 
0.5 
0.5 

22-04-08 11:13:04 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 5 12/3900][Iteration 1301][Wall Clock 463.6141068s] Trained 12.0 records in 0.3780632 seconds. Throughput is 31.740725 records/second. Loss is 12.337432. 
22-04-08 11:13:04 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 5 24/3900][Iteration 1302][Wall Clock 463.9764139s] Trained 12.0 records in 0.3623071 seconds. Throughput is 33.121075 records/second. Loss is 16.118095. 
22-04-08 11:13:05 [Thread-4] INFO  DistriOptimizer$:433 - [Epoch 5 36/3900][Iteration 1303][Wall Clock 464.3415027s] Trained 12.0 records in 0.3650888 seconds. Throughput is 32.868717 
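
The counters in the log can be cross-checked with a little arithmetic: it reports 3900 records per epoch and 12 records per iteration, and epoch 1 ends at iteration 325. The sketch below reproduces those numbers in plain Python; both helper names are hypothetical and serve only this illustration.

```python
from math import ceil


def iterations_per_epoch(num_records, batch_size):
    """Number of optimizer steps needed to see every record once."""
    return ceil(num_records / batch_size)


def epoch_of_iteration(iteration, num_records, batch_size):
    """1-based epoch that contains the given 1-based iteration."""
    return (iteration - 1) // iterations_per_epoch(num_records, batch_size) + 1


print(iterations_per_epoch(3900, 12))      # 325
print(epoch_of_iteration(326, 3900, 12))   # 2
print(epoch_of_iteration(1301, 3900, 12))  # 5
print(iterations_per_epoch(3900, 24))      # 163
```

The last line shows what the same data would need with the batch size of 24 set in the classifier cell.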

### Evaluate the model and report the AUC for the validation data

In [25]:
print("Evaluating the model on validation data:")
evaluate_and_plot(validationDF)

Evaluating the model on validation data:
22-04-08 10:14:34 [Thread-4] INFO  NNModel:730 - Batch per thread: 2; Total number of cores: 6; Global batch size: 12
22-04-08 10:14:35 [Executor task launch worker for task 0.0 in stage 257.0 (TID 149)] INFO  ThreadPool$:95 - Set mkl threads to 1 on thread 768


                                                                                

ROC score for                Atelectasis  is:  0.0                 
ROC score for               Cardiomegaly  is:  0.5                 
ROC score for                   Effusion  is:  0.5                 
ROC score for               Infiltration  is:  0.5                 
ROC score for                       Mass  is:  0.0                 
ROC score for                     Nodule  is:  0.0                 
ROC score for                  Pneumonia  is:  0.0                 
ROC score for               Pneumothorax  is:  0.0                 
ROC score for              Consolidation  is:  0.0                 
ROC score for                      Edema  is:  0.0                 
ROC score for                  Emphysema  is:  0.5                 
ROC score for                   Fibrosis  is:  0.0                 
ROC score for         Pleural_Thickening  is:  0.0                 
ROC score for                     Hernia  is:  0.5                 
Average AUC:  0.17857142857142858


### Save the model for inference

In [27]:
save_path = model_path + '/xray_model_classif'
nnModel.model.saveModel(save_path + ".bigdl", save_path + ".bin", True)
print('Model saved at: ', save_path)

Model saved at:  /opt/application/data/model/xray_model_classif
