<a href="https://colab.research.google.com/github/williamsaraiva/cancer_diag_with_keras/blob/master/cancer_diag_with_keras.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Object Classification

[Image classification](https://en.wikipedia.org/wiki/Computer_vision#Recognition) (or image recognition) is one of the main use cases for deep learning. The goal of this task is to train a model capable of identifying objects of interest in an image.

### Melanoma Classification

In this notebook, we will build a model to identify malignant melanomas.

<img src="https://github.com/williamsaraiva/cancer_diag_with_keras/blob/master/exemp-google.jpg?raw=true" width="700" height="450" align="center"/>

We will use the [ISIC-Archive - The International Skin Imaging Collaboration: Melanoma Project](https://www.isic-archive.com/#!/topWithHeader/onlyHeaderTop/gallery) dataset for training, with more than 20000 images of benign lesions and more than 2000 images of malignant melanomas.

We rely on pre-trained models (see [Using Pre-Trained Models](https://keras.rstudio.com/articles/applications.html)) and build new layers on top of the [VGG19 model](https://www.kaggle.com/keras/vgg19) to classify the images as benign or malignant.


We will:
- Preprocess the images;
- Build new layers on top of the VGG19 model using Keras and TensorFlow;
- Estimate the performance of our model on a test set.

Let's go! 🚀

## Initial Setup

Let's install and import the libraries the project needs. Since we are running on Google Colab infrastructure, we need to mount Google Drive as a virtual disk to access the directories.

In [1]:
from google.colab import drive

drive.mount('/content/drive/', force_remount=True)


Mounted at /content/drive/


In [2]:
!pip install appdirs
!pip install cycler
!pip install decorator
!pip install h5py
!pip install pydot
!pip install Keras
!pip install matplotlib
!pip install networkx
!pip install numpy
!pip install olefile
!pip install packaging
!pip install Pillow
!pip install protobuf
!pip install pydevd
!pip install pyparsing
!pip install python-dateutil
!pip install pytz
!pip install PyWavelets
!pip install PyYAML
!pip install scikit-image
!pip install scipy
!pip install six
!pip install tensorflow
!pip install Theano
!pip install tqdm
!pip install Werkzeug


!apt install python-pydot python-pydot-ng graphviz


In [0]:
import glob
import os
import sys
import random
import tqdm
import keras
import json

import numpy as np

import scipy.ndimage
import scipy.misc

import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec


from collections import defaultdict
#from train import DataGenerator

# Import every layer from the public keras.layers namespace rather than its internal submodules.
from keras.layers import Input, Average, Dense, Flatten, Dropout, Concatenate
from keras.layers import BatchNormalization, GlobalAveragePooling2D, GlobalMaxPooling2D
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Model, load_model





## Data Preprocessing

This step reads the image files, decodes the JPEGs, resizes them, and stores each result in an `.npz` file so that the operation does not have to be repeated. 🙂
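The caching idea can be sketched without the deprecated SciPy helpers. This is a minimal illustration, not the notebook's code: `resize_nearest` and `cache_resized` are hypothetical names, and a plain NumPy nearest-neighbor resize stands in for `scipy.misc.imresize` (which interpolates differently), so pixel values will not match exactly.

```python
import os
import tempfile

import numpy as np


def resize_nearest(pixels, size):
    """Nearest-neighbor resize of an (H, W, C) array to (size[0], size[1], C)."""
    height, width = pixels.shape[:2]
    # For each output row/column, pick the index of the source row/column it maps to.
    row_idx = np.arange(size[0]) * height // size[0]
    col_idx = np.arange(size[1]) * width // size[1]
    return pixels[row_idx][:, col_idx]


def cache_resized(pixels, npz_path, size=(256, 256)):
    """Resize and store the image once; later calls read the cached array."""
    if not os.path.exists(npz_path):
        np.savez_compressed(npz_path, pixels=resize_nearest(pixels, size))
    with np.load(npz_path) as data:
        return data['pixels']


# Usage with a synthetic image (assumption: real inputs would be decoded JPEGs).
with tempfile.TemporaryDirectory() as tmp_dir:
    npz_path = os.path.join(tmp_dir, 'lesion.npz')
    fake_image = np.random.randint(0, 256, size=(300, 400, 3), dtype=np.uint8)
    first = cache_resized(fake_image, npz_path)
    # The second call ignores its pixel argument because the cache file already exists.
    second = cache_resized(np.zeros_like(fake_image), npz_path)
```

The existence check is what keeps the expensive decode-and-resize step from running more than once per image.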

In [11]:
IMG_SIZE = (256, 256)

if __name__ == '__main__':
    
    data_dir = '/content/drive/My Drive/Colab Notebooks/DeepLearningImpactaOPE/cancer/test'

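    # Images are expected one directory level below data_dir (data_dir/<class>/<image>.jpg).
    input_data = list(glob.glob(os.path.join(data_dir, '**/*.jpg')))

    for image_path in tqdm.tqdm(input_data):
        # Decode the JPEG into a pixel array, then resize it to IMG_SIZE.
        image_pixels = scipy.ndimage.imread(image_path)
        resized_image_pixels = scipy.misc.imresize(image_pixels, IMG_SIZE)
        # Cache the resized pixels next to the source image, swapping .jpg for .npz.
        image_basepath, _ = os.path.splitext(image_path)
        # np.savez has no `compressed` flag (it would be stored as an extra array);
        # np.savez_compressed is the call that actually compresses.
        np.savez_compressed(image_basepath + '.npz', pixels=resized_image_pixels)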


`imread` is deprecated in SciPy 1.0.0.
Use ``matplotlib.pyplot.imread`` instead.
  if sys.path[0] == '':
`imresize` is deprecated in SciPy 1.0.0, and will be removed in 1.2.0.
Use ``skimage.transform.resize`` instead.
  del sys.path[0]


## Data

The ISIC-Melanoma-Project dataset is attached to our project. It will be available at `/content/drive/My Drive/Colab Notebooks/input/cancer/pics/`.

*Note:* All parameters are set to the defaults suggested by the [Keras](https://keras.io/) documentation.

We will build a model ([VGG19](https://keras.io/applications/#vgg19)), organize and load the data, and run the training.

*Note:* The `epochs` parameter is the number of passes the network makes over the training data; in this notebook one pass consists of `steps_per_epoch` batches drawn from the training generator. Note that a few epochs are already enough to obtain a good result.
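As a quick check of what one epoch means here, the arithmetic below uses the values from this notebook's training cell (`BATCH_SIZE = 64`, `steps_per_epoch = 200`, `epochs = 1`). The upper-case names other than `BATCH_SIZE` are introduced only for this illustration. Because the generator picks images at random, the result counts samples drawn, not distinct images.

```python
# Values taken from the training cell of this notebook.
BATCH_SIZE = 64
STEPS_PER_EPOCH = 200
EPOCHS = 1

# Each step consumes one batch, so one epoch draws BATCH_SIZE * STEPS_PER_EPOCH samples.
samples_per_epoch = BATCH_SIZE * STEPS_PER_EPOCH
total_samples = samples_per_epoch * EPOCHS

print("samples drawn per epoch:", samples_per_epoch)
print("samples drawn in total:", total_samples)
```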

In [7]:
def get_model(pretrained_model, all_classes_names):
    if pretrained_model == 'inception':
        model_base = keras.applications.inception_v3.InceptionV3(include_top=False, input_shape=(*IMG_SIZE, 3), weights='imagenet')
        output = Flatten()(model_base.output)
    elif pretrained_model == 'xception':
        model_base = keras.applications.xception.Xception(include_top=False, input_shape=(*IMG_SIZE, 3), weights='imagenet')
        output = Flatten()(model_base.output)
    elif pretrained_model == 'resnet50':
        model_base = keras.applications.resnet50.ResNet50(include_top=False, input_shape=(*IMG_SIZE, 3), weights='imagenet')
        output = Flatten()(model_base.output)
    elif pretrained_model == 'vgg19':
        model_base = keras.applications.vgg19.VGG19(include_top=False, input_shape=(*IMG_SIZE, 3), weights='imagenet')
        output = Flatten()(model_base.output)
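    elif pretrained_model == 'all':
        # Feed one shared input tensor to the three backbones and concatenate their features.
        # It is named input_tensor to avoid shadowing the built-in input().
        input_tensor = Input(shape=(*IMG_SIZE, 3))
        inception_model = keras.applications.inception_v3.InceptionV3(include_top=False, input_tensor=input_tensor, weights='imagenet')
        xception_model = keras.applications.xception.Xception(include_top=False, input_tensor=input_tensor, weights='imagenet')
        resnet_model = keras.applications.resnet50.ResNet50(include_top=False, input_tensor=input_tensor, weights='imagenet')

        flattened_outputs = [Flatten()(inception_model.output),
                             Flatten()(xception_model.output),
                             Flatten()(resnet_model.output)]
        output = Concatenate()(flattened_outputs)
        model_base = Model(input_tensor, output)
    else:
        # Fail early with a clear message instead of an unbound-variable error further down.
        raise ValueError("unknown pretrained_model: {!r}".format(pretrained_model))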

    output = BatchNormalization()(output)
    output = Dropout(0.5)(output)
    output = Dense(128, activation='relu')(output)
    output = BatchNormalization()(output)
    output = Dropout(0.5)(output)
    output = Dense(len(all_classes_names), activation='softmax')(output)
    model = Model(model_base.input, output)
    for layer in model_base.layers:
        layer.trainable = False
    model.summary(line_length=200)

    # Generate a plot of a model
    import pydot
    pydot.find_graphviz = lambda: True
    from keras.utils import plot_model
    plot_model(model, show_shapes=True, to_file='/content/drive/My Drive/Colab Notebooks/DeepLearningImpactaOPE/cancer/model_pdfs/{}.pdf'.format(pretrained_model))

    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

BATCH_SIZE = 64
IMG_SIZE = (256, 256)

image_datagen = ImageDataGenerator(
    rotation_range=15,
    width_shift_range=.15,
    height_shift_range=.15,
    shear_range=0.15,
    zoom_range=0.15,
    channel_shift_range=1,
    horizontal_flip=True,
    vertical_flip=False,)

class DataEncoder():
    def __init__(self, all_classes_names):
        self.all_classes_names = all_classes_names

    def one_hot_index(self, class_name):
        return self.all_classes_names.index(class_name)

    def one_hot_decode(self, predicted_labels):
        return dict(zip(self.all_classes_names, predicted_labels))

    def one_hot_encode(self, class_name):
        one_hot_encoded_vector = np.zeros(len(self.all_classes_names))
        idx = self.one_hot_index(class_name)
        one_hot_encoded_vector[idx] = 1
        return one_hot_encoded_vector


import zlib  # provides a hash that is stable across Python sessions


class DataGenerator():
    def __init__(self, data_path):
        self.data_path = data_path
        self.partition_to_class_name_to_npz_paths = {
            'train': defaultdict(list),
            'validation': defaultdict(list),
            'test': defaultdict(list),
        }
        self.all_classes_names = set()
        npz_file_listing = list(glob.glob(os.path.join(data_path, '**/*.npz')))
        for npz_path in npz_file_listing:
            # The class name is the directory that holds the file.
            class_name = os.path.basename(os.path.dirname(npz_path))
            self.all_classes_names.add(class_name)
            # Built-in hash() of a str is salted per process in Python 3, so it would
            # reshuffle the split on every restart; CRC32 of the path is deterministic.
            bucket = zlib.crc32(npz_path.encode('utf-8')) % 10
            if bucket < 7:
                partition = 'train'        # buckets 0-6, roughly 70%
            elif bucket < 9:
                partition = 'validation'   # buckets 7-8, roughly 20%
            else:
                partition = 'test'         # bucket 9, roughly 10%
            self.partition_to_class_name_to_npz_paths[partition][class_name].append(npz_path)
        self.encoder = DataEncoder(sorted(list(self.all_classes_names)))


    def _pair_generator(self, partition, augmented=True):
        while True:
            for class_name, npz_paths in self.partition_to_class_name_to_npz_paths[partition].items():
                npz_path = random.choice(npz_paths)
                pixels = np.load(npz_path)['pixels']
                one_hot_encoded_labels = self.encoder.one_hot_encode(class_name)
                if augmented:
                    augmented_pixels = next(image_datagen.flow(np.array([pixels])))[0].astype(np.uint8)
                    yield augmented_pixels, one_hot_encoded_labels
                else:
                    yield pixels, one_hot_encoded_labels


    def batch_generator(self, partition, batch_size, augmented=True):
        while True:
            data_gen = self._pair_generator(partition, augmented)
            pixels_batch, one_hot_encoded_class_name_batch = zip(*[next(data_gen) for _ in range(batch_size)])
            pixels_batch = np.array(pixels_batch)
            one_hot_encoded_class_name_batch = np.array(one_hot_encoded_class_name_batch)
            yield pixels_batch, one_hot_encoded_class_name_batch


if __name__ == '__main__':

    pretrained_model = 'vgg19'  # one of: 'inception', 'xception', 'resnet50', 'vgg19', 'all'
    data_dir = '/content/drive/My Drive/Colab Notebooks/DeepLearningImpactaOPE/cancer/pics/'
    weight_directory = '/content/drive/My Drive/Colab Notebooks/DeepLearningImpactaOPE/cancer/weight/'
    tensorboard_directory = '/content/drive/My Drive/Colab Notebooks/DeepLearningImpactaOPE/cancer/logdir/'
    
    epochs = 1
    
    tensorboard_callback = keras.callbacks.TensorBoard(log_dir=tensorboard_directory, 
                                                       histogram_freq=0,
                                                       write_graph=True,
                                                       write_images=False)
    save_model_callback = keras.callbacks.ModelCheckpoint(os.path.join(weight_directory, 'weights.{epoch:02d}.h5'),
                                                          verbose=3,
                                                          save_best_only=False,
                                                          save_weights_only=False,
                                                          mode='auto',
                                                          period=1)

    data_generator = DataGenerator(data_dir)
    model = get_model(pretrained_model, data_generator.encoder.all_classes_names)

    model.fit_generator(
        data_generator.batch_generator('train', batch_size=BATCH_SIZE),
        steps_per_epoch=200,
        epochs=epochs,
        validation_data=data_generator.batch_generator('validation', batch_size=BATCH_SIZE, augmented=False),
        validation_steps=10,
        callbacks=[save_model_callback, tensorboard_callback],
        workers=4,
        use_multiprocessing=True,  # current name of the legacy pickle_safe argument
    )

_________________________________________________________________
Layer (type)                 Output Shape              Param #
input_2 (InputLayer)         (None, 256, 256, 3)       0
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 256, 256, 64)      1792



Epoch 1/1

Epoch 00001: saving model to /content/drive/My Drive/Colab Notebooks/DeepLearningImpactaOPE/cancer/weight/weights.01.h5
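
The outline at the top of this notebook lists estimating the model on a test set. A minimal NumPy sketch of that measurement follows; `accuracy_and_confusion` is a hypothetical helper and the labels and probabilities are made-up toy values, not results from this model.

```python
import numpy as np


def accuracy_and_confusion(one_hot_labels, probabilities):
    """Return (accuracy, confusion matrix) with rows = true class, columns = predicted class."""
    true_idx = np.argmax(one_hot_labels, axis=1)
    pred_idx = np.argmax(probabilities, axis=1)
    num_classes = one_hot_labels.shape[1]
    confusion = np.zeros((num_classes, num_classes), dtype=int)
    # Count one occurrence for every (true, predicted) pair.
    np.add.at(confusion, (true_idx, pred_idx), 1)
    accuracy = float(np.mean(true_idx == pred_idx))
    return accuracy, confusion


# Toy data: four samples, two classes (illustrative values only).
labels = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
probs = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.3, 0.7]])
accuracy, confusion = accuracy_and_confusion(labels, probs)
print(accuracy)
print(confusion)
```

In this notebook the inputs would presumably come from `model.predict` applied to batches drawn with `data_generator.batch_generator('test', batch_size=BATCH_SIZE, augmented=False)`. With far more benign than malignant images, the confusion matrix is more informative than accuracy alone.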


## Visualization

Here we can visualize the predictions for our dataset, given a model weights file. This step also needs:

- a directory containing all the input images;
- an output directory for the generated plots.
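
Each plot shows the most probable classes next to the image. The ranking step can be sketched as below; `top_k_classes` is a hypothetical helper, the class name `'maligno'` is an assumption (only `'benigno'` appears in the paths of this notebook), and the probabilities are made up.

```python
import numpy as np


def top_k_classes(probabilities, class_names, k=3):
    """Pair each class name with its probability and keep the k most probable, highest first."""
    order = np.argsort(probabilities)[::-1][:k]
    return [(class_names[i], float(probabilities[i])) for i in order]


class_names = ['benigno', 'maligno']   # 'maligno' is assumed for illustration
probabilities = np.array([0.8, 0.2])   # stand-in for one row of model.predict output
top = top_k_classes(probabilities, class_names)
print(top)
```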

In [13]:
num_columns = 6
num_rows = 3

def plot_row_item(image_ax, labels_ax, pixels, top_classes_names, top_class_probabilities):
    image_ax.imshow(pixels, interpolation='nearest', aspect='auto')
    y_pos = np.arange(len(top_classes_names))*0.11
    labels_ax.barh(y_pos, top_class_probabilities, height=0.1, align='center',
            color='cyan', ecolor='black')
    labels_ax.set_xlim([0,1])
    labels_ax.set_yticks(y_pos)
    labels_ax.set_yticklabels(top_classes_names, position=(1,0))
    labels_ax.invert_yaxis()
    # Hide tick marks and bottom labels; tick_params expects booleans rather than 'off'.
    labels_ax.tick_params(
        axis='both',
        which='both',
        bottom=False,
        top=False,
        labelbottom=False)
    image_ax.axis('off')

def plot_prediction(pixels, model, data_encoder):
    fig = plt.figure()
    inner = gridspec.GridSpec(2, 1, wspace=0.05, hspace=0, height_ratios=[5, 1.2])
    image_ax = plt.Subplot(fig, inner[0])
    labels_ax = plt.Subplot(fig, inner[1])

    # Predict on a batch of one image and map each probability back to its class name.
    predicted_labels = model.predict(np.array([pixels]), batch_size=1)
    class_name_to_probability = data_encoder.one_hot_decode(predicted_labels[0].astype(np.float64))
    # Keep the three most probable classes, highest first.
    top_class_probability = sorted(class_name_to_probability.items(),
                                       key=lambda item_tup: item_tup[1],
                                       reverse=True)[:3]
    top_classes_names, top_class_probabilities = zip(*top_class_probability)

    plot_row_item(image_ax, labels_ax, pixels, top_classes_names, top_class_probabilities)

    fig.add_subplot(image_ax)
    fig.add_subplot(labels_ax)
    return fig


if __name__ == '__main__':

    weight_file = '/content/drive/My Drive/Colab Notebooks/DeepLearningImpactaOPE/cancer/weight/weights.01.h5'
    data_directory = '/content/drive/My Drive/Colab Notebooks/DeepLearningImpactaOPE/cancer/pics/'
    output_directory = '/content/drive/My Drive/Colab Notebooks/DeepLearningImpactaOPE/cancer/output/'
    # Kept as a list under its own name so the loop variable below does not shadow it.
    image_paths = list(glob.glob(os.path.join('/content/drive/My Drive/Colab Notebooks/DeepLearningImpactaOPE/cancer/test/benigno/', '*.npz')))

    model = load_model(weight_file)
    data_encoder = DataGenerator(data_directory).encoder

    print("{} input image(s) found. Beginning prediction plotting.".format(len(image_paths)))

    # Save one prediction plot per cached image.
    for image_path in tqdm.tqdm(image_paths, unit='image'):
        pixels = np.load(image_path)['pixels']
        fig = plot_prediction(pixels, model, data_encoder)
        plt.savefig(os.path.join(output_directory, os.path.basename(image_path) + 'predictions.png'))
        plt.close(fig)
        




216 input image(s) found. Beginning prediction plotting.


