# VQA - MED

This notebook demonstrates the work done on Visual Question Answering (VQA), based on the data set from the VQA-Med 2018 contest.

## Abstract

The inputs for VQA are:
1. The question text 
2. The image

The question text is embedded into a feature vector using a pre-trained [GloVe file](https://nlp.stanford.edu/projects/glove/).

In a similar manner, the image is processed using a pre-trained deep neural network (e.g. [VGG](http://qr.ae/TUTEKo) initialized with the weights of a pre-trained [ImageNet model](https://en.wikipedia.org/wiki/ImageNet)).


## The plan

0. [Preparations and helpers](#Preparations-and-helpers)
1. [Collecting pre processing item](#Collecting-pre-processing-item)
2. [Preprocessing and creating meta data](#Preprocessing-and-creating-meta-data)
3. [Creating the model](#Creating-the-model)
4. [Training the model](#Training-the-model)
5. [Testing the model](#Testing-the-model)


### Preparations and helpers

The following are just helpers & utils imports - feel free to skip...

In [53]:
from parsers.utils import VerboseTimer
from utils.os_utils import File, print_progress
import time, datetime

def get_time_stamp():
    now = time.time()
    ts = datetime.datetime.fromtimestamp(now).strftime('%Y%m%d_%H%M_%S')
    return ts


### Collecting pre processing item

###### Download pre trained items & store their location

In [2]:
# TODO: add downloading of the GloVe file
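
The TODO above could be filled in along these lines. This is a minimal sketch, not the notebook's own code: the archive URL and the helper names `glove_member_name` and `download_glove` are assumptions. The archive is large, so nothing is downloaded unless the function is called explicitly.

```python
import os
import urllib.request
import zipfile

# Assumption: public location of the GloVe 6B archive; verify it before relying on it.
GLOVE_ZIP_URL = "https://nlp.stanford.edu/data/glove.6B.zip"


def glove_member_name(dim):
    """Name of the text file inside the archive for a given embedding size."""
    return "glove.6B.{0}d.txt".format(dim)


def download_glove(target_dir, dim=300, url=GLOVE_ZIP_URL):
    """Download the archive (if needed) and extract one embedding file.

    Returns the absolute path of the extracted text file.
    """
    os.makedirs(target_dir, exist_ok=True)
    member = glove_member_name(dim)
    txt_path = os.path.join(target_dir, member)
    if os.path.isfile(txt_path):
        return os.path.abspath(txt_path)  # already extracted, nothing to do

    zip_path = os.path.join(target_dir, os.path.basename(url))
    if not os.path.isfile(zip_path):
        urllib.request.urlretrieve(url, zip_path)  # large download
    with zipfile.ZipFile(zip_path) as archive:
        archive.extract(member, target_dir)
    return os.path.abspath(txt_path)
```

Calling `download_glove('data', dim=embedding_dim)` would then yield a path that could be used as `glove_path`.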

In [35]:
import os
seq_length =    26
embedding_dim = 300

glove_path =                    os.path.abspath('data/glove.6B.{0}d.txt'.format(embedding_dim))
embedding_matrix_filename =     os.path.abspath('data/ckpts/embeddings_{0}.h5'.format(embedding_dim))
ckpt_model_weights_filename =   os.path.abspath('data/ckpts/model_weights.h5')




DEFAULT_IMAGE_WIEGHTS = 'imagenet'
# VGG was trained on 224x224 images, so every new image
# must be resized to the same dimensions
image_size_by_base_models = {'imagenet': (224, 224)}


In [4]:
import os
# Fail fast...
prefix = "Failing fast:\n"
assert os.path.isfile(glove_path), prefix+"glove file does not exist:\n{0}".format(glove_path)
# assert os.path.isfile(embedding_matrix_filename), prefix+"Embedding matrix file does not exist:\n{0}".format(embedding_matrix_filename)
assert os.path.isfile(ckpt_model_weights_filename), prefix+"model weights file does not exist:\n{0}".format(ckpt_model_weights_filename)

print('Validated file locations')

Validated file locations
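
A variant of the fail-fast check, sketched under the assumption that a single error listing every missing file is preferable. `assert` statements are skipped when Python runs with `-O`, so this version raises an explicit exception instead. The helper name `require_files` is hypothetical.

```python
import os


def require_files(named_paths):
    """Raise FileNotFoundError listing every missing file in `named_paths`.

    `named_paths` maps a human-readable description to a file path.
    """
    missing = ["{0}: {1}".format(name, path)
               for name, path in named_paths.items()
               if not os.path.isfile(path)]
    if missing:
        raise FileNotFoundError("Failing fast, missing files:\n" + "\n".join(missing))


# Usage with the variables defined above (commented out so the sketch stays standalone):
# require_files({'GloVe file': glove_path,
#                'model weights': ckpt_model_weights_filename})
```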


##### Set locations for pre-training items to-be created

In [5]:
# Pre process results files
data_prepo_meta            = os.path.abspath('data/my_data_prepro.json')
data_prepo_meta_validation = os.path.abspath('data/my_data_prepro_validation.json')
# Location of the pre-trained embedding matrix
embedding_matrix_filename  = os.path.abspath('data/ckpts/embeddings_{0}.h5'.format(embedding_dim))

# The location to dump models to
vqa_models_folder          = "C:\\Users\\Public\\Documents\\Data\\2018\\vqa_models"



### Preprocessing and creating meta data

We will use this function for creating meta data:

In [6]:
from vqa_logger import logger 
import itertools
import string
from utils.os_utils import File  # This is a simple helper file of mine...

def create_meta(meta_file_location, df):
        logger.debug("Creating meta data ('{0}')".format(meta_file_location))
        def get_unique_words(col):
            single_string = " ".join(df[col])
            exclude = set(string.punctuation)
            s_no_punctuation = ''.join(ch for ch in single_string if ch not in exclude)
            unique_words = set(s_no_punctuation.split(" ")).difference({'',' '})
            print("column {0} had {1} unique words".format(col,len(unique_words)))
            return unique_words

        cols = ['question', 'answer']
        unique_words = set(itertools.chain.from_iterable([get_unique_words(col) for col in cols]))
        print("total unique words: {0}".format(len(unique_words)))

        metadata = {}
        # NOTE: despite its name, 'ix_to_word' maps word -> index (prepare_embeddings relies on that)
        metadata['ix_to_word'] = {str(word): int(i) for i, word in enumerate(unique_words)}
        # enumerate yields (index, answer) pairs, so this maps index -> answer
        metadata['ix_to_ans'] = {i: ans for i, ans in enumerate(set(df['answer']))}

        File.dump_json(metadata,meta_file_location)
        return metadata
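
To make the vocabulary step concrete, here is a small standalone sketch of the same idea on toy data. It is not the notebook's function: the helper name `build_word_index` is hypothetical, and, unlike `create_meta`, it sorts the words so that the indices are reproducible between runs (the iteration order of a set of strings can change from one Python process to the next).

```python
import string


def build_word_index(texts):
    """Map every unique, punctuation-free word in `texts` to an integer index."""
    # Translation table that deletes all ASCII punctuation characters
    strip_punctuation = str.maketrans('', '', string.punctuation)
    words = set()
    for text in texts:
        words.update(text.translate(strip_punctuation).split())
    # Sorting makes the word -> index assignment deterministic
    return {word: i for i, word in enumerate(sorted(words))}


questions = ["what does mri show?", "where does ct show the lesion?"]
word_index = build_word_index(questions)
print(word_index)
```

A stable ordering matters here because the cached embedding matrix is indexed by these numbers, so a vocabulary rebuilt in a different order would no longer line up with it.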

And let's create metadata for the training & validation sets:

In [7]:
from collections import namedtuple
dbg_file_csv_train = 'C:\\Users\\Public\\Documents\\Data\\2018\\VQAMed2018Train\\VQAMed2018Train-QA.csv'
dbg_file_xls_train = 'C:\\Users\\Public\\Documents\\Data\\2018\\VQAMed2018Train\\VQAMed2018Train-QA_post_pre_process_intermediate.xlsx'
dbg_file_xls_processed_train = 'C:\\Users\\Public\\Documents\\Data\\2018\\VQAMed2018Train\\VQAMed2018Train-QA_post_pre_process.xlsx'
train_embedding_path = 'C:\\Users\\Public\\Documents\\Data\\2018\\VQAMed2018Train\\VQAMed2018Train-images\\embbeded_images.hdf'
images_path_train = 'C:\\Users\\Public\\Documents\\Data\\2018\\VQAMed2018Train\\VQAMed2018Train-images'


dbg_file_csv_validation = 'C:\\Users\\Public\\Documents\\Data\\2018\\VQAMed2018Valid\\VQAMed2018Valid-QA.csv'
dbg_file_xls_validation = 'C:\\Users\\Public\\Documents\\Data\\2018\\VQAMed2018Valid\\VQAMed2018Valid-QA_post_pre_process_intermediate.xlsx'
dbg_file_xls_processed_validation = 'C:\\Users\\Public\\Documents\\Data\\2018\\VQAMed2018Valid\\VQAMed2018Valid-QA_post_pre_process.xlsx'
validation_embedding_path = 'C:\\Users\\Public\\Documents\\Data\\2018\\VQAMed2018Valid\\VQAMed2018Valid-images\\embbeded_images.hdf'
images_path_validation = 'C:\\Users\\Public\\Documents\\Data\\2018\\VQAMed2018Valid\\VQAMed2018Valid-images'


dbg_file_csv_test = 'C:\\Users\\Public\\Documents\\Data\\2018\\VQAMed2018Test\\VQAMed2018Test-QA.csv'
dbg_file_xls_test = 'C:\\Users\\Public\\Documents\\Data\\2018\\VQAMed2018Test\\VQAMed2018Test-QA_post_pre_process_intermediate.xlsx'
dbg_file_xls_processed_test = 'C:\\Users\\Public\\Documents\\Data\\2018\\VQAMed2018Test\\VQAMed2018Test-QA_post_pre_process.xlsx'
test_embedding_path = 'C:\\Users\\Public\\Documents\\Data\\2018\\VQAMed2018Test\\VQAMed2018Test-images\\embbeded_images.hdf'
images_path_test = 'C:\\Users\\Public\\Documents\\Data\\2018\\VQAMed2018Test\\VQAMed2018Test-images'

DataLocations = namedtuple('DataLocations', ['data_tag', 'raw_csv', 'raw_xls', 'processed_xls','images_path'])
train_data = DataLocations('train', dbg_file_csv_train,dbg_file_xls_train,dbg_file_xls_processed_train, images_path_train)
validation_data = DataLocations('validation', dbg_file_csv_validation, dbg_file_xls_validation, dbg_file_xls_processed_validation, images_path_validation)
test_data = DataLocations('test', dbg_file_csv_test, dbg_file_xls_test, dbg_file_xls_processed_test, images_path_test)

Get the data itself. Note that the only columns required in the dataframe are:
1. image_name
2. question
3. answer
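
A quick guard for this requirement could look like the sketch below. The helper name `missing_required_columns` is hypothetical; it accepts any iterable of column names, so `df_train.columns` from pandas would work as the argument.

```python
REQUIRED_COLUMNS = ('image_name', 'question', 'answer')


def missing_required_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]


print(missing_required_columns(['image_name', 'question', 'row_id']))
```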


In [8]:
from parsers.VQA18 import Vqa18Base
df_train = Vqa18Base.get_instance(train_data.processed_xls).data            
df_val = Vqa18Base.get_instance(validation_data.processed_xls).data
df_train.head(2)



| Unnamed: 0 | mri | brain | answer | hematoma | neck | liver | tokenized_question | ct | abdomen | row_id | image_name | tumor | question | tokenized_answer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | True | False | lesion at tail of pancreas | False | False | False | what doe MRI show ? | False | False | 1 | rjv03401 | True | what does mri show? | tumor at tail pancreas |
| 1 | True | False | in distal pancreas | False | False | False | where doe axial seCTion MRI abdomen show hypoe... | False | True | 2 | AIAN-14-313-g002 | False | where does axial section mri abdomen show hypo... | distal pancreas |


In [9]:
print("----- Creating training meta -----")
meta_train = create_meta(data_prepo_meta, df_train)

print("\n----- Creating validation meta -----")
# Use the dedicated validation path so the training metadata file is not overwritten
meta_validation = create_meta(data_prepo_meta_validation, df_val)

meta_train

20:48:26,955 pythonVQA DEBUG ## Creating meta data ('C:\Users\avitu\Documents\GitHub\VQA-MED\VQA-MED\Cognitive-LUIS-Windows-master\Sample\VQA.Python\data\my_data_prepro.json')
20:48:27,70 pythonVQA DEBUG ## Creating meta data ('C:\Users\avitu\Documents\GitHub\VQA-MED\VQA-MED\Cognitive-LUIS-Windows-master\Sample\VQA.Python\data\my_data_prepro.json')


----- Creating training meta -----
column question had 3317 unique words
column answer had 3255 unique words
total unique words: 3578

----- Creating validation meta -----
column question had 399 unique words
column answer had 669 unique words
total unique words: 881


{'ix_to_word': {'alcyceal': 0,
  'spinal': 1,
  'swelling': 2,
  'mapping': 3,
  'outflow': 4,
  'favorable': 5,
  'lead': 6,
  'hydroureter': 7,
  'horseshoe': 8,
  'misplaced': 9,
  'V': 10,
  'interposition': 11,
  'portography': 12,
  'atlantooccipital': 13,
  'reduced': 14,
  'nerve': 15,
  'ileocolic': 16,
  'ileoanal': 17,
  'multifidus': 18,
  'giant': 19,
  'bud': 20,
  'flat': 21,
  'spair': 22,
  'Reaccumulation': 23,
  '10': 24,
  'operative': 25,
  'evisceration': 26,
  'interpole': 27,
  'based': 28,
  'occlusion': 29,
  'demarcated': 30,
  'nonocclusive': 31,
  'contents': 32,
  'day': 33,
  'reduction': 34,
  'collections': 35,
  'severe': 36,
  'layering': 37,
  'septations': 38,
  'peripheral': 39,
  'translocation': 40,
  'ascites': 41,
  'xmr': 42,
  'placental': 43,
  'radiodense': 44,
  'groin': 45,
  'csf': 46,
  'glioblastome': 47,
  'lid': 48,
  'neurography': 49,
  'subsequent': 50,
  'slowly': 51,
  'femur': 52,
  'zone': 53,
  'T1WI': 54,
  'sagital': 55,
  

### Creating the model

#### The functions that get the model:

##### Get Embedding:

In [10]:
import numpy as np
import random
import h5py
def prepare_embeddings(metadata):
    embedding_filename = embedding_matrix_filename
    num_words = len(metadata['ix_to_word'].keys())
    dim_embedding = embedding_dim



    logger.debug("Embedding Data...")
    # texts = df['question']

    embeddings_index = {}

    glove_line_count = File.file_len(glove_path, encoding="utf8")
    def process_line(i, line):
        print_progress(i, glove_line_count)
        try:
            values = line.split()
            word = values[0]
            coefs = np.asarray(values[1:], dtype='float32')
            embeddings_index[word] = coefs
            print_progress(i+1, glove_line_count)
        except Exception as ex:
            logger.error(
                "An error occurred while working on glove file [line {0}]:\n"
                "Line text:\t{1}\nGlove path:\t{2}\n"
                "{3}".format(
                    i, line, glove_path, ex))
            raise


    with VerboseTimer("Embedding"):
        with open(glove_path, 'r', encoding="utf8") as glove_file:
            for i, line in enumerate(glove_file):
                process_line(i=i, line=line)



    embedding_matrix = np.zeros((num_words, dim_embedding))
    word_index = metadata['ix_to_word']

    with VerboseTimer("Creating matrix"):
        embedding_tupl = ((word, i, embeddings_index.get(word)) for word, i in word_index.items())
        embedded_with_values = [(word, i, embedding_vector) for word, i, embedding_vector in embedding_tupl if embedding_vector is not None]

        for word, i, embedding_vector in embedded_with_values:
            embedding_matrix[i] = embedding_vector


    e = {tpl[0] for tpl in embedded_with_values}
    w = set(word_index.keys())
    words_with_no_embedding = w-e
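    # random.sample needs a sequence (not a set) and at most len(population) items
    rnd = random.sample(sorted(words_with_no_embedding), min(5, len(words_with_no_embedding)))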
    logger.debug("{0} words did not have embedding. e.g.:\n{1}".format(len(words_with_no_embedding),rnd))

    with VerboseTimer("Dumping matrix"):
        with h5py.File(embedding_filename, 'w') as f:
            f.create_dataset('embedding_matrix', data=embedding_matrix)

    return embedding_matrix
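
The core of `prepare_embeddings` can be illustrated on a tiny in-memory example. This is a simplified sketch rather than the function above: it uses plain Python lists instead of NumPy and h5py, the three-dimensional vectors are made up, and the helper names are hypothetical. Words without a GloVe vector keep an all-zero row, as in the notebook.

```python
import io


def parse_glove(lines):
    """Parse 'word v1 v2 ...' lines into a {word: [floats]} dictionary."""
    vectors = {}
    for line in lines:
        values = line.split()
        if values:  # skip blank lines
            vectors[values[0]] = [float(v) for v in values[1:]]
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector of the word whose index is i (zeros if unknown)."""
    matrix = [[0.0] * dim for _ in range(len(word_index))]
    for word, i in word_index.items():
        vector = vectors.get(word)
        if vector is not None:
            matrix[i] = vector
    return matrix


# Toy stand-in for a GloVe text file (made-up numbers)
toy_glove = io.StringIO("mri 0.1 0.2 0.3\nbrain 0.4 0.5 0.6\n")
toy_word_index = {'brain': 0, 'hydroureter': 1, 'mri': 2}

toy_matrix = build_embedding_matrix(toy_word_index, parse_glove(toy_glove), dim=3)
print(toy_matrix)
```

In this toy run, `hydroureter` has no vector in the stand-in file, so its row stays at zero.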




If the embedding already exists, save yourself the time and just load it.  
Otherwise - calculate it

In [11]:
if os.path.exists(embedding_matrix_filename):
    logger.debug("Embedding Data already exists. Loading...")
    with h5py.File(embedding_matrix_filename, 'r') as f:
        embedding_train = np.array(f['embedding_matrix'])
else:
    logger.debug("Calculating Embedding...")
    embedding_train = prepare_embeddings(meta_train)
    
embedding_matrix = embedding_train

20:48:27,313 pythonVQA DEBUG ## Embedding Data already exists. Loading...


And let's take a look:

In [12]:
embedding_matrix

array([[-0.44398999,  0.12817   , -0.25246999, ..., -0.20043001,
        -0.082191  , -0.06255   ],
       [ 0.08561   ,  0.077471  , -1.01680005, ..., -0.30044001,
         0.012508  ,  0.24875   ],
       [-0.16277   ,  0.033858  , -0.39416999, ...,  0.20255999,
        -0.17546999, -0.30397999],
       ...,
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       [-0.21359   ,  0.85279   ,  0.48688999, ..., -0.19047   ,
        -0.058526  , -0.49094   ],
       [ 0.72328001, -0.1178    , -0.022166  , ...,  0.49592999,
        -0.16937999, -0.58451003]])

And let's wrap it with related information:

In [14]:
from vqa_flow.data_structures import EmbeddingData
def get_embedding_data(embedding_matrix, meta_data):    
    dim = embedding_dim
    s_length = seq_length    
    return EmbeddingData(embedding_matrix=embedding_matrix,embedding_dim=dim, seq_length=s_length, meta_data=meta_data)

embedding_train = get_embedding_data(embedding_matrix, meta_train)
str(embedding_train)

'EmbeddingData(Embedding length:3578, Embedding dim: 300, seq length: 26, meta length: 2)'

Define how to build the word-to-vector branch:

In [28]:
from keras.layers import Embedding, LSTM, BatchNormalization, Dense

def word_2_vec_model(embedding_matrix, num_words, embedding_dim, seq_length, input_tensor):
        # notes:
        # num_words: scalar, the vocabulary size (number of rows in embedding_matrix)
        # embedding_dim: length of the dense, real-valued vector that represents each word
        # embedding_matrix (AKA embedding initializer): pre-trained word vectors, used as fixed (non-trainable) weights

        LSTM_UNITS = 512
        DENSE_UNITS = 1024
        DENSE_ACTIVATION = 'relu'


        logger.debug("Creating Embedding model")
        x = Embedding(num_words, embedding_dim, weights=[embedding_matrix], input_length=seq_length,trainable=False)(input_tensor)
        x = LSTM(units=LSTM_UNITS, return_sequences=True, input_shape=(seq_length, embedding_dim))(x)
        x = BatchNormalization()(x)
        x = LSTM(units=LSTM_UNITS, return_sequences=False)(x)
        x = BatchNormalization()(x)
        x = Dense(units=DENSE_UNITS, activation=DENSE_ACTIVATION)(x)
        model = x
        logger.debug("Done Creating Embedding model")
        return model
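
The `Embedding` layer above consumes fixed-length sequences of word indices rather than raw text. The notebook does not show that conversion, so the sketch below is only an assumption about how it could be done: unknown words are skipped, shorter questions are padded with index 0, and longer ones are truncated to `seq_length`. Note that index 0 is also a real word in the vocabulary built by `create_meta`, so a real pipeline would probably reserve a dedicated padding index.

```python
def question_to_indices(question, word_index, seq_length, pad_index=0):
    """Encode a question as a list of exactly `seq_length` word indices."""
    indices = [word_index[word] for word in question.split() if word in word_index]
    indices = indices[:seq_length]  # truncate long questions
    return indices + [pad_index] * (seq_length - len(indices))  # pad short ones


toy_index = {'what': 1, 'does': 2, 'mri': 3, 'show': 4}
encoded = question_to_indices("what does mri show", toy_index, seq_length=6)
print(encoded)
```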

In the same manner, define how to build the image representation branch:

In [39]:
from keras.applications.vgg19 import VGG19
from keras.layers import Dense, GlobalAveragePooling2D#, Input, Dropout
def get_image_model(base_model_weights=DEFAULT_IMAGE_WIEGHTS, out_put_dim=1024):
    base_model = VGG19(weights=base_model_weights, include_top=False)
    base_model.trainable = False

    x = base_model.output
    # add a global spatial average pooling layer
    x = GlobalAveragePooling2D(name="image_model_average_pool")(x)
    # add a fully-connected layer that projects the image features to out_put_dim
    x = Dense(out_put_dim, activation='relu',name="image_model_dense")(x)
    model = x
    
    return base_model.input , model
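
`GlobalAveragePooling2D` is what turns VGG's spatial feature map into a single vector: each channel is averaged over all spatial positions. A pure-Python sketch of that operation for one image in channels-last layout (the helper name is hypothetical, and Keras performs this on whole batches of tensors):

```python
def global_average_pool(feature_map):
    """Average a height x width x channels nested list over height and width."""
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    totals = [0.0] * channels
    for row in feature_map:
        for pixel in row:
            for c, value in enumerate(pixel):
                totals[c] += value
    return [total / (height * width) for total in totals]


# A 2 x 2 feature map with 3 channels
toy_map = [[[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]],
           [[5.0, 4.0, 2.0], [7.0, 0.0, 2.0]]]
print(global_average_pool(toy_map))
```

The result has one value per channel regardless of the spatial size, which is why the image branch can accept inputs whose height and width are left unspecified.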

Before we start, just to be safe, let's clear the session:

In [60]:
from keras import backend as keras_backend
keras_backend.clear_session()

And finally, building the model itself:

In [61]:
import keras.layers as keras_layers
# Available merge strategies:
# keras_layers.multiply, keras_layers.add, keras_layers.concatenate,
# keras_layers.average, keras_layers.dot, keras_layers.maximum
            
merge_strategy = keras_layers.concatenate
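
The choice matters for the shape of the merged tensor: `concatenate` stacks the two 1024-dimensional branch outputs into 2048 features, while element-wise strategies such as `add` or `multiply` keep the dimension unchanged and require both branches to have the same size. A plain-Python illustration on short vectors (the helper names are hypothetical stand-ins for the Keras functions):

```python
def concatenate(a, b):
    """Join two feature vectors end to end."""
    return list(a) + list(b)


def elementwise(a, b, op):
    """Combine two equally sized feature vectors position by position."""
    if len(a) != len(b):
        raise ValueError("element-wise merges need vectors of equal length")
    return [op(x, y) for x, y in zip(a, b)]


image_features = [1.0, 2.0, 3.0]
question_features = [0.5, 0.0, 2.0]

merged_concat = concatenate(image_features, question_features)
merged_add = elementwise(image_features, question_features, lambda x, y: x + y)
merged_mul = elementwise(image_features, question_features, lambda x, y: x * y)
print(len(merged_concat), len(merged_add), len(merged_mul))
```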

In [62]:
from keras import Model, models, Input, callbacks
from keras.utils import plot_model, to_categorical
from keras.layers import Dense, Embedding, LSTM, BatchNormalization#, GlobalAveragePooling2D, Merge, Flatten

def get_vqa_model(embedding_data=None):        
        embedding_matrix = embedding_data.embedding_matrix
        num_words = embedding_data.num_words
        num_classes = embedding_data.num_classes

        DENSE_UNITS = 1000
        DENSE_ACTIVATION = 'relu'

        OPTIMIZER = 'rmsprop'
        LOSS = 'categorical_crossentropy'
        METRICS = 'accuracy'

        image_model, lstm_model, fc_model = None, None, None
        try:

            # The Embedding layer consumes sequences of seq_length word indices
            lstm_input_tensor = Input(shape=(seq_length,), name='embedding_input')

            logger.debug("Getting embedding (lstm model)")
            lstm_model = word_2_vec_model(embedding_matrix=embedding_matrix, num_words=num_words, embedding_dim=embedding_dim,
                                               seq_length=seq_length, input_tensor=lstm_input_tensor)

            logger.debug("Getting image model")
            out_put_dim = keras_backend.int_shape(lstm_model)[-1]
            image_input_tensor, image_model = get_image_model(out_put_dim=out_put_dim)


            logger.debug("merging final model")
            fc_tensors = merge_strategy(inputs=[image_model, lstm_model])
            fc_tensors = BatchNormalization()(fc_tensors)
            fc_tensors = Dense(units=DENSE_UNITS, activation=DENSE_ACTIVATION)(fc_tensors)
            fc_tensors = BatchNormalization()(fc_tensors)
            fc_tensors = Dense(units=num_classes, activation='softmax')(fc_tensors)

            fc_model = Model(inputs=[lstm_input_tensor, image_input_tensor], outputs=fc_tensors)
            fc_model.compile(optimizer=OPTIMIZER, loss=LOSS, metrics=[METRICS])
        except Exception as ex:
            logger.error("Got an error while building vqa model:\n{0}".format(ex))
            named_parts = [(image_model, 'image_model'), (lstm_model, 'lstm_model'), (fc_model, 'fc_model')]
            for m, name in named_parts:
                if m is not None:
                    logger.error("######################### {0} model details: ######################### ".format(name))
                    try:
                        m.summary(print_fn=logger.error)
                    except Exception as ex2:
                        logger.warning("Failed to print summary for {0}:\n{1}".format(name, ex2))
            raise

        return fc_model

model = get_vqa_model(embedding_data=embedding_train)
model

21:18:51,987 pythonVQA DEBUG ## Getting embedding (lstm model)
21:18:51,989 pythonVQA DEBUG ## Creating Embedding model
21:18:53,871 pythonVQA DEBUG ## Done Creating Embedding model
21:18:53,872 pythonVQA DEBUG ## Getting image model
21:18:54,855 pythonVQA DEBUG ## merging final model


<keras.engine.training.Model at 0x22a057b5208>

And the summary of our model:

In [48]:
model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_3 (InputLayer)            (None, None, None, 3 0                                            
__________________________________________________________________________________________________
block1_conv1 (Conv2D)           (None, None, None, 6 1792        input_3[0][0]                    
__________________________________________________________________________________________________
block1_conv2 (Conv2D)           (None, None, None, 6 36928       block1_conv1[0][0]               
__________________________________________________________________________________________________
block1_pool (MaxPooling2D)      (None, None, None, 6 0           block1_conv2[0][0]               
__________________________________________________________________________________________________
block2_con
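
Because the model ends in a softmax over `num_classes` and is compiled with `categorical_crossentropy`, each answer has to be supplied as a one-hot vector (which is what the imported `to_categorical` produces). A minimal pure-Python sketch of that encoding, with a made-up answer vocabulary:

```python
def one_hot(class_index, num_classes):
    """Return a list of zeros with a single 1.0 at `class_index`."""
    if not 0 <= class_index < num_classes:
        raise ValueError("class index out of range")
    vector = [0.0] * num_classes
    vector[class_index] = 1.0
    return vector


# Hypothetical answer vocabulary (answer -> class index)
toy_answers = {'distal pancreas': 0, 'lesion at tail of pancreas': 1, 'liver': 2}
target = one_hot(toy_answers['lesion at tail of pancreas'], len(toy_answers))
print(target)
```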

We'd better save it:

In [52]:
def print_model_summary_to_file(fn, model):
    # Open the file
    with open(fn,'w') as fh:
        # Pass the file handle in as a lambda function to make it callable
        model.summary(print_fn=lambda x: fh.write(x + '\n'))
        

ts = get_time_stamp()

now_folder = os.path.abspath(os.path.join(vqa_models_folder, ts))
model_fn = os.path.join(now_folder, 'vqa_model.h5')
model_image_fn = os.path.join(now_folder, 'model_vqa.png')
summary_fn = os.path.join(now_folder, 'model_summary.txt')
logger.debug("saving model to: '{0}'".format(model_fn))

try:
    File.validate_dir_exists(now_folder)
    model.save(model_fn)  # creates an HDF5 file at model_fn
    logger.debug("model saved")
except Exception as ex:
    logger.error("Failed to save model:\n{0}".format(ex))

try:
    logger.debug("Writing history")
    print_model_summary_to_file(summary_fn, model)
    logger.debug("Done Writing History")
    logger.debug("Plotting model")
    plot_model(model, to_file=model_image_fn)
    logger.debug("Done Plotting")
except Exception as ex:
    logger.warning("{0}".format(ex))

21:11:07,543 pythonVQA DEBUG ## saving model to: 'C:\Users\Public\Documents\Data\2018\vqa_models\20180605_2111_07\vqa_model.h5'
21:11:07,870 pythonVQA DEBUG ## model saved
21:11:07,871 pythonVQA DEBUG ## Writing history
21:11:07,876 pythonVQA DEBUG ## Done Writing History
21:11:07,877 pythonVQA DEBUG ## Plotting model


### Training the model