# Training Model

In this notebook we train the model.  
Training takes several parameters that can be tuned for better results:
1. **augmentations** - the number of augmentations to use for each question-image pair.
2. **batch_size** - the number of samples per training batch.
3. **epochs** - the number of passes over the training data.
4. **use_class_weight** - whether to use class weights to compensate for skewed (imbalanced) data.
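For **use_class_weight**, one common way to compensate for skewed data is to weight each class inversely to its frequency. The sketch below assumes that scheme (the "balanced" formula `n_samples / (n_classes * class_count)`, as used by scikit-learn). It is not necessarily what the trainer does internally, and `balanced_class_weights` is a hypothetical helper.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Map each class to n_samples / (n_classes * class_count).

    Hypothetical helper: assumes inverse-frequency ("balanced") weighting,
    so rare classes get weights above 1 and frequent classes below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# A skewed label set: six samples of class 0, two of class 1
weights = balanced_class_weights([0, 0, 0, 0, 0, 0, 1, 1])
print(weights)
```

In Keras, a `{class_index: weight}` dict like this is the form the `class_weight` argument of `Model.fit` expects.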


### Some of the main functions we use:

In [1]:
import IPython
from IPython.display import display_html
from common.functions import get_highlighted_function_code
from common.functions import get_features, sentences_to_hot_vector, hot_vector_to_words
from classes.DataGenerator import DataGenerator

# Fetch the highlighted source of each function (comments removed) for display below
code_generate_data = get_highlighted_function_code(DataGenerator._generate_data, remove_comments=True)
code_get_features = get_highlighted_function_code(get_features, remove_comments=True)
code_hot_vector = get_highlighted_function_code(sentences_to_hot_vector, remove_comments=True)

print('Getting the label using a hot vector\n')
IPython.display.display(code_generate_data)

print('\n\nThe underlying method:\n')
IPython.display.display(code_hot_vector)

print('\n\nGetting the features using question embedding concatenation:\n')
IPython.display.display(code_get_features)

Using TensorFlow backend.


Getting the label using a hot vector


The underlying method:


Getting the features using question embedding concatenation:

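The label is built from the answer text as a hot vector over a vocabulary. As a rough illustration only, here is a minimal multi-hot encoder and decoder, assuming the vocabulary is a plain list of words and each vocabulary word present in the answer gets a 1. `to_hot_vector` and `from_hot_vector` are hypothetical stand-ins, not the project's `sentences_to_hot_vector` / `hot_vector_to_words`.

```python
def to_hot_vector(sentence, vocabulary):
    """Return a 0/1 list with a 1 at the index of every vocabulary word in the sentence."""
    words = set(sentence.lower().split())
    return [1 if word in words else 0 for word in vocabulary]


def from_hot_vector(vector, vocabulary, threshold=0.5):
    """Return the vocabulary words whose vector entry is at or above the threshold."""
    return [word for word, value in zip(vocabulary, vector) if value >= threshold]


# Hypothetical four-word vocabulary, for illustration only
vocabulary = ['elbow', 'dislocation', 'meningioma', 'ovary']
vector = to_hot_vector('elbow dislocation', vocabulary)
print(vector)
print(from_hot_vector(vector, vocabulary))
```

The `threshold` lets the same decoder turn a model's predicted probabilities back into words.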
---
## The code:

In [2]:
from classes.vqa_model_trainer import VqaModelTrainer
from common.model_utils import get_trainable_params_distribution
from common.settings import data_access as data_access_api
from data_access.api import SpecificDataAccess
from data_access.model_folder import ModelFolder
from common.utils import VerboseTimer

In [3]:
import logging
import vqa_logger 
logger = logging.getLogger(__name__)

In [4]:
model_location = 'C:\\Users\\Public\\Documents\\Data\\2019\\models\\20190503_1357_08\\'
model_folder = ModelFolder(model_location)
model_folder

ModelFolder(folder="C:\\Users\\Public\\Documents\\Data\\2019\\models\\20190503_1357_08")

### Loading the model to train:

In [5]:
question_category = model_folder.question_category
kw_args = {
    'augmentations': 20,
    'batch_size': 32,
    'epochs': 1,
    'question_category': question_category,
    'use_class_weight': False,
}

data_access = SpecificDataAccess(data_access_api.folder, question_category=question_category, group=None)
mt = VqaModelTrainer(model_folder, data_access=data_access, **kw_args)

[2019-05-03 14:58:08][common.utils][DEBUG] Starting 'Loading Model'
[2019-05-03 14:58:11][common.utils][DEBUG] Loading Model: 0:00:03.177274
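To get a feel for how `augmentations` and `batch_size` interact, here is a sketch in the style of a `keras.utils.Sequence` (an object exposing `__len__` for the number of batches and `__getitem__` for one batch). It assumes each question-image pair contributes itself plus `augmentations` augmented copies; the real `DataGenerator` may count them differently. `BatchSequence` and `expand_with_augmentations` are hypothetical.

```python
import math


def expand_with_augmentations(pairs, augmentations):
    """Tag each pair with indices 0..augmentations (0 = the original, assumed)."""
    return [(pair, aug) for pair in pairs for aug in range(augmentations + 1)]


class BatchSequence:
    """Minimal Sequence-like batcher over a flat list of samples."""

    def __init__(self, samples, batch_size):
        self.samples = samples
        self.batch_size = batch_size

    def __len__(self):
        # Batches per epoch; the last batch may be smaller than batch_size
        return math.ceil(len(self.samples) / self.batch_size)

    def __getitem__(self, index):
        start = index * self.batch_size
        return self.samples[start:start + self.batch_size]


# 100 hypothetical pairs with the settings above: augmentations=20, batch_size=32
samples = expand_with_augmentations(list(range(100)), augmentations=20)
sequence = BatchSequence(samples, batch_size=32)
print(len(samples), len(sequence))
```

Under this assumption, 100 pairs become 2,100 samples and 66 batches per epoch, with a final batch of 20.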


#### Let's take a look at the trainable parameters:

In [6]:
get_trainable_params_distribution(mt.model)
# mt.model.summary()

Got a total of 50,565 trainable parameters


|   | index | layer | trainable_params | pretty_value |
|---|---|---|---|---|
| 0 | 5 | post_concat_dense1_8/kernel:0 | 40960 | 40960 |
| 1 | 3 | embedding_batch_normalization/beta:0 | 4608 | 4608 |
| 2 | 7 | embedding_batch_normalization/gamma:0 | 4608 | 4608 |
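A per-layer distribution like the one above can be derived from each weight's shape, since a weight's parameter count is the product of its dimensions. The sketch below assumes the weights are given as a `{name: shape}` dict (a hypothetical input; `get_trainable_params_distribution` reads them from the Keras model). A batch-normalization layer over 4,608 features has trainable `gamma` and `beta` vectors of shape `(4608,)`, which matches the 4,608 parameters each shown above.

```python
from math import prod


def trainable_params_distribution(weight_shapes):
    """Return (name, count, share_of_total) tuples, largest count first."""
    counts = {name: prod(shape) for name, shape in weight_shapes.items()}
    total = sum(counts.values())
    rows = [(name, count, count / total) for name, count in counts.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Shapes of the batch-normalization weights listed above
shapes = {
    'embedding_batch_normalization/gamma:0': (4608,),
    'embedding_batch_normalization/beta:0': (4608,),
}
for name, count, share in trainable_params_distribution(shapes):
    print(f'{name}: {count:,} ({share:.0%})')
```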


#### Take a look at the metadata:

In [7]:
meta = data_access.load_meta()
df_meta_answers = meta['answers']
df_words = meta['words']
df_data = data_access.load_processed_data()

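def display_side_by_side(*args):
    """Render several DataFrames as inline HTML tables, side by side."""
    html_str = ''
    for df in args:
        html_str += df.to_html()
    # Patch only the opening <table> tags so the closing tags stay valid HTML
    display_html(html_str.replace('<table', '<table style="display:inline"'), raw=True)

display_side_by_side(df_meta_answers.sample(10), df_words.sample(10))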



[2019-05-03 14:58:11][data_access.api][DEBUG] loading processed data from:
C:\Users\avitu\Documents\GitHub\VQA-MED\VQA-MED\VQA.Python\data\model_input.parquet
[2019-05-03 14:58:11][data_access.api][DEBUG] loading parquet from:
C:\Users\avitu\Documents\GitHub\VQA-MED\VQA-MED\VQA.Python\data\model_input.parquet
[2019-05-03 14:58:11][common.utils][DEBUG] Starting 'Loading parquet'
[2019-05-03 14:58:14][common.utils][DEBUG] Loading parquet: 0:00:03.188204
[2019-05-03 14:58:14][common.utils][DEBUG] Starting 'Converting to pandas'
[2019-05-03 14:58:15][common.utils][DEBUG] Converting to pandas: 0:00:00.569857


|   | processed_answer | question_category |
|---|---|---|
| 573 | femoral neck stress fractures | Abnormality |
| 1186 | peripheral vascular disease | Abnormality |
| 417 | cryptococcus cryptococcoma presumed | Abnormality |
| 470 | diffuse rhabdomyosarcoma | Abnormality |
| 509 | elbow dislocation | Abnormality |
| 290 | cervical hemangioblastoma | Abnormality |
| 944 | meniscal ossicle | Abnormality |
| 1286 | pseudofracture of the cervical vertebral body | Abnormality |
| 302 | chiari malformation cervical occipital encepha... | Abnormality |
| 652 | glioblastome multiforme gbm | Abnormality |

|   | word | question_category |
|---|---|---|
| 171 | balo | Abnormality |
| 366 | colles | Abnormality |
| 1376 | ovary | Abnormality |
| 8 | abscesses | Abnormality |
| 511 | dish | Abnormality |
| 1484 | plate | Abnormality |
| 1669 | scaphocephaly | Abnormality |
| 1161 | meningioma | Abnormality |
| 1475 | placement | Abnormality |
| 306 | children | Abnormality |


### Train the model

In [8]:
history = mt.train()

[2019-05-03 14:58:15][data_access.api][DEBUG] loading processed data from:
C:\Users\avitu\Documents\GitHub\VQA-MED\VQA-MED\VQA.Python\data\model_input.parquet
[2019-05-03 14:58:15][data_access.api][DEBUG] loading parquet from:
C:\Users\avitu\Documents\GitHub\VQA-MED\VQA-MED\VQA.Python\data\model_input.parquet
[2019-05-03 14:58:15][common.utils][DEBUG] Starting 'Loading parquet'
[2019-05-03 14:58:18][common.utils][DEBUG] Loading parquet: 0:00:02.584861
[2019-05-03 14:58:18][common.utils][DEBUG] Starting 'Converting to pandas'
[2019-05-03 14:58:18][common.utils][DEBUG] Converting to pandas: 0:00:00.019675
[2019-05-03 14:58:18][data_access.api][DEBUG] Loading augmentations:
C:\Users\avitu\Documents\GitHub\VQA-MED\VQA-MED\VQA.Python\data\augmentations.parquet
[2019-05-03 14:58:18][data_access.api][DEBUG] loading parquet from:
C:\Users\avitu\Documents\GitHub\VQA-MED\VQA-MED\VQA.Python\data\augmentations.parquet
[2019-05-03 14:58:18][common.utils][DEBUG] Starting 'Loading parquet'
[2019-05-0

ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'

Exception ignored in: 'pandas._libs.lib.is_bool_array'
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'


[2019-05-03 14:58:18][data_access.api][DEBUG] loading processed data from:
C:\Users\avitu\Documents\GitHub\VQA-MED\VQA-MED\VQA.Python\data\model_input.parquet
[2019-05-03 14:58:18][data_access.api][DEBUG] loading parquet from:
C:\Users\avitu\Documents\GitHub\VQA-MED\VQA-MED\VQA.Python\data\model_input.parquet
[2019-05-03 14:58:18][common.utils][DEBUG] Starting 'Loading parquet'
[2019-05-03 14:58:19][common.utils][DEBUG] Loading parquet: 0:00:00.572948
[2019-05-03 14:58:19][common.utils][DEBUG] Starting 'Converting to pandas'
[2019-05-03 14:58:19][common.utils][DEBUG] Converting to pandas: 0:00:00.004995
[2019-05-03 14:58:20][common.utils][DEBUG] Starting 'Training Model'
[2019-05-03 14:58:20][classes.vqa_model_trainer][DEBUG] Expected shape: [(None, 4608, 1), (None, None, None, 3)]
[2019-05-03 14:58:20][classes.vqa_model_trainer][DEBUG] ---------------------------------------------------------------------------
[2019-05-03 14:58:20][classes.vqa_model_trainer][DEBUG] Actual training sha
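The trainer logs the input shapes the model expects, where `None` means "any size" (the batch dimension and, for the second input, height and width). Comparing those against the shapes of an actual batch is a quick way to catch a mismatch before training. A minimal sketch; `shape_matches`, `check_inputs` and the 224x224 image size are hypothetical, chosen only for illustration.

```python
def shape_matches(expected, actual):
    """True if both shapes have the same rank and agree on every non-None dimension."""
    return len(expected) == len(actual) and all(
        e is None or e == a for e, a in zip(expected, actual)
    )


def check_inputs(expected_shapes, actual_shapes):
    """Return (input_index, expected, actual) for every input whose shape does not match."""
    if len(expected_shapes) != len(actual_shapes):
        raise ValueError(f'expected {len(expected_shapes)} inputs, got {len(actual_shapes)}')
    return [
        (i, e, a)
        for i, (e, a) in enumerate(zip(expected_shapes, actual_shapes))
        if not shape_matches(e, a)
    ]


# Expected shapes as logged above
expected = [(None, 4608, 1), (None, None, None, 3)]
# Hypothetical batch of 32 samples; the 224x224 image size is made up
actual = [(32, 4608, 1), (32, 224, 224, 3)]
print(check_inputs(expected, actual))
```

An empty list means every input is compatible; otherwise each tuple points at the offending input.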

### Save trained model:

In [9]:
with VerboseTimer("Saving trained Model"):
    model_folder = mt.save(mt.model, mt.model_folder, history)


[2019-05-03 15:12:06][common.utils][DEBUG] Starting 'Saving trained Model'
[2019-05-03 15:12:06][common.utils][DEBUG] Starting 'Saving trained Model'
[2019-05-03 15:12:06][data_access.model_folder][DEBUG] model saved
[2019-05-03 15:12:06][data_access.model_folder][DEBUG] saving prediction vector
[2019-05-03 15:12:06][data_access.model_folder][DEBUG] saved prediction vector
[2019-05-03 15:12:06][data_access.model_folder][DEBUG] Writing Summary
[2019-05-03 15:12:06][data_access.model_folder][DEBUG] Done Writing Summary
[2019-05-03 15:12:06][data_access.model_folder][DEBUG] Saving image
[2019-05-03 15:12:07][data_access.model_folder][DEBUG] Image saved ('C:\Users\Public\Documents\Data\2019\models\20190503_1512_06\model.png')
[2019-05-03 15:12:07][data_access.model_folder][DEBUG] Saving History
[2019-05-03 15:12:07][data_access.model_folder][DEBUG] History saved to 'C:\Users\Public\Documents\Data\2019\models\20190503_1512_06\model_history.pkl'
[2019-05-03 15:12:07][common.utils][DEBUG] Sav
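The training history is written to `model_history.pkl`. The sketch below assumes the file holds the per-epoch metrics dict that Keras exposes as `History.history` (the trainer may pickle more than that) and shows a round trip with the standard `pickle` module. `save_history` and `load_history` are hypothetical helpers, and the metric values are made up.

```python
import pickle
import tempfile
from pathlib import Path


def save_history(history_dict, folder):
    """Pickle a per-epoch metrics dict to <folder>/model_history.pkl and return the path."""
    path = Path(folder) / 'model_history.pkl'
    with path.open('wb') as f:
        pickle.dump(history_dict, f)
    return path


def load_history(path):
    """Load a pickled metrics dict back from disk."""
    with Path(path).open('rb') as f:
        return pickle.load(f)


# Made-up metrics for a single epoch, for illustration only
history_dict = {'loss': [0.93], 'acc': [0.41]}
with tempfile.TemporaryDirectory() as folder:
    path = save_history(history_dict, folder)
    print(load_history(path))
```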

In [10]:
print(model_folder.model_path)

C:\Users\Public\Documents\Data\2019\models\20190503_1512_06\vqa_model.h5
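Saving does not overwrite the loaded model: the trained model lands in a new folder, `20190503_1512_06`, next to the original `20190503_1357_08`, and both names read as `YYYYMMDD_HHMM_SS` timestamps. Assuming that naming scheme, a folder path can be built with `datetime.strftime`. `timestamped_model_folder` is a hypothetical helper, not part of `ModelFolder`.

```python
from datetime import datetime
from pathlib import PureWindowsPath


def timestamped_model_folder(models_root, when):
    """Build <models_root>/YYYYMMDD_HHMM_SS, matching the folder names seen above."""
    return PureWindowsPath(models_root) / when.strftime('%Y%m%d_%H%M_%S')


root = r'C:\Users\Public\Documents\Data\2019\models'
folder = timestamped_model_folder(root, datetime(2019, 5, 3, 15, 12, 6))
print(folder / 'vqa_model.h5')
```

`PureWindowsPath` is used so the Windows-style path is built the same way on any operating system.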
