# Train and test using the complete DCC
In this notebook we first train a biLSTM on the complete DCC. We then evaluate the model on that same complete DCC with MetaCAT's eval() function, and we also use a custom evaluation function that processes every example separately and saves the results in a result dataframe.

In [1]:
import numpy as np
from pathlib import Path
from medcat.meta_cat import MetaCAT
from medcat.config_meta_cat import ConfigMetaCAT
from medcat.tokenizers.meta_cat_tokenizers import TokenizerWrapperBPE
from utils import evaluate_per_example

In [2]:
# Input & output
data_dir = Path.cwd().parents[0] / 'data'
annotation_file = data_dir / 'emc-dcc_ann.json'
model_dir = Path.cwd().parents[0] / 'models' / 'bilstm'
embeddings_file = model_dir / 'embeddings.npy'
result_dir = Path.cwd().parents[0] / 'results'
bilstm_result_file = result_dir / 'bilstm_predictions.csv.gz'

# Config
config_metacat = ConfigMetaCAT()
config_metacat.general['category_name'] = 'Negation'
config_metacat.train['nepochs'] = 10
config_metacat.train['score_average'] = 'binary'

## Load tokenizer and embeddings matrix
Load a project-wide tokenizer and embeddings matrix which are created in `01_tokenizer_embeddings.ipynb`.

In [3]:
tokenizer = TokenizerWrapperBPE.load(model_dir)
embeddings = np.load(embeddings_file)

## Train biLSTM

In [4]:
# Initialize MetaCAT
meta_cat = MetaCAT(tokenizer=tokenizer, embeddings=embeddings, config=config_metacat)

In [5]:
# Train model
results = meta_cat.train(json_path=annotation_file, save_dir_path=str(model_dir))

Epoch: 0 **************************************************  Train
              precision    recall  f1-score   support

           0       0.95      0.98      0.97      9709
           1       0.85      0.70      0.77      1587

    accuracy                           0.94     11296
   macro avg       0.90      0.84      0.87     11296
weighted avg       0.94      0.94      0.94     11296

Epoch: 0 **************************************************  Test
              precision    recall  f1-score   support

           0       0.99      0.98      0.98      1082
           1       0.88      0.91      0.89       173

    accuracy                           0.97      1255
   macro avg       0.93      0.94      0.94      1255
weighted avg       0.97      0.97      0.97      1255


##### Model saved to /Users/stan3/Data/negation-detection/models/bilstm/model.dat at epoch: 0 and f1: 0.969937081231062 #####

Epoch: 1 **************************************************  Train
              prec

In [6]:
meta_cat.save(save_dir_path=str(model_dir))

## Simple evaluation with MetaCAT's eval()
MetaCAT's eval() function does not return the example ID.

In [7]:
# Load biLSTM
meta_cat = MetaCAT.load(model_dir)

# Evaluate with MetaCAT's eval()
results = meta_cat.eval(json_path=annotation_file)

Epoch: 0 **************************************************  Eval
              precision    recall  f1-score   support

           0       1.00      1.00      1.00     10791
           1       0.97      0.99      0.98      1760

    accuracy                           1.00     12551
   macro avg       0.99      0.99      0.99     12551
weighted avg       1.00      1.00      1.00     12551



In [8]:
# Print the F1 score at full precision so that changes between runs are visible
print(f'F1: {results["f1"]}')

F1: 0.9950764380466066
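
How a single F1 value relates to the per-class report depends on the averaging mode (compare the `score_average` setting in the config). The sketch below is an illustration only and makes no claim about MetaCAT's internals: it computes a binary F1 (positive class only) and a support-weighted F1 in plain Python. The toy labels and the helper names `f1_for_label`, `binary_f1` and `weighted_f1` are hypothetical and not part of this project.

```python
from collections import Counter


def f1_for_label(y_true, y_pred, label):
    """F1 score for one class, computed from true/false positives and false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def binary_f1(y_true, y_pred, pos_label=1):
    """F1 of the positive class only."""
    return f1_for_label(y_true, y_pred, pos_label)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(f1_for_label(y_true, y_pred, label) * n / total
               for label, n in support.items())


# Toy labels (hypothetical): 1 = negated, 0 = not negated
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

print(f'binary F1:   {binary_f1(y_true, y_pred):.4f}')
print(f'weighted F1: {weighted_f1(y_true, y_pred):.4f}')
```

With an imbalanced label distribution such as the one in the reports above, the weighted value is dominated by the majority class, so the two numbers can differ noticeably.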


## Custom evaluation per example
In this project we want to know for each example whether a negation has been identified. MetaCAT does not offer this functionality: it only returns the scores, predictions and examples.

In this part we iterate through all annotations in an annotation file (MedCAT Trainer format), create an ID for every example (`exampleID=documentID_start_end`), and collect the prediction per example.
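
The ID construction described above can be sketched as follows. This is a minimal illustration, not the actual `evaluate_per_example` from `utils`, which may work differently. It assumes the export is nested as projects, documents and annotations, and that documents carry a `name` field and annotations carry `start` and `end` fields. The helper names and the toy export are hypothetical.

```python
def build_example_id(document_name, start, end):
    """Combine document name and character offsets into one ID string."""
    return f'{document_name}_{start}_{end}'


def iter_example_ids(export):
    """Yield one example ID per annotation in a (toy) MedCAT Trainer style export."""
    for project in export['projects']:
        for document in project['documents']:
            for annotation in document['annotations']:
                yield build_example_id(document['name'],
                                       annotation['start'],
                                       annotation['end'])


# Toy export (hypothetical data, assumed structure)
toy_export = {
    'projects': [
        {
            'documents': [
                {'name': 'DOC1', 'annotations': [{'start': 0, 'end': 5},
                                                 {'start': 10, 'end': 18}]},
                {'name': 'DOC2', 'annotations': [{'start': 3, 'end': 9}]},
            ]
        }
    ]
}

example_ids = list(iter_example_ids(toy_export))
print(example_ids)
```

Because the ID encodes the document and the character span, predictions from different models can later be matched on the same example.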

In [9]:
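bilstm_predictions = evaluate_per_example(annotation_file, meta_cat, 'bilstm')
# Note: pandas 1.5 renamed `line_terminator` to `lineterminator`; the old name was removed in pandas 2.0
bilstm_predictions.to_csv(bilstm_result_file, index=False, compression='gzip', line_terminator='\n')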
bilstm_predictions

|       | entity_id      | bilstm      |
|-------|----------------|-------------|
| 0     | DL1111_32_46   | not negated |
| 1     | DL1111_272_280 | not negated |
| 2     | DL1111_363_377 | not negated |
| 3     | DL1112_22_28   | negated     |
| 4     | DL1113_59_67   | not negated |
| ...   | ...            | ...         |
| 12546 | SP2118_862_876 | not negated |
| 12547 | SP2119_23_45   | negated     |
| 12548 | SP2120_3_23    | not negated |
| 12549 | SP2121_73_85   | not negated |
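
To check a saved result file, the gzipped CSV can be read back and the predictions counted per label. The sketch below uses only the standard library and a small temporary toy file, so it does not touch the real `bilstm_predictions.csv.gz`. The helper `count_predictions` and the toy rows are hypothetical; the column names follow the table above.

```python
import csv
import gzip
import tempfile
from collections import Counter
from pathlib import Path


def count_predictions(csv_gz_path, column):
    """Count how often each value occurs in one column of a gzipped CSV file."""
    with gzip.open(csv_gz_path, mode='rt', newline='', encoding='utf-8') as handle:
        reader = csv.DictReader(handle)
        return Counter(row[column] for row in reader)


with tempfile.TemporaryDirectory() as tmp_dir:
    # Write a toy result file with the same columns as the table above
    toy_file = Path(tmp_dir) / 'toy_predictions.csv.gz'
    with gzip.open(toy_file, mode='wt', newline='', encoding='utf-8') as handle:
        writer = csv.writer(handle, lineterminator='\n')
        writer.writerow(['entity_id', 'bilstm'])
        writer.writerow(['DOC1_0_5', 'not negated'])
        writer.writerow(['DOC1_10_18', 'negated'])
        writer.writerow(['DOC2_3_9', 'not negated'])

    counts = count_predictions(toy_file, 'bilstm')

print(counts)
```

For the real file, the same function could be called as `count_predictions(bilstm_result_file, 'bilstm')`.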
