# Objectives
* Gain hands-on experience working with pyConText and its resources
* Understand and develop Targets
* Understand and develop Modifiers
* Graph and visualize Targets and Modifiers together
* Gain tools for the group project to reduce False Negatives and False Positives (and so improve the F1 measure)

In [1]:
import os

# we will definitely need pyConText
import pyConTextNLP
from pyConTextNLP import pyConTextGraph
from pyConTextNLP.itemData import itemData
from pyConTextNLP.display.html import mark_document_with_html
print(pyConTextNLP.__version__)
# useful utilities in RadNLP as well

import pandas as pd
# packages for interaction
from ipywidgets import interact, interactive, fixed
from IPython.display import display, HTML, Image
import ipywidgets
# and also our utilities for this class
from nlp_pneumonia_utils import Annotation
from nlp_pneumonia_utils import AnnotatedDocument
from nlp_pneumonia_utils import read_brat_annotations
from nlp_pneumonia_utils import read_doc_annotations
from nlp_pneumonia_utils import read_annotations
from nlp_pneumonia_utils import calculate_prediction_metrics
from nlp_pneumonia_utils import mark_text
from nlp_pneumonia_utils import clearPyConTextRegularExpressions
from nlp_pneumonia_utils import pneumonia_annotation_html_markup
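# note: this rebinds mark_document_with_html, replacing the pyConTextNLP.display.html version imported above
from nlp_pneumonia_utils import mark_document_with_html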
from nlp_pneumonia_utils import view_single_sentence_graph
from nlp_pneumonia_utils import markup_sentence

from pycontext_quiz import identify_target_category
from pycontext_quiz import file_delimiter_quiz
from pycontext_quiz import modifier_directionality_quiz
from pycontext_quiz import second_most_frequent_modifier_quiz

from visual import Vis
from visual import snippets_markup
from visual import view_pycontext_output

from textblob import TextBlob

print('Imported pneumonia nlp utilities...')

0.6.2.0
Imported pneumonia nlp utilities...


# First thing, let's load our training set

In [2]:
annotated_doc_map = read_doc_annotations('data/training_v2.zip')

# let's also use a simple list of documents as well as this map
annotated_docs = list(annotated_doc_map.values())

print('Total Annotated Documents : {0}'.format(len(annotated_docs)))

Reading annotations from file : data/training_v2.zip
Opening local file : data/training_v2.zip
Total Annotated Documents : 70


## Let's set up an example sentence to work with

In [3]:
example_sentence = """IMPRESSION: No evidence of pneumonia."""

## We are going to use an algorithm called pyConText. Our first goal is to convert the "keywords" that we worked on in our previous activity into pyConText "targets." Any itemData in pyConText has 4 parts:
1. The literal (e.g. "pneumonia", "pneumothorax", "can rule out", "cannot be excluded", etc.)
2. The category (e.g. "EVIDENCE_OF_PNEUMONIA")
3. The regular expression (optional) used to capture the literal in the text. If no regular expression is provided, a regular expression is generated literally from the literal.
4. The rule (optional). If the itemData is used as a modifier, the rule states the direction in which the modifier operates within the sentence. Current valid values are "forward" (the item can modify objects that follow it in the sentence), "backward" (the item can modify objects that precede it), and "bidirectional" (the item can modify objects on either side of it).
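
To make the three direction values concrete, here is a minimal sketch in plain Python. It is a toy model and not pyConText's own code: the function `can_modify` and the use of token positions are assumptions made only for illustration.

```python
# Illustrative only: a toy model of the "rule" field, not pyConText's implementation.
def can_modify(rule, modifier_pos, target_pos):
    """Return True if a modifier at modifier_pos may modify a target at target_pos."""
    if rule == "forward":
        return target_pos > modifier_pos
    if rule == "backward":
        return target_pos < modifier_pos
    if rule == "bidirectional":
        return target_pos != modifier_pos
    return False  # empty rule: the item is a target, not a modifier


tokens = "no evidence of pneumonia".split()
no_pos = tokens.index("no")
pneumonia_pos = tokens.index("pneumonia")

print(can_modify("forward", no_pos, pneumonia_pos))   # True
print(can_modify("backward", no_pos, pneumonia_pos))  # False
```

In this toy model, "no" reaches "pneumonia" only when its rule lets it look forward in the sentence.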

## Additionally, itemData can be instantiated in code or from a file.  Let's start with code:

In [4]:
# Now let's set up some rules for pyConText for EVIDENCE_OF_PNEUMONIA
# At this moment, we will just set up these "concepts" and we'll handle modifiers for them after that

targets1 = []
modifiers1 = []

# so before we add targets, remember from above that they will look like this : 
# targets = itemData(["literal", "CATEGORY", "regular expression(s)", "empty or forward or backward or bidirectional"])

# so now let's set this up for "pneumonia" with the category "EVIDENCE_OF_PNEUMONIA"
targets1 = itemData(["pneumonia", "EVIDENCE_OF_PNEUMONIA", "", ""], 
                    ["consolidation", "EVIDENCE_OF_PNEUMONIA", "", ""],
                    ["infiltrate", "EVIDENCE_OF_PNEUMONIA", "", ""]
                   )



## Let's apply these targets and modifiers (still blank) to the example sentence

In [5]:
# let's go ahead and use this now on one single example sentence:
markup = markup_sentence(example_sentence, modifiers1, targets1)
# prettier display with IPython display
display(markup.nodes(data = True))
#print(markup.getXML())

[(<id> 45651149063972163252820262615501182279 </id> <phrase> pneumonia </phrase> <category> ['evidence_of_pneumonia'] </category> ,
  {'category': 'target'})]

## Question : Will we find a target match on this sentence? Will we match "pneumonias"?

In [6]:
example_sentence_2 = """Findings consistent with CHF, although underlying bilateral lower lobe pneumonias cannot be excluded."""

In [7]:
# let's see how things look on this sentence
markup_sentence_2 = markup_sentence(example_sentence_2, modifiers1, targets1, verbose = True)
display(markup_sentence_2.nodes(data = True))

[]

## We didn't mark up a target for "pneumonias" since we only had the singular variant "pneumonia"
### We can 
### (1) add the target "pneumonias" to our target list or 
### (2) add a regular expression for the target "pneumonia" so that it recognizes different variants
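
Option (2) can be previewed with Python's standard `re` module before touching pyConText. This is a sketch under an assumption: it supposes that a literal with no regular expression is matched as a whole word, which is consistent with the empty result above but is not taken from the pyConText source.

```python
import re

sentence = "underlying bilateral lower lobe pneumonias cannot be excluded"

# Assumed default behavior: the literal wrapped in word boundaries, case-insensitive.
literal_pattern = re.compile(r"\bpneumonia\b", re.IGNORECASE)
# Option (2): an explicit pattern that allows an optional plural "s".
plural_pattern = re.compile(r"\bpneumonias?\b", re.IGNORECASE)

literal_match = literal_pattern.search(sentence)
plural_match = plural_pattern.search(sentence)

print(literal_match)          # None
print(plural_match.group(0))  # pneumonias
```

If the pattern behaves as wanted, it would go in the third slot of the target, for example `["pneumonia", "EVIDENCE_OF_PNEUMONIA", r"pneumonias?", ""]`.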

In [8]:
# add a new target for 'pneumonias'
targets1 = itemData(["pneumonia", "EVIDENCE_OF_PNEUMONIA", "", ""], 
                    ["pneumonias", "EVIDENCE_OF_PNEUMONIA", "", ""],
                    ["consolidation", "EVIDENCE_OF_PNEUMONIA", "", ""],
                    ["infiltrate", "EVIDENCE_OF_PNEUMONIA", "", ""]
                   )

In [9]:
# let's try that out
markup_sentence_2 = markup_sentence(example_sentence_2, modifiers1, targets1, verbose = True)
display(markup_sentence_2.nodes(data = True))

[(<id> 77273921323184316770388224004835710279 </id> <phrase> pneumonias </phrase> <category> ['evidence_of_pneumonia'] </category> ,
  {'category': 'target'})]

## We can run the code over an entire document

In [10]:
example_document = """
PORTABLE CHEST:  Comparison made to prior film from X:XX a.m. the same day.
     
The ET tube and nasogastric tube remain in good position. Cardiac and
mediastinal contours are stable. No acute changes are seen within the lung
parenchyma; specifically, there is no evidence of new infiltrate (skin folds
do project over the right lung). No consolidation on either side.

IMPRESSION: No evidence of pneumonia."""



In [11]:
# This function works on entire documents, combining all sentence-level objects into one object we can then graph
def markup_context_document(report_text, modifiers, targets):
    context = pyConTextGraph.ConTextDocument()
    
    # we will use TextBlob for breaking up sentences
    sentences = [s.raw for s in TextBlob(report_text).sentences]
    for sentence in sentences:
        m = markup_sentence(sentence, modifiers=modifiers, targets=targets)
        context.addMarkup(m)
    
    return context

In [13]:
context = markup_context_document(example_document, modifiers1, targets1)
view_pycontext_output(context)

## Now let's think about modifiers - what modifiers are needed for the targets in our document?

In [14]:
modifiers1 = itemData(["no", "DEFINITE_NEGATED_EXISTENCE", "", "forward"]
                   )

# before we continue, let's clear a mapping of compiled regular expressions which pyConText uses
clearPyConTextRegularExpressions()

# so now let's set this up with more variants of "EVIDENCE_OF_PNEUMONIA"
#full_targets_path = 'file:///' + os.path.join(os.getcwd(), pneumonia_targets_file)
#print('Loading pneumonia targets from : ' + full_targets_path)
#targets = pyConTextNLP.itemData.instantiateFromCSVtoitemData(full_targets_path)

# let's go ahead and run this again, now with our new modifier
context = markup_context_document(example_document, modifiers1, targets1)
# prettier display with IPython display
#display(context.getDocumentGraph().nodes(data = True))
#print(context.getXML())

Clearing pyConText compiled regular expressions


## Let's look at what pyConText does with the sentences

In [15]:
view_pycontext_output(context)

## Another important attribute that Modifiers can employ : "terminate"
An item with the "terminate" rule stops any modifier working forward or backward from extending past it. Let's demonstrate an example where we want "probable" to modify "arthritis" as a condition but not "pneumonia":

In [16]:
terminate_example_sentence = """probable arthritis but no pneumonia"""

In [20]:
temp_targets = itemData(["pneumonia", "EVIDENCE_OF_PNEUMONIA", "", ""],
                       ["arthritis", "ANOTHER_CONDITION", "", ""])

modifiers_without_terminate = itemData(["probable", "PROBABLE_EXISTENCE", "", "forward"],
                                      ["no", "DEFINITE_NEGATED_EXISTENCE", "", "forward"])



## Without the "terminate" modifier, what will happen? 

In [21]:
clearPyConTextRegularExpressions()
view_pycontext_output(markup_context_document(terminate_example_sentence, modifiers_without_terminate, temp_targets))

Clearing pyConText compiled regular expressions


## If we add a "terminate" modifier, what will happen? 

In [22]:
modifiers_with_terminate = itemData(["probable", "PROBABLE_EXISTENCE", "", "forward"],
                                   ["no", "DEFINITE_NEGATED_EXISTENCE", "", "forward"],
                                   ["but", "CONJ", "", "terminate"])

In [23]:
clearPyConTextRegularExpressions()
view_pycontext_output(markup_context_document(terminate_example_sentence, modifiers_with_terminate, temp_targets))

Clearing pyConText compiled regular expressions
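
The effect of "terminate" in the two runs above can be approximated in a few lines of plain Python. This is a toy model only: `forward_scope_targets` is a hypothetical helper that works on whitespace tokens, and it does not reproduce pyConText's actual scope handling.

```python
# Illustrative only: a token-level toy, not pyConText's implementation.
def forward_scope_targets(tokens, modifier, targets, terminators=()):
    """Collect targets that follow `modifier`, stopping at the first terminator."""
    start = tokens.index(modifier) + 1
    reached = []
    for token in tokens[start:]:
        if token in terminators:
            break  # the terminate item ends the modifier's scope
        if token in targets:
            reached.append(token)
    return reached


tokens = "probable arthritis but no pneumonia".split()
targets = {"arthritis", "pneumonia"}

without_terminate = forward_scope_targets(tokens, "probable", targets)
with_terminate = forward_scope_targets(tokens, "probable", targets, terminators={"but"})

print(without_terminate)  # ['arthritis', 'pneumonia']
print(with_terminate)     # ['arthritis']
```

Without a terminator, "probable" reaches both conditions; with "but" as a terminator, its scope ends before "pneumonia".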


## Multiple mentions of pneumonia in a report may occur, as they did with our sample report. How would you programmatically decide whether the report showed EVIDENCE_OF_PNEUMONIA or NO_EVIDENCE_OF_PNEUMONIA?
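
One possible starting point, offered as a sketch and not as the intended answer: call the report positive if at least one pneumonia target is left un-negated. The function `classify_report` and the `(target_category, modifier_categories)` tuples are hypothetical; in practice they would have to be extracted from the pyConText markup. Categories are compared case-insensitively because the output above shows them lower-cased.

```python
POSITIVE = "EVIDENCE_OF_PNEUMONIA"
NEGATIVE = "NO_EVIDENCE_OF_PNEUMONIA"


def classify_report(mentions):
    """mentions: iterable of (target_category, modifier_categories) pairs."""
    for target_category, modifier_categories in mentions:
        if target_category.upper() != "EVIDENCE_OF_PNEUMONIA":
            continue  # ignore targets for other conditions
        normalized = {m.upper() for m in modifier_categories}
        if "DEFINITE_NEGATED_EXISTENCE" not in normalized:
            return POSITIVE  # one un-negated mention is enough
    return NEGATIVE


# Hand-built stand-in for the sample report: three mentions, all negated by "no".
sample_report = [
    ("evidence_of_pneumonia", ["definite_negated_existence"]),
    ("evidence_of_pneumonia", ["definite_negated_existence"]),
    ("evidence_of_pneumonia", ["definite_negated_existence"]),
]
# Same report with one additional mention that carries no modifier.
mixed_report = sample_report + [("evidence_of_pneumonia", [])]

print(classify_report(sample_report))  # NO_EVIDENCE_OF_PNEUMONIA
print(classify_report(mixed_report))   # EVIDENCE_OF_PNEUMONIA
```

Other rules are equally defensible, such as weighting the IMPRESSION section more heavily or treating uncertain mentions separately, and the choice affects the balance between False Negatives and False Positives.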

<br/><br/>This material was presented as part of the DeCART Data Science for the Health Science Summer Program at the University of Utah in 2018.<br/>
Presenters : Dr. Wendy Chapman, Jianlin Shi <br> Acknowledgement: Many thanks to Kelly Peterson; part of these materials is adapted from his previous work.