# Calculating Agreement for Brat annotations

Now that your annotations are ready and you have learned the agreement formulas, let's try some exercises to calculate the agreement between annotators.

Although the formulas are simple, efficiently collecting the numbers for the contingency table is not trivial. We have provided an optimized function for you (if you are interested in how it is implemented, check [here](./compare_utils.py)). Let's try it out.


In [1]:
# import packages
import os
from compare_utils import compare_projects
from IPython.display import HTML

## 1. Initiate the directories

First, we need to specify whose annotations to compare against whose. In Brat, annotations are saved in directories, so this is equivalent to choosing which directory to compare against which.

If you are not sure which directories to look for, check the list here:
https://brat.jupyter.med.utah.edu/#/student_folders/

In [2]:
# names of the two projects to compare
annotator_a='sample1'
annotator_b='sample2'

In [3]:
# convert the project names to real directory paths

brat_projects_loc=os.path.abspath('../../BRAT')
annotator_a=os.path.join(brat_projects_loc, annotator_a)
annotator_b=os.path.join(brat_projects_loc, annotator_b)

# you could try to print annotator_a and annotator_b out to see where they are
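
Before running the comparison, it can help to confirm that both project directories actually exist. The following is a minimal sketch: `missing_dirs` is a hypothetical helper (it is not part of `compare_utils`), and the paths simply repeat the construction used in the cell above.

```python
import os


def missing_dirs(*paths):
    """Return the given paths that are not existing directories."""
    return [p for p in paths if not os.path.isdir(p)]


# Same path construction as in the cell above.
brat_projects_loc = os.path.abspath('../../BRAT')
annotator_a = os.path.join(brat_projects_loc, 'sample1')
annotator_b = os.path.join(brat_projects_loc, 'sample2')

# Report any project directory that cannot be found.
for path in missing_dirs(annotator_a, annotator_b):
    print('Directory not found:', path)
```

If nothing is printed, both directories were found.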

## 2. Strict comparison

**compare_projects** is the function that wraps the core logic. It takes two or three parameters:
1. Your directory
2. The directory that you want to compare against
3. The comparison method ('strict' or 'relax')

It returns a dictionary of evaluators, with annotation types as the keys and an Evaluator as each value. The Evaluator class holds all the numbers we need for the contingency table.

In [4]:
evaluators = compare_projects(annotator_a, annotator_b, 'strict')

In [5]:
evaluators

{'DOCUMENT_PNEUMONIA_NO': <compare_utils.Evaluator at 0x7fd60bef9860>,
 'DOCUMENT_PNEUMONIA_YES': <compare_utils.Evaluator at 0x7fd60bef9908>,
 'SPAN_POSITIVE_PNEUMONIA_EVIDENCE': <compare_utils.Evaluator at 0x7fd60beece10>}

In [6]:
for type_name, evaluator in evaluators.items():
    print(type_name)
    a, b, c, d = evaluator.get_values()
    # now you can print these numbers
    print(a, b, c, d)
    # or display them in a contingency table
    display(evaluator.display_values())

SPAN_POSITIVE_PNEUMONIA_EVIDENCE
1 3 2 None


|    | B+ | B-  |
|----|----|-----|
| A+ | 1  | 3.0 |
| A- | 2  |     |


DOCUMENT_PNEUMONIA_YES
0 1 1 None


|    | B+ | B-  |
|----|----|-----|
| A+ | 0  | 1.0 |
| A- | 1  |     |


DOCUMENT_PNEUMONIA_NO
2 0 0 None


|    | B+ | B-  |
|----|----|-----|
| A+ | 2  | 0.0 |
| A- | 0  |     |


## 3. Relaxed comparison
When comparing mention-level annotations, it is more useful to use relaxed comparison: count a match if an annotation from annotator A overlaps with one from annotator B. For instance, "Left lower lobe pneumonia" vs "pneumonia".
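
To make the difference between the two methods concrete, here is a minimal sketch of strict versus relaxed span matching. It is only an illustration under assumed conventions (spans as `(start, end)` character offsets with an exclusive end); `strict_match` and `relaxed_match` are hypothetical names, not the functions used inside `compare_utils`.

```python
def strict_match(span_a, span_b):
    """Strict: both the start and end offsets must be identical."""
    return span_a == span_b


def relaxed_match(span_a, span_b):
    """Relaxed: the spans must share at least one character."""
    return span_a[0] < span_b[1] and span_b[0] < span_a[1]


text = "Left lower lobe pneumonia"
# Annotator A marks the whole phrase, annotator B only the word "pneumonia".
span_a = (0, len(text))
start = text.index("pneumonia")
span_b = (start, start + len("pneumonia"))

print(strict_match(span_a, span_b))   # False
print(relaxed_match(span_a, span_b))  # True
```

Spans that merely touch, such as `(0, 5)` and `(5, 10)`, do not count as overlapping under this definition.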

In [7]:
# the code is very similar to the above
evaluators = compare_projects(annotator_a, annotator_b, 'relax')

In [8]:
for type_name, evaluator in evaluators.items():
    print(type_name)
    a, b, c, d = evaluator.get_values()
    # now you can print these numbers
    print(a, b, c, d)
    # or display them in a contingency table
    display(evaluator.display_values())

SPAN_POSITIVE_PNEUMONIA_EVIDENCE
2 2 1 None


|    | B+ | B-  |
|----|----|-----|
| A+ | 2  | 2.0 |
| A- | 1  |     |


DOCUMENT_PNEUMONIA_YES
0 1 1 None


|    | B+ | B-  |
|----|----|-----|
| A+ | 0  | 1.0 |
| A- | 1  |     |


DOCUMENT_PNEUMONIA_NO
2 0 0 None


|    | B+ | B-  |
|----|----|-----|
| A+ | 2  | 0.0 |
| A- | 0  |     |
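
With the counts in hand, the agreement numbers follow directly. The sketch below assumes that `a` counts annotations made by both annotators, `b` those made only by A, and `c` those made only by B, as the table layout suggests. It also treats annotator B as the reference, which is an arbitrary choice for illustration. Because `d` is `None` here, only measures that do not need it (precision, recall, F1) are computed. `agreement_scores` is a hypothetical helper, not part of `compare_utils`.

```python
def agreement_scores(a, b, c):
    """Precision, recall and F1 of annotator A measured against annotator B.

    a: annotated by both, b: only by A, c: only by B.
    Scores default to 0.0 when their denominator is zero.
    """
    precision = a / (a + b) if a + b else 0.0
    recall = a / (a + c) if a + c else 0.0
    f1 = 2 * a / (2 * a + b + c) if a + b + c else 0.0
    return precision, recall, f1


# SPAN_POSITIVE_PNEUMONIA_EVIDENCE counts taken from the outputs above.
counts = {'strict': (1, 3, 2), 'relax': (2, 2, 1)}
for method, (a, b, c) in counts.items():
    p, r, f1 = agreement_scores(a, b, c)
    print(f'{method}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}')
# strict: precision=0.25 recall=0.33 F1=0.29
# relax: precision=0.50 recall=0.67 F1=0.57
```

F1 is symmetric in `b` and `c`, so it stays the same whichever annotator is treated as the reference; only precision and recall swap.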


## 4. Show the disagreement

Now we want to know where the annotators disagree. The Evaluator saves that information as well. Let's try to display it.

### 4.1 Show the annotations in annotator_a, but not annotator_b (false positive)

In [11]:
def snippets_markup(annotated_doc_map, display_types={'SPAN_POSITIVE_PNEUMONIA_EVIDENCE'}, width=900, height=400):
    """Build an HTML table of highlighted snippets for every document in the map."""
    if len(annotated_doc_map) == 0:
        print('No documents to display.')
        return
    # fixed header block, followed by a scrollable block for the snippets
    div_config1 = '<div style="background-color:#f9f9f9;padding-left:10px;' \
                  'width: ' + str(width - 23) + 'px; ">'
    div_config2 = '<div style="background-color:#f9f9f9;padding:10px; ' \
                  'width: ' + str(width) + 'px; height: ' + str(height) + 'px; overflow-y: scroll;">'
    html = ["<html>", div_config1, "<table width=100% >",
            "<col style=\"width:25%\"><col style=\"width:75%\">",
            "<tr><th style=\"text-align:center\">document name</th><th style=\"text-align:center\">Snippets</th></tr></table></div>",
            div_config2,
            "<table width=100% ><col style=\"width:25%\"><col style=\"width:75%\">"]
    for doc_name, anno_doc in annotated_doc_map.items():
        html.extend(snippet_markup(doc_name, anno_doc, display_types))
    html.append("</table></div>")
    html.append("</html>")
    return ''.join(html)


def snippet_markup(doc_name, anno_doc, display_types):
    """Build the table rows for one document.

    Assumes anno_doc exposes .text and .annotations.
    """
    from pyConTextNLP.display.html import __insert_color
    html = []
    color = 'blue'
    window_size = 50
    html.append("<tr>")
    html.append("<td style=\"text-align:left\">{0}</td>".format(doc_name))
    html.append("<td></td>")
    html.append("</tr>")
    for anno in anno_doc.annotations:
        if anno.type in display_types:
            # make sure the snippet is cut inside the text boundary
            begin = max(anno.start_index - window_size, 0)
            end = min(anno.end_index + window_size, len(anno_doc.text))
            # render a highlighted snippet; both offsets are relative to the snippet start
            cell = __insert_color(anno_doc.text[begin:end], [anno.start_index - begin, anno.end_index - begin], color)
            # add the snippet to the table
            html.append("<tr>")
            html.append("<td></td>")
            html.append("<td style=\"text-align:left\">{0}</td>".format(cell))
            html.append("</tr>")
    return html

In [15]:
for type_name, evaluator in evaluators.items():
    print(type_name)
    fps=evaluator.get_fps()
    print(fps)

SPAN_POSITIVE_PNEUMONIA_EVIDENCE
OrderedDict([('subject_id_148_hadm_id_27941', [<nlp_pneumonia_utils.Annotation object at 0x7fd60bef9f98>]), ('subject_id_105_hadm_id_27261', [<nlp_pneumonia_utils.Annotation object at 0x7fd60bef9dd8>])])
DOCUMENT_PNEUMONIA_YES
OrderedDict([('subject_id_148_hadm_id_27941', [<nlp_pneumonia_utils.Annotation object at 0x7fd60bef9400>])])
DOCUMENT_PNEUMONIA_NO
OrderedDict()
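
The raw output above only shows object addresses. A more readable listing can be sketched as follows. The `Annotation` namedtuple is a hypothetical stand-in for `nlp_pneumonia_utils.Annotation`, using only the attribute names that appear in this notebook (`type`, `start_index`, `end_index`), and the document name and offsets are made up for illustration.

```python
from collections import OrderedDict, namedtuple

# Hypothetical stand-in for nlp_pneumonia_utils.Annotation.
Annotation = namedtuple('Annotation', ['type', 'start_index', 'end_index'])


def summarize(doc_map):
    """Return one (doc_name, type, start, end) row per annotation."""
    return [(doc_name, anno.type, anno.start_index, anno.end_index)
            for doc_name, annos in doc_map.items()
            for anno in annos]


# Made-up example with the same shape as the get_fps() result.
fps = OrderedDict([
    ('doc_1', [Annotation('SPAN_POSITIVE_PNEUMONIA_EVIDENCE', 10, 19)]),
])
for row in summarize(fps):
    print(*row)
```

In the notebook, the same function could be applied to the value returned by `evaluator.get_fps()`, provided the real annotation objects expose these attributes.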
