# Generate GeM annotation

This Jupyter notebook provides a semi-automated annotator for describing the content and layout of multimodal documents according to the schema defined in the Genre and Multimodality (GeM) model (Bateman 2008). 

The annotator is intended to facilitate the process of describing the mass of detail in document layouts, which has been previously identified as a major bottleneck for annotating documents using the GeM model (Thomas 2009; Hiippala 2015).

That said, this notebook does not generate traditional human-annotated GeM markup, but a variant intended for computational processing, termed *auto-GeM* here. The <a href="https://github.com/thiippal/gem-tools">gem-tools</a> repository provides various tools for visualizing auto-GeM annotation.

The notebook is intended to be friendly to novice users, so most of the functions reside in a module named *generator*. Advanced users may examine this module for a better understanding of how the annotator operates.

**References**

Bateman, J.A. (2008) *Multimodality and Genre: A Foundation for the Systematic Analysis of Multimodal Documents*. London: Palgrave.

Hiippala, T. (2015) *The Structure of Multimodal Documents: An Empirical Approach*. New York and London: Routledge.

Thomas, M. (2009) *Localizing pack messages: A framework for corpus-based cross-cultural multimodal analysis*. PhD thesis, University of Leeds.

## 1. Import the necessary packages.

In [None]:
from generator import *

## 2. Set up the classifier.

In [None]:
model = load_model()

## 3. Process the document image.

#### Preprocess the document image.

For best results, use documents with a resolution of 300 DPI.

In [None]:
image, original, filename, filepath = preprocess("test_images/2005-hwy-side_b-5.jpg")

#### Detect regions of interest in the document image.

Define the kernel size and the number of iterations for the morphological operations.

In [None]:
kernel = (11, 11)
iterations = 2
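
As a rough illustration of why these two values matter, the sketch below implements binary dilation, one common morphological operation, in plain Python. It assumes that `detect_roi` relies on something similar to merge nearby marks into larger regions, which the notebook does not confirm; the `dilate` helper is hypothetical and not part of *generator*. A larger kernel or more iterations closes wider gaps between neighbouring elements.

```python
def dilate(grid, kernel=(3, 3), iterations=1):
    """Binary dilation of a 2D list of 0/1 values with a rectangular kernel.

    Hypothetical helper for illustration only; odd kernel sizes are assumed.
    """
    half_h, half_w = kernel[0] // 2, kernel[1] // 2
    height, width = len(grid), len(grid[0])
    for _step in range(iterations):
        out = [[0] * width for _row in range(height)]
        for y in range(height):
            for x in range(width):
                if grid[y][x]:
                    # Switch on every pixel covered by the kernel centred here.
                    for ny in range(max(0, y - half_h), min(height, y + half_h + 1)):
                        for nx in range(max(0, x - half_w), min(width, x + half_w + 1)):
                            out[ny][nx] = 1
        grid = out
    return grid


# Two marks separated by a gap of three background pixels.
row = [[0, 1, 0, 0, 0, 1, 0]]
once = dilate(row, kernel=(3, 3), iterations=1)   # gap shrinks to one pixel
twice = dilate(row, kernel=(3, 3), iterations=2)  # the two marks merge
```

If too many separate elements are merged into one region, a smaller kernel or fewer iterations is the natural thing to try; if single elements are split into fragments, the opposite applies.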

Detect regions of interest.

In [None]:
contours = detect_roi(image, kernel, iterations)

#### Sort and classify the detected contours.

In [None]:
sorted_contours = sort_contours(contours)

classified_contours, contour_types = classify(sorted_contours, image, model)

#### Draw the detected contours for examination.

In [None]:
Image(filename="output/image_contours.png")

#### Mark false positives and erroneous or missing elements.

Enter the identifiers of the false positives below, separated by spaces (e.g. 11 24 32).

In [None]:
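# Store the result under a new name so that the false_positives function
# is not overwritten and the cell can be run again
marked_false_positives = false_positives(raw_input())

updated_contours, updated_contour_types = redraw(image, classified_contours, contour_types, marked_false_positives)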

Image(filename="output/image_contours_updated.png")
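
The identifiers in the step above are entered as a single space-separated string. A minimal sketch of how such input could be turned into a list of integers is given below; `parse_identifiers` is a hypothetical name, and the actual parsing in *generator* may differ.

```python
def parse_identifiers(text):
    """Turn a string such as '11 24 32' into a list of integers.

    Hypothetical helper for illustration; not taken from the generator module.
    """
    identifiers = []
    for token in text.split():
        # Non-numeric tokens are skipped silently in this sketch.
        if token.isdigit():
            identifiers.append(int(token))
    return identifiers


ids = parse_identifiers("11 24 32")
```

Splitting on whitespace without an explicit separator tolerates repeated or trailing spaces, which are easy to type by accident.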

Do you wish to mark additional elements in the document image (**y**/**n**)?

In [None]:
mark = raw_input()

if mark == 'y':
    updated_contours, updated_contour_types = draw_roi(image, updated_contours, updated_contour_types)

#### Project the contours onto the original high-resolution document image.

In [None]:
hires_contours = project(image, original, updated_contours)
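
Conceptually, projecting a region from a downscaled image onto the original amounts to rescaling its coordinates by the ratio of the two image sizes. The sketch below shows this for a single bounding box, assuming that preprocessing shrinks the image without cropping or rotating it; `scale_box` and its parameters are hypothetical and do not describe how `project` is actually implemented.

```python
def scale_box(box, small_size, large_size):
    """Map an (x, y, w, h) box from the small image to the large one.

    Sizes are (width, height) tuples. Hypothetical helper for illustration.
    """
    small_w, small_h = small_size
    large_w, large_h = large_size
    # Separate factors per axis, in case the aspect ratio differs slightly.
    fx = float(large_w) / small_w
    fy = float(large_h) / small_h
    x, y, w, h = box
    return (int(round(x * fx)), int(round(y * fy)),
            int(round(w * fx)), int(round(h * fy)))


box = scale_box((10, 20, 30, 40), small_size=(500, 700), large_size=(2000, 2800))
```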

## 4. Generate annotation.

In [None]:
generate_annotation(filename, original, hires_contours, updated_contour_types)