Output JSON Specification
The annotation data that MioGatto produces for each document is stored in two JSON files: the math concept dictionary (<ID>_mcdict.json) and the annotation data (<ID>_anno.json).
The math concept dictionary stores the user-created dictionary data for each document. Its top-level keys are:
- _author (string): The name of the author who creates the dictionary.
- _mcdict_version (string): The version of the mcdict specification. Currently 1.0.
- concepts (object): The dictionary of math concepts itself.
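The top-level layout listed above can be checked with a few lines of Python. This is only a minimal sketch: the helper name check_mcdict and the sample author value are hypothetical and not part of MioGatto, and the checks cover nothing beyond the three keys and types named in the list.

```python
import json

# The three top-level keys named in the list above.
REQUIRED_MCDICT_KEYS = {"_author", "_mcdict_version", "concepts"}


def check_mcdict(data: dict) -> list[str]:
    """Return a list of problems in the top-level structure (empty if none).

    Hypothetical helper for illustration; not part of MioGatto.
    """
    problems = []
    # Report every required key that is absent, in a stable order.
    for key in sorted(REQUIRED_MCDICT_KEYS - set(data)):
        problems.append(f"missing key: {key}")
    # The two metadata fields are documented as strings.
    for key in ("_author", "_mcdict_version"):
        if key in data and not isinstance(data[key], str):
            problems.append(f"{key} should be a string")
    # "concepts" is documented as a JSON object, i.e. a dict in Python.
    if "concepts" in data and not isinstance(data["concepts"], dict):
        problems.append("concepts should be an object")
    return problems


# Made-up sample document with an empty dictionary.
sample = json.loads(
    '{"_author": "Jane Doe", "_mcdict_version": "1.0", "concepts": {}}'
)
print(check_mcdict(sample))  # prints []
```

Returning a list of messages rather than raising on the first problem makes it easier to report everything that is wrong with a hand-edited file in one pass.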
In the concepts object, math concepts are stored per identifier type. The data for an identifier type is stored under the key given by its Unicode code point in hexadecimal; for example, D (U+0044) is stored under "44".
"concepts": {
  "44": {
    "_surface": {
      "text": "D",
      "unicode_name": "LATIN CAPITAL LETTER D"
    },
    "identifiers": {
      "default": [
        {
          "affixes": [],
          "arity": 0,
          "description": "Sample concept 1"
        }
      ],
      "roman": [
        {
          "affixes": [],
          "arity": 0,
          "description": "Sample concept 2"
        }
      ]
    }
  }
}
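As an illustration of the keying scheme, the following Python sketch derives the dictionary key from an identifier character and looks up its concepts. The function names concept_key and lookup_concepts are hypothetical. The sketch also assumes the key is the unpadded hexadecimal code point in lowercase; the example above only confirms "44" for D, so the letter case and padding used for other code points are assumptions.

```python
def concept_key(identifier: str) -> str:
    """Hexadecimal code point of a single-character identifier.

    Assumption: unpadded, lowercase hex. The specification example only
    shows "44" for "D", which is consistent with this but does not prove it.
    """
    return format(ord(identifier), "x")


def lookup_concepts(mcdict: dict, identifier: str, variant: str = "default") -> list:
    """Return the concept list for an identifier and variant, or [] if absent."""
    entry = mcdict["concepts"].get(concept_key(identifier))
    if entry is None:
        return []
    return entry["identifiers"].get(variant, [])


# The sample dictionary from the specification, as a Python dict.
mcdict = {
    "concepts": {
        "44": {
            "_surface": {"text": "D", "unicode_name": "LATIN CAPITAL LETTER D"},
            "identifiers": {
                "default": [
                    {"affixes": [], "arity": 0, "description": "Sample concept 1"}
                ],
                "roman": [
                    {"affixes": [], "arity": 0, "description": "Sample concept 2"}
                ],
            },
        }
    }
}

print(concept_key("D"))  # prints 44
print(lookup_concepts(mcdict, "D", "roman")[0]["description"])  # prints Sample concept 2
```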
The top-level keys of the annotation data file are:
- _anno_version (string): The version of the annotation data specification. Currently 1.0.
- _annotator (string): The name of the annotator who creates the annotation data.
- mi_anno (object): The actual annotation data.
The annotation data stores only the ID of the annotated math concept and its grounding source information. Each occurrence is keyed by the ID of its <mi> tag. concept_id is the position of the concept in the corresponding entry of the dictionary. sog is a list of objects describing the sources of grounding: start is the word ID where the source starts, stop is the word ID where it stops, and type is 0 (declaration), 1 (definition), or 2 (others).
"mi_anno": {
  "S1.E1.m1.1.1": {
    "concept_id": 0,
    "sog": [
      {
        "start": "S1.SS1.p1.1.1.w1",
        "stop": "S1.SS1.p1.1.1.w2",
        "type": 0
      }
    ]
  }
}
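To make the meaning of the sog fields concrete, here is a small Python sketch that reads an mi_anno object into typed records. The names SogType, Sog, and parse_mi_anno are hypothetical and chosen for illustration; only the field names and the three type codes come from the specification above.

```python
from dataclasses import dataclass
from enum import IntEnum


class SogType(IntEnum):
    """The three source-of-grounding type codes from the specification."""
    DECLARATION = 0
    DEFINITION = 1
    OTHERS = 2


@dataclass
class Sog:
    start: str     # word ID where the source starts
    stop: str      # word ID where the source stops
    type: SogType  # kind of grounding source


def parse_mi_anno(mi_anno: dict) -> dict:
    """Map each <mi> ID to a (concept_id, list of Sog) pair.

    Hypothetical helper; raises ValueError on an unknown type code.
    """
    parsed = {}
    for mi_id, anno in mi_anno.items():
        sogs = [
            Sog(s["start"], s["stop"], SogType(s["type"]))
            for s in anno.get("sog", [])
        ]
        parsed[mi_id] = (anno["concept_id"], sogs)
    return parsed


# The sample annotation from the specification, as a Python dict.
mi_anno = {
    "S1.E1.m1.1.1": {
        "concept_id": 0,
        "sog": [
            {"start": "S1.SS1.p1.1.1.w1", "stop": "S1.SS1.p1.1.1.w2", "type": 0}
        ],
    }
}

concept_id, sogs = parse_mi_anno(mi_anno)["S1.E1.m1.1.1"]
print(concept_id, sogs[0].type.name)  # prints 0 DECLARATION
```

Using an IntEnum keeps the numeric codes compatible with the JSON representation while rejecting values outside 0 to 2 at parse time.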