Output JSON Specification

Takuto Asakura edited this page May 5, 2023 · 5 revisions

MioGatto stores the annotation data for each document in two JSON files: the math concept dictionary (<ID>_mcdict.json) and the annotation data (<ID>_anno.json).

Math concept dictionary (v1.0)

This stores the user-created dictionary data for each document. Its top-level keys are:

  • _author (string): The name of the author who created the dictionary.
  • _mcdict_version (string): The version of the mcdict specification. Currently 1.0.
  • concepts (object): The dictionary of math concepts itself.

In the concepts object, math concepts are grouped by identifier type. The data for an identifier type is stored under a key that is the hexadecimal Unicode code point of its surface character (for example, "44" for "D", U+0044).

    "concepts": {
        "44": {
            "_surface": {
                "text": "D",
                "unicode_name": "LATIN CAPITAL LETTER D"
            },
            "identifiers": {
                "default": [
                    {
                        "affixes": [],
                        "arity": 0,
                        "description": "Sample concept 1"
                    }
                ],
                "roman": [
                    {
                        "affixes": [],
                        "arity": 0,
                        "description": "Sample concept 2"
                    }
                ]
            }
        }
    }
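
As a minimal sketch of how such a dictionary might be read, the Python snippet below derives the key for a character and looks up its concepts. The helper names (concept_key, lookup_concepts) and the "_author" value are hypothetical, and the choice of lowercase hex digits without padding is an assumption: this page only shows the key "44", which does not settle either point.

```python
import json

# The example above, wrapped in braces to form a complete JSON object.
# The "_author" value is a placeholder for illustration only.
MCDICT_JSON = """
{
    "_author": "example author",
    "_mcdict_version": "1.0",
    "concepts": {
        "44": {
            "_surface": {
                "text": "D",
                "unicode_name": "LATIN CAPITAL LETTER D"
            },
            "identifiers": {
                "default": [
                    {"affixes": [], "arity": 0, "description": "Sample concept 1"}
                ],
                "roman": [
                    {"affixes": [], "arity": 0, "description": "Sample concept 2"}
                ]
            }
        }
    }
}
"""


def concept_key(char):
    """Return the hex code point of a single character, e.g. "D" -> "44".

    Assumption: lowercase hex digits, no prefix, no zero padding.
    """
    return format(ord(char), "x")


def lookup_concepts(mcdict, char, variant="default"):
    """Return the list of concepts for a character and variant, or [] if absent."""
    entry = mcdict["concepts"].get(concept_key(char), {})
    return entry.get("identifiers", {}).get(variant, [])


mcdict = json.loads(MCDICT_JSON)
d_default = lookup_concepts(mcdict, "D")
d_roman = lookup_concepts(mcdict, "D", "roman")
missing = lookup_concepts(mcdict, "x")

for concept in d_default + d_roman:
    print(concept["description"], concept["arity"], concept["affixes"])
```

Returning an empty list for unknown characters or variants keeps the caller free of KeyError handling; a stricter reader could raise instead.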

Annotation data (v1.0)

The top-level keys for this are:

  • _anno_version (string): The version of the annotation data specification. Currently 1.0.
  • _annotator (string): The name of the annotator who created the annotation.
  • mi_anno (object): The actual annotation data.

The annotation data stores only the ID of the annotated math concept and the sources of grounding for each occurrence. Each occurrence is keyed by the ID of its <mi> element. concept_id is the position of the concept in the corresponding entry of the dictionary. sog is a list of objects describing the sources of grounding: start is the starting word ID of the source, stop is the stopping word ID, and type is 0 (declaration), 1 (definition), or 2 (others).

    "mi_anno": {
        "S1.E1.m1.1.1": {
            "concept_id": 0,
            "sog": [
                {
                    "start": "S1.SS1.p1.1.1.w1",
                    "stop": "S1.SS1.p1.1.1.w2",
                    "type": 0
                }
            ]
        }
    }
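
The following Python sketch shows one way to walk the annotation data and turn the numeric type codes into the labels listed above. The function name summarize_annotations and the "_annotator" value are hypothetical. The sketch deliberately does not resolve concept_id to a dictionary entry, because finding the right entry requires the identifier behind each <mi> ID, which is not stored in this file.

```python
import json

# Labels for the "type" field of a source of grounding, as listed above.
SOG_TYPES = {0: "declaration", 1: "definition", 2: "others"}

# The example above, wrapped in braces to form a complete JSON object.
# The "_annotator" value is a placeholder for illustration only.
ANNO_JSON = """
{
    "_anno_version": "1.0",
    "_annotator": "example annotator",
    "mi_anno": {
        "S1.E1.m1.1.1": {
            "concept_id": 0,
            "sog": [
                {
                    "start": "S1.SS1.p1.1.1.w1",
                    "stop": "S1.SS1.p1.1.1.w2",
                    "type": 0
                }
            ]
        }
    }
}
"""


def summarize_annotations(anno):
    """Return one (mi_id, concept_id, start, stop, type_label) tuple per source of grounding."""
    rows = []
    for mi_id, occurrence in anno["mi_anno"].items():
        for sog in occurrence.get("sog", []):
            rows.append(
                (
                    mi_id,
                    occurrence["concept_id"],
                    sog["start"],
                    sog["stop"],
                    SOG_TYPES[sog["type"]],  # raises KeyError on an unknown type code
                )
            )
    return rows


anno = json.loads(ANNO_JSON)
rows = summarize_annotations(anno)
for row in rows:
    print(row)
```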