MioGatto: Math Identifier-oriented Grounding Annotation Tool

System requirements

Python 3 (3.9 or later)
A web browser with MathML support (for the GUI annotation system)
- Firefox is recommended

Installation

All dependencies can be installed with a single command:

python -m pip install -r requirements.txt

If you would rather not install the dependencies system-wide, consider using venv.
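
As a minimal sketch of the venv route, the helper below uses Python's standard venv module to create an isolated environment and install requirements.txt into it. The function names and the .venv directory name are assumptions made for illustration; they are not part of MioGatto.

```python
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Return the interpreter path inside a virtual environment."""
    # Windows puts executables in Scripts/, other platforms in bin/.
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_into_venv(env_dir: Path, requirements: Path) -> None:
    """Create env_dir if it does not exist and install the requirements into it."""
    if not venv_python(env_dir).exists():
        venv.EnvBuilder(with_pip=True).create(env_dir)
    # Call pip through the environment's own interpreter, so nothing is
    # installed into the system-wide site-packages.
    subprocess.run(
        [str(venv_python(env_dir)), "-m", "pip", "install", "-r", str(requirements)],
        check=True,
    )


# Hypothetical usage, run from the repository root:
#   install_into_venv(Path(".venv"), Path("requirements.txt"))
```

Scripts then need to be run with the interpreter returned by venv_python (or after activating the environment in your shell) so that they see the installed packages.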

Project structure

Files in this repository

All the components of MioGatto are included in this repository:

lib/ contains the project library.
server/ contains the implementation of the server.
client/ contains the implementation of the client.
tools/ contains our utility Python scripts.

Files not in this repository

The annotation data, on the other hand, is not included in this repository because of the NDA constraints on the arXMLiv dataset. The data is licensed to SIGMathLing members as Dataset for Grounding of Formulae. Please consider joining SIGMathLing to acquire the dataset.

arxmliv/ contains the original documents from the arXMLiv dataset
data/ contains the annotation data
sources/ contains the preprocessed documents

Annotator's guide

For a guide with GIF animations, please refer to our Wiki:

https://github.com/wtsnjp/MioGatto/wiki/Annotator's-Guide

Prepare the input and analyze the annotated data (Advanced)

The Python scripts under the tools directory are mainly intended for developers of the grounding dataset. All scripts accept the --help (-h) option, which describes their basic usage.

Preparing data

As mentioned above, the HTML5 files in the arXMLiv dataset are suitable as input documents for MioGatto. Alternatively, you can generate equivalent HTML5 files from LaTeX sources by using LaTeXML:

latexmlc --preload=[nobibtex,ids,mathlexemes,localrawstyles]latexml.sty --format=html5 --pmml --cmml --mathtex --nodefaultresources --dest=<output HTML file> <input TeX file>
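
When several TeX files need converting, the same latexmlc call can be scripted. The sketch below copies the options verbatim from the command above and passes them as an argument list, which avoids shell quoting problems with the square brackets in --preload. The function names, the directory layout and the one-.tex-file-per-paper assumption are all hypothetical.

```python
import subprocess
from pathlib import Path

# Options copied verbatim from the latexmlc command above.
LATEXMLC_OPTIONS = [
    "--preload=[nobibtex,ids,mathlexemes,localrawstyles]latexml.sty",
    "--format=html5",
    "--pmml",
    "--cmml",
    "--mathtex",
    "--nodefaultresources",
]


def latexmlc_command(tex_file: Path, html_file: Path) -> list[str]:
    """Build the latexmlc argument list for one input/output pair."""
    return ["latexmlc", *LATEXMLC_OPTIONS, f"--dest={html_file}", str(tex_file)]


def convert_all(tex_dir: Path, html_dir: Path) -> None:
    """Convert every *.tex file in tex_dir to an HTML5 file in html_dir."""
    html_dir.mkdir(parents=True, exist_ok=True)
    for tex_file in sorted(tex_dir.glob("*.tex")):
        html_file = html_dir / f"{tex_file.stem}.html"
        # check=True stops at the first file that latexmlc fails on.
        subprocess.run(latexmlc_command(tex_file, html_file), check=True)


# Hypothetical usage:
#   convert_all(Path("tex"), Path("html"))
```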

Then you can give the HTML5 files to our preprocess script:

python -m tools.preprocess <HTML file>

By default, this writes the preprocessed HTML file to sources/ and generates the initialized JSON files for annotation in data/. Please refer to the help message for the available options:

python -m tools.preprocess -h
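
To preprocess many HTML files in one go, the call can be wrapped in a small loop. This is only a sketch: it assumes that tools.preprocess must be invoked from the repository root (so that the tools package is importable) and that it accepts one HTML file per call, as in the command above. The function names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def preprocess_command(html_file: Path) -> list[str]:
    """Build the command line for preprocessing a single HTML file."""
    # sys.executable keeps the call inside the current (virtual) environment.
    return [sys.executable, "-m", "tools.preprocess", str(html_file)]


def preprocess_all(html_dir: Path, repo_root: Path) -> list[Path]:
    """Run tools.preprocess on each *.html file; return the files that failed."""
    failed = []
    for html_file in sorted(html_dir.glob("*.html")):
        # Resolve the path first because the command runs with cwd=repo_root.
        result = subprocess.run(preprocess_command(html_file.resolve()), cwd=repo_root)
        if result.returncode != 0:
            failed.append(html_file)
    return failed


# Hypothetical usage:
#   failed = preprocess_all(Path("html"), Path("."))
```

Collecting failures instead of stopping at the first one lets a long batch finish, with the problematic documents left for inspection afterwards.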

Analyzing the annotation results

To run the basic analyses of the annotation data, execute:

python -m tools.analyzer <paper id>

Some supplemental files, including graph images, will be saved in the results directory by default.

Similarly, analyses of the sources of grounding annotation can be performed with the tools.sog script:

python -m tools.sog <paper id>

To calculate the agreement between the data from two annotators, execute:

python -m tools.agreement --target=<path to annotator's data dir> <paper id>
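
The three analysis commands above can be chained over several papers. The following sketch only assembles and runs the command lines shown in this section; the function names and the paper ID and directory in the usage comment are placeholders, not real data.

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional


def analysis_commands(
    paper_id: str, other_annotator_dir: Optional[Path] = None
) -> list[list[str]]:
    """Build the analysis command lines for one paper."""
    commands = [
        [sys.executable, "-m", "tools.analyzer", paper_id],
        [sys.executable, "-m", "tools.sog", paper_id],
    ]
    # The agreement script needs a second annotator's data directory.
    if other_annotator_dir is not None:
        commands.append(
            [
                sys.executable,
                "-m",
                "tools.agreement",
                f"--target={other_annotator_dir}",
                paper_id,
            ]
        )
    return commands


def run_analyses(paper_ids: list[str], other_annotator_dir: Optional[Path] = None) -> None:
    """Run every analysis for every paper, stopping at the first failure."""
    for paper_id in paper_ids:
        for command in analysis_commands(paper_id, other_annotator_dir):
            subprocess.run(command, check=True)


# Hypothetical usage, run from the repository root:
#   run_analyses(["0000.00000"], Path("annotator2_data"))
```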

Developing client

The client is written in TypeScript. All development tools can be installed with:

cd client
npm install

To compile the client source client/index.ts, execute the following in the client directory:

npm run build

Publications

Takuto Asakura, Yusuke Miyao. What Is Needed for Intra-document Disambiguation of Math Identifiers? In Proceedings of The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024).
[paper] [bib] [poster]
Aamin Dev, Takuto Asakura, Rune Sætre. An Approach to Co-reference Resolution and Formula Grounding for Mathematical Identifiers using Large Language Models. In Proceedings of The 2nd Workshop on Mathematical Natural Language Processing (MathNLP 2024).
[paper] [bib]
Takuto Asakura, Yusuke Miyao, Akiko Aizawa. Building Dataset for Grounding of Formulae — Annotating Coreference Relations Among Math Identifiers. In Proceedings of 13th Conference on Language Resources and Evaluation (LREC 2022). pp. 4851―4858, 2022.
[paper] [bib] [slides] [video] [resource]
Takuto Asakura, Yusuke Miyao, Akiko Aizawa, Michael Kohlhase. MioGatto: A Math Identifier-oriented Grounding Annotation Tool. In 13th MathUI Workshop at 14th Conference on Intelligent Computer Mathematics (MathUI 2021).
[preprint] [paper] [slides] [code]
Takuto Asakura, André Greiner-Petter, Akiko Aizawa, Yusuke Miyao. Towards Grounding of Formulae. In Proceedings of First Workshop on Scholarly Document Processing (SDP 2020). pp. 138―147, 2020.
[paper] [bib] [poster] [resource]
Takuto Asakura, André Greiner-Petter, Akiko Aizawa, Yusuke Miyao. Dataset Creation for Grounding of Formulae. In SCIDOCA 2020.
[slides] [resource]

Acknowledgements

This project has been supported by JST, ACT-X Grant Number JPMJAX2002, Japan.

License

This software is licensed under the MIT license.

Third-party software

jQuery: Copyright JS Foundation and other contributors. Licensed under the MIT license.
jQuery UI: Copyright jQuery Foundation and other contributors. Licensed under the MIT license.

Takuto Asakura (wtsnjp)
