This repository contains code associated with the article "Corpus-based insights into multimodality and genre in primary school science diagrams" by Tuomo Hiippala, published in Visual Communication (open access).
To reproduce the results reported in the article, you must first download the following data:
- The Allen Institute for Artificial Intelligence Diagrams (AI2D) dataset (direct download)
- The AI2D-RST corpus (direct download)
Clone this repository and extract the AI2D corpus into the repository directory, in a subdirectory named ai2d. Then extract the AI2D-RST corpus into the same ai2d directory. The directory structure should be as follows:
```
ai2d/
├── ai2d-rst/
├── annotations/
├── images/
├── questions/
├── categories.json
└── categories_ai2d-rst.json
```
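Before running any scripts, it may help to confirm that both corpora ended up where the scripts expect them. The following is a minimal sketch that checks the layout shown in the directory tree; the helper name `missing_entries` is hypothetical and not part of this repository, and it only tests whether each entry exists, not whether its contents are complete.

```python
from pathlib import Path

# Entries expected inside ai2d/, taken from the directory tree.
EXPECTED_DIRS = ["ai2d-rst", "annotations", "images", "questions"]
EXPECTED_FILES = ["categories.json", "categories_ai2d-rst.json"]


def missing_entries(root="ai2d"):
    """Return the names of expected directories and files absent under root."""
    root_path = Path(root)
    missing = [name for name in EXPECTED_DIRS if not (root_path / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (root_path / name).is_file()]
    return missing


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("Missing from ai2d/:", ", ".join(missing))
    else:
        print("Directory layout looks complete.")
```

Run it from the repository root; an empty result means every entry in the tree is present.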
You should also create a fresh virtual environment for Python 3.8+ and install the libraries listed in requirements.txt using the following command:

```
pip install -r requirements.txt
```
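If you prefer to script the setup, the sketch below uses the standard-library `venv` module to create a fresh environment and then runs the same `pip install -r requirements.txt` command inside it. The function name `create_env` and the environment directory `.venv` are assumptions made for illustration; any location works.

```python
import subprocess
import sys
import venv
from pathlib import Path


def create_env(env_dir=".venv", requirements="requirements.txt", install=True):
    """Create a fresh virtual environment and optionally install the requirements.

    Returns the path to the environment's Python interpreter.
    """
    # The repository targets Python 3.8 or newer.
    if sys.version_info < (3, 8):
        raise RuntimeError("Python 3.8+ is required")
    env_path = Path(env_dir)
    # clear=True empties an existing directory first, so the environment is fresh.
    venv.create(env_path, clear=True, with_pip=install)
    if sys.platform == "win32":
        python = env_path / "Scripts" / "python.exe"
    else:
        python = env_path / "bin" / "python"
    if install:
        # Equivalent to running `pip install -r requirements.txt` inside the environment.
        subprocess.run([str(python), "-m", "pip", "install", "-r", requirements], check=True)
    return python
```

Calling `create_env()` from the repository root creates `.venv` and installs the requirements; activate the environment afterwards as usual for your shell.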
The repository contains the following files:
- ai2d_rst.py contains the scripts for building the AI2D-RST corpus.
- 01_extract_features_from_corpus.py extracts information about multimodal discourse structure from the AI2D-RST corpus.
- 02_fit_umap.py creates and plots UMAP embeddings for the AI2D-RST diagrams.
- 03_fit_umap.py illustrates how to create and plot multiple embeddings into a single figure.
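The numbered scripts suggest a running order. A minimal sketch for running them in sequence with the current interpreter follows; it assumes that each script runs from the repository root without command-line arguments, which this README does not state, so check the scripts before relying on it. The helper names `build_commands` and `run_pipeline` are hypothetical.

```python
import subprocess
import sys

# Numbered scripts as listed in this README. Assumption: each runs
# from the repository root without command-line arguments.
SCRIPTS = [
    "01_extract_features_from_corpus.py",
    "02_fit_umap.py",
    "03_fit_umap.py",
]


def build_commands(scripts=SCRIPTS):
    """Return one command per script, ordered by the numeric filename prefix."""
    return [[sys.executable, script] for script in sorted(scripts)]


def run_pipeline(dry_run=False):
    """Run the scripts in order, stopping at the first one that fails."""
    for cmd in build_commands():
        print("Running:", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

`run_pipeline(dry_run=True)` only prints the commands, which is a quick way to confirm the order before launching the actual runs.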
Questions? Open an issue on GitHub or e-mail me at tuomo dot hiippala @ helsinki dot fi.