A repository for the article "Corpus-based insights into multimodality and genre in primary school science diagrams" published in Visual Communication (2023)

thiippal/diagrams-genre

Corpus-based insights into multimodality and genre in primary school science diagrams

Description

This repository contains code associated with the article Corpus-based insights into multimodality and genre in primary school science diagrams by Tuomo Hiippala, published in Visual Communication (open access).

Preliminaries

To reproduce the results reported in the article, you must first download the following data:

  1. The Allen Institute for Artificial Intelligence Diagrams (AI2D) dataset (direct download)
  2. The AI2D-RST corpus (direct download)

Clone this repository and extract the AI2D dataset into a directory named ai2d inside the cloned repository. Then extract the AI2D-RST corpus into that same ai2d directory, alongside the AI2D data. The resulting directory structure should look like this:

ai2d/
├── ai2d-rst/
├── annotations/
├── images/
├── questions/
├── categories.json
└── categories_ai2d-rst.json
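
Before running the scripts, it may help to confirm that the extracted data matches the layout above. The following is a minimal sketch and not part of this repository: the helper name missing_entries is hypothetical, and the check assumes it is run from the directory that contains ai2d/.

```python
from pathlib import Path

# Entries that the directory tree above lists under ai2d/.
EXPECTED_DIRS = ["ai2d-rst", "annotations", "images", "questions"]
EXPECTED_FILES = ["categories.json", "categories_ai2d-rst.json"]


def missing_entries(root):
    """Return the expected directories and files that are absent under root."""
    root = Path(root)
    # Directories are reported with a trailing slash to tell them apart from files.
    missing = [f"{name}/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (root / name).is_file()]
    return missing


if __name__ == "__main__":
    problems = missing_entries("ai2d")
    if problems:
        print("Missing from ai2d/:", ", ".join(problems))
    else:
        print("Directory layout matches the expected structure.")
```

The check only looks at the top-level names; it does not validate the contents of the dataset or the corpus.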

You should also create a fresh virtual environment for Python 3.8+ and install the libraries defined in requirements.txt using the following command:

pip install -r requirements.txt

Codebase

ai2d_rst.py contains the scripts for building the AI2D-RST corpus.

01_extract_features_from_corpus.py extracts information about multimodal discourse structure from the AI2D-RST corpus.

02_fit_umap.py creates and plots UMAP embeddings for the AI2D-RST diagrams.

03_fit_umap.py illustrates how to create multiple embeddings and plot them in a single figure.

Contact

Questions? Open an issue on GitHub or e-mail me at tuomo dot hiippala @ helsinki dot fi.