OntoGUM: Evaluating Contextualized SOTA Coreference Resolution on 12 More Genres


This repository contains the code for building up the OntoGUM dataset from:


  1. Python >= 3.6
  2. Download GUM and put the folder in the root directory of this repo

Rebuilding the dataset

You can rebuild the dataset either with the scripts in this repository or with the build bot from GUM.

  • To rebuild the dataset using this repo:
  1. Run to start the conversion after following the prerequisites.
  2. Adjust the arguments in to output different formats. Please note that if you want to test models trained on OntoNotes, the conll format is needed.
  • To rebuild the dataset from GUM:
  1. Follow the instructions in the GUM repo to build up the dataset (including reddit data)
  2. Find the OntoGUM data (tsv and conll) under /gum/_build/target/coref/ontogum

    Also check here for differences between the GUM and OntoNotes schemas.


Two output formats are currently supported: tsv and conll. The default output is tsv. To get the conll format, specify it with the --out_format argument.
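To illustrate what the conll output encodes, here is a minimal sketch of how a CoNLL-style coreference column can be derived from mention spans. It is not the conversion code from this repository: the function name `coref_column` and the input layout (inclusive token indices plus a cluster id) are assumptions made for the example, and the ordering of brackets within one cell is a simplification.

```python
from collections import defaultdict


def coref_column(num_tokens, mentions):
    """Build a CoNLL-style coreference column for one sentence.

    mentions: list of (start, end, cluster_id) with inclusive token indices.
    This is a hypothetical helper, not part of the OntoGUM scripts.
    """
    starts = defaultdict(list)   # token index -> spans opening here
    ends = defaultdict(list)     # token index -> spans closing here
    singles = defaultdict(list)  # token index -> one-token mentions
    for start, end, cid in mentions:
        if start == end:
            singles[start].append(cid)
        else:
            starts[start].append((end, cid))
            ends[end].append((start, cid))

    column = []
    for i in range(num_tokens):
        parts = []
        # Open longer spans first so that brackets nest properly.
        for _end, cid in sorted(starts[i], reverse=True):
            parts.append(f"({cid}")
        for cid in singles[i]:
            parts.append(f"({cid})")
        # Close the most recently opened span first.
        for _start, cid in sorted(ends[i], reverse=True):
            parts.append(f"{cid})")
        # Tokens outside any mention get a dash.
        column.append("|".join(parts) if parts else "-")
    return column


tokens = ["Kim", "said", "she", "met", "the", "new", "editor"]
mentions = [(0, 0, 0), (2, 2, 0), (4, 6, 1)]
for token, tag in zip(tokens, coref_column(len(tokens), mentions)):
    print(f"{token}\t{tag}")
```

Each token receives `(id` where a mention opens, `id)` where it closes, `(id)` for a one-token mention, and `-` otherwise, which is the bracket notation that models trained on OntoNotes read.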


To view a coref document, copy and paste the contents of the tsv file into Spannotator. To visualize the predictions from SpanBert, go to utils and run to generate the tsv file from the predicted output JSON file.
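As a rough illustration of the prediction-to-table step, the sketch below lists the mentions of each predicted cluster as tab-separated rows. It rests on several assumptions: that each prediction record is a JSON object with a `sentences` key (lists of tokens) and a `predicted_clusters` key (lists of inclusive `[start, end]` spans over the flattened document), and that a simple four-column listing is enough for inspection. It does not produce the tsv format that Spannotator expects, the helper names are hypothetical, and tokens in real model output may be subword pieces.

```python
import json


def clusters_to_rows(record):
    """Return (cluster_id, start, end, text) rows for one predicted document.

    Assumes 'sentences' and 'predicted_clusters' keys; adjust to the actual
    prediction file if its layout differs.
    """
    # Flatten per-sentence token lists, since spans are assumed to index
    # into the whole document rather than into single sentences.
    tokens = [tok for sent in record["sentences"] for tok in sent]
    rows = []
    for cid, cluster in enumerate(record["predicted_clusters"]):
        for start, end in cluster:
            rows.append((cid, start, end, " ".join(tokens[start:end + 1])))
    return rows


def rows_to_tsv(rows):
    """Join rows into tab-separated lines."""
    return "\n".join("\t".join(str(value) for value in row) for row in rows)


# A made-up record for illustration only.
example = json.loads(
    '{"sentences": [["Kim", "left", "."], ["She", "was", "late", "."]],'
    ' "predicted_clusters": [[[0, 0], [3, 3]]]}'
)
print(rows_to_tsv(clusters_to_rows(example)))
```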

Testing SpanBert

  1. Go to utils and run python to build up the dataset. It will generate the train, dev (including by-genre sets), and test sets under ./dataset
  2. Follow the instructions in SpanBert. Note that you need to change the data directory.
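The by-genre dev sets can be pictured as a grouping of documents by genre. The sketch below assumes that document names follow the GUM pattern `GUM_<genre>_<name>`; the function `group_by_genre` and the sample names are illustrative and are not taken from the scripts in utils.

```python
from collections import defaultdict


def group_by_genre(doc_names):
    """Group document names by the genre field of a GUM-style name.

    Hypothetical helper: assumes names look like 'GUM_<genre>_<name>'.
    """
    genres = defaultdict(list)
    for name in doc_names:
        # "GUM_academic_art" -> ["GUM", "academic", "art"]
        parts = name.split("_")
        genre = parts[1] if len(parts) > 2 else "unknown"
        genres[genre].append(name)
    return dict(genres)


docs = ["GUM_academic_art", "GUM_news_iodine", "GUM_academic_census"]
for genre, names in group_by_genre(docs).items():
    print(genre, names)
```

Evaluating each group separately gives a per-genre score in addition to the score on the full dev set.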

Testing dcoref

  1. Go to utils and run ./


Model      OntoNotes   OntoGUM
dcoref     57.8        39.7
SpanBert   79.6        64.6


