This is the code used in the paper:
"A Consolidated Open Knowledge Representation for Multiple Texts"
Rachel Wities, Vered Shwartz, Gabriel Stanovsky, Meni Adler, Ori Shapira, Shyam Upadhyay, Dan Roth, Eugenio Martinez Camara, Iryna Gurevych and Ido Dagan. LSDSem 2017. link (TBD).
The dataset developed for the paper can be found here (TBD).
Prerequisites:
- Python 2.7
- numpy
- bsddb
- spacy
- stop-words
Quick Start:
The repository contains the following directories:
- src - the source files - used to load the OKR graph (common), compute inter-annotator agreement (agreement), and automatically construct the OKR object (baseline_system).
- resources - used by the baseline system.
- data - the annotation files used to compute the inter-annotator agreement (agreement) and the development and test set used in the baseline system (baseline).
From src/baseline_system: python compute_baseline_subtasks.py ../../data/baseline/dev ../../data/baseline/test
In the entity mentions components, the F1 score we originaly reoprted was 0.58. We managed to raise it to 0.61 by changing spacy tokenization. If you want the original code that returns the original 0.58 score, set GET_ORIGINAL_SCORE to True in line 22 in eval_entity_mention.py.
The entailment component requires resources. The entity entailment resource files are found in the resources directory. The predicate entailment file is much larger, and we therefore provide the script to build it from the original resource (reverb_local_clsf_all.txt from here).
TBD