Central repository for QA-SRL data.

The QA-SRL Bank

This repository is the reference point for QA-SRL Bank 2.0, the dataset described in the paper Large-Scale QA-SRL Parsing.

You can download the data here, or clone this repository and run ./download.sh.

Contents

When you run ./download.sh, the dataset is downloaded and unpacked into the data/qasrl-v2/ directory. Its contents are as follows:

  • data/qasrl-v2:
    • orig/: The original data gathered on MTurk, where workers wrote the questions.
    • expanded/: The expanded dataset with model-generated questions and answers gathered in our expansion round. Train and dev only.
    • dense/: The densely annotated data, combining the expanded data with extra model-generated questions and judgments from turkers on a 5k-sentence subset of dev and test.
    • index.json.gz: An index of the documents that were used across all partitions, with metadata.
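To confirm that the download produced the layout listed above, a quick check along these lines may help. This is only a sketch: the function name `check_layout` and the constant `EXPECTED_ENTRIES` are made up for illustration, and the check looks only at the four top-level entries, not at the files inside them.

```python
from pathlib import Path

# Entry names taken from the listing above (directories without the trailing "/").
EXPECTED_ENTRIES = ("orig", "expanded", "dense", "index.json.gz")


def check_layout(root="data/qasrl-v2"):
    """Return {entry name: True/False} for each expected entry under root.

    Hypothetical helper; it only tests for existence and does not
    validate the contents of any file or directory.
    """
    root = Path(root)
    return {name: (root / name).exists() for name in EXPECTED_ENTRIES}


if __name__ == "__main__":
    for name, present in check_layout().items():
        print(f"{name}: {'found' if present else 'missing'}")
```

If any entry is reported missing, re-running ./download.sh is the first thing to try.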

If you are modeling the data, you will probably use orig or expanded for training and tuning, and orig and dense for evaluation. Each set includes metadata that lets you determine which round a question or answer judgment came from.

See the Data Format description for details on how the data files are laid out.

Using the QA-SRL Bank

Once you have downloaded the data, you can process and iterate through it with your favorite JSON parsing or data reading library. However, some options are already available:

  • If you're using Python (particularly with AllenNLP), you can use the dataset reading code from our model.
  • If you're using Scala, we have a client library.
  • If you're using something else and write your own, please contribute it (or a reference to it)!
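If you do end up writing your own reader, a minimal Python sketch using only the standard library might look like the following. It assumes that a data file is gzip-compressed and holds one JSON object per line; that layout is an assumption here, so check the Data Format description before relying on it. The function name `read_gzipped_json_lines` is hypothetical and not part of any released library.

```python
import gzip
import json


def read_gzipped_json_lines(path):
    """Yield one parsed JSON value per non-empty line of a gzipped text file.

    Assumes a line-delimited layout (an assumption, not confirmed here).
    A file holding a single JSON document on one line yields one value.
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


# Example use (the file name below is a placeholder, not a confirmed path):
# for record in read_gzipped_json_lines("data/qasrl-v2/orig/<some-file>.gz"):
#     print(type(record))
```

Because the function is a generator, records are parsed one at a time and the whole file never needs to fit in memory.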