The QA-SRL Bank
This repository is the reference point for QA-SRL Bank 2.0, the dataset described in the paper Large-Scale QA-SRL Parsing.
The data may be downloaded here, or you can clone this repository and run ./download.sh.
When you run ./download.sh, the dataset will be downloaded and expanded into the directory. Its contents are as follows:
orig/: The original data gathered on MTurk, where workers wrote the questions.
expanded/: The expanded dataset with model-generated questions and answers gathered in our expansion round. Train and dev only.
dense/: The densely annotated data, combining the expanded data with extra model-generated questions and judgments from turkers on a 5k-sentence subset of dev and test.
index.json.gz: An index of the documents that were used across all partitions, with metadata.
If you are modeling the data, you will probably be using expanded for training and dense for evaluation.
Metadata is included in each set, allowing you to determine which round a question or answer judgment came from.
See the Data Format description for details on how the data files are laid out.
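As a quick sanity check after downloading, the index can be opened with the Python standard library alone. The following is a minimal sketch, not part of this repository: it assumes index.json.gz holds a single gzip-compressed JSON document (as its name suggests), the helper name load_gzipped_json is hypothetical, and the path used is a placeholder to adjust to wherever download.sh put the data. It makes no assumptions about the fields inside the index; see the Data Format description for those.

```python
import gzip
import json
from pathlib import Path


def load_gzipped_json(path):
    """Read one gzip-compressed JSON document and return the parsed object.

    Hypothetical helper for illustration; not part of the QA-SRL Bank tooling.
    """
    # "rt" opens the gzip stream in text mode so json.load can read it directly.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


# Placeholder path (assumption): point this at your downloaded copy.
index_path = Path("index.json.gz")

if index_path.exists():
    index = load_gzipped_json(index_path)
    # Only report the top-level shape; the schema is documented separately.
    print(type(index).__name__, len(index))
```

If the index loads without error, the download and decompression most likely completed correctly.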
Using the QA-SRL Bank
Once you have downloaded it, you can use your favorite JSON parsing or data reading library to process and iterate through it. However, there are some options already available: