

Source code for our paper:

Aggregating and Predicting Sequence Labels from Crowd Annotations
An Thanh Nguyen, Byron C. Wallace, Junyi Jessy Li, Ani Nenkova and Matthew Lease
Association for Computational Linguistics (ACL)

The crowd NER dataset is from Rodrigues et al. (2014):

For LSTM-Crowd, our implementation extends Lample et al. (2016):

Data for the Biomedical IE task:


Requirements: Python 2

Install the required packages using virtualenv (so they do not collide with system-wide packages):

$ virtualenv env-python2 --python=python2
$ source env-python2/bin/activate
(env-python2) $ pip install scikit-learn numpy nltk scipy python-crfsuite matplotlib
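Before running any experiment, it can help to confirm that every package imports inside the activated environment. The sketch below is a hypothetical helper (the names `REQUIRED_MODULES` and `missing_modules` are illustrative, not part of this repository). It assumes the usual import names: scikit-learn is imported as `sklearn` and python-crfsuite as `pycrfsuite`. It is written to run under both Python 2.7 and Python 3.

```python
from __future__ import print_function

import importlib

# Import names for the packages installed above (hypothetical list).
# The PyPI distributions scikit-learn and python-crfsuite are imported
# as sklearn and pycrfsuite respectively.
REQUIRED_MODULES = ["sklearn", "numpy", "nltk", "scipy", "pycrfsuite", "matplotlib"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Python 3's ModuleNotFoundError is a subclass of ImportError.
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages import correctly.")
```

Run it with the virtualenv's `python` after activation. An empty result means the `pip install` step succeeded.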

To run the experiment on NER Task 1, extract the Rodrigues et al. data and then execute:

python -i task1/val/ -o output.txt

to run on the validation set or 

python -i task1/test/ -o output.txt

to run on the test set.

Execute python --help for more options.
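Both runs can be scripted together. The following is a minimal sketch under stated assumptions: the experiment script's file name is not given above, so it is passed in as the `script` argument. The helper names `build_commands` and `run_all` are hypothetical. Each split writes to its own file (`output_val.txt`, `output_test.txt`), which departs from the single `output.txt` above so that the test run does not overwrite the validation results. Only the `-i` and `-o` flags shown above are used.

```python
from __future__ import print_function

import os
import subprocess
import sys


def build_commands(script, data_root="task1", splits=("val", "test")):
    """Build one command per split, mirroring the -i/-o usage above.

    `script` is the experiment script's path (hypothetical; not given in
    this README). Output file names per split are an assumption.
    """
    commands = []
    for split in splits:
        # os.path.join with a trailing "" keeps the trailing separator,
        # e.g. "task1/val/" on POSIX systems.
        input_dir = os.path.join(data_root, split, "")
        output_file = "output_{}.txt".format(split)
        # sys.executable is the interpreter running this helper, i.e. the
        # virtualenv's python when the environment is activated.
        commands.append([sys.executable, script, "-i", input_dir, "-o", output_file])
    return commands


def run_all(script, **kwargs):
    """Run each command in turn, stopping at the first non-zero exit."""
    for cmd in build_commands(script, **kwargs):
        print(" ".join(cmd))
        subprocess.check_call(cmd)
```

Calling `run_all("path/to/script.py")` (substituting the real script path) runs the validation split first, then the test split.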


Rodrigues, Filipe, Francisco Pereira, and Bernardete Ribeiro. "Sequence labeling with multiple annotators." Machine Learning 95.2 (2014): 165-181.

Lample, Guillaume, et al. "Neural architectures for named entity recognition." arXiv preprint arXiv:1603.01360 (2016).