Feature-Dependent Confusion Matrices for Low-Resource NER Labeling with Noisy Labels

This is the repository for the publication

Lukas Lange, Michael A. Hedderich and Dietrich Klakow

Feature-Dependent Confusion Matrices for Low-Resource NER Labeling with Noisy Labels

EMNLP 2019

https://www.aclweb.org/anthology/D19-1362/

Noise-Handling Architectures

In this repository, you can find implementations for the following noisy-label neural network architectures:

- Global Noise Matrix from Hedderich & Klakow: Training a Neural Network in a Low-Resource Setting on Automatically Annotated Noisy Data, 2018
- Feature-Dependent Noise Matrix from Lange et al.: Feature-Dependent Confusion Matrices for Low-Resource NER Labeling with Noisy Labels, 2019
- Cleaning Model from Veit et al.: Learning from Noisy Large-Scale Datasets with Minimal Supervision, 2017 (adapted to the NLP setting)
- Dynamic Transition Matrix from Luo et al.: Learning with Noise: Enhance Distantly Supervised Relation Extraction with Dynamic Transition Matrix, 2017 (adapted to the NLP setting)

The code is written for the Named Entity Recognition (NER) setting from the paper but should be easily adaptable to other supervised learning tasks by replacing the data loader and (if needed) the base model architecture.
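
The core idea behind a noise matrix can be illustrated with a small, dependency-free sketch. This is not the code from layers.py: the function name `apply_noise_matrix`, the three-label set and all matrix values are assumptions made up for illustration. The base model predicts a distribution over clean labels, and a row-stochastic matrix maps it to a distribution over noisy labels:

```python
def apply_noise_matrix(clean_probs, noise_matrix):
    """Turn a clean-label distribution into a noisy-label distribution.

    noise_matrix[i][j] approximates p(noisy label = j | clean label = i),
    so every row must sum to 1.
    """
    n_labels = len(noise_matrix)
    return [
        sum(clean_probs[i] * noise_matrix[i][j] for i in range(n_labels))
        for j in range(n_labels)
    ]


# Hypothetical label set and values, for illustration only.
labels = ["O", "B-PER", "I-PER"]
noise_matrix = [
    [0.95, 0.03, 0.02],  # clean O is rarely mislabeled
    [0.30, 0.65, 0.05],  # clean B-PER is often missed and labeled O
    [0.25, 0.05, 0.70],  # clean I-PER is often missed and labeled O
]
clean_probs = [0.1, 0.8, 0.1]  # base-model prediction for one token

noisy_probs = apply_noise_matrix(clean_probs, noise_matrix)
for label, prob in zip(labels, noisy_probs):
    print(f"{label}: {prob:.3f}")
```

In a setup like this, the loss on noisy training data would be computed against `noisy_probs`, while the clean distribution is what the model outputs at prediction time. A single matrix shared by all tokens corresponds to the global variant; the sketch does not cover the feature-dependent one.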

Installation & Structure

conda create --name noise-matrix-ner python=3.6
source activate noise-matrix-ner
pip install numpy==1.16.2 scikit-learn==0.20.3 jupyter==1.0.0 tensorflow==1.12.0 Keras==2.2.4 
# depending on your hardware, you might want to replace tensorflow with tensorflow-gpu
# fastText needs to be installed manually from https://fasttext.cc/docs/en/support.html (do not use the version from the online pip repo)

The repository has the following structure:

code:
- ner.ipynb: Jupyter notebook with the whole experimental pipeline.
- layers.py: Implementation of different special layers in Keras for the noise-handling architectures.
- ner_datacode.py: Utility code for NER, data loading, word embeddings and evaluation.
- noisematrix.py: Utility code for noise matrices, including visualization.
- experimentalsettings.py: Utility code to store experimental configurations.
config: Example configurations for the different experiments.
data: For legal reasons, this repository contains only dummy data. The CoNLL02/03 datasets are widely available. The Estonian NER dataset can be obtained here. The clean and noisy training data must be parallel, i.e. contain the same tokens in the same order (see the dummy data), and all data must be in the CoNLL BIO2 format.
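
To make the parallel-data requirement concrete, the following sketch reads two CoNLL-style label sequences and counts how often each clean label co-occurs with each noisy label. It is not the code from ner_datacode.py or noisematrix.py. The helper names `read_conll` and `estimate_noise_matrix` are hypothetical, and the column layout (token first, label last, blank line between sentences) is an assumption; check the dummy data for the exact layout.

```python
from collections import Counter


def read_conll(lines):
    """Parse CoNLL-style lines into sentences of (token, label) pairs."""
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            if current:  # a blank line closes the current sentence
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        current.append((columns[0], columns[-1]))  # assumed: token first, label last
    if current:
        sentences.append(current)
    return sentences


def estimate_noise_matrix(clean, noisy, labels):
    """Row-normalized counts of (clean label, noisy label) pairs."""
    if len(clean) != len(noisy):
        raise ValueError("clean and noisy data differ in number of sentences")
    counts = Counter()
    for clean_sent, noisy_sent in zip(clean, noisy):
        if [tok for tok, _ in clean_sent] != [tok for tok, _ in noisy_sent]:
            raise ValueError("clean and noisy data are not parallel")
        for (_, clean_label), (_, noisy_label) in zip(clean_sent, noisy_sent):
            counts[(clean_label, noisy_label)] += 1
    matrix = []
    for i, clean_label in enumerate(labels):
        row = [counts[(clean_label, noisy_label)] for noisy_label in labels]
        total = sum(row)
        if total == 0:
            # Assumption: a clean label that never occurs is treated as noise-free.
            matrix.append([1.0 if j == i else 0.0 for j in range(len(labels))])
        else:
            matrix.append([count / total for count in row])
    return matrix


# Tiny made-up example in BIO2 style; the noisy version misses the location.
clean_text = "Peter B-PER\nlives O\nin O\nTallinn B-LOC\n\nHe O\nsleeps O\n"
noisy_text = "Peter B-PER\nlives O\nin O\nTallinn O\n\nHe O\nsleeps O\n"

labels = ["O", "B-PER", "B-LOC"]
matrix = estimate_noise_matrix(
    read_conll(clean_text.splitlines()),
    read_conll(noisy_text.splitlines()),
    labels,
)
for label, row in zip(labels, matrix):
    print(label, row)
```

The token comparison fails early if the two files drift out of alignment, which is the situation the parallel-data requirement is meant to rule out.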

License & Citation

The code is licensed under the Apache License 2.0, so feel free to use it in your projects. If you do, please cite us as:

@InProceedings{Lange2019FeatureDependent,
  author = "Lange, Lukas and Hedderich, Michael A. and Klakow, Dietrich",
  title = "Feature-Dependent Confusion Matrices for Low-Resource NER Labeling with Noisy Labels",
  booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
  year = "2019",
  address = "Hong Kong, China",
  url = "https://www.aclweb.org/anthology/D19-1362",
  doi = "10.18653/v1/D19-1362",
  pages = "3545--3550",
  publisher = "Association for Computational Linguistics"
}

If you use the implementation of the Global Noise Matrix or the Cleaning Model, please also cite:

@InProceedings{W18-3402,
  author = "Hedderich, Michael A. and Klakow, Dietrich",
  title = "Training a Neural Network in a Low-Resource Setting on Automatically Annotated Noisy Data",
  booktitle = "Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP",
  year = "2018",
  publisher = "Association for Computational Linguistics",
  pages = "12--18",
  location = "Melbourne",
  url = "http://aclweb.org/anthology/W18-3402"
}

Contact

If you have any questions, feel free to contact the authors at {llange,mhedderich} at lsv. uni-saarland .de
