German Morphological Processing for Word Embeddings & Named Entity Recognition

This short script performs a grammar-dependent morphological processing of raw German text data. The input can be either a large text corpus used for computing word embeddings or a smaller labeled dataset used for training a neural network on a given downstream task (e.g., named entity recognition). Running this script before any training process improves the quality of the original resources, which ultimately increases the final performance on the downstream task.
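The point that the same processing applies to both kinds of input can be illustrated with a small sketch. It shows one way to apply an identical per-token transformation to a raw corpus (for the embeddings) and to only the token column of a labeled NER file, so that the processed tokens match the embedding vocabulary while the labels stay aligned. The actual grammar-dependent rules in morphProcessing.py are not reproduced here: `process_token`, `process_corpus`, `process_conll`, the tab separator and the `token_col` parameter are hypothetical, and `process_token` only applies Unicode NFC normalization as a runnable placeholder.

```python
import unicodedata
from typing import Callable, Iterable, Iterator


def process_token(token: str) -> str:
    # HYPOTHETICAL stand-in for the grammar-dependent morphological rules of
    # morphProcessing.py (not reproduced here). It only applies Unicode NFC
    # normalization so that the sketch runs end to end.
    return unicodedata.normalize("NFC", token)


def process_corpus(lines: Iterable[str],
                   fn: Callable[[str], str] = process_token) -> Iterator[str]:
    """Apply fn to every whitespace-separated token of each raw corpus line."""
    for line in lines:
        yield " ".join(fn(tok) for tok in line.split())


def process_conll(lines: Iterable[str],
                  fn: Callable[[str], str] = process_token,
                  sep: str = "\t",
                  token_col: int = 0) -> Iterator[str]:
    """Apply fn to the token column of CoNLL-style lines only.

    Label columns are left untouched so token/label alignment is preserved.
    Blank lines (sentence boundaries) and '#' comment lines pass through.
    token_col is an assumption: 0 for token-first files, 1 for files that
    start with a running index.
    """
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            yield ""  # sentence boundary
            continue
        if line.startswith("#"):
            yield line  # comment/metadata line, left as-is
            continue
        fields = line.split(sep)
        fields[token_col] = fn(fields[token_col])
        yield sep.join(fields)


if __name__ == "__main__":
    corpus = ["Die Stadt Mu\u0308nchen liegt in Bayern ."]  # decomposed "ü"
    labeled = ["Mu\u0308nchen\tB-LOC", "liegt\tO", ""]
    print(list(process_corpus(corpus)))   # ['Die Stadt München liegt in Bayern .']
    print(list(process_conll(labeled)))   # ['München\tB-LOC', 'liegt\tO', '']
```

Keeping the transformation in a single function that both readers call is the design choice that matters here: if the corpus and the labeled data were processed differently, tokens in the NER data would no longer match the vocabulary of the embeddings trained on the processed corpus.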

The pre-trained word embeddings produced with this morphological processing are provided (under the CC-BY-4.0 license) at the following link.

NOTE: The results of this script (i.e., (1) word embeddings and (2) labeled datasets) can be used to train the NER Tagger in order to reproduce and evaluate the performance boost. Further details can be found in the reference below. If you use this script in your work, please cite the reference.

Requirements

Data

Unlabeled text corpora

Labeled datasets for German named entity recognition

Cite

Sajawel Ahmed and Alexander Mehler, "Resource-Size matters: Improving Neural Named Entity Recognition with Optimized Large Corpora" in Proceedings of the 17th IEEE International Conference on Machine Learning and Applications (ICMLA), 2018. [PDF]

BibTeX

@InProceedings{Ahmed:Mehler:2018,
author		= {Sajawel Ahmed and Alexander Mehler},
title		= {{Resource-Size matters: Improving Neural Named Entity Recognition with Optimized Large Corpora}},
booktitle	= {Proceedings of the 17th IEEE International Conference on Machine Learning and Applications (ICMLA)},
location	= {Orlando, Florida, USA},
pdf		= {https://arxiv.org/pdf/1807.10675.pdf},
year		= 2018
}