teghub/ner

Named Entity Recognition in Turkish news texts using CRF

A Conditional Random Fields (CRF) model for named entity recognition in Turkish news texts, implemented in Python.

The input is tab-separated, with one token per line and three columns: word, part-of-speech tag, and annotation. A sample is shown below:

Word POS Annotation
Tek Adj O
çatı Noun O
altında Noun O
dokuz Num O
ayrı Adj O
salonda Noun O
gerçekleştirilecek Verb O
Şenlik Noun O
kapsamında Noun O
doksanın Noun O
üzerinde Noun O
etkinlik Noun O
yer Noun O
alacak Verb O
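
As a rough illustration of the input format above, the following sketch reads tab-separated rows into (word, POS, annotation) tuples using only the Python standard library. The function name `read_tokens` and the `has_header` flag are hypothetical and not part of this repository; the sketch also assumes the first non-blank line is the header row shown in the sample.

```python
import csv
import io


def read_tokens(stream, has_header=True):
    """Parse 'word<TAB>pos<TAB>annotation' rows into a list of tuples.

    Hypothetical helper for illustration; not part of the repository.
    """
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    tokens = []
    header_pending = has_header
    for row in reader:
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        if header_pending:
            header_pending = False  # drop the "Word POS Annotation" header
            continue
        if len(row) != 3:
            raise ValueError(f"expected 3 tab-separated fields, got {row!r}")
        word, pos, annotation = (cell.strip() for cell in row)
        tokens.append((word, pos, annotation))
    return tokens


# A few rows from the sample above, joined with real tab characters.
sample = "Word\tPOS\tAnnotation\nTek\tAdj\tO\nçatı\tNoun\tO\naltında\tNoun\tO\n"
tokens = read_tokens(io.StringIO(sample))
```

For a file on disk, the same function could be called with `open(path, encoding="utf-8", newline="")` so that Turkish characters are decoded correctly.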

You can also use the trained model ("crf_v2.joblib") to label your test dataset. The model outputs one "word - predicted annotation - POS" triple per token, with the items separated by tabs.

Sample output of the model is given below:

Word Predicted_Annotation POS
Istanbul LOCATION Noun
yüzde PERCENT Noun
2013 DATE Num
Meclis ORGANIZATION Noun
lira MONEY Noun
simdi TIME Adv
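
Note that the output column order (word, predicted annotation, POS) differs from the input order (word, POS, annotation). The sketch below shows one way to write predictions in the output layout. The function `format_predictions` is hypothetical, and the labels are passed in as a plain list; loading "crf_v2.joblib" and extracting features are not shown, because the feature functions the model expects are not described here.

```python
def format_predictions(tokens, predictions):
    """Render (word, pos) pairs and predicted labels as tab-separated lines.

    Hypothetical helper for illustration; not part of the repository.
    """
    if len(tokens) != len(predictions):
        raise ValueError("tokens and predictions must have the same length")
    lines = ["Word\tPredicted_Annotation\tPOS"]
    for (word, pos), label in zip(tokens, predictions):
        # Output order is word, predicted annotation, POS.
        lines.append(f"{word}\t{label}\t{pos}")
    return "\n".join(lines)


# Values taken from the sample output above.
tokens_out = [("Istanbul", "Noun"), ("2013", "Num")]
labels_out = ["LOCATION", "DATE"]
report = format_predictions(tokens_out, labels_out)
```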

To evaluate the model, run "CRF_Eval.java". It computes the CoNLL F1-score, precision, and recall for each annotation type using a sequence alignment algorithm.
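
To make the reported metrics concrete, here is a simplified Python sketch of per-type precision, recall, and F1. It is an assumption-laden illustration, not a port of "CRF_Eval.java": it scores token by token over two equal-length label lists and does not reproduce the sequence alignment step, so its numbers may differ from the Java evaluator's. The function name `per_type_scores` and the example labels are hypothetical.

```python
from collections import Counter


def per_type_scores(gold, predicted, outside="O"):
    """Token-level precision, recall and F1 for each annotation type.

    Simplified illustration only; CoNLL-style scoring normally works on
    whole entities rather than single tokens.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            if g != outside:
                tp[g] += 1  # correct entity label
        else:
            if p != outside:
                fp[p] += 1  # predicted a type that is not in the gold data
            if g != outside:
                fn[g] += 1  # missed a gold label
    scores = {}
    for label in sorted(set(tp) | set(fp) | set(fn)):
        predicted_total = tp[label] + fp[label]
        gold_total = tp[label] + fn[label]
        precision = tp[label] / predicted_total if predicted_total else 0.0
        recall = tp[label] / gold_total if gold_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Made-up labels for illustration; not results from the model.
gold = ["LOCATION", "O", "DATE", "ORGANIZATION", "O"]
pred = ["LOCATION", "O", "DATE", "O", "DATE"]
scores = per_type_scores(gold, pred)
```

In this toy example, DATE has one correct and one spurious prediction, giving precision 0.5 and recall 1.0, while the missed ORGANIZATION token yields zeros across the board.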

Citing

If you use this model in an academic publication, please refer to: https://ieeexplore.ieee.org/document/8806523
