NLP toolkit for 20+ Turkic languages — a pip-installable open-source Python library with adaptations for the low-resource, morphologically rich Turkic language family.
Maintained by Sherzod Hakimov
If you use TurkicNLP in your research, please cite:
@software{turkicnlp,
title = {TurkicNLP: NLP Toolkit for Turkic Languages},
author = {Sherzod Hakimov},
year = {2026},
url = {https://github.com/turkic-nlp/turkicnlp},
license = {Apache-2.0},
}

- 24 Turkic languages, from Turkish to Sakha and Kazakh to Uyghur
- Script-aware from the ground up — Latin, Cyrillic, Perso-Arabic, Old Turkic Runic
- Automatic script detection and bidirectional transliteration
- Morphology analyser for ~20 Turkic languages
- Universal Dependencies integration — pretrained tokenization, POS tagging, lemmatization, dependency parsing, and NER
- Pretrained embeddings + translation backend — get vectors for sentences and translate across many languages
- License: Apache-2.0
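
To make the idea of script detection concrete, here is a minimal standalone sketch that guesses the dominant script of a string from Unicode character names. It is an illustration only and does not use or reproduce the TurkicNLP API; the function name `detect_script` and the label strings are assumptions chosen for this example.

```python
import unicodedata
from collections import Counter
from typing import Optional

# Unicode character-name prefixes mapped to script labels.
# The labels are illustrative, not identifiers from TurkicNLP.
_SCRIPT_PREFIXES = {
    "LATIN": "Latin",
    "CYRILLIC": "Cyrillic",
    "ARABIC": "Perso-Arabic",
    "OLD TURKIC": "Old Turkic",
}


def detect_script(text: str) -> Optional[str]:
    """Return the most frequent script among the letters of `text`.

    Hypothetical helper: counts letters per script using their Unicode
    names and returns None when no letter from a known script is found.
    """
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, whitespace
        name = unicodedata.name(ch, "")
        for prefix, label in _SCRIPT_PREFIXES.items():
            if name.startswith(prefix):
                counts[label] += 1
                break
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    for sample in ["Türkçe", "Қазақ тілі", "ئۇيغۇرچە"]:
        print(sample, detect_script(sample))
```

A majority vote over letters is a simple heuristic; mixed-script text or very short inputs would need a more careful approach.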
https://github.com/turkic-nlp/turkicnlp
pip install turkicnlp

To install all required dependencies at once:

pip install "turkicnlp[all]"

With optional dependencies:
pip install "turkicnlp[stanza]" # Stanza/UD neural models
pip install "turkicnlp[nllb]" # NLLB embeddings and translation backend (transformers, tokenizer libraries)
pip install "turkicnlp[all]" # Everything: stanza, NLLB embeddings & translations
pip install "turkicnlp[dev]" # Development tools
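
After installing, one way to confirm that the package is visible to the current Python environment is to query its installed version through the standard library. This is a small sketch that relies only on `importlib.metadata` (Python 3.8+); the helper name `installed_version` is an assumption for this example, and the distribution name `turkicnlp` is taken from the `pip install` commands above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("turkicnlp")
    if found:
        print(f"turkicnlp {found} is installed")
    else:
        print("turkicnlp is not installed in this environment")
```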