@turkish-nlp-suite

Turkish NLP Suite

Premium tools and datasets for Turkish NLP. Find your sentiment analysis, NER and treebank datasets for Turkish NLP, as well as Turkish spaCy models here.


👋 Welcome to Turkish NLP Suite!

Turkish NLP Suite is a non-profit organization dedicated to Turkish NLP. We create open source corpora, pretrained models, code, tutorials and all types of linguistic resources for Turkish natural language processing. Our code is cutting-edge, our models are easy to install and use, and our tutorials are a great way to get started. This is state-of-the-art Turkish NLP, after all.

💥 We ❤️ spaCy

That's true, we love spaCy because of the blazing fast code, great architecture, flexible pipelining, detailed documentation and awesome ecosystem. We proudly present spaCy Turkish models:

  • tr_core_web_md
  • tr_core_web_lg
  • tr_core_web_trf

All pipelines contain a tokenizer, trainable lemmatizer, POS tagger, dependency parser, morphologizer and NER components. You can find out more about each model in the dedicated repo and download the models from HuggingFace.
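As a rough illustration of how such a pipeline is typically used, the sketch below loads one of the models and prints the annotations produced by the components listed above. It assumes the model has been installed as a Python package named `tr_core_web_md`; the helper names `load_turkish_pipeline` and `token_rows` and the example sentence are hypothetical, and the exact component names and labels depend on the model version.

```python
import importlib.util


def load_turkish_pipeline(name="tr_core_web_md"):
    """Load a Turkish spaCy pipeline by package name.

    Returns None when spaCy or the model package is not installed,
    so the caller can print a helpful message instead of crashing.
    """
    if importlib.util.find_spec("spacy") is None:
        return None
    if importlib.util.find_spec(name) is None:
        return None
    import spacy

    return spacy.load(name)


def token_rows(doc):
    """Collect (text, lemma, POS tag, dependency label, morphology) per token."""
    return [(t.text, t.lemma_, t.pos_, t.dep_, str(t.morph)) for t in doc]


if __name__ == "__main__":
    nlp = load_turkish_pipeline()
    if nlp is None:
        print("spaCy or the Turkish model is not installed.")
    else:
        # Names of the components in the loaded pipeline.
        print(nlp.pipe_names)
        doc = nlp("Ankara Türkiye'nin başkentidir.")
        for row in token_rows(doc):
            print(row)
        # Named entities found by the NER component.
        print([(ent.text, ent.label_) for ent in doc.ents])
```

The same code should work with `tr_core_web_lg` or `tr_core_web_trf` by passing a different package name, provided that model is installed.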

spaCy Turkish models come with comprehensive tutorials and code. Please visit the documentation section for details.

😎 We love cutting edge NLP

We incorporate modern techniques into all our work, including transformers, GPU computing and the most efficient data structures. For example, the brand new Turkish spaCy model tr_core_web_trf is a transformer-based pipeline, and the mini project "Quick FAQ Chatbot" integrates sentence-transformers and more.
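To give a feel for the idea behind a semantic-search FAQ bot, here is a minimal sketch of the retrieval step: embed the user question, compare it with every stored question by cosine similarity, and return the answer of the closest match. This is not the code of the "Quick FAQ Chatbot" project. The `embed` function is a toy bag-of-words stand-in for a real sentence embedding model, and the FAQ entries are made-up examples.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding: bag-of-words counts.

    A real bot would call a sentence embedding model here instead.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_answer(question, faq):
    """Return the answer whose stored question is most similar to the input."""
    query = embed(question)
    _, answer = max(faq, key=lambda pair: cosine(query, embed(pair[0])))
    return answer


# Hypothetical FAQ entries, for illustration only.
FAQ = [
    ("kargo ne zaman gelir", "Kargo 3 iş günü içinde teslim edilir."),
    ("iade nasıl yapılır", "İade formunu doldurun."),
]

if __name__ == "__main__":
    print(best_answer("kargo ne zaman teslim edilir", FAQ))
```

Swapping the toy `embed` for dense sentence embeddings keeps the rest of the logic unchanged, which is the main reason to keep the similarity step separate from the embedding step.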

📘 We love compiling datasets

Modern NLP revolves around data, so labelled data (even in small amounts) is crucial for improving quality in many NLP tasks. As a result, compiling and serving Turkish datasets lies at the core of the Turkish NLP Suite project. All of our datasets come with a commercial licence, are completely open source and are ready to use. We also use our datasets in our projects and tutorials. Here's a list of our datasets:

For details and data, please visit the dedicated repos of the datasets. We also provide guidance and documentation for those who would like to compile their own datasets. If you're looking to create your own datasets, please visit the documentation section.

👷 We love mining our datasets

Surely we like to mine some good Turkish datasets 😉 If you'd like to do some data mining together, have a look at our video series Quick recipes with spaCy Turkish and Quick FAQ Bot; or, even better, read the Medium blog post about how sentiment turned into political heat after the earthquake disaster.

🎥 We love documentation: Turkish NLP YouTube Channel

If you like pair programming, please visit our Turkish NLP YouTube channel. Here's a list of playlists:

Get started

There are several paths to get started. If you're already working with text and speech data, you can safely dive straight into the Turkish-specific parts. This path covers Turkish linguistics first, then application code. You can watch:

  • All about Turkish linguistics
  • Quick recipes with spaCy Turkish
  • Quick FAQ Bot
  • How to train spaCy models
  • Semantic Web

If you're a junior or a student, or haven't worked on NLP problems before, we suggest starting from the beginning. This path includes the foundational series "NLP dataset formats" and "How to compile NLP datasets". After warming up to NLP tasks and data conventions, you can dive into the more advanced parts above 😉

Latest Blog Posts

Publications

Google ML Developer Programs team supported this work by providing Google Cloud Credit. Many thanks to Google Developer Experts for their generous contributions!

Pinned

  1. turkish-spacy-models: Repo for spaCy Turkish model development.
  2. Turkish-Wiki-NER-Dataset: Repo for Turkish Wiki NER dataset.
  3. hizli-faq-bot: Code repo for the small project "spaCy ve semantic searchle hızlı FAQ botu" (Quick FAQ bot with spaCy and semantic search). Python.
  4. BeyazPerde-Movie-Reviews: Repo for Turkish movie reviews dataset.
  5. Corona-mini-dataset: Small Turkish corpus of Corona symptoms.
  6. Vitamins-Supplements-NER-dataset: Repo for Turkish Vitamins and Supplements NER dataset.
