
computational-linguistics

Here are 12 public repositories matching this topic...

Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation, and splits sentences. It also offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules …

  • Updated May 17, 2024
  • C++

Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller`` which allows you to build, view, manipulate a…

  • Updated Nov 16, 2023
  • C++
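
To make the n-gram and skipgram terminology above concrete, here is a minimal sketch in plain Python. It does not use the Colibri Core API; the function names and the `{*}` gap marker are hypothetical choices made for this illustration, and only fixed-size gaps are covered.

```python
from itertools import combinations

GAP = "{*}"  # hypothetical gap marker chosen for this sketch


def ngrams(tokens, n):
    """Return all contiguous n-grams of the token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgrams(tokens, n):
    """Return fixed-size skipgrams derived from each n-gram.

    Every non-empty subset of the inner positions is replaced by a gap
    marker; the first and last tokens are always kept.
    """
    result = []
    inner = range(1, n - 1)  # positions that may become gaps
    for gram in ngrams(tokens, n):
        for k in range(1, len(inner) + 1):
            for gap_positions in combinations(inner, k):
                result.append(tuple(
                    GAP if i in gap_positions else tok
                    for i, tok in enumerate(gram)
                ))
    return result


if __name__ == "__main__":
    tokens = "to be or not to be".split()
    print(ngrams(tokens, 2))
    print(skipgrams(tokens, 3))
```

This naive version materialises every pattern as a tuple of strings, so it is only meant to show what the patterns look like, not how to store or count them efficiently.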
