# corpus-processing

Here are 39 public repositories matching this topic...

BLKSerene / Wordless

An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation

translation tokenizer corpus linguistics tagger literature dependency-parser corpus-linguistics lemmatizer corpus-tools corpus-processing corpus-search corpus-statistics stopword corpus-analysis

Updated Jul 27, 2024
Python

bitextor / bitextor

Bitextor generates translation memories from multilingual websites

Updated Jun 18, 2024
Python

hankcs / TreebankPreprocessing

Python scripts preprocessing Penn Treebank and Chinese Treebank

natural-language-processing corpus-processing

Updated Sep 2, 2020
Python

Helsinki-NLP / OpusFilter

OpusFilter - Parallel corpus processing toolkit

nlp natural-language-processing machine-translation parallel-corpus corpus-tools corpus-processing

Updated Jun 26, 2024
Python

NathanDuran / Switchboard-Corpus

Utilities for Processing the Switchboard Dialogue Act Corpus

dialogue corpus corpus-data corpus-tools switchboard dialogues corpus-processing dialogue-data switchboard-corpus dialogue-act

Updated Jan 24, 2021
Python

StarlangSoftware / Corpus-Py

Corpus processing library

sentence-tokenizer sentence-segmentation corpus-processing turkish-sentence-segmentation turkish-sentence-tokenizer

Updated May 20, 2024
Python

NathanDuran / MRDA-Corpus

Utilities for Processing the Meeting Recorder Dialogue Act Corpus

dialogue corpus corpus-data corpus-tools dialogues corpus-processing dialogue-act

Updated Jan 24, 2021
Python

kennedyCzar / NLP-PROJECT-BOOK-INSIGHTS-WITH-PLOTLY

A Plotly Dash NLP project that measures document similarity using Latent Dirichlet Allocation and principal component analysis, followed by k-means clustering, and presents the results through interactive visualizations.

Updated Sep 8, 2022
Python

levindoneto / lanGen

An n-gram language model that learns n-gram probabilities from a given corpus and generates new sentences, choosing each next word according to its conditional probability given the previously generated words.

natural-language-processing generator n-grams language-modelling corpus-processing ngram-language-model

Updated Feb 8, 2018
Python
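
The idea in the lanGen description above (estimate n-gram probabilities from a corpus, then sample new sentences from the conditional probabilities) can be illustrated with a minimal bigram sketch. This is not lanGen's code or API: the function names `train_bigram_model` and `generate_sentence`, the `<s>` and `</s>` boundary markers, whitespace tokenization, and the toy corpus are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Estimate P(next | previous) from whitespace-tokenized sentences."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Hypothetical boundary markers so sentence starts and ends are modeled.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    # Normalize raw counts into conditional probabilities.
    return {
        prev: {word: n / sum(following.values()) for word, n in following.items()}
        for prev, following in counts.items()
    }


def generate_sentence(model, rng, max_len=20):
    """Sample one word at a time until the end marker or max_len is reached."""
    word, output = "<s>", []
    while len(output) < max_len:
        candidates = model[word]
        word = rng.choices(list(candidates), weights=list(candidates.values()))[0]
        if word == "</s>":
            break
        output.append(word)
    return " ".join(output)


# Toy corpus, invented for this example.
corpus = ["the cat sat", "the cat ran", "the dog sat"]
model = train_bigram_model(corpus)
print(model["the"])
print(generate_sentence(model, random.Random(0)))
```

A real n-gram model would typically use longer contexts and some form of smoothing for unseen word pairs; the sketch omits both to keep the counting and sampling steps visible.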

versotym / rhymetagger

Simple collocation-driven rhyme recognition. Contains pre-trained models for Czech, Dutch, English, French, German, Russian, and Spanish poetry.

language-processing corpus-processing versification

Updated Nov 20, 2021
Python

ku-nlp / kyoto-reader

A processor for KyotoCorpus, KWDLC, and AnnotatedFKCCorpus

japanese coreference corpus-processing pyknp predicate-argument-structure

Updated Jun 26, 2024
Python

johentsch / ms3

A parser for annotated MuseScore 3 files.

Updated May 23, 2024
Python

ringoreality / uniblock

uniblock: scoring and filtering corpora with Unicode block information (and more).

nlp machine-translation corpus-processing emnlp2019

Updated Sep 21, 2019
Python

jonathandunn / corpus_similarity

Measure the similarity of text corpora for 74 languages

nlp language natural-language-processing text corpus corpora corpus-linguistics corpus-tools corpus-processing

Updated Jan 26, 2024
Python

NathanDuran / Maptask-Corpus

Utilities for Processing the HCRC Map Task Corpus

dialogue corpus corpus-data corpus-tools dialogues corpus-processing dialogue-act

Updated Jan 24, 2021
Python

CSCfi / Kielipankki-utilities

Scripts for data conversion

vrt corpus-tools korp corpus-processing

Updated Jul 22, 2024
Python

NathanDuran / bAbI-Tasks-Corpus

Utilities for Processing the bAbI Tasks Corpus

dialogue corpus corpus-data corpus-tools dialogues corpus-processing

Updated Jun 27, 2020
Python

frankier / STIFF

Sense Tagged Instances For Finnish

nlp wsd word-sense-disambiguation linguistic-corpora corpus-processing

Updated Feb 22, 2023
Python

NathanDuran / BT-Oasis-Corpus

Utilities for Processing the BT Oasis Corpus

dialogue corpus corpus-data corpus-tools dialogues corpus-processing dialogue-act

Updated Jan 24, 2021
Python

NathanDuran / DSTC3-Corpus

Utilities for Processing the Dialogue State Tracking Challenge 3 Corpus

dialogue corpus corpus-data corpus-tools dialogues corpus-processing

Updated Jun 27, 2020
Python
