corpus-processing

Here are 39 public repositories matching this topic...

levindoneto / lanGen

An n-gram language model that learns n-gram probabilities from a given corpus and generates new sentences by choosing each word according to its conditional probability given the words and phrases already generated.

natural-language-processing generator n-grams language-modelling corpus-processing ngram-language-model

Updated Feb 8, 2018
Python
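
The idea in the description above (estimate conditional probabilities from a corpus, then sample new sentences from them) can be illustrated with a minimal bigram sketch. This is not lanGen's code: the function names `train_bigram_model` and `generate`, the `<s>`/`</s>` boundary markers, and the toy corpus are all assumptions made for illustration, and the probabilities are plain maximum-likelihood estimates without smoothing.

```python
import random
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Estimate P(next | previous) by relative frequency from tokenized sentences."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        # Hypothetical boundary markers so sentence starts and ends are modeled too.
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    model = {}
    for prev, followers in counts.items():
        total = sum(followers.values())
        model[prev] = {word: n / total for word, n in followers.items()}
    return model


def generate(model, rng, max_len=20):
    """Sample one word at a time until the end marker or max_len words."""
    word, output = "<s>", []
    while len(output) < max_len:
        candidates = model[word]
        word = rng.choices(list(candidates), weights=list(candidates.values()))[0]
        if word == "</s>":
            break
        output.append(word)
    return output


# Toy corpus, invented for this example.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
model = train_bigram_model(corpus)
sentence = generate(model, random.Random(0))
```

Passing a seeded `random.Random` instance keeps the sampling reproducible; a real model would normally add smoothing so that unseen word pairs do not get zero probability.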

severinsimmler / forpus

Forpus is a Python library for processing plain text corpora to various corpus formats.

python-library python3 corpus-tools corpus-processing corpus-formats

Updated Mar 16, 2018
Python

ringoreality / uniblock

uniblock: scoring and filtering a corpus with Unicode block information (and more).

nlp machine-translation corpus-processing emnlp2019

Updated Sep 21, 2019
Python

jvansoest / histwords

Collection of tools for building diachronic/historical word vectors

nlp word2vec embeddings corpus-processing pmi-svd

Updated Oct 30, 2019
Python

NathanDuran / SCoSE-Corpus

Utilities for Processing the Saarbrücken Corpus of Spoken English

dialogue corpus corpus-data corpus-tools dialogues corpus-processing

Updated Jun 27, 2020
Python

NathanDuran / bAbI-Tasks-Corpus

Utilities for Processing the bAbI Tasks Corpus

dialogue corpus corpus-data corpus-tools dialogues corpus-processing

Updated Jun 27, 2020
Python

NathanDuran / DSTC3-Corpus

Utilities for Processing the Dialogue State Tracking Challenge 3 Corpus

dialogue corpus corpus-data corpus-tools dialogues corpus-processing

Updated Jun 27, 2020
Python

NathanDuran / FRAMES-Corpus

Utilities for Processing the FRAMES Corpus

dialogue corpus corpus-data corpus-tools dialogues corpus-processing

Updated Jun 27, 2020
Python

hankcs / TreebankPreprocessing

Python scripts preprocessing Penn Treebank and Chinese Treebank

natural-language-processing corpus-processing

Updated Sep 2, 2020
Python

cadia-lvl / diar-az

Diarization A to Z - Kaldi to Gecko to Kaldi and corpus and back

parsing corpus-processing diarization rttm

Updated Nov 23, 2020
Python

Krutash / Vector-Space-IR-model

We designed an information retrieval system based on the vector space model in Python. We also implemented bigram indices for phrasal query search and champion list retrieval, and we compare overall retrieval times in our project report.

information-retrieval information-extraction vector-space-model information-technology corpus-processing phrase-extraction bigram-model corpus-search champion-list

Updated Jan 8, 2021
Python

NathanDuran / Maptask-Corpus

Utilities for Processing the HCRC Map Task Corpus

dialogue corpus corpus-data corpus-tools dialogues corpus-processing dialogue-act

Updated Jan 24, 2021
Python

NathanDuran / MRDA-Corpus

Utilities for Processing the Meeting Recorder Dialogue Act Corpus

dialogue corpus corpus-data corpus-tools dialogues corpus-processing dialogue-act

Updated Jan 24, 2021
Python

NathanDuran / BT-Oasis-Corpus

Utilities for Processing the BT Oasis Corpus

dialogue corpus corpus-data corpus-tools dialogues corpus-processing dialogue-act

Updated Jan 24, 2021
Python

NathanDuran / Switchboard-Corpus

Utilities for Processing the Switchboard Dialogue Act Corpus

dialogue corpus corpus-data corpus-tools switchboard dialogues corpus-processing dialogue-data switchboard-corpus dialogue-act

Updated Jan 24, 2021
Python

mosesab / Language-Text-Extraction-

Takes a text and extracts the sentences written in a given language, using that language's lexicon.

nlp natural-language-processing corpus python3 python-programming english languages text-processing language-resources language-processing python-standard-library corpus-processing corpus-search

Updated Sep 26, 2021
Python

versotym / rhymetagger

A simple collocation-driven recognition of rhymes. Contains pre-trained models for Czech, Dutch, English, French, German, Russian, and Spanish poetry

language-processing corpus-processing versification

Updated Nov 20, 2021
Python

antcont / LEXB

Python scripts for the construction of the LEXB parallel corpus of South Tyrolean legislation (IT-DE).

machine-translation web-scraping corpus-tools corpus-processing tmx-parser tmx-cleaning

Updated Jan 23, 2022
Python

jamnicki / split-corpus

Split-corpus is a package for dividing text corpora into meaningful parts, each as close to a specified size as possible.

processing nlp natural-language-processing corpora large-files corpus-processing

Updated Feb 8, 2022
Python
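
One way to read "as close to a specified size as possible" is a greedy pass that keeps whole lines together and closes a part whenever adding the next line would move it further from the target. The sketch below is only an illustration of that reading, not the split-corpus package's algorithm or API; the function name `split_lines`, the character-count size measure, and the sample data are assumptions.

```python
def split_lines(lines, target_size):
    """Greedily group whole lines into parts whose total length is near target_size."""
    parts, current, size = [], [], 0
    for line in lines:
        # Close the current part if adding this line would leave it further
        # from the target than stopping here does.
        if current and abs(size + len(line) - target_size) > abs(size - target_size):
            parts.append(current)
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        # Whatever remains becomes the final (possibly short) part.
        parts.append(current)
    return parts


# Invented sample "corpus" of five lines, split toward 8 characters per part.
parts = split_lines(["aaaa", "bbbb", "cc", "dddddd", "e"], target_size=8)
```

Because lines are never broken, a single line longer than the target still ends up in one part; a real tool would also have to decide what counts as a meaningful unit (sentence, paragraph, document).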

petar-popovic-bg / Jerteh

This package provides Python utility classes and static methods that make use of third-party software commonly used in text processing, such as Unitex-GramLab, TreeTagger, Apache-Tika and Google-Tesseract.

nlp ocr text-processing corpus-linguistics nlp-parsing unitexgramlab corpus-tools treetagger corpus-processing

Updated Mar 4, 2022
Python
