# corpus

Here are 17 public repositories matching this topic...

INL / BlackLab

Linguistic search for large annotated text corpora, based on Apache Lucene

Updated Jun 18, 2024
Java

kcutils / kc2tei

Source code for a converter that transforms Kiel Corpus files into standardised TEI-XML files.

java converter grammar corpus linguistics ipa phonetics digital-humanities xsampa tei tei-xml kiel sablecc international-phonetic-alphabet tei-files prolab kiel-corpus

Updated May 17, 2024
Java

alyssarose05 / SearchEngine

A search engine that runs queries against a given corpus.

java search-engine corpus school-project inverted-index stoplist porter-stemmer

Updated Apr 10, 2024
Java

motazsaad / shami-corpus

Shami Dialect Corpus (SDC)

corpus arabic arabic-nlp

Updated Aug 25, 2023
Java

INL / clariah-fcs-endpoints

REST endpoints for CLARIAH Federated Content Search

corpus fcs clariah

Updated Jan 22, 2024
Java

YuyuZha0 / corpus

A search engine for classical Chinese poetry

search-engine corpus lucene vertx-web chinese-poetry

Updated Feb 10, 2023
Java

INL / OpenConvert

Text conversion tool (from e.g. Word, HTML, txt) to the corpus formats TEI or FoLiA

conversion corpus

Updated Feb 11, 2022
Java

pauldiac / QuoVadis

QuoVadis: annotation of Entities and Relations, initial Ph.D. work

nlp natural-language-processing annotation corpus nlp-resources nlp-web-services

Updated Apr 9, 2020
Java

mthebaud / wiki-pt-extractor

Automated Wikipedia page content extractor

java wiki wikipedia extractor corpus jsoup corpora

Updated Oct 29, 2019
Java

uma-pi1 / OPIEC

Reading the data from OPIEC - an Open Information Extraction corpus

nlp natural-language-processing wiki wikipedia corpus information-extraction dataset corpora corpus-data nlp-resources wikipedia-dump corpus-tools natural-language-understanding open-information-extraction dataset-interface wikipedia-corpus corpus-processing nlp-datasets

Updated Jun 12, 2019
Java

MatthewWolff / MarkovChainBot

Uses markov chains and a corpus of text to respond to conversation

markov-model chatbot markov-chain corpus artificial-intelligence language-model cs-540

Updated Dec 5, 2017
Java

erayerdin / corpustk

A text management tool for linguistic purposes...

corpus linguistics text-processing corpus-linguistics

Updated Nov 16, 2017
Java

twinters / textfiles-com-scraper

⛏️📄 Script to scrape all files linked on a textfiles.com page

scraper corpus corporate textfiles textfilescom

Updated Aug 15, 2017
Java

justhalf / weak-semi-crf-naacl2016

The code for Weak Semi CRF (together with Linear CRF and Semi CRF) on the new SMSNP dataset.

nlp naacl crf sms corpus

Updated Jun 21, 2017
Java

yjham2002 / richGBot

📖 Probabilistic model and Deep Learning based Korean NLP Engine

java nlp machine-learning deep-learning chatbot nlu corpus korean nlp-machine-learning morpheme-analyzer

Updated May 12, 2017
Java

adliska / parallel_text_cleaning

Code for my BSc thesis: Cleaning of Parallel Texts for Machine Translation

thesis machine-translation corpus parallel-texts

Updated Feb 12, 2016
Java

wonderer007 / Naive-Bayes-classifier

A Naive Bayes classifier is a classification algorithm. This one uses the Bernoulli and Multinomial Naive Bayes models to classify text documents as ham or spam.

java algorithm eclipse corpus naive-bayes-classifier classification-algorithm ham bernoulli classify-documents corpus-folder

Updated Jun 21, 2015
Java
