corpus

Here are 37 public repositories matching this topic...

lil-lab / nlvr

Cornell NLVR and NLVR2 are natural language grounding datasets. Each example shows a visual input and a sentence describing it, and is annotated with the truth-value of the sentence.

machine-learning natural-language-processing computer-vision corpus

Updated Aug 18, 2022
HTML

JiangYanting / Pre-modern_Chinese_corpus_dataset

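Pre-modern Chinese corpus dataset: natural language processing corpus, ancient Chinese, classical Chinese, literary Chinese, digital humanities, computational linguistics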

machine-learning natural-language-processing data-mining corpus dataset

Updated Jul 31, 2023
HTML

lxs602 / Chinese-Mandarin-Dictionaries

Chinese dictionaries (中文词典 / 中文詞典): Chinese and Chinese-English dictionaries.

unicode dictionaries dictionary corpus english chinese hanzi goldendict chinese-language zhongwen chinese-mandarin-dictionaries handian

Updated Apr 15, 2024
HTML

ELI-Data-Mining-Group / PELIC-dataset

The University of Pittsburgh English Language Institute Corpus (PELIC) dataset

corpus esl lexical-analysis longitudinal-data concordancer tesol second-language-acquisition learner-corpus intensive-english-program english-for-academic-purposes second-language-writing

Updated Mar 31, 2023
HTML

MiMoText / roman18

Collection de romans français du dix-huitième siècle (1751-1800) / Collection of Eighteenth-Century French Novels (1751-1800)

corpus enlightenment novels french literature trier 18th-century

Updated Apr 23, 2024
HTML

CuiShaohua / News-Review-Pickup

Automatic extraction of statements made by people in the news: identifies the speaker and what was said

flask word2vec corpus sbv myproject npy pyc

Updated Jan 17, 2020
HTML

dstl / muc3

Message Understanding Conference 3 Corpus

html corpus tipster

Updated Feb 17, 2021
HTML

tylergneill / pramana-nlp

Data, metadata, tools, and LDA experiments on a corpus of Sanskrit philosophy texts

corpus topic-modeling segmentation lda identifiers

Updated Nov 28, 2021
HTML

lungetech / cgc-corpus

DARPA CGC Corpus

corpus cgc

Updated May 1, 2017
HTML

pln-fing-udelar / humor

HUMOR dataset for humor research

nlp machine-learning humor corpus dataset crowdsourcing nlp-machine-learning

Updated Mar 29, 2023
HTML

KurdishBLARK / KurdishLyricsCorpus

A Corpus of the Kurdish Folkloric Lyrics

lyrics corpus kurdish folkloristics kurdish-language-processing

Updated Apr 12, 2023
HTML

slack0 / sumspeech

A Text / Speech Summarizer

vocabulary corpus speech matrix-factorization sentence topic-modeling summarization tf-idf topic-extraction topic-distribution

Updated Nov 6, 2021
HTML

AndyTheFactory / article-extraction-dataset

Article title, authors, date and body extraction dataset.

text-mining news html-to-markdown scraping corpus news-aggregator text-extraction dataset web-scraping readability datasets scraping-websites html2text news-crawler corpus-builder corpus-tools article-extractor text-cleaning text-preprocessing

Updated Mar 26, 2024
HTML

Jean-Baptiste-Camps / Geste

A corpus of chansons de geste (Old French epic poems)

corpus corpus-data xml-tei pos-tagging old-french lemmatization

Updated Sep 14, 2021
HTML

sonu-gupta / tosdr-terms-of-service-corpus

This repository contains Python code to create a corpus of 12,215 terms-of-service documents scraped from TOSDR, intended for legal, privacy, and natural language processing research.

python corpus language-resources tosdr terms-of-service-agreements

Updated Mar 14, 2023
HTML

burgos2021 / programa

Materials for the summer course «Del corpus a la interpretación: Estilometría con R» (From Corpus to Interpretation: Stylometry with R), Burgos, 2021

r corpus stylometry

Updated Sep 11, 2021
HTML

mikahama / SemFi

Semantic relations for Finnish words

nlp semantics corpus finnish

Updated Dec 13, 2023
HTML

motazsaad / Arabic-Stories-Corpus

Arabic Stories Corpus

stories corpus story arabic arabic-nlp arabic-language

Updated Dec 16, 2021
HTML

mr-segfault / fuzz_corpus_garden

A garden of file formats, gathered from a collection of sources, for use as seed inputs to fuzzing engines.

input seed corpus fuzzer fuzz fuzz-corpus-garden fuzzing-engines

Updated Oct 4, 2019
HTML

Kimonokimo / NLP-comment-project

Toxic comment classification project built by Qimo Li, Chen He and Kun Qiu for the course "Introduction to Natural Language Processing in Python" at Brandeis University.

python nlp data-science machine-learning natural-language-processing sentiment-analysis random-forest scikit-learn jupyter-notebook corpus cross-validation text-analysis linguistics spacy nltk classification logistic-regression postagging scattertext

Updated Dec 20, 2019
HTML
