corpus

Kimonokimo / NLP-comment-project

Toxic comment classification project built by Qimo Li, Chen He, and Kun Qiu for the course "Introduction to Natural Language Processing in Python" at Brandeis University.

python nlp data-science machine-learning natural-language-processing sentiment-analysis random-forest scikit-learn jupyter-notebook corpus cross-validation text-analysis linguistics spacy nltk classification logistic-regression postagging scattertext

Updated Dec 20, 2019
HTML

CuiShaohua / News-Review-Pickup

Automatic extraction of statements made by people in the news: identifies the speaker and what was said.

flask word2vec corpus sbv myproject npy pyc

Updated Jan 17, 2020
HTML

ThinamXx / WordFrequency_using_NLTK

In this repository, I have used NLP to determine: What are the most frequent words in Herman Melville's novel Moby Dick and how often do they occur?

python nlp corpus nltk machinelearning

Updated Aug 10, 2020
HTML

luisteran5296 / Predictive-text-modeling

Predictive texting is a data-driven tool that makes writing quicker and easier by suggesting words as you type. The tool reads the text in the input area and predicts the three most suitable options, which are then displayed as buttons. The user can press a button to insert text, the tool …

nlp corpus prediction wordcloud ngram-models

Updated Sep 8, 2020
HTML

cligs / data-nh

Data accompanying the dissertation "Genre Analysis and Corpus Design: 19th Century Spanish-American novels (1830-1910)"

corpus spanish mexico novels dissertation cuba argentina network-analysis genre 19th-century genre-classification metadata-analysis family-resemblance

Updated Jan 31, 2021
HTML

dstl / muc3

Message Understanding Conference 3 Corpus

html corpus tipster

Updated Feb 17, 2021
HTML

burgos2021 / programa

Materials for the summer course «Del corpus a la interpretación: Estilometría con R» ("From Corpus to Interpretation: Stylometry with R"), Burgos, 2021

r corpus stylometry

Updated Sep 11, 2021
HTML

Jean-Baptiste-Camps / Geste

A corpus of chansons de geste

corpus corpus-data xml-tei pos-tagging old-french lemmatization

Updated Sep 14, 2021
HTML

slack0 / sumspeech

A Text / Speech Summarizer

vocabulary corpus speech matrix-factorization sentence topic-modeling summarization tf-idf topic-extraction topic-distribution

Updated Nov 6, 2021
HTML

tylergneill / pramana-nlp

data, metadata, tools, and LDA experiments on a corpus of Sanskrit philosophy texts

corpus topic-modeling segmentation lda identifiers

Updated Nov 28, 2021
HTML

motazsaad / Arabic-Stories-Corpus

Arabic Stories Corpus

stories corpus story arabic arabic-nlp arabic-language

Updated Dec 16, 2021
HTML

cooperchris17 / nippontv_arirang

Scattertext plot comparing Nippon TV (Japan) and Arirang News (South Korea) YouTube videos. See cooperchris17/yt_short_news for more details

nlp corpus linguistics scattertext

Updated Mar 23, 2022
HTML

anjani-dhrangadhariya / rob-preliminary-annotation

documenting annotations for risk of bias

data-science data annotations corpus information-extraction dataannotations pharma systematic-reviews risk-of-bias

Updated Aug 3, 2022
HTML

lil-lab / nlvr

Cornell NLVR and NLVR2 are natural language grounding datasets. Each example shows a visual input and a sentence describing it, and is annotated with the truth-value of the sentence.

machine-learning natural-language-processing computer-vision corpus

Updated Aug 18, 2022
HTML

sonu-gupta / tosdr-terms-of-service-corpus

This repository contains Python code to create a corpus of 12,215 terms of service documents scraped from TOSDR, intended for legal, privacy, and natural language processing research.

python corpus language-resources tosdr terms-of-service-agreements

Updated Mar 14, 2023
HTML

Here are 38 public repositories matching this topic...

lungetech / cgc-corpus

creaciond / lermontov-online

sridharvaranasi / star-wars-text-analysis

Frances255 / Hospital

mr-segfault / fuzz_corpus_garden

Kimonokimo / NLP-comment-project
