corpus

Collection of text corpora of publicly available speeches by Mexican president Andres Manuel Lopez Obrador (AMLO), sourced from YouTube. The dataset includes his daily morning conferences (conferencias mañaneras) 😴🪿

Updated Jul 7, 2024
Python

tanloong / neosca

L2SCA & LCA fork: cross-platform, GUI, without Java dependency

python nlp corpus linguistics tregex corpus-analysis constituency-parsing syntactic-complexity neosca l2sca

Updated Jul 9, 2024
Python

BLKSerene / Wordless

An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation

translation tokenizer corpus linguistics tagger literature dependency-parser corpus-linguistics lemmatizer corpus-tools corpus-processing corpus-search corpus-statistics stopword corpus-analysis

Updated Jul 3, 2024
Python

thiippal / AI2D-RST

A repository for the AI2D-RST corpus.

annotation graphs corpus diagrams multimodality rhetorical-structure-theory ai2d-dataset

Updated Jul 2, 2024
Python

CodingDogzxg / MicroblogCrawler

Crawler for the Weibo trending (hot search) list

nlp crawler spider corpus selenium python3 weibo chinese-nlp microblog corpus-linguistics corpus-data nlp-machine-learning weibo-spider lingustics sarcasm-detection pytorch-nlp weibo-crawler microblog-crawler

Updated Jun 29, 2024
Python

DaBr01 / AGB-DE

A corpus and models for the automated legal assessment of clauses in German consumer contracts.

natural-language-processing corpus legaltech

Updated Jun 28, 2024
Python

CanCLID / canto-filter

Cantonese text corpus filter

nlp data corpus cantonese corpus-data cantonese-language

Updated Jun 26, 2024
Python

neocl / speach

🐍🍑 Python 3 library for managing, annotating, and converting natural language corpuses using popular formats (CoNLL, ELAN, Praat, CSV, JSON, SQLite, VTT, Audacity, TTL, TIG, ISF, etc.)

nlp annotation text corpus linguistics elan transcription

Updated Jun 26, 2024
Python

Here are 322 public repositories matching this topic...

flairNLP / fundus

adbar / trafilatura

luciamariaalvarezcrespo / GalMisoCorpus2023

innerNULL / mia

timarkh / tsakorpus

open-discourse / open-discourse

zjunlp / IEPile

chatopera / insuranceqa-corpus-zh

mrzjy / ZZZDialog

mrzjy / WutheringDialog

mrzjy / GenshinDialog

goodmike31 / pl-asr-speech-data-survey

ivansabik / chairum-corpus

tanloong / neosca

BLKSerene / Wordless

thiippal / AI2D-RST

CodingDogzxg / MicroblogCrawler

DaBr01 / AGB-DE

CanCLID / canto-filter

neocl / speach
