A curated corpus of modern Chinese poetry: 3,489 poets, 81.7K poems, 15.43M characters. Continuously expanding...
Updated Aug 1, 2023 - Python
Utilities for Processing the Switchboard Dialogue Act Corpus
DANeS is an open-source e-newspaper dataset created through a collaboration between DATASET JSC (dataset.vn) and AIV Group (aivgroup.vn)
Text deduplication
A parser for annotated MuseScore 3 files.
Cantonese text filter (a filter for Cantonese corpus text)
Utilities for Processing the Meeting Recorder Dialogue Act Corpus
Code, datasets, and checkpoints for the paper "CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation"
Vietnamese Wikipedia Corpus
📑 Galician corpus for misogyny detection
Uses a Bi-LSTM neural network to classify Chinese text sentiment into eight categories (like, disgust, happiness, sadness, anger, surprise, fear, none)
Python API for loading language data from American-English CHILDES database
Scraper
Simple bs4-based web crawler for building a corpus for statistical machine translation
Filipino wordlist word-level
Utilities for Processing the HCRC Map Task Corpus
A collection of Indonesian-language corpus documents containing test cases for external plagiarism detection, following the PAN CLEF standard (http://www.uni-weimar.de/medien/webis/events/pan-11).
Gold-standard Arabic corpus built for testing Assem's arabicstemmer and other Arabic stemmers
Improving Language Model Performance through Smart Vocabularies