Cornell NLVR and NLVR2 are natural language grounding datasets. Each example shows a visual input and a sentence describing it, and is annotated with the truth-value of the sentence.
Updated Aug 18, 2022 - HTML
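Early Modern Chinese corpus dataset: natural language processing, corpus, ancient Chinese, Old Chinese, Classical (Literary) Chinese, digital humanities, computational linguistics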
Chinese dictionaries (simplified / traditional). Chinese and Chinese-English dictionaries.
The University of Pittsburgh English Language Institute Corpus (PELIC) dataset
Collection de romans français du dix-huitième siècle (1751-1800) / Collection of Eighteenth-Century French Novels (1751-1800)
Data, metadata, tools, and LDA experiments on a corpus of Sanskrit philosophy texts
HUMOR dataset for humor research
A corpus of Kurdish folkloric lyrics
A Text / Speech Summarizer
A dataset for extracting article titles, authors, dates, and body text.
A corpus of chansons de geste
This repository contains Python code to create a corpus of 12,215 terms-of-service documents scraped from TOSDR, intended for legal, privacy, and natural language processing research.
Materials for the summer course "Del corpus a la interpretación: Estilometría con R" (From Corpus to Interpretation: Stylometry with R), Burgos, 2021
Arabic Stories Corpus
A garden of file formats, gathered from a collection of sources, for use as inputs to fuzzing engines.
Toxic Comment Classification Project constructed by Qimo Li, Chen He and Kun Qiu for the course "Introduction to Natural Language Processing in Python" at Brandeis University.