A very simple news crawler with a funny name
Updated Jul 16, 2024 - Python
Python & command-line tool to gather text on the Web: Crawling & scraping, content extraction, metadata. TXT, Markdown, CSV & XML output.
📑 Galician corpus for misogyny detection
My Implementations' Archive
Yet another search platform for linguistic corpora.
Open Discourse is the first fully comprehensive corpus of the plenary proceedings of the federal German Parliament (Bundestag).
[OneKE] [ACL 2024] IEPile: A Large-Scale Information Extraction Corpus
🚁 Insurance industry corpus and chatbot
A project that extracts a text corpus from ZenlessZoneZero
Extracting character conversations from Wuthering Waves
Extracting character conversations from Genshin Project
Collection of text corpora of publicly available speeches by Mexican President Andres Manuel Lopez Obrador (AMLO), sourced from YouTube. The dataset includes his daily morning press conferences (conferencias mañaneras) 😴🪿
L2SCA & LCA fork: cross-platform, GUI, without Java dependency
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
A repository for the AI2D-RST corpus.
Weibo trending list (hot search) crawler
A corpus and models for the automated legal assessment of clauses in German consumer contracts.
Cantonese text filter (a filter for Cantonese corpora)
🐍🍑 Python 3 library for managing, annotating, and converting natural language corpora using popular formats (CoNLL, ELAN, Praat, CSV, JSON, SQLite, VTT, Audacity, TTL, TIG, ISF, etc.)