An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
Updated Jul 27, 2024 - Python
Crawler for linguistic corpora
Data for the quantitative study of (Vedic) Sanskrit
Large silver-standard Russian corpus with NER, morphology and syntax markup
A set of workflows for corpus building through OCR, post-correction and normalisation
Amharic-English Machine Translation Corpus prepared through website crawling and custom preprocessing.
CONLL-U to Pandas DataFrame
MFTE (Multi Feature Tagger of English) Python is a Python version of Le Foll's MFTE, which was written in Perl. It extends the original with semantic tags from Biber (2006) and Biber et al. (1999), along with other specific tags.
Yet another search platform for linguistic corpora.
An unofficial Python API that allows users to create a corpus of lyrical text from their favorite artists and billboard charts
Vietnamese Wikipedia Corpus
Preprocessing and analysis for training SNOMED-CT concept embeddings from CORD-19 corpus
Scraper
Measure the similarity of text corpora for 74 languages
TextDirectory allows you to filter, transform, and combine multiple text files into one aggregated file.
Simple bs4-based web crawler for building a corpus for statistical machine translation
A large German Legal Corpus of laws, administrative regulations and court decisions issued in Germany at federal level. Query the corpus: corpora.dipintra.it
Word-level Filipino wordlist
Implementation of the term scoring algorithm in Tomokiyo & Hurst (2003), based on Kullback-Leibler Divergence (kldiv). Given a foreground and background corpus, it returns the most descriptive terms of the foreground corpus in the form of a termcloud
Scripts for building a geo-located web corpus using Common Crawl data
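As a rough illustration of the Kullback-Leibler term scoring mentioned in the Tomokiyo & Hurst (2003) entry above, the sketch below ranks foreground terms by their pointwise KL contribution against a background corpus. It is a simplified unigram version and not the listed repository's code: the function names, the add-k smoothing, and the whitespace tokenisation are all assumptions made for the example.

```python
import math
from collections import Counter


def pointwise_kl_scores(foreground_tokens, background_tokens, smoothing=1.0):
    """Score each foreground term by p_fg(w) * log(p_fg(w) / p_bg(w)).

    Add-k smoothing over the joint vocabulary (an assumption of this
    sketch) keeps the ratio finite for terms unseen in the background.
    """
    fg_counts = Counter(foreground_tokens)
    bg_counts = Counter(background_tokens)
    vocab = set(fg_counts) | set(bg_counts)
    fg_total = sum(fg_counts.values()) + smoothing * len(vocab)
    bg_total = sum(bg_counts.values()) + smoothing * len(vocab)

    scores = {}
    for term in fg_counts:
        p_fg = (fg_counts[term] + smoothing) / fg_total
        p_bg = (bg_counts[term] + smoothing) / bg_total
        scores[term] = p_fg * math.log(p_fg / p_bg)
    return scores


def top_terms(foreground_tokens, background_tokens, n=5):
    """Return the n foreground terms with the highest scores."""
    scores = pointwise_kl_scores(foreground_tokens, background_tokens)
    return sorted(scores, key=scores.get, reverse=True)[:n]


if __name__ == "__main__":
    # Toy corpora, tokenised by whitespace purely for illustration.
    foreground = "corpus corpus corpus the the tagger".split()
    background = "the the the the of of a".split()
    print(top_terms(foreground, background, n=2))
```

Terms that are relatively more frequent in the foreground than in the background receive positive scores, while terms that are relatively more frequent in the background receive negative scores and fall to the bottom of the ranking.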