A module to quickly create Corpus objects containing TTR, tokenized sentences, lexical density, class frequencies and more.
Updated Jun 30, 2019 (Python)
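The two headline statistics in the description above can be illustrated with a minimal sketch. The type-token ratio (TTR) is the number of distinct word types divided by the number of tokens. Lexical density is the share of tokens that are lexical (content) words. This sketch assumes a simple regex tokenizer and approximates lexical words as tokens outside a small, hypothetical function-word list; it is not the module's own implementation, which may well use a part-of-speech tagger instead.

```python
import re

# Hypothetical, deliberately tiny function-word list (an assumption for this sketch);
# a real tool would more likely identify content words with a part-of-speech tagger.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "at", "to",
    "is", "are", "was", "were", "it", "he", "she", "they", "we", "i",
    "you", "that", "this", "with", "for", "as", "by",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens: list[str]) -> float:
    """Distinct word types divided by total tokens (0.0 for an empty list)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def lexical_density(tokens: list[str]) -> float:
    """Share of tokens not in FUNCTION_WORDS, used here as a proxy for content words."""
    if not tokens:
        return 0.0
    lexical = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(lexical) / len(tokens)


if __name__ == "__main__":
    tokens = tokenize("The cat sat on the mat and the dog sat on the rug.")
    # 13 tokens, 8 types, 6 content words
    print(f"TTR: {type_token_ratio(tokens):.3f}")
    print(f"Lexical density: {lexical_density(tokens):.3f}")
```

Note that raw TTR falls as texts get longer, so comparisons between corpora of very different sizes are usually made on equal-length samples.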
A tool for determining distances between multimodal annotations.
(Module in ongoing development) Retrieves the parsed content of Wikipedia articles. Created to obtain text corpus data quickly and easily, but it can also be freely used for other purposes.
A linguistic study of 'mad' as an adverb in New England English.
Project for the course "Linguistica Computazionale" (Computational Linguistics), academic year 2012/2013.
Annotator combining different NLP pipelines.
For a corpus linguistics project, I created an information retrieval program called "You Are Not Alone". My phrase_finder() function searches for a self-identifying phrase in 4 large classic texts (The Souls of Black Folk, Jane Eyre, The Strange Case of Dr. Jekyll & Mr. Hyde, and Frankenstein). Standpoint: "So Matilda’s strong young mind continu…
Code for the article "New methods for analysing diachronic suffix competition across registers"
Using the PMI (pointwise mutual information) and chi-squared methods, this program scores bigrams on their probability of being a collocation. The top 20 scores for each method are displayed as output.
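The two association measures mentioned above can be sketched with the standard library alone. For each adjacent bigram (w1, w2) the sketch builds a 2×2 contingency table from bigram counts: O11 = count(w1 w2), O12 = w1 followed by something other than w2, O21 = something other than w1 followed by w2, and O22 = all remaining bigrams. PMI is then log2(O11 · N / (R1 · C1)) and chi-squared is N · (O11·O22 − O12·O21)² / (R1 · C1 · (N − C1) · (N − R1)). The function name `bigram_scores` is hypothetical, and the input is assumed to be an already tokenized, lowercased word list; this is not the repository's code.

```python
import math
from collections import Counter


def bigram_scores(tokens, top_n=20):
    """Score adjacent bigrams with PMI and chi-squared; return the top_n for each."""
    bigrams = list(zip(tokens, tokens[1:]))
    n = len(bigrams)                                   # N: total bigrams
    pair_counts = Counter(bigrams)                     # O11 per bigram
    first_counts = Counter(w1 for w1, _ in bigrams)    # R1: w1 in first position
    second_counts = Counter(w2 for _, w2 in bigrams)   # C1: w2 in second position

    pmi, chi2 = {}, {}
    for (w1, w2), o11 in pair_counts.items():
        r1 = first_counts[w1]
        c1 = second_counts[w2]
        o12 = r1 - o11                  # w1 followed by another word
        o21 = c1 - o11                  # another word followed by w2
        o22 = n - o11 - o12 - o21       # neither w1 first nor w2 second
        pmi[(w1, w2)] = math.log2(o11 * n / (r1 * c1))
        denom = r1 * c1 * (o12 + o22) * (o21 + o22)
        # Chi-squared is undefined when a marginal is zero; score 0.0 in that case.
        chi2[(w1, w2)] = n * (o11 * o22 - o12 * o21) ** 2 / denom if denom else 0.0

    def top(scores):
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

    return top(pmi), top(chi2)


if __name__ == "__main__":
    words = "new york is big and new york is busy".split()
    top_pmi, top_chi2 = bigram_scores(words)
    print("PMI:", top_pmi)
    print("Chi-squared:", top_chi2)
```

In this toy example PMI ranks the one-off pairs "big and" and "and new" highest, which shows PMI's well-known bias toward rare bigrams; in practice a minimum frequency filter is usually applied first. NLTK's `BigramCollocationFinder` with `BigramAssocMeasures.pmi` and `BigramAssocMeasures.chi_sq` offers a ready-made alternative.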
A tool for the visualization of word frequency differences.
A terminal tool for searching rhymes within the Russian National Corpus.
An app and scripts that work with the corpus builder CorpusCook to update a corpus with corrections of wrong predictions.
Corpus of screenplays from the TV show Kaamelott.
Portland State University LING 575: CORPUS LINGUISTICS code repo for winter term 2020.
Data related to "New methods for analysing diachronic suffix competition across registers"
A very simple concordancer with XML support.
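A concordancer of this kind typically produces keyword-in-context (KWIC) lines. The following is a minimal sketch using only the standard library: it assumes the XML markup can simply be discarded, extracts all text nodes with `xml.etree.ElementTree`, and prints a fixed number of context tokens on each side of every hit. The function names `xml_to_text` and `concordance` are hypothetical and do not come from the repository.

```python
import re
import xml.etree.ElementTree as ET


def xml_to_text(xml_string: str) -> str:
    """Join all text nodes of an XML document with spaces, dropping the markup."""
    root = ET.fromstring(xml_string)
    return " ".join(root.itertext())


def concordance(text: str, keyword: str, width: int = 3) -> list[str]:
    """Return KWIC lines with `width` tokens of left and right context per hit."""
    tokens = re.findall(r"\w+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():          # case-insensitive match
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


if __name__ == "__main__":
    doc = "<doc><s>The cat sat.</s><s>A cat ran away.</s></doc>"
    for line in concordance(xml_to_text(doc), "cat", width=2):
        print(line)
```

Because the markup is discarded, context windows here run across sentence boundaries ("sat A [cat] ran away"); a tool with fuller XML support would more likely search element by element, for example one `<s>` at a time, so that context stays within a sentence.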
Presidential Debate Corpus - Scraped, processed, and subsequently analyzed for keywords and other corpus statistics as an assignment for LING 5550 Corpus Linguistics.
This repository contains freely available, licensed code and annotated data for investigating and evaluating verbal processes in systemic functional linguistics (SFL), initially with a focus on second language acquisition (SLA).
Some Faroese language statistics taken from the fo.wikipedia.org content dump.