text-corpus

Here are 35 public repositories matching this topic...

jonsafari / habeas-corpus

Command-line corpus tools

vocabulary corpus corpora corpus-linguistics command-line-tools text-corpus

Updated May 15, 2017
Shell

nikitaeverywhere / edu-text-analysis-experiments

Statistical text analysis and semantic networks with Python

analysis text-analysis tf-idf gephi semantic-networks sigma text-corpus text-analyzer sigma-analysis

Updated Nov 30, 2017
Python

Chandra-cc / Tesseract_ICR-Sheets

A model trained on Google handwritten fonts with a text corpus containing only the digits 0-9. The main aim was to recognize ICR sheets using the trained model. The model reached an accuracy of 94.6% with Tesseract version 4.

tesseract lstm tesseract-ocr text-corpus tesseract-icr-sheets google-handwritten-fonts recognize-icr-sheets

Updated Aug 4, 2018
Python

capetocape / crawl-text-title-as-corpus

Crawling data from websites as text corpus

python nlp crawling text-corpus

Updated Sep 8, 2018
Python

luonglearnstocode / Seinfeld-text-corpus

text corpus 📃 scraped from the scripts 💬 of all Seinfeld episodes

regex requests web-scraping seinfeld text-corpus beautifulsoup4

Updated Jan 8, 2019
Jupyter Notebook

kurpicz / tcc

Text Corpus Collection

downloader text-corpus

Updated Jul 24, 2019
C++

TextCorpusLabs / covid19

Walkthrough for converting Kaggle's COVID-19 Open Research Dataset Challenge into a text corpus

python3 text-corpus covid-19

Updated Mar 23, 2020
Python

miras-tech / MirasText

MirasText

nlp sentiment-analysis article corpus language-modeling dataset persian-nlp text-corpus word-embedding irony-detection

Updated Aug 12, 2020
Python

alla-g / NLP2020

Final project for a natural language processing course, located in the final_project_diary folder

selenium text-corpus mystem pymorphy2

Updated Oct 23, 2020
Jupyter Notebook

TextCorpusLabs / congressional-votes

Walkthrough for converting congressional roll call votes into a text corpus

python3 us-congress text-corpus congress-votes

Updated Jan 21, 2021
Python

RedditEpidemicAnalysis / data

Data collection scripts for analysis of Reddit

text-corpus

Updated Mar 29, 2021

Ermlab / PoLitBert

Polish RoBERTa model trained on Polish literature, Wikipedia, and OSCAR. The major assumption is that quality text will give a good model.

nlp polish text-corpus roberta

Updated May 25, 2021
Python

alexlilia / igc-corpus-reader

A tool for indexing and querying a large XML-based text corpus using Elasticsearch.

corpus corpus-linguistics text-corpus corpus-tool icelandic-language

Updated Jun 17, 2021
Python

motazsaad / corpus-expander

Expands sentences in a given text corpus. The code checks for named entities (NEs) in sentences and creates new sentences by injecting new NEs from an NE list.

nes named-entities sentence corpus-linguistics language-model text-corpus arabic-nlp expanding-sentences corpus-expander

Updated Dec 16, 2021
Python

t-systems-on-site-services-gmbh / german-wikipedia-text-corpus

This is a German text corpus from Wikipedia. It is cleaned, preprocessed, and sentence-split. Its purpose is to train NLP embeddings such as fastText or ELMo (deep contextualized word representations).

nlp machine-learning text-corpus