Command-line corpus tools
Updated May 15, 2017 (Shell)
Statistical text analysis and semantic networks with Python
A model was trained on Google handwritten fonts with a text corpus containing only the digits 0-9. The main aim was to recognize ICR (intelligent character recognition) sheets with the trained model. Our model achieved an accuracy of 94.6% using Tesseract version 4.
Crawling data from websites to build a text corpus
Text corpus 📃 scraped from the scripts 💬 of all Seinfeld episodes
Walkthrough for converting Kaggle's COVID-19 Open Research Dataset Challenge data into a text corpus
MirasText
Final project for a natural language processing course, located in the final_project_diary folder
Walkthrough for converting congressional roll call votes into a text corpus
Polish RoBERTa model trained on Polish literature, Wikipedia, and OSCAR. The main assumption is that high-quality text yields a good model.
A tool for indexing and querying a large XML-based text corpus with Elasticsearch.
Expands sentences in a given text corpus. The code detects named entities (NEs) in sentences and creates new sentences by injecting other NEs from an NE list.
A German text corpus from Wikipedia. It is cleaned, preprocessed, and sentence-split. Its purpose is to train NLP embeddings such as fastText or ELMo (deep contextualized word representations).
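The NE-injection idea described above can be illustrated with a minimal sketch. It is not the project's code: the function name `expand_sentences` and the `ne_list` format (a dict from entity type to entity strings) are hypothetical, and NEs are detected by plain lookup against that list rather than by a trained NER model.

```python
import re


def expand_sentences(sentences, ne_list):
    """Create new sentences by swapping each known NE for other NEs of the same type.

    ne_list (hypothetical format): {"CITY": ["Berlin", "Paris"], "PERSON": [...]}
    NE detection here is a dictionary lookup, not a trained NER model.
    """
    # Reverse index: entity string -> entity type
    ne_type = {ne: t for t, names in ne_list.items() for ne in names}
    if not ne_type:
        return []
    # Longest entities first so multi-word names win over their substrings
    alternation = "|".join(
        re.escape(ne) for ne in sorted(ne_type, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(" + alternation + r")\b")

    expanded = []
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            found = match.group(1)
            # Inject every other entity of the same type at the matched span
            for replacement in ne_list[ne_type[found]]:
                if replacement != found:
                    expanded.append(
                        sentence[: match.start()] + replacement + sentence[match.end():]
                    )
    return expanded


ne_list = {"CITY": ["Berlin", "Paris", "Rome"], "PERSON": ["Alice", "Bob"]}
print(expand_sentences(["Alice moved to Berlin."], ne_list))
# ['Bob moved to Berlin.', 'Alice moved to Paris.', 'Alice moved to Rome.']
```

Each NE occurrence is replaced independently, so a sentence with k matched entities yields new sentences for each entity separately rather than every combination.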
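The cleaning and sentence-splitting steps mentioned above might look like the following naive sketch, which writes one sentence per line, a layout commonly used as embedding training input. The helper names and the rule that strips reference markers such as `[1]` are hypothetical; a real German corpus needs a proper tokenizer, since this regex splits wrongly after abbreviations like "z. B." followed by a capitalized word.

```python
import re
import unicodedata

# Split after ., ! or ? when the next token starts with an uppercase letter or digit
_SENT_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-ZÄÖÜ0-9])")


def clean(text):
    """Normalize Unicode, drop reference markers like [1], collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = re.sub(r"\[\d+\]", "", text)  # hypothetical cleaning rule
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def split_sentences(text):
    """Naive regex sentence splitter (not abbreviation-aware)."""
    return [s for s in _SENT_BOUNDARY.split(clean(text)) if s]


raw = "Das ist der erste Satz.[1] Hier folgt   der zweite Satz! Und ein dritter?"
print("\n".join(split_sentences(raw)))
# Das ist der erste Satz.
# Hier folgt der zweite Satz!
# Und ein dritter?
```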
AsoSoft Text Corpus is the first large scale text corpus for the Kurdish language.
A framework for semantic text search
A text corpus collection for the DroppedText language.
Search for a long list of names (patterns) in a large text corpus systematically and quickly
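One standard technique for searching many names at once is the Aho-Corasick automaton; the listed project may use a different approach. The minimal sketch below builds a trie of the patterns, adds failure links breadth-first, and scans the text once, reporting every (possibly overlapping) match. The class name `AhoCorasick` is illustrative only.

```python
from collections import deque


class AhoCorasick:
    """Multi-pattern string matcher: one pass over the text for all patterns."""

    def __init__(self, patterns):
        self.goto = [{}]  # trie transitions, one dict per node
        self.fail = [0]   # failure link per node (root = 0)
        self.out = [[]]   # patterns recognized at each node
        for pattern in patterns:
            if pattern:   # empty patterns are skipped
                self._add(pattern)
        self._build()

    def _add(self, pattern):
        node = 0
        for ch in pattern:
            if ch not in self.goto[node]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[node][ch] = len(self.goto) - 1
            node = self.goto[node][ch]
        self.out[node].append(pattern)

    def _build(self):
        # Depth-1 nodes keep fail = 0; deeper nodes are resolved breadth-first
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                queue.append(child)
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                # Inherit matches that end at the failure target (suffix patterns)
                self.out[child] = self.out[child] + self.out[self.fail[child]]

    def search(self, text):
        """Yield (start_index, pattern) for every occurrence in text."""
        node = 0
        for i, ch in enumerate(text):
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            for pattern in self.out[node]:
                yield i - len(pattern) + 1, pattern


matcher = AhoCorasick(["he", "she", "his", "hers"])
print(list(matcher.search("ushers")))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```

After construction, scanning takes time linear in the text length plus the number of matches, regardless of how many names are in the list, which is what makes it attractive for long name lists. For case-insensitive name search, lowercase both the patterns and the text before matching.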