# text-corpus

Here are 15 public repositories matching this topic.

mrzjy / StarrailDialog

A project that extracts a text corpus from Honkai: Star Rail

game nlp character conversation npc rpg-game multilanguage text-corpus honkai mihoyo honkai-star-rail

Updated Jul 12, 2024
Python

TextCorpusLabs / oas

Walkthrough for converting the PMC OAS dataset into a text corpus

python3 oas text-corpus

Updated Mar 25, 2024
Python

WHOSpeeches / WHODataHub

Collects the speeches of the WHO Director-General.

python3 who text-corpus

Updated Jan 27, 2023
Python

TextCorpusLabs / wikimedia

Walkthrough for converting Wikimedia into a text corpus

wikimedia python3 text-corpus

Updated Jan 26, 2023
Python

appeler / search_names

Search a long list of names (patterns) in a large text corpus systematically and quickly

search names text-corpus

Updated Oct 7, 2022
Python
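To illustrate the general idea behind searching many name patterns in one pass, here is a minimal sketch. It is not taken from search_names; the helper names `build_name_matcher` and `find_names` are hypothetical, and it assumes names are plain word sequences matched case-insensitively. All names are compiled into a single regular-expression alternation, so each document is scanned once rather than once per name.

```python
import re
from typing import Dict, List, Tuple


def build_name_matcher(names: List[str]) -> re.Pattern:
    """Hypothetical helper: compile all names into one alternation."""
    # Sort longest-first so "John Smith" wins over "John", since Python's
    # regex alternation takes the first alternative that matches.
    ordered = sorted(names, key=len, reverse=True)
    escaped = (re.escape(n) for n in ordered)
    # \b word boundaries keep names from matching inside longer words.
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def find_names(names: List[str], documents: Dict[str, str]) -> List[Tuple[str, str, int]]:
    """Return (doc_id, matched_text, char_offset) for every hit."""
    if not names:
        return []  # an empty alternation would match empty strings
    matcher = build_name_matcher(names)
    hits = []
    for doc_id, text in documents.items():
        for m in matcher.finditer(text):
            hits.append((doc_id, m.group(0), m.start()))
    return hits


docs = {"d1": "John Smith met Anna in Paris.", "d2": "No names here."}
print(find_names(["John", "Anna", "John Smith"], docs))
```

For very long name lists, an Aho-Corasick automaton is the usual alternative to a large regex alternation.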

TextCorpusLabs / NJGovNews

Web scraping of the New Jersey news feeds

python3 newsfeed text-corpus

Updated Mar 10, 2022
Python

motazsaad / corpus-expander

Expands sentences in a given text corpus. The code checks sentences for named entities (NEs) and creates new sentences by injecting new NEs from an NE list.

nes named-entities sentence corpus-linguistics language-model text-corpus arabic-nlp expanding-sentences corpus-expander

Updated Dec 16, 2021
Python
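The expansion idea described above can be sketched as follows. This is an illustrative assumption, not corpus-expander's actual code: the function name `expand_sentences` is hypothetical, NEs are detected by simple dictionary lookup against the NE list, and the NE list is assumed to be grouped by entity type so that a city is only swapped for another city.

```python
import re
from typing import Dict, List


def expand_sentences(sentences: List[str], ne_list: Dict[str, List[str]]) -> List[str]:
    """Hypothetical sketch: for each known NE found in a sentence, emit one
    new sentence per other NE of the same type substituted in its place."""
    # Map each entity string to its type for quick lookup.
    entity_type = {ne: t for t, nes in ne_list.items() for ne in nes}
    expanded: List[str] = []
    for sentence in sentences:
        for ne, ne_type in entity_type.items():
            # Word boundaries avoid matching an NE inside a longer word.
            pattern = re.compile(r"\b" + re.escape(ne) + r"\b")
            if not pattern.search(sentence):
                continue
            for replacement in ne_list[ne_type]:
                if replacement != ne:
                    # A callable replacement avoids backslash processing.
                    expanded.append(pattern.sub(lambda _m: replacement, sentence))
    # Drop duplicates while preserving order.
    return list(dict.fromkeys(expanded))


ne_list = {"CITY": ["Cairo", "Riyadh", "Amman"]}
print(expand_sentences(["I flew to Cairo yesterday."], ne_list))
# ['I flew to Riyadh yesterday.', 'I flew to Amman yesterday.']
```

Real NE detection would typically use a tagger rather than exact string lookup; the lookup keeps the sketch self-contained.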

alexlilia / igc-corpus-reader

A tool for indexing and querying a large XML-based text corpus with Elasticsearch.

corpus corpus-linguistics text-corpus corpus-tool icelandic-language

Updated Jun 17, 2021
Python

Ermlab / PoLitBert

Polish RoBERTa model trained on Polish literature, Wikipedia, and OSCAR. The main assumption is that high-quality text yields a good model.

nlp polish text-corpus roberta

Updated May 25, 2021
Python

TextCorpusLabs / congressional-votes

Walkthrough for converting congressional roll call votes into a text corpus

python3 us-congress text-corpus congress-votes

Updated Jan 21, 2021
Python

miras-tech / MirasText

MirasText

nlp sentiment-analysis article corpus language-modeling dataset persian-nlp text-corpus word-embedding irony-detection

Updated Aug 12, 2020
Python

TextCorpusLabs / covid19

Walkthrough for converting the data from Kaggle's COVID-19 Open Research Dataset Challenge into a text corpus

python3 text-corpus covid-19

Updated Mar 23, 2020
Python

capetocape / crawl-text-title-as-corpus

Crawls data from websites to build a text corpus

python nlp crawling text-corpus

Updated Sep 8, 2018
Python

Chandra-cc / Tesseract_ICR-Sheets

A model trained on Google handwritten fonts using a text corpus containing only the digits 0-9. The main aim was to recognize ICR sheets with the trained model. Our model achieved an accuracy of 94.6% with Tesseract version 4.

tesseract lstm tesseract-ocr text-corpus tesseract-icr-sheets google-handwritten-fonts recognize-icr-sheets

Updated Aug 4, 2018
Python

nikitaeverywhere / edu-text-analysis-experiments

Statistical text analysis and semantic networks with Python

analysis text-analysis tf-idf gephi semantic-networks sigma text-corpus text-analyzer sigma-analysis

Updated Nov 30, 2017
Python
