Walkthrough of converting the PMC OAS Dataset into a text corpus
Updated Mar 25, 2024 · Python
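A minimal sketch of the conversion step, assuming the articles are JATS XML files (the format PMC uses for its open access articles) and that only the title, abstract, and body paragraphs are wanted. The function name `jats_to_text` and the sample markup are hypothetical; this is not the walkthrough's own code.

```python
import xml.etree.ElementTree as ET


def _clean(elem: ET.Element) -> str:
    """Join all nested text of an element and collapse runs of whitespace."""
    return " ".join("".join(elem.itertext()).split())


def jats_to_text(xml_string: str) -> str:
    """Flatten a JATS-style article into one plain-text paragraph per line (hypothetical helper)."""
    root = ET.fromstring(xml_string)
    parts = []
    title = root.find(".//article-title")
    if title is not None:
        parts.append(_clean(title))
    # Abstract first, then body; Element truthiness depends on children, so test "is None".
    for section in (root.find(".//abstract"), root.find(".//body")):
        if section is None:
            continue
        for p in section.iter("p"):
            text = _clean(p)
            if text:
                parts.append(text)
    return "\n".join(parts)


sample = (
    "<article><front><article-meta><title-group>"
    "<article-title>A <italic>Test</italic> Paper</article-title></title-group>"
    "<abstract><p>Short abstract.</p></abstract></article-meta></front>"
    "<body><sec><title>Intro</title><p>First   paragraph\nwith a line break.</p></sec></body>"
    "</article>"
)
print(jats_to_text(sample))
```

Running this over every file in a downloaded batch and appending the results to one file gives a line-per-paragraph corpus; section titles are dropped here, which may or may not suit your use.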
Expands sentences in a given text corpus. The code checks sentences for named entities (NEs) and creates new sentences by injecting other NEs from an NE list.
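A minimal sketch of the idea, assuming the NE list is a flat list of strings of one entity type and that an entity counts as present only when it appears as a whole word. `expand_sentences` is a hypothetical name; the project's actual detection step (for example an NER model) may differ.

```python
import re


def expand_sentences(sentences, ne_list):
    """For every known NE found in a sentence, emit one copy per other NE swapped in."""
    expanded = []
    for sentence in sentences:
        for ne in ne_list:
            # Lookarounds instead of \b so entities ending in punctuation (e.g. "U.S.") still match.
            pattern = re.compile(rf"(?<!\w){re.escape(ne)}(?!\w)")
            if not pattern.search(sentence):
                continue
            for new_ne in ne_list:
                if new_ne != ne:
                    # A function replacement avoids backslash interpretation in new_ne.
                    expanded.append(pattern.sub(lambda _m: new_ne, sentence))
    return expanded


print(expand_sentences(["Alice moved to Paris last year."], ["Paris", "Berlin", "Tokyo"]))
```

With typed entities (persons, cities, and so on) you would keep one list per type so that a city is only ever replaced by another city.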
Walkthrough of converting the data from Kaggle's COVID-19 Open Research Dataset Challenge into a text corpus
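A minimal sketch, assuming each paper is a JSON file in the CORD-19 `document_parses` layout: a `metadata.title` string, plus `abstract` and `body_text` lists whose items carry a `text` field. Check these field names against the release you download; `cord19_paper_to_text` is a hypothetical helper, not code from the walkthrough.

```python
import json


def cord19_paper_to_text(raw_json: str) -> str:
    """Turn one CORD-19-style paper JSON into newline-separated plain text."""
    paper = json.loads(raw_json)
    parts = [paper.get("metadata", {}).get("title", "") or ""]
    # Abstract paragraphs first, then body paragraphs, each as a dict with a "text" key.
    for key in ("abstract", "body_text"):
        parts.extend(para.get("text", "") for para in paper.get(key, []))
    # Drop empty or whitespace-only paragraphs.
    return "\n".join(p.strip() for p in parts if p and p.strip())


sample = json.dumps({
    "metadata": {"title": "A Study"},
    "abstract": [{"text": "Abstract text."}],
    "body_text": [{"text": "Intro.", "section": "Introduction"}, {"text": "  "}],
})
print(cord19_paper_to_text(sample))
```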
Walkthrough of converting congressional roll call votes into a text corpus
A tool for indexing and querying a large XML-based text corpus with Elasticsearch.
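A minimal sketch of the indexing side, assuming a hypothetical XML layout of `<doc id="…">` records whose child elements become document fields. It streams the file with `xml.etree.ElementTree.iterparse`, so the whole corpus never has to fit in memory, and emits the newline-delimited JSON body that the Elasticsearch `_bulk` endpoint accepts (an action line followed by a source line). The index name `corpus` and the helper name are assumptions, and sending the payload is left out.

```python
import io
import json
import xml.etree.ElementTree as ET


def xml_to_bulk_ndjson(xml_source, index_name="corpus"):
    """Stream <doc> elements and yield Elasticsearch _bulk NDJSON lines (hypothetical layout)."""
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "doc":
            continue  # child elements end before their <doc>; handle them via the parent
        action = {"index": {"_index": index_name, "_id": elem.get("id")}}
        source = {child.tag: " ".join("".join(child.itertext()).split()) for child in elem}
        yield json.dumps(action) + "\n"
        yield json.dumps(source) + "\n"
        elem.clear()  # release the processed record's children to keep memory flat


data = (
    b'<corpus><doc id="1"><title>Hello</title><text>World</text></doc>'
    b'<doc id="2"><title>Foo</title><text>Bar</text></doc></corpus>'
)
payload = "".join(xml_to_bulk_ndjson(io.BytesIO(data)))
print(payload)
```

The resulting string can be POSTed to `/_bulk` with `Content-Type: application/x-ndjson`; for very large files you would send it in batches rather than joining everything at once.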
Crawling data from websites to build a text corpus
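A minimal sketch using only the standard library: `urllib.request` fetches a page and `html.parser` keeps the visible text while skipping `<script>` and `<style>` contents. The class and function names are hypothetical, and a real crawler would also need link discovery, robots.txt handling, and rate limiting, which this omits.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect non-empty text chunks, ignoring anything inside script/style tags."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(" ".join(data.split()))


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "\n".join(parser.chunks)


def fetch_page_text(url: str) -> str:
    """Download a page and return its visible text (network access required)."""
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return html_to_text(resp.read().decode(charset, errors="replace"))


print(html_to_text("<body><h1>Title</h1><script>var x = 1;</script><p>Some   text.</p></body>"))
```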
Walkthrough of converting Wikimedia data into a text corpus
A model trained on Google handwritten fonts using a text corpus containing only the digits 0-9. The main aim was to recognize ICR (intelligent character recognition) sheets with the trained model. Our model achieved an accuracy of 94.6% using Tesseract version 4.
Search for a long list of names (patterns) in a large text corpus systematically and quickly
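Searching for many names at once is the classic use case for the Aho-Corasick algorithm, which scans the text a single time no matter how many patterns there are. The sketch below is a plain Python version of that algorithm, offered as one standard technique; it is not necessarily what the project uses. Matching is case-sensitive, so lowercase both the patterns and the text if you need case-insensitive search.

```python
from collections import deque


def build_automaton(patterns):
    """Build the Aho-Corasick trie, failure links, and output lists."""
    goto = [{}]  # goto[state][char] -> next state
    fail = [0]   # failure link of each state
    out = [[]]   # patterns recognized on reaching each state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass: each failure link points to the longest proper suffix in the trie.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] += out[fail[nxt]]  # inherit matches that end at the suffix state
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_index, pattern) for every occurrence, in order of match end."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches


print(find_all("ushers", build_automaton(["he", "she", "his", "hers"])))
```

Building the automaton once and reusing it across documents is what makes this pay off for a long, fixed list of names.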
Statistical text analysis and semantic networks with Python
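One common way to build a semantic network from text is a word co-occurrence graph: words are nodes, and an edge's weight counts the sentences in which both words appear. The sketch below uses only the standard library and a tiny hypothetical stopword list; it illustrates the general idea and is not the API of the project described above.

```python
import re
from collections import Counter
from itertools import combinations

STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}  # illustrative only


def cooccurrence_network(text, stopwords=STOPWORDS):
    """Return weighted edges {(word1, word2): count} for words sharing a sentence."""
    edges = Counter()
    for sentence in re.split(r"[.!?]+", text.lower()):
        # Unique, sorted words so each undirected edge has one canonical key.
        words = sorted({w for w in re.findall(r"[a-z]+", sentence) if w not in stopwords})
        for pair in combinations(words, 2):
            edges[pair] += 1
    return edges


edges = cooccurrence_network("Cats chase mice. Dogs chase cats. Mice fear cats.")
print(edges[("cats", "chase")], edges[("cats", "mice")], edges[("chase", "dogs")])
```

The edge counter can be loaded into a graph library to compute centrality or communities, which is where most semantic-network analysis happens.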
A project that extracts the text corpus of Honkai: Star Rail
Polish RoBERTa model trained on Polish literature, Wikipedia, and OSCAR. The main assumption is that high-quality text yields a good model.
MirasText