Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Updated May 16, 2024 - Python
Text preprocessing, representation and visualization from zero to hero.
🧹 Python package for text cleaning
Preprocessing Library for Natural Language Processing
A Python package for text preprocessing tasks in natural language processing.
Text preprocessing, representation, similarity calculation, text search and classification. Let's go and play with text!
This python module is an easy-to-use port of the text normalization used in the paper "Not low-resource anymore: Aligner ensembling, batch filtering, and new datasets for Bengali-English machine translation". It is intended to be used for normalizing / cleaning Bengali and English text.
Easy NLP in Python
A powerful text cleaner for Japanese web texts
Performs tokenization, stemming, lemmatization, index creation, index compression and ranked retrieval of Cranfield documents
Vector-space-based search engine for arXiv research publications
Tensor Extraction of Latent Features (T-ELF). Within T-ELF's arsenal are non-negative matrix and tensor factorization solutions, equipped with automatic model determination (also known as the estimation of the number of latent factors, i.e., the rank) for accurate data modeling. Our software suite encompasses cutting-edge data pre-processing and post-processing modules.
Extreme Extractive Text Summarization and Topic Modeling (using LSA and LDA techniques) over Reddit posts from the TLDRHQ dataset.
My 2020 project focusing on NLP - Information Extraction
ToxiScan is a text analysis tool that leverages the power of Natural Language Toolkit (NLTK) and the Naive Bayes classifier to determine the presence of toxicity in textual data.
A tool for extracting chapters from Italian raw-text e-books from Project Gutenberg. Regular expressions are used to match chapter headings and extract the text between them.
🛠️ An easy-to-use tool for data preprocessing, especially text preprocessing
🐨 Text preprocessing.