Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
Updated May 22, 2024 (Python)
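To illustrate the kind of output a diff library produces, here is a minimal sketch using Python's standard `difflib` rather than Diff Match Patch itself. It borrows only the (operation, text) tuple convention with DELETE = -1, EQUAL = 0, INSERT = 1. The function names `char_diff`, `source_text`, and `target_text` are hypothetical, and because `difflib` uses its own matching algorithm, its chunks may differ from what Diff Match Patch would return.

```python
from difflib import SequenceMatcher

# Operation codes following the (-1, 0, 1) tuple convention (illustrative)
DELETE, EQUAL, INSERT = -1, 0, 1


def char_diff(a: str, b: str) -> list[tuple[int, str]]:
    """Return a character-level diff turning `a` into `b` as (op, text) tuples."""
    diffs: list[tuple[int, str]] = []
    # autojunk=False disables difflib's "popular element" heuristic on long inputs
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            diffs.append((EQUAL, a[i1:i2]))
        elif tag == "delete":
            diffs.append((DELETE, a[i1:i2]))
        elif tag == "insert":
            diffs.append((INSERT, b[j1:j2]))
        else:  # "replace": delete the old span, then insert the new one
            diffs.append((DELETE, a[i1:i2]))
            diffs.append((INSERT, b[j1:j2]))
    return diffs


def source_text(diffs: list[tuple[int, str]]) -> str:
    """Rebuild the original text (everything except insertions)."""
    return "".join(text for op, text in diffs if op != INSERT)


def target_text(diffs: list[tuple[int, str]]) -> str:
    """Rebuild the new text (everything except deletions)."""
    return "".join(text for op, text in diffs if op != DELETE)
```

Both texts can be recovered from the diff alone, which is what makes this representation useful for building patches: for example, `target_text(char_diff("kitten", "sitting"))` returns `"sitting"`.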
⚡ From finding text to search and replace, from sorting to beautifying text and more 🎨
Text Classification Algorithms: A Survey
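As a concrete instance of one classic text classification algorithm, the sketch below implements multinomial naive Bayes with add-one (Laplace) smoothing. It is not taken from the survey; the whitespace tokenizer, the `NaiveBayes` class name, and the toy training data are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes text classifier with add-one smoothing (sketch)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class (for priors)
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()  # naive whitespace tokenizer (assumption)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc):
        tokens = doc.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(class) + sum of log P(token | class), smoothed so unseen tokens are not zero
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


clf = NaiveBayes().fit(
    ["cheap pills buy now", "buy cheap watches", "meeting at noon", "lunch meeting tomorrow"],
    ["spam", "spam", "ham", "ham"],
)
print(clf.predict("buy cheap pills"))
```

Working in log space avoids floating-point underflow when a document contains many tokens.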
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
Python library for creating PEG parsers
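To show what a PEG (parsing expression grammar) parser does, here is a minimal hand-rolled sketch using parser combinators. It is not the listed library's API; the combinator names (`lit`, `charclass`, `seq`, `choice`, `many`, `mapped`) and the arithmetic grammar are hypothetical. The key PEG property is ordered choice: `choice` commits to the first alternative that succeeds.

```python
# Grammar (PEG notation):
#   Expr   <- Term (('+' / '-') Term)*
#   Term   <- Factor (('*' / '/') Factor)*
#   Factor <- Number / '(' Expr ')'
#   Number <- [0-9]+
# A parser takes (text, pos) and returns (value, new_pos), or None on failure.

def lit(s):
    def parse(text, pos):
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return parse

def charclass(chars):
    def parse(text, pos):
        if pos < len(text) and text[pos] in chars:
            return (text[pos], pos + 1)
        return None
    return parse

def seq(*parsers):
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None  # the whole sequence fails if any part fails
            value, pos = result
            values.append(value)
        return (values, pos)
    return parse

def choice(*parsers):
    def parse(text, pos):
        for p in parsers:  # ordered choice: first success wins
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse

def many(p, min_count=0):
    def parse(text, pos):
        values = []
        while True:
            result = p(text, pos)
            if result is None or result[1] == pos:  # stop on failure or no progress
                break
            value, pos = result
            values.append(value)
        return (values, pos) if len(values) >= min_count else None
    return parse

def mapped(p, fn):
    def parse(text, pos):
        result = p(text, pos)
        return None if result is None else (fn(result[0]), result[1])
    return parse

def apply_ops(parts):
    """Fold [first, [[op, rhs], ...]] left to right."""
    value, rest = parts
    for op, rhs in rest:
        if op == "+":
            value += rhs
        elif op == "-":
            value -= rhs
        elif op == "*":
            value *= rhs
        else:
            value /= rhs
    return value

def expr(text, pos):  # forward reference so Factor can recurse into Expr
    return _expr(text, pos)

number = mapped(many(charclass("0123456789"), 1), lambda ds: int("".join(ds)))
factor = choice(number, mapped(seq(lit("("), expr, lit(")")), lambda v: v[1]))
term = mapped(seq(factor, many(seq(choice(lit("*"), lit("/")), factor))), apply_ops)
_expr = mapped(seq(term, many(seq(choice(lit("+"), lit("-")), term))), apply_ops)

def evaluate(source):
    text = "".join(source.split())  # this sketch has no whitespace rule, so strip spaces
    result = expr(text, 0)
    if result is None or result[1] != len(text):
        raise ValueError(f"could not parse {source!r}")
    return result[0]
```

A dedicated PEG library would typically add memoization (packrat parsing) so backtracking stays linear-time; this sketch omits it.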
Persian NLP Toolkit
Jupyter Notebook example for the Azure Text Analytics cognitive service
Intuitive find & replace CLI (sed alternative)
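The core of a find-and-replace tool can be sketched with Python's `re` module. This is not the listed CLI's implementation. The function `find_replace`, the `main` entry point, and its `-F/--fixed-strings` flag are hypothetical, chosen to show the common distinction between regex mode and literal mode.

```python
import argparse
import re
import sys
from typing import Optional


def find_replace(text: str, find: str, replace: str, *, literal: bool = False, count: int = 0) -> str:
    """Replace matches of `find` in `text`; count=0 replaces every match.

    With literal=True, `find` and `replace` are plain strings: regex
    metacharacters and backreferences such as \\1 have no special meaning.
    """
    if literal:
        # A callable replacement is inserted verbatim, so backslashes stay literal
        return re.sub(re.escape(find), lambda _m: replace, text, count=count)
    return re.sub(find, replace, text, count=count)


def main(argv: Optional[list[str]] = None) -> int:
    """Hypothetical CLI wrapper: filter stdin to stdout."""
    parser = argparse.ArgumentParser(description="find & replace (sketch)")
    parser.add_argument("find")
    parser.add_argument("replace")
    parser.add_argument("-F", "--fixed-strings", action="store_true",
                        help="treat FIND and REPLACE as literal strings")
    args = parser.parse_args(argv)
    sys.stdout.write(find_replace(sys.stdin.read(), args.find, args.replace,
                                  literal=args.fixed_strings))
    return 0
```

For example, `find_replace("a.b.c", ".", "-", literal=True)` returns `"a-b-c"`, whereas regex mode would replace every character because `.` matches anything.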
Automatic Korean word spacing with Python
Repository for the Applied Text Mining in Python course (Coursera) by the University of Michigan
A simple Python module for parsing human names into their individual components
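A very simplified version of human-name parsing can be sketched as follows. It is not the listed module's logic or API. The `Name` dataclass, `parse_name`, and the tiny `TITLES` and `SUFFIXES` sets are hypothetical; real parsers use much larger word lists and handle last-name particles such as "de la", which this sketch does not.

```python
from dataclasses import dataclass

# Tiny illustrative word lists (assumption); real parsers ship far larger ones
TITLES = {"mr", "mrs", "ms", "dr", "prof", "sir"}
SUFFIXES = {"jr", "sr", "ii", "iii", "iv", "phd", "md"}


@dataclass
class Name:
    title: str = ""
    first: str = ""
    middle: str = ""
    last: str = ""
    suffix: str = ""


def _norm(token: str) -> str:
    """Lowercase and strip trailing punctuation for word-list lookups."""
    return token.lower().strip(".,")


def parse_name(full: str) -> Name:
    # Handle "Last, First Middle" by moving the part before the comma to the end,
    # unless the part after the comma is just a suffix ("John Smith, Jr.")
    if full.count(",") == 1:
        left, right = (part.strip() for part in full.split(","))
        if _norm(right) not in SUFFIXES:
            full = f"{right} {left}"
    tokens = full.replace(",", " ").split()
    name = Name()
    if tokens and _norm(tokens[0]) in TITLES:
        name.title = tokens.pop(0)
    if tokens and _norm(tokens[-1]) in SUFFIXES:
        name.suffix = tokens.pop()
    if tokens:
        name.first = tokens.pop(0)
    if tokens:
        name.last = tokens.pop()
    name.middle = " ".join(tokens)  # whatever remains between first and last
    return name
```

For example, `parse_name("Dr. Juan Q. Smith Jr.")` yields title "Dr.", first "Juan", middle "Q.", last "Smith", and suffix "Jr.".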
Open Korean Text Processor - An Open-source Korean Text Processor
A fast implementation of Aho-Corasick in Rust.
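The listed project is written in Rust. For readers who want to see how the Aho-Corasick algorithm itself works, here is a compact Python sketch (not a port of that crate): it builds a trie of the patterns, computes failure links breadth-first, and then scans the text once, reporting every match. The function names `build_automaton` and `find_all` are hypothetical.

```python
from collections import deque


def build_automaton(patterns):
    goto = [{}]  # goto[state][char] -> next state (trie edges)
    fail = [0]   # failure link: longest proper suffix that is also a trie prefix
    out = [[]]   # patterns recognized when the automaton is in each state
    for pat in patterns:
        if not pat:
            continue  # empty patterns are skipped in this sketch
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)

    # Breadth-first traversal so that shallower failure links are computed first
    queue = deque(goto[0].values())  # depth-1 states already fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt].extend(out[fail[nxt]])  # inherit matches ending at the suffix state
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_index, pattern) for every occurrence, in one pass over text."""
    goto, fail, out = automaton
    state, matches = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches


automaton = build_automaton(["he", "she", "his", "hers"])
print(find_all("ushers", automaton))  # [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The scan runs in time linear in the text length plus the number of matches, regardless of how many patterns are loaded.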
Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, 330 million English tweets).
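Hashtag splitting with word statistics is commonly done by dynamic programming over unigram probabilities. The sketch below shows that general technique (in the style of Norvig's well-known segmenter); it is not Ekphrasis's code. The toy `COUNTS` table and the unknown-word penalty are assumptions standing in for real corpus statistics.

```python
import math

# Toy unigram counts (assumption); a real segmenter loads counts from a large corpus
COUNTS = {"i": 500, "love": 120, "new": 200, "york": 80, "newyork": 1,
          "nlp": 30, "we": 300, "the": 1000}
TOTAL = sum(COUNTS.values())


def word_logprob(word: str) -> float:
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    # Unknown words get a probability that shrinks by 10x per character
    return math.log(10.0 / (TOTAL * 10 ** len(word)))


def segment(text: str, max_len: int = 20) -> list[str]:
    """Split text into the most probable word sequence under a unigram model."""
    text = text.lower()
    n = len(text)
    best = [0.0] + [-math.inf] * n  # best[i]: best log-probability for text[:i]
    back = [0] * (n + 1)            # back[i]: start index of the last word in text[:i]
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            score = best[j] + word_logprob(text[j:i])
            if score > best[i]:
                best[i], back[i] = score, j
    words, i = [], n
    while i > 0:  # follow back-pointers from the end of the string
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]


print(segment("ILoveNewYork"))
```

With these counts, "new" + "york" outscores the rare single token "newyork", and the length-based penalty keeps an unknown string whole rather than shattering it into single characters.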
This case study shows how to create a text analysis and classification model and deploy it as a web service in the Azure cloud to automatically classify support tickets. The project is a proof of concept built by Microsoft's Commercial Software Engineering team in collaboration with Endava (http://endava.com/en).
Mycroft's multilingual text parsing and formatting library
All-in-one text de-duplication
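Text de-duplication usually combines exact matching after normalization with near-duplicate detection. The sketch below is not the listed tool's method: it hashes normalized text to catch exact duplicates and compares character 5-gram shingle sets by Jaccard similarity to catch near duplicates. The function names, the shingle size `k=5`, and the default `threshold=0.8` are assumptions.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def shingles(text: str, k: int = 5) -> set[str]:
    """Set of overlapping k-character substrings of the normalized text."""
    t = normalize(text)
    if len(t) <= k:
        return {t}
    return {t[i:i + k] for i in range(len(t) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(docs: list[str], threshold: float = 0.8, k: int = 5) -> list[str]:
    """Keep the first occurrence of each document; drop exact and near duplicates."""
    kept, kept_shingles, seen_hashes = [], [], set()
    for doc in docs:
        digest = hashlib.sha1(normalize(doc).encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate after normalization
        sh = shingles(doc, k)
        if any(jaccard(sh, other) >= threshold for other in kept_shingles):
            continue  # near duplicate of a document already kept
        seen_hashes.add(digest)
        kept.append(doc)
        kept_shingles.append(sh)
    return kept
```

The near-duplicate check compares each document with every kept one, which is quadratic; at scale, MinHash with locality-sensitive hashing is the usual replacement for the pairwise loop.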