Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
Updated May 22, 2024 - Python
Text Classification Algorithms: A Survey
fastNLP: A Modularized and Extensible NLP Framework. Currently in incubation.
PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Python library for creating PEG parsers
Persian NLP Toolkit
Automatic Korean word spacing with Python
A simple Python module for parsing human names into their individual components
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from 2 big corpora (English Wikipedia and Twitter, 330 million English tweets).
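Ekphrasis's own API is not shown here. As a rough illustration of the hashtag-splitting idea, the following is a minimal sketch, assuming a hypothetical toy table of word counts in place of real corpus statistics, that picks the most probable split with a dynamic-programming search over unigram log-probabilities.

```python
import math
from functools import lru_cache

# Hypothetical toy unigram counts; a real tool would use large-corpus statistics.
WORD_COUNTS = {
    "i": 500, "love": 120, "new": 150, "york": 40,
    "the": 1000, "weekend": 30, "week": 60, "end": 80,
}
TOTAL = sum(WORD_COUNTS.values())
MAX_WORD_LEN = 20

def word_log_prob(word):
    """Log-probability of a word; unseen strings get a length-penalized score."""
    if word in WORD_COUNTS:
        return math.log(WORD_COUNTS[word] / TOTAL)
    # Each extra character of an unknown word costs another factor of 10.
    return math.log(1.0 / (TOTAL * 10 ** len(word)))

def segment(hashtag):
    """Split a hashtag into the most probable sequence of known words."""
    text = hashtag.lstrip("#").lower()

    @lru_cache(maxsize=None)
    def best(i):
        # Best (log-probability, words) segmentation of text[i:].
        if i == len(text):
            return 0.0, ()
        candidates = []
        for j in range(i + 1, min(len(text), i + MAX_WORD_LEN) + 1):
            score, rest = best(j)
            piece = text[i:j]
            candidates.append((word_log_prob(piece) + score, (piece,) + rest))
        return max(candidates)

    return list(best(0)[1])

print(segment("#ILoveNewYork"))  # ['i', 'love', 'new', 'york']
print(segment("theweekend"))     # ['the', 'weekend']
```

Note that "weekend" stays whole because its own count outweighs the product of "week" and "end"; the split is driven entirely by the word statistics.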
This case study shows how to create a text analysis and classification model and deploy it as a web service in the Azure cloud to automatically classify support tickets. The project is a proof of concept built by Microsoft's Commercial Software Engineering team in collaboration with Endava (http://endava.com/en).
Mycroft's multilingual text parsing and formatting library
All-in-one text de-duplication
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Mor…
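The basic tasks mentioned for PyNLPl (n-gram extraction and frequency lists) can be illustrated with the standard library alone. This is a minimal sketch and not PyNLPl's API; the function names `ngrams` and `frequency_list` are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) of a token sequence."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def frequency_list(tokens, n=1):
    """Count n-grams and return (ngram, count) pairs, most frequent first."""
    return Counter(ngrams(tokens, n)).most_common()

tokens = "to be or not to be".split()
print(ngrams(tokens, 2))
# [('to', 'be'), ('be', 'or'), ('or', 'not'), ('not', 'to'), ('to', 'be')]
print(frequency_list(tokens, 2)[0])
# (('to', 'be'), 2)
```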
Practice with Python data types, filter(), map(), and list comprehensions
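As a small illustration of the three constructs that practice repository covers, the sketch below (with a hypothetical word list) builds the same filtered, lowercased list once with `filter()` and `map()` and once with a list comprehension.

```python
words = ["Apple", "kiwi", "Banana", "fig", "Cherry"]

# filter(): keep only words longer than three characters.
long_words = list(filter(lambda w: len(w) > 3, words))
# map(): lowercase every kept word.
lowered = list(map(str.lower, long_words))
# The same pipeline expressed as one list comprehension.
lowered_lc = [w.lower() for w in words if len(w) > 3]

print(lowered)                # ['apple', 'kiwi', 'banana', 'cherry']
print(lowered == lowered_lc)  # True
```

The comprehension is usually considered the more idiomatic form in Python, while `filter()` and `map()` return lazy iterators that are handy when chaining large streams.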
Preprocessing module for short-text clustering (Short text cluster)
Text Normalization & Inverse Text Normalization
🗣️ Tool to generate adversarial text examples and test machine learning models against them
A neural network intent parser
Tool that lets you detect and translate text.