text-processing

Here are 1,575 public repositories matching this topic...

google / diff-match-patch

Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.

diff match patch text-processing difference

Updated May 22, 2024
Python

learnbyexample / Command-line-text-processing

⚡ From finding text to search and replace, from sorting to beautifying text and more 🎨

ruby linux command-line regex perl ebook awk sed text-processing grep

Updated Jun 5, 2024
Shell

kk7nc / Text_Classification

Text Classification Algorithms: A Survey

deep-learning random-forest text-classification recurrent-neural-networks naive-bayes-classifier dimensionality-reduction logistic-regression document-classification convolutional-neural-networks text-processing decision-trees boosting-algorithms support-vector-machines hierarchical-attention-networks nlp-machine-learning conditional-random-fields k-nearest-neighbours deep-belief-network rocchio-algorithm deep-neural-network

Updated Nov 14, 2022
Python

pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

python pdf font data-science ocr tesseract epub mupdf text-processing pdf-documents extract-data table-extraction text-shaping xps pymupdf

Updated Sep 30, 2024
Python

fastnlp / fastNLP

fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.

natural-language-processing deep-learning text-classification chinese-nlp text-processing nlp-parsing nlp-library

Updated Jun 5, 2023
Python

pyparsing / pyparsing

Python library for creating PEG parsers

python parsing parser-combinators python3 parsing-expression-grammar python-3 text-processing python-2 python2 parsing-library peg-parsers

Updated Sep 29, 2024
Python

roshan-research / hazm

Persian NLP Toolkit

python nlp natural-language-processing tokenizer embeddings persian text-processing dependency-parser farsi pos-tagging persian-nlp normalization lemmatization

Updated Jul 16, 2024
Python

LaloCo / TextAnalytics

Text Analytics Jupyter Notebook example for the Azure cognitive service

azure text-analysis text-processing cognitive-services

Updated Apr 7, 2024
Jupyter Notebook

chmln / sd

Intuitive find & replace CLI (sed alternative)

rust cli terminal command-line regex text-processing

Updated May 28, 2024
Rust

haven-jeon / PyKoSpacing

Automatic Korean word spacing with Python

nlp text-processing spacing korean-nlp

Updated Jul 4, 2024
Python

umer7 / Applied-Text-Mining-in-Python

Repo for Applied Text Mining in Python (coursera) by University of Michigan

python nlp text-mining text-classification regex pandas classification text-processing nlp-tasks

Updated Oct 4, 2020
Jupyter Notebook

derek73 / python-nameparser

A simple Python module for parsing human names into their individual components

python text-processing text-parser python-module

Updated May 28, 2024
Python

open-korean-text / open-korean-text

Open Korean Text Processor - An Open-source Korean Text Processor

natural-language-processing tokenizer korean text-processing korean-text-processing korean-tokenizer

Updated Mar 12, 2024
Scala

BurntSushi / aho-corasick

A fast implementation of Aho-Corasick in Rust.

search finite-state-machine text-processing aho-corasick substring-matching

Updated Sep 25, 2024
Rust

cbaziotis / ekphrasis

Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and 330 million English tweets from Twitter).

nlp tokenizer text-processing semeval nlp-library word-segmentation spelling-correction tokenization text-segmentation spell-corrector word-normalization

Updated Feb 27, 2024
Python

karolzak / support-tickets-classification

This case study shows how to build a text analysis and classification model and deploy it as a web service in the Azure cloud to classify support tickets automatically. The project is a proof of concept made by Microsoft (Commercial Software Engineering team) in collaboration with Endava (http://endava.com/en).

Updated Jun 21, 2022
Python

linuxscout / pyarabic

pyarabic

text-processing nlp-library arabic-language

Updated Jan 14, 2024
Python

MycroftAI / lingua-franca

Mycroft's multilingual text parsing and formatting library

natural-language-processing library text-processing hacktoberfest

Updated Aug 14, 2023
Python

PacktPublishing / Hands-On-Python-Natural-Language-Processing

machine-learning natural-language-processing text-mining deep-neural-networks deep-learning text-generation text-processing natural-language-generation natural-language-inference natural-language-understanding

Updated Jan 30, 2023
Jupyter Notebook

ChenghaoMou / text-dedup

All-in-one text de-duplication

nlp text-processing data-processing de-duplication

Updated May 21, 2024
Python
