💫 Industrial-strength Natural Language Processing (NLP) in Python
Updated May 31, 2024 - Python
All the slides, accompanying code, and exercises are stored in this repo. 🎈
👑 spaCy building blocks and visualizers for Streamlit apps
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and a Twitter corpus of 330 million English tweets).
Rosette API Client Library for Python
Rule-based token and sentence segmentation for the Russian language
FPE - Format Preserving Encryption with FF3 in Python
A Fast and Accurate Neural Thai Word Segmenter
A unified tokenization tool for Images, Chinese and English.
🦜 Containerized HTTP API for industrial-strength NLP via spaCy and sense2vec
Train a Chinese vocabulary with BPE in sentencepiece and use it in transformers.
Easy token price estimates for LLMs
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Implementation of the GBST block from the Charformer paper, in Pytorch
A character tokenizer for Hugging Face Transformers
NLP Cloud serves high-performance pre-trained or custom models for NER, sentiment analysis, classification, summarization, paraphrasing, intent classification, product description and ad generation, chatbot, grammar and spelling correction, keyword and keyphrase extraction, text generation, image generation, code generation, and more...
An off-the-shelf pre-trained Tweet NLP Toolkit (NER, tokenization, lemmatization, POS tagging, dependency parsing) + Tweebank-NER dataset
Fast tokenization and structural analysis of any programming language
Implementation of the LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper
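The Ekphrasis entry above mentions word segmentation for splitting hashtags using word statistics. The following is a minimal sketch of that general idea, not Ekphrasis's own code or API: it scores every possible split with unigram probabilities and keeps the most likely one. The `WORD_COUNTS` table, the unknown-word penalty, and the function names are hypothetical and chosen only for illustration; a real tool would load counts from a large corpus.

```python
import math
from functools import lru_cache

# Hypothetical toy unigram counts (assumption for this sketch only).
WORD_COUNTS = {
    "natural": 50, "language": 60, "processing": 40,
    "nat": 2, "ural": 1, "lang": 3, "age": 20, "pro": 10,
}
TOTAL = sum(WORD_COUNTS.values())


def word_log_prob(word):
    """Log-probability of a word under the toy unigram model."""
    count = WORD_COUNTS.get(word)
    if count is None:
        # Unknown words get a penalty that grows with their length,
        # so long unseen strings are not preferred over known words.
        return math.log(1.0 / (TOTAL * 10 ** len(word)))
    return math.log(count / TOTAL)


def segment(text):
    """Return the most probable split of `text` into words."""
    text = text.lower()

    @lru_cache(maxsize=None)
    def best(start):
        # Best (score, words) for the suffix beginning at `start`.
        if start == len(text):
            return 0.0, ()
        candidates = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            rest_score, rest_words = best(end)
            candidates.append((word_log_prob(word) + rest_score,
                               (word,) + rest_words))
        return max(candidates)

    return list(best(0)[1])


def segment_hashtag(tag):
    """Strip the leading '#' and segment the remaining characters."""
    return segment(tag.lstrip("#"))


if __name__ == "__main__":
    print(segment_hashtag("#NaturalLanguageProcessing"))
```

Memoizing on the start index keeps the search quadratic in the hashtag length instead of exponential, which is the usual design choice for this kind of segmenter.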