Byte-Pair Encoding tokenizer for large language models and huge datasets
Updated Jun 3, 2024 · Python
A grammar describes the syntax of a programming language, and might be defined in Backus-Naur form (BNF). A lexer performs lexical analysis, turning text into tokens. A parser takes tokens and builds a data structure such as an abstract syntax tree (AST). The parser is concerned with context: does the sequence of tokens fit the grammar? A compiler's front end combines a lexer and a parser built for a specific grammar; a full compiler adds further stages such as semantic analysis and code generation.
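The lexing step described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular project's implementation: the token names and regular expressions are assumptions chosen for a toy arithmetic-style language.

```python
import re

# Illustrative token kinds and patterns; a real lexer would derive
# these from the grammar of the language being tokenized.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),          # integer literals
    ("IDENT",  r"[A-Za-z_]\w*"), # identifiers
    ("OP",     r"[+\-*/=]"),     # single-character operators
    ("SKIP",   r"\s+"),          # whitespace, discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def lex(text):
    """Turn source text into (kind, value) tokens, skipping whitespace."""
    tokens = []
    for match in MASTER.finditer(text):
        kind = match.lastgroup
        if kind != "SKIP":
            tokens.append((kind, match.group()))
    return tokens

print(lex("x = 42"))  # [('IDENT', 'x'), ('OP', '='), ('NUMBER', '42')]
```

A parser would then consume this token stream and check it against the grammar, building an AST as it goes.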
Tools and resources for the computational processing of Nheengatu (Modern Tupi)
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Retro-style tokenization for language models
DadmaTools is a Persian NLP toolkit developed by Dadmatech Co.
BLEU Score in Rust
Taiwanese Hokkien Transliterator and Tokeniser
A simple, consistent and extendable toolkit for IndicTrans2 tokenizer
Persian NLP Toolkit
Python port of Moses tokenizer, truecaser and normalizer
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
IMDB Movie Reviews Sentiment Analysis using RNN
The collections of tools for testing and dumping LLMs
Bitextor generates translation memories from multilingual websites
Implementation of MambaByte from "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta