tokenizer

A grammar describes the syntax of a programming language and is often defined in Backus-Naur form (BNF). A lexer performs lexical analysis, turning text into tokens. A parser takes those tokens and builds a data structure such as an abstract syntax tree (AST). The parser is concerned with context: does the sequence of tokens fit the grammar? A compiler front end typically combines a lexer and a parser built for a specific grammar.
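
The split between lexer and parser can be sketched for a tiny arithmetic grammar. This is only an illustration under stated assumptions: the grammar, and the names Token, Expr, Parser, tokenize and parse, are hypothetical and not taken from any repository listed on this page.

```rust
// Illustrative sketch only. Every name here is hypothetical.
//
// Assumed grammar (BNF-style):
//   <expr>   ::= <term> (("+" | "-") <term>)*
//   <term>   ::= <factor> (("*" | "/") <factor>)*
//   <factor> ::= NUMBER | "(" <expr> ")"

#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Num(i64),
    Op(char),
    LParen,
    RParen,
}

#[derive(Debug, PartialEq)]
pub enum Expr {
    Num(i64),
    Bin(Box<Expr>, char, Box<Expr>),
}

/// Lexer: turns text into tokens. It knows nothing about the grammar.
pub fn tokenize(src: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        match c {
            c if c.is_whitespace() => { chars.next(); }
            '0'..='9' => {
                // Accumulate consecutive digits into one number token.
                // (No overflow handling; this is a sketch.)
                let mut n: i64 = 0;
                while let Some(d) = chars.peek().and_then(|ch| ch.to_digit(10)) {
                    n = n * 10 + d as i64;
                    chars.next();
                }
                tokens.push(Token::Num(n));
            }
            '+' | '-' | '*' | '/' => { tokens.push(Token::Op(c)); chars.next(); }
            '(' => { tokens.push(Token::LParen); chars.next(); }
            ')' => { tokens.push(Token::RParen); chars.next(); }
            other => return Err(format!("unexpected character {:?}", other)),
        }
    }
    Ok(tokens)
}

/// Parser: checks the token sequence against the grammar and builds an AST.
pub struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    // Return the current operator if it is one of `allowed`.
    fn peek_op(&self, allowed: &[char]) -> Option<char> {
        match self.tokens.get(self.pos) {
            Some(Token::Op(c)) if allowed.contains(c) => Some(*c),
            _ => None,
        }
    }

    fn expr(&mut self) -> Result<Expr, String> {
        let mut lhs = self.term()?;
        while let Some(op) = self.peek_op(&['+', '-']) {
            self.pos += 1;
            let rhs = self.term()?;
            lhs = Expr::Bin(Box::new(lhs), op, Box::new(rhs));
        }
        Ok(lhs)
    }

    fn term(&mut self) -> Result<Expr, String> {
        let mut lhs = self.factor()?;
        while let Some(op) = self.peek_op(&['*', '/']) {
            self.pos += 1;
            let rhs = self.factor()?;
            lhs = Expr::Bin(Box::new(lhs), op, Box::new(rhs));
        }
        Ok(lhs)
    }

    fn factor(&mut self) -> Result<Expr, String> {
        match self.tokens.get(self.pos).cloned() {
            Some(Token::Num(n)) => { self.pos += 1; Ok(Expr::Num(n)) }
            Some(Token::LParen) => {
                self.pos += 1;
                let inner = self.expr()?;
                if self.tokens.get(self.pos) == Some(&Token::RParen) {
                    self.pos += 1;
                    Ok(inner)
                } else {
                    Err("expected ')'".to_string())
                }
            }
            other => Err(format!("unexpected token {:?}", other)),
        }
    }
}

/// Lex, then parse, rejecting any leftover tokens.
pub fn parse(src: &str) -> Result<Expr, String> {
    let mut p = Parser { tokens: tokenize(src)?, pos: 0 };
    let ast = p.expr()?;
    if p.pos != p.tokens.len() {
        return Err("trailing tokens after expression".to_string());
    }
    Ok(ast)
}
```

In this sketch the input 1 + + 2 tokenizes without error, because every character is a valid token, but parse rejects it, because the token sequence does not fit the grammar. That is the division of labour described above.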

Here are 62 public repositories matching this topic...

zurawiki / tiktoken-rs

lindera-morphology / lindera

guillaume-be / rust-tokenizers

DCjanus / cang-jie

Garvys / rustfst

daac-tools / vibrato

untitaker / html5gum

daac-tools / vaporetto

lindera-morphology / lindera-tantivy

PyThaiNLP / nlpo3

3Dpass / p3d

3Dpass / pass3d

sile / erl_tokenize

reinfer / blingfire-rs

Traumatism / maeel

osyoyu / tantivy-tokenizer-tiny-segmenter

daac-tools / python-vaporetto

kodemartin / rustpostal

bepzi / rlox-tokenizer

kojix2 / tiktoken-c
