tokenizer
A grammar describes the syntax of a programming language and is often written in Backus-Naur form (BNF). A lexer (also called a tokenizer) performs lexical analysis, turning raw text into tokens. A parser takes those tokens and builds a data structure such as an abstract syntax tree (AST). The parser is concerned with structure: does the sequence of tokens fit the grammar? A compiler front end combines a lexer and a parser built for a specific grammar.
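To make the lexer and parser split concrete, here is a minimal C++ sketch for a tiny arithmetic grammar. It is not taken from any repository listed here. The names `TokenKind`, `Token`, `tokenize`, and `Parser` are hypothetical, the grammar in the comment is an assumption chosen for illustration, and the parser only answers whether the tokens fit the grammar rather than building an AST.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token kinds for a tiny arithmetic grammar.
enum class TokenKind { Number, Plus, Minus, Star, Slash, LParen, RParen };

struct Token {
    TokenKind kind;
    std::string text;  // the exact characters the token was built from
};

// Lexer: turns raw text into a flat list of tokens, skipping whitespace.
std::vector<Token> tokenize(const std::string& input) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        const unsigned char c = static_cast<unsigned char>(input[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (std::isdigit(c)) {
            std::size_t j = i;
            while (j < input.size() &&
                   std::isdigit(static_cast<unsigned char>(input[j]))) ++j;
            assert(j > i);  // at least one digit was consumed
            tokens.push_back({TokenKind::Number, input.substr(i, j - i)});
            i = j;
            continue;
        }
        TokenKind kind;
        switch (c) {
            case '+': kind = TokenKind::Plus; break;
            case '-': kind = TokenKind::Minus; break;
            case '*': kind = TokenKind::Star; break;
            case '/': kind = TokenKind::Slash; break;
            case '(': kind = TokenKind::LParen; break;
            case ')': kind = TokenKind::RParen; break;
            default:
                throw std::invalid_argument(
                    std::string("unexpected character: ") + input[i]);
        }
        tokens.push_back({kind, std::string(1, input[i])});
        ++i;
    }
    return tokens;
}

// Parser (recognizer only): checks the tokens against the assumed grammar
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := Number | '(' expr ')'
class Parser {
public:
    explicit Parser(std::vector<Token> tokens) : tokens_(std::move(tokens)) {}

    // True only if the whole token sequence is derivable from `expr`.
    bool accepts() {
        pos_ = 0;
        return expr() && pos_ == tokens_.size();
    }

private:
    bool at(TokenKind k) const {
        return pos_ < tokens_.size() && tokens_[pos_].kind == k;
    }
    bool expr() {
        if (!term()) return false;
        while (at(TokenKind::Plus) || at(TokenKind::Minus)) {
            ++pos_;
            if (!term()) return false;
        }
        return true;
    }
    bool term() {
        if (!factor()) return false;
        while (at(TokenKind::Star) || at(TokenKind::Slash)) {
            ++pos_;
            if (!factor()) return false;
        }
        return true;
    }
    bool factor() {
        if (at(TokenKind::Number)) { ++pos_; return true; }
        if (!at(TokenKind::LParen)) return false;
        ++pos_;
        if (!expr() || !at(TokenKind::RParen)) return false;
        ++pos_;
        return true;
    }

    std::vector<Token> tokens_;
    std::size_t pos_ = 0;
};
```

With this sketch, `tokenize("1 + * 2")` succeeds because every character is a valid token, but `Parser(...).accepts()` rejects the result because the token order does not fit the grammar. That is the division of labor described above: the lexer only classifies characters, and the parser judges the sequence.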
Here are 68 public repositories matching this topic...
The compiler frontend
Updated Jun 13, 2024 - C++
A simple "interpreter" that takes simple arithmetic expressions as input and computes their values.
Updated Apr 28, 2024 - C++
A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.
Updated Apr 15, 2024 - C++
Interpreter for the Datalog Programming Language. Lexer > Parser > Interpreter
Updated Apr 9, 2024 - C++
Powder is my attempt to program a scripting language that compiles down to byte codes that are interpreted and executed by a virtual machine. The language itself is a cross between C and Python, with a few unique syntax features.
Updated Nov 18, 2023 - C++
Fast and customizable text tokenization library with BPE and SentencePiece support
Updated Nov 10, 2023 - C++
High-Performance Stemmer, Tokenizer, and Spell Checker for R
Updated Oct 27, 2023 - C++
Juman++ (a Morphological Analyzer Toolkit)
Updated Oct 3, 2023 - C++
A C++ Parser Project
Updated Oct 1, 2023 - C++
A basic interpreter which interprets and executes a program line by line
Updated Jul 17, 2023 - C++