tokenizer

A grammar describes the syntax of a programming language and is often written in Backus-Naur form (BNF). A lexer (also called a tokenizer) performs lexical analysis, turning text into tokens. A parser takes those tokens and builds a data structure such as an abstract syntax tree (AST). The parser is concerned with context: does the sequence of tokens fit the grammar? The front end of a compiler combines a lexer and a parser built for a specific grammar; later stages then analyze the AST and generate code.
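
The lexer/parser split described above can be sketched in a few lines of Python. This is a minimal illustration, not taken from any repository listed here: the toy grammar, the token names, and the helpers `tokenize` and `parse` are all assumptions chosen for the example. The grammar, in BNF-like notation, is `expr ::= term ("+" term)*`, `term ::= factor ("*" factor)*`, `factor ::= NUMBER | "(" expr ")"`.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical token set for the toy grammar; order matters because the
# catch-all MISMATCH pattern must be tried last.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("PLUS", r"\+"),
    ("STAR", r"\*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text: str) -> Iterator[Token]:
    """Lexer: turn raw text into a flat stream of tokens."""
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        if kind == "SKIP":
            continue  # whitespace carries no meaning in this grammar
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {match.group()!r}")
        yield Token(kind, match.group())


def parse(tokens):
    """Recursive-descent parser: check tokens against the grammar, build an AST."""
    tokens = list(tokens)
    pos = 0

    def peek() -> str:
        return tokens[pos].kind if pos < len(tokens) else "EOF"

    def advance(expected: str) -> Token:
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected}, got {peek()}")
        token = tokens[pos]
        pos += 1
        return token

    def expr():
        node = term()
        while peek() == "PLUS":
            advance("PLUS")
            node = ("+", node, term())
        return node

    def term():
        node = factor()
        while peek() == "STAR":
            advance("STAR")
            node = ("*", node, factor())
        return node

    def factor():
        if peek() == "NUMBER":
            return int(advance("NUMBER").text)
        advance("LPAREN")
        node = expr()
        advance("RPAREN")
        return node

    tree = expr()
    if peek() != "EOF":
        raise SyntaxError(f"unexpected trailing token {peek()}")
    return tree


ast = parse(tokenize("1 + 2 * (3 + 4)"))
print(ast)  # ('+', 1, ('*', 2, ('+', 3, 4)))
```

The lexer knows nothing about nesting or precedence; it only classifies characters. The parser is where context comes in: an input such as `1 + * 2` tokenizes without complaint but is rejected by `parse` because the token sequence does not fit the grammar.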

Here are 76 public repositories matching this topic...

CogComp's Natural Language Processing Libraries and Demos: Modules include lemmatizer, ner, pos, prep-srl, quantifier, question type, relation-extraction, similarity, temporal normalizer, tokenizer, transliteration, verb-sense, and more.

  • Updated Jul 7, 2023
  • Java

antlr4-experiments

🔧 My studies on context-free grammars, using ANTLR4 (C++) to generate the parser files. Some basics are developed, such as token processing, recursion, variable definition, array processing, Abstract Syntax Tree (AST) manipulation, Unicode support, and error handling.

  • Updated Oct 17, 2022
  • Java