Skip to content

zhongzc/lexer-gen

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

LexerGen

LexerGen is a code generation tool that converts regular expression rules into the Golang version of Lexer code. Also, with custom generation rules, users can customize the Go code that generates other languages, see: codogen.Generator.

Usage

  • Specify a rule input file. The left side of each rule is the name, and the right side is the regular expression. Like input.lx:
NUM     0|[1-9][0-9]*
IDENT   [A-Za-z_][A-Za-z0-9_]*
COMMENT /\*([^\*]*\*[^/])*[^\*]*\*/
  • Run the command line program lexerGen, in Windows, lexerGen.exe:
./lexerGen -i input.lx -o lexer -go lexer

Usage of ./lexerGen:

-cpp
    generate c++11 code
-go string
    generate Golang code, need to specify the package name
-i string
    the input rule file
-o string
    the output path (default "lexer")
  • Get the generated code from the specified path.
$ tree lexer
lexer
├── automata.go
├── charIterator.go
├── lexer.go
└── lexer_test.go

Feature

v0.3

  • Support generating C++11 code.
  • Fix some bugs.

v0.2

  • Significantly reduce the amount of generated code.
  • Supported regular expression:
    • negated group

v0.1

  • Can specify multiple regular expression rules for the lexer.
  • Support UTF-8.
  • The generated lexer has well-defined APIs.
  • The codebase is self-explanatory and organized to be learner-friendly.
  • Easy to customize to generate codes for other languages.
  • Supported regular expression:
    • alternation
    • concatenate
    • quantifier
    • character set

TODO

  • Support escape group, like \w,\d,\s...