Lua Parser in Lua

Parses to an abstract syntax tree representation. Call tostring() on the AST to get equivalent Lua code.

Works for Lua versions 5.1, 5.2, 5.3, and 5.4, and possibly some LuaJIT versions, depending on their compatibility.

The AST also provides functions such as flatten() for use in optimizing / auto-inlining Lua.

See the tests folder for example usage.

Reference

Parser = require 'parser'
This returns the parser class.

Parser.parse(data[, version, source])
Parses the code in data and returns an ast._block object.
This is shorthand for Parser(data, version, source).tree.
version is a string such as '5.1', '5.2', or '5.3', corresponding to your Lua version.
source is a description of the source, e.g. a filename, which is included in some nodes (functions) to record where they are declared.
The Parser object has a few more functions that are used internally while parsing.
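
A minimal usage sketch of the call described above. It assumes the library is installed so that require 'parser' resolves. The pcall guard, the sample input string, and the 'example.lua' label are illustrative assumptions, not part of the library.

```lua
-- Sketch only: assumes lua-parser is reachable as 'parser' on package.path.
-- The pcall guard lets the snippet run and report even when it is not installed.
local ok, Parser = pcall(require, 'parser')

local code = 'local x = 1 + 2 return x'   -- hypothetical input

local tree, rebuilt
if ok then
	-- version and source are optional; 'example.lua' is only a label
	tree = Parser.parse(code, '5.3', 'example.lua')
	print(tree.type)          -- documented as 'block'
	rebuilt = tostring(tree)  -- equivalent Lua code, per the description above
	print(rebuilt)
else
	print('parser library not found; skipping')
end
```

The reconstructed code should be equivalent to the input, though whitespace and formatting may differ.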

ast = require 'parser.ast'
This is the AST (abstract syntax tree) library. It holds a collection of AST classes, each representing a different construct in the Lua syntax.

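n = ast.node() = the superclass of all AST classes.

Each node has the following properties:

n.type = the type of the node. It usually matches the class name in the ast library with the leading underscore removed; the exceptions are noted below (e.g. ast._true and ast._false have type 'boolean', and ast._par has type 'parenthesis').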

n:copy() = returns a copy of the node.

n:flatten(func, varmap) = flattens / inlines the contents of all function calls of this function. Used for performance optimizations.

node.tostringmethods[serializationMethod] = function(self) ... end serializes this node, where serializationMethod is the current ast.tostringmethod. The default ast.tostringmethod is set to 'lua', and by default node.tostringmethods.lua is provided for all classes in ast.
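
The lookup described above can be pictured with a small self-contained sketch. The Number class, the 'debug' method name, and the __tostring wiring are stand-ins invented for illustration; the library's own implementation may differ.

```lua
-- Stand-in for the dispatch described above; not the library's own code.
local ast = { tostringmethod = 'lua' }   -- current serialization method

-- Hypothetical node class with one serializer per method name.
local Number = { tostringmethods = {} }
Number.__index = Number
Number.tostringmethods.lua = function(self)
	return tostring(self.value)
end
-- A made-up extra method that emits a debug form instead of Lua code.
Number.tostringmethods.debug = function(self)
	return 'number(' .. self.value .. ')'
end
-- tostring() picks the serializer named by ast.tostringmethod.
Number.__tostring = function(self)
	return self.tostringmethods[ast.tostringmethod](self)
end

local n = setmetatable({ type = 'number', value = 42 }, Number)
local asLua = tostring(n)
ast.tostringmethod = 'debug'
local asDebug = tostring(n)
print(asLua, asDebug)
```

Keeping the method name in one shared setting means a whole tree can be re-serialized in another format without touching the nodes.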

ast.allclasses is an integer-indexed table of all the listed classes.

ast.node subclasses:

n = ast._block(...) = a block of code in Lua.
... is a list of initial child stmt nodes to populate the block node with.
n.type == 'block'.
n[1] ... n[#n] = nodes of statements within the block.

n = ast._stmt() = a statement-node parent-class.

n = ast._assign(vars, exprs) =
An assignment operation.
Subclass of _stmt.
n.type == 'assign'.
Represents the assignment of n.exprs to n.vars.

n = ast._do(...) =
A do ... end block.
Subclass of _stmt.
n.type == 'do'.
n[1] ... n[#n] = nodes of statements within the block.

n = ast._while(cond, ...) =
A while cond do ... end block.
Subclass of _stmt.
n.type == 'while'.
n.cond holds the condition expression.
n[1] ... n[#n] = nodes of statements within the block.
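
Because block-like nodes keep their statements in n[1] ... n[#n], a recursive walk over the array part reaches every nested statement. The sketch below uses plain tables as stand-ins for parsed nodes, and countTypes is a hypothetical helper, not a library function.

```lua
-- Plain tables standing in for AST nodes (not built by the library).
local tree = {
	type = 'block',
	{ type = 'local' },
	{ type = 'while',
		cond = { type = 'boolean', value = true },
		{ type = 'do',
			{ type = 'break' },
		},
	},
	{ type = 'return' },
}

-- Hypothetical helper: count node types reachable through array children.
-- Named fields such as cond are deliberately not visited here.
local function countTypes(node, counts)
	counts = counts or {}
	counts[node.type] = (counts[node.type] or 0) + 1
	for i = 1, #node do
		countTypes(node[i], counts)
	end
	return counts
end

local counts = countTypes(tree)
for name, count in pairs(counts) do
	print(name, count)
end
```

A fuller traversal would also descend into fields like cond, vars, and exprs.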

n = ast._repeat(cond, ...) =
A repeat ... until cond block.
Subclass of _stmt.
n.type == 'repeat'.
n.cond holds the condition expression.
n[1] ... n[#n] = nodes of statements within the block.

n = ast._if(cond, ...) =
An if cond then ... elseif ... else ... end block.
Subclass of _stmt.
n.type == 'if'.
n.cond holds the condition expression of the first if statement.
All subsequent arguments must be ast._elseif objects, optionally with a final ast._else object.
n.elseifs holds the ast._elseif objects.
n.elsestmt optionally holds the final ast._else.

n = ast._elseif(cond, ...) =
An elseif cond then ... block.
Subclass of _stmt.
n.type == 'elseif'.
n.cond holds the condition expression of the elseif statement.
n[1] ... n[#n] = nodes of statements within the block.

n = ast._else(...) =
An else ... block.
n.type == 'else'.
n[1] ... n[#n] = nodes of statements within the block.

n = ast._foreq(var, min, max, step, ...) =
A for var=min,max[,step] do ... end block.
Subclass of _stmt.
n.type == 'foreq'.
n.var = the variable node.
n.min = the min expression.
n.max = the max expression.
n.step = the optional step expression.
n[1] ... n[#n] = nodes of statements within the block.

n = ast._forin(vars, iterexprs, ...)
A for var1,...varN in expr1,...exprN do ... end block.
Subclass of _stmt.
n.type == 'forin'.
n.vars = table of variables of the for-in loop.
n.iterexprs = table of iterator expressions of the for-in loop.
n[1] ... n[#n] = nodes of statements within the block.

n = ast._function(name, args, ...)
A function [name](arg1, ...argN) ... end block.
Subclass of _stmt.
n.type == 'function'.
n.name = the function name. This is optional. Omit name to represent an anonymous (lambda) function, which technically is an expression and not a statement.
n.args = table of arguments. This table does get modified: each argument is assigned .param = true and .index = its position in the argument list.
n[1] ... n[#n] = nodes of statements within the block.
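
The note about n.args being modified means the argument nodes passed in are mutated in place. A sketch of that effect, with markArgs as a hypothetical stand-in for what the constructor is described as doing, and plain tables in place of real nodes:

```lua
-- Hypothetical stand-in for the tagging described above.
local function markArgs(args)
	for i, arg in ipairs(args) do
		arg.param = true   -- marks the node as a function parameter
		arg.index = i      -- its position in the argument list
	end
	return args
end

-- Plain tables in place of variable nodes.
local a = { type = 'var', name = 'a' }
local b = { type = 'var', name = 'b' }
local args = markArgs({ a, b })
print(a.index, b.index, b.param)
```

Since the nodes are shared rather than copied, reusing the same argument nodes in two functions would overwrite these fields.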

n = ast._arg(index)
An argument to a function.
n.type == 'arg'.
n.index = which index in the function's argument list this is.

n = ast._local(exprs)
A local ... statement.
Subclass of _stmt.
n.type == 'local'
n.exprs = list of expressions to be declared as locals.
Expects its member-expressions to be either functions or assigns.

n = ast._return(...)
A return ... statement.
Subclass of _stmt.
n.type == 'return'
n.exprs = list of expressions to return.

n = ast._break(...)
A break statement.
Subclass of _stmt.
n.type == 'break'

n = ast._call(func, ...)
A func(...) function-call expression.
n.type == 'call'
n.func = expression of the function to call.
n.args = list of argument expressions to pass into the function-call.

n = ast._nil()
A nil literal expression.
n.type == 'nil'.
n.const == true.

n = ast._true()
A true boolean literal expression
n.type == 'boolean'.
n.const == true.
n.value == true.

n = ast._false()
A false boolean literal expression
n.type == 'boolean'.
n.const == true.
n.value == false.

n = ast._number(value)
A numeric literal expression.
n.type == 'number'.
n.value = the numerical value.

n = ast._string(value)
A string literal expression.
n.type == 'string'.
n.value = the string value.

n = ast._vararg()
A vararg ... expression.
n.type == 'vararg'.
For use within function arguments, assignment expressions, function calls, etc.

n = ast._table(args)
A table { ... } expression.
n.type == 'table'.
n.args = a table of the expressions of the table.
If the expression in n.args[i] is an ast._assign then an entry is added into the table as key = value. If it is not an ast._assign then it is inserted as a sequenced entry.

n = ast._var(name)
A variable reference expression.
n.type == 'var'
n.name = the variable name.

n = ast._par(expr)
A ( ... ) parenthesis expression.
n.type == 'parenthesis'.
n.expr = the expression within the parentheses.

n = ast._index(expr, key)
An expr[key] expression, i.e. an __index-metatable operation.
n.type == 'index'.
n.expr = the expression to be indexed.
n.key = the expression of the index key.

n = ast._indexself(expr, key)
An expr:key expression, to be used as the expression of an ast._call node for member-function-calls. These are Lua's shorthand insertion of self as the first argument.
n.type == 'indexself'.
n.expr = the expression to be indexed.
n.key = the key to index. Must only be a Lua string (not an ast._string, but a real Lua string).

Binary operations:

node type   Lua operator   Lua version
add         +
sub         -
mul         *
div         /
mod         %
concat      ..
lt          <
le          <=
gt          >
ge          >=
eq          ==
ne          ~=
and         and
or          or
idiv        //             5.3+
band        &              5.3+
bxor        ~              5.3+
bor         |              5.3+
shl         <<             5.3+
shr         >>             5.3+

n.args = a table of the arguments of the operation.

Unary operations:

node type   Lua operator   Lua version
unm         -
not         not
len         #
bnot        ~              5.3+

n.arg = the single argument of the operation.
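
As an illustration of how the type, args, arg, and value fields fit together, here is a small constant-folding sketch over hand-built nodes. The helpers num, bin, un, and fold are hypothetical, the nodes are plain tables rather than library objects, and it assumes binary nodes carry exactly two entries in args.

```lua
-- Plain tables standing in for AST nodes; field names follow the reference above.
local function num(v) return { type = 'number', value = v } end
local function bin(t, a, b) return { type = t, args = { a, b } } end
local function un(t, a) return { type = t, arg = a } end

local binops = {
	add = function(a, b) return a + b end,
	sub = function(a, b) return a - b end,
	mul = function(a, b) return a * b end,
}
local unops = {
	unm = function(a) return -a end,
}

-- Hypothetical helper: evaluate a tree of numeric literals and operators.
-- Returns nil when it meets a node type it does not know how to fold.
local function fold(node)
	if node.type == 'number' then
		return node.value
	end
	if binops[node.type] then
		local a, b = fold(node.args[1]), fold(node.args[2])
		if a and b then return binops[node.type](a, b) end
		return nil
	end
	if unops[node.type] then
		local a = fold(node.arg)
		if a then return unops[node.type](a) end
		return nil
	end
	return nil
end

-- -(2 + 3) * 4
local expr = bin('mul', un('unm', bin('add', num(2), num(3))), num(4))
local result = fold(expr)
print(result)

-- 1 + x cannot be folded because x is a variable reference
local unknown = fold(bin('add', num(1), { type = 'var', name = 'x' }))
print(unknown)
```

Dispatching on the node type through lookup tables keeps the folder easy to extend with more operators from the lists above.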

Extra functions:

Some more useful functions in the ast library:

  • ast.copy(node) = equivalent of node:copy()
  • ast.flatten(node, func, varmap) = equivalent of node:flatten(func, varmap)
  • ast.nodeclass = class-creation function for use with the ast library.
  • ast.refreshparents
  • ast.traverse
  • ast.tostringmethod = this specifies the serialization method. It is used to look up the serializer stored in ast.tostringmethods

TODO:

  • Option for parsing LuaJIT -LL, -ULL, -i number suffixes.
  • Speaking of LuaJIT, it has different edge case syntax for 2.0.5, 2.1.0, and whether 5.2-compat is enabled or not. It isn't passing the minify_tests.lua.

Dependencies:

While I was at it, I added a require() replacement for parsing Lua scripts and registering callbacks, so any other script can say "require 'parser.require'.callbacks:insert(function(tree) ... modify the parse tree ... end)" and voila, Lua preprocessor in Lua!

minify_tests.txt taken from the tests at https://github.com/stravant/LuaMinify

I tested this by parsing itself, then using the parsed & reconstructed version to parse itself, then using the parsed & reconstructed version to parse the parsed & reconstructed version, then using the 2x parsed & reconstructed version to parse itself.
