Lua Parser in Lua
Parses to an abstract syntax tree representation. Call tostring() on the AST to get equivalent Lua code.
Works for Lua versions 5.1, 5.2, 5.3, and 5.4, and possibly some LuaJIT versions, depending on their compatibility.
The AST also contains some functions, like flatten(), for use with optimizing / auto-inlining Lua.
See the tests folder for example usage.
Reference
Parser = require 'parser'
This will return the parser class.
Parser.parse(data[, version, source])
This parses the code in data and returns an ast._block object.
This is shorthand for Parser(data, version, source).tree.
version is a string such as '5.1', '5.2', or '5.3', corresponding to your Lua version.
The Parser object has a few more functions that are used internally while parsing.
source is a description of the source, e.g. a filename, which is included in some nodes (functions) as information on where they are declared.
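A minimal sketch of a round trip through the parser, using only the calls described above (Parser.parse and tostring). The roundtrip helper, the sample code string, and the 'example.lua' source name are illustrative and not part of the library. The pcall guard is only there so the snippet degrades gracefully when the parser module is not on package.path.

```lua
-- Load the parser defensively; `ok` is false if the module cannot be found.
local ok, Parser = pcall(require, 'parser')

-- Hypothetical helper: parse `code` and serialize the tree back to Lua source.
-- Returns nil when the parser module is unavailable.
local function roundtrip(code, version, source)
	if not ok then return nil end
	local tree = Parser.parse(code, version, source)
	return tostring(tree)
end

local result = roundtrip('local x = 1 + 2\nprint(x)', '5.4', 'example.lua')
if result then
	print(result) -- Lua code equivalent to the input
else
	print("module 'parser' not found on package.path")
end
```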
ast = require 'parser.ast'
This is the AST (abstract syntax tree) library.
It holds a collection of AST classes, each representing a different construct in the Lua syntax.
n = ast.node()
= This is the superclass of all AST classes.
Each has the following properties:
n.type
= the type of the node, which matches the class name in the ast library with the underscore removed.
n:copy()
= returns a copy of the node.
n:flatten(func, varmap)
= flattens / inlines the contents of all function calls of this function. Used for performance optimizations.
node.tostringmethods[serializationMethod] = function(self) ... end
serializes this node, where serializationMethod is the current ast.tostringmethod.
The default ast.tostringmethod is set to 'lua', and by default node.tostringmethods.lua is provided for all classes in ast.
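The dispatch described above can be pictured with a stand-alone model. This is a sketch of the pattern only, not the library's own code: registry stands in for ast, Number stands in for a node class, and the 'sexpr' method is a made-up second serializer.

```lua
-- Stand-alone model of the serializer lookup; none of this is library code.
local registry = { tostringmethod = 'lua' } -- plays the role of ast.tostringmethod

local Number = { tostringmethods = {} }     -- plays the role of a node class
Number.__index = Number

-- One serializer per method name, each taking the node as `self`.
Number.tostringmethods.lua = function(self)
	return tostring(self.value)
end
Number.tostringmethods.sexpr = function(self) -- hypothetical extra method
	return '(number ' .. self.value .. ')'
end

-- tostring() looks up whichever serializer is currently selected.
Number.__tostring = function(self)
	return self.tostringmethods[registry.tostringmethod](self)
end

local n = setmetatable({ type = 'number', value = 42 }, Number)
print(tostring(n)) --> 42
registry.tostringmethod = 'sexpr'
print(tostring(n)) --> (number 42)
```

Switching the selected method changes the output of every node at once, which is why the selector is a single library-level setting while the serializers live with the classes.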
ast.allclasses
holds an integer-indexed table of all listed classes.
ast.node subclasses:
n = ast._block(...)
= a block of code in Lua.
... is a list of initial child stmt nodes to populate the block node with.
n.type == 'block'.
n[1] ... n[#n] = nodes of statements within the block.
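Since the statements of a block sit at n[1] ... n[#n], a block can be walked like any Lua array. The sketch below lists the type of each top-level statement. statement_types and the fake table are illustrative names. The stand-in table only mimics the documented shape, and the real parser is used only if it happens to be installed.

```lua
-- Hypothetical helper: collect the .type of every statement in a block.
local function statement_types(block)
	local types = {}
	for i = 1, #block do
		types[i] = block[i].type
	end
	return types
end

-- Plain-table stand-in with the documented shape of an ast._block.
local fake = { type = 'block', { type = 'local' }, { type = 'return' } }
print(table.concat(statement_types(fake), ', ')) --> local, return

-- With the real parser, if it is available on package.path.
local ok, Parser = pcall(require, 'parser')
if ok then
	local tree = Parser.parse('local x = 1 return x')
	print(table.concat(statement_types(tree), ', '))
end
```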
n = ast._stmt()
= a statement-node parent-class.
n = ast._assign(vars, exprs)
= An assignment operation. Subclass of _stmt.
n.type == 'assign'.
Represents the assignment of n.exprs to n.vars.
n = ast._do(...)
= A do ... end block. Subclass of _stmt.
n.type == 'do'.
n[1] ... n[#n] = nodes of statements within the block.
n = ast._while(cond, ...)
= A while cond do ... end block. Subclass of _stmt.
n.type == 'while'.
n.cond holds the condition expression.
n[1] ... n[#n] = nodes of statements within the block.
n = ast._repeat(cond, ...)
= A repeat ... until cond block. Subclass of _stmt.
n.type == 'repeat'.
n.cond holds the condition expression.
n[1] ... n[#n] = nodes of statements within the block.
n = ast._if(cond, ...)
= An if cond then ... elseif ... else ... end block. Subclass of _stmt.
n.type == 'if'.
n.cond holds the condition expression of the first if statement.
All subsequent arguments must be ast._elseif objects, optionally with a final ast._else object.
n.elseifs holds the ast._elseif objects.
n.elsestmt optionally holds the final ast._else.
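As a small illustration of how the fields of an if node fit together, the following sketch counts the branches of an if chain. count_branches and fake_if are illustrative names. The stand-in table mimics the documented fields rather than being built by the library, and treating a missing n.elseifs as empty is an assumption.

```lua
-- Hypothetical helper: count the branches of an if node using the
-- documented fields n.cond, n.elseifs and n.elsestmt.
local function count_branches(ifnode)
	assert(ifnode.type == 'if', 'expected an if node')
	local count = 1                          -- the leading `if cond then`
	count = count + #(ifnode.elseifs or {})  -- assumption: may be absent
	if ifnode.elsestmt then count = count + 1 end
	return count
end

-- Stand-in shaped like: if a then ... elseif b then ... else ... end
local fake_if = {
	type = 'if',
	cond = { type = 'var', name = 'a' },
	elseifs = { { type = 'elseif', cond = { type = 'var', name = 'b' } } },
	elsestmt = { type = 'else' },
}
print(count_branches(fake_if)) --> 3
```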
n = ast._elseif(cond, ...)
= An elseif cond then ... block. Subclass of _stmt.
n.type == 'elseif'.
n.cond holds the condition expression of the elseif statement.
n[1] ... n[#n] = nodes of statements within the block.
n = ast._else(...)
= An else ... block.
n.type == 'else'.
n[1] ... n[#n] = nodes of statements within the block.
n = ast._foreq(var, min, max, step, ...)
= A for var=min,max[,step] do ... end block. Subclass of _stmt.
n.type == 'foreq'.
n.var = the variable node.
n.min = the min expression.
n.max = the max expression.
n.step = the optional step expression.
n[1] ... n[#n] = nodes of statements within the block.
n = ast._forin(vars, iterexprs, ...)
A for var1,...varN in expr1,...exprN do ... end block. Subclass of _stmt.
n.type == 'forin'.
n.vars = table of variables of the for-in loop.
n.iterexprs = table of iterator expressions of the for-in loop.
n[1] ... n[#n] = nodes of statements within the block.
n = ast._function(name, args, ...)
A function [name](arg1, ...argN) ... end block. Subclass of _stmt.
n.type == 'function'.
n.name = the function name. This is optional. Omit name to represent an anonymous function (which technically makes it an expression rather than a statement).
n.args = table of arguments. This table is modified: each argument is assigned .param = true and an .index giving its position in the argument list.
n[1] ... n[#n] = nodes of statements within the block.
n = ast._arg(index)
An argument to a function.
n.type == 'arg'.
n.index = which index in the function's argument list this is.
n = ast._local(exprs)
A local ... statement. Subclass of _stmt.
n.type == 'local'.
n.exprs = list of expressions to be declared as locals.
Expects its member-expressions to be either functions or assigns.
n = ast._return(...)
A return ... statement. Subclass of _stmt.
n.type == 'return'.
n.exprs = list of expressions to return.
n = ast._break(...)
A break statement. Subclass of _stmt.
n.type == 'break'.
n = ast._call(func, ...)
A func(...) function-call expression.
n.type == 'call'.
n.func = expression of the function to call.
n.args = list of argument expressions to pass into the function call.
n = ast._nil()
A nil literal expression.
n.type == 'nil'.
n.const == true.
n = ast._true()
A true boolean literal expression.
n.type == 'boolean'.
n.const == true.
n.value == true.
n = ast._false()
A false boolean literal expression.
n.type == 'boolean'.
n.const == true.
n.value == false.
n = ast._number(value)
A numeric literal expression.
n.type == 'number'.
n.value = the numerical value.
n = ast._string(value)
A string literal expression.
n.type == 'string'.
n.value = the string value.
n = ast._vararg()
A vararg ... expression.
n.type == 'vararg'.
For use within function arguments, assignment expressions, function calls, etc.
n = ast._table(args)
A table { ... } expression.
n.type == 'table'.
n.args = a table of the expressions of the table.
If the expression in n.args[i] is an ast._assign, then an entry is added into the table as key = value. If it is not an ast._assign, then it is inserted as a sequenced entry.
n = ast._var(name)
A variable reference expression.
n.type == 'var'.
n.name = the variable name.
n = ast._par(expr)
A ( ... ) parenthesis expression.
n.type == 'parenthesis'.
n.expr = the expression within the parentheses.
n = ast._index(expr, key)
An expr[key] expression, i.e. an __index-metatable operation.
n.type == 'index'.
n.expr = the expression to be indexed.
n.key = the expression of the index key.
n = ast._indexself(expr, key)
An expr:key expression, to be used as the expression of an ast._call node for member-function calls. These are Lua's shorthand for inserting self as the first argument.
n.type == 'indexself'.
n.expr = the expression to be indexed.
n.key = the key to index. Must be a plain Lua string (not an ast._string, but a real Lua string).
Binary operations:
node type | Lua operator | Lua version |
---|---|---|
add | + | |
sub | - | |
mul | * | |
div | / | |
mod | % | |
concat | .. | |
lt | < | |
le | <= | |
gt | > | |
ge | >= | |
eq | == | |
ne | ~= | |
and | and | |
or | or | |
idiv | // | 5.3+ |
band | & | 5.3+ |
bxor | ~ | 5.3+ |
bor | \| | 5.3+ |
shl | << | 5.3+ |
shr | >> | 5.3+ |
n.args = a table of the arguments of the operation.
Unary operations:
node type | Lua operator | Lua version |
---|---|---|
unm | - | |
not | not | |
len | # | |
bnot | ~ | 5.3+ |
n.arg = the single argument of the operation.
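To show how the operator node types, n.args, and n.arg combine, here is a sketch of a tiny evaluator for constant arithmetic expressions. eval, binary, and the expr stand-in are illustrative names and not part of the library. It assumes that a binary node keeps its two operands in n.args[1] and n.args[2], and it passes n.value through tonumber() because the stored representation of the number is not stated here.

```lua
-- Hypothetical evaluator for constant arithmetic; not part of the library.
local binary = {
	add = function(a, b) return a + b end,
	sub = function(a, b) return a - b end,
	mul = function(a, b) return a * b end,
	div = function(a, b) return a / b end,
	mod = function(a, b) return a % b end,
}

local function eval(n)
	if n.type == 'number' then
		return tonumber(n.value)
	elseif n.type == 'parenthesis' then
		return eval(n.expr)
	elseif n.type == 'unm' then
		return -eval(n.arg)
	elseif binary[n.type] then
		-- Assumption: exactly two operands, in source order.
		return binary[n.type](eval(n.args[1]), eval(n.args[2]))
	end
	error('unsupported node type: ' .. tostring(n.type))
end

-- Plain-table stand-in for the expression (1 + 2) * -3
local expr = {
	type = 'mul',
	args = {
		{ type = 'parenthesis', expr = {
			type = 'add',
			args = { { type = 'number', value = 1 }, { type = 'number', value = 2 } },
		} },
		{ type = 'unm', arg = { type = 'number', value = 3 } },
	},
}
print(eval(expr)) --> -9
```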
more extra functions:
Some more useful functions in AST:
ast.copy(node)
= equivalent of node:copy()
ast.flatten(node, func, varmap)
= equivalent of node:flatten(func, varmap)
ast.nodeclass
= class-creation function for use with the ast library.
ast.refreshparents
ast.traverse
ast.tostringmethod
= this specifies the serialization method. It is used to look up the serializer stored in ast.tostringmethods
TODO:
- Option for parsing LuaJIT -LL, -ULL, -i number suffixes.
- Speaking of LuaJIT, it has different edge-case syntax for 2.0.5 and 2.1.0, and depending on whether 5.2-compat is enabled. It isn't passing minify_tests.lua.
Dependencies:
While I was at it, I added a require() replacement for parsing Lua scripts and registering callbacks,
so any other script can say "require 'parser.require'.callbacks:insert(function(tree) ... modify the parse tree ... end)"
and voila, Lua preprocessor in Lua!
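A sketch of registering such a callback, based only on the call quoted above. count_statements and total_statements are illustrative names. The callback here only reads the tree, and it returns the tree as a precaution, since the text does not say whether the return value is used. Loading 'parser.require' replaces require(), so the registration is guarded and skipped when the module is not installed.

```lua
-- Hypothetical callback: tally the top-level statements of each parsed chunk.
local total_statements = 0
local function count_statements(tree)
	total_statements = total_statements + #tree
	return tree -- precaution only; it is not stated whether this is used
end

-- Register the callback only if the module is available.
local ok, parser_require = pcall(require, 'parser.require')
if ok then
	parser_require.callbacks:insert(count_statements)
end

-- The callback can also be exercised directly on a block-shaped stand-in.
count_statements({ type = 'block', { type = 'local' }, { type = 'return' } })
print(total_statements) --> 2
```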
minify_tests.txt is taken from the tests at https://github.com/stravant/LuaMinify
I tested this by parsing itself, then using the parsed & reconstructed version to parse itself, then using the parsed & reconstructed version to parse the parsed & reconstructed version, then using the 2x parsed & reconstructed version to parse itself.