LangDefs Debugging
Highlight v3.41
Useful references on how to debug syntax definition files and pinpoint issues.
While creating a new language definition, unexpected parser behaviors might show up in some edge cases. Tracking down the problem is not always easy, since the Highlight parser is a black box to the user, except for its internal state variables, which expose the current state of the parser inside hook functions.
Here you'll find some guidelines and tools on how to leverage those internal states to isolate the problem.
Highlight ships with `token_add_state_ids.lua`, a plugin which exposes the parser's state changes, and their IDs, in the output document:
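Description="Add internal state IDs behind each token (for debugging)."

-- Chunk applied to the language definition (see Type="lang" below).
function syntaxUpdate(desc)
  -- Decorate hook: append the state ID, in parentheses, to each token.
  function Decorate(token, state)
    return token .. ' ('.. string.format("%d",state) .. ')'
  end
end

Plugins={
  { Type="lang", Chunk=syntaxUpdate },
}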
The plugin can be turned on and off in Highlight GUI, from the "Plug-in" tab, allowing you to visually track the parser states for the current input file.
A Python example, without the State IDs plugin:
… and with the State IDs plugin enabled:
At each parser state change, the plugin adds the integer of the new state enclosed in parentheses. This reveals interesting details about the parser's inner workings. For example, the above screenshot shows that during string parsing the parser updates the syntax multiple times, even though the same state is confirmed each time. This tells us that the parser is consuming the string in chunks, trying to isolate any tokens that could match legitimate sub-string elements (escape sequences, interpolations).
As for the actual numbers, these represent the various possible parser states, which are assigned at initialization time and might vary with each syntax (depending on which elements are actually defined). The correspondence between parser states and integer values can be retrieved via the `--verbose` option.
Let's try it with the example file used with Highlight GUI in our screenshots. From the command line we'll invoke `highlight --verbose StatesIDs-plugin-Example.py` and try to pinpoint in the output which states correspond to "11", "9" and "1" (actual output cut down here for space reasons):
> highlight --verbose StatesIDs-plugin-Example.py
Loading language definition:
C:\Program Files\Highlight\langDefs\python.lang
Description: Python
LUA GLOBALS:
...
HL_INTERPOLATION: number [ 10 ]
HL_INTERPOLATION_END: number [ 19 ]
HL_KEYWORD: number [ 11 ] <-- (11) = Keywords
HL_KEYWORD_END: number [ 20 ]
...
HL_NUMBER: number [ 2 ]
HL_OPERATOR: number [ 9 ] <-- (9) = Operators
HL_OPERATOR_END: number [ 18 ]
...
HL_STANDARD: number [ 0 ]
HL_STRING: number [ 1 ] <-- (1) = Strings
HL_STRING_END: number [ 12 ]
HL_UNKNOWN: number [ 100 ]
...
I've added arrows on the right side, pointing to the values we were looking for. Now we know what these numbers mean in terms of the parser states:
- `1` represents Strings
- `9` represents Operators
- `11` represents Keywords
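Looking up each ID by eye becomes tedious with longer listings. As a minimal sketch, the standalone Lua script below builds a reverse lookup table (state ID to global name) from text in the `NAME: number [ ID ]` layout shown above. The sample lines are copied from the listing above; the helper name `parse_state_ids` is hypothetical, and the script assumes the verbose output has been captured into a string beforehand.

```lua
-- Sample lines copied from the `highlight --verbose` output shown above.
local verbose_output = [[
HL_INTERPOLATION: number [ 10 ]
HL_KEYWORD: number [ 11 ]
HL_NUMBER: number [ 2 ]
HL_OPERATOR: number [ 9 ]
HL_STANDARD: number [ 0 ]
HL_STRING: number [ 1 ]
]]

-- Hypothetical helper: build a reverse lookup table (state ID -> global name)
-- from lines in the "NAME: number [ ID ]" layout. Other lines are ignored.
local function parse_state_ids(text)
  local names = {}
  for line in text:gmatch("[^\r\n]+") do
    local name, id = line:match("^%s*(HL_[%w_]+):%s*number%s*%[%s*(%d+)%s*%]")
    if name then
      names[tonumber(id)] = name
    end
  end
  return names
end

local state_names = parse_state_ids(verbose_output)

-- Look up the three IDs seen in the plugin output.
for _, id in ipairs({ 11, 9, 1 }) do
  print(id, state_names[id])
end
```

With the sample above, the IDs 11, 9 and 1 resolve to `HL_KEYWORD`, `HL_OPERATOR` and `HL_STRING`. Since the IDs might vary with each syntax, the table should be rebuilt from the verbose output of the language being debugged.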
Let's analyse the plugin output:
We can now get a clear picture of how the HL parser is tokenizing the `print("Hello!")` line, step by step:
token | state ID | parser state |
---|---|---|
`print` | (11) | Keyword token |
`(` | (9) | Operator token |
`"` | (1) | String token |
`Hello` | (1) (1) | String token |
`!` | (1) (1) | String token |
`"` | (1) | String token |
`)` | (9) | Operator token |
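The ID-to-state mapping can also be moved into the plugin itself, so that the output shows names instead of numbers. The following is a sketch of such a variant, not a plugin that ships with Highlight. It assumes the `HL_*` globals listed by `--verbose` are visible inside the plugin chunk, covers only a subset of them, and uses labels ("Keyword", "String", and so on) chosen here for illustration.

```lua
Description="Add internal state names behind each token (for debugging)."

function syntaxUpdate(desc)
  -- Reverse lookup table: state ID -> readable label.
  -- Assumption: the HL_* globals are defined when this chunk runs.
  -- Only a subset of states is listed; the labels are illustrative.
  local names = {
    [HL_STANDARD]      = "Standard",
    [HL_STRING]        = "String",
    [HL_NUMBER]        = "Number",
    [HL_OPERATOR]      = "Operator",
    [HL_INTERPOLATION] = "Interpolation",
    [HL_KEYWORD]       = "Keyword",
  }

  -- Same hook as in the original plugin, but printing the label when one
  -- is known and falling back to the numeric ID otherwise.
  function Decorate(token, state)
    local label = names[state] or string.format("%d", state)
    return token .. ' (' .. label .. ')'
  end
end

Plugins={
  { Type="lang", Chunk=syntaxUpdate },
}
```

Because the table is keyed on the globals rather than on hard-coded integers, the labels should stay correct even when the numeric IDs differ between syntaxes.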
You'll also notice that `syntaxUpdate()` is being called twice for tokens inside the string (i.e., for `Hello` and `!`). This means that for the current syntax definition the parser needs to undergo two state updates to evaluate those tokens: one update to establish that they are not sub-elements (e.g., an escape sequence), and another to establish that the string state needs to carry on.
In complex language definitions, the parser might go through multiple updates to evaluate each token, depending on the token's context and the definitions provided by the syntax, but especially if there are custom rules hooked into `OnStateChange()` that force it to return with custom values (e.g., `HL_REJECT`, `HL_STANDARD`, etc.).
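When a custom `OnStateChange()` is suspected of causing extra updates, it can help to log what the hook receives and what it returns. The plugin below is a rough sketch under several assumptions: that the hook takes `(oldState, newState, token, ...)` and returns the state to continue with, that a plugin chunk can see and wrap the hook defined by the language definition, and that Lua's `io` library is available to plugins. None of this is confirmed by the text above, so treat it as a starting point to adapt.

```lua
Description="Log OnStateChange calls to stderr (debugging sketch)."

function syntaxUpdate(desc)
  -- Assumption: the language definition's hook, if any, is a global at this point.
  local previousHook = OnStateChange

  function OnStateChange(oldState, newState, token, ...)
    -- By default, accept the state the parser proposes.
    local result = newState
    if previousHook then
      -- Delegate to the original hook so its behaviour is preserved.
      local hooked = previousHook(oldState, newState, token, ...)
      if hooked ~= nil then
        result = hooked
      end
    end
    -- Report the proposed transition and what the hook answered.
    io.stderr:write(string.format("%q: %d -> %d (hook returned %d)\n",
                                  token, oldState, newState, result))
    return result
  end
end

Plugins={
  { Type="lang", Chunk=syntaxUpdate },
}
```

Wrapping the existing hook, rather than replacing it, keeps the syntax behaving as before while showing each proposed transition next to the value actually returned (for instance `HL_REJECT`).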
Playing around with the state-IDs plugin and following the parser's syntax updates and state changes with various input examples and languages, while studying their syntax definition code, is a great way to gain insights into Highlight's internals and how custom code in the hook functions can alter the parser's behaviour.