An example of how to correctly parse Python-like indentation-scoped files using flex (and bison).
Besides that, this project also serves as a template for a CMake-based flex & bison parser project and includes rules to track the current line and column of the scanner.
All the magic happens in the scanner, which emits TOK_INDENT and TOK_OUTDENT tokens whenever the level of indentation increases or decreases. The parser in this project just echoes the tokens.
The scanner starts in the <normal> mode. That's where you put your regular rules. Whenever a newline is encountered in that mode, the scanner enters the <indent> mode, in which it keeps counting spaces and tabs (and ignoring blank lines) until it sees anything else. At that point it outputs either a TOK_INDENT, one or more TOK_OUTDENT as necessary, or none of these tokens, and goes back to the <normal> mode.
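The decision made when leaving the <indent> mode can be sketched with a small stack of indentation widths. This is only an illustration, not the project's actual code: IndentStack, indent_init, indent_update and MAX_DEPTH are hypothetical names, and how spaces and tabs are turned into a single width is left to the caller.

```c
#include <assert.h>

#define MAX_DEPTH 64 /* hypothetical limit on nesting depth */

typedef struct {
    int levels[MAX_DEPTH]; /* stack of active indentation widths */
    int depth;             /* index of the top of the stack */
} IndentStack;

/* Start with a single level of width 0 (no indentation). */
void indent_init(IndentStack *s)
{
    s->levels[0] = 0;
    s->depth = 0;
}

/*
 * Called once per non-blank line with the width of its leading whitespace.
 * Returns +1 if one TOK_INDENT should be emitted, 0 if no token is needed,
 * or -n if n TOK_OUTDENT tokens should be emitted.
 * Assumption: a width that falls between two known levels is not reported
 * as an error here; a real scanner may want to reject it.
 */
int indent_update(IndentStack *s, int width)
{
    if (width > s->levels[s->depth]) {
        assert(s->depth + 1 < MAX_DEPTH);
        s->levels[++s->depth] = width; /* deeper: push the new level */
        return 1;
    }

    int outdents = 0;
    while (s->depth > 0 && width < s->levels[s->depth]) {
        s->depth--;                    /* shallower: pop one level */
        outdents++;
    }
    return -outdents;
}
```

Feeding it the widths 0, 4, 8, 8, 0 yields no token, one indent, one indent, no token, and then two outdents, which matches the "one or more TOK_OUTDENT as necessary" behaviour described above.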
The scanner also does its best to keep track of the column where the current match starts, which can be accessed (and changed) through yycolumn. The line number is kept track of by flex internally.
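The bookkeeping behind such column tracking can be sketched in plain C. Cursor, Location and track_match are hypothetical names used only for this illustration; the field names mirror bison's default location type, and 1-based columns are an assumption rather than something the project states.

```c
/* Mirrors the fields of bison's default location type. */
typedef struct {
    int first_line, first_column;
    int last_line, last_column;
} Location;

/* Hypothetical stand-in for the scanner's line counter and yycolumn. */
typedef struct {
    int line;
    int column; /* assumption: 1-based column of the next match */
} Cursor;

/*
 * Record the location of a match of `length` characters that contains no
 * newline, then advance the cursor past it.
 */
Location track_match(Cursor *c, int length)
{
    Location loc;
    loc.first_line = loc.last_line = c->line;
    loc.first_column = c->column;
    loc.last_column = c->column + length - 1;
    c->column += length; /* the next match starts right after this one */
    return loc;
}
```

In a flex scanner this kind of update is typically performed for every match through the YY_USER_ACTION macro, with the match length taken from yyleng.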
All of this means that you can write the parser as usual, make use of the TOK_INDENT and TOK_OUTDENT tokens in order to handle indentation, and access the current line of a token through @1.first_line (or @1.last_line if the token spans multiple lines, which I don't recommend) and its column range through @1.first_column and @1.last_column.
One caveat is that if one of your rules includes a newline character and matches text longer than one character, you will need to reset yycolumn by hand.
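One way to do that reset is to walk over the matched text and restart the count after every newline. The helper below is a minimal sketch; column_after_match is a hypothetical name, and 1-based columns are again an assumption.

```c
#include <stddef.h>

/*
 * Given the column at which a match started, return the column at which
 * the next match will start. Every newline inside the match restarts the
 * count at column 1 (assumption: columns are 1-based).
 */
int column_after_match(int column, const char *text, size_t length)
{
    for (size_t i = 0; i < length; i++) {
        if (text[i] == '\n')
            column = 1;
        else
            column++;
    }
    return column;
}
```

Inside the action of such a rule, the result could be assigned back to yycolumn using yytext and yyleng as arguments. Whether the starting column has to be adjusted first depends on how the scanner already advances yycolumn for each match, so check that against the scanner source.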
Another one is that, for technical reasons, the column range of the TOK_OUTDENT tokens is the first character of the line or, for outdents happening through reaching the end of the file,
Until I write a full tutorial, I recommend you look at the code; it is short and fully commented.