Document how to maintain line and column information with custom lexers #59

Open
fitzgen opened this Issue Aug 23, 2011 · 8 comments


fitzgen commented Aug 23, 2011

I just ran into this while hacking on CoffeeScript, whose lexer doesn't expose the proper information. It would be nice if the interface were documented. I'm going to dig into the code now, but I can't guarantee a pull request.

Owner

zaach commented Aug 23, 2011

It's mentioned here: http://zaach.github.com/jison/docs/#tracking-locations

Basically, the lexer sets a yylloc property with a value that follows the API. How the lexer internally maintains line/column information is not specified; when lex() returns a token, the parser uses whatever value lexer.yylloc holds at that moment.
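For illustration, here is a minimal sketch of the shape such a yylloc value takes, computed from absolute offsets into the input. The helper name `locFromOffsets` is hypothetical and not part of Jison. It assumes 1-based lines and 0-based columns, which is what Jison's generated lexers appear to use; a custom lexer only needs to be consistent with whatever the grammar actions expect.

```javascript
// Hypothetical helper (not a Jison API): build a yylloc-shaped object for
// the token occupying input.slice(start, end).
// Assumption: lines are 1-based and columns are 0-based.
function locFromOffsets(input, start, end) {
    // Translate an absolute offset into a { line, column } pair.
    function position(offset) {
        var before = input.slice(0, offset);
        var lastNewline = before.lastIndexOf("\n");
        return {
            line: before.split("\n").length,
            column: offset - (lastNewline + 1)
        };
    }

    var first = position(start);
    var last = position(end);

    return {
        first_line: first.line,
        first_column: first.column,
        last_line: last.line,
        last_column: last.column
    };
}

// The token "bb" on the second line of a two-line input.
var loc = locFromOffsets("a = 1\nbb = 22", 6, 8);
```

Rescanning the input from the start for every token is quadratic, so this is only meant to show the target shape; a real lexer would more likely update the position incrementally as it consumes text.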

fitzgen commented Aug 23, 2011

That just mentions how to use @1.first_line, etc., not how to set lexer.yylloc from a custom lexer, which is what was throwing me off. Just thought it would be nice.

Owner

zaach commented Aug 23, 2011

Oh, I see. It would really depend on the lexer, though a suggestive guide or concrete example could help out, for sure. So far, CoffeeScript is the only implementation with a custom lexer that I've seen.

showell commented Oct 6, 2011

Hi, just following up on this. Like @fitzgen, I ran across this issue in trying to modify CoffeeScript.

Contributor

wavebeem commented Feb 17, 2012

I'm working on a project that uses a custom lexer, so I agree that better documentation on setting this would be useful. An example would be fantastic, but I think I can probably figure it out based on looking at the generated lexer.

Owner

zaach commented Feb 17, 2012

The relevant lines are here, when the lexer sets yylloc. Just make sure to set it whenever the lexer finds a match, as you would with yytext. You can see here how jison pushes yylloc onto a stack, same as with yytext.
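A rough sketch of that advice, assuming a hand-written lexer object with the `setInput` and `lex` methods that Jison's docs describe for custom scanners. The token names, the rule table, and the `advance` helper are made up for the example, and the 0-based column convention is an assumption based on the generated lexer.

```javascript
// A hand-written lexer sketch. setInput/lex are the methods Jison calls;
// everything else here (rules, advance, token names) is hypothetical.
var sketchLexer = {
    // [pattern, token name]; a null token name means "skip this match".
    rules: [
        [/^\s+/, null],
        [/^[0-9]+/, "NUMBER"],
        [/^[A-Za-z_]+/, "WORD"]
    ],

    setInput: function (input) {
        this.input = input;
        this.pos = 0;
        this.yytext = "";
        this.yyleng = 0;
        this.yylineno = 0;
        // Assumption: 1-based lines, 0-based columns.
        this.yylloc = { first_line: 1, first_column: 0, last_line: 1, last_column: 0 };
        return this;
    },

    // Consume `text`, updating yylloc together with yytext on every match.
    advance: function (text) {
        var lines = text.split("\n");
        var newlines = lines.length - 1;
        var tail = lines[newlines];
        var prev = this.yylloc;

        this.yylloc = {
            first_line: prev.last_line,
            first_column: prev.last_column,
            last_line: prev.last_line + newlines,
            last_column: newlines > 0 ? tail.length : prev.last_column + tail.length
        };
        this.yytext = text;
        this.yyleng = text.length;
        this.yylineno = this.yylloc.last_line - 1;
        this.pos += text.length;
    },

    lex: function () {
        while (this.pos < this.input.length) {
            var rest = this.input.slice(this.pos);
            var match = null;
            var token = null;
            for (var i = 0; i < this.rules.length; i++) {
                match = this.rules[i][0].exec(rest);
                if (match) { token = this.rules[i][1]; break; }
            }
            if (!match) {
                throw new Error("Unexpected input at line " + this.yylloc.last_line +
                    ", column " + this.yylloc.last_column);
            }
            this.advance(match[0]);
            if (token) return token;
        }
        return "EOF";
    }
};
```

Skipped whitespace still goes through `advance`, so the next token's `first_line` and `first_column` stay accurate. Assigning the object to `parser.lexer` should be enough to make the parser use it.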

Contributor

wavebeem commented Feb 18, 2012

Thanks!

I've written a custom scanner called Lexer which can be integrated with Jison. Here's how you can expose the line and column information from Lexer to Jison:

var Parser = require("jison").Parser;
var Lexer = require("lex");

var grammar = {
    "bnf": {
        // ...
    }
};

var parser = new Parser(grammar);
var lexer = parser.lexer = new Lexer;

var row = 1; // lines are 1-based
var col = 0; // columns are 0-based, so a lexeme right after a newline starts at column 0

function track(callback) {
    return function (lexeme) {
        var first_line = row;
        var first_column = col;

        var lines = lexeme.split("\n");
        var length = lines.pop().length;
        var newlines = lines.length;

        row += newlines;
        if (newlines > 0) col = length;
        else col += length;

        this.yylloc = {
            first_line: first_line,
            first_column: first_column,
            last_line: row,
            last_column: col
        };

        this.yytext = lexeme;

        return callback.call(this, lexeme);
    };
}

lexer.addRule(/\s+/, track(function () {}));

lexer.addRule(/[0-9]+(?:\.[0-9]+)?\b/, track(function (lexeme) {
    return "NUMBER";
}));

lexer.addRule(/$/, function () {
    return "EOF";
});

Hope that helps.
