Line beginning identifier (^) does not work. #67

Open
neizod opened this Issue · 8 comments

6 participants

@neizod

In the lexer part:

"a"     return 'BODY';
^"a"    return 'HEAD';

Test case: the input a a returns the tokens BODY BODY, while

^"a"    return 'HEAD';
"a"     return 'BODY';

returns the tokens HEAD HEAD (expected: HEAD BODY).

@jiaweihli

Have you tried using:

 "^a"
@neizod

I just tried it, and it does not produce the result I expected.

@zaach
Owner

This is tricky because the lexer uses JavaScript regular expressions, which don't allow you to start from an arbitrary position in a string. This means a new string is created each time, starting at the end of the last match, so ^ is technically always true.
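To illustrate the behaviour described above, here is a minimal sketch of a lexer that re-slices its input after every match. It is a simplified model, not Jison's actual code: the function name `tokenizeBySlicing`, the rule format, and the way each rule is wrapped in `^(?:...)` are all assumptions made for the example.

```javascript
// Simplified model of a slicing lexer (illustrative only, not Jison's source).
// Each rule source is anchored to the start of the *remaining* input.
function tokenizeBySlicing(input, ruleSources) {
  const rules = ruleSources.map(([source, token]) => ({
    pattern: new RegExp('^(?:' + source + ')'),
    token,
  }));
  const tokens = [];
  let rest = input;
  while (rest.length > 0) {
    // Skip whitespace between tokens.
    const ws = /^\s+/.exec(rest);
    if (ws) {
      rest = rest.slice(ws[0].length);
      continue;
    }
    // First matching rule wins, as in the issue report.
    const rule = rules.find((r) => r.pattern.test(rest));
    if (!rule) throw new Error('Unrecognized input: ' + rest);
    const match = rule.pattern.exec(rest);
    if (match[0].length === 0) throw new Error('Empty match would loop forever');
    tokens.push(rule.token);
    // A new string is created here, so "^" matches again on the next round.
    rest = rest.slice(match[0].length);
  }
  return tokens;
}

const headFirst = tokenizeBySlicing('a a', [['^a', 'HEAD'], ['a', 'BODY']]);
const bodyFirst = tokenizeBySlicing('a a', [['a', 'BODY'], ['^a', 'HEAD']]);
console.log(headFirst.join(' ')); // HEAD HEAD
console.log(bodyFirst.join(' ')); // BODY BODY
```

Both rule orders reproduce the tokens from the original report, because after slicing every "a" sits at index 0 of the string being matched.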

A possible workaround would be to prepend the input with a unique character and replace ^ with that character in the rules.
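A rough sketch of that workaround, written as a standalone slicing lexer rather than as Jison rules. The sentinel character `\u0001`, the function name `tokenizeWithSentinel`, and the rule table are assumptions for illustration; the sentinel must be a character that never occurs in real input.

```javascript
// Sentinel workaround sketch (hypothetical names, not Jison's API).
const SENTINEL = '\u0001'; // assumed never to appear in the real input

function tokenizeWithSentinel(input) {
  const rules = [
    { pattern: /^\s+/, token: null },                           // skip whitespace
    { pattern: new RegExp('^' + SENTINEL + 'a'), token: 'HEAD' }, // "^a" rewritten
    { pattern: new RegExp('^' + SENTINEL), token: null },       // drop unused sentinel
    { pattern: /^a/, token: 'BODY' },
  ];
  const tokens = [];
  let rest = SENTINEL + input; // mark the true beginning of the input
  while (rest.length > 0) {
    const rule = rules.find((r) => r.pattern.test(rest));
    if (!rule) throw new Error('Unrecognized input: ' + rest);
    const match = rule.pattern.exec(rest);
    if (rule.token !== null) tokens.push(rule.token);
    rest = rest.slice(match[0].length);
  }
  return tokens;
}

console.log(tokenizeWithSentinel('a a').join(' ')); // HEAD BODY
```

In a real lexer the action for the HEAD rule would also need to strip the sentinel from the matched text. To anchor at the start of every line rather than only the start of the input, the sentinel could be inserted after each newline as well.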

@zaach zaach was assigned
@victorporof

@zaach The y flag [0] may help with this; however, I don't know how well it is supported in browsers other than Gecko-based ones.

[0] https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/RegExp
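A small sketch of how the sticky flag could address this. The y flag was later standardized in ECMAScript 2015, so the example assumes an ES2015-capable engine. The function name `tokenizeSticky` and the rule table are hypothetical; the point is that the input is never sliced, so "^" keeps its meaning relative to the whole string.

```javascript
// Sticky-flag sketch: match at an explicit offset instead of slicing the input.
function tokenizeSticky(input) {
  const rules = [
    { pattern: /\s+/y, token: null },   // skip whitespace
    { pattern: /^a/y, token: 'HEAD' },  // only matches at index 0 of the input
    { pattern: /a/y, token: 'BODY' },
  ];
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    let matched = false;
    for (const rule of rules) {
      // With "y", exec() attempts a match only at lastIndex.
      rule.pattern.lastIndex = pos;
      const m = rule.pattern.exec(input);
      if (m && m[0].length > 0) {
        if (rule.token !== null) tokens.push(rule.token);
        pos += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) throw new Error('Unrecognized input at offset ' + pos);
  }
  return tokens;
}

console.log(tokenizeSticky('a a').join(' ')); // HEAD BODY
```

Adding the m flag to the HEAD rule (`/^a/my`) would make it match at the start of every line instead of only at the start of the input.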

@alvaro-cuesta

Quick and dirty hack to solve this:

"a" %{
  this.yy_ = this;
  return (this.yylloc.first_column === 0) ? 'HEAD' : 'BODY';
%}
@aaditmshah

What about using custom scanners? I have written a library called Lexer in the spirit of Flex which allows you to match arbitrary expressions as follows:

var Parser = require("jison").Parser;
var Lexer = require("lex");

var grammar = {
    "bnf": {
        // ...
    }
};

var parser = new Parser(grammar);
var lexer = parser.lexer = new Lexer;

// "a" at the beginning of the input is the HEAD token.
lexer.addRule(/^a/, function (lexeme) {
    this.yytext = lexeme;
    return "HEAD";
});

// Any other "a" is a BODY token.
lexer.addRule(/a/, function (lexeme) {
    this.yytext = lexeme;
    return "BODY";
});

Perhaps we could integrate it into Jison to be the default scanner? Advantages:

  1. It's easier to use regular expressions themselves instead of string descriptions of regular expressions.
  2. It's easier to use functions themselves instead of string descriptions of function bodies.
  3. Lexer currently supports some very powerful features such as start conditions, global patterns, optional case insensitive matching, optionally matching beginning and end of lines, etc.

I've also wanted to improve the performance of Lexer for quite a while by using Finite State Automata instead of native regular expressions. Perhaps we could work on that collaboratively?

@zaach
Owner

@aaditmshah A more JavaScript-friendly lexer is definitely a nice thing to have, but one of the qualifications for the default lexer is that it can be expressed in a way that's familiar to Flex users.

I've thought about implementing a regex engine in JS, but building one with enough features and speed to be useful is more than I have time for. Another option I believe others have explored is compiling a C/C++ regex engine using emscripten.

@aaditmshah

I have enough time to implement a regex engine in pure JavaScript. What is the interface required to integrate a regex engine with jison? Is it the same interface that's exposed by jison-lex?
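For context, as far as the Jison documentation on custom scanners describes it, a scanner is an object assigned to `parser.lexer` that provides `setInput` and `lex`, with the matched text exposed as `yytext`. The hand-written scanner below is only a sketch under that assumption: the object name `scanner`, the `'EOF'` token name, and the line-start check are illustrative choices, and it handles nothing beyond the "a" tokens from this issue.

```javascript
// Hand-written scanner sketch exposing setInput() and lex().
// Token names and the 'EOF' marker are assumptions for this example.
const scanner = {
  setInput(input) {
    this.input = input;
    this.pos = 0;
    this.yytext = '';
    return this;
  },
  lex() {
    // Skip whitespace, including newlines.
    while (this.pos < this.input.length && /\s/.test(this.input[this.pos])) {
      this.pos += 1;
    }
    if (this.pos >= this.input.length) return 'EOF';
    const ch = this.input[this.pos];
    if (ch !== 'a') throw new Error('Unexpected character at offset ' + this.pos);
    // The scanner keeps the whole input, so it can inspect the previous character.
    const atLineStart = this.pos === 0 || this.input[this.pos - 1] === '\n';
    this.pos += 1;
    this.yytext = ch;
    return atLineStart ? 'HEAD' : 'BODY';
  },
};

scanner.setInput('a a\na');
const seen = [];
let token;
while ((token = scanner.lex()) !== 'EOF') seen.push(token);
console.log(seen.join(' ')); // HEAD BODY HEAD
```

Because the scanner never discards the consumed input, the line-beginning test needs no regular expression anchor at all.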
