Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problematic linking of comments to AST nodes #22

Open
nene opened this issue Jun 12, 2013 · 1 comment
Open

Problematic linking of comments to AST nodes #22

nene opened this issue Jun 12, 2013 · 1 comment

Comments

@nene
Copy link
Contributor

nene commented Jun 12, 2013

The algorithm for linking comments to AST nodes is very simplistic and the resulting outcome is therefore not very useful.

For example:

// comment A
function foo(a, b) { // comment B
    var x = 1;
}

Here both comments get linked to the FunctionDeclNode. Which looks fine. But when we just add one extra newline:

// comment A
function foo(a, b)
{ // comment B
    var x = 1;
}

The comment B gets linked to VarStatementNode instead. Not too bad. But when we eliminate the var statement:

// comment A
function foo(a, b) { 
    // comment B
}

Now the comment B vanishes altogether. Like it hadn't been in the source code at all. However when we add another function after foo:

// comment A
function foo(a, b) { 
    // comment B
}

function bar() {
}

The comment B is now linked to function bar instead, although it clearly sits inside function foo.

There are more weird cases I could bring out...

There are many culprits collaborating in this problem:

  • Only some AST nodes have the line @line field assigned (seemingly arbitrarily).
  • The number in @line field refers sometimes to the line where the node begins. Sometimes to the line where it ends. (Seemingly arbitrarily again.)
  • Just one line number is not enough information about Node's position - one really needs to know both where a node begins and where it ends.
  • With just line numbers data everything falls apart when all the comments and code are on one long line - one really needs to know the character positions too.
  • The comments linking algorithm is simply too naive.
@nene
Copy link
Contributor Author

nene commented Jun 12, 2013

Having wondered over this all for quite a while, I've come to a realization that maybe it's not the job of the JavaScript parser to say how comments relate to syntax nodes.

There's no spec for such relationship. So everybody who uses the comment data can have their own arbitrary understanding of how the syntax nodes and comments should relate. It feels very app-specific to me. Some might not care about relating the comments to syntax nodes at all. Others might have their own algorithm for doing the mapping, that only makes sense in their domain.

So maybe it would be better to just leave this step for the users of the parser. Just include the list of all comment tokens into the parse result and be done with it.

For example the Esprima parser does exactly that: http://esprima.org/demo/parse.html

(I've been using Esprima, and I have my own algorithm for linking the comments to AST nodes, but it's not a general-purpuse algorithm - e.g. I'm discarding parts of the syntax tree that I'm not interested in.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant