Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Macro annotations #40

Open
calixteman opened this issue Jun 4, 2019 · 9 comments
Open

Macro annotations #40

calixteman opened this issue Jun 4, 2019 · 9 comments

Comments

@calixteman
Copy link
Collaborator

In firefox code, we use some macros to make annotations (for clang static analysis for example).
For example:
https://searchfox.org/mozilla-central/source/dom/animation/AnimationTarget.h#54
or
https://searchfox.org/mozilla-central/source/mfbt/CheckedInt.h#510
Such annotations lead to parse errors and I understand that (because likely this comment itself is valid C++/C: we just have to define some good macros).
So is it something tree-sitter-c/cpp could handle or should I have my own grammar for this code ?
I'd prefer of course the former.

@maxbrunsfeld
Copy link
Contributor

I'm not sure. Currently, when we using tree-sitter-cpp in Atom for syntax highlighting, we rely on error recovery to handle things like that. So in most cases, only one or two lines will be incorrectly colored. It's a tricky problem.

Just curious - what kind of task are you using tree-sitter-cpp for here? Do localized syntax errors completely ruin things? If so, one option is to run your file through the preprocessor before feeding it to Tree-sitter.

@maxbrunsfeld
Copy link
Contributor

I haven't fully thought through whether it's possible to handle this as part of the grammar. If you're interested in trying it out, I'd be open to that.

I'm just concerned that parsing that kind of preprocessor usage might add a ton of ambiguity to the grammar.

@calixteman
Copy link
Collaborator Author

My goal is to analyze the code itself, for example to compute code metrics (like mccabe) or to have more granularity on what the patches are doing (e.g. touch function foo or just rename function foo to bar, ...), compute a callgraph...

@calixteman
Copy link
Collaborator Author

yeah or just add MOZ_[A-Z_]+ in extras... or if it leads to ambiguities then just update the grammar for them.

@maxbrunsfeld
Copy link
Contributor

maxbrunsfeld commented Jun 4, 2019

just add MOZ_[A-Z_]+ in extras

That seems like it would work (and it wouldn't introduce any ambiguities), although you'd have to fork the grammar, because I don't think we should add mozilla-specific stuff to the mainline grammar.

Alternatively you could just replace those words with whitespace before passing the file to Tree-sitter.

@calixteman
Copy link
Collaborator Author

Ideally I don't want to fork and I'd prefer to contribute and improve things.
And yes I agree that it won't be a good idea to add mozilla stuff.
Anyway maybe we could handle some funs/vars annotations we could get via a config file and then have a scanner which will return these special "keywords" with type == "annotation".

@maxbrunsfeld
Copy link
Contributor

Yeah we don't have a built-in way right now to handle configuration files at runtime right now, but you could handle it at parser-generation time by checking for some environment variable in the grammar and adding some extras.

let customExtras = [];
if (process.env.TREE_SITTER_CPP_CUSTOM_EXTRAS) {
  customExtras = process.env.TREE_SITTER_CPP_CUSTOM_EXTRAS.split(' ');
}

// ...

module.exports = grammar({
  // ...

  extras: $ => [$.comment, /\s/, ...customExtras],
}

And then you'd have to regenerate the parser yourself

export TREE_SITTER_CPP_CUSTOM_EXTRAS="MOZ_IMPLICIT MOZ_FOO"
tree-sitter generate

Not sure how useful that is, but I'd be ok with adding some hook like that to the grammar.

@calixteman
Copy link
Collaborator Author

Could be interesting.
I could get relevant macros with current code and then pass them to generate and recompile everything.
Let me think about that tomorrow.

@maxbrunsfeld
Copy link
Contributor

I also think that it might be simpler to just strip them with sed or similar (replacing them with spaces to preserve file offsets) before parsing. But there might be some disadvantage of that approach that I'm not thinking of.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants