feat(cli/loader): Add TREE_SITTER_INTERNAL_BUILD C/C++ compiler definition #1187

Merged
merged 1 commit into from
Jul 16, 2021

ahlinc
Contributor

@ahlinc ahlinc commented Jun 20, 2021

Follow-up for the use case from #1186.
This PR adds a C/C++ compiler macro definition that makes it possible to detect that the external scanner was compiled directly by tree-sitter parse or another subcommand, so that additional debug logic can be enabled conditionally.

Example:

void *tree_sitter_NAME_external_scanner_create() {
  #ifdef TREE_SITTER_INTERNAL_BUILD
    return new Scanner(std::getenv("TREE_SITTER_DEBUG") != nullptr);
  #else
    return new Scanner();
  #endif
}
struct Scanner {

#ifdef TREE_SITTER_INTERNAL_BUILD
  bool debug;
  Scanner(bool debug) {
    this->debug = debug;
  }
#endif

  void print_lookahead(TSLexer* lexer) {
    #ifdef TREE_SITTER_INTERNAL_BUILD
      if (debug) {
        std::string s;
        switch (lexer->lookahead) {
        case 0x20: s = "\\s"; break;
        case '\t': s = "\\t"; break;
        case '\n': s = "\\n"; break;
        case '\v': s = "\\v"; break;
        case '\f': s = "\\f"; break;
        case '\r': s = "\\r"; break;
        default: s = static_cast<char>(lexer->lookahead); break;
        }
        std::cout << "la " << s << " " << lexer->lookahead << std::endl;
      }
    #endif
  }

  void print_column(TSLexer* lexer) {
    #ifdef TREE_SITTER_INTERNAL_BUILD
      if (debug) {
        std::cout << "col " << lexer->get_column(lexer) << std::endl;
      }
    #endif
  }

};

@ahlinc
Contributor Author

ahlinc commented Jun 30, 2021

This was rebased on top of the current master branch.

@ahlinc
Contributor Author

ahlinc commented Jul 3, 2021

@maxbrunsfeld do you plan to merge this suggestion or decline it?

@maxbrunsfeld
Contributor

I think this seems fine, and it doesn't add much complexity, but I also think it would be fine (and is even simpler) to not use conditional compilation for this purpose, and to just always run the code that does if (debug) { printf(/* ... */); }, because branch prediction will make that conditional pretty much free.

I'm curious - what are your thoughts around that? Do you think there are cases where you'd see a real performance hit from always running the if (debug) conditional?
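
The always-compiled alternative raised in this comment could look roughly like the following. This is a minimal sketch, not code from the PR: RuntimeDebugScanner, note_lookahead, and make_scanner are hypothetical names, and TREE_SITTER_DEBUG is the environment variable borrowed from the example in the PR description.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Hypothetical scanner that always carries a runtime debug flag,
// so no conditional compilation is involved.
struct RuntimeDebugScanner {
  bool debug;
  unsigned logged = 0;  // number of debug lines written so far

  explicit RuntimeDebugScanner(bool debug) : debug(debug) {}

  // Called on the hot path. When debug is false the only cost is
  // one branch on a value that never changes.
  void note_lookahead(int32_t lookahead) {
    if (debug) {
      std::fprintf(stderr, "la %d\n", static_cast<int>(lookahead));
      ++logged;
    }
  }
};

// Reads the flag once at creation time instead of on every call.
inline RuntimeDebugScanner make_scanner() {
  return RuntimeDebugScanner(std::getenv("TREE_SITTER_DEBUG") != nullptr);
}
```

The trade-off is that the logging code is present in every build, which is the cost the rest of the discussion weighs.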

@dcreager
Contributor

Do you think there are cases where you'd see a real performance hit

There can be some instruction cache pressure unless you add branch prediction annotations (e.g. if (unlikely(debug))), so that the compiler places the compiled debug message code in a separate chunk of memory.
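
An annotation of the kind mentioned here might be sketched as follows. SCANNER_UNLIKELY and count_spaces are hypothetical names chosen for illustration. The macro wraps __builtin_expect, which GCC and Clang provide, and falls back to the plain condition on other compilers. The hint only encourages the compiler to move the cold branch out of the hot path; it does not guarantee any particular code layout.

```cpp
#include <cstdio>

// GCC and Clang provide __builtin_expect; elsewhere the macro
// degrades to the plain condition.
#if defined(__GNUC__) || defined(__clang__)
  #define SCANNER_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
  #define SCANNER_UNLIKELY(x) (x)
#endif

// Hypothetical hot loop: counts spaces and, only when debug is set,
// logs every character it looks at.
inline unsigned count_spaces(const char *text, bool debug) {
  unsigned spaces = 0;
  for (const char *p = text; *p != '\0'; ++p) {
    if (SCANNER_UNLIKELY(debug)) {
      // Cold path: marked as rarely taken.
      std::fprintf(stderr, "la %d\n", static_cast<int>(*p));
    }
    if (*p == ' ') {
      ++spaces;
    }
  }
  return spaces;
}
```

C++20 offers the [[unlikely]] attribute for the same purpose without a compiler-specific builtin.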

@ahlinc
Contributor Author

ahlinc commented Jul 16, 2021

@maxbrunsfeld @dcreager Thank you for the comments! Now I see that the example in my description wasn't good enough.

First of all, the proposed -DTREE_SITTER_INTERNAL_BUILD flag can be considered a compile-time marker, and it's up to grammar authors whether to use it.
In a complex external scanner that I wrote:

  • I use an extended serialization format that carries more context, which lets me augment the debug log with more information. With this compilation flag I can be sure that in release mode my external scanner uses a smaller serialization than when I run it under the tree-sitter parse -d command during grammar development and debugging.
  • I found that on failed assertions in the external scanner it is much more useful to call abort() and crash the entire program immediately than to let the parser finish its job. Combined with the line number where abort() was called, this stops the debug log at the failed assertion and prints the line where the scanner was interrupted. In release mode, however, I prefer a reliable scanner that doesn't crash my program even if there are errors in the scanner logic.
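
The first point, a larger serialization only in internal builds, could be sketched like this. The author's real scanner is not shown in the thread, so SketchState, its fields, kSerializedSize, serialize_state, and deserialize_state are all hypothetical. In an actual grammar these helpers would be called from the tree_sitter_NAME_external_scanner_serialize and tree_sitter_NAME_external_scanner_deserialize hooks.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical scanner state: depth is always needed, last_line
// exists only to enrich debug logs.
struct SketchState {
  uint16_t depth = 0;
#ifdef TREE_SITTER_INTERNAL_BUILD
  uint32_t last_line = 0;  // extra context, internal builds only
#endif
};

constexpr unsigned kDepthSize = sizeof(uint16_t);
#ifdef TREE_SITTER_INTERNAL_BUILD
constexpr unsigned kLineSize = sizeof(uint32_t);
constexpr unsigned kSerializedSize = kDepthSize + kLineSize;
#else
constexpr unsigned kSerializedSize = kDepthSize;
#endif

// Writes the state into buffer and returns the number of bytes used.
inline unsigned serialize_state(const SketchState &state, char *buffer) {
  std::memcpy(buffer, &state.depth, kDepthSize);
#ifdef TREE_SITTER_INTERNAL_BUILD
  std::memcpy(buffer + kDepthSize, &state.last_line, kLineSize);
#endif
  return kSerializedSize;
}

// Restores the state; a buffer that is too short (including length 0)
// resets everything to defaults.
inline void deserialize_state(SketchState &state, const char *buffer,
                              unsigned length) {
  state = SketchState();
  if (length < kSerializedSize) {
    return;
  }
  std::memcpy(&state.depth, buffer, kDepthSize);
#ifdef TREE_SITTER_INTERNAL_BUILD
  std::memcpy(&state.last_line, buffer + kDepthSize, kLineSize);
#endif
}
```

Because both functions are guarded by the same macro, the serialized layout stays consistent within a single build, and a release build never pays for the debug-only field.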

For example:

#ifdef TREE_SITTER_INTERNAL_BUILD
  #define FAIL                                           \
    do {                                                 \
      std::cout << "fail on: " << __LINE__ << std::endl; \
      abort();                                           \
    } while (0)
#else
  #define FAIL \
    return false
#endif

It can be used as:

if (condition) FAIL;
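
For context, here is a self-contained sketch of the macro inside a function. Since the release variant expands to return false, FAIL only works inside a function that returns bool. check_indent and its parameters are hypothetical and stand in for a real scanner check.

```cpp
#include <cstdlib>
#include <iostream>

#ifdef TREE_SITTER_INTERNAL_BUILD
  // Internal build: report the failing line and stop immediately.
  #define FAIL                                           \
    do {                                                 \
      std::cout << "fail on: " << __LINE__ << std::endl; \
      abort();                                           \
    } while (0)
#else
  // Release build: reject the token and let the parser carry on.
  #define FAIL \
    return false
#endif

// Hypothetical scanner check: indentation must stay within bounds.
inline bool check_indent(int indent, int max_indent) {
  if (indent < 0 || indent > max_indent) FAIL;
  return true;
}
```

Without -DTREE_SITTER_INTERNAL_BUILD, check_indent(-1, 8) returns false; with it, the same call prints the line number and aborts.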

@maxbrunsfeld
Contributor

Yeah, that makes sense. Thanks for coming up with a good solution for this use case!

3 participants