Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert to NAPI #190

Merged
merged 8 commits into from
Mar 8, 2024
Merged

Convert to NAPI #190

merged 8 commits into from
Mar 8, 2024

Conversation

segevfiner
Copy link
Contributor

@segevfiner segevfiner commented Feb 25, 2024

Re-did the conversion to napi based on the work in #129 by @MichaelBelousov & @maxbrunsfeld

Language bindings generation: tree-sitter/tree-sitter#3077

Notes:

  • This is not backwards compatible, needed for this to be a true napi module. If we want backwards compat, one way is to create some other non-napi module that can do the setLanguage for the old grammar, and bind those together in JS when available. (optional peer dependency).
  • Some arbitrary changes from the napi branch might have been lost, as I went
    over the code again and tried to apply the minimal changes due to the large number of conflicts, let me know if you
    want to bring any over.
  • Do we want conversion functions to throw or use std::optional instead of Napi::Maybe?
  • We should type tag language objects.
  • We should generally consider the places we Unwrap whether we need a custom function
    that checks a type tag, or we can remove such checks entirely? Kinda of a
    common pitfall with Napi and the API kinda shrugs over it ATM (See Query::UnwrapQuery for example). For now I left it with the InstanceOf check. See https://nodejs.org/api/n-api.html#object-wrap
  • Do we really need the whole buffering thing in CallbackInput, can't we simply just take Utf16Value, store that, and return a pointer to it?
  • Do we really need the transfer buffers, can't we just return objects back and forth instead? I'm not sure what they are for. So I just kept them as is.

cc @verhovsky

@segevfiner
Copy link
Contributor Author

Seems like some test is failing due to a weird race condition in Jest... Not sure how to fix that, as it doesn't happen locally and I'm not sure how a race condition is even possible here or where it happens...

@segevfiner segevfiner changed the title WIP: Convert to NAPI Convert to NAPI Feb 25, 2024
@verhovsky
Copy link
Collaborator

"weird race condition in Jest" sounds similar to #75 but the error is a bit different.

@segevfiner
Copy link
Contributor Author

Yeah, cause this one seems to happen in the middle of that test for some reason... There is really nothing that should be non-deterministic here besides Jest, so I can't think of anything else causing it 😕

@maxbrunsfeld
Copy link
Contributor

Hey @segevfiner, thanks for picking this up! I've mostly stepped away from maintaining this module, since I don't use it in any of my projects any more, but a couple of thoughts:

Do we really need the transfer buffers, can't we just return objects back and forth instead? I'm not sure what they are for. So I just kept them as is.

I don't remember all of the details here, and I haven't read your diff in detail, but in many cases, the reason for using buffers is that creating JavaScript objects from native bindings is very slow, so we try to avoid doing it at all in code paths that could be called in a loop.

Specifically, creating new objects and setting properties on them through the native V8 APIs always uses the non-optimal code paths in V8, in which the object acts like a general hash table. Whereas when creating the objects in JS, V8 can apply many more optimizations.

Do we really need the whole buffering thing in CallbackInput, can't we simply just take Utf16Value, store that, and return a pointer to it?

I don't honestly remember. I think I did this to avoid excessive allocations. As far as I remember, there is no non-copying way to retrieve the UTF16 data out of a V8 string; you have to copy it into a buffer. Maybe this has changed since then, and NAPI exposes some non-copying mechanism? I'm not sure. But assuming that it hasn't changed, then the reason for keeping the buffer is to avoid having to reallocate it on every call to the input callback. I also think that sometimes, during incremental parsing, Tree-sitter calls the input callback but only reads a small length of text. That's why I limited the size of the buffer and hold a persistent reference to the previous string - so that we'd avoid copying an unlimited of text (if JS decided to return one giant string) only to have most of unused.

@verhovsky
Copy link
Collaborator

verhovsky commented Feb 26, 2024

I got the Jest error consistently for a while on my M1 (arm64) Mac running macOS 14.3.1 with Node 21.6.2 installed with Homebrew. Then it started passing now I can't get it to fail again. I also re-ran some of the Node 20 or 18 tests in the CI and one that failed started passing and one that initially passed failed.

The NodeClass seems like it might be a real issue though because I got it once locally in the mocha tests as well:

  51 passing (334ms)
  1 failing

  1) Node
       .text
         returns the text of a node generated by .parse(String):
     TypeError: NodeClass is not a constructor
      at unmarshalNode (index.js:693:18)
      at Tree.get (index.js:34:14)
      at Context.<anonymous> (test/node_test.js:405:32)

I was console.log'ing a bit when the above happened and nodeTypeId (and therefore NodeClass) were both undefined,

the tree looked like this
Tree {
  input: 'α0 / b👎c👎',
  getText: [Function: getTextFromString],
  language: {
    name: 'javascript',
    language: [External: 6000024f13a0],
    nodeTypeInfo: [
      [Object], [Object], [Object], [Object], [Object], [Object],
      [Object], [Object], [Object], [Object], [Object], [Object],
      [Object], [Object], [Object], [Object], [Object], [Object],
      [Object], [Object], [Object], [Object], [Object], [Object],
      [Object], [Object], [Object], [Object], [Object], [Object],
      [Object], [Object], [Object], [Object], [Object], [Object],
      [Object], [Object], [Object], [Object], [Object], [Object],
      [Object], [Object], [Object], [Object], [Object], [Object],
      [Object], [Object], [Object], [Object], [Object], [Object],
      [Object], [Object], [Object], [Object], [Object], [Object],
      [Object], [Object], [Object], [Object], [Object], [Object],
      [Object], [Object], [Object], [Object], [Object], [Object],
      [Object], [Object], [Object], [Object], [Object], [Object],
      [Object], [Object], [Object], [Object], [Object], [Object],
      [Object], [Object], [Object], [Object], [Object], [Object],
      [Object], [Object], [Object], [Object], [Object], [Object],
      [Object], [Object], [Object], [Object],
      ... 128 more items
    ],
    nodeSubclasses: [
      [class SyntaxNode],
      [class IdentifierNode extends SyntaxNode],
      [class HashBangLineNode extends SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class HtmlCharacterReferenceNode extends SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class IdentifierNode extends SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class StringFragmentNode extends SyntaxNode],
      [class StringFragmentNode extends SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class OptionalChainNode extends SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      [class SyntaxNode],
      ... 171 more items
    ]
  }
}

When I got the error in the Jest tests it was different, they were both defined as

1
[class IdentifierNode extends SyntaxNode]

assert(cursor.gotoParent());
assert(cursor.gotoParent());
assert(!cursor.gotoParent());
if (tree.rootNode) {
Copy link
Collaborator

@verhovsky verhovsky Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This is related to TypeError: NodeClass is not a constructor but not exactly)

The Jest tests were added specifically just to test that node-tree-sitter's JavaScript code handles bizarre results from the C++ code where everything is undefined in #164 . Instead of adding this if (tree.rootNode) check, you could just disable the Jest tests... Ideally someone would figure out why this is happening and how to solve it, since it obviously makes no sense that jsParser.parse("a * b + c / d").rootNode returns nothing

It has something to do with the way Jest imports stuff when running tests in worker processes and there's some sort of race condition when the native bindings are undefined. According to MaibornWolff/metric-gardener#42 (comment)

this issue is triggered especially if the number of test files is higher than the number of worker processes used by jest

and setting Jest's --maxWorkers to 24 helps. I tried the --maxWorkers setting but it didn't help me reproduce the issue.

jestjs/jest#9206 was an upstream issue for this, which was closed as stale by a bot

See also #53

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to just say we won't support Jest, this is clearly a bug on their end and if the PR that purportedly fixed it doesn't fix it, we should just revert it.

@segevfiner
Copy link
Contributor Author

Also see segevfiner#1 where I add type tags. Can be merged into this one or as a followup PR if you want it, will also require tree-sitter/tree-sitter#3109.

@amaanq amaanq force-pushed the napi-2 branch 3 times, most recently from 57f28e3 to a6d6cf3 Compare March 4, 2024 06:41
@amaanq amaanq mentioned this pull request Mar 4, 2024
3 tasks
@amaanq
Copy link
Member

amaanq commented Mar 4, 2024

@segevfiner @verhovsky do check out the changes I've made now, I mostly ported over #163 and tidied some bits up

@amaanq amaanq force-pushed the napi-2 branch 2 times, most recently from 7cceb8e to 5827523 Compare March 8, 2024 05:58
kskdkdojrenbsndj

This comment was marked as spam.

@segevfiner segevfiner deleted the napi-2 branch June 2, 2024 18:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants