web-tree-sitter fails in Node.js 19 and higher: Error: bad export type for tree_sitter_tsx_external_scanner_create: undefined
#2338
Have you taken a look at the previous threads about how only certain imports are allowed in external scanners if you want them to work with web-tree-sitter?
@ahelwer Got any links for me? I should be clear that I just use tree-sitter and the ts/tsx grammars from an API perspective; I don't know much about how they interact with each other. I opened this issue because @hendrikvanantwerpen mentioned in the linked issue that the problem didn't seem related to … That said, I'm happy to help however I can.
There is now a repository with code and instructions on how to reproduce this issue here. (There is also a bit of background here, followed by some possibly relevant comments.)

Subsequent to creating the repository, I looked into things a bit more and found that adding … This can be observed via a branch of the aforementioned repository that uses a debug version of …

In the reproduction code:

```js
async function main() {
  await Parser.init();
  // XXX: if `await` is added below, will not get error
  tsx_do();
  //await tsx_do();
  await ts_do();
}
```

If, as mentioned in the comment in the code, `tsx_do();` is replaced by `await tsx_do();`, subsequent invocations do not result in an error (at least not here).
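One way to apply the `await` observation more generally is to make sure no two `Parser.Language.load()` calls are ever in flight at once, even when they are started from unrelated places. The following is a minimal sketch, assuming overlapping loads are the only trigger. `makeSerialQueue` is a hypothetical helper (not part of web-tree-sitter), and `tsxWasmPath` / `tsWasmPath` in the comment stand for the `.wasm` paths used in the reproduction.

```javascript
// Hypothetical helper: runs queued async tasks strictly one after another.
// Assumption: overlapping Parser.Language.load() calls are what trigger
// "bad export type for ...: undefined".
function makeSerialQueue() {
  // Promise for "everything queued so far has settled".
  let tail = Promise.resolve();

  return function enqueue(task) {
    // Start this task only after the previous one has settled.
    const run = tail.then(() => task());
    // Swallow failures on the internal chain so one rejected load does not
    // block later ones; the caller still sees the rejection via `run`.
    tail = run.catch(() => {});
    return run;
  };
}

// Example with web-tree-sitter (not executed here):
//   const Parser = require("web-tree-sitter");
//   const loadSerially = makeSerialQueue();
//   await Parser.init();
//   const [tsx, ts] = await Promise.all([
//     loadSerially(() => Parser.Language.load(tsxWasmPath)),
//     loadSerially(() => Parser.Language.load(tsWasmPath)),
//   ]);
```

Callers can still start both loads with `Promise.all`; the queue only guarantees that the second one begins after the first has settled.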
I tried tweaking the …

Update: in retrospect, this perhaps makes sense because I tried … Possibly the problem will occur given any two grammars that have external scanners.

Update 2: This branch contains a reproduction that uses …
Looking through @TylerLeonhardt's previous quote from here:
I gave … AFAICT, for the local … Searching for …:

```
import global GOT_func_tree_sitter_tsx_external_scanner_create:int;
(env_memory_base + 1004736)[0]:int =
  GOT_func_tree_sitter_tsx_external_scanner_create;
export function tree_sitter_tsx_external_scanner_create():int {
  return 0
}
```

These are all expected lines AFAICT. From here it looks like … Perhaps the problem is instead that that symbol is being looked for at all. IIUC, some code that leads to that lives in …:

```js
function reportUndefinedSymbols() {
  for (var symName in GOT) {
    if (GOT[symName].value == 0) {
      var value = resolveGlobalSymbol(symName, true).sym;
      if (!value && !GOT[symName].required) {
        continue;
      }
      assert(value, "undefined symbol `" + symName + "`. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment");
      if (typeof value == "function") {
        GOT[symName].value = addFunction(value, value.sig);
      } else if (typeof value == "number") {
        GOT[symName].value = value;
      } else {
        throw new Error("bad export type for `" + symName + "`: " + typeof value);
      }
    }
  }
}
```

Update: It appears there is only one … Perhaps there is an issue with trying to load both …
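To make the single-shared-table suspicion concrete, here is a toy model. It is not Emscripten's actual code; apart from the error message, every name in it is hypothetical. Each side module first adds GOT entries for the functions it imports, then, after asynchronous instantiation, publishes its exports and walks every still-unresolved entry. If the second module registers its entries before the first one finishes, the first module's walk reaches the second module's symbol before that symbol has been exported. The sketch keeps only the `typeof` dispatch from `reportUndefinedSymbols` and omits its `assert` and `required` checks.

```javascript
// Toy model, NOT Emscripten's implementation: one GOT shared by every
// side module, filled in two phases per module.
function createToyLinker() {
  const GOT = {};           // symbol name -> { value }; 0 means "unresolved"
  const globalSymbols = {}; // exports published so far, by name

  return {
    // Phase 1: a module adds GOT entries for the functions it imports.
    registerImports(names) {
      for (const name of names) {
        if (!(name in GOT)) GOT[name] = { value: 0 };
      }
    },
    // Phase 2 (after async instantiation): exports become visible and the
    // module's own GOT entries are filled in.
    publishExports(exports) {
      Object.assign(globalSymbols, exports);
      for (const [name, fn] of Object.entries(exports)) {
        if (name in GOT) GOT[name].value = fn; // stand-in for a table index
      }
    },
    // Walks EVERY unresolved entry, including other modules' entries.
    reportUndefinedSymbols() {
      for (const name in GOT) {
        if (GOT[name].value !== 0) continue;
        const value = globalSymbols[name];
        if (typeof value === "function" || typeof value === "number") {
          GOT[name].value = value;
        } else {
          throw new Error("bad export type for `" + name + "`: " + typeof value);
        }
      }
    },
  };
}

// Loads a typescript-like and a tsx-like module, optionally interleaved.
function loadBoth({ interleaved }) {
  const linker = createToyLinker();
  const ts = { tree_sitter_typescript_external_scanner_create: () => 0 };
  const tsx = { tree_sitter_tsx_external_scanner_create: () => 0 };
  const finishLoading = (mod) => {
    linker.publishExports(mod);
    linker.reportUndefinedSymbols();
  };
  if (interleaved) {
    // Both modules register their GOT entries before either finishes.
    linker.registerImports(Object.keys(ts));
    linker.registerImports(Object.keys(tsx));
    finishLoading(ts); // throws: the tsx entry is still unresolved
    finishLoading(tsx);
  } else {
    for (const mod of [ts, tsx]) {
      linker.registerImports(Object.keys(mod));
      finishLoading(mod);
    }
  }
}
```

In the sequential order both modules load cleanly, while the interleaved order throws ``bad export type for `tree_sitter_tsx_external_scanner_create`: undefined``.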
I modified …

None of those appear in the …

Update: Symmetrically, reversing the calls to … Similar to the above situation, none of the mentioned keys appear in the … To be clear, in either direction both sets of external scanner symbol names are present as keys in …
Here is a branch that tries to make the reproducing code look a bit more like l10n-dev's code. Below is the source, shrunk and transformed (from a previous effort); it still exhibits the issue:

```js
const Parser = require("web-tree-sitter");
const path = require("path");
////////////////////////////////////////////////////////////////////////
const initParser = Parser.init();
////////////////////////////////////////////////////////////////////////
const tsx_g =
  (async function tsx_grammar() {
    await initParser;
    const tsx_wasm_path =
      path.join(process.cwd(),
                "node_modules/tree-sitter-typescript/tsx",
                "tree-sitter-tsx.wasm");
    return await Parser.Language.load(tsx_wasm_path);
  })();
const ts_g =
  (async function ts_grammar() {
    await initParser;
    const ts_wasm_path =
      path.join(process.cwd(),
                "node_modules/tree-sitter-typescript/typescript",
                "tree-sitter-typescript.wasm");
    return await Parser.Language.load(ts_wasm_path);
  })();
```

Error output at the end is:
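This shape reproduces the problem because both IIFEs wait on the same `initParser` promise and then start loading without waiting for each other. A small trace with a stand-in loader makes the overlap visible, next to a sequential variant. `fakeLoad`, `concurrentShape` and `sequentialShape` are hypothetical names; `fakeLoad` only sleeps and does not touch web-tree-sitter.

```javascript
// Hypothetical stand-in for Parser.Language.load(): records when it starts
// and finishes, and resolves with `name` after `ms` milliseconds.
function fakeLoad(name, ms, events) {
  events.push(`start ${name}`);
  return new Promise((resolve) => {
    setTimeout(() => {
      events.push(`end ${name}`);
      resolve(name);
    }, ms);
  });
}

// Same shape as the reproduction: two IIFEs share one init promise and
// then start loading without waiting for each other.
function concurrentShape(events) {
  const initParser = Promise.resolve(); // stand-in for Parser.init()
  const tsx_g = (async () => { await initParser; return fakeLoad("tsx", 20, events); })();
  const ts_g = (async () => { await initParser; return fakeLoad("ts", 5, events); })();
  return Promise.all([tsx_g, ts_g]);
}

// Sequential variant: the second load starts only after the first settles.
async function sequentialShape(events) {
  await Promise.resolve(); // stand-in for Parser.init()
  const tsx = await fakeLoad("tsx", 20, events);
  const ts = await fakeLoad("ts", 5, events);
  return [tsx, ts];
}
```

In the concurrent shape both loads start before either ends; in the sequential shape they never overlap.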
IIUC, `Language.load` looks like this:

```js
static load(input) {
  let bytes;
  if (input instanceof Uint8Array) {
    bytes = Promise.resolve(input);
  } else {
    const url = input;
    if (
      typeof process !== 'undefined' &&
      process.versions &&
      process.versions.node
    ) {
      const fs = require('fs');
      bytes = Promise.resolve(fs.readFileSync(url));
    } else {
      bytes = fetch(url)
        .then(response => response.arrayBuffer()
          .then(buffer => {
            if (response.ok) {
              return new Uint8Array(buffer);
            } else {
              const body = new TextDecoder('utf-8').decode(buffer);
              throw new Error(`Language.load failed with status ${response.status}.\n\n${body}`)
            }
          }));
    }
  }
  // emscripten-core/emscripten#12969
  const loadModule =
    typeof loadSideModule === 'function'
      ? loadSideModule
      : loadWebAssemblyModule;
  return bytes
    .then(bytes => loadModule(bytes, {loadAsync: true}))
    .then(mod => {
      const symbolNames = Object.keys(mod)
      const functionName = symbolNames.find(key =>
        LANGUAGE_FUNCTION_REGEX.test(key) &&
        !key.includes("external_scanner_")
      );
      if (!functionName) {
        console.log(`Couldn't find language function in WASM file. Symbols:\n${JSON.stringify(symbolNames, null, 2)}`)
      }
      const languageAddress = mod[functionName]();
      return new Language(INTERNAL, languageAddress);
    });
}
```

Not sure if it's a good idea, but changing `.then(bytes => loadModule(bytes, {loadAsync: true}))` to `.then(bytes => loadModule(bytes, {}))` or to `.then(bytes => loadModule(bytes, {loadAsync: true, allowUndefined: true}))` results in no errors. FWIW, I think this was hinted at in tree-sitter/tree-sitter-typescript#244 by @PF4Public:

I think we have an answer to that -- it looks like [1]
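For reference, the symbol-selection step at the end of `load` can be exercised on its own. This sketch assumes `LANGUAGE_FUNCTION_REGEX` is `/^_?tree_sitter_\w+/` (check the `binding.js` of the version in use); `findLanguageFunction` and `languageAddressFrom` are hypothetical names. Unlike the excerpt, which only logs when nothing matches and then calls `mod[functionName]()` anyway, it throws a descriptive error.

```javascript
// Assumption: web-tree-sitter defines LANGUAGE_FUNCTION_REGEX like this;
// verify against the binding.js of the version you use.
const LANGUAGE_FUNCTION_REGEX = /^_?tree_sitter_\w+/;

// Picks the grammar's entry point (e.g. `tree_sitter_tsx`) and skips the
// external scanner exports, mirroring the `find` call in `load`.
function findLanguageFunction(symbolNames) {
  return symbolNames.find((key) =>
    LANGUAGE_FUNCTION_REGEX.test(key) &&
    !key.includes("external_scanner_"));
}

// Hypothetical helper: like the tail of `load`, but fails loudly instead of
// calling an undefined export when no language function is found.
function languageAddressFrom(mod) {
  const symbolNames = Object.keys(mod);
  const functionName = findLanguageFunction(symbolNames);
  if (!functionName) {
    throw new Error(
      `Couldn't find language function in WASM file. Symbols:\n${JSON.stringify(symbolNames, null, 2)}`);
  }
  return mod[functionName]();
}
```

Given the export list that appears in the DYLINK_DEBUG logs further down (`__wasm_call_ctors`, `__wasm_apply_data_relocs`, the six `tree_sitter_tsx_external_scanner_*` functions and `tree_sitter_tsx`), `findLanguageFunction` picks `tree_sitter_tsx`.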
I had to revisit this issue since we are now updating to Node.js v20 in VS Code. Thanks a lot @sogaiu for creating a minimal test case; it helped me resume debugging this issue quickly.

Solutions: any of the following will help address this issue.

Build time: …

Runtime: …

Details: I first rebuilt `web-tree-sitter` with JS minification disabled and Emscripten's dynamic-linking and library debug output enabled:

```diff
diff --git a/script/build-wasm b/script/build-wasm
index dc42895a..6ec17962 100755
--- a/script/build-wasm
+++ b/script/build-wasm
@@ -31,7 +31,7 @@ set -e
 web_dir=lib/binding_web
 emscripten_flags="-O3"
-minify_js=1
+minify_js=0
 force_docker=0
 emscripen_version=$(cat "$(dirname "$0")"/../cli/emscripten-version)
@@ -91,6 +91,8 @@ $emcc \
   -s ALLOW_MEMORY_GROWTH=1 \
   -s MAIN_MODULE=2 \
   -s NO_FILESYSTEM=1 \
+  -s DYLINK_DEBUG=2 \
+  -s LIBRARY_DEBUG=1 \
   -s NODEJS_CATCH_EXIT=0 \
   -s NODEJS_CATCH_REJECTION=0 \
   -s EXPORTED_FUNCTIONS=@${web_dir}/exports.json \
```

In the working case, the logs are as follows:

```
initRuntime
loadWebAssemblyModule: undefined
dylink needed:
loadWebAssemblyModule: undefined
dylink needed:
getMemory: 961296 runtimeInitialized=true
loadModule: growing table: 2
loadModule: memory[78128:1039408] table[129:131]
getMemory: 1006428 runtimeInitialized=true
loadModule: growing table: 2
loadModule: memory[1039424:2045836] table[131:133]
new GOT entry: tree_sitter_typescript_external_scanner_create
new GOT entry: tree_sitter_typescript_external_scanner_destroy
new GOT entry: tree_sitter_typescript_external_scanner_scan
new GOT entry: tree_sitter_typescript_external_scanner_serialize
new GOT entry: tree_sitter_typescript_external_scanner_deserialize
relocating export: EXISTING SYMBOL: __wasm_call_ctors
relocating export: EXISTING SYMBOL: __wasm_apply_data_relocs
relocating export: EXISTING SYMBOL: tree_sitter_typescript_external_scanner_create
relocating export: EXISTING SYMBOL: tree_sitter_typescript_external_scanner_destroy
relocating export: EXISTING SYMBOL: tree_sitter_typescript_external_scanner_reset
relocating export: EXISTING SYMBOL: tree_sitter_typescript_external_scanner_serialize
relocating export: EXISTING SYMBOL: tree_sitter_typescript_external_scanner_deserialize
relocating export: EXISTING SYMBOL: tree_sitter_typescript_external_scanner_scan
relocating export: EXISTING SYMBOL: tree_sitter_typescript
updateGOT: adding 9 symbols
updateGOT: before: tree_sitter_typescript_external_scanner_create : 0
updateGOT: FUNC: tree_sitter_typescript_external_scanner_create : 133
updateGOT: after: tree_sitter_typescript_external_scanner_create : 133 (function 4() { [native code] })
updateGOT: before: tree_sitter_typescript_external_scanner_destroy : 0
updateGOT: FUNC: tree_sitter_typescript_external_scanner_destroy : 134
updateGOT: after: tree_sitter_typescript_external_scanner_destroy : 134 (function 5() { [native code] })
updateGOT: before: tree_sitter_typescript_external_scanner_reset : 0
updateGOT: FUNC: tree_sitter_typescript_external_scanner_reset : 134
updateGOT: after: tree_sitter_typescript_external_scanner_reset : 134 (function 5() { [native code] })
updateGOT: before: tree_sitter_typescript_external_scanner_serialize : 0
updateGOT: FUNC: tree_sitter_typescript_external_scanner_serialize : 135
updateGOT: after: tree_sitter_typescript_external_scanner_serialize : 135 (function 6() { [native code] })
updateGOT: before: tree_sitter_typescript_external_scanner_deserialize : 0
updateGOT: FUNC: tree_sitter_typescript_external_scanner_deserialize : 136
updateGOT: after: tree_sitter_typescript_external_scanner_deserialize : 136 (function 7() { [native code] })
updateGOT: before: tree_sitter_typescript_external_scanner_scan : 0
updateGOT: FUNC: tree_sitter_typescript_external_scanner_scan : 137
updateGOT: after: tree_sitter_typescript_external_scanner_scan : 137 (function 8() { [native code] })
updateGOT: before: tree_sitter_typescript : 0
updateGOT: FUNC: tree_sitter_typescript : 138
updateGOT: after: tree_sitter_typescript : 138 (function 9() { [native code] })
done updateGOT
reportUndefinedSymbols
done reportUndefinedSymbols
applyRelocs
[library call:_tree_sitter_parse_callback: 0x001f4148 (2048328),0,0,0,0x001f3808 (2045960)]
[library call:_tree_sitter_parse_callback: 0x001f4148 (2048328),0x00000010 (16),0x00000001 (1),0x0000000a (10),0x001f3808 (2045960)]
(program (expression_statement (type_assertion (type_arguments (type_identifier)) (identifier))) (expression_statement (type_assertion (type_arguments (generic_type name: (type_identifier) type_arguments: (type_arguments (type_identifier)))) (member_expression object: (identifier) property: (property_identifier)))))
new GOT entry: tree_sitter_tsx_external_scanner_create
new GOT entry: tree_sitter_tsx_external_scanner_destroy
new GOT entry: tree_sitter_tsx_external_scanner_scan
new GOT entry: tree_sitter_tsx_external_scanner_serialize
new GOT entry: tree_sitter_tsx_external_scanner_deserialize
relocating export: EXISTING SYMBOL: __wasm_call_ctors
relocating export: EXISTING SYMBOL: __wasm_apply_data_relocs
relocating export: EXISTING SYMBOL: tree_sitter_tsx_external_scanner_create
relocating export: EXISTING SYMBOL: tree_sitter_tsx_external_scanner_destroy
relocating export: EXISTING SYMBOL: tree_sitter_tsx_external_scanner_reset
relocating export: EXISTING SYMBOL: tree_sitter_tsx_external_scanner_serialize
relocating export: EXISTING SYMBOL: tree_sitter_tsx_external_scanner_deserialize
relocating export: EXISTING SYMBOL: tree_sitter_tsx_external_scanner_scan
relocating export: EXISTING SYMBOL: tree_sitter_tsx
updateGOT: adding 9 symbols
updateGOT: before: tree_sitter_tsx_external_scanner_create : 0
updateGOT: FUNC: tree_sitter_tsx_external_scanner_create : 139
updateGOT: after: tree_sitter_tsx_external_scanner_create : 139 (function 4() { [native code] })
updateGOT: before: tree_sitter_tsx_external_scanner_destroy : 0
updateGOT: FUNC: tree_sitter_tsx_external_scanner_destroy : 140
updateGOT: after: tree_sitter_tsx_external_scanner_destroy : 140 (function 5() { [native code] })
updateGOT: before: tree_sitter_tsx_external_scanner_reset : 0
updateGOT: FUNC: tree_sitter_tsx_external_scanner_reset : 140
updateGOT: after: tree_sitter_tsx_external_scanner_reset : 140 (function 5() { [native code] })
updateGOT: before: tree_sitter_tsx_external_scanner_serialize : 0
updateGOT: FUNC: tree_sitter_tsx_external_scanner_serialize : 141
updateGOT: after: tree_sitter_tsx_external_scanner_serialize : 141 (function 6() { [native code] })
updateGOT: before: tree_sitter_tsx_external_scanner_deserialize : 0
updateGOT: FUNC: tree_sitter_tsx_external_scanner_deserialize : 142
updateGOT: after: tree_sitter_tsx_external_scanner_deserialize : 142 (function 7() { [native code] })
updateGOT: before: tree_sitter_tsx_external_scanner_scan : 0
updateGOT: FUNC: tree_sitter_tsx_external_scanner_scan : 143
updateGOT: after: tree_sitter_tsx_external_scanner_scan : 143 (function 8() { [native code] })
updateGOT: before: tree_sitter_tsx : 0
updateGOT: FUNC: tree_sitter_tsx : 144
updateGOT: after: tree_sitter_tsx : 144 (function 9() { [native code] })
done updateGOT
reportUndefinedSymbols
done reportUndefinedSymbols
applyRelocs
```

In the failure case, you can see that the GOT entries for both tsx and typescript are created together, while the actual symbol relocation happens later:

```
initRuntime
loadWebAssemblyModule: undefined
dylink needed:
loadWebAssemblyModule: undefined
dylink needed:
getMemory: 961296 runtimeInitialized=true
loadModule: growing table: 2
loadModule: memory[78128:1039408] table[129:131]
getMemory: 1006428 runtimeInitialized=true
loadModule: growing table: 2
loadModule: memory[1039424:2045836] table[131:133]
new GOT entry: tree_sitter_typescript_external_scanner_create
new GOT entry: tree_sitter_typescript_external_scanner_destroy
new GOT entry: tree_sitter_typescript_external_scanner_scan
new GOT entry: tree_sitter_typescript_external_scanner_serialize
new GOT entry: tree_sitter_typescript_external_scanner_deserialize
new GOT entry: tree_sitter_tsx_external_scanner_create
new GOT entry: tree_sitter_tsx_external_scanner_destroy
new GOT entry: tree_sitter_tsx_external_scanner_scan
new GOT entry: tree_sitter_tsx_external_scanner_serialize
new GOT entry: tree_sitter_tsx_external_scanner_deserialize
relocating export: EXISTING SYMBOL: __wasm_call_ctors
relocating export: EXISTING SYMBOL: __wasm_apply_data_relocs
relocating export: EXISTING SYMBOL: tree_sitter_typescript_external_scanner_create
relocating export: EXISTING SYMBOL: tree_sitter_typescript_external_scanner_destroy
relocating export: EXISTING SYMBOL: tree_sitter_typescript_external_scanner_reset
relocating export: EXISTING SYMBOL: tree_sitter_typescript_external_scanner_serialize
relocating export: EXISTING SYMBOL: tree_sitter_typescript_external_scanner_deserialize
relocating export: EXISTING SYMBOL: tree_sitter_typescript_external_scanner_scan
relocating export: EXISTING SYMBOL: tree_sitter_typescript
updateGOT: adding 9 symbols
updateGOT: before: tree_sitter_typescript_external_scanner_create : 0
updateGOT: FUNC: tree_sitter_typescript_external_scanner_create : 133
updateGOT: after: tree_sitter_typescript_external_scanner_create : 133 (function 4() { [native code] })
updateGOT: before: tree_sitter_typescript_external_scanner_destroy : 0
updateGOT: FUNC: tree_sitter_typescript_external_scanner_destroy : 134
updateGOT: after: tree_sitter_typescript_external_scanner_destroy : 134 (function 5() { [native code] })
updateGOT: before: tree_sitter_typescript_external_scanner_reset : 0
updateGOT: FUNC: tree_sitter_typescript_external_scanner_reset : 134
updateGOT: after: tree_sitter_typescript_external_scanner_reset : 134 (function 5() { [native code] })
updateGOT: before: tree_sitter_typescript_external_scanner_serialize : 0
updateGOT: FUNC: tree_sitter_typescript_external_scanner_serialize : 135
updateGOT: after: tree_sitter_typescript_external_scanner_serialize : 135 (function 6() { [native code] })
updateGOT: before: tree_sitter_typescript_external_scanner_deserialize : 0
updateGOT: FUNC: tree_sitter_typescript_external_scanner_deserialize : 136
updateGOT: after: tree_sitter_typescript_external_scanner_deserialize : 136 (function 7() { [native code] })
updateGOT: before: tree_sitter_typescript_external_scanner_scan : 0
updateGOT: FUNC: tree_sitter_typescript_external_scanner_scan : 137
updateGOT: after: tree_sitter_typescript_external_scanner_scan : 137 (function 8() { [native code] })
updateGOT: before: tree_sitter_typescript : 0
updateGOT: FUNC: tree_sitter_typescript : 138
updateGOT: after: tree_sitter_typescript : 138 (function 9() { [native code] })
done updateGOT
reportUndefinedSymbols
assigning dynamic symbol from main module: tree_sitter_tsx_external_scanner_create -> !UNDEFINED!
relocating export: EXISTING SYMBOL: __wasm_call_ctors
relocating export: EXISTING SYMBOL: __wasm_apply_data_relocs
relocating export: EXISTING SYMBOL: tree_sitter_tsx_external_scanner_create
relocating export: EXISTING SYMBOL: tree_sitter_tsx_external_scanner_destroy
relocating export: EXISTING SYMBOL: tree_sitter_tsx_external_scanner_reset
relocating export: EXISTING SYMBOL: tree_sitter_tsx_external_scanner_serialize
relocating export: EXISTING SYMBOL: tree_sitter_tsx_external_scanner_deserialize
relocating export: EXISTING SYMBOL: tree_sitter_tsx_external_scanner_scan
relocating export: EXISTING SYMBOL: tree_sitter_tsx
updateGOT: adding 9 symbols
updateGOT: before: tree_sitter_tsx_external_scanner_create : 0
updateGOT: FUNC: tree_sitter_tsx_external_scanner_create : 139
updateGOT: after: tree_sitter_tsx_external_scanner_create : 139 (function 4() { [native code] })
updateGOT: before: tree_sitter_tsx_external_scanner_destroy : 0
updateGOT: FUNC: tree_sitter_tsx_external_scanner_destroy : 140
updateGOT: after: tree_sitter_tsx_external_scanner_destroy : 140 (function 5() { [native code] })
updateGOT: before: tree_sitter_tsx_external_scanner_reset : 0
updateGOT: FUNC: tree_sitter_tsx_external_scanner_reset : 140
updateGOT: after: tree_sitter_tsx_external_scanner_reset : 140 (function 5() { [native code] })
updateGOT: before: tree_sitter_tsx_external_scanner_serialize : 0
updateGOT: FUNC: tree_sitter_tsx_external_scanner_serialize : 141
updateGOT: after: tree_sitter_tsx_external_scanner_serialize : 141 (function 6() { [native code] })
updateGOT: before: tree_sitter_tsx_external_scanner_deserialize : 0
updateGOT: FUNC: tree_sitter_tsx_external_scanner_deserialize : 142
updateGOT: after: tree_sitter_tsx_external_scanner_deserialize : 142 (function 7() { [native code] })
updateGOT: before: tree_sitter_tsx_external_scanner_scan : 0
updateGOT: FUNC: tree_sitter_tsx_external_scanner_scan : 143
updateGOT: after: tree_sitter_tsx_external_scanner_scan : 143 (function 8() { [native code] })
updateGOT: before: tree_sitter_tsx : 0
updateGOT: FUNC: tree_sitter_tsx : 144
updateGOT: after: tree_sitter_tsx : 144 (function 9() { [native code] })
done updateGOT
reportUndefinedSymbols
done reportUndefinedSymbols
applyRelocs
```

Given that this hinted at a change in the loader, which for Node.js is performed by V8, I next rewrote the test case so that it can be loaded in d8. Since Emscripten supports d8 as a runtime environment, not much change was needed to get this working, which quickly helped confirm the issue as well as bisect the commit that addressed it:

```diff
diff --git a/lib/binding_web/binding.js b/lib/binding_web/binding.js
index 5352cb18..5ef40a26 100644
--- a/lib/binding_web/binding.js
+++ b/lib/binding_web/binding.js
@@ -888,16 +888,7 @@ class Language {
         const fs = require('fs');
         bytes = Promise.resolve(fs.readFileSync(url));
       } else {
-        bytes = fetch(url)
-          .then(response => response.arrayBuffer()
-            .then(buffer => {
-              if (response.ok) {
-                return new Uint8Array(buffer);
-              } else {
-                const body = new TextDecoder('utf-8').decode(buffer);
-                throw new Error(`Language.load failed with status ${response.status}.\n\n${body}`)
-              }
-            }));
+        bytes = new Uint8Array(readbuffer(input));
       }
     }
@@ -907,8 +898,7 @@ class Language {
         ? loadSideModule
         : loadWebAssemblyModule;
-    return bytes
-      .then(bytes => loadModule(bytes, {loadAsync: true}))
+    return loadModule(bytes, {loadAsync: true})
       .then(mod => {
         const symbolNames = Object.keys(mod)
         const functionName = symbolNames.find(key =>
```

Given that the issue is addressed with newer versions of the CLI, no action is needed on this module at this time.
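The difference between the two DYLINK_DEBUG logs above can also be checked mechanically. A rough sketch, assuming the `DYLINK_DEBUG=2` message format shown in those logs: it reports the grammars whose `new GOT entry` lines appear together before any `done reportUndefinedSymbols`, which is the pattern of the failing run. `findInterleavedGrammars` is a hypothetical helper name.

```javascript
// Rough sketch: scans DYLINK_DEBUG=2 output (format as in the logs above)
// and returns the grammars whose GOT entries were created while another
// grammar's module had not yet completed `reportUndefinedSymbols`.
function findInterleavedGrammars(logText) {
  const event = /new GOT entry: (\S+)|done reportUndefinedSymbols/g;
  // e.g. tree_sitter_tsx_external_scanner_create -> "tsx"
  const scannerSymbol = /^tree_sitter_(\w+?)_external_scanner_/;
  let open = new Set();
  for (const match of logText.matchAll(event)) {
    if (match[1] === undefined) {
      // A reportUndefinedSymbols pass completed; start a fresh window.
      open = new Set();
      continue;
    }
    const grammar = scannerSymbol.exec(match[1]);
    if (!grammar) continue;
    open.add(grammar[1]);
    if (open.size > 1) return [...open];
  }
  return [];
}
```

For example, `findInterleavedGrammars(require("fs").readFileSync("dylink.log", "utf8"))`, where `dylink.log` is a placeholder path, returns an empty array for the working log and `["typescript", "tsx"]` for the failing one.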
👋 I'm coming from tree-sitter/tree-sitter-typescript#244 (comment), which was originally called out as a node-tree-sitter issue... until I remembered that I don't use `node-tree-sitter` but rather `web-tree-sitter` in Node.js (because it's easier to distribute in a CLI). This seems to be the repo for `web-tree-sitter`, so I am opening a similar issue here in case it also needs a fix.

I wanted to add another detail: you can easily reproduce this issue by running … using Node.js 19+ as well:
@deepak1556 gave some advice:
Also going to tag @maxbrunsfeld and @verhovsky here since they have been really helpful over in the node-tree-sitter repo and might know what's required here 🙏