Skip to content

Change internals to use ESM #58523

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
james-pre opened this issue May 31, 2025 · 9 comments
Open

Change internals to use ESM #58523

james-pre opened this issue May 31, 2025 · 9 comments

Comments

@james-pre
Copy link

ESM is the way to handle cross-file relationships. Plus, it's built into the language.

Node.js' internals still use require, meaning new contributors need to learn about CJS and all of the changes it brings.

Moving from CJS to ESM internally comes with numerous benefits:

  • The syntax is immediately recognizable to new contributors
  • JS boilerplate is eliminated, increasing performance
  • Module-only features can be used (e.g. top-level await)

For lib/internal, import specifiers are open to bikeshedding but I think either using node:internal/... or internal: would work nicely.

This is a significant amount of work, and I'd appreciate some feedback before I pour dozens of hours into it.

Also, I imagine that Node.js would move away from CJS completely in the future (though this is just my opinion).

@jasnell
Copy link
Member

jasnell commented May 31, 2025

Unfortunately it's generally not possible for us to move the internals to ESM without extremely careful consideration and significant risk. Fundamentally it would be a massively breaking change for a great deal of the ecosystem I wouldn't do any work on this until it is absolutely certain that we had a broad consensus that this is the direction we want to go and we'd likely need to do it all at once -- it's not likely to be something we could do incrementally without.... headaches. Even the new require(esm) support is not quite enough.

To illustrate the kinds of issues, consider the following bit of code:

queueMicrotask(() => console.log('A'));
process.nextTick(() => console.log('B'));

When interpreted as commonjs, the output of this code is to print B first and then A. When intrepreted as ESM, the output of this code is A, then B. While that may seem fairly innocent, keep in mind that both queueMicrotask and process.nextTick are used extensively throughout the internals. Switching the internals to ESM would likely mean a great many behaviors that users have come to assume (such as the ordering of certain kinds of events) would likely change is subtly breaking ways that are likely quite difficult to anticipate.

Also keep in mind that ESM support is still lacking critical observability support. With loader hooks and other ongoing efforts this is improving but there are still a range of issues to work through. Moving Node.js internals to ESM would necessarily make it much more difficult for users to observe and debug what their application is doing using common APM tools.

Third, it is still quite common for modules in the ecosystem to monkeypatch Node.js internals. A move to ESM would necessarily break those modules. While such monkeypatching is terrible there's a nontrivial portion of the ecosystem that depends on it.

This is all to say that sure, it's conceivable that we would do this at some point but it's not a simple, quick decision that we just randomly decide to do someday. It would likely need to be a carefully coordinated, carefully executed strategic initiative stretched out over several major releases.

@joyeecheung
Copy link
Member

joyeecheung commented May 31, 2025

It’s worth noting that Node.js internals do not actually use CJS, they use a special loader isolated from the user land CJS loader, and have internal-specific context globals like internalBinding. Only the require and module.exports might make them look like CJS, but they are not CJS - they cannot load code directly from user land - no matter CJS or ESM, for example. And not all internals actually use require to load things or exports to export things - some of the bootstrap-related internals just pass things around through context globals like primordials, and ESM as a format does not currently support context globals. To support internal ESM doesn’t mean we can switch from a CJS loader to an existing ESM loader, rather, it means a new internal loader that compiles code as modules instead of script and has special linking/loading rules must be written from scratch, I am not quite sure if it can actually be implemented to fully support internals yet.

The internals also use lazy loading and lazy exporting very extensively, and some internals need to patch other undeclared internals dynamically (e.g. when experimental flags are passed), all of which isn’t well supported by ESM or just not possible, since ESM is designed to be quite static about import/exports while Node.js internals are highly dynamic, much more so than average user code. I think internal ESM might still be possible but only in some of the leaf modules, and likely cannot replace all internal module formats due to the dynamism required to make internals performant/communicate.

@ljharb
Copy link
Member

ljharb commented May 31, 2025

ESM is the way to handle cross-file relationships

Also, this is incorrect.

@james-pre
Copy link
Author

james-pre commented Jun 1, 2025

I really appreciate all of the feedback! While I have read some of the Node.js source, I don't have anything close to a comprehensive understanding of the codebase.

I absolutely agree with @jasnell that this shouldn't be rushed, we need to make sure any changes are done right.

Before moving on to more general discussion, I'd like to address a few specific points:

process.nextTick vs queueMicrotask

These broadly work the same: some function is executed asynchronously and non-blocking as soon as possible.

Find and replace along with moving the needed calls around should result with the same ordering of operations. This could expose race conditions though, which I think would be a good thing since they can then be fixed.

Monkey-patching

While I personally disagree with the entire concept of monkey-patching, it is undeniably used. In the best interest of security, it would make sense to disallow monkey patching in a breaking update, however this will likely not be done.

Dynamic imports

As mentioned, static imports would present significant problems. Fortunately, import() is a solid, standard API that can be used for dynamic imports. If synchronous importing is needed, something like require could be used (it could just be renamed to importSync or something).

With my specific thoughts out of the way, I'd like to move on to some broader ideas.

First, I think it's important to differentiate between lib and lib/internal. For clarification, I'm only going to refer to lib/internal as internal.

These two sides of Node.js create a architectural boundary. I'm not very experienced with how these two interact, beyond require('internal/...'), so feel free to provide context and clarification.

To start, I think we should focus on the non-internal lib code. The internals can always be looked at later, but as @joyeecheung mentioned, the code is completely separate.

When looking at lib, a question keeps popping up for me: how separable is lib from the rest of the codebase?

I think part of this issue is my goal of being able to reuse lib on other hosts with less effort. For example, queueMicrotask is widely available while process.nextTick is exclusive to Node.js and not having to emulate a bunch of Node.js exclusive internals would save a ton of time. This would allow libraries to easily emulate Node.js APIs on other hosts. I created and maintain @zenfs/core, which aims to emulate the entire Node.js file system API. readable-stream has over 160 million downloads every week. Packages emulating node:events have at least 100 million weekly downloads. Clearly, there is significant usage of (ported) Node.js APIs on other hosts.

Looking broader than just using standardized modules, using standardized ECMAScript would offer the same benefits I mentioned initially on a larger scale.

So, I have another question: What parts of lib do you think could be improved or changed with built-in language features?

I hope to create some positive and meaningful discussion.

@joyeecheung
Copy link
Member

joyeecheung commented Jun 1, 2025

First, I think it's important to differentiate between lib and lib/internal. For clarification, I'm only going to refer to lib/internal as internal.

I think that might be a misconception that we should document better....there is essentially no difference between how lib and lib/internal are loaded in core, the only difference is whether the final exports of them are exposed directly to users, and that's also something we can control programmatically. Code can be easily moved between lib and lib/internal and so long as they are not explicitly exported to users, they are essentially not separate. The boundary of module loading lies between these are native and internal JS (both lib and lib/internal), and internal JS and user-land JS. Between lib and lib/internal there are no meaningful boundaries.

When looking at lib, a question keeps popping up for me: how separable is lib from the rest of the codebase?

I am not sure what's "the rest of the codebase" referring to - but I assume it's talking about test or benchmark that are run in user land. The separation does not come from JS format, but from how they are executed and what gets exposed to them. Both code from lib and lib/internal have access to internal bindings from the native layer, they can load other internal JS modules, and they have certain context globals injected from the native layer that should not be visible to users. Their source code come from memory, loaded from within the binary that's embedded into the executable during the build process, not from disk, and the source code cannot be mutated or leaked into the user land. In core we call both code from lib and lib/internal internals due to these special properties implemented for better performance and encapsulation. These properties need to be re-implemented for ESM if we want to support ESM as a format for internal modules. The user land ESM loader does not support any of these and cannot be reused.

Fortunately, import() is a solid, standard API that can be used for dynamic imports. If synchronous importing is needed, something like require could be used (it could just be renamed to importSync or something).

With how the internals generally do lazy loading especially for circular dependencies, I don't think import helps and it generally must be synchronous. For example patterns like this can be seen throughout the code base - lazy loading another dependency when a synchronous function/getter is invoked for the first time.

case 'TTY': {
const tty = require('tty');
stream = new tty.WriteStream(fd);
stream._type = 'tty';
break;
}
case 'FILE': {
const SyncWriteStream = require('internal/fs/sync_write_stream');
stream = new SyncWriteStream(fd, { autoClose: false });
stream._type = 'fs';
break;
}

And, again, import or require(esm)/importSync would not be available out of the box because internals have a completely different module loader implementation, they cannot reuse the implementation of the user-land import() or require(esm) - I think there might be a misconception this is an API implemented by V8 so that we can get it out of the box? But it is not, V8 only parses the syntax, the heavy lifting of ESM loading is implemented by Node.js via V8 callbacks, and to use ESM internally, it's Node.js that needs to implement a new ESM loader that supports the internal module properties, it can not reuse the user-land ESM loader, as explained before.

I think part of this issue is my goal of being able to reuse lib on other hosts with less effort.

I think that can be offered whenever we can, but should not be prioritized over compatibility, performance and stability of core. Generally we either vendor-out (like readable-stream) or vendor-in dependencies (like undici) to solve it, and I think vendoring-in is the right direction as it allows easier semver management for the dependencies. If one day we need to vendor-in an ESM dependency it would be the time to eventually support ESM in core, but as I mentioned in the earlier comments, this would likely only be done for some leaf modules (vendor-in dependencies generally are). Trying to vendor-out code from Node.js, in general, brings us more issues than it solves especially if the code still needs to be actively maintained or optimized, or needs access to other internals. I think vendoring out modules that are somewhat self-contained and on the leaf side of the graph, like readable streams, are already hard enough, vendoring out the entire JS surface would be even more challenging, not to mention we don't necessarily always keep them implemented in JS and we move things into native for better performance and encapsulation all the time, then there's no reusability for hosts for them. I think splitting out self-contained utilities to be their own packages, refactoring them as ESM, and then vendoring them in would be a lot more feasible than trying to convert the entire internal JS part of Node.js to be ESM. Though I think spliting existing things out would need an internal ESM loader implementation already in place as guarantee or otherwise we'd never know whether we can load them back in.

What parts of lib do you think could be improved or changed with built-in language features?

New language features can be integrated to add functionality or syntax sugar can be used, of course, but in terms of changing the writing style of internals per-se for contributors, not for user-visible functionalities, I don't think I am seeing new stage 4 language features that would affect how most internals are written significantly - at most they affect a certain use case, but not on the codebase holistically - e.g. during the TC39 meetings this week, AsynContext and inspector championed by Node.js collaborators are all dedicated to a specific use case and can be incorporated when they lands, but they would not affect most of the code base that are unrelated to the features being proposed.

There were once discussions about something like https://github.com/tc39/proposal-symbol-proto to reduce the churns of primordials but I don't think it's being actively pursued in TC39 any more.

@james-pre
Copy link
Author

james-pre commented Jun 1, 2025

Hey @joyeecheung, thank you again for the extensive information.

I am not sure what's "the rest of the codebase"

I intended to mean lib/internal, src, etc. Sorry for the confusion.

I was using import() because it is familiar syntax. Just like how the internal require is different, import() could work differently as well. I think the import/export logic currently used is unlikely to change though.

New language features ...

My question here also involved new APIs that are a part of the language. I was going to use process.nextTick -> queueMicrotask as a good example until I found it is actually part of the HTML spec.

I think splitting out self-contained utilities, refactoring them as ESM, and then vendoring them in would be a lot more feasible than trying to convert the entire internal JS part of Node.js to be ESM.

I agree.

I understand that this is off-topic from the original issue, an unfortunate XY problem. My use case is using Node.js' implementation of fs instead of rewriting the entire thing, which doesn't necessarily require ESM.

It could be possible to split out parts of lib (maybe including lib/internal) into packages and then use them. Importantly, the source of these packages would not be changed.

Since many internal utilities used are exposed globally, requireing the externalized packages would immediately just work.

Further, it isn't necessary to have separate repositories. Separate packages can be published with package.jsons in lib, or even package.jsons generated at build time.

What I imagine is that third parties could use Node.js' lib outside of Node.js. For example:

function internalBinding(path) {
    // emulation
}

// etc.

Object.assign(globalThis, { internalBinding });

// defines `require`
await import('@node/internal/...');

const fs = await import('@node/fs');

// maybe not? 
delete globalThis.internalBinding;

fs.readdirSync(`/`);

The end goal is being able to run code using Node.js APIs on other hosts, provided the necessary emulation for the native side exists.

Please let me know your thoughts. I'd love to see lib published separately in order to use it.

@joyeecheung
Copy link
Member

joyeecheung commented Jun 1, 2025

My use case is using Node.js' implementation of fs instead of rewriting the entire thing, which doesn't necessarily require ESM.

I think if we are heading towards that direction, fs in particular should probably be excluded explicitly - there are still plans (see the commits by @anonrig) to move more of fs into native altogether and just not put a JS layer in between at all. Although fs streams are a different matter. This kind of splitting should probably be limited to modules that almost need no access to the native layer, and fs is specifically just glue between the native layer and user code, and the JS glue can be eliminated at any time for better performance.

@james-pre
Copy link
Author

I think if we are heading towards that direction, fs in particular should probably be excluded explicitly - there are still plans (...) to move more of fs into native altogether and just not put a JS layer in between at all.

I think I saw something about this before.

This kind of splitting should probably be limited to modules that almost need no access to the native layer

Do you have any candidates in mind? I could take a look at extracting some and seeing if they can be vendored in easily.

Also, what about publishing the unmodified lib so it can be used/ported to other hosts with less effort? While it is possible to use this repo as a git submodule, that brings all the native dependencies. Of course I'm okay with doing that if publishing lib isn't feasible.

Thanks again for being so welcoming to contributions and new ideas.

@joyeecheung
Copy link
Member

joyeecheung commented Jun 2, 2025

Do you have any candidates in mind? I could take a look at extracting some and seeing if they can be vendored in easily.

I think the test runner would probably be a good candidate or I wish it was developed like undici and vendored-in, so that it can be shared by other hosts and its users do not have to lock themselves in a test runner that can only reliably run in Node.js. We also had some breakages from the test runner lately that probably would've been managed better if it had separate semver or there is a way for people to roll back to an older version from the registry.

Some relatively new Web API implementation might also be good candidates, like AbortController and EventTarget. I think EventEmitter might also be self-contained enough, but then given how much it's used pretty much everywhere since forever and its internals have been leaked into the user land and heavily depended upon, separating it out might risk unintentional breakage so I wouldn't be too comfortable with changing it too much. The same probably also goes to other very old APIs (say, predating Node.js 4 and having pre-ES6 code - meaning internals leaked everywhere due to Hyrum's Law).

In any case if they are to be converted to ESM I think actually developing an internal ESM loader implementation should be a pre-requisite, or else we cannot load them back.

Also, what about publishing the unmodified lib so it can be used/ported to other hosts with less effort? While it is possible to use this repo as a git submodule, that brings all the native dependencies.

I think we could publish them as part of the release process? For every release we already publish tarballs with all the source code needed to build node, and tarballs for all the headers. Shouldn't be too hard to publish another one that only have files that would be processed by js2c (note that lib alone is not all implementation JS we have, since there are also JS dependencies from deps, like undici). Though that would need input from @nodejs/release

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants
@ljharb @jasnell @joyeecheung @james-pre and others