[Feature] Reflink NM linker #6726

Open
1 of 2 tasks
goloveychuk opened this issue Mar 15, 2025 · 1 comment
Labels
enhancement New feature or request

Comments

@goloveychuk
Contributor

goloveychuk commented Mar 15, 2025

  • I'd be willing to implement this feature (contributing guide)
  • This feature is important to have in this repository; a contrib plugin wouldn't do

Describe the user story

The node_modules linker is slow.

Describe the solution you'd like

I found the orogene package manager, which uses an interesting technique:
https://github.com/orogene/orogene/blob/2dc8d9e9d32b9dcc8e8a33e8a729c2c08772c33f/crates/nassun/src/tarball.rs#L443

When it unpacks a tarball, it first stores the files in a cache dir, which is treated as immutable.
The files are then "cloned" into the project's real node_modules dir via "reflinks", which work on copy-on-write (COW) file systems (including APFS).
TL;DR: a reflink creates a new reference to the existing blocks instead of writing the data again.
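To make the idea concrete, here is a minimal sketch of the cache-then-clone step for a single file, using only Node's built-in `fs.copyFileSync` with `COPYFILE_FICLONE`. The helper name `linkFromCache` and all paths are hypothetical and chosen for illustration; this is not how orogene or the existing nm linker is implemented.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: materialize one cached file into a project directory.
// COPYFILE_FICLONE requests a copy-on-write reflink and falls back to a
// regular copy when the file system does not support reflinks.
function linkFromCache(cachedFile, targetFile) {
  fs.mkdirSync(path.dirname(targetFile), { recursive: true });
  fs.copyFileSync(cachedFile, targetFile, fs.constants.COPYFILE_FICLONE);
}

// Demo in a throwaway temp directory (all paths are illustrative only).
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'reflink-demo-'));
const cached = path.join(root, 'cache', 'pkg', 'index.js');
const target = path.join(root, 'project', 'node_modules', 'pkg', 'index.js');

fs.mkdirSync(path.dirname(cached), { recursive: true });
fs.writeFileSync(cached, 'module.exports = 42;\n');

linkFromCache(cached, target);
console.log(fs.readFileSync(target, 'utf8'));
```

On a COW file system the target should share blocks with the cached file until one of them is modified; elsewhere the same call simply copies the data.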

I ran an experiment on macOS (M1 Mac) with a 5 GB node_modules dir:

  • cp -r took 5 minutes
  • a Go app that makes the clonefile syscall took 17 s

code:

package main

import (
	"log"

	"golang.org/x/sys/unix"
)

func main() {
	src := "/Users/vadym/github/rpcpoc/node_modules"
	dst := "/Users/vadym/github/rpcpoc/node_modules6"
	// clonefile(2) is macOS-only; here it clones the whole directory in one call.
	if err := unix.Clonefile(src, dst, 0); err != nil {
		log.Fatalf("Failed to clone %s to %s: %v", src, dst, err)
	}
}

Benefits compared to the existing NM linker:

  1. much faster (if the cache exists)
  2. lower disk space usage

Describe the drawbacks of your solution

  1. Requires N-API or another native helper that can make syscalls.
    Node exposes fs.copyFile, which can request a reflink, but it does not work for dirs:
const fs = require('fs');

fs.copyFile('/Users/vadym/github/rpcpoc/node_modules', '/Users/vadym/github/rpcpoc/node_modules10', fs.constants.COPYFILE_FICLONE_FORCE, console.log)

[Error: ENOSYS: function not implemented, copyfile '

  2. Does not speed up the first unarchiving, only incremental installs (when the cache already exists on the fs). It does help with duplicates, though (where hoisting has not removed them).
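Regarding the first drawback, one way to stay in plain Node is to walk the tree and request a reflink per file. The sketch below assumes that approach; `cloneTree` is a hypothetical helper, not an existing Yarn or Node API. It is likely slower than a single clonefile call on the directory, since it issues one call per entry, and it has not been benchmarked here.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: recreate the directory tree and reflink each file.
function cloneTree(srcDir, dstDir) {
  fs.mkdirSync(dstDir, { recursive: true });
  for (const entry of fs.readdirSync(srcDir, { withFileTypes: true })) {
    const src = path.join(srcDir, entry.name);
    const dst = path.join(dstDir, entry.name);
    if (entry.isDirectory()) {
      cloneTree(src, dst);
    } else if (entry.isSymbolicLink()) {
      // Recreate symlinks instead of following them.
      fs.symlinkSync(fs.readlinkSync(src), dst);
    } else {
      // Reflink when supported, regular copy otherwise.
      fs.copyFileSync(src, dst, fs.constants.COPYFILE_FICLONE);
    }
  }
}

// Demo in a throwaway temp directory (all paths are illustrative only).
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'clone-tree-'));
const srcRoot = path.join(root, 'node_modules');
const dstRoot = path.join(root, 'node_modules_clone');

fs.mkdirSync(path.join(srcRoot, 'pkg', 'lib'), { recursive: true });
fs.writeFileSync(path.join(srcRoot, 'pkg', 'package.json'), '{"name":"pkg"}');
fs.writeFileSync(path.join(srcRoot, 'pkg', 'lib', 'index.js'), 'exports.ok = true;\n');

cloneTree(srcRoot, dstRoot);
console.log(fs.readdirSync(path.join(dstRoot, 'pkg')).sort());
```

Swapping `COPYFILE_FICLONE` for `COPYFILE_FICLONE_FORCE` would make the call fail instead of falling back to a regular copy, which is useful for checking whether reflinks are really being used on a given file system.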

Describe alternatives you've considered

FUSE.
Linux support is great, and macOS is making FSKit public in 15.4.
This is an alternative track that dramatically increases speed and improves disk usage.

@goloveychuk goloveychuk added the enhancement New feature or request label Mar 15, 2025
@goloveychuk
Contributor Author

Tagging @arcanis because this is a design decision. I can write a PoC linker and run benchmarks.
It could be an external plugin, but it would share a lot of logic with the existing nm linker, which would need to be abstracted and exported in that case.
A native helper could also be used to accelerate other things, e.g. downloads and tar.gz -> zip transforms.
Thanks.
