Description
- I'd be willing to implement this feature (contributing guide)
- This feature is important to have in this repository; a contrib plugin wouldn't do
Describe the user story
The node_modules linker is slow.
Describe the solution you'd like
I came across the orogene package manager, which uses an interesting technique:
https://github.com/orogene/orogene/blob/2dc8d9e9d32b9dcc8e8a33e8a729c2c08772c33f/crates/nassun/src/tarball.rs#L443
When unpacking a tarball, it first stores the files in a cache directory, which is treated as immutable.
The files are then "cloned" into the project's real node_modules directory via reflinks, which work on copy-on-write (COW) file systems, including APFS.
TL;DR: a reflink creates a new reference to the existing data blocks instead of writing the data again.
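To make the cache-then-clone flow concrete, here is a minimal sketch in Node. It is not how orogene or Yarn does it: `cloneFromCache`, `demo`, and the directory layout are hypothetical names made up for illustration. It relies only on `fs.copyFileSync` with `fs.constants.COPYFILE_FICLONE`, which requests a copy-on-write clone and falls back to a regular copy when the file system cannot reflink.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: materialize one cached file into a project directory.
// COPYFILE_FICLONE asks for a copy-on-write clone and falls back to a
// regular copy when the file system does not support reflinks.
function cloneFromCache(cachedFile, targetFile) {
  fs.mkdirSync(path.dirname(targetFile), { recursive: true });
  fs.copyFileSync(cachedFile, targetFile, fs.constants.COPYFILE_FICLONE);
}

// Demo in a temporary directory; the layout below is illustrative only.
function demo() {
  const root = fs.mkdtempSync(path.join(os.tmpdir(), 'reflink-demo-'));
  const cached = path.join(root, 'cache', 'left-pad', 'index.js');
  const installed = path.join(root, 'project', 'node_modules', 'left-pad', 'index.js');

  // Step 1: the "unpacked tarball" lands in the cache.
  fs.mkdirSync(path.dirname(cached), { recursive: true });
  fs.writeFileSync(cached, 'module.exports = 1;\n');

  // Step 2: the cached file is cloned into node_modules.
  cloneFromCache(cached, installed);

  // Writing to the clone must not change the cached original.
  fs.writeFileSync(installed, 'module.exports = 2;\n');

  return {
    cached: fs.readFileSync(cached, 'utf8'),
    installed: fs.readFileSync(installed, 'utf8'),
  };
}
```

Either way (reflink or fallback copy), the installed file is independent of the cached one, so the cache can stay immutable; the difference on a COW file system is that no data blocks are duplicated until one side is modified.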
I ran some experiments on macOS (M1 Mac) with a 5 GB node_modules directory:
- cp -r took 5 minutes
- a Go app that makes the clonefile syscall took 17 s
Code:
package main

import (
	"log"

	"golang.org/x/sys/unix"
)

func main() {
	dir := "/Users/vadym/github/rpcpoc/node_modules"
	// Clone the whole directory with a single clonefile call (macOS only).
	err := unix.Clonefile(dir, "/Users/vadym/github/rpcpoc/node_modules6", 0)
	if err != nil {
		log.Fatalf("Failed to clone file: %v, path: %s", err, dir)
	}
}
Benefits compared to the existing NM linker:
- much faster (if the cache exists)
- lower disk space usage
Describe the drawbacks of your solution
- Requires N-API or another native helper that can make the syscall.
Node exposes copyFile, but it does not work for directories:
const fs = require('fs');
fs.copyFile('/Users/vadym/github/rpcpoc/node_modules', '/Users/vadym/github/rpcpoc/node_modules10', fs.constants.COPYFILE_FICLONE_FORCE, console.log)
[Error: ENOSYS: function not implemented, copyfile '
- Does not speed up the first unarchiving, only subsequent installs (when the cache already exists on disk). It does help with duplicates, though (where hoisting has not removed them).
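As a possible middle ground for the first drawback, the directory can be walked in plain Node and each regular file cloned individually, since `fs.copyFileSync` does accept files. This is only a sketch: `cloneTree` is a hypothetical helper, not an existing Yarn or Node API, and because it issues one call per file it is likely slower than a single directory-level clonefile (not measured here).

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: recreate the directory tree and clone files one by one,
// because copyFile only accepts regular files, not directories.
// Returns the number of regular files cloned.
function cloneTree(srcDir, destDir, mode = fs.constants.COPYFILE_FICLONE) {
  fs.mkdirSync(destDir, { recursive: true });
  let count = 0;
  for (const entry of fs.readdirSync(srcDir, { withFileTypes: true })) {
    const src = path.join(srcDir, entry.name);
    const dest = path.join(destDir, entry.name);
    if (entry.isDirectory()) {
      count += cloneTree(src, dest, mode);
    } else if (entry.isSymbolicLink()) {
      // Recreate links (e.g. node_modules/.bin entries) instead of following them.
      fs.symlinkSync(fs.readlinkSync(src), dest);
    } else if (entry.isFile()) {
      // Reflink when supported; with the default mode this falls back to a copy.
      fs.copyFileSync(src, dest, mode);
      count += 1;
    }
  }
  return count;
}
```

Passing `fs.constants.COPYFILE_FICLONE_FORCE` as `mode` makes the call fail instead of falling back to a regular copy, which is useful for checking whether the file system actually supports reflinks.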
Describe alternatives you've considered
FUSE.
Linux support is great, and macOS is making FSKit public in 15.4.
This is an alternative track that would dramatically increase speed and reduce disk usage.