[Feature] Reflink NM linker #6726

Open
@goloveychuk

Description

  • I'd be willing to implement this feature (contributing guide)
  • This feature is important to have in this repository; a contrib plugin wouldn't do

Describe the user story

The node_modules linker is slow.

Describe the solution you'd like

I found the orogene package manager, which uses an interesting technique:
https://github.com/orogene/orogene/blob/2dc8d9e9d32b9dcc8e8a33e8a729c2c08772c33f/crates/nassun/src/tarball.rs#L443

When it unpacks a tarball, it first stores the files in a cache dir, which is treated as immutable.
The files are then "cloned" into the project's real node_modules dir via "reflinks", which work on COW file systems (including APFS).
TL;DR: the idea is to create a new reference to the existing blocks instead of writing the data again.
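
To make the cache-then-clone idea concrete, here is a minimal single-file sketch using Node's built-in `fs.copyFileSync` with `COPYFILE_FICLONE`. The file names and the temp directory are made up for illustration, and this is not how orogene implements it. `COPYFILE_FICLONE` requests a copy-on-write clone and falls back to a regular copy when the file system cannot reflink, so the snippet runs anywhere; either way, writing to the clone leaves the cached original untouched.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical layout: one "cached" file and one "project" file in a temp dir.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'reflink-demo-'));
const cached = path.join(root, 'cache-index.js');
const linked = path.join(root, 'project-index.js');

fs.writeFileSync(cached, 'module.exports = 1;\n');

// Ask for a copy-on-write clone; Node falls back to a plain copy
// if the underlying file system does not support reflinks.
fs.copyFileSync(cached, linked, fs.constants.COPYFILE_FICLONE);

// Modify the clone only. The cached original must stay as it was.
fs.appendFileSync(linked, '// patched locally\n');

const cachedAfter = fs.readFileSync(cached, 'utf8');
const linkedAfter = fs.readFileSync(linked, 'utf8');
console.log('cache untouched:', cachedAfter === 'module.exports = 1;\n');
```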

I ran some experiments on macOS (M1 Mac) with a 5 GB node_modules dir:
  • cp -r took 5 minutes
  • a Go app that makes the clonefile syscall took 17 s

code:

package main

import (
	"log"

	"golang.org/x/sys/unix"
)

func main() {
	src := "/Users/vadym/github/rpcpoc/node_modules"
	dst := "/Users/vadym/github/rpcpoc/node_modules6"
	// A single clonefile call on the directory (macOS only); flags = 0.
	if err := unix.Clonefile(src, dst, 0); err != nil {
		log.Fatalf("Failed to clone %s to %s: %v", src, dst, err)
	}
}

Benefits compared to the existing NM linker:

  1. Much faster (when the cache already exists)
  2. Lower disk usage

Describe the drawbacks of your solution

  1. Requires N-API or another native helper that can make the syscall.
    Node exposes fs.copyFile, which can request a reflink, but it does not work for directories:
const fs = require('fs');

fs.copyFile('/Users/vadym/github/rpcpoc/node_modules', '/Users/vadym/github/rpcpoc/node_modules10', fs.constants.COPYFILE_FICLONE_FORCE, console.log)

[Error: ENOSYS: function not implemented, copyfile '

  2. Does not speed up the first unpack, only incremental installs (when the cache already exists on disk). It does help with duplicated packages, though (where hoisting has not already removed them).
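
On the first drawback: since `fs.copyFile` rejects directories, one option that needs no native helper is to walk the tree and clone file by file. Below is a rough sketch under that assumption. `cloneTree` is a hypothetical helper, not an existing Yarn or Node API, and the temp directory layout is made up. It uses `COPYFILE_FICLONE`, which falls back to a regular copy when reflinks are unavailable; `COPYFILE_FICLONE_FORCE` would fail instead.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: recreate srcDir under dstDir, cloning each regular file.
// Returns the number of regular files cloned.
function cloneTree(srcDir, dstDir) {
  fs.mkdirSync(dstDir, { recursive: true });
  let files = 0;
  for (const entry of fs.readdirSync(srcDir, { withFileTypes: true })) {
    const src = path.join(srcDir, entry.name);
    const dst = path.join(dstDir, entry.name);
    if (entry.isDirectory()) {
      files += cloneTree(src, dst);
    } else if (entry.isSymbolicLink()) {
      // Recreate the link itself (e.g. entries in .bin) instead of following it.
      fs.symlinkSync(fs.readlinkSync(src), dst);
    } else if (entry.isFile()) {
      // Reflink when the file system supports it, otherwise a plain copy.
      fs.copyFileSync(src, dst, fs.constants.COPYFILE_FICLONE);
      files += 1;
    }
  }
  return files;
}

// Usage on a throwaway tree standing in for a cached package.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'clone-tree-'));
const cache = path.join(root, 'cache', 'pkg');
fs.mkdirSync(path.join(cache, 'lib'), { recursive: true });
fs.writeFileSync(path.join(cache, 'package.json'), '{"name":"pkg"}');
fs.writeFileSync(path.join(cache, 'lib', 'index.js'), 'module.exports = 42;\n');

const target = path.join(root, 'node_modules', 'pkg');
const clonedCount = cloneTree(cache, target);
console.log(`cloned ${clonedCount} files`); // prints "cloned 2 files"
```

This issues one call per file, so it is unlikely to match a single directory-level clonefile call; it has not been benchmarked here.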

Describe alternatives you've considered

FUSE.
Linux support is great, and macOS is making FSKit public in 15.4.
This is an alternative track, which dramatically increases speed and improves disk usage.
