Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] dedupe hoister levelQueue items for performance gains #6083

Merged
merged 3 commits into from
Jul 19, 2018
Merged

[WIP] dedupe hoister levelQueue items for performance gains #6083

merged 3 commits into from
Jul 19, 2018

Conversation

spudly
Copy link
Contributor

@spudly spudly commented Jul 10, 2018

Summary

I'm submitting this PR early so I can get as much feedback as possible. I am not at all familiar with the yarn codebase, so I'm not sure if I'm even making this change in the right place.

I have a rather large monorepo (~150 packages) and when running yarn install from yarn>=1.4.0, it just hangs during the "Linking Dependencies" step. For example, today I let it run for 3.5 hours before I finally gave in and killed it. For the past several months I've just used v1.3.2 as a temporary workaround. Today I spent several hours in the yarn code adding console.log statements all over the place and I think I've determined that this.levelQueue is growing exponentially at package-hoister.js:247. The same dependencies are being added to the levelQueue hundreds of times (if not thousands). My levelQueue had nearly three million items in it, and it takes a long time to iterate over all those dependencies, which made it look like yarn had just crashed.

I added a few lines of code to de-dupe the items being pushed onto the queue and that seems to solve my issue.

Because I'm not familiar with the codebase, I don't know what unintended effect this might have, but it seemed to solve all my issues and the node modules appear to have installed correctly (and rather quickly too!).

Please let me know if you have any concerns. I'll gladly finish out this pull request with tests, etc. but I want to make sure I'm going in the right direction first.

Test plan

I'll write some unit tests for this if it is an acceptable change.

@arcanis
Copy link
Member

arcanis commented Jul 12, 2018

That's weird, it doesn't really make sense to me - I don't see how ref.dependencies can contain multiple times the same dependency o_o Are you sure it fixes your issue?

@@ -243,8 +243,12 @@ export default class PackageHoister {
this.taintKey(key, info);

//
const pushed = {};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use a Set here

@spudly
Copy link
Contributor Author

spudly commented Jul 12, 2018

It does indeed fix my issue. Here is the output of the first few seconds of a yarn install (without my fix and with a console.log statement showing what depPattern values gets pushed): output.txt

As you can see, there are tons of dupes in there. This is only the first few seconds of output and it's already up to 210,403 items pushed. When I filter out duplicates, this drops to 1680.

@arcanis
Copy link
Member

arcanis commented Jul 12, 2018

Impressive, then. Can you share the dependencies in your package.json, so that I can reproduce the issue locally? It will help me figure out if we can easily add a test against this.

@spudly
Copy link
Contributor Author

spudly commented Jul 12, 2018

That's a little hard to do because I have a private monorepo with ~150 packages. That's a lot of dependencies. I'm going to spend some time trying to reduce it down to something smaller that I can share.

Stay tuned.

@spudly
Copy link
Contributor Author

spudly commented Jul 12, 2018

I think this issue is related to having multiple packages depending on the same 3rd party library. It seems to reprocess the 3rd party lib each time I depend on it, and appends it's dependencies to the same shared array each time (hence the dupes).

Based on that, here's an alternative solution to fix this issue. The dependencies array is defined in package-reference.js. This should probably be a Set instead of an array, so that we can ensure it has no dupes.

I proved this out by changing Line 83 of package-reference.js, from this:

this.dependencies = this.dependencies.concat(deps);

to this:

this.dependencies = [...new Set([...this.dependencies, ...deps])];

This fixes my issue by ensuring that when dependencies are added, they are also deduped. It should also be pretty easy to throw a unit test around addDependencies that ensures that it filters out duplicates.

Still working to get a small repo you can reproduce this on.

@spudly
Copy link
Contributor Author

spudly commented Jul 12, 2018

Here's a reproduction of the issue.

repro.zip

It actually looks like this issue is specific to peer dependencies. In my reproduction, I have a package called pkg-peer-dependency. This package is peer-depended on by another package, pkg-has-peer-dependency. I then have 6 other packages that have regular dependencies on pkg-peer-dependency and pkg-has-peer-dependency.

This reproduction does not cause yarn to hang for seemingly-forever because there are only a handful of packages, but by putting a console.log in a strategic location, you can verify that the dependency lists are growing rapidly and they end up with lots of dupes, which causes severe performance issues in large monorepos.

The following output was generated using the reproduction mentioned above, and by adding console.log(ref.name, 'dependencies:', ref.dependencies); in the _seed function from package-hoister.js. As you can see, there are lots of duplicates.

output
yarn install v1.9.0-0
[1/4] Resolving packages...
[2/4] Fetching packages...
[3/4] Linking dependencies...
workspace-aggregator-335394b6-e949-4a7b-907c-5ac249984f59 dependencies: [ 'pkg-a@1.0.0',
  'pkg-b@1.0.0',
  'pkg-c@1.0.0',
  'pkg-d@1.0.0',
  'pkg-e@1.0.0',
  'pkg-f@1.0.0',
  'pkg-has-peer-dependency@1.0.0',
  'pkg-peer-dependency@1.0.0' ]
pkg-a dependencies: [ 'pkg-peer-dependency@^1.0.0',
  'pkg-has-peer-dependency@^1.0.0' ]
pkg-b dependencies: [ 'pkg-peer-dependency@^1.0.0',
  'pkg-has-peer-dependency@^1.0.0' ]
pkg-c dependencies: [ 'pkg-peer-dependency@^1.0.0',
  'pkg-has-peer-dependency@^1.0.0' ]
pkg-d dependencies: [ 'pkg-peer-dependency@^1.0.0',
  'pkg-has-peer-dependency@^1.0.0' ]
pkg-e dependencies: [ 'pkg-peer-dependency@^1.0.0',
  'pkg-has-peer-dependency@^1.0.0' ]
pkg-f dependencies: [ 'pkg-peer-dependency@^1.0.0',
  'pkg-has-peer-dependency@^1.0.0' ]
pkg-peer-dependency dependencies: []
pkg-has-peer-dependency dependencies: [ 'pkg-peer-dependency@1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0' ]
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-has-peer-dependency dependencies: [ 'pkg-peer-dependency@1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0' ]
pkg-has-peer-dependency dependencies: [ 'pkg-peer-dependency@1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0' ]
pkg-has-peer-dependency dependencies: [ 'pkg-peer-dependency@1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0' ]
pkg-has-peer-dependency dependencies: [ 'pkg-peer-dependency@1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0' ]
pkg-has-peer-dependency dependencies: [ 'pkg-peer-dependency@1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0' ]
pkg-has-peer-dependency dependencies: [ 'pkg-peer-dependency@1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0',
  'pkg-peer-dependency@^1.0.0' ]
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
pkg-peer-dependency dependencies: []
[4/4] Building fresh packages...
success Saved lockfile.
Done in 0.29s.

@arcanis
Copy link
Member

arcanis commented Jul 14, 2018

Thanks a lot! I booked time to check your repro on monday, this sounds really great

@spudly
Copy link
Contributor Author

spudly commented Jul 19, 2018

@arcanis, had any time to review this yet?

@arcanis
Copy link
Member

arcanis commented Jul 19, 2018

Yes! I just finished my own testing - I didn't quite saw the freeze you witnessed (I would be super interested if you could share the package.json that caused it - possibly in private in you prefer, my email is in my profile), but I did see a slight improvement when coupling your fix with a feature I'm working on, which is great.

I think it's mergeable - it should ship in the 1.9.0, scheduled today.

@arcanis arcanis merged commit 3cce1bb into yarnpkg:master Jul 19, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants