Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change forEach to forEachLimit and purge cache to fix memory consumption issues #8609

Closed
wants to merge 8 commits into from

Conversation

Projects
None yet
6 participants
@dav-is
Copy link
Contributor

dav-is commented Jan 9, 2019

I did some memory profiling on a large app to find the issue. The app was crashing with memory usage of around 1.4GB.

The timelines are only from the "additional chunk assets processing (91%)" stage plus or minus some time. As a note, the profiling slows down the process considerably and I started the profiling manually close to the crash, so don't pay much attention to the lengths of time shown in the timelines.

Before applying this PR

Note that it allocates 550MB all at once (this is not the heap usage, only inside the profiling scope)

beforepr

When applying the forEachLimit(15) change

I noticed the same amount of memory being allocated, except now it was adding 34MB of memory usage every frame. Note that each time it is storing _cachedSource

usingeachlimit

When setting source._cachedSource = undefined with forEachLimit

The heap memory usage went down drastically and stopped running out of memory. The _cachedSource should be removed after a file is emitted.

afterpr

Performance Concerns

I ran time on two processes.

Before PR using max_old_space_size so node didn't run out of memory: 2m9s with 1.4GB peak memory usage
After PR using standard memory limts: 2m15s without running out of memory (other processes in the build use a lot of memory so the peak usage wouldn't be a useful metric here)

Accessing Private Variables

When I spoke to @sokra about this PR he had issue with the fact that I was using private variables and I can't depend on source being a CachedSource. This doesn't matter because I'm not accessing the variable, I'm just setting it to undefined. If the source isn't a CachedSource then it will have no effect and will not cause any errors.

The alternative of this is adding a method to CachedSource to purge the cache and checking that source is an instance of CachedSource and calling the method. There's no such thing as private variables in JavaScript and webpack doesn't use TypeScript, so I don't really see a reason to do this.

If anyone has another solution, I'd be open to suggestions.

The forEachLimit limit

I set the limit to 15 and it seems reasonable, but might need to be tweaked.

Thanks to @timneutkens for his help in diagnosing this issue with me.

@jsf-clabot

This comment has been minimized.

Copy link

jsf-clabot commented Jan 9, 2019

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@webpack-bot

This comment has been minimized.

Copy link

webpack-bot commented Jan 9, 2019

For maintainers only:

  • This need to be documented (issue in webpack/webpack.js.org will be filed when merged)
@sokra

This comment has been minimized.

Copy link
Member

sokra commented Jan 9, 2019

You need to add the types in the declaration.d.ts file as we have no typings for neo-async

@webpack-bot webpack-bot added PR: CI-ok and removed PR: CI-not-ok labels Jan 9, 2019

@webpack-bot

This comment has been minimized.

Copy link

webpack-bot commented Jan 9, 2019

Thank you for your pull request! The most important CI builds succeeded, we’ll review the pull request soon.

@timneutkens

This comment has been minimized.

Copy link
Contributor

timneutkens commented Jan 9, 2019

From testing so far it seems like this doesn't solve our issue.

@sokra

sokra approved these changes Jan 11, 2019

@webpack-bot

This comment has been minimized.

Copy link

webpack-bot commented Jan 11, 2019

@dav-is Thanks for your update.

I labeled the Pull Request so reviewers will review it again.

@sokra Please review the new changes.

@dav-is

This comment has been minimized.

Copy link
Contributor Author

dav-is commented Jan 16, 2019

@sokra I added more details to the initial post

@dav-is dav-is changed the title Change forEach to forEachLimit to fix memory consumption issues Change forEach to forEachLimit and purge cache to fix memory consumption issues Jan 16, 2019

@@ -340,6 +341,7 @@ class Compiler extends Tapable {
source.existsAt = targetPath;
source.emitted = true;
this.outputFileSystem.writeFile(targetPath, content, callback);
source._cachedSource = undefined;

This comment has been minimized.

Copy link
@sompylasar

sompylasar Jan 16, 2019

writeFile is async and potentially may fail; does it change how _cachedSource should be cleared? (for ex., only on successful write, or only after the async operation is complete)

This comment has been minimized.

Copy link
@dav-is

dav-is Jan 16, 2019

Author Contributor

If writeFile fails, wouldn't the whole process fail?

This comment has been minimized.

Copy link
@sompylasar

sompylasar Jan 16, 2019

That's an assumption this piece of code should not make.

This comment has been minimized.

Copy link
@sompylasar

sompylasar Jan 16, 2019

Also this code currently seems to assume that writeFile uses _cachedSource synchronously which may not be the case depending on writeFile implementation.

This comment has been minimized.

Copy link
@dav-is

dav-is Jan 16, 2019

Author Contributor

Say a single file's cache wasn't cleared, it would have the memory footprint of a single file (not a big deal).

Say a single file's cache was prematurely pruged, webpack knows how to repopulate the cache so it wouldn't lose the source (it's only a cache)

This comment has been minimized.

Copy link
@dav-is

dav-is Jan 16, 2019

Author Contributor

What should be clear is that the cache is never purged and piles up to a GB+. Removing it right after it's written is the obvious end of life for a cache element.

This comment has been minimized.

Copy link
@sompylasar

sompylasar Jan 16, 2019

What makes you think that writeFile could be asynchronous? The forEachLimit is the async part

The fact that a callback is passed to it. Is the writeFile call guaranteed to finish all its work synchronously despite receiving a callback?

What should be clear is that the cache is never purged and piles up to a GB+. Removing it right after it's written is the obvious end of life for a cache element.

Yes, I get that. That's why I'm trying to clarify when this "after it's written" event happens. Is it guaranteed to happen synchronously within the writeFile call? If so, the code is correct.

This comment has been minimized.

Copy link
@dav-is

dav-is Jan 16, 2019

Author Contributor

It's getting pretty late here (2am) I'll take another look at what you're saying in the morning 👍

This comment has been minimized.

Copy link
@sompylasar

sompylasar Jan 16, 2019

It's getting pretty late here (2am) I'll take another look at what you're saying in the morning 👍

Yes, it's 12am on my end. Thanks for addressing the memory issue and paying attention to my comments!

This comment has been minimized.

Copy link
@dav-is

dav-is Jan 16, 2019

Author Contributor

@sompylasar I looked into it and it makes sense to purge the cache before writeFile and I tested it and the fix still works. Thanks for your feedback 👍

Show resolved Hide resolved lib/Compiler.js Outdated

dav-is added some commits Jan 16, 2019

@sokra

This comment has been minimized.

Copy link
Member

sokra commented Jan 16, 2019

I can't easily take this change, as it would be a breaking change. While it probably improves memory/performance for you, it could affect other plugin negatively. Plugins could access the assets in the afterEmit/done hook. Purging the cache in emitAssets would make these plugins run the source generation again.

We need to put this behind an opt-in flag and enable it by default in the next major.

Instead of removing the _cachedSource it makes more sense (and is cleaner) to replace the source in assets with i. e. a SizeOnlySource (which only allows size() and throws on source()). This also has the benefit that the real Source can be GC'd and not only the cache of it. We should get rid of the existsAt and emitted hacks at the same time. I think a WeakMap<Source, { existsAt, sizeOnlySource }> plus a emittedAssets: Set<string> would work here. As removing existsAt and emitted is a breaking change, this should be moved to the next major.

@dav-is

This comment has been minimized.

Copy link
Contributor Author

dav-is commented Jan 17, 2019

Closing in favor of #8642

@dav-is dav-is closed this Jan 17, 2019

@dav-is dav-is deleted the dav-is:change-foreach-to-foreachlimit branch Jan 17, 2019

filipesilva added a commit to filipesilva/angular-cli that referenced this pull request Jan 22, 2019

filipesilva added a commit to filipesilva/angular-cli that referenced this pull request Jan 22, 2019

mgechev added a commit to angular/angular-cli that referenced this pull request Jan 22, 2019

timneutkens added a commit to zeit/next.js that referenced this pull request Jan 24, 2019

Kiku-git added a commit to Kiku-git/angular-cli that referenced this pull request Mar 7, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.