Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change forEach to forEachLimit and purge cache to fix memory consumption issues #8609

Closed
wants to merge 8 commits into from
@@ -77,6 +77,13 @@ declare module "neo-async" {
callback?: ErrorCallback<E>
): void;

export function eachLimit<T, E>(
arr: IterableCollection<T>,
limit: number,
iterator: AsyncIterator<T, E>,
callback?: ErrorCallback<E>
): void;

export function map<T, R, E>(
arr: T[] | IterableIterator<T>,
iterator: AsyncResultIterator<T, R, E>,
@@ -98,6 +105,8 @@ declare module "neo-async" {
): void;

export const forEach: typeof each;

export const forEachLimit: typeof eachLimit;
}

// There are no typings for @webassemblyjs/ast
@@ -312,8 +312,9 @@ class Compiler extends Tapable {
const emitFiles = err => {
if (err) return callback(err);

asyncLib.forEach(
asyncLib.forEachLimit(
compilation.assets,
15,
(source, file, callback) => {
let targetFile = file;
const queryStringIdx = targetFile.indexOf("?");
@@ -340,6 +341,7 @@ class Compiler extends Tapable {
source.existsAt = targetPath;
source.emitted = true;
this.outputFileSystem.writeFile(targetPath, content, callback);
source._cachedSource = undefined;

This comment has been minimized.

Copy link
@sompylasar

sompylasar Jan 16, 2019

writeFile is async and potentially may fail; does it change how _cachedSource should be cleared? (for ex., only on successful write, or only after the async operation is complete)

This comment has been minimized.

Copy link
@dav-is

dav-is Jan 16, 2019

Author Contributor

If writeFile fails, wouldn't the whole process fail?

This comment has been minimized.

Copy link
@sompylasar

sompylasar Jan 16, 2019

That's an assumption this piece of code should not make.

This comment has been minimized.

Copy link
@sompylasar

sompylasar Jan 16, 2019

Also this code currently seems to assume that writeFile uses _cachedSource synchronously which may not be the case depending on writeFile implementation.

This comment has been minimized.

Copy link
@dav-is

dav-is Jan 16, 2019

Author Contributor

Say a single file's cache wasn't cleared, it would have the memory footprint of a single file (not a big deal).

Say a single file's cache was prematurely pruged, webpack knows how to repopulate the cache so it wouldn't lose the source (it's only a cache)

This comment has been minimized.

Copy link
@sompylasar

sompylasar Jan 16, 2019

Otherwise, if writeFile does not in fact rely on _cachedSource, there has to be a reason to put this clearing below the writeFile call and not above where other changes to source state are made.

This comment has been minimized.

Copy link
@sompylasar

sompylasar Jan 16, 2019

Repopulating the cache may also be expensive. There has to be a clear lifecycle for the cache, otherwise it looks like blind troubleshooting that works by coincidence, not by design. I'm not saying this solution is wrong, but it's not clear why it's right either.

This comment has been minimized.

Copy link
@dav-is

dav-is Jan 16, 2019

Author Contributor

What makes you think that writeFile could be asynchronous? The forEachLimit is the async part

This comment has been minimized.

Copy link
@dav-is

dav-is Jan 16, 2019

Author Contributor

What should be clear is that the cache is never purged and piles up to a GB+. Removing it right after it's written is the obvious end of life for a cache element.

This comment has been minimized.

Copy link
@sompylasar

sompylasar Jan 16, 2019

What makes you think that writeFile could be asynchronous? The forEachLimit is the async part

The fact that a callback is passed to it. Is the writeFile call guaranteed to finish all its work synchronously despite receiving a callback?

What should be clear is that the cache is never purged and piles up to a GB+. Removing it right after it's written is the obvious end of life for a cache element.

Yes, I get that. That's why I'm trying to clarify when this "after it's written" event happens. Is it guaranteed to happen synchronously within the writeFile call? If so, the code is correct.

This comment has been minimized.

Copy link
@dav-is

dav-is Jan 16, 2019

Author Contributor

It's getting pretty late here (2am) I'll take another look at what you're saying in the morning 👍

This comment has been minimized.

Copy link
@sompylasar

sompylasar Jan 16, 2019

It's getting pretty late here (2am) I'll take another look at what you're saying in the morning 👍

Yes, it's 12am on my end. Thanks for addressing the memory issue and paying attention to my comments!

This comment has been minimized.

Copy link
@dav-is

dav-is Jan 16, 2019

Author Contributor

@sompylasar I looked into it and it makes sense to purge the cache before writeFile and I tested it and the fix still works. Thanks for your feedback 👍

This conversation was marked as resolved by dav-is

This comment has been minimized.

Copy link
@sompylasar

sompylasar Jan 16, 2019

Assigning _cachedSource to any type of source may change its hidden class if it's not a CachedSource, so there has to be at least a check if the property was there before clearing it. Also, is the initial value to which this code resets undefined, or is it null?

This comment has been minimized.

Copy link
@dav-is

dav-is Jan 16, 2019

Author Contributor

That's a good point 👍

The initial value is undefined: https://github.com/webpack/webpack-sources/blob/master/lib/CachedSource.js#L13

};

if (targetFile.match(/\/|\\/)) {
ProTip! Use n and p to navigate between commits in a pull request.
You can’t perform that action at this time.