Add `lazyHashes` option to define integrity hashes only in direct parents of assets #172

MLoughry · 2021-12-15T05:32:39Z

Fixes Only include hashes for direct child chunks #171

Adds new option lazyHashes (name subject to change), which controls new behavior that creates a directed acyclic graph of the chunk graph by reducing cycles to strongly-connected components. The DAG is then topologically sorted such that the integrity hashes are calculated from the leaf nodes up, so the hash of the assets with child integrity hashes are not invalidated.

I did some minor refactoring in the process, but the only behavioral change for the default configuration was moving the hashes from __webpack_require__.sriHashes to ${globalObject}.sriHashes. In testing, I ran into issues where async chunks could not access __webpack_require__.

I still need to write tests, but the holidays are approaching, and I will be OOF for the last two weeks of the year. I wanted to make sure I got this out for the maintainers to look at before I disappeared for a while. I have verified that this does work in my group's project with a very large chunk graph (>1.5k chunks).

Still need to process assets in topo-sort order and replace placeholders in all chunks. Then clean up.

…ires

MLoughry · 2021-12-16T19:47:05Z

Tests added, should be ready for review.

jscheid

Hi @MLoughry, amazing work, thanks so much for this!

I've left a few comments with just minor tweaks to the documentation, and there are a few CI issues that need tending to.

I haven't yet had a chance to look in more depth at the actual code, I'll try and make time for that in the next few days.

webpack-subresource-integrity/README.md

examples/lazy-hashes-cycles/README.md

examples/lazy-hashes-simple/README.md

Co-authored-by: Julian Scheid <julians37@gmail.com>

MLoughry · 2021-12-20T03:38:31Z

Thanks. I think I resolved all the issues from the workflow, but I can't re-trigger it myself

jscheid · 2021-12-20T09:16:15Z

It shouldn't be too hard for you to run the tests locally in Node 12. I've just tested this, along the lines of:

git clone https://github.com/MLoughry/webpack-subresource-integrity.git
cd webpack-subresource-integrity
nvm use 12
yarn
yarn check # works
git checkout lazy-hashes
yarn check # doesn't

Let me know if you have any troubles with that process.

jscheid · 2021-12-20T09:52:39Z

I did some minor refactoring in the process, but the only behavioral change for the default configuration was moving the hashes from __webpack_require__.sriHashes to ${globalObject}.sriHashes. In testing, I ran into issues where async chunks could not access __webpack_require__.

Even if all tests still pass, I'm a bit uneasy about this change. Tobias Koppers had recommended to use __webpack_require__ here and here. If we have to make this change I'd prefer if we could get his blessings first.

Before I try to summon him, could you have another go at it and -- if there's no other workaround for the problem with async chunks that you've mentioned -- explain the problem in a bit more detail, perhaps with an example?

EDIT: it would also be nice if we didn't have to pollute the global object. And this would potentially open up a new attack vector, where an XSS might overwrite hashes. I haven't thought through the threat model fully yet but it doesn't feel good to expose this globally.

MLoughry · 2021-12-20T15:54:04Z

Before I try to summon him, could you have another go at it and -- if there's no other workaround for the problem with async chunks that you've mentioned -- explain the problem in a bit more detail, perhaps with an example?

Unless there's a way to introduce a runtime module in async chunks, I'm not sure how to access __webpack_require__ in the chunks when adding the new hashes. As you can see from the (prettier-ified) output of the simple lazy hashes example, the code adds the hashes in the global scope before the chunk modules:

Object.assign(self.sriHashes, {
  493: "sha384-vAyVFrkhZk9hl1bkA2pImobnoaEEQojEZ2uX7uG40DjfzjWid83WONnaDpp/IQq+",
}),
  (self.webpackChunklazy_hashes_simple =
    self.webpackChunklazy_hashes_simple || []).push([
    [574],
    {
      574: (s, e, h) => {
        "use strict";
        h.r(e), h.d(e, { default: () => a }), h.e(493).then(h.bind(h, 493));
        const a = { chunk: 1 };
      },
    },
  ]);

If there were a way to extend the webpackJsonpCallback to accept more arguments (and do what we want with the additional arguments), that could work; however, I don't see that there's a way to do that.

jscheid · 2021-12-20T18:59:45Z

It appears that you can move the Object.assign code into the closure that has access to __webpack_require__ by using the renderModuleContent hook instead of renderContent. I've tried this, tests still pass but I haven't made sure that it actually works at runtime (it would be good to add an end-to-end (puppeteer) test, perhaps similar to the dynamic-modified example.)

Here is a rough patch: https://gist.github.com/jscheid/b03aa1a5bf0421d8e2afdd040e4302ab

MLoughry · 2021-12-20T20:56:55Z

It appears that you can move the Object.assign code into the closure that has access to webpack_require by using the renderModuleContent hook instead of renderContent

Using renderModuleContent would render the sri hashes inside of every single module within a chunk, which is clearly not the desired result.

MLoughry · 2021-12-20T21:24:16Z

We definitely wouldn't want to naively do it in the first module, as I don't think there's any guarantee that it would be executed first. As far as I can tell, webpack doesn't expose information about a specific chunk's entry modules, only the compliation's entry modules (for when you define multiple entry points).

jscheid · 2021-12-20T21:28:23Z

True. Then what do you think about generating a function, maybe called something like addSriHashesForThisChunk, in renderContent and invoking it from renderModuleContent like so: addSriHashesForThisChunk(__webpack_require__)? That would compile down to something like a(b) with minification, which in turn would gzip well even if you have a large number of modules per chunk. There would be a small overhead at runtime but maybe tolerable?

jscheid · 2021-12-20T21:41:00Z

I've pinged @sokra , perhaps he has an idea of how we could do it with zero overhead.

MLoughry · 2021-12-20T21:41:32Z

That's still adding code to every module that would be merging the sri hash objects. Even if it were a marginal size hit, that's a lot of unnecessary processing.

jscheid · 2021-12-20T22:03:52Z

That's still adding code to every module that would be merging the sri hash objects.

You wouldn't have to merge the hash objects more than once, you'd use a guard boolean to only merge them if not done yet, or something like:

var sriHashes = {
  493: "sha384-wGv0P3xRYvSE7KBueqmOLlBkiUwMutW1CYZQYalt/kY8gdnfO0n4vm+szSHWX5iH",
};
function addSriHashes(r) {
  Object.assign(r.sriHashes, sriHashes);
  sriHashes = null;
}

A zero-overhead approach would be preferable, of course, but I'd estimate the size overhead to be a constant 10 bytes or so per chunk (post-gz), and the runtime overhead difficult to measure. For a data point, can you find out easily how many modules your product has per chunk, min/max/avg?

MLoughry · 2021-12-20T23:45:25Z

For a data point, can you find out easily how many modules your product has per chunk, min/max/avg?

From our latest CI build (Which is still on webpack 4, but we're migrating), across 2348 chunks, it ranges between 1 and 1504 modules, with an average of 30.3. For our core scenario chunks (the ones we bother naming and tracking the size of), the average is 178

jscheid · 2021-12-21T01:15:21Z

I think I've found a better approach... or at the very least a lead pointing in the right direction: https://gist.github.com/jscheid/30ea9da1f2538a69b997dc4afb8dee08

Can I leave you to explore it more @MLoughry? Hopefully we'll hear from @sokra as well with some guidance on how to do this properly and future-proof.

jscheid · 2021-12-21T07:27:46Z

@MLoughry I've had a chat with Tobias on Slack and he says the RuntimeModule-based approach is looking good. Based on his advice I've updated the Gist above to use additionalChunkRuntimeRequirements. I think it's an elegant solution now that ticks all the boxes, I hope you agree?

If so, I think the next steps should be the following:

Integrate and polish the RuntimeModule approach, make sure it all works... it might be necessary to postpone the call to generateSriHashPlaceholders (?) and the new class should probably live elsewhere.
There are a few eslint warnings about Forbidden non-null assertion, they are only warnings but could you have a look to see if we can do without the !s?
Add integration tests, we should have at least one E2E test to make sure all chunks load successfully with lazyHashes set and ideally also one to make sure that tampering will cause loading to fail (i.e. ensure that SRI is added.) Let me know if I can help with this, I noticed the wsi-test-helper doesn't have any documentation yet but hopefully is fairly self explanatory when looking at one of the other similar examples.

What do you think, sound good?

MLoughry · 2022-01-03T16:41:17Z

Sorry, I unplugged over the holidays. I'll take a look at this today.

MLoughry · 2022-01-03T20:03:12Z

I made the changes, and confirmed they work in our repo. However, while it is decreasing the size of the entry bundles, it's massively increasing the size of child bundles (more than off-setting any perf gain). I'll need to look more into this.

MLoughry · 2022-01-03T20:20:16Z

Digging into it, the issue is that the sri hashes are duplicated across every chunk within a chunk group. I can't create the graph by chunkGroup, since a given chunk can be in multiple chunk groups, but I need to try to de-dupe hash entries when a different chunk in the same chunk group already has them.

jscheid · 2022-01-03T20:54:30Z

Welcome back @MLoughry, hope you've had a nice break.

Sounds good, that's not ideal of course. Is it caused by the new RuntimeModule approach or was it an issue before?

MLoughry · 2022-01-03T21:58:09Z

It's not related to the runtime module approach, I just hadn't noticed before.

My initial local changes to mitigate this is helping a little, but not nearly enough. I need to noodle on this some more.

jscheid · 2022-01-03T22:27:59Z

Even if the issue at hand might be just a bug or missing optimization, as opposed to a fundamental problem, it makes me wonder if it would make sense to have not two, but three modes:

Fully eager: full up-front cost but zero overhead (current behaviour)
Fully lazy: defer loading of hashes to reduce up-front cost as much as possible, at the cost of overhead caused by duplication of hashes and extra runtime code (your current proposal)
Lazy hoisted: move all hashes as far up the dependency tree as necessary to avoid duplication, but no further. You'd still have a small amount of overhead due to the extra runtime code but this might actually be a good new default?

(I'm not suggesting this would need to go into this PR, just thinking out loud.)

Also, it occurred to me that Object.assign(__webpack_require__.sriHashes, {...}) should probably be something like __webpack_require__.sri({...}) in the interest of reducing runtime module overhead as much as possible, where sri would be a helper function defined once in entry chunks.

MLoughry · 2022-01-04T00:18:23Z

My latest change completely eliminates duplicate hashes within a chunk group, which brings down the size increase to our async chunks to a tolerable level. However, I want to dig in some more, as I'm seeing a lot of hashes duplicated between our entry chunk (mailindex) and the first async chunk (MailBoot) that shouldn't be happening absent a bug, unless I'm missing something.

MLoughry · 2022-01-04T17:59:57Z

Realized the code wasn't taking into account whether a hash already existed in all parent branches to a given chunk, so the hash was being inserted for a child chunk in every chunk group that had that child chunk.

Retooling the code to account for this now.

Probably need to improve the speed/memory perf of the algorithm

MLoughry · 2022-01-04T22:40:26Z

Latest commit significantly improves the result (in our case, -100kB from entry chunk, now only +5kB to primary async chunk group), and I've verified it works.

Two remaining work items:

Improve the perf, particularly memory perf, of these changes. I had to increase our webpack build's memory allotment to get it not to crash with these changes (16GB -> 18GB)
See if I can better name the runtime module, especially per-chunk, so our bundle size report can differentiate the different modules

jscheid · 2022-01-04T22:51:08Z

Amazing, that's all very exciting!

I'm conscious of not further expanding the (already sizeable) scope of this change, but could I ask you a favor? As you continue working on it, could you keep an eye out to see how the idea of "lazy hoisted" hashes could best be implemented? I'm not asking you to implement it of course (unless it was trivial), just to keep it in the back of your mind so that later on you could perhaps point me in the right direction. I'm thinking it would make for a better default than what we have currently.

MLoughry · 2022-01-04T22:57:27Z

My current changes already have a Chunk -> (Chunks to include in this chunk's manifest) map. It would be trivial to invert that map, find any chunks which have their hash in more than one chunk, then walk up the existing graph to find a common parent (or root node(s))

jscheid · 2022-01-04T23:00:58Z

I guess I was wondering how it could best be implemented with least performance impact, but if you think that's going to be obvious then that's great news.

MLoughry · 2022-01-05T03:10:56Z

I think I'm happy with the code, pending your review. There are still a few outstanding ESLint warnings about the null-assertion operator (!), but I feel adding unnecessary null-checks there would just muddy up the code.

jscheid

Looks great, thanks again!

I've added a few suggestions with the aim to remove all uses of the ! postfix operator, with only small impact on readability, or in some cases arguably an improvement in readability.

In the meantime I'll start working on adding one or two end-to-end (Puppeteer) tests.

webpack-subresource-integrity/util.ts

webpack-subresource-integrity/etc/webpack-subresource-integrity.api.md

webpack-subresource-integrity/util.ts

webpack-subresource-integrity/index.ts

webpack-subresource-integrity/util.ts

jscheid

Thanks @MLoughry, all looks great! I'm going to merge this now, but before cutting a new release I'm planning to follow up with the following changes:

add e2e tests
user feedback when hashLoading option isn't a supported value

MLoughry added 9 commits December 14, 2021 18:24

WIP

497315c

Still need to process assets in topo-sort order and replace placeholders in all chunks. Then clean up.

Use Node 12 tsconfig base, as the "engine" field in package.json requ…

69f2823

…ires

Run prettier

d0b7524

Refactoring

9994a00

Finish implementation of lazy hashes

3a1eb85

Update API

ac8704e

Run prettier

8f869a5

Fix lint errors aside from non-null assertions

6e59535

Update README

5d60ed9

MLoughry marked this pull request as draft December 15, 2021 05:33

MLoughry mentioned this pull request Dec 15, 2021

Only include hashes for direct child chunks #171

Closed

MLoughry added 4 commits December 16, 2021 17:26

Fix examples test cases

0c75fd4

Add simple test case for lazy hashes

4d16734

Add test case for chunk with multiple parents

d2a739b

Add chunk cycle test

9bc8bef

MLoughry marked this pull request as ready for review December 16, 2021 19:46

jscheid reviewed Dec 20, 2021

View reviewed changes

webpack-subresource-integrity/README.md Outdated Show resolved Hide resolved

examples/lazy-hashes-cycles/README.md Outdated Show resolved Hide resolved

examples/lazy-hashes-simple/README.md Outdated Show resolved Hide resolved

MLoughry and others added 5 commits December 19, 2021 18:26

Update webpack-subresource-integrity/README.md

1c2f986

Co-authored-by: Julian Scheid <julians37@gmail.com>

Update examples/lazy-hashes-cycles/README.md

945f4cc

Co-authored-by: Julian Scheid <julians37@gmail.com>

Update examples/lazy-hashes-simple/README.md

c4a7aa0

Co-authored-by: Julian Scheid <julians37@gmail.com>

Fix lint errors

4f83459

Replace usage of optional chaning paramerter that broke test in Node 12

756f865

Apply optional chaining fix to *all* tests

09610ce

Use new runtime module approach

d3bc945

Eliminate duplication of hashes within a chunk group

635ac36

Don't duplicate hashes already recorded by all parent trees

fbefab5

Probably need to improve the speed/memory perf of the algorithm

Improve name of runtime module

e162466

Improve memory perf by using iterator helpers instead of array spreads

aff3738

jscheid reviewed Jan 6, 2022

View reviewed changes

Address feedback

b362724

jscheid approved these changes Jan 9, 2022

View reviewed changes

jscheid merged commit fc7cf19 into waysact:main Jan 9, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `lazyHashes` option to define integrity hashes only in direct parents of assets #172

Add `lazyHashes` option to define integrity hashes only in direct parents of assets #172

MLoughry commented Dec 15, 2021

MLoughry commented Dec 16, 2021

jscheid left a comment

MLoughry commented Dec 20, 2021

jscheid commented Dec 20, 2021 •

edited

Loading

jscheid commented Dec 20, 2021 •

edited

Loading

MLoughry commented Dec 20, 2021

jscheid commented Dec 20, 2021

MLoughry commented Dec 20, 2021

MLoughry commented Dec 20, 2021

jscheid commented Dec 20, 2021

jscheid commented Dec 20, 2021

MLoughry commented Dec 20, 2021

jscheid commented Dec 20, 2021

MLoughry commented Dec 20, 2021

jscheid commented Dec 21, 2021

jscheid commented Dec 21, 2021

MLoughry commented Jan 3, 2022

MLoughry commented Jan 3, 2022

MLoughry commented Jan 3, 2022

jscheid commented Jan 3, 2022

MLoughry commented Jan 3, 2022

jscheid commented Jan 3, 2022 •

edited

Loading

MLoughry commented Jan 4, 2022

MLoughry commented Jan 4, 2022

MLoughry commented Jan 4, 2022

jscheid commented Jan 4, 2022

MLoughry commented Jan 4, 2022

jscheid commented Jan 4, 2022

MLoughry commented Jan 5, 2022 •

edited

Loading

jscheid left a comment

jscheid left a comment

Add lazyHashes option to define integrity hashes only in direct parents of assets #172

Add lazyHashes option to define integrity hashes only in direct parents of assets #172

Conversation

MLoughry commented Dec 15, 2021

MLoughry commented Dec 16, 2021

jscheid left a comment

Choose a reason for hiding this comment

MLoughry commented Dec 20, 2021

jscheid commented Dec 20, 2021 • edited Loading

jscheid commented Dec 20, 2021 • edited Loading

MLoughry commented Dec 20, 2021

jscheid commented Dec 20, 2021

MLoughry commented Dec 20, 2021

MLoughry commented Dec 20, 2021

jscheid commented Dec 20, 2021

jscheid commented Dec 20, 2021

MLoughry commented Dec 20, 2021

jscheid commented Dec 20, 2021

MLoughry commented Dec 20, 2021

jscheid commented Dec 21, 2021

jscheid commented Dec 21, 2021

MLoughry commented Jan 3, 2022

MLoughry commented Jan 3, 2022

MLoughry commented Jan 3, 2022

jscheid commented Jan 3, 2022

MLoughry commented Jan 3, 2022

jscheid commented Jan 3, 2022 • edited Loading

MLoughry commented Jan 4, 2022

MLoughry commented Jan 4, 2022

MLoughry commented Jan 4, 2022

jscheid commented Jan 4, 2022

MLoughry commented Jan 4, 2022

jscheid commented Jan 4, 2022

MLoughry commented Jan 5, 2022 • edited Loading

jscheid left a comment

Choose a reason for hiding this comment

jscheid left a comment

Choose a reason for hiding this comment

Add `lazyHashes` option to define integrity hashes only in direct parents of assets #172

Add `lazyHashes` option to define integrity hashes only in direct parents of assets #172

jscheid commented Dec 20, 2021 •

edited

Loading

jscheid commented Dec 20, 2021 •

edited

Loading

jscheid commented Jan 3, 2022 •

edited

Loading

MLoughry commented Jan 5, 2022 •

edited

Loading