-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add lazyHashes
option to define integrity hashes only in direct parents of assets
#172
Conversation
Tests added, should be ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @MLoughry, amazing work, thanks so much for this!
I've left a few comments with just minor tweaks to the documentation, and there are a few CI issues that need tending to.
I haven't yet had a chance to look in more depth at the actual code, I'll try and make time for that in the next few days.
Co-authored-by: Julian Scheid <julians37@gmail.com>
Co-authored-by: Julian Scheid <julians37@gmail.com>
Co-authored-by: Julian Scheid <julians37@gmail.com>
Thanks. I think I resolved all the issues from the workflow, but I can't re-trigger it myself |
It shouldn't be too hard for you to run the tests locally in Node 12. I've just tested this, along the lines of: git clone https://github.com/MLoughry/webpack-subresource-integrity.git
cd webpack-subresource-integrity
nvm use 12
yarn
yarn check # works
git checkout lazy-hashes
yarn check # doesn't Let me know if you have any troubles with that process. |
Even if all tests still pass, I'm a bit uneasy about this change. Tobias Koppers had recommended to use Before I try to summon him, could you have another go at it and -- if there's no other workaround for the problem with async chunks that you've mentioned -- explain the problem in a bit more detail, perhaps with an example? EDIT: it would also be nice if we didn't have to pollute the global object. And this would potentially open up a new attack vector, where an XSS might overwrite hashes. I haven't thought through the threat model fully yet but it doesn't feel good to expose this globally. |
Unless there's a way to introduce a runtime module in async chunks, I'm not sure how to access Object.assign(self.sriHashes, {
493: "sha384-vAyVFrkhZk9hl1bkA2pImobnoaEEQojEZ2uX7uG40DjfzjWid83WONnaDpp/IQq+",
}),
(self.webpackChunklazy_hashes_simple =
self.webpackChunklazy_hashes_simple || []).push([
[574],
{
574: (s, e, h) => {
"use strict";
h.r(e), h.d(e, { default: () => a }), h.e(493).then(h.bind(h, 493));
const a = { chunk: 1 };
},
},
]); If there were a way to extend the |
It appears that you can move the Here is a rough patch: https://gist.github.com/jscheid/b03aa1a5bf0421d8e2afdd040e4302ab |
Using |
We definitely wouldn't want to naively do it in the first module, as I don't think there's any guarantee that it would be executed first. As far as I can tell, webpack doesn't expose information about a specific chunk's entry modules, only the compliation's entry modules (for when you define multiple entry points). |
True. Then what do you think about generating a function, maybe called something like |
I've pinged @sokra , perhaps he has an idea of how we could do it with zero overhead. |
That's still adding code to every module that would be merging the sri hash objects. Even if it were a marginal size hit, that's a lot of unnecessary processing. |
You wouldn't have to merge the hash objects more than once, you'd use a guard boolean to only merge them if not done yet, or something like: var sriHashes = {
493: "sha384-wGv0P3xRYvSE7KBueqmOLlBkiUwMutW1CYZQYalt/kY8gdnfO0n4vm+szSHWX5iH",
};
function addSriHashes(r) {
Object.assign(r.sriHashes, sriHashes);
sriHashes = null;
} A zero-overhead approach would be preferable, of course, but I'd estimate the size overhead to be a constant 10 bytes or so per chunk (post-gz), and the runtime overhead difficult to measure. For a data point, can you find out easily how many modules your product has per chunk, min/max/avg? |
From our latest CI build (Which is still on webpack 4, but we're migrating), across 2348 chunks, it ranges between 1 and 1504 modules, with an average of 30.3. For our core scenario chunks (the ones we bother naming and tracking the size of), the average is 178 |
I think I've found a better approach... or at the very least a lead pointing in the right direction: https://gist.github.com/jscheid/30ea9da1f2538a69b997dc4afb8dee08 Can I leave you to explore it more @MLoughry? Hopefully we'll hear from @sokra as well with some guidance on how to do this properly and future-proof. |
@MLoughry I've had a chat with Tobias on Slack and he says the If so, I think the next steps should be the following:
What do you think, sound good? |
Sorry, I unplugged over the holidays. I'll take a look at this today. |
I made the changes, and confirmed they work in our repo. However, while it is decreasing the size of the entry bundles, it's massively increasing the size of child bundles (more than off-setting any perf gain). I'll need to look more into this. |
Digging into it, the issue is that the sri hashes are duplicated across every chunk within a chunk group. I can't create the graph by chunkGroup, since a given chunk can be in multiple chunk groups, but I need to try to de-dupe hash entries when a different chunk in the same chunk group already has them. |
Welcome back @MLoughry, hope you've had a nice break. Sounds good, that's not ideal of course. Is it caused by the new |
It's not related to the runtime module approach, I just hadn't noticed before. My initial local changes to mitigate this is helping a little, but not nearly enough. I need to noodle on this some more. |
Even if the issue at hand might be just a bug or missing optimization, as opposed to a fundamental problem, it makes me wonder if it would make sense to have not two, but three modes:
(I'm not suggesting this would need to go into this PR, just thinking out loud.) Also, it occurred to me that |
My latest change completely eliminates duplicate hashes within a chunk group, which brings down the size increase to our async chunks to a tolerable level. However, I want to dig in some more, as I'm seeing a lot of hashes duplicated between our entry chunk (mailindex) and the first async chunk (MailBoot) that shouldn't be happening absent a bug, unless I'm missing something. |
Realized the code wasn't taking into account whether a hash already existed in all parent branches to a given chunk, so the hash was being inserted for a child chunk in every chunk group that had that child chunk. Retooling the code to account for this now. |
Probably need to improve the speed/memory perf of the algorithm
Latest commit significantly improves the result (in our case, -100kB from entry chunk, now only +5kB to primary async chunk group), and I've verified it works. Two remaining work items:
|
Amazing, that's all very exciting! I'm conscious of not further expanding the (already sizeable) scope of this change, but could I ask you a favor? As you continue working on it, could you keep an eye out to see how the idea of "lazy hoisted" hashes could best be implemented? I'm not asking you to implement it of course (unless it was trivial), just to keep it in the back of your mind so that later on you could perhaps point me in the right direction. I'm thinking it would make for a better default than what we have currently. |
My current changes already have a |
I guess I was wondering how it could best be implemented with least performance impact, but if you think that's going to be obvious then that's great news. |
I think I'm happy with the code, pending your review. There are still a few outstanding ESLint warnings about the null-assertion operator ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks again!
I've added a few suggestions with the aim to remove all uses of the !
postfix operator, with only small impact on readability, or in some cases arguably an improvement in readability.
In the meantime I'll start working on adding one or two end-to-end (Puppeteer) tests.
webpack-subresource-integrity/etc/webpack-subresource-integrity.api.md
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @MLoughry, all looks great! I'm going to merge this now, but before cutting a new release I'm planning to follow up with the following changes:
- add e2e tests
- user feedback when
hashLoading
option isn't a supported value
Adds new option
lazyHashes
(name subject to change), which controls new behavior that creates a directed acyclic graph of the chunk graph by reducing cycles to strongly-connected components. The DAG is then topologically sorted such that the integrity hashes are calculated from the leaf nodes up, so the hash of the assets with child integrity hashes are not invalidated.I did some minor refactoring in the process, but the only behavioral change for the default configuration was moving the hashes from
__webpack_require__.sriHashes
to${globalObject}.sriHashes
. In testing, I ran into issues where async chunks could not access__webpack_require__
.I still need to write tests, but the holidays are approaching, and I will be OOF for the last two weeks of the year. I wanted to make sure I got this out for the maintainers to look at before I disappeared for a while. I have verified that this does work in my group's project with a very large chunk graph (>1.5k chunks).