Way of determining if an asset's filename contains a hash #9038

Closed
jeffposnick opened this issue Apr 12, 2019 · 15 comments · Fixed by #9687

@jeffposnick

Feature request

What is the expected behavior?
I would like a way of examining each asset that's generated as part of a webpack compilation and determining whether the corresponding filename contains some flavor of hash (e.g. content hash, chunk hash, etc.—not sure how many possible hashes there are).

One way of accomplishing this might be a new boolean flag that's added to each asset, called hashInFilename or something similar.

Alternatively (less conveniently, but maybe more generally useful), it might be done by providing a property on each asset that reflects the original filename template that was used to generate the final effective filename. For example, an asset whose final filename is 'app.abcd1234.js' might have a property filenameTemplate: 'app.[hash:8].js'. We could then infer whether a filename contains a hash by checking this value for one of the well-known hash substitution placeholders.
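As a minimal sketch of that inference: the `filenameTemplate` property is hypothetical, and the placeholder names checked here (`hash`, `chunkhash`, `modulehash`, `contenthash`, each with an optional `:length` suffix) are an assumption about which substitutions matter.

```javascript
// Hypothetical helper: `filenameTemplate` is not a real asset property;
// it stands in for the template value proposed above.
// Assumed placeholder names: hash, chunkhash, modulehash, contenthash,
// each optionally followed by a length such as `:8`.
const HASH_PLACEHOLDER = /\[(?:hash|chunkhash|modulehash|contenthash)(?::\d+)?\]/i;

function templateContainsHash(filenameTemplate) {
  // No `g` flag, so `test` keeps no state between calls.
  return HASH_PLACEHOLDER.test(filenameTemplate);
}

console.log(templateContainsHash('app.[hash:8].js')); // true
console.log(templateContainsHash('app.js'));          // false
```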

What is motivation or use case for adding/changing the behavior?
Adding a hash to a filename results in a content-addressable URL that can be safely cached indefinitely, without having to worry about adding cache-busting URL parameters or otherwise implementing a means of bypassing a cache.

I work on the workbox-webpack-plugin, and being able to determine whether a given asset will have a content-addressable URL will make it easier for us to generate a service worker that knows whether it's necessary to cache-bust a request for that URL.

How should this be implemented in your opinion?
Ideally, this would be some extra metadata that is unconditionally added to each asset as part of the output of a compilation, as per the earlier suggestions.

Barring that, it would be possible to implement the same sort of thing if there were a new event that a plugin could hook into, one that exposes both the filename template value and the final effective name for each asset when filenames are assigned.

I am not very clear on where all of the logic for this lives inside webpack, but I'm assuming that the https://github.com/webpack/webpack/blob/9ededfa92da493e750cf7d573873dcc17bb43af4/lib/MainTemplate.js and https://github.com/webpack/webpack/blob/8a7597aa6eb2eef66a8f9db3a0c49bcb96022a94/lib/TemplatedPathPlugin.js modules are closely related to this suggestion.

Are you willing to work on this yourself?
yes

@alexander-akait
Member

Yes, we have this problem too. For example, in terser we need to regenerate the hash based on the output file, but we can't correctly detect or change the hash. Feel free to send a PR 👍

@jeffposnick
Author

Which of the proposed solutions would work best for the terser use case?

@alexander-akait
Member

Store the hash as a property of each asset in assets, and render the hash during the emit phase.

@jeffposnick
Author

> Store the hash as a property of each asset in assets, and render the hash during the emit phase.

Apologies if I'm misunderstanding, but won't every asset have a hash property assigned to it if I took the steps that you suggest, irrespective of whether the filename uses that hash or not?

I'd like to know whether any flavor of hash (it looks like the possible types might be hash, chunkhash, modulehash, or contenthash) actually ends up being used to derive the final filename for an asset. Knowing the hash for a given asset doesn't seem sufficient to answer that question.

@alexander-akait
Member

By hash I mean any of:

```js
const REGEXP_HASH = /\[hash(?::(\d+))?\]/gi,
  REGEXP_CHUNKHASH = /\[chunkhash(?::(\d+))?\]/gi,
  REGEXP_MODULEHASH = /\[modulehash(?::(\d+))?\]/gi,
  REGEXP_CONTENTHASH = /\[contenthash(?::(\d+))?\]/gi;
```

What we need in terser (and any minimizer):

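```js
const source = asset.source();
// ...the minimizer changes `source` here...
// Placeholder name: this is the public API being asked for, not an existing one.
asset.hash = webpack[PublicFunctionApiForRegenerateHash](source);
```

In your case you just need `const hash = asset.hash`. Right?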
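As a rough sketch of what such a regenerate-hash helper might do: this is not an existing webpack API, and the function names, the choice of SHA-256, and the idea of swapping the old hash inside the filename are all assumptions made for illustration.

```javascript
const crypto = require('crypto');

// Hypothetical stand-in for the requested API: recompute a hash from the
// changed source and truncate it to the requested length.
function regenerateHash(source, length) {
  return crypto.createHash('sha256').update(source).digest('hex').slice(0, length);
}

// Replace the old hash in the filename with the regenerated one,
// keeping the same number of characters.
function renameWithNewHash(filename, oldHash, newSource) {
  const newHash = regenerateHash(newSource, oldHash.length);
  return { filename: filename.replace(oldHash, newHash), hash: newHash };
}

const result = renameWithNewHash('app.abcd1234.js', 'abcd1234', 'console.log(1)');
console.log(result.filename); // app.<8 hex characters>.js
```

Note that the rename step only works if the old hash value is known for the asset, which is exactly the information this issue asks webpack to expose.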

@jeffposnick
Author

I need to know whether an asset's filename contains some sort of hash value, regardless of what kind of hash value it is. You could, I guess, put two different kinds of hashes into the same filename, like filename: 'app.[contenthash:8].[chunkhash:8].js' or whatever. If someone were to do that, the answer that I need to get back is yes, the filename contains a hash.

Are you suggesting that there should be a new property on each asset, hash, that's set to whatever hash value(s) end up being used to construct the filename, and will be set to null to indicate when the filename doesn't contain any hash values at all?

If that's what you're suggesting, what should the value of hash be when there are multiple hashes used, like filename: 'app.[contenthash:8].[chunkhash:8].js'?

I don't want to minimize a use case that would benefit terser, but I just want to make sure that if I implement something, it provides a clean solution to my use case as well.

@alexander-akait
Member

alexander-akait commented Apr 12, 2019

> Are you suggesting that there should be a new property on each asset, hash, that's set to whatever hash value(s) end up being used to construct the filename, and will be set to null to indicate when the filename doesn't contain any hash values at all?

Yes

> If that's what you're suggesting, what should the value of hash be when there are multiple hashes used, like filename: 'app.[contenthash:8].[chunkhash:8].js'?

hash: { contenthash: 'value', chunkhash: 'value' }

I think this solution will help you and me.

@jeffposnick
Author

Gotcha. Last(?) question: can I use a property name on the asset other than hash? I think that naming confused me when you made your original suggestion. How about calling the new property name hashesUsedInFilename instead?

So for three hypothetical assets, here's how it could behave:

Asset 1
filename template is asset1.[hash].js and the asset's hash value is 'abcdef123456', that would lead to a final asset filename of asset1.abcdef123456.js and a hashesUsedInFilename: { hash: 'abcdef123456'}

Asset 2
filename template is asset2.[contenthash:4].[chunkhash:4].js and the asset's contenthash value is 'abcdef123456' and chunkhash value is '123456abcdef', that would lead to a final asset filename of asset2.abcd.1234.js and a hashesUsedInFilename: { contenthash: 'abcdef123456', chunkhash: '123456abcdef'}. (Note that I'm including the full contenthash/chunkhash values here, even though only the first 4 characters are used.)

Asset 3
filename template is asset3.js. That would lead to a final asset filename of asset3.js (i.e. no replacements) and a hashesUsedInFilename: null.

For my use case, I could check asset.hashesUsedInFilename !== null as a way of determining whether or not the URL will be content-addressable.
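The three cases above can be sketched roughly as follows. This is an illustration only: `hashesUsedInFilename` is the property proposed in this thread, `resolveFilename` is a made-up helper rather than webpack's actual replacement logic, and the set of placeholder names is assumed.

```javascript
// Sketch only: `hashes` maps a placeholder name to the asset's full hash value.
const PLACEHOLDER = /\[(hash|chunkhash|modulehash|contenthash)(?::(\d+))?\]/gi;

function resolveFilename(template, hashes) {
  const used = {};
  const filename = template.replace(PLACEHOLDER, (match, kind, length) => {
    const key = kind.toLowerCase();
    const value = hashes[key];
    if (value === undefined) return match; // no value known: leave untouched
    used[key] = value; // record the full value, not the truncated one
    return length ? value.slice(0, Number(length)) : value;
  });
  return {
    filename,
    // null signals that no hash contributed to the filename.
    hashesUsedInFilename: Object.keys(used).length > 0 ? used : null,
  };
}

const asset2 = resolveFilename('asset2.[contenthash:4].[chunkhash:4].js', {
  contenthash: 'abcdef123456',
  chunkhash: '123456abcdef',
});
console.log(asset2.filename); // asset2.abcd.1234.js
console.log(resolveFilename('asset3.js', {}).hashesUsedInFilename); // null
```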

@alexander-akait
Member

> How about calling the new property name hashesUsedInFilename instead?

Maybe that's better. I think the name can be discussed in the PR; it is not hard to change 😄

> filename template is asset2.[contenthash:4].[chunkhash:4].js and the asset's contenthash value is 'abcdef123456' and chunkhash value is '123456abcdef', that would lead to a final asset filename of asset2.abcd.1234.js and a hashesUsedInFilename: { contenthash: 'abcdef123456', chunkhash: '123456abcdef'}. (Note that I'm including the full contenthash/chunkhash values here, even though only the first 4 characters are used.)

Not sure, maybe we should include hash as is

> filename template is asset3.js. That would lead to a final asset filename of asset3.js (i.e. no replacements) and a hashesUsedInFilename: null.

maybe undefined? No hash - no property.

Anyway, `const hasHash = Boolean(asset.hashesUsedInFilename)` should work in all cases.
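A quick sketch of how that check behaves for the candidate values discussed here (the asset objects are made up for illustration, and `hashesUsedInFilename` is still only a proposed property):

```javascript
// `hashesUsedInFilename` is the proposed (hypothetical) property.
const hasHash = (asset) => Boolean(asset.hashesUsedInFilename);

console.log(hasHash({ hashesUsedInFilename: { hash: 'abcdef123456' } })); // true
console.log(hasHash({ hashesUsedInFilename: null }));                     // false
console.log(hasHash({}));                                                 // false (property absent)
console.log(hasHash({ hashesUsedInFilename: {} }));                       // true (an empty object is truthy)
```

The last case suggests that an implementation should never emit an empty object when no hash is used, otherwise the check would report a false positive.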

/cc @sokra need your advice

@jeffposnick
Author

@developit pointed out that similar hash replacement logic also happens in https://github.com/webpack/loader-utils/blob/master/lib/interpolateName.js for assets that originate from a loader, so presumably I'll need to treat replacements that take place there similarly to the ones that happen in:

`const replacePathVariables = (path, data) => {`

If anyone from the webpack team is aware of other places in the codebase that are also responsible for replacing [*hash] placeholders in the filenames with actual hash values, could you let me know so that I can account for them as well?

@TheLarkInn
Member

We did alter hashing considerably in webpack 5, so I think we should consider how this works there as well, because I don't want to implement a v5 API that also obsoletes v4's need.

@jeffposnick Is the amount of cachebusting that large for devs running new builds every time currently?

@alexander-akait
Member

@TheLarkInn the problem is not memory; the problem is that we need to detect/change existing hashes

@jeffposnick
Author

👋 @TheLarkInn

I am definitely interested in seeing something implemented that plays nicely with v5. If there are changes planned for v5 related to the [*hash*] filename replacement code path, then perhaps it makes sense to wait until those are finalized before attempting this new functionality.

As @evilebottnawi says, this is not related to optimizing resource consumption during the build process. It's more about needing some specific build metadata—specifically the hash values that contributed to an asset's final filename—exposed in a way that could be read by a plugin. Having that build metadata exposed would allow my workbox-webpack-plugin to make the right choice regarding whether to add in cache-busting to the service worker's runtime requests.
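To make the use case concrete, here is a hedged sketch of the decision the plugin would like to make. The `hashesUsedInFilename` property is the one proposed in this thread, `toManifestEntry` is a made-up helper, and the `{ url, revision }` entry shape and the use of SHA-256 are assumptions for illustration rather than the plugin's actual code.

```javascript
const crypto = require('crypto');

// Sketch only: decide per asset whether explicit cache-busting info is needed.
function toManifestEntry(url, asset) {
  if (asset.hashesUsedInFilename) {
    // The URL already changes whenever the hash does: no revision needed.
    return { url, revision: null };
  }
  // Otherwise derive a revision from the content so the cache can be busted.
  const revision = crypto.createHash('sha256').update(asset.source()).digest('hex');
  return { url, revision };
}

const hashed = { hashesUsedInFilename: { hash: 'abcdef123456' }, source: () => 'a' };
const plain = { hashesUsedInFilename: null, source: () => 'b' };

console.log(toManifestEntry('/asset1.abcdef123456.js', hashed).revision); // null
console.log(typeof toManifestEntry('/asset3.js', plain).revision);        // string
```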

@alexander-akait
Member

/cc @jeffposnick can you take a look at #9687?

@jeffposnick
Author

That looks great—thanks @sokra & co.!
