Support for long-term caching based on generated content, not source files for chunkhash #7787

Open
agibralter opened this issue Jul 25, 2018 · 5 comments

@agibralter agibralter commented Jul 25, 2018

Feature request

Please natively support output chunk hashing that is based on each chunk's output rather than its input (e.g., when using output.filename = "[name]-[chunkhash].js").

Why, you may ask? Because hashing is mainly used for production asset caching. Right now, there are at least three ways the content of a chunk can change, and only one of them causes the [chunkhash] value to change.

When you deploy a new version of your code, you want browsers not to use old, stale assets. In theory, hashes should be based on the content of the files, such that when a file changes, its hash changes, and you can leverage the asset manifest to write out a new <script> or <link> tag with the new hash. The advantage of [chunkhash] over [hash] is that a change affecting only a single chunk does not "invalidate" the cache of unchanged chunks, which improves performance for end users who have already downloaded those chunks.

Going back to those three ways a chunk's content can change:

  1. You make a change to your entrypoint or its dependencies.
  2. You make a change to the webpack config (e.g. adding/removing/changing a plugin/loader).
  3. You upgrade a loader/plugin version.

Right now, only the first of these changes the [chunkhash], which leaves a pretty glaring hole. You may, say, add source maps (case 2) only to discover that your CDN is still serving a stale version of your code without source maps because the [chunkhash] was not updated.

Tools like https://github.com/erm0l0v/webpack-md5-hash may address this, but it still seems like a pretty huge flaw in the expected out-of-the-box behavior.

What is the expected behavior?

The expected behavior is that when the content of a chunk changes, the hash for that chunk should change too.
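To make the expected behavior concrete, here is a minimal sketch of hashing the final output instead of the input. It is not how webpack computes [chunkhash]; `outputHashedName` and the two sample strings are hypothetical and only illustrate that the file name follows the bytes that would actually be written.

```javascript
const crypto = require("crypto");

// Hypothetical helper (not a webpack API): derive the file name from the
// exact bytes that will be emitted, so any change in output changes the name.
function outputHashedName(name, outputSource, length = 20) {
  const digest = crypto.createHash("sha256").update(outputSource).digest("hex");
  return `${name}-${digest.slice(0, length)}.js`;
}

// Same source module, two different build outputs (illustrative strings).
const unminified = "function add(a, b) {\n  return a + b;\n}\n";
const minified = "function add(n,d){return n+d}";

console.log(outputHashedName("main", unminified));
console.log(outputHashedName("main", minified));

// Identical output always maps to the identical name.
console.log(outputHashedName("main", minified) === outputHashedName("main", minified)); // true
```

With a scheme like this, a config or loader change that alters the emitted bytes produces a new name, and a change that leaves the bytes untouched does not.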

What is motivation or use case for adding/changing the behavior?

As explained above, the motivation is the principle of least surprise. Right now, it's surprising that changing configuration, which may have profound effects on the output, silently slips by as an output asset with the same name as a stale version of the asset.

How should this be implemented in your opinion?

I think there are a few options:

  1. Reimplement [chunkhash], though I seem to remember reading issues about challenges with sourcemaps.
  2. Implement a new token value (e.g. output.filename = "[name]-[chunkhash]-[webpackhash].js") such that changing anything about your webpack config or its dependencies allows you to bust the cache.
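As a rough sketch of what option 2 could hash, assuming a [webpackhash]-style value is derived from the serialized config plus the versions of the build tools: `toolchainHash` is a hypothetical name, and the version strings below are illustrative placeholders rather than values read from a real project.

```javascript
const crypto = require("crypto");

// Hypothetical: compute a value that changes when either the config or any
// listed build dependency version changes.
function toolchainHash(serializedConfig, packageVersions) {
  const hash = crypto.createHash("sha256");
  hash.update(serializedConfig);
  // Sort names so the result does not depend on object key order.
  for (const name of Object.keys(packageVersions).sort()) {
    hash.update(`\0${name}@${packageVersions[name]}`);
  }
  return hash.digest("hex").slice(0, 20);
}

const configJson = JSON.stringify({ mode: "production", devtool: false });

// Illustrative version maps; a real build might collect these from each
// package's package.json.
const before = toolchainHash(configJson, { webpack: "4.16.2", "babel-loader": "7.1.5" });
const after = toolchainHash(configJson, { webpack: "4.16.2", "babel-loader": "8.0.0" });

console.log(before !== after); // true
```

The trade-off is that such a value busts the cache for every chunk whenever anything in the toolchain changes, even if a given chunk's output is byte-for-byte identical.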

Other Considerations

To hack around this in the meantime, I've employed a custom hashing function that uses the final JSONified value of the webpack config as hash salt:

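// webpack.config.js
const crypto = require("crypto");

// Assumption: isProd is defined elsewhere in the original config;
// this definition is only a placeholder so the snippet runs.
const isProd = process.env.NODE_ENV === "production";

const config = {};

// A function-valued output.hashFunction is used as a constructor.
// Returning the crypto hash from the constructor hands back a hash
// that is already seeded with the serialized config.
class Hasher {
  constructor() {
    const hash = crypto.createHash("sha256");
    // Evaluated lazily, after the config below has been fully assembled.
    hash.update(JSON.stringify(config));
    return hash;
  }
}

Object.assign(config, {
  output: {
    filename: isProd ? "[name]-[chunkhash].js" : "[name].js",
    hashFunction: Hasher
  },
  // ... (rest of the config, elided in the original)
});

module.exports = config;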

This custom hashing function seeds every hash with a JSONified version of the webpack config, so changes to webpack's configuration cause the hash to change. In theory, we could use output.hashSalt, but that cannot be lazily evaluated once the entire webpack config has been constructed. Furthermore, output.hashSalt is not used for MiniCssExtractPlugin's [contenthash], but (confusingly) output.hashFunction is. Finally, this only accounts for changes in the webpack config itself; it does not account for underlying changes in plugins/loaders due to, e.g., version upgrades.
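One further caveat of salting with JSON.stringify, offered as a sketch rather than as part of the original workaround: JSON.stringify omits function values and serializes a RegExp as {}, so two configs that differ only in a rule's test pattern produce the same salt. The `saltReplacer` below is a hypothetical helper showing one way to keep those values visible.

```javascript
// Hypothetical replacer: make RegExp and function values contribute to the
// serialized config instead of being dropped or flattened to {}.
function saltReplacer(key, value) {
  if (value instanceof RegExp) return `regexp:${value.toString()}`;
  if (typeof value === "function") return `function:${value.toString()}`;
  return value;
}

// Two configs that differ only in a loader rule's test pattern.
const ruleA = { module: { rules: [{ test: /\.js$/ }] } };
const ruleB = { module: { rules: [{ test: /\.jsx?$/ }] } };

// Plain JSON.stringify cannot tell them apart.
console.log(JSON.stringify(ruleA) === JSON.stringify(ruleB)); // true

// With the replacer, the difference shows up in the salt.
console.log(JSON.stringify(ruleA, saltReplacer) === JSON.stringify(ruleB, saltReplacer)); // false
```

In the workaround above, this would mean calling JSON.stringify(config, saltReplacer) inside the Hasher constructor.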

Are you willing to work on this yourself?
Yes! But I think I need help.

@holm holm commented Aug 15, 2018

This has caught us by surprise multiple times. It is very surprising behaviour that a change in the contents of the output file does not change its filename.

This would be great to have supported natively, so it "just works". Currently we have found no workaround other than essentially generating random filenames on every build.

Contributor

@philipwalton philipwalton commented Sep 26, 2018

I just ran into this issue as well, and it's indeed very surprising (and seemingly backwards)!

In my case, I updated my minification settings to fix a bug, but doing so didn't fix my production application because the deployed filenames were exactly the same.

I also noticed that [contenthash] isn't affected by minification at all, which means a developer could accidentally deploy unminified code, realize their mistake and redeploy only to have all their users still get the large, unminified versions because the filenames haven't changed.

In my opinion this behavior should definitely change. At minimum, the documentation should make it clear that [contenthash] is not based on the final, output file's contents.

@zaaack zaaack commented Jan 9, 2019

Any update here? Using the content hash of the output file would save a lot of integration work with our other infrastructure, such as checking that a file's hash matches its name and ensuring that the hash changes whenever the content changes. Hoping we can see this feature out of the box soon.
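The kind of infrastructure check described here might look like the sketch below. It assumes a hypothetical naming convention of "<name>-<first 20 hex characters of the SHA-256 of the file content>.js"; webpack's actual hash tokens do not follow this convention, and `hashMatchesName` is not an existing API.

```javascript
const crypto = require("crypto");

// Hypothetical check: does the hash embedded in the file name match the
// SHA-256 of the file's actual content?
function hashMatchesName(fileName, content) {
  const match = /-([0-9a-f]{20})\.js$/.exec(fileName);
  if (!match) return false; // name does not follow the assumed convention
  const digest = crypto.createHash("sha256").update(content).digest("hex");
  return digest.startsWith(match[1]);
}

// Build a name from some content, then verify it.
const content = "console.log('hello');";
const digest = crypto.createHash("sha256").update(content).digest("hex");
const fileName = `main-${digest.slice(0, 20)}.js`;

console.log(hashMatchesName(fileName, content)); // true
console.log(hashMatchesName(fileName, content + "\n// changed")); // false
```

A check like this only works when the hash in the name is computed from the emitted file, which is exactly what this issue asks for.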

Contributor

@philipwalton philipwalton commented Oct 3, 2019

It looks like this issue might be partially fixed. @sokra can you confirm?

I couldn't find anything about it in the CHANGELOG, but—at least in the case of changing minification settings—a new hash is now being generated for me.

However, I'm still able to create two builds where the same chunk across each build has identical content but the hashes are different (e.g. changes to the minification settings that don't affect the output of a particular chunk).

Ideally, config changes that don't affect the content wouldn't invalidate a chunk.

@jakub-g jakub-g commented Oct 9, 2019

@philipwalton wrote:

> In my case, I updated my minification settings to fix a bug, but doing so didn't fix my production application because the deployed filenames were exactly the same.
>
> I also noticed that [contenthash] isn't affected by minification at all, which means a developer could accidentally deploy unminified code, realize their mistake and redeploy only to have all their users still get the large, unminified versions because the filenames haven't changed.

FYI, there is a linked ticket in terser-webpack-plugin: webpack-contrib/terser-webpack-plugin#18
A partial fix is mentioned in this comment: webpack-contrib/terser-webpack-plugin#18 (comment)
