Minify modules instead of chunks #104
We have discussed this for webpack@5. Minifying modules has flaws:

I think there is only one solution here: reduce memory usage on the terser side, or increase the memory available to Node.
It's interesting to hear this has already been discussed! What do you mean by inefficient compression, though? I feel fairly confident that compressing modules would yield the same result as compressing the chunk (if we ignore the module wrapper code). I definitely see the part about still needing to minimize the chunk for the non-module code. Maybe Terser could be configured to perform a more lightweight optimization there, but I haven't really thought about it much. That a plugin can alter things after

What are the cases where you saw overload, btw? I haven't really considered those yet.
The non-module webpack code. Also, uglifying the whole file is the established practice (other bundlers do the same); I can provide examples. And don't forget that some plugins can emit JS files which we compress (we don't parse them, only compress), so an implementation at the module level would increase CPU/memory usage.
It is still required to send the whole JS file to terser, so memory usage still increases. The parallel option also increases CPU load and can, in theory, increase compilation time. As I said above, there is no golden solution here:
You can write a terser-loader to achieve this. The optimization is worse, but if you don't care about that you can boost performance a lot (especially with module-level caching: watch, cache-loader, or persistent caching in webpack 5). You will end up with some variables not mangled. You can try to invoke the terser-webpack-plugin in addition to this with

Also possible
The real benefit from terser only manifests after modules are concatenated, so I don't think a loader would do much good. Using terser in a loader would greatly reduce the optimizations Terser is able to do, whereas using it as a plugin that optimizes modules after they are concatenated would retain the same optimizations for those modules.
Terser runs fastest when the input code is smaller!

@sokra it's possible to pre-warm the terser mangle cache with any global webpack variables you need to use from within, by using the mangle cache options. This will enable you to mangle
@fabiosantoscode can you provide an example?
@evilebottnawi sure!

```js
const terser = require("terser")

const { code } = terser.minify(`
  const foo = __webpack_require__("foo").default;
`, {
  module: true,
  mangle: {
    cache: {
      props: new Map([
        ["__webpack_require__", "a"],
        ["__other_stuff__", "b"],
      ])
    }
  }
})

console.log(code) // -> a("foo").default
```
@fabiosantoscode how does this decrease minification time, and by how much?
In webpack's case you'll want to create a copy of this Map for each module, because Terser mutates it. If you don't want to change the user-provided
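A sketch of that per-module copying, assuming the cache shape from the example above (the helper name is made up):

```javascript
// Hypothetical helper: clone the user-provided mangle cache before each
// per-module minify call, since terser mutates the cache it is given.
function cloneMangleCache(baseCache) {
  return { props: new Map(baseCache.props) };
}

const userCache = { props: new Map([["__webpack_require__", "a"]]) };

// Each module gets its own copy; mutations stay local to that copy.
const perModuleCache = cloneMangleCache(userCache);
perModuleCache.props.set("__local_helper__", "b");

console.log(userCache.props.has("__local_helper__")); // -> false
```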
I don't have any benchmarks on hand, but I've read several times before that minification time is reduced. There's also a nice advantage: if a user has an issue with Terser, they can reproduce it more easily, since chunk generation is out of the picture.
Here's what the options look like with
We need to investigate this; a benchmark would be great.
Probably something can be whipped up quickly with the lodash repository and a bash script or something. I have my hands full atm since there are a few bugs popping up on Terser.
@fabiosantoscode no need to rush, just a todo 👍
I have a need to generate a terser-loader that minifies before bundling. The use case is that I'd like to design custom rules for minifying/mangling each module instead of applying global mangling rules to the entire bundle. There are also some Webpack transformations that happen during bundling that interfere with contextual mangling rules, which I'd like to sidestep by minifying before bundling. The name cache will of course be important in this process so modules can refer to the correctly mangled names for consistency across modules. However, some mangling rules should be internalized inside modules versus applied globally, so this loader approach would help. I don't believe there's a Terser loader project; is this something that would help address this issue? I will look into it if others agree. Other suggestions are welcome.
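A minimal sketch of such a loader, assuming a shared name cache; `minifySource` is a stand-in for a real `terser.minify(source, { nameCache })` call, and all names here are hypothetical:

```javascript
// Hypothetical terser-loader sketch. A shared nameCache object would keep
// mangled names consistent across modules; minifySource just trims
// whitespace here so the sketch stays self-contained.
const sharedNameCache = { vars: { props: {} } };

function minifySource(source, nameCache) {
  // A real implementation would call terser.minify(source, { nameCache })
  // and return the resulting code.
  return source.trim();
}

// A webpack loader receives each module's source before bundling.
// In a real loader file this function would be the module.exports.
function terserLoader(source) {
  return minifySource(source, sharedNameCache);
}

console.log(terserLoader("  const a = 1;  ")); // -> const a = 1;
```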
@J-Rojas it is ineffective compression and out of scope for this plugin. Also, other plugins (including plugins in webpack itself) can emit new JS assets, and you would not compress them. Using a loader can't solve the problem with memory and CPU usage.
@evilebottnawi you are right that the loader is out of scope. I've begun a new project repo for this effort. So far it has addressed my use case, and I don't agree about the insufficient compression. It seems to be very much on par with minifying the bundle as a whole.
Seriously? Webpack provides its own boilerplate code and you can't optimize that code using a loader. Also, as I said above, other plugins can emit JS assets too, so those will remain unuglified. You don't win on memory and CPU load; you only get ineffectively uglified code. We have been developing webpack for a long time and have tried all the approaches.
No need to become defensive. I don't know anything about the memory and CPU load issues, as I do not have a problem with them. I'm addressing my use case, and the empirical evidence from my approach, using a name cache across minified loaded files, shows a similar code size. I'm minifying across at least 150 files. Regardless, I will continue with my approach to address my use case. Thanks for the input.
I'll leave this here for anyone interested: |
@evilebottnawi I was able to use terser-loader and terser-webpack-plugin in a two-phase approach to satisfy my requirements (additional property mangling with per-module rules) and also achieve superior compression size. Using this approach with a project that utilizes over 400 modules, I was able to get an additional 22% compression prior to gzip, and 10.5% after gzip. So I'd say skillful use of terser-loader can achieve compression that is as good if not better. I'll probably do a write-up about this eventually when I have more time.
@J-Rojas you can compress multiple times using multiple plugins, but it is very bad for performance, and sometimes it can create bugs due to bugs on the terser side (though very rarely).
@evilebottnawi thanks for pointing out that Terser bugs are very rare, it's very nice of you <3 |
Is it possible to mark a part of the code as preminimized and let terser skip it/emit it unmodified? |
Maybe with a comment like |
@sokra those changes would have to go inside the Terser module, since it would parse the code while looking for these tokens. It should be possible to do this, but it's outside of the scope of this repository. However it would be preferable to control which modules are minified via configuration instead of having to modify the code itself. If you are bundling vendor code together, it would create maintenance problems to have to modify this code if it requires specialized minification. Hence the motivation for the loader approach. |
I think I see what @sokra is thinking about. Let me give a concrete example. Imagine you have these files:
When these files are bundled with module concatenation turned on, you end up with three modules:
But even though you have 3 modules, you only have two chunks. Only the module containing

The really important thing for Terser to minify are these 3 modules. Terser isn't aware of any kind of module loading, so it will always process these as isolated. But because

If you look at the webpack output, it looks like this:
@sokra mentioned this:
The 3 modules I mentioned before can be minified in isolation because their bodies, within the function closure, do not share anything with the outside. For the
If they were minified separately, it would be useful to leave a hint for Terser indicating that the function should be ignored because it was already minified. This way Terser would ignore the pre-minified modules and only minify the webpack module loading logic around the modules proper:
This approach would still require two Terser passes: one that processed modules after concatenation, and one that processed all chunks at the end including any extra js assets. The difference is the first pass would process much smaller pieces of code, which leads to a better load distribution between workers and less parse-related resource consumption. Then the second pass would be much faster because Terser would ignore all modules that were already minified. |
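The first of those two passes could be sketched roughly like this; `minifyModule` is a stand-in for an isolated `terser.minify` call, and the marker comment name is invented for illustration:

```javascript
// Pass 1 sketch: minify each module in isolation and tag the result so a
// later pass can skip it. minifyModule stands in for terser.minify; here
// it just collapses whitespace to keep the sketch self-contained.
const MARKER = "/*#__MINIFIED__*/";

function minifyModule(source) {
  return source.replace(/\s+/g, " ").trim();
}

function minifyModules(modules) {
  const out = new Map();
  for (const [id, source] of modules) {
    out.set(id, MARKER + minifyModule(source));
  }
  return out;
}

// Pass 2 (not shown) would minify the chunk's loader boilerplate while
// emitting anything tagged with MARKER unmodified.
const modules = new Map([["./example.js", "const  a = 1;\n"]]);
console.log(minifyModules(modules).get("./example.js"));
// -> /*#__MINIFIED__*/const a = 1;
```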
A potential problem here is tree shaking; we can lose some
And we still have big memory usage, because the big file is still in memory. I think a better solution here is to look into how we can optimize memory/CPU usage on the terser side.
The big file might still be in memory, but what consumes resources is not the size of the file itself but rather the result of parsing the file. With the

It's true that Terser could optimize resource usage. But Terser already has a way to do this: feeding Terser the isolated modules guarantees it will use the least amount of resources possible. In this particular case
Maybe you can provide an example? Usually memory consumption increases only when source maps are enabled.
https://github.com/vmware/clarity is a project I benchmarked in the past and stored the results in https://github.com/filipesilva/angular-cli-perf-benchmark.
The important part is this:
The first time the build ran, it took 4962 MB of RAM. Subsequent times it only took around 1700 MB because it was using the

You can reproduce these results by cloning the repo, adding CircleCI to it, and uncommenting the

I understand that the

I hope we can agree that the performance of

At some point a user might need to turn off parallelization to reduce memory because their CI machine doesn't have enough. Then they have to turn off source maps too. Lastly they have to artificially split chunks. None of these are things a user wants to do for their app; they're things they have to do because otherwise the build will fail.
Thanks for the repo. I still think we can optimize terser; potentially terser could split the source file into parts to reduce memory usage (it already parses the code, so it should not be hard). In fact, this should be done on the
@filipesilva the multiple-pass approach is what I'm using in my project with 400+ module files. I'm using

This does not address memory consumption, since the final output file is still processed, and the larger the file, the more memory is consumed. That is a Terser-specific issue and would likely need some significant changes to solve.
Any help in finding out where Terser is using too much RAM is appreciated. I haven't had the time to learn how people optimize RAM usage these days and run Terser through even the simplest testing/inspector tooling. In the meantime I'm switching some things to use bitfields for CPU reasons; I guess this will save a little bit of RAM as well. I don't see Terser going multiprocess unless it somehow gains bundler abilities. Which is not off the table. However I really feel that optimizing modules one by one could be really beneficial, and parallelism would be better too. Computers with 4 cores and just 2 chunks are wasting 50% of potential CPU time. Even if the RAM story in Terser becomes perfect, that's still a lot of wasted CPU!
WRT already-compressed code, I think something like the annotation @filipesilva mentioned would be pretty cool. Terser (and UglifyJS as well) has historically been bad at re-compressing compressed files. Oh, the mysteries life has for us. I have zero idea why that might be happening.
@fabiosantoscode I don't think we have problems with CPU usage, only memory. I think it can be easy to debug: just create a big file (or get one from the reproducible test repo above) and run terser using its own CLI with

Example of code (we have the terser AST in

I think there are a lot of small optimizations on the terser side that can potentially reduce memory consumption.
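As a rough illustration of that kind of debugging, one could snapshot heap usage around the expensive step; the allocation loop below is only a stand-in for parsing a big file with terser:

```javascript
// Sketch of a memory measurement, not a real terser benchmark: compare
// heap usage before and after allocating many small objects, which is
// roughly what building a large AST does.
function heapUsedMB() {
  return process.memoryUsage().heapUsed / 1024 / 1024;
}

const before = heapUsedMB();

// Stand-in for terser.parse on a big file: retain many small AST-like nodes.
const nodes = [];
for (let i = 0; i < 100000; i++) {
  nodes.push({ type: "Node", start: i, end: i + 1 });
}

const after = heapUsedMB();
console.log(`~${(after - before).toFixed(1)} MB retained for ${nodes.length} nodes`);
```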
@J-Rojas in your setup I believe you're still running Terser over the modules twice, which is what I'd like to avoid in order to reduce resource usage.

@fabiosantoscode in the webpack world,

@evilebottnawi I'll get a sample of a medium-sized bundle and a large bundle, profile CPU and memory usage, and open an issue on the terser repository. Maybe there are some low-effort optimizations that can be done that yield significant benefits.
@filipesilva thanks. Maybe we can use typed arrays, Map and Set (weak variants) in terser, which should potentially decrease memory usage. We have already pretty much optimized everything on our side; anyway, if somebody has ideas, PRs/feedback are welcome.
Terser performance tracking issue (terser/terser#478), including a benchmark repo (https://github.com/filipesilva/terser-performance). |
@filipesilva cool that you did that. One has to use a non-minimized terser version to see anything. I looked at the profile, but didn't see anything obvious.
The reason for MAP and MAP.splice is that some transformations can return multiple statements or expressions; otherwise array.map() would be great. Regarding DEFMETHOD, I do agree with you. For the other points I'll have a look at each and see what can be done. However, this still doesn't fix the memory usage, just CPU usage. An interesting exercise would be to call Babel or acorn on the unminified chunk and see how much memory they use and where, because Terser's memory allocations are concentrated in the parsing phase (creating a ton of AST nodes).
Oh yes you are right, missed the |
True @sokra. Probably |
@filipesilva today I will release a new version of the terser plugin; it reduces memory usage by around 80-90% for big projects, and small projects also see memory improvements (we now don't create unnecessary workers when there are fewer files than CPU cores, plus concurrency when there are more files than workers). Maybe you can run the benchmarks from #104 (comment) again and provide the information here?
@evilebottnawi awesome, thanks for letting me know! Once a release is out I can re-run the benchmarks. |
@evilebottnawi tried the same project as before, but had to use more recent user code and dependencies, and I also ran it on my machine instead of on CI. The numbers in this comment shouldn't be compared with my earlier comment. This project produces around 50 chunks, with around five big ones (~1MB) and the rest are small ones (~50KB). With
With
The first number in the parentheses here is the important one, since it's the resource usage for the first build. The second and third builds use the

So for both average and peak memory usage I don't see an improvement with

The total number of processes used went down, but build time doesn't seem to have been affected much. For this specific project it looks like #211 didn't make much of a difference locally. I imagine it helped on CI in the situation described by @cjlarose in #143 (comment), but mostly because the fork problem was fixed.

If you want to try the same benchmarking tool for other cases, you can install it globally with
Thanks for the information. I will investigate in the near future; maybe we should improve memory consumption not only in the terser plugin 😄
@filipesilva The measurements that you're reporting are consistent with what I'd expect for upgrading from 2.3.2 to 2.3.3. The memory improvements made in 2.3.3 (specifically #211) reduce the total required maximum heap size because it takes a portion of the code that would allocate and retain large amounts of memory and instead makes it so that we avoid new allocations until necessary (when a worker becomes available) and release references as we make progress. But what you're measuring is average and peak memory usage (probably RSS) when using a In 2.3.2, it was possible for In 2.3.3, So upgrading to 2.3.3 while keeping your |
@cjlarose yeah, I didn't think the 2.3.3 changes would affect the problem this issue was opened to address. If I understood the changes in 2.3.3 correctly, they help reduce the

This part of what you said is not really obvious to me:
I don't think that's accurate; IIRC there are also periodic GC passes. For instance, if you are running webpack builds in watch mode, you are continuously allocating new chunks of memory, but if you look at the memory usage of the process, it will be going up and down. So under that model, 2.3.3 should also bring some memory savings if there's a GC pass.
I want to close this issue, but we can potentially revisit it in the future. The main idea: uglify modules and whole files. We need support from terser for comments like
Feature Proposal

terser-webpack-plugin operates on webpack chunks directly via the `optimizeChunkAssets` compilation hook. At this point individual chunks exist, each containing a collection of modules wrapped in the Webpack module loader. A single chunk can contain many modules.

Terser does not understand the indirection provided by the Webpack module loader and will end up optimizing each module individually. Providing a whole chunk to terser will yield the same optimizations as providing the individual modules contained in that chunk.
It's still important to optimize modules as late as possible because Webpack will concatenate modules. In fact, this concatenation is what enables most of the savings with Terser, since that allows Terser to analyse more code in a single module.
So a better place to execute terser would be one of the hooks below, optimizing the individual modules:
optimizeModulesAdvanced
afterOptimizeModules
optimizeChunkModulesAdvanced
afterOptimizeChunkModules
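A skeleton of tapping one of those hooks might look like the following (the plugin name and the per-module processing are hypothetical; `afterOptimizeModules` is one of the webpack 4 compilation hooks listed above):

```javascript
// Sketch of a plugin that works on modules via afterOptimizeModules
// instead of on chunk assets via optimizeChunkAssets. The loop body is a
// placeholder for sending each module's source to a terser worker.
class MinifyModulesPlugin {
  apply(compiler) {
    compiler.hooks.compilation.tap("MinifyModulesPlugin", (compilation) => {
      compilation.hooks.afterOptimizeModules.tap(
        "MinifyModulesPlugin",
        (modules) => {
          for (const module of modules) {
            // Placeholder: minify this module's source in isolation here.
          }
        }
      );
    });
  }
}
```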
I don't know which one is better, but all of them seem to provide modules and run around the same time as `optimizeChunkAssets`.

Feature Use Case
On large builds, individual chunks might be very large and require a lot of memory and CPU to process. In angular/angular-cli#13734 (comment) I benchmarked the peak memory usage of several projects and saw that the parallel terser processing can greatly contribute to it.
A concrete example is a project that used around 1 GB of memory most of the time, and when it spawned processes for terser it had to process several small chunks plus one or two large chunks. The small chunks used between 15 and 80 MB of memory, but the large chunks used up to 400 MB and took much longer to process. By processing a larger quantity of smaller modules, worker processes can use less of the host machine's resources on average and spread the load more evenly.