Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't leave comments with whitespace after removing annotations #526

Closed
sventschui opened this issue Nov 30, 2019 · 13 comments
Closed

Don't leave comments with whitespace after removing annotations #526

sventschui opened this issue Nov 30, 2019 · 13 comments

Comments

@sventschui
Copy link

Over at preact we ship minified code but want the code to be tree-shakeable. While implementing this we faced some issues with inheritance. To solve this same issue babel, etc. construct classes/prototypes inside an IIFE that is marked as /*#__PURE__*/. We now want to do the same (see preactjs/preact#2156 (comment)) but face the issue of terser removing all annotations here:

print("//" + c.value.replace(r_annotation, " ") + "\n");

I'd like to ask whether you'd be open to extend terser with a config option to preserve these annotation comments? I'd be happy to PR this change.

Another thing I asked myself is whether this might have side effects. When something is inlined it might be dangerous to keep the __INLINE__ annotation. I'm not super deep into the magic of minification so you might be able to answer this question quicker/better than me.

Bug report or Feature request?

Feature request

Version (complete output of terser -V or specific git commit)

n/a

Complete CLI command or minify() options used

Note the new option --keep-annotations that might take all, a regex or a list of strings

terser --comments /__PURE__/ --keep-annotations all

terser input

import { Component } from 'preact'
const Suspense = /*#__PURE__*/(function createSuspense() {
  function Suspense() {}
  Suspense.prototype = new Component()
  return Suspense;
}())

terser output or error

import { Component } from 'preact'
const Suspense = /**/(function createSuspense() {
  function Suspense() {}
  Suspense.prototype = new Component()
  return Suspense;
}())

Expected result

import { Component } from 'preact'
const Suspense = /*#__PURE__*/(function createSuspense() {
  function Suspense() {}
  Suspense.prototype = new Component()
  return Suspense;
}())
@fabiosantoscode
Copy link
Collaborator

That's pretty reasonable :) if you're willing to PR this, please do go ahead! Thanks in advance!

@kzc
Copy link
Contributor

kzc commented Dec 1, 2019

Pure annotations were removed upon output on purpose. Comments can shift in a number of compress optimizations and can potentially lead to invalid code with non-pure calls inadvertently marked as pure and dropped. This most commonly happens when a user incorrectly annotates a non-call expression (such as a function declaration) as pure. Retaining pure annotations would be 100% safe if compress is disabled. Otherwise to do it correctly would require a fair amount of effort.

@kzc
Copy link
Contributor

kzc commented Dec 1, 2019

Here's another example of the difficulty of retaining pure annotations correctly - even when the annotations are used appropriately on calls:

Given:

$ cat annotation.js 
function f(x) {
    console.log(x);
    return console.log.bind(console);
}
(function() {
    var c = /*@__PURE__*/f(1);  // can't be pure since it is used below
    c(2);
    /*@__PURE__*/ f(3);
    f(4);
})();

we'd expect the minified output to produce:

$ cat annotation.js | terser -c | node
1
2
4

Now let's misspell the pure annotation to see what happens:

$ cat annotation.js | sed 's/PURE/PURe/g' | terser -c -b comments=/PUR/
function f(x) {
    return console.log(x), console.log.bind(console);
}

/*@__PURe__*/ f(1)(2), 
/*@__PURe__*/ f(3), f(4);

And if we convert PURe back to PURE and minify that we get the incorrect result:

$ cat annotation.js | sed 's/PURE/PURe/g' | terser -c -b comments=/PUR/ | sed 's/PURe/PURE/g' | terser -c | node
4

because pure annotations are greedy and choose the longest subsequent call (or new) expression.

There are many other different ways retaining pure annotations can go wrong if done incorrectly.

@tmorehouse
Copy link

I am finding that /*#__PURE__*/ annotations (/*#__PURE__*/) are being rendered as /* */ when I specify a --comments="/#__PURE__/" cmd line option. No matter how I modify the RegExpr string, I get empty comments /* */ in the final output

@sventschui
Copy link
Author

As @kzc pointed out: This is intentional as the annotations might be in the wrong place after minification. They‘re open for a PR but the solution is not as easy as just preserving them as one needs to make sure the annotations are at the right place after minification

@tmorehouse
Copy link

tmorehouse commented Dec 20, 2019

@sventschui I just find it bizarre that it renders /* */ instead of nothing or /*#__PURE__*/* (
it keeps opening and closing comment tags but not the content of the comment, based on the value of --comments). i.e.:

terser  --comments "/__PURE__/"

@fabiosantoscode
Copy link
Collaborator

That's very interesting. We can easily check if the comments are just whitespace after removing the annotation and remove them. Let's do that.

@fabiosantoscode fabiosantoscode changed the title Allow to preserve the annotations in comments <s>Allow to preserve the annotations in comments</s> Don't leave comments with whitespace after removing annotations Dec 20, 2019
@fabiosantoscode fabiosantoscode changed the title <s>Allow to preserve the annotations in comments</s> Don't leave comments with whitespace after removing annotations Don't leave comments with whitespace after removing annotations Dec 20, 2019
@tmorehouse
Copy link

tmorehouse commented Dec 20, 2019

@fabiosantoscode But I don't think that solves my problem of leaving /*#__PURE__*/ comments alone (I want to leave them in our minified esm bundle for tree shaking by users).

@fabiosantoscode
Copy link
Collaborator

Yeah, I think we don't want to do that. At first I thought why not, then I read @kzc's reasoning and further thought of the maintenance burden of trying to move code with these annotations and preserve the annotations. Plus, I've been thinking of a new way of annotating which is a lot more resilient. It involves an npm module with do-nothing functions that have special meaning to the compiler. This approach of using comments works great if the only compiler reading them is UglifyJS/Terser and they are removed before output. To achieve something that can be read by multiple compilers a more solid approach is required.

@tmorehouse
Copy link

tmorehouse commented Dec 20, 2019

We currently have --comments "/^!/" to preserve banner comments, which works fine with terser, but single line comments like /*#__PURE__*/, when we set --comments "/__PURE__/" get converted to /* */ in the output. This makes it almost impossible for tree shaking to work on minified ESM bundles (specifically for exported function calls).

Rollup, which we use for building the bundle, converts this:

export foo = /*#__PURE__*/ bar({ some: content })

to this:

var foo =
/*#__PURE__*/
bar({ some: content })

And we would like to preserve the PURE comment. Currently terser preserves the comment delimiters, but not the content of the comment:

var foo=/* */Zx({ some: Ab })

@kzc
Copy link
Contributor

kzc commented Dec 20, 2019

I advised the individuals who implemented pure annotations in Rollup to achieve a high degree of compatibility with uglify and its forks. Rollup does not perform most of the optimizations that uglify-js and terser do, so it manages to avoid the problems associated with retaining pure annotations (as far as I know). Please take the time to read and understand the posts above detailing why pure annotations were intentionally removed by uglify-js and terser to avoid incorrect results.

There are other issues that I haven't mentioned in this thread. Even if my example above #526 (comment) were to hypothetically emit the following correctly annotated output with added parentheses:

$ cat annotation2.js
function f(x) {
    return console.log(x), console.log.bind(console);
}
(/*@__PURE__*/ f(1))(2), 
/*@__PURE__*/ f(3), f(4);

terser would continue to correctly process it:

$ cat annotation2.js | terser -bc
function f(x) {
    return console.log(x), console.log.bind(console);
}

f(1)(2), f(4);
$ cat annotation2.js | terser -bc | node
1
2
4

however Babel with default settings would not retain the necessary parentheses due to this known issue:

$ cat annotation2.js | babel
"use strict";

function f(x) {
    return console.log(x), console.log.bind(console);
}
/*@__PURE__*/f(1)(2),
/*@__PURE__*/f(3), f(4);

so that when this Babel output would be (re)run through terser it would produce this undesirable result:

$ cat annotation2.js | babel | terser -bc
"use strict";

function f(x) {
    return console.log(x), console.log.bind(console);
}

f(4);
$ cat annotation2.js | babel | terser -bc | node
4

This is why uglify and its forks are as conservative as they are and elide pure annotations upon output. Workarounds can be devised for all of these issues, but they are not as trivial as they may appear.

fabiosantoscode pushed a commit that referenced this issue Jan 10, 2020
* preserve pure comments and remove empty comments #526

* feat: add preserve_annotations option

* test: add tests for #526

* refactor: remove unwanted program option
@jridgewell
Copy link
Collaborator

Closing this as fixed with #550

@Finesse
Copy link

Finesse commented Jun 9, 2022

I have exactly the requirement that sventschui described. I understand the risk of leaving the "pure" comments, and I want to leave them despite the risk. Thank you for making this possible.

A note for people from Google: besides adding a comments option you need to add a preserve_annotations option:

const terserConfig = {
  module: true,
  format: {
    comments: /^\s*[#@]__PURE__\s*$/,
    preserve_annotations: true,
  },
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants