Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible CSS URI issue #180

Closed
client9 opened this issue Mar 4, 2018 · 13 comments
Closed

Possible CSS URI issue #180

client9 opened this issue Mar 4, 2018 · 13 comments

Comments

@client9
Copy link

client9 commented Mar 4, 2018

Hello, thanks for much for working minify and the matching parsers!

In CSS, the URI minifier seems to picking the wrong format to get smallest output

Here's a reduced test case:

$ date
Sun Mar  4 12:14:15 PST 2018

$ go get -u github.com/tdewolff/minify/cmd/minify

$ cat test.css
p{background-image:url("data:image/svg+xml;charset=utf8,%3Csvg xmlns='http://www.w3.org/2000/svg' fill='%23fff' viewBox='0 0 8 8'%3E%3Cpath d='M2.75 0l-1.5 1.5 2.5 2.5-2.5 2.5 1.5 1.5 4-4-4-4z'/%3E%3C/svg%3E")}

$ minifiy test.css > test-out.css

$ cat test-out.css
p{background-image:url(data:image/svg+xml;charset=utf8;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIGZpbGw9IiNmZmYiIHZpZXdCb3g9IjAgMCA4IDgiPjxwYXRoIGQ9Ik0yLjc1LjBsLTEuNSAxLjVMMy43NSA0bC0yLjUgMi41TDIuNzUgOGw0LTQtNC00eiIvPjwvc3ZnPg==)}

$ ls -l test.css test-out.css 
-rw-r--r--  1 nickg  staff  248 Mar  4 12:03 test-out.css
-rw-r--r--  1 nickg  staff  211 Mar  4 12:01 test.css

$ gzip -9 test.css test-out.css 

$ ls -l test.css.gz test-out.css.gz 
-rw-r--r--  1 nickg  staff  251 Mar  4 12:03 test-out.css.gz
-rw-r--r--  1 nickg  staff  195 Mar  4 12:01 test.css.gz

In addition, a flag to prevent base64 encoding might be wise. I've found that base64 encoding might make an absolute file smaller, but it also wrecks compression. In other words, the larger original format sometimes compresses better than the smaller base64 version.

It is also highly possible there is user error in my analysis. Thoughts welcome and thanks again.

n

@tdewolff
Copy link
Owner

tdewolff commented Mar 7, 2018

Thanks for the bug report, I'm surprised it hadn't been noticed before! I figured out what is wrong, but I want to double check which characters need to be percent-encoded. Looks like the strict spec says / needs to be escaped, but in your test case you have not.

@client9
Copy link
Author

client9 commented Mar 7, 2018

hi! here's how I found it, which may or may not be useful for you.

I wrote a tool that looks at HTML files and then cuts out unused selectors in a CSS file. I used your excellent parser to read the CSS file. I didn't intend to write a CSS minifier, but if you use your parser just to echo out the CSS tokens, you get a damn good minifier. Thats the short version, my tool also removes empty @media rules, a few other things, but it does't touch the property values at all (e.g. 1000 -> 1e3 etc)

Anyways its over at https://github.com/client9/csstool

I tested it on bootstrap.min.css (version 4.0)

$ css minify < bootstrap.min.css > bootstrap.min.css-1
$ minify --mime text/css < bootstrap.min.css > bootstrap.min.css-2
$ ls -l bootstrap.min.css*

-rw-r--r--@ 1 nickg  staff  144877 Feb 19 21:57 bootstrap.min.css
-rw-r--r--  1 nickg  staff  144826 Mar  6 21:34 bootstrap.min.css-1
-rw-r--r--  1 nickg  staff  145034 Mar  6 21:35 bootstrap.min.css-2

weird! So my dumb minifier (based on your parser) was actually better than the nodejs minifier that bootstrap uses (assuming I don't have errors -- very possible).

But surprising that your minifier in spite of all the extra optimizations it has, is larger... and so I started to investigate why...

again using your parser I whipped up a css indenter (which works but isn't quite done):

$ css format < bootstrap.min.css > bootstrap.min.css-indent
$ css format < bootstrap.min.css-1 > bootstrap.min.css-1-indent
$ css format < bootstrap.min.css-2 > bootstrap.min.css-2-indent
$ ls -l bootstrap.min.css*indent
-rw-r--r--  1 nickg  staff  174784 Mar  6 21:43 bootstrap.min.css-indent
-rw-r--r--  1 nickg  staff  174784 Mar  6 21:43 bootstrap.min.css-1-indent
-rw-r--r--  1 nickg  staff  174992 Mar  6 21:43 bootstrap.min.css-2-indent

$ diff bootstrap.min.css-indent bootstrap.min.css-1-indent
# nothing

the orig bootstrap and my dumb one are the same. ok.

here's the full diff the original (indented) and the one from minify

$ diff bootstrap.min.css-1-indent bootstrap.min.css-2-indent
1,6c1,4
< /*!
<  * Bootstrap v4.0.0 (https://getbootstrap.com)
<  * Copyright 2011-2018 The Bootstrap Authors
<  * Copyright 2011-2018 Twitter, Inc.
<  * Licensed under MIT (https://github.com/twbs/bootstrap/blob/master/LICENSE)
<  */
---
> /*!* Bootstrap v4.0.0 (https://getbootstrap.com)
> * Copyright 2011-2018 The Bootstrap Authors
> * Copyright 2011-2018 Twitter, Inc.
> * Licensed under MIT (https://github.com/twbs/bootstrap/blob/master/LICENSE)*/
57c55
<   font-family: -apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,"Helvetica Neue",Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";
---
>   font-family: -apple-system,BlinkMacSystemFont,segoe ui,Roboto,helvetica neue,Arial,sans-serif,apple color emoji,segoe ui emoji,segoe ui symbol;
390c388
<   font-family: SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",monospace;
---
>   font-family: SFMono-Regular,Menlo,Monaco,Consolas,liberation mono,courier new,monospace;
3008c3006
<   background-image: url("data:image/svg+xml;charset=utf8,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 8 8'%3E%3Cpath fill='%23fff' d='M6.564.75l-3.59 3.612-1.538-1.55L0 4.26 2.974 7.25 8 2.193z'/%3E%3C/svg%3E");
---
>   background-image: url(data:image/svg+xml;charset=utf8;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCA4IDgiPjxwYXRoIGZpbGw9IiNmZmYiIGQ9Ik02LjU2NC43NWwtMy41OSAzLjYxMi0xLjUzOC0xLjU1TDAgNC4yNiAyLjk3NCA3LjI1IDggMi4xOTN6Ii8+PC9zdmc+);
3014c3012
<   background-image: url("data:image/svg+xml;charset=utf8,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 4 4'%3E%3Cpath stroke='%23fff' d='M0 2h4'/%3E%3C/svg%3E");
---
>   background-image: url(data:image/svg+xml;charset=utf8;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCA0IDQiPjxwYXRoIHN0cm9rZT0iI2ZmZiIgZD0iTTAgMmg0Ii8+PC9zdmc+);
3029c3027
<   background-image: url("data:image/svg+xml;charset=utf8,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='-4 -4 8 8'%3E%3Ccircle r='3' fill='%23fff'/%3E%3C/svg%3E");
---
>   background-image: url(data:image/svg+xml;charset=utf8;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9Ii00IC00IDggOCI+PGNpcmNsZSByPSIzIiBmaWxsPSIjZmZmIi8+PC9zdmc+);
3042c3040
<   background: #fff url("data:image/svg+xml;charset=utf8,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 4 5'%3E%3Cpath fill='%23343a40' d='M2 0L0 2h4zm0 5L0 3h4z'/%3E%3C/svg%3E") no-repeat right .75rem center;
---
>   background: #fff url(data:image/svg+xml;charset=utf8;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCA0IDUiPjxwYXRoIGZpbGw9IiMzNDNhNDAiIGQ9Ik0yIDAgMCAyaDR6bTAgNUwwIDNoNHoiLz48L3N2Zz4=) no-repeat right .75rem center;
3599c3597
<   background-image: url("data:image/svg+xml;charset=utf8,%3Csvg viewBox='0 0 30 30' xmlns='http://www.w3.org/2000/svg'%3E%3Cpath stroke='rgba(0, 0, 0, 0.5)' stroke-width='2' stroke-linecap='round' stroke-miterlimit='10' d='M4 7h22M4 15h22M4 23h22'/%3E%3C/svg%3E");
---
>   background-image: url(data:image/svg+xml;charset=utf8;base64,PHN2ZyB2aWV3Qm94PSIwIDAgMzAgMzAiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyI+PHBhdGggc3Ryb2tlPSJyZ2JhKDAsIDAsIDAsIDAuNSkiIHN0cm9rZS13aWR0aD0iMiIgc3Ryb2tlLWxpbmVjYXA9InJvdW5kIiBzdHJva2UtbWl0ZXJsaW1pdD0iMTAiIGQ9Ik00IDdoMjJNNCAxNWgyMk00IDIzaDIyIi8+PC9zdmc+);
3633c3631
<   background-image: url("data:image/svg+xml;charset=utf8,%3Csvg viewBox='0 0 30 30' xmlns='http://www.w3.org/2000/svg'%3E%3Cpath stroke='rgba(255, 255, 255, 0.5)' stroke-width='2' stroke-linecap='round' stroke-miterlimit='10' d='M4 7h22M4 15h22M4 23h22'/%3E%3C/svg%3E");
---
>   background-image: url(data:image/svg+xml;charset=utf8;base64,PHN2ZyB2aWV3Qm94PSIwIDAgMzAgMzAiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyI+PHBhdGggc3Ryb2tlPSJyZ2JhKDI1NSwgMjU1LCAyNTUsIDAuNSkiIHN0cm9rZS13aWR0aD0iMiIgc3Ryb2tlLWxpbmVjYXA9InJvdW5kIiBzdHJva2UtbWl0ZXJsaW1pdD0iMTAiIGQ9Ik00IDdoMjJNNCAxNWgyMk00IDIzaDIyIi8+PC9zdmc+);
4616c4614
<   font-family: -apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,"Helvetica Neue",Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";
---
>   font-family: -apple-system,BlinkMacSystemFont,segoe ui,Roboto,helvetica neue,Arial,sans-serif,apple color emoji,segoe ui emoji,segoe ui symbol;
4712c4710
<   font-family: -apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,"Helvetica Neue",Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";
---
>   font-family: -apple-system,BlinkMacSystemFont,segoe ui,Roboto,helvetica neue,Arial,sans-serif,apple color emoji,segoe ui emoji,segoe ui symbol;
4791c4789
<   border-width: 0 .5rem .5rem .5rem;
---
>   border-width: 0 .5rem .5rem;
4868,4869c4866,4867
<   -webkit-perspective: 1000px;
<   perspective: 1000px;
---
>   -webkit-perspective: 1e3px;
>   perspective: 1e3px;
4949c4947
<   background-image: url("data:image/svg+xml;charset=utf8,%3Csvg xmlns='http://www.w3.org/2000/svg' fill='%23fff' viewBox='0 0 8 8'%3E%3Cpath d='M5.25 0l-4 4 4 4 1.5-1.5-2.5-2.5 2.5-2.5-1.5-1.5z'/%3E%3C/svg%3E");
---
>   background-image: url(data:image/svg+xml;charset=utf8;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIGZpbGw9IiNmZmYiIHZpZXdCb3g9IjAgMCA4IDgiPjxwYXRoIGQ9Ik01LjI1LjBsLTQgNCA0IDQgMS41LTEuNUw0LjI1IDRsMi41LTIuNUw1LjI1LjB6Ii8+PC9zdmc+);
4952c4950
<   background-image: url("data:image/svg+xml;charset=utf8,%3Csvg xmlns='http://www.w3.org/2000/svg' fill='%23fff' viewBox='0 0 8 8'%3E%3Cpath d='M2.75 0l-1.5 1.5 2.5 2.5-2.5 2.5 1.5 1.5 4-4-4-4z'/%3E%3C/svg%3E");
---
>   background-image: url(data:image/svg+xml;charset=utf8;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIGZpbGw9IiNmZmYiIHZpZXdCb3g9IjAgMCA4IDgiPjxwYXRoIGQ9Ik0yLjc1LjBsLTEuNSAxLjVMMy43NSA0bC0yLjUgMi41TDIuNzUgOGw0LTQtNC00eiIvPjwvc3ZnPg==);
7432d7429
< /*# sourceMappingURL=bootstrap.min.css.map */

thanks again for a great parser and minifier let me know if you need help with any of this, especially for verification

n

@tdewolff
Copy link
Owner

Looking at http://www.ietf.org/rfc/rfc3986.txt section 2.2 I'm not fully convinced what is right or wrong. Given If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed. I would assume characters such as / would not need to be percent-encoded...but I can't find conclusive information about this other than it seems to work in most browsers.

@client9
Copy link
Author

client9 commented Mar 15, 2018

hi @tdewolff - yeah I think the spec is complete especially in cases of composition.

FWIW, i found this example in bootstrap 4.0.0 which is apparently is used by 3.6% of the internet, so I'd guess it would be noticed quickly if it caused problems.

Bootstrap appears to use clean-css with level 1 optimizations for it's minification.

hope this helps.

regards,

n

@MichaelMonashev
Copy link

@lexborisov , do you know RFC for this case?

@MichaelMonashev
Copy link

@tdewolff
Copy link
Owner

See https://tools.ietf.org/html/rfc3986#page-12 which says : / ? # [ ] @ ! $ & ' ( ) * + , ; = all need to be percent-encoded. I haven't found many specific mentionings of data URIs, except for https://tools.ietf.org/html/rfc2397, which refers to https://tools.ietf.org/html/rfc2396 and says ; / ? : @ & = + $ , need to be percent-encoded, which is already considerably less. It also says it needs those characters percent-encoded when they don't conflict with their reserved meaning. The latter is what is used by url.QueryEscape (the function used by the data URI minifier).

Looking at https://en.wikipedia.org/wiki/Data_URI_scheme though, it seems : ; , don't required percent-encoding.

See https://codepen.io/tigt/post/optimizing-svgs-in-data-uris for a better explaination, however https://perishablepress.com/stop-using-unsafe-characters-in-urls/ suggests that we can keep ; / ? : @ = & as regular characters: those characters are not used for theire "defined" purpose.

In the end, I'm really not sure what to do. Data URIs seem to refer to an older RFC that is less strict, and given some definitions it seems even those reserved characters mostly don't have a specific meaning for data URIs (so we don't have to percent-encode?). On the otherhand, many people refer to the newer RFC which is far stricter, but also says represent a data octet in a component when that octet's corresponding character is outside the allowed set or is being used as a delimiter of, or within, the component, and it makes a distinction between gen-delims and sub-delims. I just have NO idea what this means in the end. Most people say "stay save" and encode them all, but as I read the RFC I feel like there is some leeway.

@dchenk
Copy link
Contributor

dchenk commented Jul 26, 2018

@tdewolff

Looks like the strict spec says / needs to be escaped, but in your test case you have not.

Actually, your minifying rules are wrong here. The original CSS string (url("data:image/svg+xml;ch[...etc.]) has the URI surrounded by double quotes. So it was valid. No additional escaping was necessary.

But simply dropping the surrounding quotes makes the CSS invalid.

I suggest that you add an option to the CSS minifier to not drop quotes around the url() construct. I would rather not have any quotes removed from the URIs there.

@tdewolff
Copy link
Owner

tdewolff commented Jul 26, 2018

@dchenk Quotes can be removed within the url( function as per the spec. As to escaping /, this is debatable as it is ill-defined in the spec. However, you may prove me wrong by quoting the specifications for CSS3, see https://developer.mozilla.org/en-US/docs/Web/CSS/CSS3

See for the url definition: https://drafts.csswg.org/css-values-3/#urls it specifically allows removing quotes. Important note to self: it appears we always need to escape (, ) and whitespace, as per https://drafts.csswg.org/css-syntax-3/#consume-url-token. Also, what are <url-modifiers>?

@dchenk
Copy link
Contributor

dchenk commented Jul 26, 2018

@tdewolff you should be reading the actual spec, not MDN. (I mean you probably are, but that link you gave isn't sufficient.)

The spec says (https://www.w3.org/TR/CSS2/syndata.html#uri):

Some characters appearing in an unquoted URI, such as parentheses, white space characters, single quotes (') and double quotes ("), must be escaped with a backslash so that the resulting URI value is a URI token: '(', ')'.

In other words, the code url("data:image/svg+xml;ch...and-so-on") is valid. And simply dropping the double quotes without escaping the string is invalid.

@dchenk
Copy link
Contributor

dchenk commented Jul 26, 2018

@tdewolff a simple solution to consider would be first doing strconv.Quote on strings within url() and then simply slicing out the string within the double quotes. This way all the usual suspects would be escaped.

@tdewolff
Copy link
Owner

tdewolff commented Jul 26, 2018

When dropping the quotes, the string should already be escaped properly. If not, you can submit a bug report. The issue was that some believe that we escape too many characters, but this is ill-specified.

@tdewolff
Copy link
Owner

I've fixed two bugs in the parser and data URI minifier:

  • bad percent-encoded values are ignored; the rest of the URI is now still unencoded
  • sometimes ASCII was chosen over base64 eventhough ASCII was longer than base64

The problem still remains that for data URIs it is ill defined which characters should be escaped and which not. Right now I still follow the same rules as decribed in https://tools.ietf.org/html/rfc2397, which referenced uric in https://tools.ietf.org/html/rfc2396:

uric          = reserved | unreserved | escaped
      reserved      = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
                      "$" | ","
      unreserved    = alphanum | mark
      mark          = "-" | "_" | "." | "!" | "~" | "*" | "'" |
                      "(" | ")"

Other characters than unreserved are percent-encoded, but one could argue that some of the other characters don't need to be encoded as they have no special meaning. There is however no specification or RFC about this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants