A separate MIME type for svgz files is needed #701

Closed
mqudsi opened this issue Jun 8, 2019 · 25 comments · Fixed by #716

Comments

@mqudsi

mqudsi commented Jun 8, 2019

Presently, both .svg and .svgz files share the MIME type image/svg+xml. This is a problem because it means that no transport protocol/layer/application can correctly serve its equivalent of HTTP's Content-Encoding based on the MIME type alone.

For example, nginx (and Apache?) decide which files to compress dynamically when serving them (the client's Accept-Encoding permitting) based on the MIME type, which is typically mapped from the file extension in a separate (systemwide or application-specific) configuration file (in nginx's case, mime.types).

If mime.types maps svgz to image/svg+xml (and it does by default), .svgz files will be compressed the same way .svg files (obviously mapped to image/svg+xml) are. Any sysadmin worth her salt will be applying at least gzip encoding to image/svg+xml, as the savings are enormous (as we all know, text, and especially verbose formats like XML, compresses very nicely). But that means any static .svgz files will be double-encoded and then decoded (read: decompressed) only once by the client, which then attempts to render them (incorrectly) as image/svg+xml that needs no further transforms.
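A minimal Python sketch of that failure mode, assuming a server that gzips every image/svg+xml response and a client that honours Content-Encoding: gzip exactly once. Both sides are simplifications for illustration, not real server or browser code:

```python
import gzip

svg = b'<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10"/>'

# What an SVG editor writes to disk when saving as .svgz
svgz_on_disk = gzip.compress(svg)

# The server maps .svgz to image/svg+xml, finds that type in its list of
# types to compress, and gzips the already-gzipped bytes a second time.
wire_body = gzip.compress(svgz_on_disk)

# The client sees Content-Encoding: gzip and decompresses exactly once.
received = gzip.decompress(wire_body)

print(received == svgz_on_disk)      # True: the payload is still gzip data
print(received.startswith(b"<svg"))  # False: nothing an XML parser can read
```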

I propose a separate image/svgz+xml or similar (image/svg+xml+gzip?) that will allow transport applications/layers to distinguish between svg and svgz files via their MIME type alone, so that sysadmins do not need to choose between being able to serve svgz files and serving uncompressed plain svg files.

(There is plenty of precedent here; e.g. .docx and .xlsx files are actually zip files but have their own MIME type to prevent exactly this sort of confusion.)


More practically speaking, a browser served a svgz file as generated by the myriad of SVG editors/compressors/optimizers/etc without a gzip Content-Encoding will not be able to render the image due to broken encoding (tested in Firefox, Internet Explorer, and Chrome); i.e. there is no client-side heuristic already in place to address this.

@longsonr

longsonr commented Jun 8, 2019

So we'd double every mime type by having a z equivalent. That doesn't sound useful.

@mqudsi
Author

mqudsi commented Jun 8, 2019

So we'd double every mime type by having a z equivalent. That doesn't sound useful.

Thanks for your dismissive reply. I just explained precisely why it is not only useful but needed. SVG is unique in that, although it is a plain-text format, a compressed version of the same content was adopted (standardized?) as a first-tier file format, directly generated by various editors and encoders, without any effort to mask the compression type.

Take for example PNG, which is also losslessly compressed (with DEFLATE), but that compression is internal to the file format: the signature bytes at the start of a PNG file say "PNG", not "this is a compressed stream".
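To make the contrast concrete, here is a small Python sketch that classifies bytes by their leading signature. The PNG signature and the gzip ID bytes are standard values; the `identify` helper itself is hypothetical. At this level, a .svgz file looks exactly like any other gzipped text:

```python
import gzip

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # 8-byte signature defined by the PNG spec
GZIP_MAGIC = b"\x1f\x8b"              # ID1, ID2 bytes that start every gzip member (RFC 1952)


def identify(data: bytes) -> str:
    """Classify a byte string by its leading bytes only."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(GZIP_MAGIC):
        return "gzip"  # says nothing about what was compressed
    if data.lstrip().startswith(b"<"):
        return "xml-like text"
    return "unknown"


svg = b'<svg xmlns="http://www.w3.org/2000/svg"/>'
print(identify(svg))                           # xml-like text
print(identify(gzip.compress(svg)))            # gzip
print(identify(gzip.compress(b"plain text")))  # gzip
```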

So, yes, we'd double every mime type by having a z equivalent but only for MIME types that are shared by two entirely different file types that shouldn't have shared a single MIME type in the first place; I think that's what you meant to say.

@longsonr

longsonr commented Jun 8, 2019

I don't think we'd implement this in Firefox.

@dirkschulze
Contributor

@mqudsi As you said, there should be two headers:

  • the content type and
  • the content compression.

The former is the MIME type and the latter would be the GZIP compression. It is unfortunate that Apache (and maybe others?) relies only on the extension-to-MIME-type mapping, but this seems to be an implementation detail.

In general, as you noted as well, compression happens dynamically nowadays and there is no need to generate .svgz files anymore. That was not the case in the past.

IMO .svgz should be seen as legacy and no longer used. It would take years to get clients and backends to implement a new type, and the confusion might be even bigger. It doesn't seem worth the cost just to avoid the potential double compression.

@css-meeting-bot
Member

The SVG Working Group just discussed A separate MIME type for svgz files is needed.

The full IRC log of that discussion:
<krit> topic: A separate MIME type for svgz files is needed
<krit> GitHub: https://github.com//issues/701
<krit> chris_: there is a content type and a content encoding
<krit> chris_: I remember ppl had mime types for svg+gzip
<krit> chris_: over time it got clarified and the SVG spec is the way it is
<krit> chris_: there also was a desire to do compression for the server since it was uncommon to zip dynamically.
<krit> chris_: if we change the mime type now, svgz would stop working.
<krit> chris_: I think the original commenter is wrong
<krit> chris_: Maybe I can give the issue creator the history and maybe it convinces him
<krit> myles: how can applications differ between the 2 types today?
<krit> chris_: I think implementations just look at the first bytes.
<krit> chris_: In modern use you just drop an SVG file on the server and the server will do the compression on the fly.
<krit> chris_: we were competing with flash and it was important to have small files.
<krit> chris_: but having a separate mime type might never fly.
<krit> chris_: of course I can argue against having 2 types sharing the same mime type but that is history.
<krit> krit: chris_ will reply on the thread and we will pick it up again if there is a negative response.
<krit> trackbot, end telcon

@svgeesus
Contributor

In the early days of the Web, the distinction between the type and the encoding was not clear. So for example a MIME type was registered for zip archives, and another one for gzipped files (application/gzip). This was a bad design.

The decision (from memory, around 1997-8) to use a single Internet Media Type (MIME type) for SVG, and to use Content-Encoding to indicate the presence of compression (whether on-the-fly compression or static compression generated by some authoring tool), was thus the result of experience in the IETF and W3C. The specific registration benefited from feedback from the IETF, and it remains a sound architecture to this day.

In the early days, most servers did not do on-the-fly compression. There was a need for authoring tools to be able to emit the compressed form, and for content creators who did not have control over server configuration to get the correct result. That is why .svgz was standardized.

Nowadays, on-the-fly encoding is common and indeed there are several types (such as Brotli encoding, which does a better job on SVG than gzip). So nowadays on many servers the performance gain can be realized by just dropping a .svg file onto the server.

But using .svgz files is a long established practice, and works well.

Your suggested change would simply have the effect that there would be a new Internet Media type with no support, so people would not use it as the images would not be displayed.

As @longsonr said, a complete duplication of Internet Media types is a very poor solution.

@mqudsi said:

More practically speaking, a browser served a svgz file as generated by the myriad of SVG editors/compressors/optimizers/etc without a gzip Content-Encoding will not be able to render the image due to broken encoding

Correct. And that would indicate an error in the filetype mapping on that server: .svgz means both Content-Type and Content-Encoding should be set. I know that Apache can be set to do that (indeed, I thought it was the default mapping out of the box). So this is not a problem that seems to occur much in practice, and your suggested solution would not solve it. Instead, just ensure the server is configured correctly.
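The same split between type and encoding shows up outside web servers. As one illustration (Python's standard `mimetypes` module, not any server's configuration), a .svgz file name is mapped to the SVG media type plus a gzip encoding, which is exactly the pair of headers described above. The file names are made up for the example:

```python
import mimetypes

# mimetypes reports a media type and a content encoding as two separate
# values; its default suffix map rewrites ".svgz" to ".svg.gz" before
# looking the extension up.
db = mimetypes.MimeTypes()

for name in ("drawing.svg", "drawing.svgz", "drawing.svg.gz"):
    media_type, encoding = db.guess_type(name)
    print(f"{name:<15} Content-Type: {media_type}  Content-Encoding: {encoding}")
```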

@svgeesus svgeesus self-assigned this Jun 24, 2019
@mqudsi
Author

mqudsi commented Jun 25, 2019

The issue isn't that Apache can't set content type and content encoding; the issue is that some web servers (I know for a fact that nginx is one of them) use the content type to determine whether a file should be dynamically compressed. The file is then served with the detected MIME type plus the correct content encoding (and this part works correctly), e.g.

	gzip_types text/html application/javascript application/json application/rss+xml image/bmp image/svg+xml text/css text/javascript text/plain text/xml;

This directive says "when a resource with a mime type from this list is requested, apply a gzip transform to it, and serve the compressed content with the Content-Encoding: gzip header"

A plain Jane .svg file has a content type image/svg+xml which appears in the example gzip_types list above, so it is correctly compressed (significantly bringing down its size, given the byte-level duplication in text -- and particularly in xml -- files).

The problem is that this directive cannot distinguish between a request for example.com/file.svg and one for example.com/file.svgz, because both have the same content type, so both are gzipped on the fly. That would be OK if there were a separate content type the latter could be served with, but as things stand the dynamically compressed .svg and the dynamically compressed .svgz go out with the same Content-Type header and the same Content-Encoding: gzip. The end result is that the client in both cases receives a response with

...
Content-Encoding: gzip
Content-Type: image/svg+xml
...

and so has no way of knowing that the resulting file still needs to be decompressed again to actually be a valid SVG (and not SVGZ) file.
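A simplified Python model of this, assuming a hypothetical extension-to-type table and a `gzip_types`-style set (it is not nginx code). It shows both responses getting identical headers while only one of them is valid SVG after a single decompression:

```python
import gzip
import posixpath

# Hypothetical, simplified model of an extension -> type table plus a
# gzip_types-style filter. It is not nginx code.
MIME_TYPES = {".svg": "image/svg+xml", ".svgz": "image/svg+xml"}
GZIP_TYPES = {"text/css", "text/plain", "image/svg+xml"}


def respond(path: str, body: bytes, client_accepts_gzip: bool = True):
    """Return (headers, body) the way a type-driven compressor would."""
    ext = posixpath.splitext(path)[1]
    content_type = MIME_TYPES.get(ext, "application/octet-stream")
    headers = {"Content-Type": content_type}
    if client_accepts_gzip and content_type in GZIP_TYPES:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    return headers, body


svg = b'<svg xmlns="http://www.w3.org/2000/svg"/>'
svg_headers, svg_body = respond("/file.svg", svg)
svgz_headers, svgz_body = respond("/file.svgz", gzip.compress(svg))

# Identical headers, so the client cannot tell the two responses apart...
print(svg_headers == svgz_headers)       # True
# ...yet only one of them decodes to SVG after a single decompression.
print(gzip.decompress(svg_body) == svg)  # True
print(gzip.decompress(svgz_body) == svg) # False
```

Changing the `.svgz` entry in `MIME_TYPES` to some other type would stop the second compression pass in this model, but that type would then also become the Content-Type sent to the client, which is the trade-off described later in this comment.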

The server should either serve a .svgz file as-is with Content-Encoding: gzip and Content-Type: image/svg+xml or it may (pointlessly) recompress it on-the-fly and serve it with Content-Encoding: gzip but then it needs to indicate that the response is not a text document but rather still gzip-encoded.

It wouldn't matter if applications with SVG support could tell an svgz file from an svg file without the correct extension, either through a shared header in the file that indicates the actual encoding (which would mandate changes to the file format, something that is obviously never going to happen) or by simply falling back to gzip decompression and then decoding as svg+xml again if/when the initial decode-as-plain-SVG step fails. But, for example (@longsonr), Firefox won't decode an at-rest svgz file as SVG, because it doesn't attempt to decode it as gzip.

Ultimately, the problem is that .svgz files do not have magic header bytes identifying them as gzip-compressed SVG files (only the generic gzip signature), so without an outwardly visible indicator that is correctly preserved across transformations, a client application opening them has no idea that it should decode them first. On Windows, where there is no built-in concept of MIME types, the extension is used to make that distinction. In the web world, extensions have zero significance and the Content-Type header alone is used to make that decision, and unfortunately it fails in this case.


Note that I can configure nginx (via the mime.types file) to map requests to .svgz files to a different mime type than image/svg+xml which would stop it from dynamically compressing .svgz files but still have true .svg files compressed on-the-fly, but then the response will have Content-Type: foo instead of Content-Type: image/svg+xml because the same content type that is used to determine dynamic compression is also served to the client.

Personally, I don't really care as I'm fully in control of what types we serve. But please understand that this isn't a situation shared by any other file type and so comparisons with gzipped versions of other media types are not appropriate. .svgz isn't me (or whoever else falls prey to this) deciding on their own to gzip a regular svg file and then give it a .svgz extension rather than a .svg.gz extension, it's a regular person using a regular application option to save an SVG document into a format + extension that's been around for a long time, with no indication that this would cause problems in certain deployment scenarios.

It's also important to note that there are almost no drawbacks to adding a MIME type here. Applications that ignore the MIME type and use only the extension to determine how a file is opened will continue to do so. Applications that rely exclusively on the MIME type will continue to fail to open the file in this particular case (as it has never been possible to decode an svgz file based purely on the MIME type without the content encoding as well).

@AmeliaBR
Contributor

I'm skeptical that this confusion is solvable at this point. Any change introduces compatibility issues.

But I agree that .svgz files on the web are often more pain than they are worth. Even if you serve them correctly, none of the browsers I've tested will recompress the file when saving it, creating a mismatch with the file extension when you try to open it later.

I would be happy to add a warning to the spec, that

  • Serving .svgz over HTTP requires correct configuration and some web servers do not support the necessary configuration options.
  • The recommended approach is for website authors to use uncompressed .svg files with the best server-enabled compression supported by the client, including the use of more recent compression methods (e.g., Brotli).
  • As an alternative (e.g., if the uncompressed .svg file is very large), it may be possible to configure the server to recognize a .svg.gz file extension as representing a pre-compressed SVG file. (E.g., for nginx this is supported with the gzip_static directive). The website author would need to rename the .svgz file to .svg.gz before uploading.
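The third option can be sketched in Python as a lookup that prefers a pre-compressed sibling file. This is a simplified model in the spirit of nginx's gzip_static, not its actual behaviour; the `resolve` helper, the file names and the fixed image/svg+xml type are assumptions for illustration:

```python
import gzip
import tempfile
from pathlib import Path


def resolve(root: Path, url_path: str, client_accepts_gzip: bool):
    """Simplified model of serving a pre-compressed sibling file.

    For a request to /x.svg, prefer /x.svg.gz (sent byte-for-byte with
    Content-Encoding: gzip) when the client accepts gzip; otherwise fall
    back to the plain file, or None (404) if there is none.
    """
    plain = root / url_path.lstrip("/")
    packed = plain.with_name(plain.name + ".gz")
    headers = {"Content-Type": "image/svg+xml"}  # assumption: only SVG is requested
    if client_accepts_gzip and packed.is_file():
        return {**headers, "Content-Encoding": "gzip"}, packed.read_bytes()
    if plain.is_file():
        return headers, plain.read_bytes()
    return None


svg = b'<svg xmlns="http://www.w3.org/2000/svg"/>'
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # The author renames logo.svgz to logo.svg.gz before uploading.
    (root / "logo.svg.gz").write_bytes(gzip.compress(svg))
    gzip_client = resolve(root, "/logo.svg", client_accepts_gzip=True)
    plain_client = resolve(root, "/logo.svg", client_accepts_gzip=False)

headers, body = gzip_client
print(headers["Content-Encoding"], gzip.decompress(body) == svg)  # gzip True
print(plain_client)  # None: no uncompressed logo.svg was uploaded
```

In this model a client that does not accept gzip gets nothing unless an uncompressed logo.svg is uploaded too; real servers may offer other options (nginx, for instance, has a separate gunzip module), so the server documentation is the place to check.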

I'd also hope that requests from web developers might convince servers to add support, but I'm not sure if that will happen. I found a wontfix nginx feature request to add support for .svgz to the gzip_static directive.

@mqudsi
Copy link
Author

mqudsi commented Jun 25, 2019

I can certainly live with that.

@svgeesus
Contributor

I found a wontfix nginx feature request to add support for .svgz to the gzip_static directive.

Which sounds bad, but in that bug they explain how to set the server up correctly:

location ~ \.svgz$ { add_header Content-Encoding gzip; }

so wontfix here genuinely does mean that nginx is not actually broken (although their default MIME types setup could certainly include this out of the box).
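A minimal Python model of what that configuration is meant to achieve, assuming the server sends the stored .svgz bytes unchanged (no additional on-the-fly gzip pass) and attaches both headers. The helper names are hypothetical:

```python
import gzip


def serve_svgz_as_is(stored: bytes):
    """Send the .svgz bytes unchanged and describe them with two headers."""
    return {"Content-Type": "image/svg+xml", "Content-Encoding": "gzip"}, stored


def client_decode(headers: dict, body: bytes) -> bytes:
    """Undo Content-Encoding once, as an HTTP client does."""
    if headers.get("Content-Encoding") == "gzip":
        body = gzip.decompress(body)
    return body


svg = b'<svg xmlns="http://www.w3.org/2000/svg"/>'
svgz_file = gzip.compress(svg)

headers, body = serve_svgz_as_is(svgz_file)
print(client_decode(headers, body) == svg)    # True: one decode yields SVG

# If the server also gzipped these bytes on the fly, one decode is not enough.
double = gzip.compress(body)
print(client_decode(headers, double) == svg)  # False
```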

In terms of the SVG specification and Internet Media Types, though, there is nothing to fix here.

@dirkschulze
Copy link
Contributor

@mqudsi Can we get to a conclusion that a specification note like suggested by @AmeliaBR in #701 (comment) would be a fair enough compromise? @AmeliaBR could you create a PR with the proposed change please?

@mqudsi
Author

mqudsi commented Jun 28, 2019

@dirkschulze yup, that's fine :)

@tatarize

tatarize commented Jul 2, 2019

Y'all should just deprecate .svgz. Seems like the best way to fix this that introduces no compatibility issues. The feature of "this is just a gzip containing this type of file", is fine but it doesn't need to be done on a per-file-spec basis.

@tatarize

tatarize commented Jul 2, 2019

If something like text/plain;compression=gzip got added to MIME, it would capture the whole of that feature and might actually be useful at times; svgz would then be something like image/svg+xml;compression=gzip. But having a different file type with the same MIME type is something we should simply stop doing.

@AmeliaBR
Contributor

AmeliaBR commented Jul 2, 2019

@tatarize I agree with you for a web context. But .svgz is very useful for local file system use, especially on systems that don't support file associations based on stacked file extensions like .svg.gz

@geekley

geekley commented Apr 29, 2020

Currently, it's impossible to use svgz on data URIs, right?

@longsonr

Currently, it's impossible to use svgz on data URIs, right?

Well, a data URI has two text formats, URI encoding and base64 encoding. Zipping creates binary data, so there's that problem you'd have to address. If you did that, you could signal the content type where you currently put base64.

This isn't what we're discussing here though.

@geekley

geekley commented Apr 29, 2020

@longsonr Yeah, I know that.
The problem is that, because of this issue, browsers can only recognize the binary data in a data:image/svg+xml;base64,... as a malformed SVG, not as SVGZ (since there is no MIME type for svgz nor any parameter like ;compression=gzip). So browsers won't parse the data; it's impossible.

Chrom(ium):

This page contains the following errors:
error on line 1 at column 1: Encoding error
Below is a rendering of the page up to the first error.

Firefox:

XML Parsing Error: not well-formed
Location: data:image/svg+xml;base64,...
Line Number 1, Column 1:

Of course, you could argue that's a limitation of the data URI spec. However, this change would easily solve this.
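The failure is easy to reproduce outside a browser. This Python sketch (illustrative only, not browser code) builds such a URL from a gzipped SVG. The data: syntax from RFC 2397 carries a media type and an optional ;base64 marker, but has no standard way to say that the decoded bytes are gzip-compressed:

```python
import base64
import gzip

svg = b'<svg xmlns="http://www.w3.org/2000/svg" width="1" height="1"/>'
svgz = gzip.compress(svg)

# A data: URL has room for a media type and the ";base64" flag, but no
# field that says "this payload is gzip-compressed".
uri = "data:image/svg+xml;base64," + base64.b64encode(svgz).decode("ascii")

# What a user agent has to work with after splitting and base64-decoding:
meta, _, payload = uri.partition(",")
data = base64.b64decode(payload)

print(meta)                     # data:image/svg+xml;base64
print(data[:2] == b"\x1f\x8b")  # True: gzip bytes where XML is expected
```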

@geekley

geekley commented Apr 29, 2020

Another possible solution would be if browsers were required to detect the gzip magic number 0x1F8B (control char + non-ascii, so not a valid xml?) and automatically interpret the file as svgz even when served with a content-type for svg.
I'm not saying it's a good solution, but it's possible, since there is no conflict, apparently.

@longsonr

Another possible solution would be if browsers were required to detect the gzip magic number 0x1F8B (control char + non-ascii, so not a valid xml?) and automatically interpret the file as svgz even when served with a content-type for svg.
I'm not saying it's a good solution, but it's possible, since there is no conflict, apparently.

We won't be doing that.

@Bsplesk

Bsplesk commented Oct 25, 2020

Linux/Gnome/KDE .... etc
command: $ xdg-mime query filetype test.svgz
result: image/svg+xml-compressed

The -compressed suffix means even more problems when uploading files.
svgz, svgrar, svg7zip, svgnowzip, svgformat, svgcompact, svgmin ..etc - Bad Design.

@adamretter

So I agree that .svg and .svgz should have separate media types; the file formats themselves are completely different. One is gzipped (binary data), the other is XML (text).

Consider a system which allows a user to upload files via HTTP POST or PUT, perhaps that system wants to do something special with XML and/or text documents.

When the file is uploaded, the Content-Type in the HTTP request is set to image/svg+xml, but the server now has no idea whether it is receiving an svg or an svgz file.
The server would have to do extra file-type determination just for SVG, whereas for the majority of other formats, using the Media Type is sufficient. This seems very silly!

@sdroamt0

sdroamt0 commented May 13, 2021

Linux/Gnome/KDE .... etc
command: $ xdg-mime query filetype test.svgz
result: image/svg+xml-compressed

The -compressed suffix means even more problems when uploading files.
svgz, svgrar, svg7zip, svgnowzip, svgformat, svgcompact, svgmin ..etc - Bad Design.

Suppose plainsvg.svg is a plain SVG and gzippedsvg.svgz is a gzipped SVG. Then if you do:

$ mv  gzippedsvg.svgz  gzippedsvg.svg
$ xdg-mime query filetype gzippedsvg.svg

you get: image/svg+xml

and if you do:

$ mv  plainsvg.svg  plainsvg.svgz
$ xdg-mime query filetype plainsvg.svgz

you get: image/svg+xml-compressed

It seems that the command uses the file extension (perhaps in addition to the magic number, perhaps instead of it), not only the magic number! Would the file command be better for determining the file type?

@geekley

geekley commented Apr 18, 2023

So ... we're just gonna ignore that data URIs work for every widely used file format except svgz?
Shouldn't W3C/IETF come up with a way to solve that? Even if it means updating data URI spec?
From a quick search, it apparently was expected (or verified?) to work in 2010, unless I understood it wrong, and without any changes, just by auto-detecting it:
https://mailarchive.ietf.org/arch/msg/pkix/7XbZ6Ylg8-n-ACnu7TPz3PPFss0/

I don't know about current Opera, but there's no way to make it work, at least in Firefox. Unless HTML <img> has some attribute I'm unaware of that would simulate a Content-Encoding: gzip for data URIs, or something like that (though it still wouldn't work when opening the data URL in a new tab). I don't think it has.

Btw, I'm not advocating for any specific solution or spec change necessarily, it's just so awkward that something that should work for every file type with a MIME doesn't work in this case because of some technicality-deadlock-thing.

if browsers were required to detect the gzip magic number 0x1F8B

Perhaps the word "require" is too strong here, but what about "recommend"? Something along the lines of:

In contexts where a gzip encoding cannot be specified, it's recommended that user agents interpret files with MIME type image/svg+xml where the binary data starts with the gzip magic number 0x1F8B as an SVGZ file, as if it had been served with HTTP header Content-Encoding: gzip.

Or maybe even "allowed":

[...] user agents are allowed to interpret [...] as SVGZ [...]

IMO it would be the best compromise so that we could at least get this to eventually work in some way. What would be the impediment or downsides? Or maybe there's some other better way to do it?
It would be unfortunate if this becomes a "wontfix" kinda thing.
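A Python sketch of the fallback proposed above, shown only to make the proposal concrete: no browser is claimed to do this, and the Firefox side declined it earlier in this thread. The function name is hypothetical:

```python
import gzip
import xml.etree.ElementTree as ET

GZIP_MAGIC = b"\x1f\x8b"


def parse_svg_payload(data: bytes) -> ET.Element:
    """Hypothetical sniffing fallback from the proposal above.

    In UTF-8, a well-formed XML document cannot start with byte 0x1F
    (U+001F is not an XML 1.0 character), so treating a leading gzip
    signature as an implied Content-Encoding: gzip would not misread a
    valid UTF-8 SVG document.
    """
    if data.startswith(GZIP_MAGIC):
        data = gzip.decompress(data)
    return ET.fromstring(data)


svg = b'<svg xmlns="http://www.w3.org/2000/svg" width="1" height="1"/>'
print(parse_svg_payload(svg).tag)                 # {http://www.w3.org/2000/svg}svg
print(parse_svg_payload(gzip.compress(svg)).tag)  # {http://www.w3.org/2000/svg}svg
```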

@TaaviE

TaaviE commented Nov 13, 2023

SVGZ files are currently being used inside X.509 certificates for the logotype extension (OID: 1.3.6.1.5.5.7.1.12) for the BIMI standard. The currently used MIME type in those certificates is image/svg+xml. This is rather misleading considering it's actually an svgz in the case of BIMI. There may also be other uses (even with same OID, if not others) where that might not be the case. Not being able to instantly tell which is which does cause confusion and can cause mistakes.

While the web indeed has Content-Encoding: gzip, and nobody should do svgz + gzip instead of svg + gzip, this is not an assumption that can be made for other use cases.

(In the end there are many formats that are compressed or use a compressed container and they have their own MIME type, for a very good reason, they're different formats rather than just .zip, they carry different semantics and need different handling.)
