Problem with XML content requirements and external entities: make MathML and SVG1.1 invalid #1338

Closed
iherman opened this issue Oct 9, 2020 · 98 comments · Fixed by #1368
Labels
EPUB33 Issues addressed in the EPUB 3.3 revision Spec-EPUB3 The issue affects the core EPUB 3.3 Recommendation Topic-XML The issue affects XML processing

Comments

@iherman
Member

iherman commented Oct 9, 2020

The current text on XML conformance says that XML-based media MUST meet the following:

...
External identifiers MUST NOT appear in the document type declaration [XML]
...

This constraint applies regardless of whether the given Publication Resource is a Core Media Type Resource or a Foreign Resource.

However, that creates problems with SVG 1.1. The conformance rules of SVG 1.1 require the presence of:

<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">

The DTD itself, as described in the relevant appendix, makes heavy use of external entities (via its use of DTD Modularization). As a consequence, per the current document, no conforming SVG 1.1 file is valid for EPUB 3.3.
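To make the rule concrete, here is a minimal sketch, assuming Python's standard-library Expat binding, of a checker that rejects any external identifier in the DOCTYPE. It only illustrates the rule as currently worded; it is not how epubcheck implements it, and the sample SVG is a made-up minimal file using the SVG 1.1 DOCTYPE.

```python
import xml.parsers.expat as expat


def doctype_external_ids(data: bytes) -> list:
    """Return (name, public_id, system_id) if the DOCTYPE carries an external identifier."""
    found = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Expat only reports the identifiers here; it does no I/O of its own
        # for external entities, so the DTD is not downloaded.
        if system_id is not None or public_id is not None:
            found.append((name, public_id, system_id))

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)
    return found


def passes_current_rule(data: bytes) -> bool:
    # Rule as currently worded: external identifiers MUST NOT appear in the DOCTYPE.
    return not doctype_external_ids(data)


SVG11 = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
  "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" version="1.1" width="10" height="10"/>"""

SVG_NO_DOCTYPE = b'<svg xmlns="http://www.w3.org/2000/svg" version="1.1" width="10" height="10"/>'

print(passes_current_rule(SVG_NO_DOCTYPE))  # True
print(passes_current_rule(SVG11))           # False
```

Under this reading, a tool-generated SVG 1.1 file fails purely because of its DOCTYPE, even though nothing in its content depends on the DTD.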

Although the upcoming SVG 2 removed this requirement, there are lots of perfectly valid SVG 1.1 files out there (SVG 2 is not yet a Recommendation, b.t.w., only a CR). Most notably, Adobe Illustrator generates SVG 1.1 files with those DTD-s in them.

A similar problem may occur with MathML 3, whose official DTD relies on the same approach (although MathML is more permissive and does not require the use of that DTD).

This problem was originally raised in w3c/epubcheck#1114.

@iherman iherman added the Topic-ContentDocs The issue affects EPUB content documents label Oct 9, 2020
@iherman
Member Author

iherman commented Oct 9, 2020

Note that, in w3c/epubcheck#1114 two possible ways forward were already proposed:

@murata2makoto
Contributor

murata2makoto commented Oct 10, 2020

@iherman

The conformance rules of SVG 1.1 require the presence of:

No, it does not. The rules automatically add the DTD before validation.

The DTD itself, as described in the relevant appendix, makes heavy use of external entities (via its use of DTD Modularization). As a consequence, per the current document, no conforming SVG 1.1 file is valid for EPUB 3.3.

The DTD heavily uses external parameter entities, which never occur in documents. But if memory is correct, it does not define any parsed entities.

@iherman
Member Author

iherman commented Oct 10, 2020

@murata2makoto

@iherman

The conformance rules of SVG 1.1 require the presence of:

No, it does not. The rules automatically add the DTD before validation.

Checking it again, I stand corrected:-) The validity process requires the presence or the addition of the DTD. So my sentence:

per the current document, no conforming SVG 1.1 file is valid for EPUB 3.3.

should rather say:

per the current documents, conforming SVG 1.1 files may not be valid for EPUB 3.3

But, taking into account that tools produce SVG files with the DTD, and they have every right to do so, the problem is no less severe for us. (I have just run into a different application, called draw.io, that can produce nice diagrams in SVG and adds the DTD.)

The DTD itself, as described in the relevant appendix, makes heavy use of external entities (via its use of DTD Modularization). As a consequence, per the current document, no conforming SVG 1.1 file is valid for EPUB 3.3.

The DTD heavily uses external parameter entities, which never occur in documents. But if memory is correct, it does not define any parsed entities.

I believe you are right. But the presence of the external parameter entities in the DTD is enough to raise a red flag for EPUB.

@murata2makoto
Contributor

@iherman

But the presence of the external parameter entities in the DTD is enough to raise a red flag for EPUB.

I do not understand. Since we do not allow external DTD subsets, external parameter entities are never encountered.

I think that you would like to use the output of SVG software that always embeds references to the DTD at the W3C web site. But such references have caused trouble for the W3C web site in the past.

@iherman
Member Author

iherman commented Oct 10, 2020

I think that you would like to use the output of SVG software that always embeds references to the DTD at the W3C web site.

Indeed. Such software implementations are correct in doing so, they exist, and our users may want to use them. Adobe Illustrator is probably the most prominent, and certainly very important example. I do not think we can expect, say, Adobe Illustrator users to manually edit their generated SVG files by removing that DTD reference.

I.e., this is a very practical deployment issue.

But such references have caused trouble for the W3C web site in the past.

I do not know about the past, but those troubles are not present today. My original issue came from the fact that most of the logos used in W3C publications (e.g., all Recommendations) over the past 3-4 years (at least) are SVG files that have those DTD-s, and all web browsers accept them without further ado. Converting them into valid EPUB is not possible without manually modifying the SVG files.

I suspect that this issue never came up in EPUB land so far because Reading Systems tacitly accepted such SVG files (as the core Web engines accepted them). What brought it to the fore is the much stricter newer version of epubcheck that rejects those files. (This started the whole discussion in w3c/epubcheck#1114 after all...)

I cc @rdeltour explicitly here, to call on the epubcheck implementers to give their opinion.

@murata2makoto
Copy link
Contributor

@iherman

I still do not think that this is a problem. Yes, human users or software always have to remove external DTD subsets, if any. IDPF prohibited them, since parsing documents containing external DTD subsets has caused unnecessary network access.

@mattgarrish
Member

Yes, human users or software always have to remove external DTD subsets, if any.

But why are we putting this requirement on authors and not reading systems? It's a reading system optimization pushed onto authors, when a developer could simply suppress any resolution of DTD references. Isn't that pretty standard anyway? It's been a while since I processed XML documents, but W3C used to blacklist you if you tried to continuously resolve their DTDs.

(And just for the record, I don't see that this affects MathML. MathML can only be embedded in HTML so can't have its own DTD declaration.)
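As a rough sketch of what suppressing DTD resolution on the reading-system side could look like, assuming Python's Expat binding: the external DTD subset and external parameter entities are never read, and any request for an external general entity is recorded instead of fetched. The entity name `chap` and the file `chapter1.xml` are hypothetical.

```python
import xml.parsers.expat as expat


def parse_without_fetching(data: bytes) -> list:
    """Parse an XML resource, loading no external entities; return the system IDs requested."""
    requested = []
    parser = expat.ParserCreate()
    # Never read the external DTD subset or external parameter entities.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_NEVER)

    def on_external_ref(context, base, system_id, public_id):
        # Note the request but open nothing. Returning a true value tells
        # Expat the reference was handled, so parsing carries on.
        requested.append(system_id)
        return 1

    parser.ExternalEntityRefHandler = on_external_ref
    parser.Parse(data, True)
    return requested


SVG11 = b"""<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
  "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" version="1.1"/>"""

EXTERNAL_ENTITY = b"""<!DOCTYPE doc [<!ENTITY chap SYSTEM "chapter1.xml">]>
<doc>&chap;</doc>"""

print(parse_without_fetching(SVG11))            # []
print(parse_without_fetching(EXTERNAL_ENTITY))  # ['chapter1.xml']
```

The SVG 1.1 DOCTYPE never triggers a request at all; only an explicit reference to a declared external entity does, and the handler decides what to do with it.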

@iherman
Member Author

iherman commented Oct 10, 2020

@iherman

I still do not think that this is a problem. Yes, human users or software always have to remove external DTD subsets, if any. IDPF prohibited them, since parsing documents containing external DTD subsets has caused unnecessary network access.

Well, this is where we disagree. I believe it is a problem that one of the official content formats has an unnecessary restriction compared to its official definition.

But, actually, I do not understand your note, @murata2makoto. In your comment in w3c/epubcheck#1114 (comment) you seem to say that there is no such problem, that content with such a DTD is o.k., and that the bug is in epubcheck. In that case, I believe we have to add some explanation in the spec.

@murata2makoto
Copy link
Contributor

@mattgarrish

But why are we putting this requirement on authors and not reading systems? It's a reading system optimization pushed onto authors, when a developer could simply suppress any resolution of DTD references. Isn't that pretty standard anyway? It's been a while since I processed XML documents, but W3C used to blacklist you if you tried to continuously resolve their DTDs.

I might have agreed if this proposal was made when EPUB 3.0 was being designed. But the charter for EPUB 3.3 does not allow any changes that might destroy existing reading systems. If this DOCTYPE statement occurs in an SVG document, some existing EPUB 3 reading systems might try to fetch an external DTD subset. This causes network access, which is slow or unavailable.

@murata2makoto
Contributor

@iherman

In your comment in w3c/epubcheck#1114 (comment) you seem to say that there is no such problem, that content with such a DTD is o.k., and that the bug is in epubcheck. In that case, I believe we have to add some explanation in the spec.

My comment might not be clear. But I'm saying that the DOCTYPE statement in SVG must not be allowed by EPUB 3.3.

@mattgarrish
Member

If this DOCTYPE statement occurs in an SVG document, some existing EPUB 3 reading systems might try to fetch an external DTD subset.

But the only thing that stops this from happening now is the idea that only perfectly valid publications are ever loaded into reading systems. That seems unlikely given that there are often ways to side-load content.

I agree we shouldn't cause backwards incompatibilities, but it would be interesting to survey whether moving the requirement to reading systems actually changes the way any of them have been implemented.

@iherman
Member Author

iherman commented Oct 12, 2020

@murata2makoto the backward compatibility issue is something we should check. You say:

If this DOCTYPE statement occurs in an SVG document, some existing EPUB 3 reading systems might try to fetch an external DTD subset. This causes network access, which is slow or unavailable.

and I am not sure such reading systems actually exist. My guess is that reading systems have so far not cared about the DTD-s (certainly all RS-s that rely on one of the web rendering engines), and this whole issue came to the fore because epubcheck now reports it although it did not care about it before. In other words, I would expect a number of technically invalid EPUB publications to be in circulation that were never flagged as such and are only now reported as invalid.

In other words, it may well be that the practical deployment of EPUB content as well as RS-s were actually both ignoring a constraint in the document...

@murata2makoto
Contributor

@iherman

I am not sure such reading systems actually exist.

It may well exist. If my memory is correct, a key member of IDPF argued that the DOCTYPE statement for SVG must not be allowed for this reason. This happened during the development of EPUB3.

@llemeurfr

llemeurfr commented Oct 12, 2020 via email

@murata2makoto
Contributor

@llemeurfr

Sorry. That is irrelevant. Surely, valid SVG documents will be correctly handled by RSs that do not validate SVG. I'm saying that invalid SVG might be handled very differently by existing RSs.

@iherman
Member Author

iherman commented Oct 13, 2020

I have just realized today that epubcheck also refuses an SVG file (ie, an XML file) which starts with the standard

<?xml version="1.0" encoding="UTF-8"?>

Is that indeed forbidden by the standard? Or is it an epubcheck issue?

@dauwhe
Contributor

dauwhe commented Oct 13, 2020

I have just realized today that epubcheck also refuses an SVG file (ie, an XML file) which starts with the standard

<?xml version="1.0" encoding="UTF-8"?>

Is that indeed forbidden by the standard? Or is it an epubcheck issue?

What's the error message? This seems absurd.

@iherman
Member Author

iherman commented Oct 13, 2020

That it is invalid. But, if this is not an EPUB restriction then I should bring it to the epubcheck issues. I just wanted to check...

@dauwhe
Contributor

dauwhe commented Oct 13, 2020

That it is invalid. But, if this is not an EPUB restriction then I should bring it to the epubcheck issues. I just wanted to check...

How are you using the SVG? This SVG image validates fine in an EPUB (it's used as img src)

<?xml version="1.0" encoding="utf-8"?>
<!-- Generator: Adobe Illustrator 15.1.0, SVG Export Plug-In . SVG Version: 6.00 Build 0)  -->

<svg version="1.1" id="box" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px"
	 width="108px" height="108.391px" viewBox="0 0 108 108.391" enable-background="new 0 0 108 108.391" xml:space="preserve">
...

@mattgarrish
Member

Ya, I suspect you have some other issue. The SVG samples validate fine with xml declarations. Are you sure there isn't a curly quote or something in your actual source?

There's definitely nothing in the spec that forbids an xml declaration, although they're generally pointless because version 1.0 and UTF-8 encoding are defaults.

@murata2makoto
Contributor

BTW, non-validating XML processors might fetch external DTD subsets (and external parameter entities) for obtaining entity declarations (and telling which whitespace is significant). But I do not know how many such processors are used.

@iherman
Member Author

iherman commented Oct 14, 2020

Ya, I suspect you have some other issue. The SVG samples validate fine with xml declarations. Are you sure there isn't a curly quote or something in your actual source?

There's definitely nothing in the spec that forbids an xml declaration, although they're generally pointless because version 1.0 and UTF-8 encoding are defaults.

The error report of epubcheck is as follows:

FATAL(RSC-016): epub-overview-33.epub/cover_image.svg(2,6): Fatal Error while parsing file: The processing instruction target matching "[xX][mM][lL]" is not allowed.
ERROR(RSC-005): epub-overview-33.epub/cover_image.svg(-1,-1): Error while parsing file: The processing instruction target matching "[xX][mM][lL]" is not allowed.

It is a stand-alone SVG file, used as a cover. If I remove the processing instruction, the file is accepted.

I still wonder whether there is a problem somewhere in our spec that triggers this.

Cc: @rdeltour @danielweck

@mattgarrish
Member

I still wonder whether there is a problem somewhere in our spec that triggers this.

No, you're violating XML rules. If you put an xml declaration in a file, you can't have any whitespace before it (only a BOM is allowed before). Is there an extra hard return before your declaration by any chance?

For reference, see the prolog requirements in the XML spec.
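This is easy to reproduce with Python's Expat binding (a sketch; the error wording differs from epubcheck's): a declaration preceded by a UTF-8 BOM is fine, but even a single newline before it is a well-formedness error.

```python
import xml.parsers.expat as expat


def is_well_formed(data: bytes) -> bool:
    """Return True if Expat parses the bytes without a well-formedness error."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except expat.ExpatError:
        return False
    return True


DECL = b'<?xml version="1.0" encoding="UTF-8"?>'
SVG = b'<svg xmlns="http://www.w3.org/2000/svg" version="1.1"/>'
UTF8_BOM = b"\xef\xbb\xbf"

print(is_well_formed(DECL + SVG))             # True
print(is_well_formed(UTF8_BOM + DECL + SVG))  # True: only a BOM may precede it
print(is_well_formed(b"\n" + DECL + SVG))     # False: whitespace before the declaration
```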

@iherman
Member Author

iherman commented Oct 14, 2020

No, you're violating XML rules. If you put an xml declaration in a file, you can't have any whitespace before it (only a BOM is allowed before). Is there an extra hard return before your declaration by any chance?

Ouch! Touché :-) This was the problem. Sorry for the noise...

@rdeltour
Member

I suspect that this issue never came up in EPUB land so far because Reading Systems tacitly accepted such SVG files (as the core Web engines accepted them). What brought it to the fore is the much stricter newer version of epubcheck that rejects those files. (This started the whole discussion in w3c/epubcheck#1114 after all...)

I cc @rdeltour explicitly here, to call on the epubcheck implementers to give their opinion.

I don't think this is new in EPUBCheck (but I haven't double-checked with older versions). The conformance criterion dates back to before EPUB 3.2 and, if I remember correctly, was already implemented in EPUBCheck.

What EPUBCheck 4.2.x brought are schema changes, which are orthogonal to this specific issue.

@mattgarrish
Member

mattgarrish commented Oct 14, 2020

I agree with @rdeltour here. Nothing has changed in a long time on this front. External identifiers are forbidden and have been since 3.0, so it's always been an issue. There really isn't a lot of ambiguity about what that means -- you can't reference external files, which would contain the external entities that would make some uses valid.

It's not been much of an issue as most people don't use SVG, and HTML has dropped the external subset from its default doctype so the discussion isn't even relevant to that format.

But in the interest of putting this issue to rest, I mostly agree with @llemeurfr's comment in the other thread that we reformulate the restriction (but make the reading system requirement to ignore external subsets normative), so:

  • we add a requirement to ignore external subsets to the reading system specification
  • we change the authoring restriction to one against using entities declared in an external subset, because reading systems will not resolve such entities

That would allow a doctype with an external identifier, but disallow content that relies on such.
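A sketch of what "content that relies on such" means in practice, assuming Python's Expat binding with parameter-entity parsing switched off: a reference to an entity declared only in the unread external subset is skipped rather than expanded, while the same entity declared in the internal subset still expands. `book.dtd` and `publisher` are hypothetical names.

```python
import xml.parsers.expat as expat


def skipped_entities(data: bytes) -> list:
    """Return the general entities a non-fetching parser could not expand."""
    skipped = []
    parser = expat.ParserCreate()
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_NEVER)
    # Expat calls this for a reference to an undeclared entity when the
    # declaration might live in an external subset it has not read.
    parser.SkippedEntityHandler = lambda name, is_param: skipped.append(name)
    parser.Parse(data, True)
    return skipped


# "book.dtd" and the "publisher" entity are hypothetical.
RELIES_ON_EXTERNAL = b'<!DOCTYPE book SYSTEM "book.dtd"><book>&publisher;</book>'
DECLARED_INTERNALLY = (
    b'<!DOCTYPE book SYSTEM "book.dtd" [<!ENTITY publisher "ACME">]>'
    b"<book>&publisher;</book>"
)

print(skipped_entities(RELIES_ON_EXTERNAL))   # ['publisher']
print(skipped_entities(DECLARED_INTERNALLY))  # []
```

Without any DOCTYPE naming an external subset, Expat rejects the same undeclared reference as an error instead of skipping it.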

@iherman
Member Author

iherman commented Oct 15, 2020

@mattgarrish just to be sure I understand what you propose:

  • we add a requirement to ignore external subsets to the reading system specification

You mean we add an extra item in EPUB 3.3 RS saying that external subsets must simply be ignored for all XML content? (At this moment there is no counterpart for XML in that document.)

  • we change the authoring restriction to one against using entities declared in an external subset, because reading systems will not resolve such entities

When you say "change", you mean removing the second item in EPUB 3.3 Core, right?

That would allow a doctype with an external identifier, but disallow content that relies on such.

That sounds o.k. to me, but that does not answer #1338 (comment).

@mattgarrish
Member

That sounds o.k. to me, but that does not answer #1338 (comment).

I'm not following. Why does it matter if the DTD makes use of entities, and how is that problematic for authoring?

If the entities only exist for parsing the DTD, and reading systems don't validate, then is the issue just theoretical?

@mattgarrish
Member

mattgarrish commented Oct 25, 2020

Therefore a reading system does not "choose" to resolve entities or not.

I don't think we're that far apart. That's why I'm suggesting we only note this in a security considerations section without making normative statements that potentially aren't followable. I expect browser cores already do the right thing, but it is still a security consideration.

It's like banning external identifiers in authoring. It doesn't fully solve the problem, and any vendor could continue to ban these at ingestion if they really don't want them. But not using them is something we advise as a best practice (which could be part of where core media types go).

We're arguably making a lot out of an issue that's never been questioned as problematic. I'd be surprised if any appreciable number of publishers use/define entities in a DTD subset, let alone the external, and I doubt anyone will rush to begin if we remove the restriction.

@llemeurfr

From the start of this thread, it seems that two different notions are conflated:

  1. external ids in doctype declarations, which are not dereferenced in a non-validating XML parser and therefore not harmful for RSes.
  2. external ids in external entities, which cause problems for RSes even if non-validating.

Or maybe we conflate "external identifier" and "external entity". And also "document type declaration" and "document type definition".

The current constraint in the spec (External identifiers MUST NOT appear in the document type declaration) forbids both. A sentence like "External entities MUST NOT be defined in a document type definition" would only forbid the second notion, which would be OK (especially if we also state the RS should/must not validate content).
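To illustrate the distinction, a minimal sketch using Python's Expat binding that reports the two notions separately: the external identifier in the DOCTYPE (1) and external entities declared in the internal subset (2). The sample documents and entity names are hypothetical.

```python
import xml.parsers.expat as expat


def classify_dtd_usage(data: bytes) -> dict:
    """Separate a DOCTYPE external identifier from external entity declarations."""
    report = {"doctype_external_id": None, "external_entities": []}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        if system_id is not None or public_id is not None:
            report["doctype_external_id"] = (public_id, system_id)

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        # Internal entities carry a literal value; external ones carry a system ID.
        if system_id is not None:
            report["external_entities"].append(
                (name, bool(is_parameter_entity), system_id))

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(data, True)
    return report


SVG11 = b"""<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
  "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" version="1.1"/>"""

# Hypothetical document whose internal subset declares external entities.
WITH_EXTERNAL_ENTITIES = b"""<!DOCTYPE doc [
  <!ENTITY pub "ACME">
  <!ENTITY chap SYSTEM "chapter1.xml">
  <!ENTITY % mods SYSTEM "mods.ent">
]>
<doc/>"""

print(classify_dtd_usage(SVG11))
print(classify_dtd_usage(WITH_EXTERNAL_ENTITIES)["external_entities"])
# [('chap', False, 'chapter1.xml'), ('mods', True, 'mods.ent')]
```

The SVG 1.1 file only hits notion (1); the second document only hits notion (2), which is what the proposed sentence would target.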

@llemeurfr

llemeurfr commented Oct 25, 2020

I'd be surprised if any appreciable number of publishers use/define entities in a DTD subset

Just a note: I tried to include internal entities in the HTML doctype of a resource using Sigil and BlueGriffonEPUB: Sigil made a mess of the XHTML document at save time, and BlueGriffon made another type of mess. So I tend to agree with you: entities in content documents must be really rare.

@murata2makoto
Copy link
Contributor

murata2makoto commented Oct 25, 2020

@mattgarrish

I'm wondering if one way out of this particular problem is to move this to a reading system security section.

I strongly believe that requirements on data are always preferable, since data conformance is very enforceable (e.g., by epubcheck). Moving things to requirements on applications is always less satisfactory, although it is sometimes inevitable. We should first make clear what is allowed as an SVG or MathML document.

We're arguably making a lot out of an issue that's never been questioned as problematic. I'd be surprised if any appreciable number of publishers use/define entities in a DTD subset, let alone the external, and I doubt anyone will rush to begin if we remove the restriction.

Some publishers might if they have a reason. We shouldn't easily allow what is not needed. The currently proposed wording allows what is not needed.

@murata2makoto
Copy link
Contributor

@llemeurfr

From the start of this thread, it seems that two different notions are conflated:

  1. external ids in doctype declarations, which are not dereferenced in a non-validating XML parser and therefore not harmful for RSes.

External identifiers of external DTD subsets may well be dereferenced by a non-validating XML processor. This is one of my points. See Clause 5 of the XML Rec.

  2. external ids in external entities, which cause problems for RSes even if non-validating.

I do not see any reasons to allow such external (parsed) entities in the case of SVG.

Or maybe we conflate "external identifier" and "external entity". And also "document type declaration" and "document type definition".

Yes, some terms are not easy to use correctly. I checked the definitions of these terms for discussions around this issue.

The current constraint in the spec (External identifiers MUST NOT appear in the document type declaration) forbids both. A sentence like "External entities MUST NOT be defined in a document type definition" would only forbid the second notion, which would be OK (especially if we also state the RS should/must not validate content).

I would like to continue to disallow 2) in EPUB 3.3. I can live with small changes to 1) if they are needed to use the output of SVG editors.

@iherman
Member Author

iherman commented Oct 26, 2020

This thread is becoming quite confusing (for me at least), and I would love to see it converge somehow. Also, as was mentioned several times, many of the situations in the thread are hypothetical as far as current EPUB deployments are concerned, because very few of our authors explicitly make use of even remotely complex XML facilities; the only real practical issue seems to be that some tools dump DTDs into their output that make the content unusable. With the future push to move away from XML altogether (e.g., with SVG-in-HTML and/or SVG 2 coming to the fore, or if, either in this version of EPUB or the next one, we move towards HTML5), this issue may become increasingly moot. I.e., I am not sure we have to strive for something that is complete and exhaustive from an XML point of view…

I have seen some concrete proposals in the thread (since the WG call); can we try to see if we can keep to one of them?

  1. The WG resolution:
    1. Remove bullet “External identifiers MUST NOT appear in the document type declaration [XML].”
    2. Add reading system conformance saying that reading systems SHOULD NOT validate against DTDs.
  2. Modification of the resolution, by @murata2makoto
    1. Remove bullet “External identifiers MUST NOT appear in the document type declaration [XML].”
    2. Add reading system conformance saying that reading systems SHOULD NOT retrieve external DTD subsets or external parameter entities.
  3. Slight modification of (2) above by @shiestyle in a separate sub-thread
    1. Remove bullet “External identifiers MUST NOT appear in the document type declaration [XML].”
    2. Add reading system conformance saying that reading systems SHOULD NOT validate against DTDs and SHOULD NOT retrieve external DTD subsets or external parameter entities.
  4. Modification of the resolution, by @mattgarrish
    1. Remove bullet “External identifiers MUST NOT appear in the document type declaration [XML].”
    2. Not mention the issue in the normative part of reading system conformance
    3. Add some text to the security considerations of the spec drawing attention to the danger of using any kind of external DTD-s or external parameter entities. (Possibly referring to RFC 7303, as proposed by @murata2makoto in a separate comment.)

Did I miss/misinterpret something here? Would it be possible to see if we can converge towards one of these approaches and solve this issue?

@iherman
Member Author

iherman commented Oct 26, 2020

Notes on my previous comment

  • I think that, whatever we end up doing, we should mention the issue in the security section of the content spec, as proposed in (4)
  • Personally, and regardless of which formal resolution we take, I would also add an editorial note somewhere warning authors that using any kind of DTDs and/or XML entity definition is not recommended due to the uncertainty on whether the underlying rendering engine would handle it appropriately.

@murata2makoto
Contributor

@iherman

can we try to see if we can keep to one of them?

I don't think that we are ready to do so.

Also, as was mentioned several times, many of the situations in the thread are hypothetical as far as current EPUB deployments are concerned, because very few of our authors explicitly make use of even remotely complex XML facilities

The door has been closed in the past. But the tentative resolution opens a can of worms. I don't think that we should open the can and hope no worms will cause any trouble.

I think that we should divide this issue into smaller ones and close one at a time. My four issues (#1354, #1355, #1357, and #1358) are intended to solve easier subissues.

@mattgarrish mattgarrish added Spec-EPUB3 The issue affects the core EPUB 3.3 Recommendation and removed Topic-ContentDocs The issue affects EPUB content documents labels Oct 26, 2020
@iherman
Member Author

iherman commented Oct 26, 2020

I think that we should divide this issue into smaller ones and close one at a time. My four issues (#1354, #1355, #1357, and #1358) are intended to solve easier subissues.

I am not sure I agree, and I am afraid that this issue may get out of hand. Those issues try to define a class of XML files that are acceptable for EPUB, looking at variations of the possibilities with DTDs, entities, etc., that the XML specification offers. This is inherently a complex task. The question we have to ask ourselves is: who would really care? Is there really a user need behind these? All these possibilities around SVG would be unused by any SVG authoring tool I know, and I do not think that will change. The same would hold for SMIL or MathML, or other XML vocabularies we may care about. As I said above, with the future push to move away from XML altogether (e.g., with SVG-in-HTML and/or SVG 2 coming to the fore, or if, either in this version of EPUB or the next one, we move towards HTML5), this issue may become increasingly moot.

The door has been closed in the past. But the tentative resolution opens a can of worms. I don't think that we should open the can and hope no worms will cause any trouble.

I agree with your concern. So maybe we want to go back to the original issue and stop there. Here is a proposal:

The specification text should say

  • External identifiers MUST NOT appear in the document type declaration [XML], except for the identifiers defined explicitly for W3C specifications like SVG 1.1, MathML 1.1, SMIL, or SSML [ref]; these DTD-s should be ignored.

We may want to add the XHTML 1.1 DTD-s to the lot, too, but not necessarily.

That is it. We do not really open the door, we just acknowledge the reality for our usage of W3C Recommendations.

(We should be careful about the “Perfect is the enemy of good” effect…)
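One possible shape for such a controlled-list check, sketched with Python's Expat binding under stated assumptions: it matches on the public identifier only, and the allowlist is illustrative (the published SVG 1.1 and MathML 3.0 public identifiers), not an agreed list. A real checker would also have to decide how to treat SYSTEM-only doctypes and whether to compare system identifiers.

```python
import xml.parsers.expat as expat

# Illustrative allowlist only: the published public identifiers of the
# SVG 1.1 and MathML 3.0 DTDs, not a list agreed by the WG.
ALLOWED_PUBLIC_IDS = {
    "-//W3C//DTD SVG 1.1//EN",
    "-//W3C//DTD MathML 3.0//EN",
}


def doctype_allowed(data: bytes) -> bool:
    """True if the DOCTYPE has no external identifier or a listed public identifier."""
    allowed = True

    def on_doctype(name, system_id, public_id, has_internal_subset):
        nonlocal allowed
        if system_id is not None or public_id is not None:
            allowed = public_id in ALLOWED_PUBLIC_IDS

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)
    return allowed


SVG11 = b"""<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
  "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" version="1.1"/>"""
CUSTOM = b'<!DOCTYPE doc SYSTEM "custom.dtd"><doc/>'

print(doctype_allowed(SVG11))   # True
print(doctype_allowed(CUSTOM))  # False
```

Because the parser never loads the listed DTDs, "these DTD-s should be ignored" comes for free; the list only decides which DOCTYPEs authors may leave in place.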

@murata2makoto
Contributor

@iherman

My #1358 is a small step for achieving what you sketched at the end of your message.

@mattgarrish
Member

Is there really a user need behind these?

I find it doubtful. No offense, as I grew up on SGML, but we're partying like it's 1999 in this thread... :)

XSLT is the more usual language of choice these days if publishers want to insert boilerplate, repeated strings, etc.

Your proposal seems similar to where we were somewhere way back in this thread (although I'm not sure how we disagreed on the wording and I'm not going to try and locate why). External identifiers are disallowed except for a controlled list and reading systems ignore (that has to be separately in the RS spec, though).

Works for me.

@iherman
Member Author

iherman commented Oct 26, 2020

I have just realized that we should not forget to add the ONIX DTD (or DTDs?) to our controlled list:

<!DOCTYPE ONIXMessage SYSTEM "http://www.editeur.org/onix/2.1/02/reference/onix-international.dtd">

after all, it should be possible to add the ONIX metadata as part of the package...

@iherman
Member Author

iherman commented Oct 26, 2020

Your proposal seems similar to where we were somewhere way back in this thread

yep, we made a big circle...

@mattgarrish
Member

I have just realized that we should not forget to add the ONIX DTD (or DTDs?) to our controlled list:

Maybe no harm, but not sure how critical it is. My understanding is they're pushing to move everyone to 3.0 now and 3.0 records aren't supposed to be distributed with a doctype.

I don't believe ONIX records actually end up in EPUBs all that often, either.

@iherman
Member Author

iherman commented Oct 26, 2020

To be honest, I do not know. Something to be checked, possibly with Graham...

@clapierre
Contributor

ONIX files are out-of-band metadata that publishers send out separately from the EPUB. Often, aggregators and distributors get only the ONIX files to determine which EPUBs they want. Publishers can also send ONIX files with updates (say, pricing changes) without updating the EPUB itself.

@iherman
Member Author

iherman commented Oct 26, 2020

If we go down the road of listing some 'acceptable' XML DTD-s (see #1338 (comment)) then adding an ONIX DTD is painless. I realize that it does not happen often, but we should not make it impossible to add an ONIX file into the EPUB container...

@murata2makoto
Contributor

I just created #1369, #1370, #1371, and #1372. Hopefully, all sub-issues are easier.

@murata2makoto
Contributor

Created another issue (#1373) for ONIX.

@avneeshsingh

+1 to having a solution that also works for ONIX. Most of the time ONIX is external metadata, but during the ISO process for EPUB Accessibility we observed interest in also having ONIX accessibility metadata as internal metadata.

@murata2makoto
Contributor

murata2makoto commented Oct 28, 2020

This article is interesting. It describes what amounts to a constant, massive DDoS attack on the W3C web site.

@iherman
Member Author

iherman commented Nov 6, 2020

This issue was discussed in a meeting.

  • RESOLVED: Merge PR #1368 to address outstanding DTD issues, and close GH issues 1369-1373
Transcript:
Wendy Reid: we had resolutions at the F2F, and further discussions on GitHub
… and came to a happy place
Matt Garrish: #1368
Matt Garrish: where we ended up was…
… we put in an allowance for a specific set of external identifiers that we have put in an appendix
… we have SVG and MathML that are allowed to be used in content docs or in separate files
… and we made a restriction against external entities in the internal DTD subset
… so it prevents some security issues but eases authoring
… so we’ll no longer force people to remove SVG DTDs from tool-generated files
… I’m hoping this is it :)
Ivan Herman: tech comment
… in fact, the changes are such that
… makes possible something that I’m not sure we really use
… I can define as part of an internal entity something that won’t go out to the network
… I’m not sure if this feature is in use
… formal comment
… there was a formal resolution on the previous version; this PR slightly changes that
… can we get a formal resolution to merge, and also close a bunch of issues which were examples of the problem?
Proposed resolution: Merge PR #1368 to address outstanding DTD issues, and close GH issues 1369-1373 (Wendy Reid)
Garth Conboy: +1
Matt Garrish: +1
Ivan Herman: +1
Charles LaPierre: +1
Matthew Chan: +1
Wendy Reid: +1
Brady Duga: +1
George Kerscher: +1
Laura Brady: +1
Bill Kasdorf: +1
Ben Schroeter: +1
Resolution #1: Merge PR #1368 to address outstanding DTD issues, and close GH issues 1369-1373

@mattgarrish mattgarrish added the EPUB33 Issues addressed in the EPUB 3.3 revision label Nov 9, 2020
@mattgarrish mattgarrish added the Topic-XML The issue affects XML processing label Oct 4, 2022