HTML5/XHTML5 syntax requirement #87

Closed
GoogleCodeExporter opened this Issue Mar 24, 2015 · 10 comments

GoogleCodeExporter commented Mar 24, 2015

EPUB Content Documents 3.0, Draft at 15 Feb 2011, section 2.1.1:

    > An XHTML Content Document must meet all of the following 
    > criteria:
    > ...
    > It must meet the conformance constraints for XML documents
    > defined in XML Document Content Conformance [Publications30]
    >
    > It must use the XHTML syntax [HTML5].

One of the headline features of EPUB 3 is that it embraces HTML5. One of the 
headline features of HTML5 is that it offers a much more permissive syntax. The 
specification never makes clear (at least in my reading) why the strictness of 
XHTML is demanded, given the claim to embrace HTML5.

This is a small thing, but HTML5 parsers are now commonplace, so I can't 
foresee any great strain on Reading System developers. HTML5 treats MathML and 
SVG as more or less native, meaning XHTML is not required to deploy them as 
inline child documents.
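
To make the difference concrete, here is a minimal sketch in Python. It assumes the standard library's `xml.etree.ElementTree` as a stand-in for a strict XML parser and `html.parser` as a stand-in for a lenient one (it is not a spec-conformant HTML5 parser). The sample markup and the helper names `is_well_formed_xml`, `TagCollector` and `lenient_tags` are invented for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# HTML-syntax markup: unquoted attribute, unclosed <p>, <br> without a slash.
HTML_SYNTAX = "<p class=intro>First line<br>second line"

# The same content in XHTML syntax: quoted attribute, every element closed.
XHTML_SYNTAX = '<p class="intro">First line<br/>second line</p>'


def is_well_formed_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Lenient parser that records start tags instead of failing on errors."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def lenient_tags(markup):
    """Return the start tags a lenient parser sees in the markup."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


print(is_well_formed_xml(XHTML_SYNTAX))  # strict parser accepts this
print(is_well_formed_xml(HTML_SYNTAX))   # strict parser rejects this
print(lenient_tags(HTML_SYNTAX))         # lenient parser still finds the tags
```

The strict parser rejects the HTML-syntax sample outright, while the lenient one reads the same elements from both samples. That gap is the barrier to entry under discussion.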

The cost of this strictness is a much higher barrier to entry for ebook 
creators. I'm willing to be told there are benefits, but are they worth it? The 
regular implication in the spec that only an authoring tool can produce a valid 
EPUB strikes me as sad.

Original issue reported on code.google.com by joseph%i...@gtempaccount.com on 21 Feb 2011 at 12:10

GoogleCodeExporter commented Mar 24, 2015

I'm also curious whether this should be reconsidered in light of the (early 
look at the) results of the W3C TAG's HTML/XML Task Force, which may find that 
many XML environments could process HTML5 by using an HTML5 parsing front end 
that feeds its output into an XML processing model (these already exist).

See, for example: http://norman.walsh.name/2011/02/08/html-xml.

Original comment by abdela...@gmail.com on 21 Feb 2011 at 12:17

  • Added labels: Type-ReviewComment

GoogleCodeExporter commented Mar 24, 2015

XHTML strictness allows us to use ePub files for archiving purposes. Strictness 
also makes display/design issues easier to track down and is an appropriate 
standard of quality for book content, IMHO. Quality of eBooks is bad enough 
without going backwards on coding quality.

Original comment by jos...@ebookarchitects.com on 21 Feb 2011 at 1:10

GoogleCodeExporter commented Mar 24, 2015

I don't understand the "archiving purposes" point. You're welcome to be as 
strict in your own code as you like. Inflicting it on others reeks of 
sanctimony, however.

I would contend that the general quality of ebooks is bad because the barriers 
to entry are too high, not too low. Excellent web designers look at EPUB and 
decide it's not worth the effort.

Original comment by joseph%i...@gtempaccount.com on 21 Feb 2011 at 1:19

GoogleCodeExporter commented Mar 24, 2015

We should all keep in mind that the primary reason for the WG's decision to stay 
with the XML serialization in this version of EPUB was reading system forwards 
compatibility: our general design principle that deployed EPUB2 reading systems 
should be able to open EPUB3 content without failing [miserably]. (Note the 
intentional use of "open": we are not saying "be able to render the content 
100% according to the author's intent", as there are multiple new features in 
EPUB3 that EPUB2 reading systems won't grok, such as MathML embedded 
without ops:switch.)

While I don't possess complete statistics, we have been informed by WG members 
that a substantial number of deployed systems (such as those based on Adobe 
RMSDK IIRC) require XML well-formedness at this time.

(As for abdelazer's comment: note also that the existence of html-to-sax 
libraries such as the one by Henri Sivonen serves authoring tool implementors 
well straight away, as it allows authors to use the HTML syntax during 
authoring if they so wish: serializing to the XHTML syntax can be done late, at 
"compile time".)
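
As a rough illustration of that "compile time" step, here is a minimal sketch that reads HTML-syntax markup and writes it out as well-formed XML. It assumes Python's standard-library `html.parser`, which is lenient but does not implement the HTML5 tree-construction rules, so a real tool would use a spec-conformant HTML5 parser instead. The class `XhtmlBuilder`, the function `html_to_xhtml_fragment` and the `body` wrapper element are hypothetical names chosen for this example, and the XHTML namespace is left out for brevity.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Elements that never have content or an end tag in HTML syntax.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class XhtmlBuilder(HTMLParser):
    """Feed HTML-syntax markup in, get a well-formed element tree out."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.builder = ET.TreeBuilder()
        self.open_tags = []
        # Hypothetical wrapper so a fragment always has a single root.
        self.builder.start("body", {})

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. "hidden") get their own name as value.
        attributes = {name: (value if value is not None else name)
                      for name, value in attrs}
        self.builder.start(tag, attributes)
        if tag in VOID_ELEMENTS:
            self.builder.end(tag)  # close void elements immediately
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or tag not in self.open_tags:
            return  # stray end tag: ignore it
        # Close any elements left open inside the one being closed.
        while self.open_tags:
            open_tag = self.open_tags.pop()
            self.builder.end(open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.builder.data(data)

    def finish(self):
        self.close()  # flush any buffered text
        while self.open_tags:  # close whatever the author left open
            self.builder.end(self.open_tags.pop())
        self.builder.end("body")
        return self.builder.close()


def html_to_xhtml_fragment(markup):
    """Serialize HTML-syntax markup as a well-formed XML string."""
    converter = XhtmlBuilder()
    converter.feed(markup)
    return ET.tostring(converter.finish(), encoding="unicode")


print(html_to_xhtml_fragment("<p class=intro>First line<br>second line"))
```

The output can be read back by any strict XML parser, so the author can write in HTML syntax and the tool can emit XHTML syntax at the very end. Note that the recovery here is naive: an unclosed `<li>` followed by another `<li>` is nested rather than closed, which is not what an HTML5 parser would do.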


Original comment by markus.g...@gmail.com on 21 Feb 2011 at 10:46

GoogleCodeExporter commented Mar 24, 2015

Oh, I wasn't aware that forward compatibility was a design principle.

What are the implications of this principle for this line (from OPF 2.0.1 
s2.4.1.2):

    OPS Publications must include an NCX.

Admittedly that's somewhat off-topic. I can raise this as a separate issue if 
you prefer.

Original comment by josephpe...@gmail.com on 21 Feb 2011 at 10:07

GoogleCodeExporter commented Mar 24, 2015

>OPS Publications must include an NCX.
>Admittedly that's somewhat off-topic. I can raise this 
>as a separate issue if you prefer.

Fortunately, this is a different case since so far - and as opposed to the HTML 
serialization issue - no manufacturer of a widely deployed reading system has 
stepped forward to say that the absence of the NCX would cause their 2.0 
systems to fail catastrophically. If and only if somebody does step forward 
with such information will we have to revisit the nature of the NCX-to-XHTML 
Navigation Document migration as it stands in the current spec draft.

(Note also that in 2.0.1 support for NCX in reading systems was optional 
(although content had to contain it); and not all reading systems implemented 
it. This changes in 3.0: XHTML Navigation Content Documents must be provided in 
content, and must be supported by reading systems.)

If you believe that support for text/html in content is important (and note 
again, there are obviously no constraints on what serialization is used at 
authoring time) it might be more fruitful to propose that the spec include a 
forward-looking statement (along the lines of "future major revisions may also 
allow the html serialization"). Statements like that have occurred before (such 
as the one about nav support becoming required for reading systems [1]). 
Whether or not the WG will support such a statement being added I cannot tell, 
but there's nothing stopping it from being proposed.

[1] para 5 in http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.4.1.2

Original comment by markus.g...@gmail.com on 23 Feb 2011 at 2:10

GoogleCodeExporter commented Mar 24, 2015

Yes, I think EPUB3 should maintain the requirement for the XHTML
serialization of HTML.

In the broader web, there are significant legacy requirements for
the non-XML syntax; that is not the case for EPUB. Further, while
it may eventually be the case that error-correcting HTML5 parsers
are as cheap and common as XML parsers, that is not the case
today.

I think removing this restriction would impose a significant and
unnecessary burden on EPUB consumers.

Original comment by normanwalsh on 23 Feb 2011 at 3:15

GoogleCodeExporter commented Mar 24, 2015



This idea, that EPUBs are crafted by saintly scribes in high places of 
knowledge far from the hoi polloi of the anything-goes web, is one of the 
more pervasive and wrongheaded beliefs to affect this exercise.

To get good designers, let their expertise be transferable. To encourage 
experimentation by new entrants, push for alignment with the various tutorials 
and snippets of instruction a Google search or an O'Reilly book teaches. You 
shouldn't continue to enshrine an arcane and steadily abandoned set of markup 
rules just because you fear the wrath of one Mr Sorotokin.

Forward compatibility is an inhibiting and complicating goal. But EPUB 2 
support in Reading Systems isn't going away. Summon the nerve to stick up for 
content authors.

Original comment by joseph%i...@gtempaccount.com on 23 Feb 2011 at 7:56

GoogleCodeExporter commented Mar 24, 2015

Original comment by markus.g...@gmail.com on 18 Mar 2011 at 10:18

  • Changed state: Acknowledged

GoogleCodeExporter commented Mar 24, 2015

Closing this issue; the WG is not revisiting its decision to stay with the 
XHTML syntax in this version of EPUB.

Original comment by markus.g...@gmail.com on 26 Apr 2011 at 7:22

  • Changed state: Answered