WebIDL update #285

iherman · 2018-07-30T12:58:21Z

My first attempt to bring the WebIDL part in sync with the current manifest. Some of the significant changes (some of which may be subject to further discussion):

The previous IDL referred to "title" as an array of strings, which was correct. I have made a slight update on the manifest definition emphasizing the fact that the 'name' value may be an array, too.
The previous IDL included a "children" tag, which did not have a direct counterpart in the manifest. I believe the main reason for its presence was to handle hierarchical TOC-s. However, the manifest as of today extracts/contains a link to a TOC, itself encoded in HTML, and not the list of resources themselves. Of course, a user agent may want to extract those (or may choose to simply display the HTML) but I am not sure it is a matter of the IDL to have this.
I was wondering about having a single "Contributors" dictionary item for all the authors, editors, etc., with an extra flag providing the categorization, but I thought that spelling things out, i.e., reflecting what is in the manifest, is cleaner.

There are also some other, minor changes, mostly editorial.

See also #284


llemeurfr · 2018-07-30T13:05:27Z

@iherman re. title, an array of arrays is a serious modification of the model. What makes you think that this is needed?

The use cases I know of are:

alternative languages
title + subtitle
alternate wording for search.

title is not the same as e.g. contributor, where each item is a different entity.

iherman · 2018-07-30T13:09:35Z

> @iherman re. title, an array of arrays is a serious modification of the model. What makes you think that this is needed?

I mean "array of strings", not array of arrays... To be clear, it should be possible to say:

"name" : ["War and Peace", "Guerre et Paix"]

HadrienGardeur · 2018-07-31T17:43:27Z

webidl/manifest.webidl

+             sequence<PublicationLink>          resources;
+             sequence<PublicationLink>          links;
+
+             DOMString                          accessibilityReport;


Why do we need a specific string for that? It should be already covered by links/resources.

HadrienGardeur · 2018-07-31T17:44:07Z

webidl/manifest.webidl

+             sequence<PublicationLink>          links;
+
+             DOMString                          accessibilityReport;
+             DOMString                           privacyPolicy;


Same comment, should already be covered by links/resources.

HadrienGardeur · 2018-07-31T17:44:13Z

webidl/manifest.webidl

+
+             DOMString                          accessibilityReport;
+             DOMString                           privacyPolicy;
+             sequence<DOMString>                cover;


Same comment, should already be covered by links/resources.

HadrienGardeur · 2018-07-31T17:46:15Z

webidl/publication_link.webidl

-             DOMString                  encodingFormat;
-             DOMString                  name;
-             sequence<DOMString>        rel;
-             sequence<PublicationLink>  children;


A UA will need to parse the HTML containing the ToC to extract it. I think that there's no reason to use a separate structure for handling the ToC when the WebIDL can play that role as well.

HadrienGardeur · 2018-07-31T17:46:55Z

webidl/publication_link.webidl

-             sequence<PublicationLink>  children;
+    required DOMString           url;
+             DOMString           encodingFormat;
+             DOMString           name;


This means that name won't be localizable.

Changed in aefe40f

HadrienGardeur · 2018-07-31T17:49:33Z

webidl/manifest.webidl

+             DOMString                          accessibilityAPI;
+             DOMString                          accessibilityControl;
+             DOMString                          accessibilityFeature;
+             DOMString                          accessibilityHazard;


This is expressed using an array of strings in the example available at schema.org: https://schema.org/accessibilityFeature

Changed in aefe40f

HadrienGardeur · 2018-07-31T17:50:41Z

webidl/manifest.webidl

+    required DOMString                          url;
+    required DOMString                          type;
+
+             DOMString                          accessMode;


This is expressed using an array of strings in the example available at schema.org: https://schema.org/accessMode

Changed in aefe40f

HadrienGardeur · 2018-07-31T17:50:48Z

webidl/manifest.webidl

+    required DOMString                          type;
+
+             DOMString                          accessMode;
+             DOMString                          accessModeSufficient;


This is expressed using an array of strings in the example available at schema.org: https://schema.org/accessMode

Changed in aefe40f

HadrienGardeur · 2018-07-31T17:51:41Z

webidl/organization.webidl

+
+dictionary Organization {
+    required sequence<LocalizableString> name;
+             DOMString                   id;


We need an example for this in the spec.

Added in aefe40f

HadrienGardeur · 2018-07-31T17:51:47Z

webidl/person.webidl

+
+dictionary Person {
+    required sequence<LocalizableString> name;
+             DOMString                   id;


We need an example for this in the spec.

Added in aefe40f

HadrienGardeur · 2018-07-31T17:54:14Z

"name" : ["War and Peace", "Guerre et Paix"]

@iherman I find this example quite confusing and would recommend that we limit name to:

string
object
array of objects

Whenever an object is used, both @value and @language should be required.

iherman · 2018-08-01T05:19:27Z

@HadrienGardeur the example in #285 (comment) is indeed confusing, or more exactly it would reflect bad authoring practice. If an author wants to add a multilingual title, then a more correct example would have been:

"name": [
   "War and Peace",
   {
        "@value": "Guerre et Paix",
        "@language": "fr"
    }]

(Provided English is set somewhere as a default language.)

But that example was just to illustrate one point for @llemeurfr, namely that it does make sense to have an array as a value for name. These details were not really relevant...

That being said, on your comment on the comment, I believe

"name" : { "@value" : "War and Peace" }

should be acceptable, too. This is perfectly valid JSON-LD, and it is accepted by schema.org processors, too; furthermore, all strings are converted into this idiom by the JSON-LD expansion algorithm that may be used along the way. We cannot invalidate that. We can recommend using either a string or an object with a language, but we should not require the presence of a language.
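The shapes discussed here (a plain string, an object with `@value` and an optional `@language`, or an array mixing both) can all be folded into one list of localizable strings. A minimal sketch in Python, assuming a hypothetical helper `normalize_name` and a caller-supplied default language; neither is defined by the manifest or by the WebIDL:

```python
from typing import Any, Dict, List, Optional


def normalize_name(
    value: Any, default_language: Optional[str] = None
) -> List[Dict[str, Optional[str]]]:
    """Fold a manifest `name` value into a list of {"value", "language"} dicts.

    Hypothetical helper, for illustration only.
    """
    # A single string or object is treated as a one-item array.
    items = value if isinstance(value, list) else [value]
    result: List[Dict[str, Optional[str]]] = []
    for item in items:
        if isinstance(item, str):
            # Plain strings inherit the default language, if any.
            result.append({"value": item, "language": default_language})
        elif isinstance(item, dict) and "@value" in item:
            # "@language" is optional; fall back to the default when absent.
            result.append(
                {
                    "value": item["@value"],
                    "language": item.get("@language", default_language),
                }
            )
        else:
            raise ValueError(f"unsupported name item: {item!r}")
    return result
```

With this reading, `{"@value": "War and Peace"}` and `"War and Peace"` normalize to the same entry, which is the equivalence argued for above.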

(B.t.w., just for the good order, if we want to discuss this further, we should move it into a separate issue. This is not an issue that should validate or invalidate the WebIDL mapping, which is the issue we are discussing here.)

llemeurfr · 2018-08-01T07:07:14Z

@iherman we are touching there again on a very important issue that cannot be discussed in this PR, I agree: are we creating a sort of "semantic metadata framework", in which authors can generate flexible data given their intimate knowledge of JSON-LD? Or are we creating an interchange format as a JSON structure, that authors and UAs must closely follow, validated by a JSON schema, but which "by design" is JSON-LD compliant?

I'll open a specific issue for that.

HadrienGardeur · 2018-08-01T09:51:20Z

@iherman I just saw your latest commit. For localizable strings, I think that we actually need a sequence of them since a string can be localized in multiple languages.

iherman · 2018-08-01T09:52:55Z

@HadrienGardeur

I have made the changes except for the introduction of accessibilityReport, privacyPolicy, cover, and toc.

You are absolutely right that those are extracted from resources or links, and that is how the manifest is defined. However, as the infoset calls this out explicitly, I felt that the WebIDL should reflect, as much as possible, the infoset, too, and that is why I included them there. I do not feel very strongly about it, but I think that it makes the relationship to the infoset more explicit.

See what others feel about this...

HadrienGardeur · 2018-08-01T10:19:41Z

@iherman even if we keep separate IDL elements for cover, privacyPolicy and accessibilityReport, these can't be represented as a string or a sequence of strings.

Since we would be duplicating information from links, resources or readingOrder, we should at least be consistent with how they're expressed and use a PublicationLink or a sequence of PublicationLink for those.

I think that the toc is unique, in the sense that the UA will have to fetch and parse HTML to properly populate this info. I know that you have your doubts about that, but this is definitely something that every UA will need to do and I don't see any good reason why it can't be included in the same IDL.

iherman · 2018-08-01T11:58:58Z

> Since we would be duplicating information from links, resources or readingOrder, we should at least be consistent with how they're expressed and use a PublicationLink or a sequence of PublicationLink for those.

You are right. That is a bug; this is necessary for things like using the description of a cover image as an alternate text. I will take care of this soon.
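If these members are kept as `PublicationLink` values, they can be derived by filtering `links`, `resources`, and `readingOrder` on `rel`, with the description still attached. A rough sketch, assuming the `rel` values `cover` and `privacy-policy` and a `description` member on the link object (these names are assumptions here, not taken from this PR), and a hypothetical helper `links_with_rel`:

```python
from typing import Any, Dict, List


def links_with_rel(manifest: Dict[str, Any], rel: str) -> List[Dict[str, Any]]:
    """Collect link objects whose `rel` includes the requested value."""
    found: List[Dict[str, Any]] = []
    for member in ("readingOrder", "resources", "links"):
        for link in manifest.get(member, []):
            rels = link.get("rel", [])
            if isinstance(rels, str):  # tolerate a single string as well as a list
                rels = [rels]
            if rel in rels:
                found.append(link)
    return found


# Hypothetical manifest fragment, for illustration only.
manifest = {
    "resources": [
        {"url": "cover.jpg", "rel": ["cover"], "description": "A battle scene"},
        {"url": "style.css"},
    ],
    "links": [{"url": "privacy.html", "rel": "privacy-policy"}],
}

cover = links_with_rel(manifest, "cover")
# The whole link object is returned, so its description can serve as alternate text.
alt_text = cover[0].get("description", "") if cover else ""
```

A bare string or sequence of strings would only carry the URL, which is why the description would be lost.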

The problem with toc: we then have to spec exactly how the structure of the toc should be expressed in HTML, mainly when it comes to hierarchical toc-s.

HadrienGardeur · 2018-08-01T13:15:10Z

> The problem with toc: we then have to spec exactly how the structure of the toc should be expressed in HTML, mainly when it comes to hierarchical toc-s.

@iherman this is IMO unavoidable.

llemeurfr · 2018-08-01T13:18:34Z

@iherman, if not, UAs won't be able to do anything interesting with the HTML ToC.
The alternative we proposed was a predefined machine readable json ToC but it was dismissed by the group.

TzviyaSiegman · 2018-08-01T15:57:28Z

We could consider restricting the way that the HTML ToC can be written, as is done in EPUB. See https://w3c.github.io/publ-epub-revision/epub32/spec/epub-packages.html#sec-package-nav-def.

iherman · 2018-08-01T17:11:49Z

@TzviyaSiegman @HadrienGardeur that works for me, if the community is (and has been) fine with it.

@mattgarrish will have a pleasure reproducing his own text in the new draft;-)

Back to the original item for this PR, I guess the WebIDL would have to be modified by:

re-introducing the children reference in the PublicationLink
defining the TOC as a sequence of PublicationLinks.

mattgarrish · 2018-08-01T17:27:06Z

Funny, I was just reading the nav restrictions again while reviewing 3.2. There have to be ways we can loosen the restrictions and still have parseable data (e.g., ol or ul as the list type, even if semantic purity is lost, treating text outside of a tags as decorative and ignorable, etc.).
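One way to read these loosened rules: accept either `ol` or `ul` as the list container, take each `a` element as an entry, and ignore any text outside `a` elements. A minimal sketch using Python's standard `html.parser`, assuming a hypothetical `TocParser` class and an output of nested `{"name", "url", "children"}` dicts that mirrors a `PublicationLink` with `children`. It is not an algorithm defined by the group, and it does not check for the `doc-toc` role:

```python
from html.parser import HTMLParser
from typing import Any, Dict, List, Optional


class TocParser(HTMLParser):
    """Hypothetical sketch: build a nested ToC from HTML lists of links."""

    def __init__(self) -> None:
        super().__init__()
        self.toc: List[Dict[str, Any]] = []
        # One child list is pushed per open <ol>/<ul>.
        self._lists: List[List[Dict[str, Any]]] = []
        # Entry for the <a> element currently being read, if any.
        self._current: Optional[Dict[str, Any]] = None

    def handle_starttag(self, tag, attrs):
        if tag in ("ol", "ul"):
            if not self._lists:
                self._lists.append(self.toc)
            elif self._lists[-1]:
                # A nested list hangs off the most recent entry one level up.
                self._lists.append(self._lists[-1][-1]["children"])
            else:
                # Nested list with no preceding entry: stay at the same level.
                self._lists.append(self._lists[-1])
        elif tag == "a" and self._lists:
            self._current = {"name": "", "url": dict(attrs).get("href"), "children": []}

    def handle_endtag(self, tag):
        if tag in ("ol", "ul") and self._lists:
            self._lists.pop()
        elif tag == "a" and self._current is not None:
            if self._lists:
                # Collapse whitespace in the accumulated link text.
                self._current["name"] = " ".join(self._current["name"].split())
                self._lists[-1].append(self._current)
            self._current = None

    def handle_data(self, data):
        # Text outside <a> elements is treated as decorative and dropped.
        if self._current is not None:
            self._current["name"] += data


def parse_toc(html: str) -> List[Dict[str, Any]]:
    parser = TocParser()
    parser.feed(html)
    parser.close()
    return parser.toc
```

Because the list type is never inspected beyond "is it a list", authors keep the freedom to choose `ol` or `ul`, while the user agent still gets a predictable tree.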

TzviyaSiegman · 2018-08-01T17:40:39Z

> There have to be ways we can loosen the restrictions and still have parseable data (e.g., ol or ul as the list type, even if semantic purity is lost, treating text outside of a tags as decorative and ignorable, etc.).

This seems like a good idea to me. Provide direction to machines, not restrictions to humans.

dauwhe · 2018-08-01T17:43:33Z

If we start profiling HTML in web publications, we will likely alienate the browser community as well as massively confuse authors.

danielweck · 2018-08-01T17:50:19Z

@iherman

"The problem with toc: we then have to spec exactly how the structure of the toc should be expressed in HTML, mainly when it comes to hierarchical toc-s."

@HadrienGardeur

"this is IMO unavoidable."

@llemeurfr

"if not, UAs won't be able to do anything interesting with the HTML ToC. The alternative we proposed was a predefined machine readable json ToC but it was dismissed by the group."

@TzviyaSiegman

"We could consider restricting the way that the HTML ToC can be written as is done in EPUB."

We've been down that rabbit hole before :)

I do not know how widespread this criticism of the EPUB3 Navigation Document is, but here is what I have heard on numerous occasions:

The fact that the NavDoc is a "microdata" model restriction on what would otherwise be totally arbitrary HTML markup has forced (some) content creators to:

(1) exclude the NavDoc from the spine reading order / list of render-able documents (i.e. linear or non-linear resources), effectively treating the mandatory unique NavDoc HTML only as a machine-readable data source.
(2) author a separate HTML document (unfortunately not marked with any sort of "NavDoc" metadata, so not reliably discoverable by reading systems) in order to gain full markup flexibility and to provide fancy layout / styling (e.g. SVG instead of HTML lists).

This simply reflects the fact that (some) content creators prefer, at the cost of some degree of duplication, a clear distinction between:

optimized, machine-readable data structure designed to be ingested by a reading system / user agent, with the expectation that it will be rendered in an arbitrary, application-specific fashion.
authored web document (custom structure and semantics, desired styling), with the expectation that it will be faithfully rendered by a web browser (or a web browser engine / webview component inside a reading system).

This dichotomy could be expressed in Web Publications by having a separate machine-readable Table Of Contents in the JSON serialization of the manifest infoset, but we excluded this option (if I remember correctly, because of duplication, and also due to HTML offering better string localization than JSON).

I am not suggesting we should revisit this decision, just making observations about how the restricted EPUB3 NavDoc markup syntax was received in the "real world".

TzviyaSiegman · 2018-08-01T18:24:37Z

@danielweck Agreed. I do think we need to look at the way users approach the ToC, not the way authors and publishers believe that users approach the ToC.

I have not done formal testing, but as a user I always use the machine-rendered ToC because that is what is accessible from all locations in the publication (the always-present hamburger button). I think @mattgarrish's proposal of instructing UAs to disregard unneeded information is a good idea.

mattgarrish · 2018-08-01T23:16:21Z

Is it correct that the table of contents pointed to from the manifest doesn't have to be visible in the page?

This would cover the case of wanting to provide a table of contents but not wanting it explicitly in the reading order (probably rare, but EPUB doesn't require navigation in spine).

More to the point, though, it could also potentially address Daniel's concern of allowing two tables of contents if whatever parsing algorithm we come up with wouldn't produce a desirable result. I'm hoping we can avoid duplication, but having a plan for the inevitable is also a wise idea.

iherman · 2018-08-02T04:58:06Z

(Purely administrative comment)

It seems that the TOC issue is much more complicated, and I am afraid of a PR staying hanging and becoming problematic to merge if other changes come to the fore. May I suggest:

In the case the refreshed WebIDL is acceptable for everyone involved except for the TOC item (which is, currently, just a simple pointer to a TOC somewhere else), let us merge what we have
We open a separate issue on the details of the TOC, knowing that a decision may affect the WebIDL later

@HadrienGardeur @mattgarrish @TzviyaSiegman @llemeurfr ?

laudrain · 2018-08-02T07:06:32Z

May I recall here the Toronto F2F resolution on TOC:

TOC should be present

if present, it MUST be an element with doc-toc role pointed to from manifest

(likely) element in landing page

could be elsewhere

In publishing "real world" we need both options Daniel described

> a clear distinction between:
>
> optimized, machine-readable data structure designed to be ingested by a reading system / user agent, with the expectation that it will be rendered in an arbitrary, application-specific fashion.
>
> authored web document (custom structure and semantics, desired styling), with the expectation that it will be faithfully rendered by a web browser (or a web browser engine / webview component inside a reading system).

If we can have in the same HTML document, a proper structure and styling for TOC AND a data structure for a11y with doc-toc role attribute, it could avoid duplication.
But I have no problem with duplication to achieve these 2 functions and will not give up on any of the 2.

mattgarrish · 2018-08-02T21:17:19Z

> But I have no problem with duplication to achieve these 2 functions and will not give up on any of the 2.

I don't think anything prevents you from creating as many tables of contents as your heart desires. You just point the manifest to whatever one is best for machine processing.

My one concern is that this probably forces a separate document unless both the machine and human versions can exist in the same file without the user encountering both, which is why I wonder if one can be suppressed visually without affecting its parsing (I'm not sure why not, but am just curious).

mattgarrish · 2018-08-02T21:18:31Z

But how did we get on to this in a PR? Where are we in terms of integrating the changes?

iherman · 2018-08-03T04:56:39Z

@mattgarrish

> But how did we get on to this in a PR? Where are we in terms of integrating the changes?

At the moment, the WebIDL entry for a TOC is simply a reference to the TOC, wherever it is. If we had a clear specification of the TOC in terms of HTML, this could be replaced by some JSON representation of the TOC instead.

Hence my proposal to merge this PR as is (unless there are other issues; at this moment, there are none left) and move this discussion to a specific issue.

laudrain · 2018-08-03T06:51:40Z

@mattgarrish The Toronto F2F resolution made me hope that it could be only one document, particularly this sentence:

> if present, it MUST be an element with doc-toc role pointed to from manifest

So I don't think there is an issue with this. I agree with @iherman to merge this PR as is.

iherman · 2018-08-03T10:18:30Z

@mattgarrish are you o.k. merging?

mattgarrish · 2018-08-03T10:59:12Z

Yes, I'm fine with this PR as far as it represents where we are right now. We can re-litigate the toc elsewhere.

iherman added 2 commits July 30, 2018 14:24

First version updating the WebIDL-s

eaabbea

Ready for PR

da35c1e

iherman requested review from HadrienGardeur and mattgarrish July 30, 2018 12:58

HadrienGardeur suggested changes Jul 31, 2018


llemeurfr mentioned this pull request Aug 1, 2018

Manifest: are we creating a JSON-LD metadata framework or a JSON structure? #288

Closed

Handling some of the issues raised by @HadrienGardeur

aefe40f

Minor formatting error

bc9fc8f

changed the type of accessibility report, privacy policy, and covers

bcf7927

HadrienGardeur approved these changes Aug 3, 2018


iherman mentioned this pull request Aug 3, 2018

Do we need a more detailed definition for the HTML TOC format? #291

Closed

mattgarrish merged commit b276a48 into master Aug 3, 2018

mattgarrish deleted the webidl-update branch August 3, 2018 10:59
