WebIDL update #285

Merged
merged 5 commits into from
Aug 3, 2018

Conversation

iherman
Member

@iherman iherman commented Jul 30, 2018

My first attempt to bring the WebIDL part in sync with the current manifest. Some of the significant changes (some of which may be subject to further discussion):

  • The previous IDL referred to "title" as an array of strings, which was correct. I have made a slight update on the manifest definition emphasizing the fact that the 'name' value may be an array, too.
  • The previous IDL included a "children" tag, which did not have a direct counterpart in the manifest. I believe the main reason for its presence was to handle hierarchical TOC-s. However, the manifest as of today extracts/contains a link to a TOC, itself encoded in HTML, and not the list of resources themselves. Of course, a user agent may want to extract those (or may choose to simply display the HTML) but I am not sure it is a matter of the IDL to have this.
  • I was wondering about having a single "Contributors" dictionary item for all the authors, editors, etc., with an extra flag providing the categorization, but I thought spelling things out, i.e., reflecting what is in the manifest, is cleaner...

There are also some other, minor changes, mostly editorial.

See also #284



@llemeurfr
Contributor

@iherman re. title, an array of arrays is a serious modification of the model. What makes you think that this is needed?

The use cases I know of are:

  • alternative languages
  • title + subtitle
  • alternate wording for search.

title is not the same as e.g. contributor, where each item is a different entity.

@iherman
Member Author

iherman commented Jul 30, 2018

@iherman re. title, an array of arrays is a serious modification of the model. What makes you think that this is needed?

I mean "array of strings", not array of arrays... To be clear, it should be possible to say:

"name" : ["War and Peace", "Guerre et Paix"]

sequence<PublicationLink> resources;
sequence<PublicationLink> links;

DOMString accessibilityReport;


Why do we need a specific string for that? It should be already covered by links/resources.

sequence<PublicationLink> links;

DOMString accessibilityReport;
DOMString privacyPolicy;


Same comment, should already be covered by links/resources.


DOMString accessibilityReport;
DOMString privacyPolicy;
sequence<DOMString> cover;


Same comment, should already be covered by links/resources.

DOMString encodingFormat;
DOMString name;
sequence<DOMString> rel;
sequence<PublicationLink> children;


A UA will need to parse the HTML containing the ToC to extract it. I think that there's no reason to use a separate structure for handling the ToC when the WebIDL can play that role as well.

sequence<PublicationLink> children;
required DOMString url;
DOMString encodingFormat;
DOMString name;


This means that name won't be localizable.

Member Author

Changed in aefe40f

DOMString accessibilityAPI;
DOMString accessibilityControl;
DOMString accessibilityFeature;
DOMString accessibilityHazard;


This is expressed using an array of strings on the example available at schema.org: https://schema.org/accessibilityFeature

Member Author

Changed in aefe40f

required DOMString url;
required DOMString type;

DOMString accessMode;


This is expressed using an array of strings on the example available at schema.org: https://schema.org/accessMode

Member Author

Changed in aefe40f

required DOMString type;

DOMString accessMode;
DOMString accessModeSufficient;


This is expressed using an array of strings on the example available at schema.org: https://schema.org/accessMode

Member Author

Changed in aefe40f


dictionary Organization {
required sequence<LocalizableString> name;
DOMString id;


We need an example for this in the spec.

Member Author

Added in aefe40f


dictionary Person {
required sequence<LocalizableString> name;
DOMString id;


We need an example for this in the spec.

Member Author

Added in aefe40f

@HadrienGardeur

"name" : ["War and Peace", "Guerre et Paix"]

@iherman I find this example quite confusing and would recommend that we limit name to:

  • string
  • object
  • array of object

Whenever an object is used, both @value and @language should be required.

@iherman
Member Author

iherman commented Aug 1, 2018

@HadrienGardeur the example in #285 (comment) is indeed confusing, or more exactly it would reflect bad authoring practice. If an author wants to add a multilingual title, then a more correct example would have been:

"name": [
   "War and Peace",
   {
        "@value": "Guerre et Paix",
        "@language": "fr"
    }]

(Provided English is set somewhere as a default language.)

But that example was just to illustrate one point for @llemeurfr, namely that it does make sense to have an array as a value for name. These details were not really relevant...


That being said, on your comment on the comment, I believe

"name" : { "@value" : "War and Peace" }

should be acceptable, too. This is perfectly valid JSON-LD, and is actually accepted by schema.org processors, too; furthermore, all strings are converted into this idiom by the JSON-LD expansion algorithm that may be used along the line. We cannot invalidate that. We can recommend using either a string or an object with a language, but we should not require the presence of a language.

(B.t.w., just for the good order, if we want to discuss this further, we should move it into a separate issue. This is not an issue that should validate or invalidate the WebIDL mapping, which is the issue we are discussing here.)
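
To make the shapes under discussion concrete, here is a minimal TypeScript sketch of how a user agent might fold the accepted forms of "name" (plain string, value object with or without a language, or an array of either) into one sequence of localizable strings. The `LocalizableString` shape, the `normalizeName` function, and the `defaultLanguage` parameter are hypothetical names chosen for illustration; they are not taken from the PR or the spec.

```typescript
// Hypothetical shape for illustration; the PR's actual LocalizableString
// dictionary may differ.
interface LocalizableString {
  value: string;
  language: string | null; // null when no language is known
}

// The forms discussed above: a plain string or a JSON-LD value object,
// either alone or mixed in an array.
type NameEntry = string | { "@value": string; "@language"?: string };
type NameValue = NameEntry | NameEntry[];

// Normalize any accepted form of "name" into a sequence of localizable
// strings. defaultLanguage stands in for a language set elsewhere in the
// manifest (an assumption of this sketch).
function normalizeName(
  name: NameValue,
  defaultLanguage: string | null = null,
): LocalizableString[] {
  const entries = Array.isArray(name) ? name : [name];
  return entries.map((entry) => {
    if (typeof entry === "string") {
      return { value: entry, language: defaultLanguage };
    }
    // A value object without "@language" is accepted, not rejected.
    return {
      value: entry["@value"],
      language: entry["@language"] ?? defaultLanguage,
    };
  });
}

const title = normalizeName(
  ["War and Peace", { "@value": "Guerre et Paix", "@language": "fr" }],
  "en",
);
const bare = normalizeName({ "@value": "War and Peace" });
```

Under this sketch the object form without a language stays valid and simply ends up with no language attached, which matches the position that a language can be recommended but not required.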

@llemeurfr
Contributor

@iherman we are touching here again on a very important issue that cannot be discussed in this PR, I agree: are we creating a sort of "semantic metadata framework", in which authors can generate flexible data given their intimate knowledge of JSON-LD? Or are we creating an interchange format as a JSON structure that authors and UAs must closely follow, validated by a JSON schema, but which is JSON-LD compliant "by design"?

I'll open a specific issue for that.

@HadrienGardeur

@iherman I just saw your latest commit. For localizable strings, I think that we actually need a sequence of them since a string can be localized in multiple languages.

@iherman
Member Author

iherman commented Aug 1, 2018

@HadrienGardeur

I have made the changes except for the introduction of accessibilityReport, privacyPolicy, cover, and toc.

You are absolutely right that those are extracted from resources or links, and that is how the manifest is defined. However, as the infoset calls this out explicitly, I felt that the WebIDL should reflect, as much as possible, the infoset, too, and that is why I included them there. I do not feel very strongly about it, but I think that it makes the relationship to the infoset more explicit.

See what others feel about this...

@HadrienGardeur

@iherman even if we keep separate IDL elements for cover, privacyPolicy and accessibilityReport, these can't be represented as a string or a sequence of strings.

Since we would be duplicating information from links, resources or readingOrder, we should at least be consistent with how they're expressed and use a PublicationLink or a sequence of PublicationLink for those.
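
As a rough illustration of that consistency argument, the following TypeScript sketch derives `cover` and `privacyPolicy` from the existing `resources` and `links` lists, so the derived members are the same link objects rather than bare strings. The `linksWithRel` helper, the sample data, and the rel values `"cover"`, `"privacy-policy"` and `"accessibility-report"` are assumptions made for this example; the interface is a simplified stand-in for the PublicationLink dictionary quoted in the review (here `name` is a plain string).

```typescript
// Simplified stand-in for the PublicationLink dictionary in the review.
interface PublicationLink {
  url: string;
  encodingFormat?: string;
  name?: string;
  rel?: string[];
}

// Collect every link carrying the given rel value across the given lists,
// so a derived member never disagrees with the list it was taken from.
function linksWithRel(
  rel: string,
  ...lists: PublicationLink[][]
): PublicationLink[] {
  const found: PublicationLink[] = [];
  for (const list of lists) {
    for (const link of list) {
      if (link.rel !== undefined && link.rel.indexOf(rel) !== -1) {
        found.push(link);
      }
    }
  }
  return found;
}

// Hypothetical manifest content.
const resources: PublicationLink[] = [
  {
    url: "cover.jpg",
    encodingFormat: "image/jpeg",
    name: "Cover of the book",
    rel: ["cover"],
  },
  { url: "style.css", encodingFormat: "text/css" },
];
const links: PublicationLink[] = [
  { url: "privacy.html", rel: ["privacy-policy"] },
];

// cover is a sequence; privacyPolicy is a single link (or undefined).
const cover = linksWithRel("cover", resources, links);
const privacyPolicy = linksWithRel("privacy-policy", resources, links)[0];
```

Because the derived entries keep every member of the original link, a user agent could, for example, reuse the `name` of a cover image as its alternate text.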

I think that the toc is unique, in the sense that the UA will have to fetch and parse HTML to properly populate this info. I know that you have your doubts about that, but this is definitely something that every UA will need to do and I don't see any good reason why it can't be included in the same IDL.

@iherman
Member Author

iherman commented Aug 1, 2018

Since we would be duplicating information from links, resources or readingOrder, we should at least be consistent with how they're expressed and use a PublicationLink or a sequence of PublicationLink for those.

You are right. That is a bug; this is necessary for things like using the description of a cover image as an alternate text. I will take care of this soon.

The problem with toc: we then have to spec exactly how the structure of the toc should be expressed in HTML, mainly when it comes to hierarchical toc-s.

@HadrienGardeur

The problem with toc: we then have to spec exactly how the structure of the toc should be expressed in HTML, mainly when it comes to hierarchical toc-s.

@iherman this is IMO unavoidable.

@llemeurfr
Contributor

@iherman, if not, UAs won't be able to do anything interesting with the HTML ToC.
The alternative we proposed was a predefined machine readable json ToC but it was dismissed by the group.

@TzviyaSiegman
Contributor

We could consider restricting the way that the HTML ToC can be written, as is done in EPUB. See https://w3c.github.io/publ-epub-revision/epub32/spec/epub-packages.html#sec-package-nav-def.

@iherman
Member Author

iherman commented Aug 1, 2018

@TzviyaSiegman @HadrienGardeur that works for me, if the community is (and has been) fine with it.

@mattgarrish will have the pleasure of reproducing his own text in the new draft ;-)

Back to the original item for this PR, I guess the WebIDL would have to be modified by

  1. re-introducing the child reference in the PublicationLink
  2. making the TOC a sequence of PublicationLinks.
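
The two modifications listed above could look roughly like the following TypeScript sketch, which mirrors a PublicationLink with an optional `children` member and treats the TOC as a sequence of such links. The trimmed-down interface, the `flattenToc` helper, and the sample entries are hypothetical and only meant to show how a hierarchical TOC would be consumed.

```typescript
// Trimmed-down stand-in for PublicationLink with the re-introduced
// child reference (an assumption of this sketch, not the merged IDL).
interface PublicationLink {
  url: string;
  name?: string;
  children?: PublicationLink[];
}

interface TocRow {
  link: PublicationLink;
  depth: number; // 0 for top-level entries
}

// Walk the hierarchical TOC depth-first and return one row per entry,
// which is what a UA would need to render an indented list.
function flattenToc(toc: PublicationLink[], depth = 0): TocRow[] {
  const rows: TocRow[] = [];
  for (const link of toc) {
    rows.push({ link, depth });
    if (link.children !== undefined) {
      for (const row of flattenToc(link.children, depth + 1)) {
        rows.push(row);
      }
    }
  }
  return rows;
}

// Hypothetical TOC: a sequence of PublicationLinks, one with children.
const toc: PublicationLink[] = [
  {
    url: "part1.html",
    name: "Part I",
    children: [
      { url: "part1.html#ch1", name: "Chapter 1" },
      { url: "part1.html#ch2", name: "Chapter 2" },
    ],
  },
  { url: "part2.html", name: "Part II" },
];

const rows = flattenToc(toc);
```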

@mattgarrish
Member

Funny, I was just reading the nav restrictions again while reviewing 3.2. There have to be ways we can loosen the restrictions and still have parseable data (e.g., ol or ul as the list type, even if semantic purity is lost, treating text outside of a tags as decorative and ignorable, etc.).
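
One possible reading of such a loosened rule is sketched below in TypeScript: either ol or ul is accepted as the list type, the first link in each list item becomes the entry, and any text outside the link is ignored. The `Elem` tree is a hypothetical, minimal stand-in for parsed HTML (a real user agent would walk the DOM), and `parseList`, `findAnchor` and `TocLink` are names invented for this sketch; nothing here is an algorithm agreed by the group.

```typescript
// Minimal stand-in for a parsed HTML element; strings are text nodes.
interface Elem {
  tag: string;
  href?: string;
  children: Array<Elem | string>;
}

interface TocLink {
  url: string;
  name: string;
  children: TocLink[];
}

const LIST_TAGS = ["ol", "ul"]; // both list types are accepted

function isElem(node: Elem | string): node is Elem {
  return typeof node !== "string";
}

// Concatenate all text inside a node (used for the label of a link).
function textOf(node: Elem | string): string {
  return isElem(node) ? node.children.map(textOf).join("") : node;
}

// First <a> inside a list item, without descending into nested lists.
function findAnchor(node: Elem): Elem | undefined {
  for (const child of node.children) {
    if (!isElem(child) || LIST_TAGS.indexOf(child.tag) !== -1) continue;
    if (child.tag === "a") return child;
    const nested = findAnchor(child);
    if (nested !== undefined) return nested;
  }
  return undefined;
}

// Turn a list into TOC entries; text outside the link is ignored.
// Simplification: an item with no usable link is skipped entirely.
function parseList(list: Elem): TocLink[] {
  const entries: TocLink[] = [];
  for (const item of list.children) {
    if (!isElem(item) || item.tag !== "li") continue;
    const anchor = findAnchor(item);
    if (anchor === undefined || anchor.href === undefined) continue;
    let children: TocLink[] = [];
    for (const child of item.children) {
      if (isElem(child) && LIST_TAGS.indexOf(child.tag) !== -1) {
        children = parseList(child); // first nested list holds sub-entries
        break;
      }
    }
    entries.push({ url: anchor.href, name: textOf(anchor).trim(), children });
  }
  return entries;
}

// Hypothetical ToC markup mixing ul and ol, with decorative text.
const nav: Elem = {
  tag: "ul",
  children: [
    { tag: "li", children: [
      "Start here: ",
      { tag: "a", href: "part1.html", children: ["Part ", { tag: "em", children: ["I"] }] },
      { tag: "ol", children: [
        { tag: "li", children: [{ tag: "a", href: "part1.html#ch1", children: ["Chapter 1"] }] },
      ] },
    ] },
    { tag: "li", children: [
      { tag: "span", children: [{ tag: "a", href: "part2.html", children: ["Part II"] }] },
    ] },
  ],
};

const parsedToc = parseList(nav);
```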

@TzviyaSiegman
Contributor

There have to be ways we can loosen the restrictions and still have parseable data (e.g., ol or ul as the list type, even if semantic purity is lost, treating text outside of a tags as decorative and ignorable, etc.).

This seems like a good idea to me. Provide direction to machines, not restrictions to humans.

@dauwhe
Contributor

dauwhe commented Aug 1, 2018

If we start profiling HTML in web publications, we will likely alienate the browser community as well as massively confuse authors.

@danielweck
Member

danielweck commented Aug 1, 2018

@iherman

"The problem with toc: we then have to spec exactly how the structure of the toc should be expressed in HTML, mainly when it comes to hierarchical toc-s."

@HadrienGardeur

"this is IMO unavoidable."

@llemeurfr

"if not, UAs won't be able to do anything interesting with the HTML ToC. The alternative we proposed was a predefined machine readable json ToC but it was dismissed by the group."

@TzviyaSiegman

"We could consider restricting the way that the HTML ToC can ve written as is done in EPUB."

We've been down that rabbit hole before :)

I do not know how widespread this criticism of EPUB3 Navigation Document is, but here is what I heard on numerous occasions:

The fact that the NavDoc is a "microdata" model restriction on what would otherwise be totally arbitrary HTML markup has forced (some) content creators to:

(1) exclude the NavDoc from the spine reading order / list of render-able documents (i.e. linear or non-linear resources), effectively treating the mandatory unique NavDoc HTML only as a machine-readable data source.
(2) author a separate HTML document (unfortunately not marked with any sort of "NavDoc" metadata, so not reliably discoverable by reading systems) in order to gain full markup flexibility and to provide fancy layout / styling (e.g. SVG instead of HTML lists).

This simply reflects the fact that (some) content creators prefer, at the cost of some degree of duplication, a clear distinction between:

  1. optimized, machine-readable data structure designed to be ingested by a reading system / user agent, with the expectation that it will be rendered in an arbitrary, application-specific fashion.
  2. authored web document (custom structure and semantics, desired styling), with the expectation that it will be faithfully rendered by a web browser (or a web browser engine / webview component inside a reading system).

This dichotomy could be expressed in Web Publications by having a separate machine-readable Table Of Contents in the JSON serialization of the manifest infoset, but we excluded this option (if I remember correctly, because of duplication, and also due to HTML offering better string localization than JSON).

I am not suggesting we should revisit this decision, just making observations about how the restricted EPUB3 NavDoc markup syntax was received in the "real world".

@TzviyaSiegman
Contributor

@danielweck Agreed. I do think we need to look at the way users approach the ToC, not the way authors and publishers believe that users approach the ToC.

I have not done formal testing, but as a user I always use the machine-rendered ToC because that is what is accessible from all locations in the publication (the always-present hamburger button). I think @mattgarrish's proposal of instructing UAs to disregard unneeded information is a good idea.

@mattgarrish
Member

Is it correct that the table of contents pointed to from the manifest doesn't have to be visible in the page?

This would cover the case of wanting to provide a table of contents but not wanting it explicitly in the reading order (probably rare, but EPUB doesn't require navigation in the spine).

More to the point, though, it could also potentially address Daniel's concern of allowing two tables of contents if whatever parsing algorithm we come up with wouldn't produce a desirable result. I'm hoping we can avoid duplication, but having a plan for the inevitable is also a wise idea.

@iherman
Member Author

iherman commented Aug 2, 2018

(Purely administrative comment)

It seems that the TOC issue is much more complicated, and I am afraid of a PR staying hanging and becoming problematic to merge if other changes come to the fore. May I suggest:

  • In the case the refreshed WebIDL is acceptable for everyone involved except for the TOC item (which is, currently, just a simple pointer to a TOC somewhere else), let us merge what we have
  • We open a separate issue on the details of the TOC, knowing that a decision may affect the WebIDL later

@HadrienGardeur @mattgarrish @TzviyaSiegman @llemeurfr ?

@laudrain

laudrain commented Aug 2, 2018

May I recall here the Toronto F2F resolution on TOC:

TOC should be present

  • if present, it MUST be an element with doc-toc role pointed to from manifest
  • (likely) element in landing page
  • could be elsewhere

In publishing "real world" we need both options Daniel described

a clear distinction between:

  1. optimized, machine-readable data structure designed to be ingested by a reading system / user agent, with the expectation that it will be rendered in an arbitrary, application-specific fashion.
  2. authored web document (custom structure and semantics, desired styling), with the expectation that it will be faithfully rendered by a web browser (or a web browser engine / webview component inside a reading system).

If we can have in the same HTML document, a proper structure and styling for TOC AND a data structure for a11y with doc-toc role attribute, it could avoid duplication.
But I have no problem with duplication to achieve these 2 functions and will not give up on any of the 2.

@mattgarrish
Member

But I have no problem with duplication to achieve these 2 functions and will not give up on any of the 2.

I don't think anything prevents you from creating as many tables of contents as your heart desires. You just point the manifest to whatever one is best for machine processing.

My one concern is that this probably forces a separate document unless both the machine and human versions can exist in the same file without the user encountering both, which is why I wonder if one can be suppressed visually without affecting its parsing (I'm not sure why not, but am just curious).

@mattgarrish
Member

But how did we get on to this in a PR? Where are we in terms of integrating the changes?

@iherman
Member Author

iherman commented Aug 3, 2018

@mattgarrish

But how did we get on to this in a PR? Where are we in terms of integrating the changes?

At the moment, the WebIDL entry for a TOC is simply a reference to the TOC, wherever it is. If we had a clear specification of the TOC in terms of HTML, this could be replaced by some JSON representation of the TOC instead.

Hence my proposal to merge this PR as is (unless there are other issues, which at this moment there are not) and move this discussion to a specific issue.

@laudrain

laudrain commented Aug 3, 2018

@mattgarrish the Toronto F2F resolution made me hope that it could be only one document, particularly this sentence:

if present, it MUST be an element with doc-toc role pointed to from manifest

So I don't think there is an issue with this. I agree with @iherman to merge this PR as is.

@iherman
Member Author

iherman commented Aug 3, 2018

@mattgarrish are you o.k. merging?

@mattgarrish
Member

Yes, I'm fine with this PR as far as it represents where we are right now. We can re-litigate the toc elsewhere.

@mattgarrish mattgarrish merged commit b276a48 into master Aug 3, 2018
@mattgarrish mattgarrish deleted the webidl-update branch August 3, 2018 10:59