Linking to alternate representations #159

HadrienGardeur · 2018-03-06T14:06:46Z

The ability to link to alternate representations and/or packaged versions of the same publication would be tremendously useful for a WP:

a WP could point to a pre-packaged PWP/EPUB4 instead of relying on the UA to package it
a WP could point to an EPUB2/3 representation of the same content (useful as a fallback for EPUB enabled UAs, such as Edge)
a WP could point to a PDF representation of the same content (for a print replica version or as a fallback since all modern browsers offer native PDF support)

Our current WebIDL allows this already, but this would need to be added to our infoset as well.

TzviyaSiegman · 2018-03-06T14:11:01Z

@HadrienGardeur if our WebIDL allows for this, why do we need to add it to the infoset? Is this metadata?

iherman · 2018-03-06T14:17:35Z

While I can see the value these would have, I am a bit afraid of trying to solve everything in one step. I would honestly prefer to postpone this for now, until after we settle our main issues around manifest and affordances...

BigBlueHat · 2018-03-06T14:30:49Z

Thanks to the wonders of HTTP and HTML, this is already provided for via <link rel="alternate"> and Link: <moby-dick.epub>; rel="alternate". Given that the WP address is spec'd to resolve to an HTML entry page, publishers can use either of those expressions to accomplish that goal all within the extensible awesomeness we call the Web. 😃

HadrienGardeur · 2018-03-06T14:32:31Z

@TzviyaSiegman our infoset is not strictly about metadata, this is also where we define the reading order and the list of resources as well.

The WebIDL is somewhat ahead of the infoset since it provides a generic linking mechanism.

This is also useful for a number of other things currently listed in our infoset, such as the privacy policy for instance.

HadrienGardeur · 2018-03-06T14:47:00Z

@BigBlueHat I planned on using alternate for the rel as well, but without a proper media type those examples (HTML and HTTP) are not extremely useful.

IMO, a generic link element in the manifest is necessary as well and we can't just rely on the HTML document returned by the WP address.

Once again, not all UAs will fetch this document to process a WP (it's not part of the current lifecycle) and all collection-level info is IMO better suited to the manifest than to a given resource from the publication.

BigBlueHat · 2018-03-06T16:46:32Z

@BigBlueHat I planned on using alternate for the rel as well, but without a proper media type those examples (HTML and HTTP) are not extremely useful.

Which thing lacks a proper media type here?

IMO, a generic link element in the manifest is necessary as well and we can't just rely on the HTML document returned by the WP address.

Once again, not all UAs will fetch this document to process a WP and all collection-level info is IMO better suited to the manifest than to a given resource from the publication.

Especially as we explore what WAM can and can't provide us, we shouldn't assume that the entire infoset must be encoded in the manifest. The HTML entry page is a fabulous place (because Web extensibility) to put many of the infoset items. We just need to be willing to explore it more.

HadrienGardeur · 2018-03-06T16:55:18Z

Which thing lacks a proper media type here?

Your examples, they should instead use:

<link rel="alternate" type="application/epub+zip" href="moby-dick.epub">
Link: <moby-dick.epub>; rel="alternate"; type="application/epub+zip"
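To make the difference concrete, here is a minimal sketch of how a client might pick up both expressions and read the media type from them. It uses only the Python standard library; the names `AlternateLinkParser`, `alternates_from_html` and `alternates_from_link_header` are hypothetical, and the header parsing is deliberately simplified (quoted parameters only), not a complete RFC 8288 implementation.

```python
from html.parser import HTMLParser
import re


class AlternateLinkParser(HTMLParser):
    """Collect (href, type) pairs from <link rel="alternate"> elements."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel is a space-separated list of link relation tokens.
        rels = (attributes.get("rel") or "").lower().split()
        if "alternate" in rels and attributes.get("href"):
            # type is None when the author did not declare a media type.
            self.alternates.append((attributes["href"], attributes.get("type")))


def alternates_from_html(html):
    """Return the alternates declared in an HTML document."""
    parser = AlternateLinkParser()
    parser.feed(html)
    return parser.alternates


def alternates_from_link_header(header):
    """Simplified parse of one HTTP Link header value.

    Assumption: every parameter is quoted and URLs contain no '>'.
    """
    results = []
    pattern = r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)'
    for target, params in re.findall(pattern, header):
        parsed = dict(re.findall(r'([\w-]+)="([^"]*)"', params))
        if "alternate" in parsed.get("rel", "").lower().split():
            results.append((target, parsed.get("type")))
    return results
```

Without the `type` attribute or parameter, both functions return `None` for the media type, so the client cannot tell an EPUB from a PDF without fetching the target.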

Especially as we explore what WAM can and can't provide us, we shouldn't assume that the entire infoset must be encoded in the manifest. The HTML entry page is a fabulous place (because Web extensibility) to put many of the infoset items. We just need to be willing to explore it more.

The WAM is not the only option out there, and we haven't agreed on either a serialization or the approach that we'll take to extensions.

I certainly agree that not all infoset items MUST be present in the manifest, but we shouldn't dump everything into HTML either just because we can.

For each and every infoset item, we need to carefully evaluate what's the best option and how/where we'll provide that piece of information.

There can be massive cost in complexity if infoset items get spread all over the resources of the publication and this is already a problem IMO with the default reading order.

BigBlueHat · 2018-03-06T17:08:47Z

For each and every infoset item, we need to carefully evaluate what's the best option and how/where we'll provide that piece of information.

Couldn't agree more. 😄

There can be massive cost in complexity if infoset items get spread all over the resources of the publication and this is already a problem IMO with the default reading order.

Each piece should be as close to its likely usage vector as possible. The trouble is that we see the implementations (and their usage vectors) from opposite vantage points...usually.

However, if we're focused on the Web part of Web Publication, we'll need to extend up from the foundational layer of extensibility most interesting to the browser: HTTP => HTML => CSS => JavaScript.

WAM and the rest sit alongside those four in service of the runtime. The actual apps are still defined, distributed, and built using those four formats.

HadrienGardeur · 2018-03-07T15:00:53Z

Each piece should be as close to its likely usage vector as possible. The trouble is that we see the implementations (and their usage vectors) from opposite vantage points...usually.

I don't think that's the issue.

However, if we're focused on the Web part of Web Publication, we'll need to extend up from the foundational layer of extensibility most interesting to the browser: HTTP => HTML => CSS => JavaScript.

I think that's the issue: immediately assuming that putting additional info in HTTP/HTML is always the best option for browsers.

That's not always true and other things need to be considered:

collection vs resource level information
how a particular infoset item affects our lifecycle

You've provided a good example:

I suggested a link in the manifest because this is an alternate representation of the collection (the publication)
while you're suggesting a link in a particular resource of the publication (the HTML page returned by the WP address)

Right now, we only rely on this HTML page for discovery and nothing else. Putting this info in this page is problematic in multiple situations:

a browser won't fetch this resource if a publication is discovered from another resource of the publication with our current lifecycle
if you've already discovered the publication before and want to continue reading, there's absolutely no reason for the UA (browser for instance) to fetch the HTML page behind the WP address either

This is a perfect example of why collection-level information is better suited to the manifest.

We could of course do what you've suggested, but this would make the lifecycle much more complex and require additional fetch requests from the UA.

That's not my definition of "the best option".

mattgarrish · 2018-03-08T22:23:02Z

Right now, we only rely on this HTML page for discovery and nothing else.

Isn't this more of a discovery case, though? If the alternative's primary purpose is to provide another option for non-WP aware user agents, doesn't it make more sense for that alternative to be in a location and form that such a user agent is more likely to process?

I have the same problem with the discovery accessibility metadata. These may not belong as members of the infoset, but a potential category of best practices for the entry page (with accessibility metadata gaining prominence through conformance to WCAG, for example).

The more we load into the infoset, the more work the user agent has to do to expose the information. If it's generally understood that the entry page is the place to provide this kind of information, all the user agent has to be mindful of is providing a way to access that page, which it seemingly should do regardless of this issue.

(Links to alternative representations probably also better belong as user-accessible links rather than buried in link elements/headers/properties, but that's entirely debatable, of course.)

HadrienGardeur · 2018-03-12T11:40:28Z

Isn't this more of a discovery case, though? If the alternative's primary purpose is to provide another option for non-WP aware user agents, doesn't it make more sense for that alternative to be in a location and form that such a user agent is more likely to process?

In certain situations it can be discovery, but that's not always the case.

Including a link in the HTML page returned by the address is IMO much less likely to work well for such UAs, simply because there is absolutely no reason why you would fetch that document in the first place under most scenarios.

It's also a slippery slope: it would make the EPUB/PWP alternate representation only discoverable on certain resources of the publication. Updating that reference would also mean updating all resources where there's such a link, which is less than ideal.

The only resource that a WP-enabled UA is required to fetch right now is the manifest; that's why it is by far the most logical place to include all publication-wide info.

The more we load into the infoset, the more work the user agent has to do to expose the information. If it's generally understood that the entry page is the place to provide this kind of information, all the user agent has to be mindful of is providing a way to access that page, which it seemingly should do regardless of this issue.

I have a hard time following you here, and IMO it's actually the other way around.

It is much easier for a UA to simply parse the manifest (JSON) and get our WebIDL dictionary out of it than it is to do additional fetch requests and parse potentially multiple HTML documents on top of the manifest.

Our infoset for the reading order is already problematic for that reason, let's not make our lifecycle even more complex, for no good reason whatsoever.

The "entry page" (I hate that term, because most of the time that's not where you'll start reading the publication) is the place that is useful for:

pointing non WP aware clients to the publication
discovering the manifest (it's the only document where a link to the manifest is required)

It's hardly the best place to include additional items from our infoset though. Keep in mind that a publication will be discoverable from any resource of the publication, not just the "entry page".

(Links to alternative representations probably also better belong as user-accessible links rather than buried in link elements/headers/properties, but that's entirely debatable, of course.)

I agree about that.

User-accessible links make a lot more sense for the "entry page" because it's aligned with the mission of that page: make WPs accessible to non-WP aware UAs.

mattgarrish · 2018-03-12T13:45:20Z

I have a hard time following you here, and IMO it's actually the other way around.

If we assume that this is a feature that user agents are going to provide, yes. But I tend to view alternatives as tangential to web publications themselves, so in the interests of simplicity they may not be something we need right away, or at least not within the manifest.

If the user wants an alternative, they typically want the alternative up front. If I buy an ebook in a bookstore, the bookstore offers me the array of formats, not the EPUB file. If I want a different representation, going back to the store to get it is not a complicated task, and what people are used to doing already. In a free and open world, the entry page can be that location, and it allows search crawlers to find the alternatives, too (or alternatives can be included in an OPDS feed, where such options are similarly more useful).

There's a danger that we go the route EPUB often did and put everything we can think of into the manifest and overload implementations with features. As @iherman has already said, though, I think this is something we can take up later.

HadrienGardeur · 2018-03-12T13:57:43Z

@mattgarrish as I've explained in the first post, this is not limited to alternate formats that you select when you start reading a publication.

A user might begin reading a Web Publication online in a browser, and then decide to download that publication to continue reading on a dedicated reading device for example.
Even if you remain on the same device, it might be preferable for the browser to download the PWP variant at some point (to enable offline reading for instance).

IMO the problem with EPUB was mostly the complexity of some of those features ("welcome to refine hell"), the EPUB infoset wasn't exactly overloaded.

As I've said before, I'm not suggesting a new element just for this use case.
A generic link element at the manifest level would easily enable this use case and quite a few things that are already in our infoset.

mattgarrish · 2018-03-13T13:22:14Z

the EPUB infoset wasn't exactly overloaded

Perhaps, but we sure came up with a lot of fallback methods that largely went nowhere.

I'm not disagreeing that finding a packaged version will have its uses, but do we also want to lead people down the path of multiple alternatives in the process, not knowing whether user agents will deliver on them?

Burying this kind of information where only user agents can get it always makes me uncomfortable. What if I discover the web publication in a vanilla browser and just want to download the packaged web pub?

(This also feels like it would open the door to multiple renditions, even if it's not specifically what you're proposing.)

HadrienGardeur · 2018-03-13T13:32:59Z

Perhaps, but we sure came up with a lot of fallback methods that largely went nowhere.

Fallbacks are IMO problematic and I'm glad that most of the time, we're not requiring any in our current draft.

There's, of course, one big exception with reading order, where the fallback is to:

somehow guess where the publication might contain a nav
fetch the HTML document, parse it and extract a reading order out of it

This is what's making our lifecycle more complex than it should be. Any additional splitting of publication-wide info into multiple HTML/JSON documents will only make the situation even worse.
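As a rough illustration of the extra work this fallback implies, the sketch below extracts a reading order from the links inside a `nav` element of an already fetched HTML document. The markup shape (links inside `nav`) and the names `NavOrderParser` and `reading_order_from_nav` are assumptions made for the example, not something the draft defines; locating and fetching the right document is a separate step that is not shown.

```python
from html.parser import HTMLParser


class NavOrderParser(HTMLParser):
    """Collect the href of every <a> found inside a <nav> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # how many <nav> elements are currently open
        self.order = []

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self.depth += 1
        elif tag == "a" and self.depth > 0:
            href = dict(attrs).get("href")
            # Keep the first occurrence only, so the order stays stable.
            if href and href not in self.order:
                self.order.append(href)

    def handle_endtag(self, tag):
        if tag == "nav" and self.depth > 0:
            self.depth -= 1


def reading_order_from_nav(html):
    """Return the hrefs listed in <nav>, in document order."""
    parser = NavOrderParser()
    parser.feed(html)
    return parser.order
```

Compared with reading an array out of the manifest, this path needs an extra fetch, an HTML parser, and a convention for which links count.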

Burying this kind of information where only user agents can get it always makes me uncomfortable. What if I discover the web publication in a vanilla browser and just want to download the packaged web pub?

Burying this kind of information in another document that won't be fetched is not a good option either.

Ideally you want to include this info:

as a machine readable link, in a document that the UA MUST fetch (manifest)
as a user facing link in a document that non-WP aware browsers will display (entry page + additional resources in the publication)
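One way to keep the two forms from drifting apart is to generate both from a single record. The sketch below is only an illustration: the record keys (`url`, `type`, `label`), the function names, and the manifest property names (`url`, `rel`, `encodingFormat`, which mirror the manifest fragment quoted at the end of this thread) are assumptions, not agreed syntax.

```python
from html import escape


def manifest_link(alt):
    """Machine-readable form: one entry for the manifest's list of links."""
    return {
        "url": alt["url"],
        "rel": "alternate",
        "encodingFormat": alt["type"],
    }


def html_anchor(alt):
    """User-facing form: a visible link for the entry page."""
    return '<a rel="alternate" type="{}" href="{}">{}</a>'.format(
        escape(alt["type"], quote=True),
        escape(alt["url"], quote=True),
        escape(alt["label"]),
    )


# Hypothetical source record kept by the publisher's build tooling.
epub = {
    "url": "publication.epub",
    "type": "application/epub+zip",
    "label": "Download EPUB",
}
```

Calling `manifest_link(epub)` and `html_anchor(epub)` from the same build step means updating the alternate's location touches one record rather than every page that mentions it.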

(This also feels like it would open the door to multiple renditions, even if it's not specifically what you're proposing.)

I'm opening the door to linking in general, and this is not limited to alternate formats or renditions. I've recently added #162 and #163 and they're both related to linking as well.

HadrienGardeur · 2018-03-15T16:59:47Z

Since we're considering using the WAM, the following text is also relevant for this discussion:

The document format defined in this specification provides a unified means of encapsulating metadata about a Web application in a way that we hope will avoid existing pitfalls with both proprietary and [HTML]'s meta/link tags. Those pitfalls include:

Developers have to duplicate the icons and application name in each page of a web site, leading to significant redundancy across pages. This is compounded if that information never gets used by the user agent (e.g., the user never bookmarks the web application).

Spreading metadata across multiple documents can cause data to fall out of sync.

If the metadata for a web application lives in a HTML document, that significantly increases the cost to user agents (and users) of checking for updates to the metadata of a site. Since the HTML file is likely to change often, it means that a user agent will often have to download the whole HTML file in order to check if any of the relevant meta tags have changed. If this resource contains inlined resources like JavaScript, images, or stylesheets, this could be a non-trivial download.

HadrienGardeur · 2018-07-02T14:26:03Z

This is now covered by our manifest using links and alternate from the IANA link registry:

"links": [
  {
    "url": "publication.epub",
    "rel": "alternate",
    "encodingFormat": "application/epub+zip"
  }
]
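A minimal sketch of how a user agent might consume this, assuming the fragment above sits inside a top-level JSON object and that `rel` may be either a string or a list of strings. The function name `find_alternates` and the preference-list parameter are hypothetical.

```python
import json

# The fragment from the comment above, wrapped in an object so it parses.
MANIFEST = """
{
  "links": [
    {
      "url": "publication.epub",
      "rel": "alternate",
      "encodingFormat": "application/epub+zip"
    }
  ]
}
"""


def find_alternates(manifest, media_types):
    """Return URLs of alternate links whose media type the caller accepts.

    Results are ordered by the caller's preference in media_types.
    """
    matches = []
    for link in manifest.get("links", []):
        rel = link.get("rel", [])
        # Assumption: rel is a single string or a list of strings.
        rels = [rel] if isinstance(rel, str) else list(rel)
        if "alternate" in rels and link.get("encodingFormat") in media_types:
            matches.append(link)
    matches.sort(key=lambda item: media_types.index(item["encodingFormat"]))
    return [item["url"] for item in matches]


manifest = json.loads(MANIFEST)
preferred = find_alternates(manifest, ["application/epub+zip", "application/pdf"])
```

A UA that only handles PDF would pass `["application/pdf"]` and get an empty list back for this manifest, without fetching anything beyond the manifest itself.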

HadrienGardeur added the topic:manifest label Mar 6, 2018

HadrienGardeur mentioned this issue Mar 15, 2018

Search is a service and an affordance #149

Closed

HadrienGardeur mentioned this issue Apr 19, 2018

Scope of our infoset #176

Closed

HadrienGardeur mentioned this issue May 7, 2018

Reference resources that are not part of the publication #186

Closed

iherman mentioned this issue May 9, 2018

Is it acceptable to use HTML for the serialization of some infoset items, or should it all be in separate (JSON) file? #193

Closed

iherman self-assigned this May 31, 2018

HadrienGardeur mentioned this issue Jun 13, 2018

yet another 'resource list' in the manifest? #225

Closed

HadrienGardeur closed this as completed Jul 2, 2018
