Linking to alternate representations #159

Closed
HadrienGardeur opened this issue Mar 6, 2018 · 17 comments

@HadrienGardeur

The ability to link to alternate representations and/or packaged versions of the same publication would be tremendously useful for a WP:

  • a WP could point to a pre-packaged PWP/EPUB4 instead of relying on the UA to package it
  • a WP could point to an EPUB2/3 representation of the same content (useful as a fallback for EPUB enabled UAs, such as Edge)
  • a WP could point to a PDF representation of the same content (for a print replica version or as a fallback since all modern browsers offer native PDF support)

Our current WebIDL allows this already, but this would need to be added to our infoset as well.

@TzviyaSiegman
Contributor

@HadrienGardeur if our WebIDL allows for this, why do we need to add it to the infoset? Is this metadata?

@iherman
Member

iherman commented Mar 6, 2018

While I can see the value these would have, I am a bit afraid of trying to solve everything in one step. I would honestly prefer to postpone this for now, until after we settle our main issues around manifest and affordances...

@BigBlueHat
Member

Thanks to the wonders of HTTP and HTML, this is already provided for via <link rel="alternate"> and Link: <moby-dick.epub>; rel="alternate". Given that the WP address is spec'd to resolve to an HTML entry page, publishers can use either of those expressions to accomplish that goal all within the extensible awesomeness we call the Web. 😃

@HadrienGardeur
Author

@TzviyaSiegman our infoset is not strictly about metadata, this is also where we define the reading order and the list of resources as well.

The WebIDL is somewhat ahead of the infoset since it provides a generic linking mechanism.

This is also useful for a number of other things currently listed in our infoset, such as the privacy policy for instance.

@HadrienGardeur
Author

HadrienGardeur commented Mar 6, 2018

@BigBlueHat I planned on using alternate for the rel as well, but without a proper media type those examples (HTML and HTTP) are not extremely useful.

IMO, a generic link element in the manifest is necessary as well and we can't just rely on the HTML document returned by the WP address.

Once again, not all UAs will fetch this document to process a WP (it's not part of the current lifecycle), and all collection-level info is IMO better suited to the manifest than to a given resource from the publication.

@BigBlueHat
Member

@BigBlueHat I planned on using alternate for the rel as well, but without a proper media type those examples (HTML and HTTP) are not extremely useful.

Which thing lacks a proper media type here?

IMO, a generic link element in the manifest is necessary as well and we can't just rely on the HTML document returned by the WP address.

Once again, not all UAs will fetch this document to process a WP, and all collection-level info is IMO better suited to the manifest than to a given resource from the publication.

Especially as we explore what WAM can and can't provide us, we shouldn't assume that the entire infoset must be encoded in the manifest. The HTML entry page is a fabulous place (because Web extensibility) to put many of the infoset items. We just need to be willing to explore it more.

@HadrienGardeur
Author

Which thing lacks a proper media type here?

Your examples, they should instead use:

  • <link rel="alternate" type="application/epub+zip" href="moby-dick.epub">
  • Link: <moby-dick.epub>; rel="alternate"; type="application/epub+zip"
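
As a rough illustration of why the type attribute matters, here is a minimal Python sketch of a client picking an alternate by media type out of an HTML page. The class and function names (AlternateLinkParser, find_alternates) are hypothetical, the markup is the example above, and nothing here is taken from the WP drafts:

```python
from html.parser import HTMLParser


class AlternateLinkParser(HTMLParser):
    """Collects <link rel="alternate"> elements together with their media type."""

    def __init__(self):
        super().__init__()
        self.alternates = []  # list of (href, type) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel is a space-separated list of link relation tokens
        rels = (attributes.get("rel") or "").lower().split()
        if "alternate" in rels and attributes.get("href"):
            self.alternates.append((attributes["href"], attributes.get("type")))


def find_alternates(html, media_type):
    """Return the hrefs of alternates whose type matches media_type exactly."""
    parser = AlternateLinkParser()
    parser.feed(html)
    return [href for href, link_type in parser.alternates if link_type == media_type]


entry_page = (
    '<head><link rel="alternate" type="application/epub+zip" '
    'href="moby-dick.epub"></head>'
)
print(find_alternates(entry_page, "application/epub+zip"))
```

Without the type attribute the same link is still found by the parser, but the client has no way to tell an EPUB from a PDF short of fetching it.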

Especially as we explore what WAM can and can't provide us, we shouldn't assume that the entire infoset must be encoded in the manifest. The HTML entry page is a fabulous place (because Web extensibility) to put many of the infoset items. We just need to be willing to explore it more.

The WAM is not the only option out there, and we haven't agreed on either a serialization or the approach that we'll take to extensions.

I certainly agree that not all infoset items MUST be present in the manifest, but we shouldn't dump everything into HTML either just because we can.

For each and every infoset item, we need to carefully evaluate what's the best option and how/where we'll provide that piece of information.

There can be massive cost in complexity if infoset items get spread all over the resources of the publication and this is already a problem IMO with the default reading order.

@BigBlueHat
Member

For each and every infoset item, we need to carefully evaluate what's the best option and how/where we'll provide that piece of information.

Couldn't agree more. 😄

There can be massive cost in complexity if infoset items get spread all over the resources of the publication and this is already a problem IMO with the default reading order.

Each piece should be as close to its likely usage vector as possible. The trouble is that we see the implementations (and their usage vectors) from opposite vantage points...usually.

However, if we're focused on the Web part of Web Publication, we'll need to extend up from the foundational layer of extensibility most interesting to the browser: HTTP => HTML => CSS => JavaScript.

WAM and the rest sit alongside those four in service of the runtime. The actual apps are still defined, distributed, and built using those four formats.

@HadrienGardeur
Author

HadrienGardeur commented Mar 7, 2018

Each piece should be as close to its likely usage vector as possible. The trouble is that we see the implementations (and their usage vectors) from opposite vantage points...usually.

I don't think that's the issue.

However, if we're focused on the Web part of Web Publication, we'll need to extend up from the foundational layer of extensibility most interesting to the browser: HTTP => HTML => CSS => JavaScript.

I think that's the issue: immediately assuming that putting additional info in HTTP/HTML is always the best option for browsers.

That's not always true and other things need to be considered:

  • collection vs resource level information
  • how a particular infoset item affects our lifecycle

You've provided a good example:

  • I suggested a link in the manifest because this is an alternate representation of the collection (the publication)
  • while you're suggesting a link in a particular resource of the publication (the HTML page returned by the WP address)

Right now, we only rely on this HTML page for discovery and nothing else. Putting this info in this page is problematic in multiple situations:

  • a browser won't fetch this resource if a publication is discovered from another resource of the publication with our current lifecycle
  • if you've already discovered the publication before and want to continue reading, there's absolutely no reason for the UA (browser for instance) to fetch the HTML page behind the WP address either

This is a perfect example of why collection-level information is better suited to the manifest.

We could of course do what you've suggested, but this would make the lifecycle much more complex and require additional fetch requests from the UA.

That's not my definition of "the best option".

@mattgarrish
Member

Right now, we only rely on this HTML page for discovery and nothing else.

Isn't this more of a discovery case, though? If the alternative's primary purpose is to provide another option for non-WP aware user agents, doesn't it make more sense for that alternative to be in a location and form that such a user agent is more likely to process?

I have the same problem with the discovery accessibility metadata. These may not belong as members of the infoset, but a potential category of best practices for the entry page (with accessibility metadata gaining prominence through conformance to WCAG, for example).

The more we load into the infoset, the more work the user agent has to do to expose the information. If it's generally understood that the entry page is the place to provide this kind of information, all the user agent has to be mindful of is providing a way to access that page, which it seemingly should do regardless of this issue.

(Links to alternative representations probably also better belong as user-accessible links rather than buried in link elements/headers/properties, but that's entirely debatable, of course.)

@HadrienGardeur
Author

HadrienGardeur commented Mar 12, 2018

Isn't this more of a discovery case, though? If the alternative's primary purpose is to provide another option for non-WP aware user agents, doesn't it make more sense for that alternative to be in a location and form that such a user agent is more likely to process?

In certain situations it can be discovery, but that's not always the case.

Including a link in the HTML page returned by the address is IMO much less likely to work well for such UAs, simply because there is absolutely no reason why you would fetch that document in the first place under most scenarios.

It's also a slippery slope: it would make the EPUB/PWP alternate representation only discoverable on certain resources of the publication. Updating that reference would also mean updating all resources where there's such a link, which is less than ideal.

The only resource that a WP-enabled UA is required to fetch right now is the manifest; that's why this is by far the most logical place to include all publication-wide info.

The more we load into the infoset, the more work the user agent has to do to expose the information. If it's generally understood that the entry page is the place to provide this kind of information, all the user agent has to be mindful of is providing a way to access that page, which it seemingly should do regardless of this issue.

I have a hard time following you here, and IMO it's actually the other way around.

It is much easier for a UA to simply parse the manifest (JSON) and get our WebIDL dictionary out of it than it is to do additional fetch requests and parse potentially multiple HTML documents on top of the manifest.

Our infoset for the reading order is already problematic for that reason, let's not make our lifecycle even more complex, for no good reason whatsoever.

The "entry page" (I hate that term, because most of the time that's not where you'll start reading the publication) is the place that is useful for:

  • pointing non WP aware clients to the publication
  • discovering the manifest (it's the only document where a link to the manifest is required)

It's hardly the best place to include additional items from our infoset though. Keep in mind that a publication will be discoverable from any resource of the publication, not just the "entry page".

(Links to alternative representations probably also better belong as user-accessible links rather than buried in link elements/headers/properties, but that's entirely debatable, of course.)

I agree about that.

User-accessible links make a lot more sense for the "entry page" because it's aligned with the mission of that page: make WPs accessible to non-WP aware UAs.

@mattgarrish
Member

I have a hard time following you here, and IMO it's actually the other way around.

If we assume that this is a feature that user agents are going to provide, yes. But I tend to view alternatives as tangential to web publications themselves, so in the interests of simplicity they may not be something we need right away, or at least not within the manifest.

If the user wants an alternative, they typically want the alternative up front. If I buy an ebook in a bookstore, the bookstore offers me the array of formats, not the EPUB file. If I want a different representation, going back to the store to get it is not a complicated task, and what people are used to doing already. In a free and open world, the entry page can be that location, and it allows search crawlers to find the alternatives, too (or alternatives can be included in an OPDS feed, where such options are similarly more useful).

There's a danger that we go the route EPUB often did and put everything we can think of into the manifest and overload implementations with features. As @iherman has already said, though, I think this is something we can take up later.

@HadrienGardeur
Author

@mattgarrish as I've explained in the first post, this is not limited to alternate formats that you select when you start reading a publication.

A user might begin reading a Web Publication online in a browser, and then decide to download that publication to continue reading on a dedicated reading device for example.
Even if you remain on the same device, it might be preferable for the browser to download the PWP variant at some point (to enable offline reading for instance).

IMO the problem with EPUB was mostly the complexity of some of those features ("welcome to refine hell"), the EPUB infoset wasn't exactly overloaded.

As I've said before, I'm not suggesting a new element just for this use case.
A generic link element at the manifest level would easily enable this use case and quite a few things that are already in our infoset.

@mattgarrish
Member

the EPUB infoset wasn't exactly overloaded

Perhaps, but we sure came up with a lot of fallback methods that largely went nowhere.

I'm not disagreeing that finding a packaged version will have its uses, but do we also want to lead people down the path of multiple alternatives in the process not knowing if user agents will deliver on them?

Burying this kind of information where only user agents can get it always makes me uncomfortable. What if I discover the web publication in a vanilla browser and just want to download the packaged web pub?

(This also feels like it would open the door to multiple renditions, even if it's not specifically what you're proposing.)

@HadrienGardeur
Author

HadrienGardeur commented Mar 13, 2018

Perhaps, but we sure came up with a lot of fallback methods that largely went nowhere.

Fallbacks are IMO problematic and I'm glad that most of the time, we're not requiring any in our current draft.

There is, of course, one big exception with reading order, where the fallback is to:

  • somehow guess where the publication might contain a nav
  • fetch the HTML document, parse it and extract a reading order out of it
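
To give a sense of what that fallback implies for a UA, here is a minimal Python sketch of the "parse it and extract a reading order" step. It assumes, purely for illustration, that the reading order is the order of the links inside the first nav element of a document that has already been fetched; the class name NavOrderParser and the example.org URLs are hypothetical, and the draft does not define this algorithm:

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class NavOrderParser(HTMLParser):
    """Builds a reading order from the links found in the first <nav> element."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.nav_depth = 0       # > 0 while inside the first nav element
        self.done = False        # set once the first nav element has closed
        self.reading_order = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "nav":
            self.nav_depth += 1
        elif tag == "a" and self.nav_depth > 0:
            href = dict(attrs).get("href")
            if href:
                # Resolve against the document URL and drop the fragment so
                # that several entries pointing into one file count once.
                url = urldefrag(urljoin(self.base_url, href)).url
                if url not in self.reading_order:
                    self.reading_order.append(url)

    def handle_endtag(self, tag):
        if tag == "nav" and self.nav_depth > 0:
            self.nav_depth -= 1
            if self.nav_depth == 0:
                self.done = True  # ignore any later nav elements


sample_nav = (
    '<nav><ol><li><a href="c1.html">One</a></li>'
    '<li><a href="c2.html#s1">Two</a></li></ol></nav>'
    '<a href="other.html">Not part of the nav</a>'
)
parser = NavOrderParser("https://example.org/moby-dick/")
parser.feed(sample_nav)
print(parser.reading_order)
```

Even this simplified version needs an extra fetch, an HTML parser, and a guess about which document holds the nav, none of which is needed when the reading order sits in the manifest.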

This is what's making our lifecycle more complex than it should be. Any additional splitting of publication wide info into multiple HTML/JSON documents will only make the situation even worse.

Burying this kind of information where only user agents can get it always makes me uncomfortable. What if I discover the web publication in a vanilla browser and just want to download the packaged web pub?

Burying this kind of information in another document that won't be fetched is not a good option either.

Ideally you want to include this info:

  • as a machine readable link, in a document that the UA MUST fetch (manifest)
  • as a user facing link in a document that non-WP aware browsers will display (entry page + additional resources in the publication)

(This also feels like it would open the door to multiple renditions, even if it's not specifically what you're proposing.)

I'm opening the door to linking in general, and this is not limited to alternate formats or renditions. I've recently added #162 and #163 and they're both related to linking as well.

@HadrienGardeur
Author

Since we're considering using the WAM, the following text is also relevant for this discussion:

The document format defined in this specification provides a unified means of encapsulating metadata about a Web application in a way that we hope will avoid existing pitfalls with both proprietary and [HTML]'s meta/link tags. Those pitfalls include:

  • Developers have to duplicate the icons and application name in each page of a web site, leading to significant redundancy across pages. This is compounded if that information never gets used by the user agent (e.g., the user never bookmarks the web application).
  • Spreading metadata across multiple documents can cause data to fall out of sync.
  • If the metadata for a web application lives in a HTML document, that significantly increases the cost to user agents (and users) of checking for updates to the metadata of a site. Since the HTML file is likely to change often, it means that a user agent will often have to download the whole HTML file in order to check if any of the relevant meta tags have changed. If this resource contains inlined resources like JavaScript, images, or stylesheets, this could be a non-trivial download.

@HadrienGardeur
Author

This is now covered by our manifest using links and alternate from the IANA link registry:

"links": [
  {
    "url": "publication.epub",
    "rel": "alternate",
    "encodingFormat": "application/epub+zip"
  }
]
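
For illustration only, a UA-side lookup over that structure could be sketched in Python as follows. The function name alternate_links and the example.org URLs are hypothetical, the fragment above is wrapped in braces to make it a complete JSON object, and accepting rel as either a string or an array is an assumption rather than something stated in this thread:

```python
import json
from urllib.parse import urljoin


def alternate_links(manifest_text, manifest_url, media_type=None):
    """Return absolute URLs of the manifest links whose rel is "alternate".

    When media_type is given, only links with that encodingFormat are kept.
    """
    manifest = json.loads(manifest_text)
    results = []
    for link in manifest.get("links", []):
        rel = link.get("rel", [])
        # Assumption: rel may be a single string or a list of strings.
        rels = [rel] if isinstance(rel, str) else list(rel)
        if "alternate" not in rels:
            continue
        if media_type is not None and link.get("encodingFormat") != media_type:
            continue
        # Relative URLs are resolved against the manifest's own URL.
        results.append(urljoin(manifest_url, link["url"]))
    return results


manifest_text = """
{
  "links": [
    {
      "url": "publication.epub",
      "rel": "alternate",
      "encodingFormat": "application/epub+zip"
    }
  ]
}
"""
print(alternate_links(
    manifest_text,
    "https://example.org/moby-dick/manifest.json",
    media_type="application/epub+zip",
))
```

Because the manifest is the one document a WP-aware UA has to fetch anyway, this lookup costs no additional request.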
