How do we identify a web publication and its components? #10
I think we'll need to point to the "manifest": we'll need to be able to download or package the entire publication and its constituent resources (given Brady's correct observation that, with scripting, scanning the markup can't reliably determine what's really referenced). We also need to know which markup file should be displayed first and how to progress from there. If your URL is a "directory root," one could say everything under it is inherently part of the publication (which could [maybe] resolve the markup-scanning issue), but one will still need to find the manifest to know where to start rendering and the reading order thereafter.
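A minimal sketch of the "where to start and how to progress" part, assuming the manifest has already been fetched and reduced to an ordered list of content-document URLs. The `Publication` class and its methods are hypothetical names for illustration, not part of any spec.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Publication:
    """Hypothetical in-memory view of a manifest: the ordered list of
    content-document URLs (the reading order)."""
    reading_order: list[str]

    def start(self) -> str:
        # Rendering begins at the first entry of the reading order.
        return self.reading_order[0]

    def next_after(self, current: str) -> Optional[str]:
        # Progress follows manifest order; None signals the end of the
        # publication. Raises ValueError if `current` is not listed.
        i = self.reading_order.index(current)
        if i + 1 < len(self.reading_order):
            return self.reading_order[i + 1]
        return None
```

Note that none of this requires scanning markup: the order comes entirely from the manifest, which is the point being made above.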
This reminds me of a concern about progressive enhancement. Say I point my browser at One option would be to give the first document you want displayed a special name, say,
The case usually given for complexity is open textbooks and course packs, where content is aggregated from different locations without having to actually amass the resources under a single domain/directory. Does "everything" here refer only to HTML pages? How realistic is it that all the resources will be neatly stored together? What if my CSS is two levels up from the publication, in a common folder? What if I'm pulling in CSS or scripts from another domain? I'm all for simplification, don't get me wrong, but I'm not optimistic about a model that requires the user agent to traverse and parse all the documents to figure out what is in scope and needed, if that's where this is leading.
Isn't this where we've considered using link/rel to establish the "belonging"? (And another case of why cross-domain publications get complicated quickly, since their parentage can only be established by starting at an author-controlled location, which then has to be maintained despite what the linked resources might indicate.)
Do we need to design something that will support content documents ("spine items" in EPUB-speak) hosted on multiple origins?
So the URL of the WP would point to the “manifest” rather than a directory? That would then imply (I believe) that the manifest must be discoverable from some sort of file. So what sort of file? I would argue that pointing to HTML would be better than the alternatives, given that all user agents know what to do with HTML files. But that leaves open the question of whether this HTML file contains the manifest, or just points to it.
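If the publication URL resolves to an HTML page that merely points to the manifest, a user agent could discover it roughly as follows. The `rel` value `publication` is a placeholder assumption (the thread hasn't settled on one), and an HTML page that embeds the manifest inline would need a different lookup.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin

# Hypothetical rel value; not an agreed-upon link type.
MANIFEST_REL = "publication"


class ManifestLinkFinder(HTMLParser):
    """Records the href of the first <link> whose rel includes MANIFEST_REL."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attributes = dict(attrs)
        # rel is a space-separated token list; compare tokens case-insensitively.
        rels = (attributes.get("rel") or "").lower().split()
        if MANIFEST_REL in rels and attributes.get("href"):
            self.href = attributes["href"]


def find_manifest_url(html: str, page_url: str) -> Optional[str]:
    finder = ManifestLinkFinder()
    finder.feed(html)
    finder.close()
    if finder.href is None:
        return None
    # A relative href is resolved against the page's own URL.
    return urljoin(page_url, finder.href)
```

A browser can simply render the HTML page, while a reading system follows the link to the manifest, which fits the split between the two kinds of user agent discussed in this thread.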
We need to consider it, at least. Intertwined with what I mentioned above is the problem of iframes and bringing in entire chunks of content below the level of the spine. We need to be open to how the web works and not just publications as we're used to making them. The problem doesn't seem confined to content documents but affects their constituent resources, as well, so we need some solution. Taking a publication offline is less of a problem than what happens to references in a packaged web pub. So while we can ignore the problem at this level, we probably do so at our own peril later. Or maybe we add rules farther down the chain that limit what a packaged web pub can reference? (That's kind of a nasty gotcha I'd hate to discover, though.)
"I would argue that pointing to HTML would be better than the alternatives, given all user agents know what to do with HTML files. But that leaves open the question of whether this HTML file contains the manifest, or just points to the manifest." -- interesting. As long as the manifest was discoverable in a known location, I guess that would be okay -- I think a browser might be interested in a first HTML page, whereas a Reading System would want to start with the manifest. "Do we need to design something that will support content documents ("spine items" in EPUB-speak) hosted on multiple origins?" -- I would think "no".
This is already how the Web works. We routinely use URLs to a directory, and it is up to the server setup to decide what this means in practice. It can return the Bottom line: I believe your first statement, whereby
I think the
The scope notion would play nicely with the proposed packaging spec which IIRC relies on it quite a bit. Outlining how identification for web publications would work if it followed the expectations set by the rest of the web stack (e.g. web app manifests, atom/rss feeds, etc.):
This is the basic pattern used by feeds, web app manifests, service workers, etc.: component files link to a central document with metadata, an indication of scope, a link to self, and an identifying URL. Even AMP uses a variation of this theme. And, as I mentioned above, sometimes the identifying URL and scope definitions are interrelated. E.g., Atom feeds link to the URL whose updates they list (explicit id, implicit scope). This pattern gives us discovery (direct links to chapters let you discover the publication ID, its metadata, and all related assets) as well as a single source of truth for the publication ID, publication-level metadata, and publication assets (the manifest). And this guarantees that the publication id is itself a URL to a human-readable HTML resource that in turn lets you discover the manifest. Of course, this is just going from what you'd expect if you were coming at this from the web development community. I realise that they aren't the only constituency at play here. And this does not necessarily dictate anything about the format of the manifest. Although, if we're going by the principle of least surprise, most web developers would at least expect a JSON file.
On service workers
Service workers achieve this process programmatically, but the pattern is very similar overall. That said, a lot of service worker behaviour, by necessity, violates common developer expectations.
Basically, even though service workers are awesome, they do also have a deserved reputation for being confusing (this is only scratching the surface) so anything we can do to avoid that complexity is a win. That means not letting the publication manifest claim scope over cross-domain resources and not letting it control requests in any way. (Apologies for the brain dump. I didn't have time to edit this down to a concise note 😊)
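The central-manifest pattern above (an identifying URL, a scope, a self link), combined with the rule that the manifest may not claim scope over cross-domain resources, could be sketched like this. The JSON shape and field names (`id`, `scope`, `links`) are assumptions for illustration, not Readium-2's format or any agreed spec; the implicit-scope fallback is modelled loosely on how web app manifests derive a default scope.

```python
import json
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urljoin, urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


@dataclass
class ManifestInfo:
    id: str        # identifying URL (meant to be a human-readable HTML page)
    scope: str     # URL prefix the publication claims
    self_url: str  # canonical location of the manifest itself


def origin(url: str) -> tuple[str, str, Optional[int]]:
    # Scheme, host and port: the browser notion of an origin.
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname or "", parts.port or DEFAULT_PORTS.get(parts.scheme))


def parse_manifest(text: str, fetched_from: str) -> ManifestInfo:
    """Parse a hypothetical manifest shaped like
    {"id": ..., "scope": ..., "links": [{"rel": "self", "href": ...}]}."""
    data = json.loads(text)
    # Relative URLs are resolved against where the manifest was fetched from.
    pub_id = urljoin(fetched_from, data["id"])
    if "scope" in data:
        scope = urljoin(fetched_from, data["scope"])
    else:
        # Implicit scope: the id's "directory" (urljoin with "." drops the file name).
        scope = urljoin(pub_id, ".")
    self_url = fetched_from
    for link in data.get("links", []):
        if "self" in link.get("rel", "").split():
            self_url = urljoin(fetched_from, link["href"])
            break
    # The manifest may not claim scope over another origin.
    if origin(scope) != origin(fetched_from):
        raise ValueError(f"cross-origin scope rejected: {scope}")
    return ManifestInfo(id=pub_id, scope=scope, self_url=self_url)
```

Nothing here intercepts or controls requests; the manifest is only read, which keeps it clear of the service-worker complexity described above.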
What you're describing is almost exactly what we do in Readium-2, @baldurbjarnason; there are only a few minor differences or observations that I need to add.
Ideally yes, but what if a resource is included in multiple Web Publications? What if you can't change the HTML or HTTP headers for that resource? IMO, such a link to a publication is an important part of how discovery is handled, but it's not an absolute requirement.
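When a resource belongs to several Web Publications and carries no back-link of its own, a reading system could build the resource-to-publication mapping from the manifests it already knows about. A minimal sketch; the input shape (each manifest URL mapped to the resource URLs it lists) and the function name are assumptions.

```python
from collections import defaultdict


def publications_by_resource(manifests: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each resource URL to the set of manifest URLs that list it.

    `manifests` maps a manifest URL to the resource URLs that manifest lists.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for manifest_url, resources in manifests.items():
        for resource in resources:
            index[resource].add(manifest_url)
    return dict(index)
```

For example, a chapter listed both by its original book and by a course pack on another domain ends up mapped to both manifests, without the chapter itself having to be edited.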
In Readium-2 we list all resources under two separate collections: This has some clear benefits over a simple
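Keeping the reading order in its own collection, separate from the other resources, still makes it easy to produce the full list of what to download or package. A sketch, assuming both collections have already been reduced to lists of absolute URLs; the parameter names are illustrative, not the actual Readium-2 collection names.

```python
def resources_to_fetch(reading_order: list[str], resources: list[str]) -> list[str]:
    """Everything needed to take the publication offline: reading-order
    documents first, in order, then the remaining resources, de-duplicated."""
    # dicts preserve insertion order (Python 3.7+), so fromkeys removes
    # duplicates while keeping each URL's first position.
    return list(dict.fromkeys([*reading_order, *resources]))
```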
That's one of our only requirements. In Readium-2 we always provide a link that points back to the manifest. The other two requirements are:
That's pretty much the only difference between what you're describing and Readium-2/Readium Web Publication Manifest. The "root URL" (a link with One reason for that is tied to the fact that we'd like anyone to create a Web Publication by remixing content already available on the Web.
On Service Workers
I really don't think that Service Workers should in any way influence our design for Web Publications. There are many different ways that content can be cached, and Service Workers are only one method among others. Let's keep our options open and let people use all the possibilities offered.
So, to come back to the initial question, Readium-2 folks propose:
This issue was moved to w3c/wpub#5
Perhaps the simplest possible answer to these questions is just a URL:
https://www.example.com/MobyDick/
would both identify the publication and mean that everything whose URL starts with this is part of the publication. So I guess that I’m looking for reasons to make this more complicated :)
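The prefix rule is easy to state, but two edges are worth pinning down: comparing whole path segments (so `/MobyDick` does not swallow `/MobyDickReviews/`) and comparing scheme and host rather than raw strings. A minimal sketch under those assumptions; `in_publication` is a hypothetical helper that expects absolute URLs.

```python
from urllib.parse import urlsplit


def in_publication(publication_url: str, resource_url: str) -> bool:
    """True if resource_url falls under publication_url, treating the
    publication URL as a directory: same scheme and host, and a path
    that starts at a path-segment boundary."""
    pub = urlsplit(publication_url)
    res = urlsplit(resource_url)
    # Host names are case-insensitive; netloc also carries any explicit port.
    if (pub.scheme, pub.netloc.lower()) != (res.scheme, res.netloc.lower()):
        return False
    # Add a trailing slash so "/MobyDick" matches "/MobyDick/..." but not
    # "/MobyDickReviews/...".
    prefix = pub.path if pub.path.endswith("/") else pub.path + "/"
    return res.path.startswith(prefix)
```

It also makes the earlier objections concrete: a shared stylesheet two levels up (a chapter at `https://www.example.com/MobyDick/text/ch1.html` referencing `../../css/common.css` resolves to `https://www.example.com/css/common.css`) or a script from another domain falls outside the publication under this rule.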