Using the Readium Web Publication Manifest serialization #119
Comments
I don't need to add that EDRLab 100% supports this idea. |
We'll be interested in hearing more about this adventure -- I suspect this will be on the agenda for our first meeting of the new year (on the 8th). |
If we ignore the current requirements for the Readium Web Publication Manifest, here's what it would look like with the minimal infoset for WP (no title, implicit reading order):
With a title, a canonical link to the manifest and an explicit reading order, this becomes a valid Readium Web Publication Manifest as well:
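The two manifests referred to above did not survive in this copy of the thread. As a rough sketch only (not the original examples), the Python below builds both shapes as plain dictionaries. The key names `metadata`, `links`, `readingOrder` and `resources` and the media type `application/webpub+json` are assumptions taken from later RWPM drafts (earlier drafts used `spine` rather than `readingOrder`); the paths, the title and the `looks_like_rwpm` check are purely illustrative.

```python
import json

# Hypothetical resources; these paths are placeholders, not from the thread.
CHAPTERS = [
    {"href": "chapter1.html", "type": "text/html"},
    {"href": "chapter2.html", "type": "text/html"},
]


def minimal_manifest():
    """Minimal shape: no title, no explicit reading order, only resources."""
    return {"resources": list(CHAPTERS)}


def full_manifest(title, self_url):
    """Adds a title, a canonical 'self' link and an explicit reading order."""
    return {
        "metadata": {"title": title},
        "links": [
            {"rel": "self", "href": self_url, "type": "application/webpub+json"},
        ],
        "readingOrder": list(CHAPTERS),
    }


def looks_like_rwpm(manifest):
    """Illustrative check only: title, self link and non-empty reading order."""
    has_title = bool(manifest.get("metadata", {}).get("title"))
    has_self = any(link.get("rel") == "self" for link in manifest.get("links", []))
    has_order = bool(manifest.get("readingOrder"))
    return has_title and has_self and has_order


if __name__ == "__main__":
    print(json.dumps(full_manifest("Example", "https://example.org/manifest.json"), indent=2))
```

Under these assumptions the minimal shape fails the check while the second one passes, which is the distinction the comment draws.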
|
Hi all, Hadrien asked me to comment. Last year I worked on a web-based offline reading app at NYPL using the Web Publication Manifest. It was used for NYPL's Subway Library project. (I'm no longer at NYPL.) The code is here: https://github.com/NYPL-Simplified/webpub-viewer. It can use either Service Worker or Application Cache to work offline. Subway Library used App Cache so that's more reliable. The service worker implementation has some bugs that I never got around to. I have a demo at https://opds-browser-demo.herokuapp.com/ that links to the reader (using a service worker) if you point it at an OPDS feed with open-access EPUBs - here it is with the Standard Ebooks feed. If you click the "Read Online" button for one of the books it will open the reader. |
Before we started working on Readium-2, I also built a prototype based on the same manifest syntax that became the Readium Web Publication Manifest: https://hadriengardeur.github.io/webpub-manifest/examples/viewer/ This prototype is based on an iframe and uses a Service Worker to cache all resources listed in the default reading order and the resource list, then serve them with a "network then cache" policy. There's also a similar prototype for comics based on |
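For readers unfamiliar with the approach, here is a minimal sketch of the caching logic described in that prototype, written in Python rather than as an actual Service Worker. It assumes "network then cache" means "try the network first, fall back to the cached copy"; the key names `readingOrder` and `resources`, the `fetch` callable and the dictionary used as a cache are all stand-ins, not the prototype's real code.

```python
from urllib.parse import urljoin


def urls_to_cache(manifest, manifest_url):
    """Absolute URLs of every link in the reading order and the resource list."""
    urls = []
    for key in ("readingOrder", "resources"):  # assumed key names
        for link in manifest.get(key, []):
            # Relative hrefs are resolved against the manifest's own URL.
            url = urljoin(manifest_url, link["href"])
            if url not in urls:
                urls.append(url)
    return urls


def network_then_cache(url, fetch, cache):
    """Try the network first; fall back to the cached copy if the fetch fails."""
    try:
        body = fetch(url)
    except OSError:
        if url in cache:
            return cache[url]
        raise
    cache[url] = body  # refresh the cached copy after every successful fetch
    return body
```

In a browser the same two steps would map onto the Service Worker `install` and `fetch` events, with the Cache API in place of the dictionary.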
The New York Public Library, in partnership with New York State and the MTA (i.e., the subway), used this same architecture for its web reader in a project called Subway Library. We have since moved the application to a new domain, https://www.simplye.net/, where you can read online. Because we cache the resources in the browser to cope with loss of connectivity (reading in a subway), it works as a general-purpose web reader even in limited-connectivity environments such as mobile phone browsers. It is bare-bones as a reader, but we were able to satisfy accessibility requirements as well. |
Quick note: the link above is production, so the content requires us to geoblock by IP. |
FYI: there is a NodeJS (TypeScript / JavaScript) implementation of Readium2 "streamer", hosted here: For example, scroll down this page to visualize Readium2's "webpub manifest" for the Children's Literature sample publication: Here's an EPUB.js prototype (reader interface) that links remotely to the raw JSON manifest URL (thanks to adequate HTTP CORS headers): Here's a NYPL web reader link (for tech-demo purposes only, this is probably out of date compared with their latest implementation): The "streamer" itself serves HTTP headers that ensure resources are efficiently prefetched and cached. The "reader" apps can use Service Worker, App Cache, etc. to improve the user experience further. This NodeJS Readium2 "streamer" can also be used as an OPDS playground, for instance to visualize the conversion from OPDS v1 to v2. The NodeJS Readium2 "streamer" can load EPUBs remotely, let's take for example The Adventures of Sherlock Holmes from Feedbooks: This next sample EPUB is fetched directly from a GitHub repository, or alternatively via the RawGit CDN (which tends to serve useful HTTP headers): |
@danielweck, @jce1028, @aslagle, thanks for all these great examples. I have some clarification questions.
Thanks again! It is great to see all these examples... |
What you're describing is discovery and IMO all of the examples above could be easily extended that way.
You can test that demo at https://hadriengardeur.github.io/webpub-manifest/examples/progressive-enhancements/index.html |
Just to be clear: although some of the cited examples involve Readium2 as backend runtime / HTTP API (i.e. live instances of "parser" and "streamer" modules that dynamically extract and serve publication resources), it is also possible to statically host the "webpub manifest" (JSON file) and all its linked content. In fact, I believe there is an NYPL implementation which does not deploy server-side Readium2 instances. Instead, the collection of publications is prepared ahead of time as a hierarchy of folders and files (exploded EPUB archives, organized in directories on the server's filesystem). A Service Worker and App Cache configuration is created based on the information provided by the JSON manifest of each publication (which is itself a translation from the original EPUB's OPF package definition, or a conversion from other publication formats). And of course an OPDS feed is generated to enable discovery, search, etc. So, going back to your original statement "all the URLs for a specific publication is handled via a Web API to a program residing somewhere in the cloud": I just want to eliminate the potential interpretation that Readium2's "webpub manifest" is intrinsically linked to Readium2's stack of software modules ("streamer", etc.), or a particular API. At its most basic level, the "webpub manifest" is totally agnostic to these aspects; it just consists of collections of links and associated metadata. |
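To make the static-hosting case concrete, here is a small sketch of how an App Cache manifest could be derived from a publication's JSON manifest at build time. This is not NYPL's actual tooling; the key names `readingOrder` and `resources`, the function name and the version string are assumptions, and the output assumes the App Cache file is served from the same directory as the JSON manifest so that relative hrefs still resolve.

```python
def appcache_manifest(manifest, version):
    """Render an App Cache manifest listing every resource of a publication."""
    lines = ["CACHE MANIFEST", f"# version {version}", "", "CACHE:"]
    seen = set()
    for key in ("readingOrder", "resources"):  # assumed key names
        for link in manifest.get(key, []):
            href = link["href"]
            if href not in seen:  # list each resource only once
                seen.add(href)
                lines.append(href)
    # Anything not listed above is fetched from the network as usual.
    lines += ["", "NETWORK:", "*"]
    return "\n".join(lines) + "\n"
```

Changing the version comment is the usual way to make browsers re-download the cached resources after a publication is updated.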
@HadrienGardeur thanks for that example. Again, it looks great. To continue my quest of trying to understand:-): the current deployment is such that it includes the reference to (Please, do not take this as a criticism, what you did is great. I just want to understand where our limits are. This comes back to the issue of what we have to specify and how...) Another question: I see there is a reference to a WAM in the code. What is that used for (if anything)? |
That is indeed how the EPUB.js and NYPL demo links work (see my previous-previous post). The EPUB.js example demonstrates remote publication / reader-app (HTTP CORS), the NYPL example demonstrates same-origin publication / reader-app. In both cases, the full (absolute) URL to the JSON "webpub manifest" is passed as a querystring parameter in the reader-app URL, but this could of course be handled differently. |
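As an illustration of the query-string approach, the sketch below builds a reader-app URL that carries the absolute manifest URL, and reads it back on the other side. The parameter name `url` and both example hosts are hypothetical; the actual demos may use a different parameter.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def reader_url(reader_app, manifest_url, param="url"):
    """Reader-app URL with the manifest URL percent-encoded in the query string."""
    return f"{reader_app}?{urlencode({param: manifest_url})}"


def manifest_from_reader_url(url, param="url"):
    """Recover the manifest URL from a reader-app URL, or None if absent."""
    values = parse_qs(urlsplit(url).query).get(param)
    return values[0] if values else None


link = reader_url(
    "https://reader.example/index.html",
    "https://pub.example/book/manifest.json",
)
print(link)
print(manifest_from_reader_url(link))
```

When the reader app and the publication live on different origins, as in the EPUB.js example, the manifest and its resources also need to be served with suitable CORS headers for this to work.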
Yeah, a number of those examples work the way you want @iherman but there are indeed potential issues with CORS:
|
My organization, EvidentPoint, plans to adopt usage of this publication manifest and architecture in the development of our readers. As a contributor to Readium, we have a goal to add this support in any RS based on ReadiumJS (pre Readium 2). We don't have much to show yet but we are actively pushing for this in the roadmap for the Readium project. |
The next version of EPUB.js uses the manifest throughout the application, and it has made switching to web workers much easier. It will accept a manifest as input, or unzip and parse an archived EPUB in a web worker. The worker returns a manifest with the cached links and, on display, a service worker fetches those links for the iframe src.
|
Thanks for the additional data, @fchasen, @HadrienGardeur, @danielweck, @jccr. Just to have some equilibrium point in this, here are some of my (obviously personal) conclusions, if my understanding is correct:
I think these are all really positive points and play in favor of going down the RWPM line (to be decided by the WG, of course). However, I wonder whether we can go one step further because, after all, our goal is to clearly state the differences vis-à-vis the WAM. What would be analogous to the “Manifest Lifecycle” of the WAM (or can we simply reuse the same life cycle, either by reference or as a copy in our spec)? And that probably touches upon some sort of a “reader life cycle” that should be defined, e.g., in terms of a “browsing context” (see also Issue 104). I have the impression that having something in the document on this (or these), at least as a sketch, would make the relation to a WAM much clearer and would help in making a decision. And, at some point, we MUST come up with these, so this may be the right time... Do you guys believe that we are in a position to come up with a first sketch? (P.S. I could imagine that some aspects of these life cycles and contexts may be slightly different depending on whether the RS is part of a browser or a stand-alone old-skool:-) reader application. Although not ideal, we may have to go down that line, too...) |
A manifest lifecycle doesn't make a whole lot of sense purely in the context of Readium, but this could be achieved for the RWPM in the context of the WP infoset. There are a few things that are specific to the WP infoset and would impact such a lifecycle:
(I personally really dislike the fact that the default reading order is so loosely defined in the WP infoset compared to Readium; we'd never do something like that in the context of Readium-2, for instance.) Compared to the WAM, obtaining the manifest would be almost the same, while processing the manifest would be different (fewer items to process, but different ones) and we'd use a different model as well (vs the WebAppManifest dictionary). That said, a large portion of this processing is independent from the serialization, for instance:
It might be something worth doing first within this WG: no matter which manifest format we end up with, we'll need to write that part of the spec anyway (since there's very little that we can re-use from WAM as well). |
Let us separate the term RWPM (i.e., the Readium one) and a future WPM (™ :-) i.e., Web Publication Manifest. The two may become 99% identical, or may be different, depending on the decision of the WG. But such a lifecycle does make sense in the context of the WP infoset, and we can just as well start with the RWPM to have a clear idea.
Going through the life cycle exercise would give us the possibility to get all these details fleshed out and discussed. Nothing is cast in concrete after all...
I probably repeat myself, but having a clear idea of where the differences are is important, and it should be clearly written down. That gives us a rational basis for a final decision; as I said, I do not believe the choice between WAM and RWPM should be made on the basis of syntax, but much more on these differences related to the processing requirements of the WAM. |
OK, so I think we're mostly saying the same thing.
The infoset for WP is different from the WAM, which is why both serialization and the manifest lifecycle will be different as well (even if we end up using the WAM, most of the elements will be different).
Who should be working on this draft? The Readium community? This WG? Both together? I've never worked with WebIDL but I know that @jccr for instance recently committed a port of the Swift/Kotlin/Go/TypeScript R2 shared models to it: https://github.com/readium/readium-interfaces/blob/master/src/Streamer.idl#L42 |
We will see where and how this goes, but I am not sure WebIDL is necessary for the life cycle part. I guess a 'procedure in text' of some form (like in the WAM) might work better. I know that WAM uses WebIDL for, essentially, the formal definition of the various JSON terms (see their definition of terms). I do not know whether this is the best way of defining the terms (I must admit I have not given much thought to it yet). |
I started a new document for the lifecycle at: https://github.com/readium/readium-2/blob/master/misc/W3C/lifecycle.md This is based on the WAM lifecycle but the WebIDL and processing steps are going to be much simpler based on our current infoset. |
(Admin comment) It may be worth opening a separate issue on the lifecycle. Whatever the chosen syntax is, we will need something like that... |
This can be closed since we adopted JSON-LD + schema.org. |
In addition to exploring how the Web App Manifest could be extended to support the current infoset, I've also spent some time working on a comparison between the Readium serialization and the infoset from the FPWD:
The Readium manifest is currently used in Readium-2 (JavaScript/TypeScript, Swift, Kotlin, Java and Golang) as well as Readium JS and EPUB.js.
It's also the core building block for OPDS 2.0 and the next-gen parsers for OPDS 1.x/2.0 (JavaScript/TypeScript, Swift and Golang).
IMO, this group should really look into the work that the Readium community has done in the last 18 months. We're already working with Web Publications in every single implementation of Readium-2 (since they all use HTTP + a JSON manifest, even on native platforms), and the manifest is designed to be compatible with the EPUB 2.x/3.x infosets as well.