Should the entry page be optional in the package? #33

Closed
llemeurfr opened this Issue Jan 15, 2019 · 60 comments

9 participants
@llemeurfr
Contributor

llemeurfr commented Jan 15, 2019

The current consensus is that a WP MUST have a Primary Entry Page, i.e. an HTML page.

It has been argued by the Audio TF that audio books don't have HTML pages and that adding a dummy page to a package would be a burden. Note that in this situation (no HTML), a JSON ToC should exist in the manifest or the ToC must be inferred from the track listing (i.e. the reading order)... this is discussed in w3c/wpub#369.

On the other side, I was made aware of a particular situation (Audiolib in France) where a book had a beautiful graphical ToC and the audio publisher integrated an image of this illustration in the audiobook as supplementary content and it would have been great to be able to use it as a real ToC. In such a situation, having an HTML entry page makes great sense.

When a package is exposed as a Web Publication, it is easy for a processor to create on the fly an entry page if there is none in the package. This HTML page will have a link to the manifest (or will embed the manifest) and its content will be created from the metadata found in the manifest. This is a tiny development.
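As a rough illustration of the "tiny development" described above, a processor could derive an entry page from the manifest along these lines. This is only a sketch: the function name `build_entry_page`, the placeholder title, and the default `manifest.jsonld` file name are assumptions, and the `publication` link relation is the one used in the minimal entry page proposed further down this thread.

```python
import html


def build_entry_page(manifest: dict, manifest_href: str = "manifest.jsonld") -> str:
    """Return a minimal HTML entry page derived from manifest metadata.

    Hypothetical helper: only a plain-string "name" is handled here.
    """
    name = manifest.get("name")
    # Fall back to a placeholder when the manifest carries no usable name.
    title = name if isinstance(name, str) and name else "Untitled publication"
    safe_title = html.escape(title)
    safe_href = html.escape(manifest_href, quote=True)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "  <head>\n"
        '    <meta charset="utf-8">\n'
        f"    <title>{safe_title}</title>\n"
        f'    <link rel="publication" href="{safe_href}">\n'
        "  </head>\n"
        "  <body>\n"
        f"    <h1>{safe_title}</h1>\n"
        "  </body>\n"
        "</html>\n"
    )
```

A real processor would presumably also surface authors, cover, and reading order from the manifest; the sketch only shows the title and the link back to the manifest.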

We could therefore choose to have the HTML entry page optional in the package. It would offer simplicity for basic use cases and guarantee great results for advanced use cases.

Note: The other solution is to conclude that a JSON ToC is not an option and that an HTML ToC is imposed on audiobook publishers.

@GarthConboy


GarthConboy commented Jan 15, 2019

I'm somewhat torn here. I view the entry page as unneeded fluff in a packaged audiobook, and thus can see disallowing or discouraging it. On the other hand, if we're going to have an HTML ToC (which I don't object to), there is no reason that couldn't be basically the only content of an entry page.

So, I guess, for me, it comes down to a JSON ToC decision -- if that's the path we go, I'd obviate the entry page (for packaged audiobooks).

@iherman

Member

iherman commented Jan 16, 2019

Regardless of the ToC in JSON issue, I would not want to disallow the HTML page. I think we all agreed that the packaging format we define would not only be for audio books but, possibly, for other profiles, too, like visual narratives. Having an exception for audio books is a bad idea from that standpoint...

@HadrienGardeur


HadrienGardeur commented Jan 16, 2019

I find this discussion quite confusing: the requirement for having an entry page in a packaged publication has nothing to do with the ToC.

As a reminder:

  • the ToC is optional, it's never a requirement
  • it doesn't have to be in the entry page, it can be in a different HTML document linked in resources

Regardless of the ToC in JSON issue, I would not want to disallow the HTML page

There's also a massive difference between not requiring an HTML entry page and disallowing the entry page.

The entry page is completely useless for the packaged audiobook use case, it was mostly added to WP in order to always have a fallback option.

For packaged audiobooks:

  • most of them won't have a URL that returns HTML as their main identifier
  • none of them will be consumed by a "non WP aware" UA, since it'll require the UA to at least know about the packaging format

Based on this, I would highly recommend to:

  • require an external manifest at a well-known location (manifest.jsonld) for packaged audiobooks
  • allow but not require an entry page at a well-known location (index.html)

Forcing something (entry page) for the sake of consistency is IMO a bad approach (EPUB made a very similar mistake by forcing an HTML wrapper for FXL instead of allowing images in spine, which was a foolish decision that the whole ecosystem still has to struggle with today).

@llemeurfr

Contributor Author

llemeurfr commented Jan 16, 2019

I agree with @HadrienGardeur's conclusion, which also has some implications for #34.

@iherman

Member

iherman commented Jan 16, 2019

See #32 (comment).

I would modify this by saying: allow but not require manifest.jsonld or index.html, but require that at least one of the two be available. This should be the case, imho, for the general case.

Whether a particular profile, like audio books, would make further restrictions, that is another matter.

@dauwhe


dauwhe commented Jan 16, 2019

I would modify this by saying: allow but not require manifest.jsonld or index.html, but require that at least one of the two be available. This should be the case, imho, for the general case.

Whether a particular profile, like audio books, would make further restrictions, that is another matter.

Packaging seems to divide neatly into two aspects: how to package, and what to package. Would there be utility in leaving the "what to package" to the profiles, and just use a "packaging" spec to define the restrictions on ZIP?

@iherman

Member

iherman commented Jan 16, 2019

@dauwhe this for me is the last resort (but is doable). I would like to maintain as much unity among profiles as possible; put another way, minimize the compartmentalization of these.

@TzviyaSiegman

Contributor

TzviyaSiegman commented Jan 16, 2019

I think it is worth considering @dauwhe's proposal seriously. Why should a packaging spec go beyond HOW to package? I believe that we are beginning to define details of the HOW based on requirements for specific file types. As we look at different modules, we will likely encounter different requirements. Perhaps restricting the definition of the package to HOW will enable the modules to coexist more peacefully.

@GarthConboy


GarthConboy commented Jan 16, 2019

I can agree with Hadrien's last two bullets above.

I will push back a little on

the requirement for having an entry page in a packaged publication has nothing to do with the ToC

my linkage of the two was driven by the still undecided serialization of the ToC -- if it's only HTML, it might as well be in the entry page (though said entry page wouldn't be required). That's why I'd like to resolve this issue with at least an eye out to where/how the ToC is encoded.

@iherman

Member

iherman commented Jan 23, 2019

This is just making a note. If the decision is that the primary entry page is not necessary, then the canonicalization section must be updated. Indeed, that algorithm makes use of the Document DOM Node of the primary entry page. The algorithm text must be modified to allow that parameter to be undefined.

@iherman

Member

iherman commented Jan 23, 2019

The canonicalization referred to in #33 (comment) ensures the default settings of the 'name' (i.e., the title) and the reading order (consisting of the single entry of the primary entry page); these would need to change in the spec.
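To make the impact concrete, here is a heavily simplified sketch of those two defaults with the entry page made optional. It is not the spec's algorithm (which operates on the Document DOM node); the function name `apply_entry_page_defaults`, its parameters, and the `{"url": ...}` shape of a reading-order entry are assumptions made for illustration.

```python
from typing import Optional


def apply_entry_page_defaults(manifest: dict,
                              pep_title: Optional[str] = None,
                              pep_url: Optional[str] = None) -> dict:
    """Fill in 'name' and 'readingOrder' from the entry page when available.

    Both entry-page parameters may be None, modelling a package
    that ships without a primary entry page.
    """
    canonical = dict(manifest)  # shallow copy; the authored manifest is untouched
    # Default the title from the entry page's <title>, if there is one.
    if "name" not in canonical and pep_title:
        canonical["name"] = pep_title
    # Default the reading order to the entry page itself, if there is one.
    if not canonical.get("readingOrder") and pep_url:
        canonical["readingOrder"] = [{"url": pep_url}]
    return canonical
```

With no entry page, the sketch simply leaves both fields unset; what the spec should require in that case is exactly the open question here.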

@HadrienGardeur


HadrienGardeur commented Jan 23, 2019

I think that in general, if the scope and the name of the spec is changed to "Web Publication Manifest" instead of "Web Publications", most of the text about the entry page should be changed.

The first use of WP will be for audiobooks: they don't need an entry page. Visual narratives (comics, mangas & BDs) won't need an entry page either.

This goes beyond the canonicalization, it also affects the core requirements since a packaged audiobook won't have an address and therefore won't have an entry page at that address either.

It also affects the title:

If not included in the authored manifest, the user agent MUST use the value of the title element [html] of the Web Publication’s primary entry page, if present, when generating the canonical manifest.

@iherman

Member

iherman commented Jan 23, 2019

@HadrienGardeur I disagree. I think we should minimize these changes. The current agreement is that the WPM is, as much as possible, general and not audio book specific only. That is true even if the current rec track work is on audio books only.

@HadrienGardeur


HadrienGardeur commented Jan 23, 2019

Well, if you go down that road, each profile will need to deviate from the core spec. I don't think that's a very good strategy.

The current WPM is not general at this point, it's a relic of how this spec was designed (as "Web Publications" instead of "Web Publication Manifest") when affordances were still considered to be within the scope of the core spec.

@llemeurfr

Contributor Author

llemeurfr commented Jan 29, 2019

To sum up the current state of the discussion:

  • the WG has decided to create a packaging mechanism based on the ISO zip specification, which defines a well-known location for the manifest and the html entry page.

  • such container should be usable for any variant of web publication one decides to package, i.e. audio publications, digital visual narratives or academic papers.

  • html entry pages are perceived as a burden for the distribution of packaged audio publications (same for comics). But they are perceived as necessary once the audio publication is exposed on the Web.

  • an html ToC is rare in the audio publication sector. If it exists it is referenced from the manifest. The possibility to create a json ToC belongs to another issue.

  • problem: the Web Publication specification is (still) about both the json manifest AND the html entry page, therefore an entry page made optional in the package will impact sections like manifest canonicalization (see #issuecomment-456734392 and following).

  • proposed solution: if the entry page is missing in the container, it can be generated on the fly when the content of the container is processed to become a Web Publication (either dynamically using a "streamer" or via a simple batch script). It means that to create a WP from a package, a basic unzip is not enough and a simple pre-processing is required. Let's call it "WP-setter".
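The "simple batch script" variant of this pre-processing step might look roughly like the sketch below. The function name `wp_setter` just echoes the nickname above; the well-known names `index.html` and `manifest.jsonld` and the contents of the generated page are assumptions, not settled spec text.

```python
import zipfile
from pathlib import Path

# Assumed well-known names; the generated page only links to the manifest.
MINIMAL_ENTRY_PAGE = (
    "<!DOCTYPE html>\n<html>\n  <head>\n"
    '    <link rel="publication" href="manifest.jsonld">\n'
    "  </head>\n</html>\n"
)


def wp_setter(package_path, target_dir) -> bool:
    """Unzip a package and add index.html if the package has none.

    Returns True when an entry page had to be generated.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(package_path) as package:
        package.extractall(target)
    entry_page = target / "index.html"
    if entry_page.exists():
        return False  # the publisher shipped an entry page; leave it alone
    entry_page.write_text(MINIMAL_ENTRY_PAGE, encoding="utf-8")
    return True
```

A "streamer" would do the same thing lazily, answering a request for the entry page with generated HTML instead of writing a file.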

@iherman

Member

iherman commented Jan 30, 2019

@llemeurfr,

my proposed solution is a bit different (actually, I am not sure it is very different, maybe just more detailed), taking into account that the lightweight package is for WP in general and not only for audio books:

  • Either the HTML entry page or the separate JSON-LD Manifest MUST be in the package. It is not required to have both, though.
  • The user agent MUST first look for the entry page with the name index.html
    • if found, it MUST follow the rules as described in the WP spec to extract the Web Publication Manifest;
    • otherwise, the user agent looks for the Web Publication Manifest with the name manifest.json to extract the Web Publication Manifest.
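The lookup order in the list above can be sketched as follows. The function name and return values are hypothetical, and the manifest file name is left as a parameter because the thread goes on to settle on `manifest.jsonld` rather than `manifest.json`.

```python
from typing import Iterable, Optional, Tuple


def locate_manifest_source(names: Iterable[str],
                           entry_page_name: str = "index.html",
                           manifest_name: str = "manifest.jsonld"
                           ) -> Optional[Tuple[str, str]]:
    """Decide where a user agent should get the manifest from.

    `names` is the list of file names in the package,
    e.g. zipfile.ZipFile(path).namelist().
    """
    available = set(names)
    # Rule 1: the entry page wins when present; the manifest is then
    # extracted from it following the WP rules.
    if entry_page_name in available:
        return ("entry-page", entry_page_name)
    # Rule 2: otherwise fall back to the separate manifest file.
    if manifest_name in available:
        return ("manifest", manifest_name)
    # Neither is present: the package violates the "at least one" rule.
    return None
```

For example, `locate_manifest_source(["manifest.jsonld", "track01.mp3"])` yields `("manifest", "manifest.jsonld")`, which is the audiobook case with no entry page.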

This means that, e.g., audiobook publishers have the possibility to ignore the entry page when creating the package, whereas a scholarly publisher can use the (more natural) embedded manifest. And anything in between, with mixtures of HTML and other media.

I do realize that this is a bit of an extra load on user agents, because they must be prepared to go the extra mile to parse the HTML entry page and extract the manifest. On the grand scale of things I do not think this is really a big deal, however, compared to the overall job of creating a decent user agent (this is also based on my experimentation, which has proven to be pretty straightforward).

This also allows for your "WP-Setter" approach if a packaged publication is to be turned into a bona fide WP.

@HadrienGardeur


HadrienGardeur commented Jan 30, 2019

@iherman that approach is fine (except that we've been using manifest.jsonld instead of manifest.json, which is more unique to WP).

Specific profiles can then further restrict things, for example the audiobook profile could require the presence of a separate manifest.

That said, while the packaging spec would be lightweight enough to accommodate the needs of dedicated profiles, that's not the case of the WP spec itself.
Including a url in the manifest and responding to that URL with an entry page are still requirements in WP, which is problematic when the entry page is not required in the package.

@iherman

Member

iherman commented Jan 30, 2019

@iherman that approach is fine (except that we've been using manifest.jsonld instead of manifest.json, which is more unique to WP).

Yep, you're right, it should be .jsonld

Including a url in the manifest and responding to that URL with an entry page are still requirements in WP, which is problematic when the entry page is not required in the package.

Sure, if we go down that line, we will have to review the WP spec with this special goggle on...

@llemeurfr

Contributor Author

llemeurfr commented Jan 30, 2019

@iherman I can buy this "entry page OR manifest" in the container. The overhead of treating both cases is already there for WP user agents.

@GarthConboy


GarthConboy commented Jan 30, 2019

I can buy this too. Almost kinda like it. :-)

@iherman

Member

iherman commented Feb 4, 2019

If the final decision is that we MUST have an entry page to make the content directly usable on the Web, here is the minimal PEP that is needed:

<html>
    <head>
        <link rel="publication" href="manifest.jsonld">
    </head>
</html>

By adding this, an (unpacked) package is a bona fide Web Publication (in case the name of the JSON-LD manifest file is fixed for a package).
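For readers wondering what discovery from such a page involves, the sketch below finds the `rel="publication"` link using only Python's standard `html.parser`. The helper names are hypothetical, and the embedded-manifest case is deliberately left out.

```python
from html.parser import HTMLParser
from typing import Optional


class _PublicationLinkFinder(HTMLParser):
    """Remember the href of the first <link rel="publication">."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attributes = dict(attrs)
        # rel is a space-separated token list, so split before comparing.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "publication" in rel_tokens and attributes.get("href"):
            self.href = attributes["href"]


def find_manifest_href(entry_page_html: str) -> Optional[str]:
    """Return the manifest reference found in the entry page, or None."""
    finder = _PublicationLinkFinder()
    finder.feed(entry_page_html)
    finder.close()
    return finder.href
```

The returned value is the raw `href`; a user agent would still have to resolve it against the entry page's own location.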

@geoffjukes


geoffjukes commented Feb 4, 2019

@iherman to my mind the entry page and manifest serve two different purposes; The Manifest describes the package, the entry page describes how to display it.

It seems to me that the manifest is required in both audio-only and 'web publication' packages. The entry page is not.

Therefore, what would be the implications of making the manifest required, and the landing page optional?

If a publisher desires to control the presentation of the audio, they can include a 'landing page resource' in the manifest. The reference could then be to package-local resources OR web-based resources.

For example a distributor may have a customized web-based audio player that is linked by reference, rather than packaged.

I'm not a web developer, so I won't pretend to understand if this is sensible.

@iherman

Member

iherman commented Feb 4, 2019

@geoffjukes

@iherman to my mind the entry page and manifest serve two different purposes; The Manifest describes the package, the entry page describes how to display it.

on a very high level, that can be said (although in the current set up some items, like the title, MAY come from the PEP, i.e., the primary entry page, i.e., index.html).

Therefore, what would be the implications of making the manifest required, and the landing page optional?

That would not work on the Web. A usual user agent can do something with any HTML file; if the only thing it sees is a bare JSON file, it can (possibly) display it as text, but that is as far as it goes. The PEP, as HTML, can serve (beyond the display proper) a bunch of roles: set up the 'origin' for the publication that will be important for security reasons if any kind of JS file is used, create an environment (almost a minor operating system) to run those JS files, etc. HTML has turned into the starting point for just about everything:-)

Hence the PEP is a MUST for the publication on the Web as some sort of a starting point. That also means that if a packaged publication ever wants to be used (unpacked) on the Web, then the PEP must be present.

The other way round... well, that is the issue.

@geoffjukes


geoffjukes commented Feb 4, 2019

@iherman Thanks for the clarification, that certainly makes sense.

In essence, the PEP becomes the 'is web enabled' indicator, and would therefore be required in the end-user package. It would not be present in the business-to-business packages, and would not be required (but there is no need to call out that distinction in the specification).

I would be amenable to generating the minimal PEP when we are "packaging for the web", with the caveat that I would like the manifest be required, as it is the most useful component from a business to business perspective.

@mattgarrish

Member

mattgarrish commented Feb 4, 2019

Perhaps part of the problem is that we're trying to base everything on Web publications instead of a broader concept of a publication that is inherited by Web publications, audiobooks, etc.

A cleaner separation of concerns might lead to a model like:

Publication manifest |--> Web Publication
                     |          ^
                     |          |
                     |          v
                     |--> Audiobook

where all our outputs are considered modules that inherit from a common publication manifest, with perhaps no required metadata and certainly no mention of structure. That way we can introduce different required metadata, other ways of harvesting metadata, different manifest/pep discovery models, etc. etc. on a per-model basis.

There should always be paths between the formats, but it won't always be true that one format is necessarily the other all the time.

I think our last revision to take out the affordances got us halfway to this model, but we'd still need to separate all the Web-specific structures and metadata requirements.

It's not what we set out to define, of course, but maybe is a more pragmatic model?? (I could be out to lunch on this comment, too, as I haven't thought it through all that deeply. :)

@TzviyaSiegman

Contributor

TzviyaSiegman commented Feb 4, 2019

I am having emoji issues today, but i very much agree with @mattgarrish.

@geoffjukes


geoffjukes commented Feb 4, 2019

@mattgarrish I believe this concept is compatible with the work that @dauwhe has been doing (and to a lesser degree myself).

Speaking for myself, everything starts with a manifest. That being said [and assuming a minimum set of requirements such as playback order, audio file name/size/length/checksum, supplemental file name/size/checksum, etc] we should be able to generate a manifest based on the current delivery packages that we receive from publishers, simply by analyzing the files.
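A sketch of what "simply by analyzing the files" could mean in practice is given below. Several things here are assumptions rather than anything the thread or the spec defines: playback order is taken to be the sorted file paths, `size` and `sha256` are hypothetical property names, and audio duration is omitted because it needs an audio parser.

```python
import hashlib
import mimetypes
from pathlib import Path


def _sha256_of(path: Path) -> str:
    """Hash a file in chunks so large audio files are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe_file(path: Path, root: Path) -> dict:
    media_type, _ = mimetypes.guess_type(path.name)
    return {
        "url": path.relative_to(root).as_posix(),
        "encodingFormat": media_type or "application/octet-stream",
        "size": path.stat().st_size,   # hypothetical property name
        "sha256": _sha256_of(path),    # hypothetical property name
    }


def draft_manifest(root_dir, title: str) -> dict:
    """Build a draft manifest from the files found under root_dir."""
    root = Path(root_dir)
    audio, supplemental = [], []
    # Assumption: sorted paths reflect the intended playback order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        entry = describe_file(path, root)
        if entry["encodingFormat"].startswith("audio/"):
            audio.append(entry)
        else:
            supplemental.append(entry)
    return {"name": title, "readingOrder": audio, "resources": supplemental}
```

In a real ingestion pipeline the order and title would more likely come from the publisher's delivery metadata than from file names.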

In one of our more complex products, we package audio, html, and media overlays, to provide a 'read along' experience. We do this with a custom player, but I throw it out there as an example of "There should always be paths between the formats" - as it is more accurately a hybrid of two :)

@mattgarrish

Member

mattgarrish commented Feb 4, 2019

Right, the manifest is the key to every format because otherwise you just have html/audio/etc. It's always been what has bound all the resources together. So long as it is common, there will be various ways of translating content, and not strictly from one to the other as you say.

I think the problem has been that we started by analyzing one module without realizing it was just one module. As we discussed on the call today, I don't think it's imperative that everything derive from web publications, but it's taken moving on to another module, audiobooks, to realize that we've arguably been modeling too much on one possible rendering.

That said, I still think there's going to be some complexity in remodeling what we have. What is a canonical manifest, for example, if data harvesting may not apply to every module? These are eminently solvable problems, of course, but might take some creative thinking to avoid needless duplication. I'm going to see what I can make of splitting out the manifest from wpub, as a first step.

@geoffjukes


geoffjukes commented Feb 4, 2019

@mattgarrish As a pure audio publisher (primarily) I have struggled with the term 'readingOrder' to describe what is more appropriately a 'listeningOrder'. How much separation do you foresee in the modular approach?

Metadata (book title etc) is common (potentially) but 'runtime' would only be relevant to the audio component of an audio-enabled package. So would that live in an 'audio' segment? or would a 'listeningOrder' imply audio, and a 'readingOrder' imply HTML etc? What about the media overlay, would that rely on the existing SMIL? or could we consider a future datapoint 'mediaSync' or something?

Rhetorical questions really.

@iherman

Member

iherman commented Feb 5, 2019

I like what @mattgarrish described in #33 (comment), though I am a bit concerned about the complexity thereof, as well as the possible proliferation of mutually incompatible modules. The original consensus proposal included the fact that a user agent remains compatible with a packaged WPUB without further ado, and the choice of adding (or not) a PEP is exclusively in the hands of the publishers.

I would also note that, as shown in #33 (comment), the minimal PEP to be added to make everything compatible is ridiculously simple, and that index.html file could be used as an indication of the nature of the package. A bit like EPUB's mime type file. If there was an agreement among audio publishers that it is fine to add that to the package, we would be done without major damage...

@geoffjukes


geoffjukes commented Feb 5, 2019

@iherman I am warming to the idea of making the PEP required based on the minimum of linking/referencing a manifest, and I concede to your point on compatibility.

that index.html file could be used as an indication of the nature of the package. A bit like EPUB's mime type file.

I would like to combine this thought with #34 and particularly the reference to @dauwhe comment on embedded manifests.

By requiring an embedded PEP, the publisher is effectively saying “this is a web publication”. They then have the opportunity to define the manifest to use for the ‘Web Publication’ version of the package. This would be extremely useful for adoption, I believe, as the package itself could have multiple manifests for different applications. It would also allow manifests to be hosted external to the package itself.

On the subject of #34 (as it is a companion ticket to this) I would say that the PEP is required inside the package, but the manifest (which is also required) must be accessible via the reference in the PEP.

[edit: I’m having a difficult time keeping track of the myriad conversations, so i apologize if I am making statements that have already been made/assumed.]

@iherman

Member

iherman commented Feb 5, 2019

@iherman I am warming to the idea of making the PEP required based on the minimum of linking/referencing a manifest, and I concede to your point on compatibility.

Wow. This may be a way out of the current deadlock...

On the subject of #34 (as it is a companion ticket to this) I would say that the PEP is required inside the package, but the manifest (which is also required) must be accessible via the reference in the PEP.

That is already the case. This is what makes a WPUB, in fact: the manifest MUST be either referenced or included in the PEP:

The primary entry page is the only resource in which a manifest can be embedded. To ensure discovery of the manifest, the primary entry page MUST provide a link to the manifest, regardless of whether the manifest is embedded within the page or external to it.

At the moment the manifest may either be embedded in the PEP (in JSON-LD) or referenced from it if the manifest is a separate JSON-LD file. #34 is based on the assumption that if the WPUB is packaged, the manifest MUST be a separate file; on the other hand, there is another open issue (w3c/wpub#327) which would require just about the opposite: the manifest may appear within the PEP only. At the moment we do not have consensus on either of these two:-(

@HadrienGardeur


HadrienGardeur commented Feb 5, 2019

If the final decision is that we MUST have an entry page to make the content directly usable on the Web, here is the minimal PEP that is needed:

<html>
   <head>
       <link rel="publication" href="manifest.jsonld">
   </head>
</html>

For me this perfectly illustrates why we shouldn't have the entry page as a requirement in any package:

  • the only requirement that makes sense for the content of the entry page is the link to the manifest (or an embedded manifest), this is better achieved with a well-known location to a manifest (manifest.jsonld) in a package
  • while we've discussed in the past the idea of requiring the entry page to provide an entry point in the publication, I believe that spec-wise this is extremely difficult to have as a requirement (it's too vague), plus a link to an audio resource (in the case of an audiobook) is not enough to go through the entire publication
  • the entry page serves two primary goals: discovering the manifest and as a fallback for non WP-aware UAs, but non WP-aware UAs won't be able to open packaged publications anyway and we need a lot more than a link for a fallback

This minimal entry page would bring zero value to the table, it just makes a packaged audiobook more complex to produce.

This group has widely over-estimated the usefulness of the entry page as it's defined today, and too many of these requirements only make sense in the context of publications that are primarily meant to be distributed on the Web.
For EPUB and audiobooks, these requirements only make things more complicated than they should be, at the cost of simplicity for both authors and UAs.

@HadrienGardeur


HadrienGardeur commented Feb 5, 2019

where all our outputs are considered modules that inherit from a common publication manifest, with perhaps no required metadata and certainly no mention of structure. That way we can introduce different required metadata, other ways of harvesting metadata, different manifest/pep discovery models, etc. etc. on a per-model basis.

I mostly agree with your comment @mattgarrish but with a few tweaks:

  • the current set of required metadata is really minimal, aside from dropping the requirement for url (also called the address) I don't think that it needs to be tweaked
  • discoverability could also remain in the document that defines the manifest, but it wouldn't be a requirement and there wouldn't be any concept of an entry page
  • a separate Web Publications specification would introduce the concept of an entry page, require url in the manifest and require discoverability from the entry page
  • the Web Publication specification would also define a fallback when the default reading order is missing (use the entry page)
  • the Lightweight Package specification would define a container (ZIP based on ISO spec) as well as well-known locations (manifest.jsonld and index.html)
  • the Audiobook specification would define a profile of the Lightweight Package specification, with additional requirements for the manifest (reading order should only reference audio resources) and maybe additional recommendations or requirements for metadata
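
One way to picture the layering in the list above is as a stack of independent checks, each adding its own requirements on top of a shared manifest. The sketch below is an illustration only: the function names are hypothetical, the base manifest layer is left without checks because the thread does not enumerate its required metadata, and reading-order entries are assumed to be objects carrying `encodingFormat`.

```python
def check_web_publication(manifest: dict) -> list:
    """Web Publication layer: the manifest must carry an address (url)."""
    problems = []
    if not manifest.get("url"):
        problems.append("Web Publication: 'url' is required")
    return problems


def check_audiobook(manifest: dict) -> list:
    """Audiobook layer: the reading order may only reference audio."""
    problems = []
    for index, item in enumerate(manifest.get("readingOrder", [])):
        media_type = item.get("encodingFormat", "") if isinstance(item, dict) else ""
        if not media_type.startswith("audio/"):
            problems.append(f"Audiobook: readingOrder[{index}] is not an audio resource")
    return problems


def check_profiles(manifest: dict, *layers) -> list:
    """Run the chosen layers in order and collect every problem found."""
    problems = []
    for layer in layers:
        problems.extend(layer(manifest))
    return problems
```

A packaged audiobook would then be checked with `check_profiles(manifest, check_audiobook)` alone, and only a publication meant for the Web would add `check_web_publication`.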
@mattgarrish

Member

mattgarrish commented Feb 5, 2019

The original consensus proposal included the fact that a user agent remains compatible with a packaged WPUB without further ado

And this would remain true for Web Publications that are packaged.

But what I see is that we're moving off into a different realm, where we have formats that predominantly live only in their packaged form, with only some desire to be able to make them also deployable as web publications. The path from exploded web content back to these specific formats is probably even smaller. This would encompass audiobooks, epubs and no doubt other forms as well.

The PEP makes it easier to explode these packaged formats onto the web, but it's not a critical part of them in their packaged form. That's why the formats should all inherit the same common manifest format, but whether they need the additional trappings to be web publications remains optional.

I would also say that a requirement of any packaged format is that it must retain the ability to be produced as a conformant web publication, not that it must simultaneously be a conformant web publication.

@geoffjukes


geoffjukes commented Feb 5, 2019

@mattgarrish @iherman I think I am confusing the wpub and pwpub projects.

Is the scope of this project to define how to package (i.e. contain in a single redistributable file) a collection of files that conform to the standard laid out in the wpub project?

[edit: w3c/wpub#400 Yep. I am]

@geoffjukes


geoffjukes commented Feb 5, 2019

@iherman @HadrienGardeur Given the edit above, "If the final decision is that we MUST have an entry page" then from the perspective of an audiobook and epub Publisher, and an Audiobook aggregator/distributor, we would have no issue including a reference to a manifest inside the index.html to facilitate manifest discovery.

For publications that already have an index.html, the rel link would be injected.
For publications that do not, we would include a bare PEP.

The name of the rel should be established so as to minimize the risk of a clash inside an existing index.html. Alternatively choose a well-known filename that is unlikely to clash with a file that is part of the publication (or make index.html a reserved filename in the wpub spec)
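The two cases above (inject the link into an existing index.html, or ship a bare PEP) could be handled by a packaging step along these lines. This is a sketch under assumptions: the helper name `ensure_manifest_link` is hypothetical, the link uses the `publication` relation and `manifest.jsonld` name discussed in this thread, and regular expressions are a shortcut that a production tool would replace with a real HTML parser.

```python
import re
from typing import Optional

PUBLICATION_LINK = '<link rel="publication" href="manifest.jsonld">'
BARE_PEP = (
    "<!DOCTYPE html>\n<html>\n  <head>\n"
    f"    {PUBLICATION_LINK}\n"
    "  </head>\n</html>\n"
)

_HAS_LINK = re.compile(r'<link\b[^>]*\brel\s*=\s*["\']?publication\b', re.IGNORECASE)
_HEAD_OPEN = re.compile(r"<head\b[^>]*>", re.IGNORECASE)


def ensure_manifest_link(existing_html: Optional[str] = None) -> str:
    """Return entry-page HTML that is guaranteed to link to the manifest."""
    if existing_html is None:
        return BARE_PEP  # no index.html in the publication: ship a bare PEP
    if _HAS_LINK.search(existing_html):
        return existing_html  # already linked; do not inject a second time
    head = _HEAD_OPEN.search(existing_html)
    if head is None:
        raise ValueError("no <head> element found; cannot inject the link")
    position = head.end()
    return existing_html[:position] + "\n    " + PUBLICATION_LINK + existing_html[position:]
```

Checking for an existing `publication` link before injecting is what keeps the step safe to run more than once on the same package.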

@mattgarrish

Member

mattgarrish commented Feb 5, 2019

Is the scope of this project to define how to package (i.e. contain in a single redistributable file) a collection of files that conform to the standard laid out in the wpub project?

We have a bit of overlapping scopes here, to be honest, as we're also looking at whether audiobook = web pub in terms of what has to be packaged and how to discover it. That might be contributing to the confusion.

In general, though, the packaging defined here should work for web pub, audiobook and whatever else we define.

@geoffjukes


geoffjukes commented Feb 5, 2019

@mattgarrish Thanks for the response.

This may be better suited to the wpub project....

We (Blackstone) have 3 high-level product types in this space.

  • Audiobooks
  • Print/ePub
  • Hybrid of the two (an ePub with audio).

All 3 products have the potential for supplemental material which may need to be embedded (a PDF for example) or linked (a DVD ISO for example)

For us it would make sense to ensure the two can be sensibly co-mingled. We do not produce Manga, or ePubs with Video assets, but I would assume the complexity of such would match our 'hybrid' product.

  • Audio-only products do not dictate how the files are displayed, only what order they should be played in.
  • ePub-only products strictly dictate how the content should be displayed, and what order they should be consumed in.
  • Hybrid products are more complicated. There is no guarantee that 1 audio file will match 1 html file, and often it does not (and I am ignoring the media overlays for a moment).

I know I seem to be flip-flopping here, but it could be argued that a Packaged Web Publication that is missing a PEP, is Audio only, and therefore intended for consumption by an application running on a device, rather than inside a web browser, and as such the manifest should be consumable directly without requiring discovery.

The only utility I can see in a more 'complete' PEP for audio-only, would be if a publisher had specific display requirements for that title. However, I could counter-argue that it would then fall under the loose heading of the 'hybrid' product, and is not strictly 'audio only' (in essence, it's a 1-page ePub that is a playlist).

Apologies if some of this is naive. My background is operations/infrastructure, not application design. I am involved here because I developed the automation systems we use at Blackstone for asset ingestion and packaging.

@mattgarrish

Member

mattgarrish commented Feb 5, 2019

The name of the rel should be established so as to minimize the risk of a clash inside an existing index.html.

If we have to resort to specially named files, it would be good to only have one, and I'd lean to that being the manifest given its general utility.

It feels like we could just add a link relationship to identify the pep in whichever of the reading order or resource list it appears in.

@geoffjukes

geoffjukes commented Feb 5, 2019

@mattgarrish For clarity, is the manifest.jsonld here intended to be the same one as the one defined by the wpub spec, but with non-local resources locations converted to local ones?

@mattgarrish

Member

mattgarrish commented Feb 5, 2019

For clarity, is the manifest.jsonld here intended to be the same one as the one defined by the wpub spec, but with non-local resources locations converted to local ones?

I haven't seen the issue of non-local resources raised yet, since this was born out of audiobooks and their not living on the web first.

It would be a problem that needs solving to package web publications, but perhaps that doesn't get handled before the web packaging specification (i.e., there are limits on what can be packaged by this LPF container).

@geoffjukes

geoffjukes commented Feb 5, 2019

@wareid was kind enough to provide me with clarity.

My opinion here would be that a well-defined manifest, that includes a sequence for media playback, would render a PEP redundant. I would therefore advocate for making the PEP optional but the manifest required in all publications.

From an application perspective, a missing PEP would indicate the publication was 'media only' and render an application-themed media-appropriate playlist based on the sequence.

The presence of a PEP would indicate the publication has specific display requirements, and should be rendered appropriately.

If the consensus is to make the PEP required, I would be OK with including a minimum PEP as described by @iherman
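
As a rough sketch of the application behaviour described above (no PEP means a playlist built from the sequence, a PEP means publisher-defined display), something like the following could work. The function name, the `index.html` default and the shape of `readingOrder` (URL strings or objects with a `url` key) are assumptions for illustration, not anything agreed here.

```python
def choose_presentation(package_files, manifest, pep_name="index.html"):
    """Decide how an application might present a packaged publication.

    package_files: iterable of file names found in the package.
    manifest: parsed manifest as a dict; 'readingOrder' is assumed to hold
              the playback sequence as URL strings or {"url": ...} objects.
    """
    if pep_name in set(package_files):
        # A PEP signals publisher-specific display requirements.
        return {"mode": "pep", "entry": pep_name}

    # No PEP: treat the publication as 'media only' and build a playlist
    # from the sequence given in the manifest.
    playlist = [
        item if isinstance(item, str) else item["url"]
        for item in manifest.get("readingOrder", [])
    ]
    return {"mode": "playlist", "tracks": playlist}
```

The manifest is required in both branches; only the presentation changes with the presence of the PEP.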

@mattgarrish

Member

mattgarrish commented Feb 6, 2019

a missing PEP would indicate the publication was 'media only'

That's true right now, but does it hold for future packaged publications? What if an EPUB 4 also doesn't require a primary entry page?

@geoffjukes

geoffjukes commented Feb 6, 2019

@mattgarrish I’m not sure I follow. If EPUB4 is HTML based, it would have an index. If it is manifest based, it would not need a PEP.

The specifics of how a manifest-based EPUB4 might be interpreted and rendered are not relevant to this ticket :)

I’ll happily edit or remove the paragraph on what the absence of a PEP implies, as I feel the paragraph on what its presence implies is still appropriate.

@mattgarrish

Member

mattgarrish commented Feb 6, 2019

If EPUB4 is HTML based, it would have an index.

The PEP isn't (or hasn't been) about media type, but about deployment on the web. Whether an EPUB is destined for the Web is also a publisher choice, so at this time I can't see why it would be required -- except in the same scenario where we require publishers to produce unnecessary files.

@geoffjukes

geoffjukes commented Feb 6, 2019

@mattgarrish I think I follow here. I'm primarily infrastructure/operations, so some of the subtleties of this debate are lost on me.

The comment by @iherman in w3c/wpub#401 reinforces my feeling that I am working from some invalid assumptions.

Given that, I'm going to defer to the experts on the question of optional/required, and simply say I am a fan of only doing as much work as is necessary.

@mattgarrish

Member

mattgarrish commented Feb 6, 2019

reinforces my feeling that I am working from some invalid assumptions

If it helps, the central crux of this debate is whether the audiobook package has to be a valid web publication if it is unzipped and uploaded onto the web. If you have a primary entry page, then this premise holds true even if the file is seen as useless in non-web scenarios. If you don't have it, then what you upload doesn't provide a discovery page and isn't a web publication unless you pre-process before deploying.

What is being questioned is whether the primary entry page is a benign piece of fluff or a major deterrent. If we outlaw embedded manifests and HTML tables of contents, for example, and require a specific name and location for the manifest, will users just drop in the "minimal" pep and maybe roll their eyes at such an odd requirement, or will it turn them off the format that they have to add files of indeterminate value?

On the flip side, if we don't outlaw embedded manifests and HTML tables of contents, are you interested in having to deal with a format that allows these, or would you just ignore it until something better comes along?

If we don't require the package to be a web publication, then the requirements can be modified to make a more lightweight audio format (e.g., dropping the pep, as this issue is about). Maybe everyone wins, since publishers could still include a primary entry page if they wanted. But we lose the purity/simplicity of always knowing the package is also a web publication.

In all honesty, I'm not sure how we decide the right direction without a lot more input. It might be helpful to step back from issue debates and develop fuller proposals to get community feedback on.

@geoffjukes

geoffjukes commented Feb 6, 2019

@mattgarrish That does help, thank you. To say it back to you - the essence of the debate (of which this ticket is a symptom) is whether or not a "packaged web publication" is a "web publication" at all.

It almost feels like (and maybe this is what your modular manifest alludes to) there should be a "publication specification", and then "web publication" and "packaged publication" inherit from that directly.

@mattgarrish

Member

mattgarrish commented Feb 6, 2019

there should be a "publication specification", and then "web publication" and "packaged publication" inherit from that directly

I agree this debate gets confusing fast, as it also sets up the possibility that there is a "publication container" and that what's inside it isn't always a web publication. That would seem to be the reality if we go this direction of modularizing around a less web-y concept.

@avneeshsingh

avneeshsingh commented Feb 11, 2019

The idea of @mattgarrish, "publication as the highest level, with web publications and audio publications derived from it", is conceptually very good, but the question here is whether it falls within the current charter.
This changes the very fundamental direction from which this group started. Looking at the scope of the charter copied below, the charter revolves around web publications, and other items like PWP and EPUB 4 look like derivatives of web publications.

It is possible that we may have to recharter to move in the direction mentioned by Matt.
If we want to diverge from the WP path to create audio publications, then I think another question comes into focus: do we want audio publications to be web publications?
If we keep the entry page compulsory in audio pub, and allow a minimum entry page that mainly has a link to the manifest (as mentioned by Ivan), then we can keep audio publications aligned to web publications.

I understand that I am raising very sensitive questions for charter. Looking forward to comments of Chairs and Ivan, and group members.

  1. Scope

The WG will specify Web Publications and identify what they need the underlying Web platform to provide. It will build upon existing platform technologies specified by other groups, where available, seeking to fill gaps by assuring that the unique requirements of Web Publications are addressed by features (including optional features) or extension points in those specifications.

In particular, the WG will make normative Recommendations for Web Publications; Packaged Web Publications; EPUB 4; and DPUB-ARIA 2.0, as described in Deliverables below.

Recommendation-track deliverables will contain mechanisms to make Web Publications accessible to a broad range of readers with different needs and capabilities. This includes general Web Content Accessibility Guidelines (WCAG) and Web Accessibility Initiative (WAI) requirements of the W3C as well as requirements for international readers using different scripts and document formats. Profiles of Web Publications may be defined with more stringent accessibility requirements.

@llemeurfr

Contributor Author

llemeurfr commented Feb 11, 2019

@iherman

Member

iherman commented Feb 11, 2019

@avneesh,

it is good to raise the question of charter scope compatibility from time to time but, I believe, we are fine. The way I see it, we do define (even in @mattgarrish's approach):

  • Web publications, which, in @mattgarrish's formulation, would consist of the Web Publication Manifest + a Web Publications "profile".
  • We do not define PWP ourselves, but I think our consensus is that the Lightweight Package format is not specific to audiobooks but, rather, a lightweight packaging format for publications in general that are based on the Web Publication Manifest (the latter being the central piece in the package in any case). We rename this for the known reasons, but we are fine as far as the charter goes.
  • Audiobooks represent a specific profile of... well, certainly a profile based on the Web Publication Manifest. Whether it is a profile of Web Publications per se is the current issue, but I do not think that would invalidate the charter.
@iherman

Member

iherman commented Feb 11, 2019

I presume @mattgarrish's proposal will be discussed later today, but it may be good if I give my reactions in writing in advance.

I guess the main issue I have is what that bidirectional arrow means in

Publication manifest |--> Web Publication
                     |          ^
                     |          |
                     |          v
                     |--> Audiobook

between the Web Publication and the Audiobook. The consensus proposal that was discussed on our last call ensured that the

Web Publication --> Specific book profile, eg, audiobook

direction works (remember that we are not discussing packaging in terms of audiobooks only, hence the deviation in the figure!). I.e., I can take a Web Publication and (provided that the additional metadata entries that may be defined for, e.g., audiobooks are present), zip it, and ship it to an audiobook player. As far as I am concerned, that is an important aspect of our work on Web Publications because zipping means that we have a cheap and easy way of some sort of offlining until some more complex implementations come to the fore. If we cut that approach, I really think we are bifurcating things too much; not necessarily in the formal terms of the charter (see my response to Avneesh) but at least in the spirit thereof.

The other direction, ie,

Specific book profile, eg, audiobook --> Web Publication

is less of a concern to me, because it is easy to add a trivial PEP to complete that process.

Bottom line: I think @mattgarrish's proposal is indeed a good way forward editorially and conceptually, but only if it is accompanied by the consensus proposal. Without it we run the risk of specifying completely independent and mutually incompatible profiles.


A better picture would actually be:

Publication manifest |--> Web Publication
                     |          ^
                     |          |
                     |          v
                     |--> Lightweight Packaging |--> Audiobook
                                                |--> Visual narrative
                                                |--> Scholarly packages

where the approach (again per consensus proposal) is that the PEP is not required for Lightweight Packaging in general, but may be required for some profiles (that would probably be the case for scholarly packages that are very HTML based).

@iherman

Member

iherman commented Feb 12, 2019

This issue was discussed in a meeting.

  • RESOLVED: Restructure the document to reflect the publication structure as primary, with web publications and packaged web publications as modules
  • RESOLVED: WP keeps PEP as a requirement, Lightweight Packaging will give the option of using the PEP or the Manifest (with rules agreed to resolve any possible duplication [start with looking for PEP, and process that first; if not present, look for standalone manifest]), but one must be present in the package.
  • RESOLVED: Laurent will merge the pull request as soon as he can
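
The lookup order in the second resolution (look for the PEP first, fall back to a standalone manifest, fail if neither is present) might be sketched like this for a zip-based package. The names `index.html` and `manifest.json` at the package root are assumptions taken from the discussion, and `find_entry_point` is a hypothetical helper, not part of any spec.

```python
import zipfile

# Assumed "magic" names at the package root; the final names are for the
# specification to decide.
PEP_NAME = "index.html"
MANIFEST_NAME = "manifest.json"


def find_entry_point(package):
    """Return ("pep", name) or ("manifest", name) for a zipped package.

    `package` is a path or a file-like object. The PEP is looked for first;
    the standalone manifest is only a fallback. Raises ValueError when
    neither is present, since one of the two must be in the package.
    """
    with zipfile.ZipFile(package) as zf:
        names = set(zf.namelist())

    if PEP_NAME in names:
        # Processing would then continue from the PEP, following the
        # steps described in the WPUB document.
        return ("pep", PEP_NAME)
    if MANIFEST_NAME in names:
        return ("manifest", MANIFEST_NAME)
    raise ValueError("package contains neither a PEP nor a manifest")
```

Because the PEP always wins when both files are present, a package that ships both has a single, predictable processing path.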
Transcript: PEP in a package
Wendy Reid: #33
Wendy Reid: First topic today is final topic from last week: issue 33 from the Packaged Web Publication repo, about primary entry page becoming optional…
… a quick recap, and clearing up some questions. The main proposal right now is Ivan’s…
… PWPs may give you the option to make index.html or a JSON manifest the primary entry page…
… an alternate proposal was brought up on GitHub, the so-called ‘minimal’ PWP. The index.html only exists to point to the manifest
… this is specific to PWPs – just for the package context, not for web publications as a whole. Does anyone have comments?
Ivan Herman: Matt came with a proposal which I personally consider complementary to the previous proposal, but some people might disagree
… but I would prefer Matt to make the proposal
Matt Garrish: In the discussion around primary entry pages and whether it’s required, we may be overlapping too much with audiobooks and web publications, expecting audiobooks to always be web publications…
… what I proposed was separating the manifest so the primary entry page doesn’t always have to be present with a packaged audiobook/epub… but it can be
… if the publisher wants it to be a conformant PWP, the publisher can include the entry page
… Ivan posted a better clarification of this this morning, which is that the packaged formats are somewhat separate from a WP… the package format doesn’t always have to be a WP
… everything becomes compatible. In its packaged form, the audiobook can be valid. It’s essentially making everything more abstract…
… getting out of the mess of how something remains logically consistent, even if these other formats don’t want all these other WP requirements to be present
Wendy Reid: #33 (comment)
Ivan Herman: I want to re-emphasize: everything that Matt said is obviously true, but one important thing is missing: a reading system MUST be able to understand the packaged version of full web publications…
… must be able to understand the primary entry, find the manifest out of that, so that a WP in the traditional sense, by just zipping it, even if I called the manifest file something else (although the index file must be kept) it’s still valid
Laurent Le Meur: I totally agree with Matt’s point. This is conceptual. Nevertheless, when we describe the spec, it will be very complex to express that conceptual model and at the same time to explain what Ivan said: a reading system must be able to understand everything with a primary entry page
… I propose we follow what Ivan suggested before: stating the primary page OR the manifest: one or the other…
Ivan Herman: +1 to laurent_
Laurent Le Meur: which for processing is easy to understand. It doesn’t mean we don’t have to explain this model, but I’d be careful not to make the concept too complex…
Wendy Reid: We’re back to the original proposal. The resolution would be: a PWP may include either the primary entry page or manifest but must contain one of those two
Ivan Herman: I think the proposal has two equally important parts.
Wendy Reid: So there are two resolutions
Proposed resolution: Restructure the document to reflect the publication structure as primary, with web publications and packaged web publications as modules (Wendy Reid)
Ivan Herman: -> Matt’s description for the proposal: #33 (comment)
dkaplan31: +1
Tzviya Siegman: +1
Ivan Herman: +1
Geoff Jukes: +1
Joshua Pyle: +1
Rachel Comerford: +1
Ric Wright: +1
Franco Alvarado: +1
Mateus Teixeira: +1
Simon Collinson: +1
Ben Schroeter: +1
Luc Audrain: +1
Bill Kasdorf: +1
Gregorio Pellegrino: +1
Wendy Reid: Resolution accepted
Resolution #2: Restructure the document to reflect the publication structure as primary, with web publications and packaged web publications as modules
Proposed resolution: WP keeps PEP as a requirement, PWP will give the option of using the PEP or the Manifest, but one must be present in the package (Wendy Reid)
Ivan Herman: -> Ivan’s consensus proposal: #33 (comment)
dkaplan31: -1
Laurent Le Meur: +1
Ivan Herman: +1
Garth Conboy: How does this fit in with what Laurent just said about or/both?
Wendy Reid: OR/BOTH would work here, but at least one has to be present
Deborah Kaplan: I’m -1 unless it becomes EXACTLY one must be present
… one, but only one, must be present…
… from my experience of working with small producers, people will be confused, which means publications will be wrong, which means reading systems will behave inconsistently…
… creators will have to come up with workarounds due to inconsistent implementation…
… the option of having two will end up with badness.
Ivan Herman: The resolution which I proposed said that the processor MUST look for a primary entry page and if it finds it, MUST process according to the rules. If it doesn’t find it, it looks for a manifest
Deborah Kaplan: As long as it’s 100% clear to creators and reading systems what will happen, that’s fine
Laurent Le Meur: I was very clear about that
dkaplan31: in that case I am changing my vote to a +0 from a -1
Wendy Reid: Would it be better to rephrase the resolution as either or but at least one must be present?
George Kerscher: As someone producing a publication, I’m going to start with my manifest. For an audiobook, I zip that up and distribute it to various places and they process it…
… if I want to add a primary entry page then I could serve it up on the web and all is well. To my mind, I’m progressively enhancing the publication
Ivan Herman: That workflow is correct
Matt Garrish: My question is one of consequences. When we require specific names, it’s going to mean that if you unpackage it on the web, you can only do this with one WP in a directory due to collisions
… are we putting a limitation that we got away from earlier back into play – that you can’t have multiple articles in one directory?
Deborah Kaplan: +0 and not +1 because I still dislike giving people choices, because small creators are confused by choices, while meanwhile large publishers can create a PEP trivially as part of production workflow whether they need one or not. But not -1 as long as clear flow is documented.
Ivan Herman: Matt is right. If we don’t have a name restriction, we have to do something to the package itself to find where to start. This is zip, we aren’t having web packaging, so I’m not sure what else we can do
Matt Garrish: It’s a circular problem: if we don’t have specific names, how do you find what you’re looking for - if you have something else finding the names, how do you prevent those from colliding?…
… trying to prevent the index.html problem from recurring, but I’m not sure how much of an issue it is…
Tzviya Siegman: This makes me uncomfortable too – it’s something we’ve always tried to avoid doing. In the world of scholarly publishing, if I have a journal of 30 articles, each published on their own, each will have this problem…
… I feel like this is going to come back to bite us…
Benjamin Young: My question was similar: if we have specified names for these things and a tree of inheritance where down a certain road you have index.html and down another road you don’t…
… is it possible to make a web audiobook in that world, or do they no longer intermingle?…
Charles LaPierre: Thinking about a journal made up of multiple articles, wouldn’t each article be its own subdirectory, hence no collisions?
Ivan Herman: This whole filing thing reflects that what we’re using is a packaging format that isn’t Web friendly. And we know that, which is why we consider the current format as a lightweight temporary solution…
… I’d be happy if we had today a format which allowed me to refer to a URL for every file, and maybe we’ll have one before I retire. But we’ve agreed to define a lightweight packaging format now, and we have to live with it… we don’t really have a choice
… we could require a specific way of zipping which puts the file first in the zip file, which makes the publication more complicated, because I can’t just take a directory and zip it… this isn’t the solution…
Wendy Reid: We have to find a compromise
Ivan Herman: We have to accept the deficiencies of the system right now
Garth Conboy: I agree with Ivan. We dislike the alternatives more – anything that makes it harder is a no-no…
… there’s a manifest.json and index.html which are both magic names…
… the actual manifest can be standalone or included in the PEP…
… what are the changes that we propose to ensure there is no possible duplication?
Ivan Herman: What I’ve proposed: the first step the reading system does is locate the PEP. If it finds that, then it follows the processing steps that are described in the WPUB document…
… at first look at your own file, otherwise look for a Manifest file and that’s your manifest…
Matt Garrish: We’re making our WP format dependent on the packaging… I can live with this, but what if a better packaging format comes along in future?…
… would we drop these restrictions?
Ivan Herman: If we find a packaging format that allowed that, then yes
Laurent Le Meur: In future, this packaging will be used by publishers as a booster for leaving earth. When the publication is exposed on the Web, pure web packaging becomes important then
Benjamin Young: A general question: are we open to analyzing other formats for web archiving and distribution, the primary component being that they keep URLs around, or continue with zip?
Wendy Reid: If you recall a few weeks ago, we did open up the request for analysis of the different potential formats. They were analyzed based on the pros and cons of that table. If we missed anything in that table, it’s good to know about…
… we made the decision based on the ≈7 formats we looked at in that table
Ivan Herman: Let’s not reopen closed issues. For now we’ve decided to go with what we have, knowing that eventually the committee will produce a webby packaging format
… we explicitly said that if and when that happens, then this working group or its successor will look at it and consider it…
… but we need something today if we want to produce anything before the end of the life of the working group, less than 18 months from now
Proposed resolution: WP keeps PEP as a requirement, PWP will give the option of using the PEP or the Manifest (with rules agreed to resolve any possible duplication [start with looking for PEP, and process that first; if not present look for standalone manifest]), but one must be present in the package. (Garth Conboy)
Ivan Herman: we decided on something and shouldn’t reopen today
Bill Kasdorf: Quick question: if we’re seeing that not all packaged audiobooks are web publications, then we shouldn’t call them packaged web publications, right?
Laurent Le Meur: in fact we don’t.
Bill Kasdorf: Then they aren’t really PWPs, then
Proposed resolution: (Less typos version) WP keeps PEP as a requirement, Lightweight Packaging will give the option of using the PEP or the Manifest (with rules agreed to resolve any possible duplication [start with looking for PEP, and process that first; if not present, look for standalone manifest]), but one must be present in the package. (Garth Conboy)
Ivan Herman: +1
Garth Conboy: +1
Charles LaPierre: +1
Tzviya Siegman: +1
Matt Garrish: +1
Laurent Le Meur: +1
Rachel Comerford: +1
Ben Schroeter: +1
Joshua Pyle: +1
Bill Kasdorf: +1
Tim Cole: +1
Geoff Jukes: +1
Luc Audrain: +1
George Kerscher: +1
Mateus Teixeira: +1
Gregorio Pellegrino: +1
Resolution #3: WP keeps PEP as a requirement, Lightweight Packaging will give the option of using the PEP or the Manifest (with rules agreed to resolve any possible duplication [start with looking for PEP, and process that first; if not present, look for standalone manifest]), but one must be present in the package.
Wendy Reid: We’ve made a decision, with 20min to spare… moving on to our next issue…
Ivan Herman: I propose that we merge the two requests from Laurent whenever he feels comfortable…
… reading that document via the pull request is a pain, and it’s better if we merge it in
Laurent Le Meur: I prepared the merge last week. We can work from that
Wendy Reid: If no opposition, we’ll merge as soon as Laurent is ready
Resolution #4: Laurent will merge the pull request as soon as he can
@iherman

Member

iherman commented Feb 18, 2019

Per the resolution above and the merged #30, closing this issue.

@iherman iherman closed this Feb 18, 2019
