Packaging for audiobooks #352

Closed
HadrienGardeur opened this issue Oct 23, 2018 · 76 comments

Comments

@HadrienGardeur

HadrienGardeur commented Oct 23, 2018

As a follow-up to our discussions at TPAC, I'd like to submit a first proposal for what could become the packaging format for audiobooks:

  • all resources (including the manifest) are packaged together in a ZIP (a lighter take on OCF)
  • the audio resources should not be further compressed in the ZIP
  • the manifest has a well-known location at the root of our package: manifest.jsonld
  • the entry page has a well-known location as well: index.html
  • we drop the requirement for an entry page and its reference in the manifest
  • all resources contained in the package that are not listed under readingOrder in our manifest are considered part of the resource list
  • we define a dedicated media type (TBD) and file extension (TBD as well) to identify such packages, both of them would be specific to audiobooks only
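To make the first few points concrete, here is a minimal Python sketch (standard library only) of how a producer might write such a package. The `build_audiobook_package` helper, the list of audio extensions and the placeholder manifest are hypothetical; only the root `manifest.jsonld` location and the "don't compress audio again" rule come from the proposal above.

```python
import io
import json
import zipfile

# Hypothetical list of already-compressed audio formats; the proposal only
# says audio resources "should not be further compressed".
AUDIO_EXTENSIONS = (".mp3", ".m4a", ".opus", ".ogg", ".aac")

def build_audiobook_package(manifest: dict, resources: dict[str, bytes]) -> bytes:
    """Return the bytes of a ZIP with manifest.jsonld at its root.

    `resources` maps package-relative paths (e.g. "audio/ch01.mp3") to content.
    Audio entries are stored uncompressed; everything else is deflated.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        # Well-known manifest location proposed in this issue.
        package.writestr("manifest.jsonld", json.dumps(manifest, indent=2),
                         compress_type=zipfile.ZIP_DEFLATED)
        for path, data in resources.items():
            is_audio = path.lower().endswith(AUDIO_EXTENSIONS)
            method = zipfile.ZIP_STORED if is_audio else zipfile.ZIP_DEFLATED
            package.writestr(path, data, compress_type=method)
    return buffer.getvalue()
```

Storing audio with `ZIP_STORED` avoids spending effort on data that deflate barely shrinks, and lets a reader pull audio bytes straight out of the archive without inflating them.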
@GarthConboy
Contributor

With quick review, this looks like a very good starting point to me.

@llemeurfr
Contributor

An intriguing part of the proposal is that the entry page is not required for audiobooks. But the ToC and other navigation lists are currently only defined in this entry page.

Where will they be defined, then? In the manifest, as a machine-readable ToC?

@HadrienGardeur
Author

But the ToC and other navigation lists are currently only defined in this entry page.

That's not the case. They're both identified in the manifest and can be included in other resources.

@llemeurfr
Contributor

They're both identified in the manifest and can be included in other resources.

True. We can add therefore a feature of the packaging format for audiobooks:

  • the ToC and other landmarks are included in one or more HTML resources and referenced as resources from the manifest.

@HadrienGardeur
Author

@llemeurfr but that's not specific to audiobooks or packaging them, that's why I don't think it's worth listing.

@HadrienGardeur
Author

I'd like to upload an example but unfortunately most audiobooks are too large for our repo. Any suggestions how we should deal with that issue?

cc @iherman @GarthConboy @wareid

@GarthConboy
Contributor

Back briefly to the TOC question. Yes, the TOC can be in a non-reading-order resource and referenced from the manifest -- this should work fine. However, it seems we may want some way to identify said resource as ONLY the TOC (or allow the TOC to be encoded in the manifest), so that the UA/RS knows that this TOC resource is not really an ancillary resource to be side-presented with the audio; it's just the machine-processable TOC.

In practice, such side-presented resources (supplemental content) will likely be PDFs, but I'm not sure the media type should be the key to identification.

@lrosenthol

lrosenthol commented Oct 25, 2018

I strongly recommend against raw ZIP. There are a number of well-known problems with it, which is why all standard ZIP-based packages start by addressing them. Since we don't want to reinvent the wheel, I suggest two possible starting points.

@GarthConboy
Contributor

Yeah, OCF without mimetype (if we're really mad at it) -- to get the charset and file path "fixes" -- would be fine.

@dauwhe
Contributor

dauwhe commented Oct 25, 2018

How much benefit do we get from using one packaging mechanism for EPUB3, a second packaging mechanism for Audiobooks, and a third packaging mechanism for the packaged version of WP? Can we just use OCF until we figure this out for everything?

@GarthConboy
Contributor

Touché -- I have to say I'm less mad at mimetype than others. :-)

@HadrienGardeur
Author

HadrienGardeur commented Oct 25, 2018

@GarthConboy

Yes, the TOC can be in a non-reading-order resource and referenced from the manifest -- this should work fine. However, it seems we may want some way to identify said resource as ONLY the TOC [...]

Well, we already have a rel value to indicate that the resource contains the TOC. If we go down the dual-approach for the TOC that I've suggested in #350, it will be even more clear that this is a document primarily meant to be processed rather than rendered.
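As a rough illustration of identifying the TOC by `rel` rather than by media type, the sketch below scans a manifest's resource list. It assumes the rel value is `contents` and that each resource is an object with a `url` and an optional `rel` (a string or a list of strings); both are assumptions about the manifest shape, not settled terms.

```python
def find_toc_resources(manifest: dict) -> list[str]:
    """Return the URLs of manifest resources whose rel includes "contents".

    A reading system could use this to keep TOC documents out of the
    supplemental content it side-presents next to the audio.
    """
    found = []
    for link in manifest.get("resources", []):
        rel = link.get("rel", [])
        rels = [rel] if isinstance(rel, str) else rel  # accept string or list
        if "contents" in rels:
            found.append(link["url"])
    return found
```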

[...] (or allow the TOC to be encoded in the manifest)

That's a different story altogether. We could use JSON of course, but I would advise against doing that just for audiobooks.

If you'd like to illustrate the difference:

@HadrienGardeur
Author

@lrosenthol @dauwhe aside from the restrictions on file names, could you list the other benefits of using OCF?

We clearly don't need the mimetype file or META-INF/container.xml, yet they're both required in OCF.
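For comparison, the sketch below reports which OCF-specific pieces a ZIP carries: a first entry named `mimetype` stored without compression, and `META-INF/container.xml`. It also checks for a root `manifest.jsonld`, the well-known location proposed in this issue. The function name is hypothetical, and a strict OCF checker would also verify the content of the `mimetype` file.

```python
import io
import zipfile

def describe_container(data: bytes) -> dict:
    """Report which OCF-specific features (and the proposed manifest) a ZIP carries."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        infos = package.infolist()
        names = {info.filename for info in infos}
        first = infos[0] if infos else None
        return {
            # OCF identifies the package via an uncompressed first entry "mimetype".
            "ocf_mimetype": bool(first and first.filename == "mimetype"
                                 and first.compress_type == zipfile.ZIP_STORED),
            "ocf_container_xml": "META-INF/container.xml" in names,
            # Well-known location proposed in this issue.
            "root_manifest": "manifest.jsonld" in names,
        }
```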

@llemeurfr
Contributor

llemeurfr commented Oct 25, 2018

Here is an ISO standard that specifies a ZIP profile which could certainly do what we need, or could serve as the base for a profile we define (re. file-name constraints) that is compatible with OCF, without the XML part.

It explicitly references EPUB OCF in a section about file names and interoperability (annex B).

@llemeurfr
Contributor

Note that an alternative is to define an "OCF light", keeping only the OCF ZIP Container (section 4) but removing the mimetype file section (4.3), and also keeping the File Names section (3.4).

I like the Signature feature, but it may belong to another specification. Or we may decide to keep it also in such "OCF light".

@lrosenthol

lrosenthol commented Oct 25, 2018 via email

@llemeurfr
Contributor

Whatever we choose when editing this spec, we should have a way to validate such packages. I'm sure there are useful pieces in epubcheck. But are there other pieces of code that would be helpful?

@iherman
Member

iherman commented Oct 29, 2018

I am uneasy about some aspects of the proposal. In our terminology, a Web Audiobook is a special Web Publication, and a packaged version thereof is a special version of EPUB4 or PWP (whatever terminology we use; let us forget about that issue for the moment). Viewing it this way, this proposal sets a precedent that may, in the long term, unduly influence what a future packaged version of a WP looks like. What I find questionable are:

  • the manifest has a well-known location at the root of our package: manifest.jsonld
  • we drop the requirement for an entry page and its reference in the manifest

We essentially throw away what I consider to be an essential element of flexibility we have in a Web Publication, creating a fairly strong bifurcation in our specs. After all, I could (maybe naïvely) imagine an audiobook consisting of an HTML file containing a TOC, whose entries are a series of HTML audio elements...

@iherman
Member

iherman commented Oct 29, 2018

@HadrienGardeur

I'd like to upload an example but unfortunately most audiobooks are too large for our repo. Any suggestions how we should deal with that issue?

How big? Can't you put it somewhere on the cloud with a stable URL? If necessary, I can push it up on the W3C web site (but if it is big, I would have to do it while I am at the institute with a big enough bandwidth).

@HadrienGardeur
Author

We essentially throw away what I consider to be an essential element of flexibility we have in a Web Publication, creating a fairly strong bifurcation in our specs.

@iherman

Sorry Ivan, but I have to strongly disagree with you here. In a package, we always need to have at least one well-known location. How is that throwing away an element of flexibility?

There's a big difference between dropping the requirement for an entry page and saying that it's actually forbidden. If you still want an entry page in your packaged publication, you'd be allowed to do that.

The entry page is primarily meant to:

  • provide a fallback for non WP-aware UAs
  • discover the publication (through the presence of a manifest)

In the case of packaged publications, we don't need such things IMO.

After all, I could (maybe naïvely) imagine an audiobook consisting of an HTML file containing a TOC, whose entries are a series of HTML audio elements...

Is there anything in the proposal restricting you from doing that? I don't think so.

@HadrienGardeur
Author

HadrienGardeur commented Oct 29, 2018

After discussing briefly with @iherman, it seems that he's more comfortable with having a well-known location for both:

  • the manifest: manifest.jsonld
  • and the entry page: index.html

This would make it easier to create "single resource in the reading order" publications where the manifest is embedded in index.html.

This doesn't really change my mind about making the entry page optional rather than required but I think it's a good compromise overall.
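A reading system following this compromise could look for the manifest in two places. The sketch below is one way to do that, assuming the embedded manifest is simply the first `<script type="application/ld+json">` element in `index.html`. That is a simplification: the WP draft associates an entry page with its manifest through a link element, which this sketch does not follow.

```python
import io
import json
import zipfile
from html.parser import HTMLParser

class _JsonLdScripts(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self.blocks: list[str] = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data

def load_manifest(data: bytes) -> dict:
    """Prefer a root manifest.jsonld; otherwise use a manifest embedded in index.html."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        names = set(package.namelist())
        if "manifest.jsonld" in names:
            return json.loads(package.read("manifest.jsonld"))
        if "index.html" in names:
            parser = _JsonLdScripts()
            parser.feed(package.read("index.html").decode("utf-8"))
            parser.close()
            if parser.blocks:
                return json.loads(parser.blocks[0])
    raise ValueError("no manifest found at either well-known location")
```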

@llemeurfr
Contributor

Re. an entry page, optional, as index.html: I support such a compromise.

@iherman
Member

iherman commented Oct 29, 2018

It is not a compromise, it is a consensus:-)

@GarthConboy
Contributor

I just gave the above a thumbs-up... with the view that the entry page would be optional, at least for audiobooks. Just checking: is that the consensus?

@HadrienGardeur
Author

I've tweaked the first post and added the well-known location for the entry page as well, this way we have a full list for the proposal which could be discussed in a future WG call.

@iherman
Member

iherman commented Oct 29, 2018

@HadrienGardeur just to be clearer:

we drop the requirement for an entry page and its reference in the manifest

the entry page, if present, must have the same structure as in the WP, i.e., it must have a reference to the manifest or may embed it. What is proposed to be dropped is the requirement for the very existence of the entry page, not its structure.

@HadrienGardeur
Author

@iherman

I'm certainly not suggesting that the entry page should be structured differently.

What I'm saying is that:

  • we don't need to have the url term in the manifest for a packaged publication (this affects the JSON Schema for the manifest)
  • we don't need to have an entry page in the package either
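The schema change in the first bullet amounts to making `url` conditional on the publication not being packaged. The check below is a hypothetical, hand-rolled stand-in for that JSON Schema rule; treating `readingOrder` as always required is an assumption made only for the example.

```python
def missing_required_terms(manifest: dict, packaged: bool) -> list[str]:
    """Return required manifest terms that are absent (hypothetical rule set).

    "url" is only required when the publication is not packaged, as
    proposed above; "readingOrder" is assumed to be required in both cases.
    """
    required = ["readingOrder"]
    if not packaged:
        required.append("url")
    return [term for term in required if term not in manifest]
```

In an actual JSON Schema this would more likely be expressed as two schemas, or with a conditional (`if`/`then`) keyed on some marker of packaging.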

@HadrienGardeur
Author

How big? Can't you put it somewhere on the cloud with a stable URL? If necessary, I can push it up on the W3C web site (but if it is big, I would have to do it while I am at the institute with a big enough bandwidth).

@iherman

I'd rather upload the example somewhere in the cloud that's not tied to any of my personal accounts, since someone other than me might need to update it.

The packaged version of Flatland should be roughly 240-250 MB.

@iherman
Member

iherman commented Nov 5, 2018

@HadrienGardeur that is fine, but at least temporarily you will have to put it somewhere in the cloud, because I would expect email clients to have problems with such an attachment. Once I get hold of the file, I can push it up to W3C at some www.w3.org/2018/11/XXX URL, which can then be changed later (by me or some other team member) if necessary.

@danielweck
Member

Just a note about "ingestion format" vs. "end-user format": on multiple occasions I heard the term "distribution format" used to describe what I personally interpret as a B2B "interchange format". This notion of "distribution" really depends on "who distributes to whom", it's a question of perspective :)
(same with "delivery format")

So, this kind of terminology can easily be misconstrued if we don't define the context carefully, and some of us might get lost in translation during our conversations. There are quite a few intermediaries along the digital supply chain (content creation / authoring, publishers, libraries, accessibility remediation, reading systems, etc.). I'm no expert, but I imagine that audiobook production + distribution (that word again!) involves a very different workflow than, say, trade e-books, comic books, scholarly publications, etc. (which is notably why we're discussing TOC and packaging issues).

So, as we aim to clarify use-cases specifically for packaged audio books (e.g. "ingestion" / "interchange") vs. generic packaged web publications (e.g. "delivery" / "distribution"), let's also try to disambiguate the terminology :)

@HadrienGardeur
Author

@danielweck I agree that it can become difficult to follow such discussions and for audiobooks in particular there's IMO a lot of bias due to how things are deployed right now by large companies (each of them inventing their own format when delivering content to users).

IMO we should align this packaged audiobook format with the same use cases as EPUB.

Reading an EPUB in a Web app is IMO a mistake: that's not something that EPUB was designed for and you need to jump through many hoops to achieve something that works decently. That's why I would recommend that we exclude such a use case for packaged audiobooks (at least as long as we don't have Web Packaging ready), since this can be handled in a much better way by WP.

@llemeurfr
Contributor

@lrosenthol could you please add details from your comment #352 (comment) ?

What are the character issues treated in OCF and not in the ISO standard? And what is this signature issue?

@dauwhe
Contributor

dauwhe commented Dec 10, 2018

There is an existing audiobook packaging format, M4B. It supports a cover image, some metadata, track names, etc.

@lrosenthol

lrosenthol commented Dec 11, 2018 via email

@lrosenthol

@llemeurfr
The filename issue can be found in the OCF spec at http://www.idpf.org/epub/31/spec/epub-ocf.html#sec-container-filenames, where it goes into detail on a subset of valid names, which must be encoded as UTF-8.

For the DigSig issue, see section 5.2 of ISO 21320, where it discusses differences from the ZIP Appnote, including not supporting ZIP's native DigSig (but as with encryption, yes, you could use your own).
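To show what the OCF file-name rules involve in practice, the sketch below checks a list of package paths against a subset of them: forbidden characters and code point ranges, the 255-byte limit per name, no trailing full stop, and uniqueness after Unicode normalization and case folding. The character list follows the OCF 3.1 section linked above as closely as possible but may be incomplete, so treat this as an approximation, not a conformance checker.

```python
import unicodedata

# Characters forbidden in OCF file names (a subset of OCF 3.1 section 3.4).
# "/" is not listed because it is the path separator inside the ZIP.
FORBIDDEN_CHARS = set('"*:<>?\\\x7f')
FORBIDDEN_RANGES = [
    (0x0000, 0x001F),     # C0 controls
    (0x0080, 0x009F),     # C1 controls
    (0xE000, 0xF8FF),     # Private Use Area
    (0xFDD0, 0xFDEF),     # non-characters
    (0xFFF0, 0xFFFF),     # Specials
    (0xE0000, 0xE0FFF),   # Tags and Variation Selectors Supplement
    (0xF0000, 0x10FFFF),  # Supplementary Private Use Areas A and B
]

def file_name_problems(paths: list[str]) -> list[str]:
    """Return human-readable problems found in a list of package paths."""
    problems: list[str] = []
    seen: dict[str, str] = {}
    for path in paths:
        if len(path.encode("utf-8")) > 65535:
            problems.append(f"{path!r}: path longer than 65535 bytes")
        for segment in path.split("/"):
            if len(segment.encode("utf-8")) > 255:
                problems.append(f"{path!r}: name longer than 255 bytes")
            if segment.endswith("."):
                problems.append(f"{path!r}: name ends with '.'")
            for ch in segment:
                cp = ord(ch)
                if ch in FORBIDDEN_CHARS or any(lo <= cp <= hi for lo, hi in FORBIDDEN_RANGES):
                    problems.append(f"{path!r}: forbidden character U+{cp:04X}")
                    break  # one report per name segment is enough
        # Compare whole paths after NFC normalization and full case folding.
        key = unicodedata.normalize("NFC", path).casefold()
        if key in seen:
            problems.append(f"{path!r}: collides with {seen[key]!r}")
        else:
            seen[key] = path
    return problems
```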

@iherman
Member

iherman commented Dec 11, 2018

There is an existing audiobook packaging format, M4B. It supports a cover image, some metadata, track names, etc.

Is there a description of that file format? All pages that I stumbled onto are very superficial, and I want to know whether it is really "just" a file format that can contain anything that we define, or whether we are forced to abide by some restrictions.

Also, it worries me that the standard itself is, as often with ISO documents, behind a paywall. I do not think it would go down well when all other standards we use and refer to are available for free.

@dauwhe
Contributor

dauwhe commented Dec 11, 2018

Is there a description of that file format? All pages that I stumbled onto are very superficial, and I want to know whether it is really "just" a file format that can contain anything that we define, or whether we are forced to abide by some restrictions.

It is indeed frustrating trying to find out more about this. But I think this is work we should do—the fact there is an existing audiobook standard demands close examination. I was able to make an audiobook in this format with an existing program (which admittedly cost me $US 5.26), and it worked perfectly in iTunes. Attempts to install command-line tools to examine the format directly have so far failed (my operating system is too old for Homebrew).

Also, it worries me that the standard itself is, as often with ISO documents, behind a paywall. I do not think it would go down well when all other standards we use and refer to are available for free.

That's also maddening. But EPUB normatively references ISO 8601, which costs CHF 138! HTML normatively references ISO 3166, which costs a mere CHF 38.

We should talk to David Singer about this; he mentioned the use of MPEG as a packaging technology at TPAC Lyon.

@lrosenthol

@dauwhe the stuff that David Singer mentioned is MPEG-4 Part 12, while the audiobook stuff uses MPEG-4 Part 14. Related, yes, but not exactly the same.

@iherman
Member

iherman commented Dec 11, 2018

We should talk to David Singer about this; he mentioned the use of MPEG as a packaging technology at TPAC Lyon.

Actually, I saw his name appearing on one of the documents around MPEG as editor (or something similar) so he can certainly be very helpful with this.

@TzviyaSiegman, he is an AB buddy, right?

@llemeurfr
Contributor

llemeurfr commented Dec 11, 2018

MPEG-4 Part 14 specifies the .mp4 file format (or .m4a, or .m4b for Apple: audio (+ bookmarks)).
We are looking for a packaging format that can contain multiple mp4 files, with WP-defined metadata... not a stream of media objects with a few MPEG-defined metadata (or XMP metadata).
-> Not the same logical level.

@murata2makoto

I would strongly recommend against using ISO 21320 for your package normative reference for three main reasons.

@lrosenthol

Although I am not sure if we should reference ISO 21320, I would like to make corrections.

21320 is nothing but PKWARE ZIP minus those features requiring license fees.

1 - It doesn't properly address various well-known file naming situations (e.g. proper Unicode and platform incompatibilities) which OCF/UCF do.

Indeed, it does not. Neither do OCF or UCF. Unfortunately, it is too late to fix ZIP implementations. For more about this, see the annex of ISO 21320.

2 - It disallows encryption, which would not be good for those publishers requiring some form of DRM

ISO/IEC 21320 merely disallows PKWARE encryption, which requires license fees. OCF does the same thing. It is certainly possible to add encryption on top of 21320.

3 - It disallows DigSig, which would prevent proper tamper detection.

Again, I do not think that this is true. Digital signatures can be added on top of 21320.
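At the ZIP level, the restrictions discussed here are easy to check. The sketch below flags entries that use a compression method other than stored (0) or deflate (8), or that set general-purpose bit 0 (PKWARE encryption). It assumes these are the two ISO/IEC 21320-1 constraints that matter for this discussion; the standard has others that are not checked. Encryption or signatures added "on top", as described above, would live in separate files and are not visible to this check.

```python
import io
import zipfile

ALLOWED_METHODS = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}  # methods 0 and 8

def zip_profile_problems(data: bytes) -> list[str]:
    """Flag entries that use other compression methods or ZIP-level encryption."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        for info in package.infolist():
            if info.compress_type not in ALLOWED_METHODS:
                problems.append(f"{info.filename}: compression method {info.compress_type}")
            if info.flag_bits & 0x1:  # general-purpose bit 0 marks an encrypted entry
                problems.append(f"{info.filename}: ZIP-level encryption")
    return problems
```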

@danielweck
Member

MPEG-4 Part 14 specifies the .mp4 file format (or .m4a, or .m4b for Apple: audio (+ bookmarks)).
We are looking for a packaging format that can contain multiple mp4 files, with WP-defined metadata... not a stream of media objects with a few MPEG-defined metadata (or XMP metadata).
-> Not the same logical level.

Agreed, but shouldn't we also document the rationale for the in/out-of-scope status of the m4b format, based on its merits/drawbacks, as per the use cases defined for (audio) Web Publications? This is important because from a UX perspective, there isn't a significant functional gap (at least on the surface). For example:
https://player.cantookaudio.com/aHR0cHM6Ly9yZWFkaXVtLm9yZy93ZWJwdWItbWFuaWZlc3QvZXhhbXBsZXMvRmxhdGxhbmQvbWFuaWZlc3QuanNvbg==
=> this "looks" exactly like your run-off-the-mill m4b audiobook player, yet it is actually based on ReadiumWebPubManifest (if I am not mistaken).

@HadrienGardeur
Author

For example:
https://player.cantookaudio.com/aHR0cHM6Ly9yZWFkaXVtLm9yZy93ZWJwdWItbWFuaWZlc3QvZXhhbXBsZXMvRmxhdGxhbmQvbWFuaWZlc3QuanNvbg==
=> this "looks" exactly like your run-off-the-mill m4b audiobook player, yet it is actually based on ReadiumWebPubManifest (if I am not mistaken).

I'm not entirely sure which point you're trying to make @danielweck but to provide additional context:

  • this is based on the audiobook profile of the Readium Web Publication Manifest, not a packaged version of it
  • the Web App itself can handle audiobooks in multiple media types and bitrates (if expressed in the manifest) and prioritize them accordingly (order listed in the <audio> element)

This can be a good example to illustrate how an audiobook can be published as a WP, but it doesn't feel relevant to me in a discussion about packages.

@danielweck
Member

Sorry to have briefly digressed into the UX perspective, but I imagine that stakeholders in the audiobook business will need convincing that "packaged web audio publications" solve a problem they can't already address with the m4b format. I appreciate that this is a "meta" level concern, and of course I am also well aware that m4b cannot be used to containerize Web Publication resources without significant, lossy transformations. So my point was that we should not dismiss an established technology without explaining why.
PS: coincidentally, I recently purchased a 15h audio book (companion to a hardcover book) in m4b format with chaptering, cover image, metadata, etc.

@HadrienGardeur
Author

HadrienGardeur commented Dec 13, 2018

Most container formats that I'm aware of for audio/video tend to be specifically tied to a file format and/or codec.

Can you package Opus files in an M4B for example?

@llemeurfr
Contributor

@daniel, m4b is an Apple extension of the mp4 format, not a standard. Only Apple players use its extended features.

  • I'm not sure that the bookmarks it provides (the b in the name) are hierarchical like the TOC this new format can provide. And we can have a beautiful, accessible, UTF-8 HTML TOC. Maybe m4b bookmark lists don't go that far.

Add the metadata we aim to provide, and the fact that an m4b is ONE big file (vs. multiple audio files for an audio publication, which may be easier to produce), whereas a publication is not just one huge HTML file either. But this is an advantage for the producer, not the user.

@danielweck
Member

Only Apple players use its extended features.

I am not an Apple customer and I read audiobooks in m4b format, so maybe the adoption of this format is not confined to the Apple ecosystem?

As I said, the transformation from Audio-Web-Publications to m4b would not be lossless, and anyway this is certainly not something I am advocating right now. I am merely reporting, as others have done, the fact that this format exists and seems to appeal (perhaps "by default") to some publishers. I am sure that Packaged-Audio-Web-Publications will be better ;)

@lrosenthol

lrosenthol commented Dec 13, 2018 via email

@HadrienGardeur
Author

OCF and UCF both have requirements and restrictions concerning the naming of files in the ZIP central directory.

Same thing for ISO 21320, check Annex B which even references OCF, UPC and UCF.

@dwsinger

I will find out what happens in m4b, for sure. The offer I made was of the HEIF format, which is a specialization of the formats developed for MP4 (widely used) and MPEG-21 (unused), with the latter being re-purposed to carry images rather than 'digital items'.

The attached slides are what I hoped to show at TPAC. In essence, HEIF allows the storage of 'items', which have types (4 character code, or MIME types), simple names (which can be used in relative URLs, 'as if' the item were a separate file coming from the same place as the package), references (typed, directional, so one can see dependency), and identifies a 'primary item' entry point (e.g. the main HTML page, for this case). HEIF is a moderately abstract base-layer on which building a modern image file format was surprisingly easy; and it allows for both timed and untimed material in the same package.

If we could combine this with something the visual publishing world needs -- the ability to cause timed update of the HTML etc. -- I think we might have something very powerful.

It's an offer, and something to be aware of, and I'd be happy to entertain questions or get around a whiteboard.

heif preso.pdf

@dauwhe
Contributor

dauwhe commented Dec 14, 2018

@dwsinger That's really interesting. How do you envision this sort of package being consumed? My browsers have no idea what m4b files are, but iTunes is happy to play them. If a packaging format based on HEIF largely contained web content, can you imagine a future where a web browser could display all the content directly?

I met a mountain guide in Canada last winter. He’s created a really complex publication about avalanches. He doesn’t want to distribute it as an ebook, as most reading systems can’t handle the JavaScript, and many end users can’t easily figure out how to obtain a reading system and side-load an EPUB. He just wants something he can email to a person so they can double-click and have it open in a browser. PDF eventually attained that level of ease. Some of us want that for web stuff.

@dwsinger

dwsinger commented Jan 4, 2019

I think there may also be an opportunity here for convergence; the media business (video, audio) also wants a packaged interactive format. And there is a product spectrum here: book, book with embedded audio/video, book with spoken audio, audiobook, TV/video program...

@murata2makoto

Long time ago, in W3C, there was an attempt to create a ZIP-based package format. In my understanding, it failed because different applications had different priorities.

@HadrienGardeur
Author

More than two months and 70+ comments later, it doesn't feel like we've made significant progress or diverged from my initial comment.

all resources (including the manifest) are packaged together in a ZIP (a lighter take on OCF)

This seems to be the preferred option in early 2019 as well.

the audio resources should not be further compressed in the ZIP

I haven't seen a single mention regarding compression of resources that are already optimized (audio, video, images). Probably worth considering in these discussions.
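One way to act on this without hard-coding media types is to measure: try deflate and keep it only when it actually saves space. In this hypothetical sketch, `zlib.compress` output serves as the estimate (it adds a small header and checksum compared with the raw deflate stream a ZIP entry holds), and the 5% threshold is arbitrary.

```python
import zlib

def choose_method(data: bytes, min_saving: float = 0.05) -> str:
    """Return "deflate" if compression saves at least `min_saving` of the size, else "store".

    zlib.compress adds a 2-byte header and 4-byte checksum that a ZIP entry
    does not carry, so this slightly underestimates the real saving.
    """
    if not data:
        return "store"
    compressed_size = len(zlib.compress(data, 9))
    return "deflate" if compressed_size <= len(data) * (1 - min_saving) else "store"
```

Already-encoded audio, video and images behave like high-entropy data here and end up stored, while HTML, CSS and the manifest get deflated.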

the manifest has a well-known location at the root of our package: manifest.jsonld
the entry page has a well-known location as well: index.html

Both well-known locations are listed in the OCF Lite draft.

we drop the requirement for an entry page and its reference in the manifest

I think that this is still very relevant. Forcing authors/publishers to create something that they don't need or usually produce is completely counter-productive.

We've been down that road before with EPUB FXL (not allowing images in "spine") and we're still paying the price for this today (with a mix of distributors that pre-process such files and reading apps like Kobo who use a dual-rendering engine approach).

all resources contained in the package that are not listed under readingOrder in our manifest are considered part of the resource list

This hasn't been discussed again. Probably worth considering a bit more (like compression on optimized resources).
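The rule itself is simple to compute. Assuming a root `manifest.jsonld`, `readingOrder` entries that are either path strings or objects with a `url`, and that the manifest file itself should not count as a resource, the implicit resource list is a set difference over the ZIP's entries. A real implementation would also need to handle percent-encoding and fragments in URLs, which this sketch ignores.

```python
import io
import json
import zipfile

def implicit_resources(data: bytes) -> list[str]:
    """Return package entries that are neither in readingOrder nor the manifest itself."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        manifest = json.loads(package.read("manifest.jsonld"))
        in_reading_order = {
            item if isinstance(item, str) else item["url"]
            for item in manifest.get("readingOrder", [])
        }
        return sorted(
            name for name in package.namelist()
            if not name.endswith("/")          # skip directory entries
            and name != "manifest.jsonld"      # assumption: not a resource itself
            and name not in in_reading_order
        )
```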

we define a dedicated media type (TBD) and file extension (TBD as well) to identify such packages, both of them would be specific to audiobooks only

This is still under discussion.

@BigBlueHat
Member

Long time ago, in W3C, there was an attempt to create a ZIP-based package format. In my understanding, it failed because different applications had different priorities.

There's also the later attempt (2012) at "Packaged Web Apps" aka widgets: https://www.w3.org/TR/widgets/

Interestingly, Google had Gears, Firefox had Firefox OS packaged apps, but ultimately they've deprecated those in favor of Web-distributed installable Web apps.

Consequently, our distribution and consumption models need analysis as we consider the packaging concerns. Ideally, the manifest and contents of the publication would need no changes when packaged or unpackaged such that publishers can create "a publication" and then determine the best distribution models for their business and content types.
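To illustrate "no changes when packaged or unpackaged": if the manifest uses only relative URLs, the same `readingOrder` entry resolves correctly in both settings. The two hypothetical helpers below show this, using `urljoin` for the Web case and POSIX path joining for entries inside a ZIP; rejecting paths that escape the package root is an added assumption, not something the thread specifies.

```python
import posixpath
from urllib.parse import urljoin

def resolve_on_web(manifest_url: str, href: str) -> str:
    """Resolve a manifest-relative URL when the publication is served over HTTP(S)."""
    return urljoin(manifest_url, href)

def resolve_in_package(manifest_path: str, href: str) -> str:
    """Resolve the same href to a ZIP entry name, relative to the manifest's location."""
    resolved = posixpath.normpath(posixpath.join(posixpath.dirname(manifest_path), href))
    # Hypothetical safety rule: refuse anything that points outside the package.
    if resolved == ".." or resolved.startswith("../") or posixpath.isabs(resolved):
        raise ValueError(f"{href!r} escapes the package root")
    return resolved
```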

@wareid

wareid commented Jan 30, 2019

This issue was resolved by a discussion in the meeting on January 28.

  • RESOLVED: PWG will adopt a light-weight version of Zip, based on https://www.iso.org/standard/60101.html, with some restrictions and additions for WP
Transcript: packaging
Tzviya Siegman: #390
Tzviya Siegman: The next item is packaging. Wendy did a nice summary — but since the last meeting we’ve defined success criteria. Here is the issue with success criteria. Those who attended the AB publishing meeting, we got to see how things stand with packaging.
… we got info about the HEIF format — and Dave did a presentation on OCF-lite. We heard from the Chrome team about the current incubating format that google proposed. All that said, we have the success criteria put out.
… We are all but agreed on OCF-Lite, but that’s a bad term for it because it’s possibly misleading…
Dave Cramer: I have a question about what we are specifying — it came up in a packaging thread. There is an ISO standard that basically is very close to being a subset of ZIP that OCF uses. It mentions restricting the types of compression — it has an appendix on restricting filenames…
… Also, oddly for an ISO standard, you can get the PDF without paying a bunch of Swiss francs. I think it would be good if we didn’t copy a whole bunch of text from OCF, but borrowed things that work, like folder locations…
Tzviya Siegman: Dave — so you’re in favor of using a restricted format of ZIP instead of rewriting a spec?
Dave Cramer: I’m worried that we have an OCF spec, and we could be normatively adding more items. If we’re saying we want to use ZIP the ISO zip has some nice items in there.
Dave Cramer: https://www.iso.org/standard/60101.html
Dave Cramer: which is ISO/IEC 21320-1:2015
Garth Conboy: I think we could probably make — I don’t disagree with Dave — I think we could make a resolution to do something like OCF-Lite. Whether built up from the ISO Zip or down from OCF — I think we’re in the same neighborhood.
… One decision is about creating compatibility with META-INF. We’re in the neighborhood that we want to use something ZIP-based with minimal restrictions. We can probably resolve that now, and there is some work to do on whether it’s built up or built down.
Nick Ruffilo: .. but we could resolve if it will work for audiobooks, but maybe other profiles for different uses of WP…
Ivan Herman: +1 to Garth
Laurent Le Meur: I’m not against the ISO standard as a base, but there are several items to discuss. The ISO 21320 spec only prohibits some characters, while the original OCF prohibits other characters, so it’s not 100% compatible. In the ISO standard, there is a table that says that digital signatures are not allowed…
… but this is not described in OCF. It won’t be enough for a packaging mechanism: we need to specify what the name of the manifest file is, so there are some packaging details that need to be described that won’t be in the ISO spec. This is why, in the draft I made before, I had chosen to copy the OCF spec and remove things.
… even if I make reference to the ISO spec, most of the language that is there will still be there.
Ivan Herman: First of all — accepting what Laurent had said — it will put our document/work on a more solid basis if it was based on ISO spec. It should be a starting position. Whether we want to be compatible with OCF on characters — we can go through that…
… The starting position should be the ISO — it will help in talking with TAG and others. The original reason I was on the queue is that our position goes — whatever we define here is not to define specifically and exclusively for audiobooks.
… whatever we do, we should try to be as generic as possible. If tomorrow, another profile comes up that is similar to audiobooks — manga came up — the starting position should be that the document Laurent creates is a lightweight format for publishing in general, which happens to be used by audiobooks.
Nick Ruffilo: +1
Tzviya Siegman: +1 to ISO
Laurent Le Meur: +1 to a generic packaging format for WP
Bill Kasdorf: +1 to Ivan
Wolfgang Schindler: +1 to Ivan — generic packaging format for WP
Garth Conboy: I would agree with that. I didn’t mean to imply it was only audiobooks, but that there could be more formats in the future. If we have to dream up another name, we can. We can get to what Laurent drafted by building up from the ISO spec…
… We clearly will have to have a well-known place for the manifest. I hope that we can say we are building a lightweight ZIP format based upon the ISO spec, with additional restrictions and rules for WP.
Laurent Le Meur: If agreed, I can modify the draft to reflect that…
Proposed resolution: PWG will adopt a light-weight version of Zip, based on https://www.iso.org/standard/60101.html, with some restrictions and additions for WP (Tzviya Siegman)
Tzviya Siegman: I think we should talk about whether we need a stand-alone document.
Ivan Herman: There are things we need to describe somewhere, such as where to find the manifest, so we need to have a document for this. It may only be one page, but it must be there.
Ivan Herman: +1 to the proposal
Garth Conboy: +1 to proposal (too)
Wolfgang Schindler: +1 to the proposal
Dave Cramer: Just thinking about Tzviya’s comment about what we need for a document. OCF is two specs: there is the OCF abstract container, and then the ZIP container. The latter includes all the ISO material; the former is ‘everything has to be in the same folder and you have to put a container.xml here’.
… not sure how we move the abstract container stuff into our spec…
Laurent Le Meur: +1 to PWP…
Laurent Le Meur: PPWP for Pragmatic PWP
Luc Audrain: +1
Tim Cole: +1
Tzviya Siegman: We do have a document that is a shell: PWP (Packaged Web Publications). That way we get around writing a package document. It would hold the information associated with packaging and anything else, but Matt might get annoyed if we change the short title.
Ivan Herman: +1 to dauwhe
Dave Cramer: My concern about putting this under PWP: will this make it feel like we’re rejecting all other attempts on the web at doing packaging?
Garth Conboy: When I typed PWP I almost typed “profile 1” but I am sympathetic to Dave’s comment. We can be clear that “this is what we’re doing now, we hope to use it in the future, but it’s not a hard decision that there won’t be another format in the future”
Ivan Herman: That’s why there was an email where we set out the limits and milestones for the upcoming year as a “lightweight packaging format”. There might be a heavyweight one coming in the future, but I agree with Dave. We should be careful. I’m cautious about using PWP.
… We have talked too much about it being the solution to all the miseries of the world, so coming up with this will lead to ugly discussions.
Tzviya Siegman: +1
Laurent Le Meur: +1 to the vote
Tzviya Siegman: The document we’re talking about is very short: Light Weight Packaging Format.
Joshua Pyle: +1
Bill Kasdorf: +1
Wolfgang Schindler: +1
Mateus Teixeira: +1 to proposal
Nick Ruffilo: +1
George Kerscher: +1
Gregorio Pellegrino: +1
Resolution #3: PWG will adopt a light-weight version of Zip, based on https://www.iso.org/standard/60101.html, with some restrictions and additions for WP
Tzviya Siegman: Moving on: what are our next steps?
Laurent Le Meur: I have to modify the draft: remove everything about the characters and replace it with a reference to the ISO ZIP format, then rename what is PWP inside this document. We can use something like LWPF until we choose a final name. I can keep it as ocf-lite, but we can rename it to something else when we choose the name…
… No one will see PWP, but next week would be a good time to choose a final name.
Ivan Herman: Great! A question we will have to decide is whether this document is on the Rec track or not. The ISO part makes this much easier: the only things we have to test as part of the CR procedures are the ones we add, not the packaging/unpackaging. That is very helpful.
… We will have to decide if it’s rec track or not. Personally I think it should be a rec track document.
Tzviya Siegman: I think referencing the ISO document makes it much shorter. The only things needed in our document are adjustments — so it becomes very slim.
Laurent Le Meur: There’s no mention of font obfuscation, but there is still an issue where we have to discuss that…
Tzviya Siegman: For now, leave it out; we can add it later…
Dave Cramer: As someone who has spent too much time with the OCF spec, I hope we can write something clear about what is expected from authors and user agents. OCF is a bit loose about what happens with packaging.
Laurent Le Meur: As a first step for next week, we should discuss which kind of template or reading system behaviour to specify. There are many ways to specify user agents, so I would like some input from the group on what kind of wording we should use.
Tzviya Siegman: There will be a placeholder in the explainer, and we can work on getting the document drafted in the next few weeks.

wareid closed this as completed Jan 30, 2019
PWG Task Management automation moved this from Discuss - Call to Editorial Jan 30, 2019