Information content of the abstract manifest #6

Closed
dauwhe opened this issue Jul 5, 2017 · 105 comments

Comments

@dauwhe
Contributor

dauwhe commented Jul 5, 2017

From @dauwhe on June 27, 2017 14:33

What information is required for an abstract manifest? [edited to add items from comments]

  1. An identifier for the web publication, which should be a URL
  2. Some way of saying that this URL represents a web publication.
  3. Some way of identifying the constituent resources of the web publication.
  4. Some way of providing a preferred order of (some of) the constituent resources in case there is more than one
  5. Some way of being able to add more complex metadata to a publication. (Not clear to my mind whether we would define a minimally required set of metadata, but the slot should be there.)
  6. Locating table of contents or other navigation structure

What else? I think we should distinguish required information from "nice to have" information.

Copied from original issue: w3c/publ-wg#12

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @GarthConboy on June 27, 2017 14:56

I'd also throw in:

-- Reading order
-- Basic metadata (yes, a can of worms we'll need to open)

Re the #1 and #2 just above in Dave's original issue, it seems they may want to be pre-manifest -- defined before the manifest is found, or be the actual path to the manifest (or to a "first file" that can be rendered, but also somehow points to the manifest).

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @iherman on June 27, 2017 14:56

  1. Some way of providing a preferred order of (some of) the constituent resources in case there is more than one
  2. Some way of being able to add more complex metadata to a publication. (Not clear to my mind whether we would define a minimally required set of metadata, but the slot should be there.)

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @iherman on June 27, 2017 14:56

(Wow. I just said the same thing as Garth just in other words. I swear we did not conspire...)

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @mattgarrish on June 27, 2017 15:54

What is meant by required here? Must always be present or must be accounted for in the design? This is why I wasn't sure at the f2f if navigation constituted a top-level or lower-level consideration.

A standardized means of locating the table of contents seems critical to me, even if it's optional to define and there are no epub-like rules on its construction.

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @GarthConboy on June 28, 2017 16:2

The updated #6 in the first panel says "Locating table of contents or other navigation structure", we should also consider:

-- Do we need such a Nav file (likely yes for A11Y)
-- Should it be in the Manifest or pointed-to by the Manifest (I could see an argument for all eggs in one basket -- though the machine readable or renderable discussion will arise)

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

Do we need such a Nav file (likely yes for A11Y)

See #14

Should it be in the Manifest or pointed-to by the Manifest

Interesting question. I know Hadrien has proposed including section titles in a JSON manifest, but I have major concerns about possible reader-facing text in JSON (especially given that there's a standard html way to do this stuff).

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @HadrienGardeur on July 2, 2017 20:27

I know Hadrien has proposed including section titles in a JSON manifest, but I have major concerns about possible reader-facing text in JSON (especially given that there's a standard html way to do this stuff).

IMO the Navigation Document in EPUB 3 is a failed experiment. Most EPUB 3 documents that I've seen end up including at least two HTML tables of contents:

  • a nice looking one, included in the spine and not marked as being a Navigation Document
  • a basic one, used as the Navigation Document

Most EPUB 3 reading systems do not render these Navigation Documents either, they simply parse them, extract the info and display things using their own UI.

This is a typical example of "spec purity" (the beauty of the Navigation Document) vs real world usage (no one is rendering these documents and we end up with more redundancy instead of less).

Readium (1, JS and 2) ended up parsing the info in the Navigation Document and providing a JSON output instead, which is much easier for developers to work with.

In the Readium Web Publication Manifest:

  • there is absolutely zero requirement for a table of contents (I strongly believe that we shouldn't force a ToC on single resource publications that won't need one)
  • all the different ToC types that exist in EPUB are parsed (NCX, landmarks in OPF and Navigation Document) and exposed in a consistent way (collections) in the manifest
  • we also keep links to the Navigation Document in spine or resources and identify them as such using a rel value
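
As a rough illustration of the parse-and-extract approach described above, the following Python sketch turns a nested HTML list of links into a plain, JSON-friendly structure. The sample markup, the `NavExtractor` class, and the `title`/`href`/`children` key names are all assumptions made for this example; it is not Readium's parser and does not reproduce its output format.

```python
from html.parser import HTMLParser


class NavExtractor(HTMLParser):
    """Collect nested <ol>/<li>/<a> entries as a tree of dicts (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.toc = []          # top-level entries
        self._lists = []       # stack of entry lists, one per open <ol>
        self._current = None   # entry whose link text is being collected

    def handle_starttag(self, tag, attrs):
        if tag == "ol":
            if self._lists and self._lists[-1]:
                # Nested list: attach to the most recent entry of the enclosing list.
                parent = self._lists[-1][-1]["children"]
            elif self._lists:
                parent = self._lists[-1]
            else:
                parent = self.toc
            self._lists.append(parent)
        elif tag == "a" and self._lists:
            entry = {"title": "", "href": dict(attrs).get("href"), "children": []}
            self._lists[-1].append(entry)
            self._current = entry

    def handle_data(self, data):
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["title"] = self._current["title"].strip()
            self._current = None
        elif tag == "ol" and self._lists:
            self._lists.pop()


def extract_toc(html):
    """Return the table of contents found in `html` as nested dicts."""
    parser = NavExtractor()
    parser.feed(html)
    parser.close()
    return parser.toc


# Hypothetical navigation markup, invented for this sketch.
SAMPLE = """<nav><ol>
<li><a href="c1.html">Chapter 1</a>
<ol><li><a href="c1.html#s1">Section 1.1</a></li></ol></li>
<li><a href="c2.html">Chapter 2</a></li>
</ol></nav>"""

toc = extract_toc(SAMPLE)
```

A reading system could then build its own UI from `toc` rather than rendering the HTML, which is the behaviour the comment above attributes to most EPUB 3 reading systems.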

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @HadrienGardeur on July 2, 2017 20:35

To go back to the initial question, in Readium we clearly separate the abstract model from the minimal requirements for a manifest.

The abstract model has three core concepts:

  • metadata (based on JSON-LD)
  • links
  • collections (identified by a role, can aggregate metadata, links and other collections)

For each core concept, we make sure that:

  • the requirements are very basic
  • the model is flexible and powerful enough to allow the expression of complex use cases
  • a number of extensibility points are available and clearly identified

The basic requirements for a manifest are then based on that model:

  • a manifest should at least contain a title in its metadata
  • it should at least contain a link to itself, identified by the self relation
  • it should contain at least one resource in its spine collection, which contains the key resources for a publication in reading order
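
The three basic requirements listed above could be checked along these lines. This is only a sketch: the function name `check_minimal_manifest` is hypothetical, the key names (`metadata`, `links`, `spine`, `rel`, `href`) follow the JSON sample posted later in this thread, and `rel` is assumed to be a single string.

```python
def check_minimal_manifest(manifest):
    """Return a list of problems; an empty list means all three checks pass."""
    problems = []

    # 1. A title in the metadata.
    if not manifest.get("metadata", {}).get("title"):
        problems.append("metadata has no title")

    # 2. A link to the manifest itself, identified by the "self" relation.
    links = manifest.get("links", [])
    if not any(link.get("rel") == "self" and link.get("href") for link in links):
        problems.append("no link with rel 'self'")

    # 3. At least one resource in the spine (the reading order).
    if not manifest.get("spine"):
        problems.append("spine is empty")

    return problems


example = {
    "metadata": {"title": "The Master and Margarita"},
    "links": [
        {"rel": "self", "href": "http://example.com/manifest.json",
         "type": "application/webpub+json"}
    ],
    "spine": [{"href": "http://example.com/chapter1", "type": "text/html"}],
}

problems_ok = check_minimal_manifest(example)
problems_empty = check_minimal_manifest({})
```

Returning a list of problems rather than raising on the first failure is a design choice for the sketch; it lets a tool report every missing item at once.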

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @llemeurfr on July 3, 2017 12:43

An identifier for the web publication, which should be a URL

Better, an IRI because a) it may be a URN (up to the publisher to choose, the Web doesn't care) and b) i18n is important. A URL to the origin is also important but should be another property.

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @WSchindler on July 3, 2017 14:47

I would like to add:
7. language(s) used in the WP - the plural is due to the fact that we will have publications such as parallel texts (original + one or more translations) and bilingual dictionaries, which contain 1-n languages. The language used also has implications for rendering (e.g. "ltr" vs "rtl", vertical layout)

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @HadrienGardeur on July 3, 2017 14:50

Language and direction (ltr vs rtl) should be two separate metadata properties. Agree that we need to allow more than one language.

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @lrosenthol on July 3, 2017 22:5

If we plan to use anything other than a URL (as defined by the HTML spec -
https://www.w3.org/TR/WD-html40-970917/htmlweb.html), then we are going to
need to be willing to jump into the current battle between the W3C and the
IETF on the definition of URL/URI/IRI etc. Here is an old blog entry about
it - http://intertwingly.net/blog/2014/10/02/WHATWG-URL-vs-IETF-URI

On Mon, Jul 3, 2017 at 8:43 AM, L. Le Meur notifications@github.com wrote:

An identifier for the web publication, which should be a URL
Better, an IRI because a) may be a urn (up to the publisher to choose, the
Web doesn't care) and b) i18n is important. A URL to the origin is also
important but should be another property.



@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @llemeurfr on July 5, 2017 10:22

Re. URL vs IRI, after reading https://www.w3.org/International/wiki/IRIStatus, I must admit that this seems like a can of dirty worms. Apart from trying to allow for an extended i18n of publication identifiers, there is still the question of whether URNs are allowed as global identifiers. For instance, I spotted that most of @HadrienGardeur's Manifest samples use ISBN URNs as identifiers.

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @HadrienGardeur on July 5, 2017 12:47

@llemeurfr you're mixing up two different concepts regarding the Readium Web Publication Manifest.

Keep in mind that we started this work in the context of BFF and that for Readium-2 we mostly ingest EPUB files.

The only requirement in the draft document for the Readium WebPub Manifest is to always provide a self link. In the context of a Web Publication it makes perfect sense: if a publication lives on the Web, we need a URL that can point to its manifest.

Here's a basic example using the Readium WebPub Manifest model:

{
  "@context": "http://readium.org/webpub/default.jsonld",
  "metadata": {
    "title": "The Master and Margarita"
  },
  "links": [
    {"rel": "self", "href": "http://example.com/manifest.json", "type": "application/webpub+json"}
  ],
  "spine": [
    {"href": "http://example.com/chapter1", "type": "text/html"}
  ]
}

If the publication has an additional identifier, this can be provided in its metadata:

"metadata": {
  "title": "The Master and Margarita",
  "identifier": "urn:isbn:9780141180144"
}

That second identifier is not a requirement in the Readium model, and we can't expect all Web Publications to have such an identifier either.

The reason why most of our current samples have URNs (mostly for ISBNs or UUIDs) is because we ingest EPUB files or provide samples for books where ISBNs are very common.

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

I would like to add:
7. language(s) used in the WP - the plural is due to the fact that we will have publications such as parallel texts (original + one or more translations) and bilingual dictionaries, which contain 1-n languages. The language used also has implications for rendering (e.g. "ltr" vs "rtl", vertical layout)

My only concern is that HTML already has mechanisms for describing the language(s) of content. What happens when a user agent opens an HTML page declared with language A, finds a rel=manifest link, follows it, and sees language B declared?

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @HadrienGardeur on July 5, 2017 13:11

My only concern is that HTML already has mechanisms for describing the language(s) of content. What happens when a user agent opens an HTML page declared with language A, finds a rel=manifest link, follows it, and sees language B declared?

The manifest declares the language for the publication, while HTML is meant to declare the language for that resource.
The UA would simply set the default to language B but override that option with language A as it displays or interacts with that HTML page.
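
A minimal sketch of that default-then-override rule, under stated assumptions: `effective_language` is a hypothetical helper, the manifest is assumed to carry a list of language tags, and treating the first tag as the publication default is a choice made for this example rather than anything agreed in the thread.

```python
def effective_language(manifest_languages, resource_lang=None):
    """Pick the language a UA would use for one resource.

    manifest_languages: language tags declared for the whole publication.
    resource_lang: tag declared by the resource itself (e.g. the HTML
        lang attribute), or None if the resource declares nothing.
    """
    # The resource's own declaration always wins.
    if resource_lang:
        return resource_lang
    # Otherwise fall back to the publication-level default, if any.
    return manifest_languages[0] if manifest_languages else None


# HTML page declares language A ("en"); manifest declares language B ("fr").
page_language = effective_language(["fr"], resource_lang="en")
# A resource with no declaration inherits the publication default.
fallback_language = effective_language(["fr"])
```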

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @llemeurfr on July 5, 2017 14:4

you're mixing up two different concepts regarding the Readium Web Publication Manifest.

That's right. If a Web publication is copied to another website, this value will not be modified. Therefore a possible definition of the self link is "The original location of the Web Publication", which can be aligned with Requirement 8 for Web Publications: "There should be a way to uniquely identify a Web Publication."

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @HadrienGardeur on July 5, 2017 14:10

From RFC5988:

o Relation Name: self
o Description: Conveys an identifier for the link's context.
o Reference: [RFC4287]

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @WSchindler on July 5, 2017 15:36

It's of course true that via @lang or @xml:lang, you may define the language(s) used in your HTML. I still think that the point of entry for a UA consuming a WP would be the manifest, where it would be helpful to find information on the languages used in the WP. If you have a Chinese-English dictionary, it is IMO no trivial task to prepare the rendering.

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @lrosenthol on July 5, 2017 16:15

Actually, I would expect the UA to completely ignore the language settings
(A, in this case) in the manifest - and only concern itself with the actual
resource being processed/rendered (B, in this case). The language (or
languages) in the manifest have no bearing on the actual content - they are
(IMO) informational only.

On Wed, Jul 5, 2017 at 9:11 AM, Hadrien Gardeur notifications@github.com
wrote:

My only concern is that HTML already has mechanisms for describing the
language(s) of content. What happens when a user agent opens an HTML page
declared with language A, finds a rel=manifest link, follows it, and sees
language B declared?

The manifest declares the language for the publication, while HTML is
meant to declare the language for that resource.
The UA would simply set the default to language B but override that option
with language A as it displays or interacts with that HTML page.



@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @lrosenthol on July 5, 2017 16:16

If a Web publication is copied to another website, this value will not be modified

That's not necessarily true. The new site may well change the link(s) in the
manifest. There is nothing about it that is "off limits" - certainly not
in a WP, and possibly not even in a PWP.

On Wed, Jul 5, 2017 at 10:04 AM, L. Le Meur notifications@github.com
wrote:

you're mixing up two different concepts regarding the Readium Web
Publication Manifest.

That's right. If a Web publication is copied to another website, this
value will not be modified. Therefore a possible definition of the self
link is "The original location of the Web Publication", which can be
aligned with Requirement 8 for Web Publications: "There should be a way to
uniquely identify a Web Publication."



@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @HadrienGardeur on July 5, 2017 16:21

Actually, I would expect the UA to completely ignore the language settings (A, in this case) in the manifest - and only concern itself with the actual resource being processed/rendered (B, in this case). The language (or languages) in the manifest have no bearing on the actual content - they are (IMO) informational only.

While rendering content, sure I fully agree. But a UA can provide additional services on top of it, for example dictionaries or search. The publication metadata can be useful in that regard.

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @mattgarrish on July 5, 2017 16:21

I would expect the UA to completely ignore the language settings
(A, in this case) in the manifest

I agree it's informative and must not be used for rendering content (or metadata), but the same question about value has been raised in epub revisions and the case has been made that it does have uses (e.g., pre-loading tts languages, offering access to dictionaries, etc.).

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @lrosenthol on July 5, 2017 16:23

On Wed, Jul 5, 2017 at 12:21 PM, Hadrien Gardeur notifications@github.com
wrote:

But a UA can provide additional services on top of it, for example
dictionaries or search. The publication metadata can be useful in that
regard.

It could indeed be useful - and whether a UA chooses to use it for that or
not is (IMO) out of scope for our work.

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @HadrienGardeur on July 5, 2017 16:24

It could indeed be useful - and whether a UA chooses to use it for that or
not is (IMO) out of scope for our work.

Defining the UA behavior is out of scope, but making sure that it has relevant info needed is definitely within scope.

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

IMO the Navigation Document in EPUB 3 is a failed experiment. Most EPUB 3 documents that I've seen end up including at least two HTML tables of contents:

a nice looking one, included in the spine and not marked as being a Navigation Document
a basic one, used as the Navigation Document

I'm only responsible for around 25,000 EPUBs, but I've never seen an EPUB with two HTML tables of contents.

  1. Do other publishers here commonly do this?

  2. If so, why? I know that some reading systems don't support things like the hidden attribute, or list-style-type: none.

  3. Do others consider the nav document to be a failure?

@GarthConboy
Contributor

Well, our reading system, FWIW, builds its in-UX TOC from the Nav document (or NCX, in the old days) -- only very rarely is this sufficiently "basic" as to not be the one the user expects to use for actual navigation.

And, I'd suspect our A11Y community would not consider a global navigation document as a failed experiment.

(and I'm certainly willing to admit that there are numerous things we invented from whole cloth in EPUB-land that would deserve the "failed experiment" moniker, but I don't think Nav docs would be one of them)

@lrosenthol

lrosenthol commented Jul 11, 2017 via email

@mattgarrish
Member

Reading order is the list of files in sequence in which they're presented (the spine). Navigation document contains the table of contents (plus page list and landmarks).

Even if the spine documents were titled, you'd only get a rudimentary idea of the document outline from them, as when you factor in content chunking it won't even be clear what rank the headings have (i.e., not every document has to start with a level 1 heading).

@lrosenthol

lrosenthol commented Jul 11, 2017 via email

@mattgarrish
Member

One way in which they can be presented, you mean.

The reading order as defined by the spine isn't dynamic, even if the reader follows a non-linear path through it. I don't see how that is easily changed, unless the UA understands the content at some deeper level.

At any rate, even if the reading order were shuffled it doesn't change the limitations as a means of navigating the actual publication outline.

add it to the reading order where you think it belongs

PDF has bookmarks. Word has the ability to view the document outline. EPUB has the navigation document. Do we want WP to be the outlier without a programmatic method of discovering the outline?

@lrosenthol

lrosenthol commented Jul 11, 2017 via email

@HadrienGardeur

Thanks @GarthConboy for your proposal, I'll also reply point by point.

Identifier of WP. Required. Should be the URL to the first (or only) document in the reading order. [Allowing a clueless UA to get somewhere useful]

I disagree about this, for several reasons:

  1. The URL of the first document in the spine identifies that resource; it doesn't identify the publication itself. Mixing those two up is very confusing.
  2. What happens if a resource is included in several publications? Do these publications all share the same identifier? This is madness.
  3. A clueless UA won't expose the URL of the manifest anyway, it'll expose the URL of one of its constituent resources. It's perfectly fine to share any URL for a constituent resource and then discover the manifest through it (<link> in HTML, Link: in HTTP).
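
The discovery path in point 3 could look roughly like the Python sketch below, which finds a manifest URL either from a `<link>` element in the HTML or from an HTTP `Link:` header value. The relation name `manifest` is taken from the `rel=manifest` wording used earlier in this thread and is an assumption, not a decided design. The function names and sample URLs are hypothetical, and the header parsing is deliberately simplified rather than a full RFC 5988 parser.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkFinder(HTMLParser):
    """Remember the href of the first <link> whose rel includes 'manifest'."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "manifest" in rels and self.href is None:
            self.href = attributes.get("href")


def manifest_from_html(html, base_url):
    """Return the absolute manifest URL linked from an HTML page, or None."""
    finder = _LinkFinder()
    finder.feed(html)
    return urljoin(base_url, finder.href) if finder.href else None


# Simplified patterns: "<target>" followed by its parameters, and a rel parameter.
_LINK_VALUE = re.compile(r'<([^>]*)>([^<]*)')
_REL_PARAM = re.compile(r';\s*rel\s*=\s*"?([^";,]*)"?', re.IGNORECASE)


def manifest_from_link_header(header_value, base_url):
    """Return the absolute manifest URL from a Link header value, or None."""
    for target, params in _LINK_VALUE.findall(header_value):
        match = _REL_PARAM.search(params)
        if match and "manifest" in match.group(1).lower().split():
            return urljoin(base_url, target)
    return None


page = '<html><head><link rel="manifest" href="manifest.json"></head></html>'
from_html = manifest_from_html(page, "https://example.com/book/chapter1.html")

header = '<https://example.com/toc>; rel="contents", </book/manifest.json>; rel="manifest"'
from_header = manifest_from_link_header(header, "https://example.com/book/chapter1")
```

Either route starts from the URL of a constituent resource, so a user can share any chapter URL and a WP-aware UA can still reach the manifest.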

Identification as a WP. Required in the first (or only) document in the reading order. Likely not explicit, but should be implied by the presence of whatever of the below turns out to be required (e.g., minimal WP metadata, presence of, or a pointer to, the manifest).

What would you like to identify as a WP? If we're talking about a resource from the publication, it should be identified by either the presence of a link to a manifest, or because the UA has already accessed a manifest and knows that the resource is part of a publication.

For the manifest itself, it should be identified by a specific media type.

List of publication resources. Required (yes, one could debate the degenerate case of a one-resource WP, but I'd lean toward "required" and this likely serves to identify the beast as a WP).

In Readium WebPub Manifest we also opted for a requirement, at least one resource must be listed in spine.

Reading order. Required (with similar degenerate case comments as above).

In Readium, the only required list is the spine (which is listed in reading order). The other list (resources) is optional.

Metadata. Is some minimal set of metadata required? I lean toward "maybe required," but this clearly can be argued. I view it as required that there is an ability to specify WP metadata.

In Readium we require a title, but @lrosenthol is right that if we extend our scope to any sort of publication this might be problematic.

Nav doc. I lean toward optional. But, the presence of a machine-readable Nav Doc has many plusses. I do not view it as requirement that said document be directly renderable, but the dual nature of the current EPUB Nav Doc is not all bad. A11Y issues also abound.

I strongly lean towards optional. I think we should offer (both as options):

  • the ability to indicate that an HTML resource is meant for navigation, and render this publication as-is without requiring all the weird authoring rules associated with NavDocs in EPUB.
  • a separate machine-readable option in the manifest

The machine-readable info in the manifest should contain all navigation not rendered directly to the user (either because it shows up in the UI of the UA instead, or because it's useful for internal stuff).

From @lrosenthol

Maybe. I've been thinking more about how resources are really connected to
the content page and not to the publication. Do we (well, the UA) really
need (to load) the full 1000 images used by publication up front? Not
necessarily - it may only want/need what is required to load the first
content document.
We should be sure to design with optimization and performance in mind from
the start...

Unlike the <manifest> in EPUB, we shouldn't expect a manifest to reference all resources available in a Web Publication. It should only reference those that are deemed very important for rendering content.

This would leave a lot of wiggle room for the kind of edge case scenario that you're thinking about (gigantic Web Publications).

@GarthConboy
Contributor

GarthConboy commented Jul 12, 2017

Thanks @GarthConboy for your proposal, I'll also reply point by point.

I'll reply point by point too. Though can't help but comment that this sort of design work is really hampered by use of github issues -- something (e.g., Google Docs) where you can really comment in place and have conversations would be better! :-)

Identifier of WP. Required. Should be the URL to the first (or only) document in the reading order. [Allowing a clueless UA to get somewhere useful]

I disagree about this, for several reasons:

  1. The URL of the first document in the spine identifies that resource; it doesn't identify the publication itself. Mixing those two up is very confusing.
  2. What happens if a resource is included in several publications? Do these publications all share the same identifier? This is madness.
  3. A clueless UA won't expose the URL of the manifest anyway, it'll expose the URL of one of its constituent resources. It's perfectly fine to share any URL for a constituent resource and then discover the manifest through it (<link> in HTML, Link: in HTTP).

I think we only partially disagree. I think it is unwise to have the identifier of the publication be the URL of the manifest, as a clueless UA would render either nothing or something it doesn't understand (depending on format of said manifest). I do think the manifest should be pointed to either from the first markup document in the reading order, or potentially even from all of the markup documents in the reading order (yes, likely through a link as you say) -- this is the same madness you refer to -- but it doesn't seem that mad to me!

Identification as a WP. Required in the first (or only) document in the reading order. Likely not explicit, but should be implied by the presence of whatever of the below turns out to be required (e.g., minimal WP metadata, presence of, or a pointer to, the manifest).

What would you like to identify as a WP? If we're talking about a resource from the publication, it should be identified by either the presence of a link to a manifest, or because the UA has already accessed a manifest and knows that the resource is part of a publication.

I think Dave wanted to be able to identify the "site" as a WP -- and yes, I think the presence of a link to the manifest would be a fine way of doing that -- that's one of the options I was proposing.

For the manifest itself, it should be identified by a specific media type.

Agree.

List of publication resources. Required (yes, one could debate the degenerate case of a one-resource WP, but I'd lean toward "required" and this likely serves to identify the beast as a WP).

In Readium WebPub Manifest we also opted for a requirement, at least one resource must be listed in spine.

Agree.

Reading order. Required (with similar degenerate case comments as above).

In Readium, the only required list is the spine (which is listed in reading order). The other list (resources) is optional.

Somewhat agree. I have the list of resources as required (above), but that's almost a detail.

Metadata. Is some minimal set of metadata required? I lean toward "maybe required," but this clearly can be argued. I view it as required that there is an ability to specify WP metadata.

In Readium we require a title, but @lrosenthol is right that if we extend our scope to any sort of publication this might be problematic.

Close to agree.

Nav doc. I lean toward optional. But, the presence of a machine-readable Nav Doc has many plusses. I do not view it as requirement that said document be directly renderable, but the dual nature of the current EPUB Nav Doc is not all bad. A11Y issues also abound.

I strongly lean towards optional. I think we should offer (both as options):

  • the ability to indicate that an HTML resource is meant for navigation, and render this publication as-is without requiring all the weird authoring rules associated with NavDocs in EPUB.
  • a separate machine-readable option in the manifest

The machine-readable info in the manifest should contain all navigation not rendered directly to the user (either because it shows up in the UI of the UA instead, or because it's useful for internal stuff).

I think this will be the root of lots of conversation, but I don't think we're too far apart.

@HadrienGardeur

I'll reply point by point too. Though can't help but comment that this sort of design work is really hampered by use of github issues -- something (e.g., Google Docs) where you can really comment in place and have conversations would be better! :-)

@GarthConboy yeah, it's not always easy to follow all threads in a discussion...

I think we only partially disagree. I think it is unwise to have the identifier of the publication be the URL of the manifest, as a clueless UA would render either nothing or something it doesn't understand (depending on format of said manifest).

This is where I strongly disagree. I think that a clueless UA won't ever be aware of the URL of a manifest, and that even in a WP aware UA, users will never be aware of the URL of a manifest either.

If they're not aware of its existence and therefore don't share it, we don't have any problem at all using the URL of the manifest as the WP identifier.

I do think the manifest should be pointed to either from the first markup document in the reading order, or potentially even from all of the markup documents in the reading order (yes, likely through a link as you say) -- this is the same madness you refer to -- but it doesn't seem that mad to me!

I think this should be entirely up to the author/publisher to decide where and when they include discovery. There shouldn't be any requirement IMO.

Also, I'd like to have the ability to remix content on the Web. If I have zero write-access to the content that I'd like to remix within a Web Publication, there's no way I'll be able to include such a link in HTML or HTTP.

About navigation

I think this will be the root of lots of conversation, but I don't think we're far apart.

Right, but I think a lot of the arguments in favour of including all machine-readable navigation in HTML are misguided:

  • as I've already explained in a previous comment, developers are pretty much limited to plain text in whatever UI they're building, HTML doesn't help at all when it's not directly rendered as-is
  • working with HTML is more difficult than working with JSON or XML when extracting machine-readable info
  • restrictions regarding how HTML must be authored are harmful, these restrictions mostly exist to make the NavDoc more machine-readable. Such restrictions would also limit the ability to re-use existing HTML resources and simply mark them as navigation.
  • including content that is not useful for the user (such as page-list) can be very problematic, that's even more of an issue since the hidden attribute is not always supported

@GarthConboy
Contributor

@HadrienGardeur I think we may be typing past each other on the first issue above. I'm presuming that the identifier of a WP will be a URL, and that the only two logical places for this URL to point are the publication's manifest or the first markup document in the reading order. Do you have a different view? Or do you not view the identifier as a URL at all?

This was referenced Jul 12, 2017
@lrosenthol

@HadrienGardeur

Also, I'd like to have the ability to remix content on the Web. If I have zero write-access to the content that I'd like to remix within a Web Publication, there's no way I'll be able to include such a link in HTML or HTTP.

I moved the discussion of this item over to its own issue at #8

@lrosenthol

@HadrienGardeur

Right, but I think a lot of the arguments in favour of including all machine-readable navigation in HTML are misguided:

I moved the discussion of this item over to its own issue at #9

@HadrienGardeur

@GarthConboy

  1. Fully agree that the identifier of a WP should be a URL, that's also a requirement in the Readium WebPub Manifest.
  2. I don't think that the URL can be the same URL as the first markup document in the reading order for the reasons listed in a previous comment
  3. It could be a different URL from the first markup document and still rely on content negotiation and/or an HTTP redirect to point to that document, but I don't think that's possible either, because we won't be able to use those mechanisms in certain situations.
  4. I think it's perfectly acceptable to use the URL of the manifest as the identifier of the WP, because a clueless UA won't be aware of its existence, and users won't be aware of its existence either. The scenario where someone discovers a WP through a manifest and ends up with something that they can't use is very unlikely to ever happen.

@GarthConboy
Contributor

Yep -- I think we found our disagreement. I think if the identifier is a URL, folks will send it around, and will expect it to "work". Thus, I disagree with your #2 and #4 above, and I still favor the identifier being the first content document. But, since I'm missing the call on Monday, you all can agree to something else, and I'll just whine for subsequent years.

@HadrienGardeur

I fail to understand how you can completely disagree with my second point. Let me quote my previous comment precisely:

  1. The URL of the first document in the spine identifies that resource, it doesn't identify the publication itself. Mixing those two up is very confusing.
  2. What happens if a resource is included in several publications? Do these publications all share the same identifier? This is madness.
  3. A clueless UA won't expose the URL of the manifest anyway, it'll expose the URL of one of its constituent resources. It's perfectly fine to share any URL for a constituent resource and then discover the manifest through it (`<link>` in HTML, `Link:` in HTTP).

These are real issues, how do you address them if you decide that the URL of the first content document is the identifier of the WP?

The only situation that would make this acceptable is if we embed the manifest in HTML (which I find problematic for completely different reasons).
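
For illustration, here is a minimal sketch of the discovery step described in point 3: starting from any constituent resource and finding the manifest through a `<link>` element in HTML or a `Link:` header in HTTP. The relation name `wp-manifest` and the helper names are placeholders made up for this example, since the thread has not settled on a link relation, and the header parsing is deliberately naive.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Placeholder relation name: an assumption for this sketch, not something
# agreed in this thread or defined by any specification.
MANIFEST_REL = "wp-manifest"


class _LinkFinder(HTMLParser):
    """Remembers the href of the first <link> whose rel contains `rel`."""

    def __init__(self, rel):
        super().__init__()
        self.rel = rel
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attr = dict(attrs)
        # rel is a space-separated list of tokens; valueless attributes are None.
        rels = (attr.get("rel") or "").lower().split()
        if self.rel in rels and attr.get("href"):
            self.href = attr["href"]


def manifest_from_html(html, base_url, rel=MANIFEST_REL):
    """Return the absolute manifest URL linked from an HTML resource, or None."""
    finder = _LinkFinder(rel)
    finder.feed(html)
    return urljoin(base_url, finder.href) if finder.href else None


def manifest_from_link_header(header, base_url, rel=MANIFEST_REL):
    """Return the absolute manifest URL from a Link header value, or None.

    Naive parsing: splits on "," and ";", so it does not handle commas
    inside URLs or quoted parameter values.
    """
    for part in header.split(","):
        segments = [s.strip() for s in part.split(";")]
        target = segments[0]
        if not (target.startswith("<") and target.endswith(">")):
            continue
        for param in segments[1:]:
            name, _, value = param.partition("=")
            tokens = value.strip().strip('"').lower().split()
            if name.strip().lower() == "rel" and rel in tokens:
                return urljoin(base_url, target[1:-1])
    return None
```

A clueless UA would never call either function and would simply render the resource; a clueful UA could try the HTTP header first and fall back to the HTML.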

@GarthConboy
Contributor

@HadrienGardeur yep, guess I disagree with two of the three! :-)

  1. The URL of the first document in the spine identifies that resource, it doesn't identify the publication itself. Mixing those two up is very confusing.

I don't really see why. Assuming the first document in the spine points to the manifest or contains the manifest, this seems an elegant solution -- a clueless UA can do something useful, and a clueful UA can chase down (or process) the manifest and do something better.

  1. What happens if a resource is included in several publications? Do these publications all share the same identifier? This is madness.

I'd think "don't do that" is a fine answer (bug for bug compatible with EPUB today). And like you said, this could lead to wanting to include the manifest rather than link to it.

  1. A clueless UA won't expose the URL of the manifest anyway, it'll expose the URL of one of its constituent resources. It's perfectly fine to share any URL for a constituent resource and then discover the manifest through it (`<link>` in HTML, `Link:` in HTTP).

See #1. If the clueless UA gets the URL to the "lead" resource, it can just render the content, either ignoring an embedded manifest or not bothering to follow the link to an external one.

@HadrienGardeur

HadrienGardeur commented Jul 12, 2017

@GarthConboy

Since this argument is spread across issues, I've also had to reply to that same question in a separate issue.

I don't really see why. Assuming the first document in the spine points to the manifest or contains the manifest, this seems an elegant solution -- a clueless UA can do something useful, and a clueful UA can chase down (or process) the manifest and do something better.

It only feels elegant if the manifest and the first resource are one and the same (manifest embedded in HTML). Otherwise it feels very confusing to me to use the same identifier (URL) for two different concepts (publication vs resource).

@GarthConboy
Contributor

GarthConboy commented Jul 12, 2017

@HadrienGardeur -- Yep saw that too. Doesn't make me a convert. But, we'll see where the group ends up.

This is probably an issue we should try to resolve very soon, as it drives a number of subsequent decisions.

@HadrienGardeur

Side question: what if I can't include a link to the manifest in the first resource in reading order?

What happens then?

@lrosenthol

lrosenthol commented Jul 12, 2017 via email

@HadrienGardeur

HadrienGardeur commented Jul 12, 2017

What if the publication already exists on the Web and someone else would like to author a manifest for it?

There are plenty of publications on the Web that could benefit from such a manifest, I'll re-use the same example: http://poignant.guide/

@lrosenthol

lrosenthol commented Jul 12, 2017 via email

@HadrienGardeur

I see that you're carefully being very neutral by talking about a "front page" ;-)

So, this means that I would either need to:

  • simply publish a JSON manifest if we use the URL of the manifest as the WP identifier
  • or publish an extra HTML resource alongside the manifest (and include that HTML resource in the spine) if the first resource is used as the WP identifier
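
As a rough sketch of the first option, this is what "simply publish a JSON manifest" could look like when the manifest's own URL doubles as the WP identifier and the constituent resources live on servers the author cannot edit. The field names (`identifier`, `title`, `readingOrder`), the helper `build_manifest` and the `example.org` URLs are all assumptions made up for illustration; nothing in this thread fixes a manifest vocabulary.

```python
import json


def build_manifest(manifest_url, title, reading_order):
    """Build a manifest dict whose identifier is the manifest's own URL.

    Field names are hypothetical; they only illustrate the information
    content discussed in this issue (identifier, resources, order).
    """
    if not reading_order:
        raise ValueError("a publication needs at least one resource")
    return {
        "identifier": manifest_url,           # the WP is identified by the manifest URL
        "title": title,
        "readingOrder": list(reading_order),  # preferred order of constituent resources
    }


manifest = build_manifest(
    "https://example.org/remix/manifest.json",  # placeholder URL
    "A remixed publication",
    [
        "http://poignant.guide/",                    # existing content, no write access needed
        "https://example.org/remix/afterword.html",  # placeholder URL
    ],
)

# The serialized JSON is the only thing the remixer has to publish.
print(json.dumps(manifest, indent=2))
```

Under the second option, the same author would also have to publish a new HTML "front page" and put it first in the reading order so that its URL could serve as the identifier.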

@lrosenthol

lrosenthol commented Jul 12, 2017 via email

@laudrain

Sorry for the late comment, but none of our EPUBs have the nav doc in the spine.
I completely support Hadrien in the idea that HTML is not a perfect language for structuring information but is quite ok for rendering.
That's how we understand the use, in existing EPUBs, of the nav (structured information) alongside a true ToC in HTML, placed in the spine, for navigation.
When a book has no printed ToC, the EPUB has one in the spine covering all chapters to allow navigation, and a nav doc for a11y (chapter list, page list, etc.).
So I vote for:

  • TOC as HTML content document (optional)
  • Structured information (required) in any appropriate language (JSON, XML, ...)

@TzviyaSiegman
Contributor

This issue is addressed by #7 and #15
