The URI, URL, and URN of a web publication #11

Closed
dauwhe opened this issue Jul 14, 2017 · 46 comments

Comments

@dauwhe
Contributor

dauwhe commented Jul 14, 2017

This is another issue being extracted from #5, with the hope that we will focus on the matter at hand.

How do we identify and locate a web publication? We do not seem to have an agreement yet on whether a WP should be located by the URL of the manifest, or the URL of the first content document. I have a strong preference for the latter.

@TzviyaSiegman
Contributor

not to be too nitpicky, but I think we have agreed to use the term "address". We are not in the business of identifying anything ;-)

@HadrienGardeur

  1. The "start" link and the "self" link are two different concepts that can be separately handled by a manifest. This way the self link can be exchanged in APIs or specialized UAs, while the start link is what the user is aware of.
  2. Many other technologies (including Web App Manifest and RSS/Atom) have a standalone machine-readable document that basic UAs do not understand or render in any specific way. Yet this has never been a problem for these technologies. Why would Web Publications be any different?
  3. While auto-discovery is a good thing, we can't force it and can't expect that the first content document will always contain a link to the manifest or an embedded manifest.
  4. If two publications have the same first content document, we still need an address for each publication.
  5. For a UA, it's much more convenient to have direct access to a manifest than to either have to extract an embedded manifest, or indirectly discover it through a link.
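
To make the distinction in point 1 concrete, here is a minimal sketch of a manifest that carries both links. The `links` array, the `rel` values `self` and `start`, and the example.com URLs are assumptions made for illustration; no manifest syntax has been agreed on in this thread.

```python
import json
from typing import Optional

# Hypothetical manifest: the structure and the rel names are assumptions for illustration only.
MANIFEST_JSON = """
{
  "title": "Moby Dick",
  "links": [
    {"rel": "self", "href": "https://example.com/mobydick/manifest.json"},
    {"rel": "start", "href": "https://example.com/mobydick/ch1.html"}
  ]
}
"""

def find_link(manifest: dict, rel: str) -> Optional[str]:
    """Return the href of the first link with the given rel, or None."""
    for link in manifest.get("links", []):
        if link.get("rel") == rel:
            return link.get("href")
    return None

manifest = json.loads(MANIFEST_JSON)
self_url = find_link(manifest, "self")    # what APIs and specialized UAs would exchange
start_url = find_link(manifest, "start")  # what the user would see and share
```

Keeping the two lookups separate is the point: a UA can fetch `self_url` without ever showing it to the user.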

@rdeltour
Member

The "start" link and the "self" link are two different concepts

Right, good point.

While auto-discovery is a good thing, we can't force it and can't expect that the first content document will always contain a link to the manifest or an embedded manifest.

Why can't we? PWAs require that the app manifest is linked from the HTML (there's no spec, but it's the common documented practice, the "cow path"), why would Web Pub be any different?

For a UA, it's much more convenient to have direct access to a manifest than to either have to extract an embedded manifest, or indirectly discover it through a link.

For a browser, I'd say it's much more convenient to get HTML first.

@HadrienGardeur

Why can't we? PWAs require that the app manifest is linked from the HTML (there's no spec, but it's the common documented practice, the "cow path"), why would Web Pub be any different?

They don't require anything. They give you the possibility to offer auto-discovery.

You're not required to include a link to a Web App Manifest from all resources of a Web App, you include a link whenever you want and/or need it.

For a browser, I'd say it's much more convenient to get HTML first.

Not for a browser that knows how to deal with a WP. The other browsers would never be aware of the existence of this manifest anyway.

@rdeltour
Member

You're not required to include a link to a Web App Manifest from all resources of a Web App, you include a link whenever you want and/or need it.

Right, but if you want the Web App Manifest to actually be used, you do need this link, right? Is there any implementation which uses a Web App Manifest directly? (I'm not aware of any.)

The other browsers would never be aware of the existence of this manifest anyway.

Exactly, that's why pointing to HTML first offers better graceful degradation.

@HadrienGardeur

Right, but if you want to make the Web App Manifest actually used, you do need this link, right? Is there any implem which uses a Web App Manifest directly? (I'm not aware of any).

Sure, any API could rely entirely on Web App Manifest directly. For example an OPDS integration in a Readium-2 based UA.
I can imagine many use cases that wouldn't require auto-discovery to exchange and display a WP.

Exactly, that's why pointing to HTML first offers better graceful degradation.

Do you point an RSS reader to a website first and then let it figure out how to discover the right RSS (there could be multiple links)? Of course not.
It's exactly the same here.

Once again, the main issue here is that @dauwhe is mixing up two concepts together ("start" and "self").

As long as we have a standalone manifest (instead of an embedded one), it'll be accessible directly in a basic UA. This has nothing to do with using this URI as an identifier or not, and saying that the "self" link should be the URI of the manifest won't make the problem any worse or better.

@HadrienGardeur

Since we've moved to separate issues, I'll quote a previous comment:

What we use as the WP identifier is deeply tied to how we represent and/or link to a manifest.

If a manifest is an external resource with its own URI:

  • it makes it easier for a UA to directly access and use the manifest
  • it also means that one way or another, this manifest could be accessed by a clueless UA that won't know what to do about it

If a manifest is embedded in HTML (most likely using <script>):

  • it makes it more difficult for a UA to access and extract the manifest
  • it makes perfect sense for the URL to that HTML to be used as the WP identifier
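
As a rough illustration of the extra work the embedded case asks of a UA, the sketch below pulls a JSON manifest out of a `<script>` element using only the standard library. The `id="wp-manifest"` marker and the JSON payload are hypothetical; the thread only says the manifest would "most likely" use `<script>`.

```python
import json
from html.parser import HTMLParser
from typing import Optional

# Assumption: the embedded manifest lives in <script id="wp-manifest">...</script>.
# The id is a placeholder invented for this sketch.
class EmbeddedManifestParser(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self._in_manifest = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "wp-manifest":
            self._in_manifest = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_manifest = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so collect and join later.
        if self._in_manifest:
            self._chunks.append(data)

    def manifest(self) -> Optional[dict]:
        text = "".join(self._chunks).strip()
        return json.loads(text) if text else None

def extract_embedded_manifest(html: str) -> Optional[dict]:
    """Parse the whole HTML document and return the embedded manifest, if any."""
    parser = EmbeddedManifestParser()
    parser.feed(html)
    parser.close()
    return parser.manifest()
```

An external manifest needs only `json.loads` on the response body, which is the convenience argument made above.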

What I completely fail to understand, is why some people in this group (including @dauwhe & @GarthConboy) believe that using the URL of the manifest as the WP identifier will make it more likely to be displayed by a clueless UA.

The fact that a clueless UA won't be able to understand and use a URL directly to an external manifest has nothing to do with the WP identifier itself.

If you're completely against the manifest being directly accessible to a clueless UA, then you should also be in favour of strictly embedding the manifest in HTML (like @frivoal), but that doesn't seem to be the case (at least for @GarthConboy).

@rdeltour
Member

Sure, any API could rely entirely on Web App Manifest directly. For example an OPDS integration in a Readium-2 based UA.
I can imagine many use cases that wouldn't require auto-discovery to exchange and display a WP.

Oh, I can very well imagine how manifest-first could work. I was just commenting that it's not how web app manifests are currently used.

Do you point an RSS reader to a website first and then let it figure out how to discover the right RSS (there could be multiple links)? Of course not.
It's exactly the same here.

I would point a user to a blog URL, and their browser would discover the RSS feed. In most cases, I would also paste the blog URL straight to my RSS client, which would discover the feed.

For a blog, I would posit that in practice both self and start are the URL of the blog site, not its RSS feed.

@HadrienGardeur

Oh, I can very well imagine how manifest-first could work. I was just commenting that it's not how web app manifests are currently used.

Web Apps are not entirely dependent on auto-discovery, Microsoft for instance has announced integration of PWAs in the Windows 10 store.
This is exactly the same concept as discovering Web Publications through an API (for example my library's catalog).

@BigBlueHat
Member

This is exactly the same concept as discovering Web Publications through an API (for example my library's catalog).

Well...not exactly. 😃 One of the beautiful things about "web publications" (writ small as I'm referring to things published on the Web right now) is that you can find the whole from its parts. This can also be true within a library's indexing system--where a search brings up not only a book reference, but also a specific chapter or page. In a webby publication, I could browse/find/retrieve just that page and then (at my option) get the rest of the publication in which that page is a part.

That's typically how PWAs are distributed outside of stores: any user-facing content (typically HTML pages) references the manifest for the PWA it's a part of. If you land on any of them, you have (assuming modern tech is in use) the option to "install" (or "keep") the PWA of which that page is a part.

Sometimes those PWAs may only have one "page." Sometimes, they may have many.

Same situation is true of rel="serviceworker" fwiw. It can be referenced from any resource, but defines a scope as part of its registration: https://w3c.github.io/ServiceWorker/#link-type-serviceworker

Whatever we build, let's keep it webby.

@HadrienGardeur

HadrienGardeur commented Jul 17, 2017

You're missing the point @BigBlueHat, I've never argued against discovery and in fact I've posted numerous examples about how this can be handled (link in HTML and Link: HTTP header).

But @rdeltour was arguing that this is the only use case, which isn't true.

The discoverable and distributed nature of it doesn't mean that it can't also be useful in a more controlled environment (a library's catalog being a very good example of such an environment for publications).
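
For readers who have not seen the header-based variant, here is a simplified sketch of discovery through an HTTP `Link:` header. Using `rel="manifest"` for a Web Publication is an assumption borrowed from Web App Manifest, and the parser deliberately covers only the common `<url>; rel="..."` shape rather than every case in RFC 8288.

```python
import re
from typing import Optional

# Simplified parsing: entries look like <url>; rel="a b" and are separated by commas.
# Commas inside quoted parameters and other edge cases are not handled.
_LINK_RE = re.compile(r'<([^>]*)>\s*((?:;\s*[^;,]+)*)')
_REL_RE = re.compile(r'rel\s*=\s*"?([^";,]+)"?', re.IGNORECASE)

def manifest_from_link_header(header: str, rel: str = "manifest") -> Optional[str]:
    """Return the first target URL whose rel list contains `rel`, or None."""
    for url, params in _LINK_RE.findall(header):
        match = _REL_RE.search(params)
        if match and rel in match.group(1).lower().split():
            return url
    return None
```

This route needs no change to the content document itself, which matters when existing HTML is reused as is.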

@HadrienGardeur

If you land on any of them, you have (assuming modern tech is in use) the option to "install" (or "keep") the PWA of which that page is a part.

BTW, that part specifically isn't true. If you include a link to a Web App Manifest on every single page of your Web App you get into some very weird behaviours (I've seen Chrome displaying the Web App Manifest install banner in an app that I've already installed for instance), so in general you should avoid including a link to it on every single page.

I'm not saying that we should avoid including a link on every content document of a Web Publication, but forcing them is IMO a bad idea (in some situations you might not be able to provide such links).

@rdeltour
Member

I've never argued against discovery and in fact I've posted numerous examples about how this can be handled (link in HTML and Link: HTTP header).

But @rdeltour was arguing that this is the only use case, which isn't true.

For the record, I was arguing that URLs to HTML documents, with a discoverable link to a manifest, was by far the most frequent usage pattern in today's Web (notably with PWAs).
There may be existing real-world use cases dealing with direct links to manifests (I wasn't aware of Microsoft's app store case) but the above is still true AFAIK.

@dauwhe
Contributor Author

dauwhe commented Jul 17, 2017

As a user, I find nearly everything on the web by clicking on a link or typing a URL. The links to most web applications I've seen (1) are described by a URL that makes no reference to a specific file or file extension and (2) resolve to an HTML document. And from the point of view of an end user, that link does serve as the identifier for a web app or web site. Do I want to go to google? Then I use google.com. Do I want to read the CouchDB book? I go to http://guide.couchdb.org. Look at the Firefox Platform Status PWA? I go to https://platform-status.mozilla.org. In some cases there's an index.html, in other cases there isn't; as a user, I don't care. In my mind, the identity of the web app or site is the URL that allows me to use the app.

I notice that the web app manifest spec doesn't include a rel=self link or equivalent. What's different about web publications that we would need such a thing?

@HadrienGardeur

HadrienGardeur commented Jul 17, 2017

We can go over the same issue again and again @dauwhe, but you're not really answering the question that I've asked several times now: why do you believe that having the URI of the manifest as a WP identifier will have any impact on people opening such URIs in a basic UA?

As long as we have a standalone manifest, this will always be possible and this has nothing to do with what we use as an identifier.

Are you in agreement with @frivoal and suggesting that the manifest MUST be embedded in HTML?

@murata2makoto

I am wondering if we can postpone this issue for a while. A WP may have a manifest, the first spine item, and the navigation document. I think that the URI of a WP is either that of its manifest, first spine item, or the nav. doc, and nothing else. Isn't this good enough for now? Pros and cons of each option would be nice. Remember that our FWPD will be a good chance to provide a better big picture and invite browser vendors.

@HadrienGardeur

I notice that the web app manifest spec doesn't include a rel=self link or equivalent. What's different about web publications that we would need such a thing?

You don't really need to update a Web App Manifest, while this will be very important for a Web Publication where caching and packaging will most likely be based on a declarative approach rather than a scripted one.

This means that we'll need to check if a Web Publication Manifest has been updated fairly often, and having a self/canonical URL for that is extremely useful.

Atom feeds have a self link for the exact same reason but that link will be even more important on a Web Publication where the manifest can be cached for much longer periods of time.
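
One way to picture the "check whether the manifest has been updated" step is a conditional GET against the self URL. The sketch below only models the decision logic; the `CachedManifest` type and the choice of `ETag` / `If-None-Match` as the validator are assumptions, since the thread does not say how updates would be detected.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CachedManifest:
    self_url: str          # canonical URL taken from the manifest's self link
    etag: Optional[str]    # validator stored from the last successful fetch
    body: str

def revalidation_headers(cached: CachedManifest) -> dict:
    """Headers for a conditional GET sent to the manifest's self URL."""
    return {"If-None-Match": cached.etag} if cached.etag else {}

def apply_response(cached: CachedManifest, status: int,
                   etag: Optional[str], body: str) -> CachedManifest:
    """Keep the cached copy on 304 Not Modified; replace it on 200 OK."""
    if status == 304:
        return cached
    if status == 200:
        return CachedManifest(cached.self_url, etag, body)
    raise ValueError(f"unexpected status {status}")
```

Without a stable self URL stored alongside the cached copy, the UA would not know where to send that request once the manifest has been packaged or moved.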

@rdeltour
Member

rdeltour commented Jul 18, 2017

While catching up with other threads, it occurs to me that we may not all be talking about the same things. I was considering a URL to a Web Pub as a locator or address (in other words, a user-facing concept; what would be linked to from another web page), whereas @HadrienGardeur seems to be talking about pointing at the manifest as the Web Pub identifier.

Am I correct? If yes, what do we exactly mean by identifier, what's the use case and definition?

@GarthConboy
Contributor

Indeed. Good question. And should they be (or do they need to be) different or separable concepts?

@lrosenthol

lrosenthol commented Jul 18, 2017 via email

@frivoal
Contributor

frivoal commented Jul 18, 2017

If the goal is de-duplication, isn't the web's way of solving that rel=canonical, rather than having each instance of the thing include an ID of some kind (be it a URI or not)?

@TzviyaSiegman
Contributor

This is why I commented that we were talking about addressing, not identification. We have explicitly stated that PWG is NOT working on identification in the sense of unique for a publication. If Wiley publishes Moby Dick and Hachette publishes Moby Dick, PWG is not going to solve the problem of resolving identifiers between the 2 versions. Other organizations have worked on that. PWG WILL work on addressing those identifiers in a web-friendly manner.

@HadrienGardeur

HadrienGardeur commented Jul 18, 2017

There isn't a single locator for a publication; there are as many locators as there are content documents in that publication, as has been pointed out by multiple people here including @BigBlueHat in a previous comment.

That said, one specific content document in a publication may be deemed as more important than others. I don't think that's necessarily the first one in the reading order or the navigation document (if we have any), and marking that resource as the "start" could be one way to identify it.

The URI of the manifest on the other hand is best suited as a WP identifier since:

  • it's the only resource that can uniquely identify the WP, all other resources can be included in another publication too and therefore can't identify a publication
  • it's also extremely useful to have it as a canonical location for a manifest, since we'll need to check the manifest on a regular basis even in the context of a PWP

In Atom and other API formats, this is often done by including a self link, but as @frivoal pointed out this is also similar to how canonical is used in HTML.

@HadrienGardeur

We have explicitly stated that PWG is NOT working on identification in the sense of unique for a publication. If Wiley publishes Moby Dick and Hachette publishes Moby Dick, PWG is not going to solve the problem of resolving identifiers between the 2 versions.

@TzviyaSiegman I don't think anyone attempted to address that specific problem in this issue or in another issue so far.

Providing an identifier for a WP is within the mission of this WG. We haven't talked about Work-Expression-Manifestation-Item anywhere so far (and hopefully we won't).

@dauwhe
Contributor Author

dauwhe commented Jul 18, 2017

Let's talk purely about locators. I'm finding it helpful to look at existing web practices, especially around web apps, which I think are very closely related to web publications, being bundles of web resources viewed as a whole. So let's say I want to find out about what features Firefox is working on. A google search turns up a URL:

https://platform-status.mozilla.org

And indeed, this brings me to the web app. There's actually an index.html file, but I wouldn't even know that unless I checked:

https://platform-status.mozilla.org/index.html

My browser also sees that there's a link to a manifest in that HTML page, so it knows about the manifest. As a user, I didn't have to worry about that either. In fact, the only way I can find out the URL of the manifest is to view source on the HTML page.

https://platform-status.mozilla.org/manifest.json

If I tried to locate the web app by using the URL of the manifest, I'd see the raw JSON, which is not a good user experience.

For me, it makes sense that the URL of this web app is https://platform-status.mozilla.org/; that's all I need to know to access the content. The manifest does contain a start_url, which happens to be /?utm_source=manifest (which is interesting in itself). But note that even start_url is optional in the web app manifest spec.
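
As a small worked example of how that `start_url` value turns into an address, the snippet below resolves it against the manifest URL quoted earlier, which is the base the Web App Manifest spec uses for this member. Only the variable names are invented here.

```python
from urllib.parse import urljoin

manifest_url = "https://platform-status.mozilla.org/manifest.json"
start_url = "/?utm_source=manifest"  # value quoted in the comment above

# Resolve the relative start_url using the manifest URL as the base.
resolved = urljoin(manifest_url, start_url)
print(resolved)  # https://platform-status.mozilla.org/?utm_source=manifest
```

The result is the same document a user reaches by typing the bare host name, with a query string added so the site can tell the launch came from the manifest.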

We seem to have a pattern here. A URL leads to a web page which contains a link to a manifest. This is explicit in the web app manifest spec:

A resource is said to be associated with a manifest if the resource representation, an HTML document, has a manifest link relationship.

I think this pattern can work just as well for web publications. Can we just say:

The URL of a Web Publication must resolve to a content document containing either a link to a manifest or an embedded manifest
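
A conformance check for that proposed sentence could look roughly like the sketch below. It assumes a linked manifest is expressed as `<link rel="manifest" href="...">`, as Web App Manifest does, and that an embedded one sits in `<script id="wp-manifest">`; the `id` is a made-up placeholder, since no embedding syntax has been chosen.

```python
from html.parser import HTMLParser

class ManifestProbe(HTMLParser):
    """Record whether a document links to a manifest or embeds one."""

    def __init__(self):
        super().__init__()
        self.linked_href = None
        self.has_embedded = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "manifest" in rels and self.linked_href is None:
            self.linked_href = attributes.get("href")
        elif tag == "script" and attributes.get("id") == "wp-manifest":
            self.has_embedded = True

def satisfies_proposal(html: str) -> bool:
    """True if the content document links to a manifest or embeds one."""
    probe = ManifestProbe()
    probe.feed(html)
    probe.close()
    return probe.linked_href is not None or probe.has_embedded
```

A plain HTML chapter with neither marker would fail this check, which is exactly the case the SHOULD versus MUST disagreement in this thread is about.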

@TzviyaSiegman
Contributor

+1 to @dauwhe's suggestion. This builds on existing functionality. Looking at the Web App Manifest start_url more carefully, it seems to do a lot of the things about which we've been speculating:

The start_url member is a string that represents the start URL , which is URL that the developer would prefer the user agent load when the user launches the web application (e.g., when the user clicks on the icon of the web application from a device's application menu or homescreen).

As we've discussed in the past, the spec is likely adaptable to publications, so s/application/publication.
[1] https://www.w3.org/TR/appmanifest/#start_url-member

@HadrienGardeur

Sorry @dauwhe but at least for me this doesn't help to move the conversation forward. I've asked a question that IMO is the very key to that discussion several times, but you're simply avoiding it.

I'll reply anyway, but this is not going anywhere...

And indeed, this brings me to the web app. There's actually an index.html file

Can we please stop talking about files and folders? This is a resource, not a file. It may be a static file on a server, but it might be as well a simple route handled by an app.

My browser also sees that there's a link to a manifest in that HTML page, so it knows about the manifest.

I don't think that anyone is arguing against discoverability, I've only argued against discoverability as a requirement (I strongly believe that this is a SHOULD, not a MUST).

But note that even start_url is optional in the web app manifest spec.

It should also be optional for a Web Publication but for a different reason.

For a Web App, it can be optional because the other option is to rely on the current URL for the resource that contains a manifest link.

For a Web Publication, it can be optional because as long as we have the equivalent of a spine we can always fall back to the first element of the spine or rely on a rel value to identify another useful resource.
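
The fallback described here can be sketched in a few lines. The `links` and `spine` keys and the `start` rel value are hypothetical names chosen for the example, not agreed manifest members.

```python
from typing import Optional

def resolve_start(manifest: dict) -> Optional[str]:
    """Pick the document to open first: an explicit start link, else the first spine item."""
    for link in manifest.get("links", []):
        if link.get("rel") == "start":
            return link.get("href")
    spine = manifest.get("spine", [])
    return spine[0] if spine else None

with_start = {
    "links": [{"rel": "start", "href": "https://example.com/mobydick/preface.html"}],
    "spine": ["https://example.com/mobydick/ch1.html"],
}
without_start = {"spine": ["https://example.com/mobydick/ch1.html"]}
```

With `with_start` the preface wins even though it is not in the spine; with `without_start` the UA lands on the first spine item.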

I also still fail to see why we need a single locator for a WP, when obviously there are as many locators as there are content documents.

@llemeurfr
Contributor

llemeurfr commented Jul 18, 2017

As a start, I agree/repeat that we shouldn't talk about "the identifier of a WP"; the issues titled with that word should be renamed. Even "THE address of a WP" or "THE URL of a Web Publication" is not an interesting subject IMHO (there are several useful addresses for a given publication).

By the way, I don't find a trace of an "identifier of a Web Application" in the Web App Manifest spec, but I find a start_url property.

start_url is a clear equivalent of what @HadrienGardeur describes as a "start" url.
As many participants seem to approve the structure of the Web App Manifest, we could spend some time on its specification.

But I see a difference between a Web App and a Web Publication: Web Apps are discovered by humans, whereas Web Publications will often be discovered and handled by applications (e.g., reading systems, now UAs).

@dauwhe said:

As a user, I find nearly everything on the web by clicking on a link or typing a URL.

This is a proper use case. We miss clear use cases and requirements in every issue ...

Being member of the Readium team, I'd like to add 2 complementary use cases:

  • As a UA, I want to find structured information on the web, in a minimum number of http requests. This is what makes me efficient.
  • As a UA, I want to find easily the most recent information about the publication I have to handle.

If we agree with these three use cases, we can reach an agreement on a simple solution:
1/ use a notion of start_url in the manifest for leading a user to the first interesting resource in the publication (where the author leads the user and this is the URL to be shared among users; which may not be the first document in reading order).
2/ use a notion of self url in the manifest for leading a UA to the structured info (i.e., an up-to-date manifest) it wants/needs to process.

And yes, a document of the publication can link to the manifest, offering added discoverability to UAs. But if a UA is given the direct address of the manifest (this is a web resource, it has an address), adding a self link in the manifest itself will bring added benefits to UAs (not to humans).

@GarthConboy
Contributor

I believe Hadrien's question was:

Why do you believe that having the URI of the manifest as a WP identifier will have any impact on people opening such URIs in a basic UA?

If the two instances of URI in that question both have the same value (e.g., foo.com/bar/blat.json), then I think that would not be interesting for presentation to a basic UA, and a basic UA wouldn't be able to find a renderable resource as it doesn't understand the manifest. Dave, is that your perspective?

And:

Can we please stop talking about files and folders? This is a resource, not a file. It may be a static file on a server, but it might be as well a simple route handled by an app.

I'm wondering what consensus is on this. Do we believe this to be true of a "publication"?

We may be reaching a point that we should table this clearly out of hand "issue" and attempt to reach consensus on the next call.

@HadrienGardeur

If the two instances of URI in that question both have the same value (e.g., foo.com/bar/blat.json), then I think that would not be interesting for presentation to a basic UA, and a basic UA wouldn't be able to find a renderable resource as it doesn't understand the manifest. Dave, is that your perspective?

The question is not whether the UA can render it or not.

In his latest comment, @dauwhe is still suggesting an external JSON manifest that will be accessible (it has its own URI) and therefore might be rendered in a UA that doesn't know what to do about it.

Why would having a self link that points to the manifest have any impact on this simple fact?

I'm wondering what consensus is on this. Do we believe this to be true of a "publication"?

I'm not sure which consensus you're looking for @GarthConboy, I'm simply pointing out the fact that a server can handle a URI such as http://example.com/index.html without having any static file named index.html.
There's no notion of file and folder with URI, therefore we shouldn't use these terms.
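
To illustrate the resource-versus-file point, here is a tiny WSGI application that answers `/index.html` from code, with no file of that name anywhere on disk. The app, the response body, and the `get` helper are all invented for this sketch; the helper calls the app in-process so no server needs to be started.

```python
from wsgiref.util import setup_testing_defaults

def app(environ, start_response):
    """Answer / and /index.html from code; no static index.html exists."""
    if environ.get("PATH_INFO") in ("/", "/index.html"):
        body = b"<!doctype html><title>Moby Dick</title><p>Call me Ishmael.</p>"
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]

def get(path: str):
    """Call the app directly and return (status, body) without opening a socket."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the WSGI keys the app may expect
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

From the client's side, `get("/index.html")` and `get("/")` are indistinguishable from a server that reads a static file.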

@dauwhe
Contributor Author

dauwhe commented Jul 18, 2017

Why would having a self link that points to the manifest have any impact on this simple fact?

Do you interpret the self link as being the URL of the web publication? Does web publication have a URL at all? Or do only the components of a web publication have URLs?

@lrosenthol

lrosenthol commented Jul 18, 2017 via email

@HadrienGardeur

Do you interpret the self link as being the URL of the web publication?

Is there such a thing as a URL of a Web Publication?

I'll quote a comment I've made in another issue:

A WP should not:

  • require any specific URI convention (URI ending with a /, well-known location)
  • restrict how resources are hosted (domain, subdomain, path)
  • require content negotiation
  • require any specific HTTP header

I don't think we need some sort of "abstract" URI that represents a Web Publication from which a server will serve another resource or provide a redirection.

As it's been pointed out by @lrosenthol before, this would make it impossible to host a Web Publication on Dropbox, Google Drive etc.

Does web publication have a URL at all? Or do only the components of a web publication have URLs?

Once again, it's very unclear to me what you mean by "URL of a Web Publication".

Here's what I'd like to have:

  • a Web Publication has a URL where the manifest can be accessed directly (self)
  • a Web Publication has many content documents, each accessible using a URL
  • some of those content documents may provide discovery for the manifest
  • one of those content documents may be marked as the start of a publication

@dauwhe
Contributor Author

dauwhe commented Jul 18, 2017

Here's what I'd like to have:

  • a Web Publication has a URL where the manifest can be accessed directly (self)

Acknowledging that we haven't yet decided if a manifest can be embedded in HTML, yes, there would be a URL to the manifest. Perhaps what this URL represents is out of scope.

  • a Web Publication has many content documents, each accessible using a URL

Absolutely.

  • some of those content documents may provide discovery for the manifest

At least one?

  • one of those content documents may be marked as the start of a publication

Would this content document ("start_url") be required to have a link to the manifest?

@HadrienGardeur

Acknowledging that we haven't yet decided if a manifest can be embedded in HTML, yes, there would be a URL to the manifest. Perhaps what this URL represents is out of scope.

When I say "directly", I mean a single GET request so the statement should work for an external manifest or an embedded one.

some of those content documents may provide discovery for the manifest
At least one?

I'd like to have the ability to entirely re-use existing content documents on the Web. This is a strong SHOULD but not a MUST IMO, not even for a single document.

Would this content document ("start_url") be required to have a link to the manifest?

I think that's an open discussion that we need to have.
If the presence of a start link is a SHOULD/MAY and not a MUST (as implied in my comment), I'm open to the idea that the document would have to provide discovery for the manifest.

@TzviyaSiegman
Contributor

We may be reaching a point that we should table this clearly out of hand "issue" and attempt to reach consensus on the next call.

I believe @GarthConboy is suggesting that the discussions here seem to be leading, not to consensus about a particular point, but to further confusion. As a reminder, none of these issues have yet been brought to the WG. And, we do not make decisions without a call for consensus [1]. There are several assumptions in this thread. We cannot write a spec based on assumptions.

I am seeing the following proposals for addressing:

  • variation of web app manifest start_url
  • a Web Publication has a URL where the manifest can be accessed directly (self) (is this the same as the address for the whole publication?)

I think the best way forward is to write actual proposals. This will give us something concrete to discuss at the next meeting. The purpose of these discussions on GH is to bring the discussion to the larger WG so that we can reach consensus and draft the specs.

A separate point that has been made in other threads as well is that a publication might include multiple resources and/or URLs. See #10 for that discussion.

I am not sure that we need WPs to work on tools like Dropbox or Google Drive. That seems like an issue for PWP. But, please open a separate issue to discuss portability.

[1] https://www.w3.org/publishing/groups/publ-wg/WorkMode/#telco

@rdeltour
Member

@HadrienGardeur said:

Why do you believe that having the URI of the manifest as a WP identifier will have any impact on people opening such URIs in a basic UA?

and then

it's very unclear to me what you mean by "URL of a Web Publication".

I for one am confused about what you mean by "a WP identifier" in the first statement: I understood you were talking about a URL of the Web Publication, but the second statement seems to contradict that.

This confusion aside, 👍 on the later statements:

  • a Web Publication has a URL where the manifest can be accessed directly (self)
  • a Web Publication has many content documents, each accessible using a URL
  • some of those content documents may provide discovery for the manifest
  • one of those content documents may be marked as the start of a publication

I think it's reasonable to expect that some entities would deal with direct URLs to the manifest (e.g. UAs and/or stores) and others would rather deal with URLs to a starting document (e.g. user-oriented hyperlinks).

@HadrienGardeur

I for one am confused about what you mean by "a WP identifier" in the first statement: I understood you were talking about a URL of the Web Publication, but the second statement seems to contradict that.

By WP identifier, I mean a string that can uniquely identify a WP (two different WPs won't be using the same string).

Since we're on the Web, using a URL sounds natural. We'll also always have a URL for the manifest, no matter if it's external or embedded, which is why the URL for the manifest feels like a good fit for a WP identifier.

@bduga

bduga commented Jul 18, 2017

Trying to read through all these posts. I agree with Tzviya, this discussion seems to be increasing confusion rather than leading to clarity :( It seems we have a few requirements:

  1. Basic UAs should be able to display the parts of a WP with no explicit knowledge of WPs (I can point Lynx at https://mybooks.com/mobydick_ch1.html and read the whole book).
  2. Certain UAs (or RSes or whatever) can determine that https://mybooks.com/mobydick_ch1.html is a WP and are able to locate the manifest for it.
    2a. Those same UAs may be able to look at https://mybooks.com/mobydick_ch15.html and realize it is the same WP and locate the manifest for it.
    2b. Those same UAs could also be given the manifest URL directly. This would not look good in a basic UA, but enhanced UAs would understand it was a WP and use some method to find the first resource to display to the user.
  3. Content may be reused across WPs. For instance, https://mybooks.com/mobydick_about_melville.html and https://mybooks.com/billybudd_about_melville.html may point at the same underlying html content.

These requirements argue against a few things:

  1. Magic of the start_url as somehow identifying the WP. I may receive a link to chapter 15 and I still want to know this is a WP and use the features of my enhanced UA to read it.
  2. Putting links to the manifest directly in the html, which makes 3 difficult. There are solutions here, so this isn't hard and fast - we might be able to work around it.
  3. Putting the manifest in the html. We would either need to replicate the manifest everywhere, or somehow know to load another html document and then parse out the manifest, etc. Pretty ugly. It also makes reuse of the first resource difficult.

So, I have some thoughts about how to solve those uses (none amazingly great), but maybe we can try to focus explicitly on use cases for now. And not generic, everything anyone ever might want to do, but very specific things we want to make possible. Does my list (1, 2, 2a, 2b, 3) cover it? Are there others? Are mine incorrect?
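As a rough illustration of requirements 2, 2a and 3, here is a sketch of how an enhanced UA might locate a manifest from any content document, assuming discovery happens through a `<link>` element. The rel value `publication`, the manifest file names, and the function names are hypothetical; only the mybooks.com chapter URLs come from the list above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

MANIFEST_REL = "publication"  # hypothetical rel value, not an agreed one


class ManifestLinkFinder(HTMLParser):
    """Collect href values of <link> elements whose rel includes MANIFEST_REL."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if MANIFEST_REL in rels and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def find_manifests(document_url, html):
    """Return absolute manifest URLs advertised by one content document.

    An empty list means the document offers no discovery (a basic UA
    still renders it as ordinary HTML). More than one entry means the
    document is reused across publications.
    """
    finder = ManifestLinkFinder()
    finder.feed(html)
    finder.close()
    return [urljoin(document_url, href) for href in finder.hrefs]
```

Chapter 1 and chapter 15 would each carry the same link and resolve to the same manifest URL, while a shared "about Melville" page could carry two links, one per publication. That is one of the possible workarounds for the reuse problem, not necessarily the best one.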

@HadrienGardeur

@bduga these requirements work for me.

I'd like to point out BTW that 3 is not specific to WP, it's the way the Web works: any HTML document can link to other HTML documents using URLs.

Magic of the start_url as somehow identifying the WP. I may receive a link to chapter 15 and I still want to know this is a WP and use the features of my enhanced UA to read it.

That's why I believe that the URI for the manifest is better suited to identify a WP.

Putting links to the manifest directly in the html, which makes 3 difficult. There are solutions here, so this isn't hard and fast - we might be able to work around it.

If these links are optional and there are other ways that a WP can be discovered, I don't think that's a problem.

Putting the manifest in the html. We would either need to replicate the manifest everywhere, or somehow know to load another html document and then parse out the manifest, etc. Pretty ugly. It also makes reuse of the first resource difficult.

Fully agree. That's a separate discussion, but I'd much rather have the manifest as an external JSON document.

@dauwhe
Contributor Author

dauwhe commented Jul 18, 2017

@bduga One more possible use case:

  1. Nested web publications. One example is a journal containing articles, and both the journal and the articles are web publications. Another example is publishing together three novels in a series ("omnibus" edition). This is not fun with EPUB.

@HadrienGardeur

@dauwhe I'm not sure that technically (from a manifest perspective) these publications will have to be nested.

This is how it could work with an omnibus containing three novels:

  • four different manifest documents, each with its own URI
  • these four publications may have four different start documents, but the omnibus and first novel may also share the same start document
  • same thing for discoverability: each content document may have one or more links to a manifest, or none. The publisher may also decide to only include such links on the start document of each publication.

The manifests themselves do not IMO need to be nested.
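A small sketch of the omnibus case with flat manifests, to show that nothing needs to be nested. The manifest shape (`start`, `resources`), the example.org URLs, and the helper name are assumptions made up for this example.

```python
from collections import defaultdict

BASE = "https://example.org/series/"  # hypothetical location

# Four independent manifests, each with its own URL. None of them nests another.
MANIFESTS = {
    BASE + "omnibus.json": {
        "start": BASE + "novel1/ch1.html",
        "resources": [
            BASE + "novel1/ch1.html",
            BASE + "novel2/ch1.html",
            BASE + "novel3/ch1.html",
        ],
    },
    BASE + "novel1.json": {
        "start": BASE + "novel1/ch1.html",
        "resources": [BASE + "novel1/ch1.html"],
    },
    BASE + "novel2.json": {
        "start": BASE + "novel2/ch1.html",
        "resources": [BASE + "novel2/ch1.html"],
    },
    BASE + "novel3.json": {
        "start": BASE + "novel3/ch1.html",
        "resources": [BASE + "novel3/ch1.html"],
    },
}


def publications_by_document(manifests):
    """Map each content document URL to the set of manifest URLs listing it."""
    index = defaultdict(set)
    for manifest_url, manifest in manifests.items():
        for document_url in manifest["resources"]:
            index[document_url].add(manifest_url)
    return index
```

Here the omnibus and the first novel share a start document, and the reverse index tells a UA that the document belongs to two publications. The start document alone cannot tell them apart, but the manifest URLs can.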

@murata2makoto

murata2makoto commented Jul 22, 2017

I am afraid that I do not understand the scope of this issue well. When I saw "URN" in the title of this issue, I thought that location-independent URNs (such as ISBN numbers) and resolution of such URNs to URLs (or something else) will be discussed. But this is not the case. You guys appear to discuss the relationship among manifest URIs, first-spine-item URIs, WP URIs, and so forth.

@llemeurfr
Contributor

From recent discussions in other threads, this issue would IMO be better renamed "Address of a web publication". This would make things clearer.

@TzviyaSiegman
Contributor

addressed by #28

@iherman
Member

iherman commented Aug 29, 2017

See telco discussion on closure.
