Packages vs canonical id #47

Open
llemeurfr opened this issue May 7, 2019 · 5 comments

@llemeurfr
Contributor

ref https://www.w3.org/TR/wpub/#canonical-identifier

By definition, "A Web Publication's canonical identifier is a unique identifier that resolves to the preferred version of the Web Publication. It is expressed using the id property." A Package usually contains content that has been composed "out of the web". There is therefore no "preferred version of the Web Publication" involved.

Creators of Packages have some solutions:

1/ No canonical id.
This is OK per the WP spec, as indicated by the sentence "If a URL is not provided in the manifest, or the value is an invalid URL, the Web Publication does not have a canonical identifier." A canonical id may be added to the manifest by a processor which makes a Web Publication from the Package.

2/ A DOI as canonical id.
In such a case, the DOI will redirect to a Web Publication, when it is put online. But in this case, what if the Package becomes SEVERAL Web Publications at different URLs?

3/ A URN as canonical id.
URN (Uniform Resource Name) is used here in the sense of a Name, not a Locator. The advantage is that there is no attempt to use it as a Locator. But URNs are not much liked these days in Web circles.

Any other solutions? Which one should be recommended (or required)?
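
To make the three options concrete, here is a minimal Python sketch of how a processor might read the canonical id from a manifest. It assumes the manifest is already parsed into a dict, and it approximates "valid URL" as "has a URI scheme" using `urllib.parse.urlsplit`, which is looser than the validity check the WP spec refers to. The function name `canonical_id` and the sample identifiers are made up for illustration and are not taken from the spec.

```python
from typing import Optional
from urllib.parse import urlsplit


def canonical_id(manifest: dict) -> Optional[str]:
    """Return the manifest's canonical id, or None if it has none.

    Assumption: "valid" is approximated as "an absolute URI with a scheme".
    """
    value = manifest.get("id")
    if not isinstance(value, str):
        return None  # option 1: no id in the manifest
    try:
        scheme = urlsplit(value).scheme
    except ValueError:
        return None  # unparsable value: treated as no canonical id
    return value if scheme else None


# Hypothetical manifests, one per option in the list above.
no_id = {"name": "Packaged book"}
doi_id = {"id": "https://doi.org/10.1234/example"}
urn_id = {"id": "urn:isbn:9780000000002"}

print(canonical_id(no_id))   # None
print(canonical_id(doi_id))  # https://doi.org/10.1234/example
print(canonical_id(urn_id))  # urn:isbn:9780000000002
```

Under this reading, options 2 and 3 look the same to a processor: both yield an identifier, and whether it can be dereferenced is a separate question.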

@llemeurfr
Contributor Author

llemeurfr commented May 31, 2019

A similar issue applies to the WP address, defined in https://w3c.github.io/wpub/#address.

This is treated in #49.

@llemeurfr changed the title from "Packaged publications vs canonical id" to "Packaged publications vs canonical id and WP address" on May 31, 2019
@llemeurfr changed the title from "Packaged publications vs canonical id and WP address" to "Packages vs canonical id and WP address" on May 31, 2019
@iherman
Member

iherman commented May 31, 2019

An alternative might be the https://..../publ/ approach (as discussed at the F2F), although that discussion was not conclusive. @BigBlueHat is supposed to make a more detailed argument; it may be valid for this case, too.

@llemeurfr changed the title from "Packages vs canonical id and WP address" to "Packages vs canonical id" on Jun 3, 2019
@llemeurfr
Contributor Author

We may add an example to the spec.

@iherman
Member

iherman commented Jun 4, 2019

This issue was discussed in a meeting.

  • No actions or resolutions
Transcript: Packaged publications vs canonical id and WP address
Ivan Herman: #47
Laurent Le Meur: #47 - packaged publications vs canonical ID: this is more complex. A WP can have an address, because we know where it’s located. By definition the canonical ID is the preferred WP address (a URL). If an LPF file is born inside the publisher’s house, before it becomes a WP, it has no address
… so the solution is to have no canonical ID. It’s possible in the spec to have no canonical ID. It can have a DOI, or an ISBN - so we had an issue with canonical ID.
… We have a different issue with WP URL…
Ivan Herman: The current text does say that the unique identifier - but I don’t know if Bill is here, but what we are really talking about here is a unique identifier for the publication, which is preferred to be a URL.
… If the unique identifier is different, that should be perfectly accepted and the text in the web publication should be amended. In your case the DOI or URN should be OK. It is used to identify something.
… Yes, the DOI can be tacked to the URL but when you look at the DOI, it’s an identifier. I think we might go the other way from the web packaging work - it’s OK as is (we might want an editorial note that it’s not always a URL)
… and then make the text on a canonical ID a little bit lighter.
Luc Audrain: Can this identifier be an ISBN?
Laurent Le Meur: https://www.w3.org/TR/wpub/#example-49-example-for-setting-both-the-isbn-and-the-address-of-the-same-document
Laurent Le Meur: If you look at the example in #49 - you’ll have your answer.
Ivan Herman: Wait - we may have a problem. The ID is a JSON-LD term, and the value must be a URI. Benjamin, is that correct?
… it’s a question. In JSON-LD - the ID must be a URI?
Benjamin Young: Yes, IRI…
… If you were to use an ISBN, it could be ISBN:[number]
Laurent Le Meur: so it’s solvable. For the WP address?
… they are related because they are sometimes the same.
Ivan Herman: What should we do with #47?
Laurent Le Meur: should we add a note in the spec that any URI can be used as a canonical ID?
Tzviya Siegman: I think we should include some examples.
… if I understood what Benjamin said, the examples we had were not quite right?
Ivan Herman: that’s a schema.org issue. They have a separate term for ISBN. That’s a problem in schema.org
Deborah Kaplan: +1 for including examples that are non-url URNs
Benjamin Young: The examples aren’t wrong. We show an example of schema.org ISBN - but we don’t state that there is any use for that. If we move it into use it as the ID of the document.
… It just changes - not everything damaging
Garth Conboy: I was not concerned, just wanted to go back to what Benjamin said for the value of ID. Not sure it’s a problem, but it was a good example
Ivan Herman: there is a need to make a small review of that section
Tzviya Siegman: Do we have a resolution for this point?
Laurent Le Meur: There is no need to resolve anything. The issue #47 is also about the WP address. The WP address is another issue because the property is a URL yet the package has no URI
Ivan Herman: Isn’t it issue #45?
Laurent Le Meur: We should open a new issue for the publication address
Ivan Herman: and the canonical issue should be closed - we put some examples in the document and update the main document
Benjamin Young: This isn’t as massive as it sounds but it’s going to come down to the origin one. If we have a canonical ID - whether or not reference into the publication - scrolling, etc should use the canonical ID. It doesn’t have to be dereferenceable - so it can be any identifier.
… so we’re going to have to make some determinations about package referencing. ‘Is that going to be the thing’ or is the web publication address the thing we point to. It’s no longer a packaged web publication, then it would be a packaged publication
… because if it’s offline, we’re just pulling it out of the box. If they are more epub-y, then they are already off the web. The identifiers have a whole set of new issues, and may not have a URL… it’s an identity crisis
Garth Conboy: https://www.w3.org/TR/wpub/#canonical-identifier
Garth Conboy: I didn’t quite understand if we made a decision or moved to another issue, but are we also saying - as it’s relevant to audiobooks - that the canonical ID could be an ISBN or DOI… Did we get to anything?
Ivan Herman: My understanding on the canonical IDs is it can be missing or be any type of valid URI - ISBN or DOI or whatever else
Garth Conboy: Is that a change?
Ivan Herman: it’s a slight change from the text - because the text requires it to be dereferenceable.
Garth Conboy: It also requires the term ID
Ivan Herman: It uses the term ID. We have a schizophrenic issue with ID. We have explicit terms for many of them, but it’s not exhaustive. They don’t have a term for DOI but they do for ISBN…
… people can use the ISBN because it’s schema.org - but ideally they should use the ID in a valid URI form so it’s valid for JSON-LD
Garth Conboy: We explicitly say it’s expressed using the ID property
Tzviya Siegman: if we’re not clear on it, it’s confusing in the document
Benjamin Young: If we - whatever we put in the ID field as the canonical - should/must (probably must) would become the identifier of the publication. Ideally on the hard drive or the server… We made it a URL but accommodated for DOIs - so you could use a DOI so it’s more permanent than just a URL…
… we were trying to have a canonical ID. When it was just web publications, we dereferenced the ID. That ID doesn’t have to be dereferenceable and there might not be a URL in it at all. But we need to think about the systems and what they do with the ID. In JSON-LD, it becomes the identifier of the thing.
… so if you release version 2 of this, you give it a different ID - or it has a different name entirely.
… so there’s a whole host of identification issues that this brings back up
Ivan Herman: I think we have discussed this a long time ago - the difference between an ID and a URL. This is not a packaging issue. You should be able to use a URN as an identifier - even if it’s on the web. You have the address, which is different from the ID. What we have right now - regardless of packaging - is a bit too restrictive
… and that’s why we have 2 different things, the URL and the ID.
Tzviya Siegman: I think what we do need in the document is clearer language. It has confused enough of us. Not sure how to propose that. Garth, it sounded like you had some suggestions.
Garth Conboy: I had more questions than suggestions. I found that what was in there wasn’t matching this discussion. Matt can probably work magic on it though.
Tzviya Siegman: Well, if that’s how we’re leaving it…
Matt Garrish: I can give it a try but I’m a bit confused - does it have to dereference, should it dereference?
Ivan Herman: we’ll come up with something
Laurent Le Meur: #47 should be closed then?
Ivan Herman: we can put an example there, but nothing really heavy
Ivan Herman: +1
Nick Ruffilo: +1
Luc Audrain: +1
Benjamin Young: it should stay open because it’s related to packages - and it’s a packaging driven requirement
Laurent Le Meur: in this case - i want to move to #49 - the real issue - most packages created by publishers will have no address or URL until they are exposed on the web
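
As a rough illustration of the "any valid URI" reading discussed in the transcript, an ISBN or a DOI can be put into URI form before being used as the `id`. The Python sketch below is only that, a sketch: the helper names and the sample values are hypothetical, it relies on the `urn:isbn:` URN namespace and the `https://doi.org/` resolver prefix, and it does no checksum or syntax validation.

```python
def isbn_to_urn(isbn: str) -> str:
    """Turn a printed ISBN into a URN by dropping hyphens and spaces."""
    compact = isbn.replace("-", "").replace(" ", "").upper()
    return f"urn:isbn:{compact}"


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI into an https URL under the doi.org resolver."""
    return f"https://doi.org/{doi}"


# Placeholder identifiers, not real publications.
print(isbn_to_urn("978-0-00-000000-2"))
print(doi_to_url("10.1234/example"))
```

The URN form is a name only, while the DOI form is also a URL that may or may not resolve to a Web Publication; that matches the distinction made above between the ID and the address.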

@llemeurfr
Contributor Author

I added an example in the spec, of an AudioBook Manifest with a canonical id. I added a note about the fact that the WP Address is not part of the Publication Manifest.
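
For readers who want a feel for what such a manifest could look like, here is a hedged Python sketch that builds one as a dict and serializes it to JSON. It is not the example that was added to the spec: the title, the ISBN, the file names, and the `build_audiobook_manifest` helper are invented, and the `@context` and `type` values are assumptions based on the WP draft of the time, so they should be checked against the current text. The manifest carries an `id` but no `url` (address) member.

```python
import json


def build_audiobook_manifest(title, canonical_id, tracks):
    """Build a manifest dict with a canonical id and no address.

    Assumption: the reading order is given as a plain list of relative URLs.
    """
    return {
        "@context": ["https://schema.org", "https://www.w3.org/ns/wp-context"],
        "type": "Audiobook",
        "id": canonical_id,  # a name, not necessarily dereferenceable
        "name": title,
        "readingOrder": list(tracks),
    }


# Hypothetical values for illustration only.
manifest = build_audiobook_manifest(
    title="An Example Audiobook",
    canonical_id="urn:isbn:9780000000002",
    tracks=["audio/chapter-01.mp3", "audio/chapter-02.mp3"],
)
print(json.dumps(manifest, indent=2))
```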
