
Switch DPV namespace IRI from the current hash to use slash instead #53

Closed
pmcb55 opened this issue Sep 30, 2022 · 4 comments

Comments

@pmcb55

pmcb55 commented Sep 30, 2022

For details, see the DPV mailing list entry from yesterday (Sept 29th 2022): https://lists.w3.org/Archives/Public/public-dpvcg/2022Sep/0012.html

I knew it was 'late in the day' to be suggesting such a change, but I thought it should have been feasible given that the namespace IRIs had moved from the W3C (i.e., from http://www.w3.org/...) to w3id.org (i.e. to be https://w3id.org/...) fairly recently. So this suggested 'switch to slash' tweak would have the same impact, and since everyone using DPV already had to cope with a recent namespace change, it shouldn't be such a big deal to make another one now, before it's set-in-stone and v1.0 is finalized.

I do agree that not changing this now will result in DPV carrying technical debt forward. The effects, I think, will be felt immediately, and without this change they won't ever be avoidable, e.g. anyone wishing to look up any individual term will be 'bombarded' with lots of superfluous surrounding 'noise'. For example, if I wish to know what https://w3id.org/dpv#JointDataControllers means (i.e., by just clicking on it), then I get a huge document with a daunting Table of Contents on the left-hand side. What I should get is simply info on JointDataControllers, such as QUDT provide today for: https://qudt.org/schema/qudt/CurrencyUnit, or Schema.org provides today for: https://schema.org/Person - lovely nice clean, concise, and precise descriptions of just what I asked for.
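To illustrate why the hash IRI returns the whole document: an HTTP client drops the fragment before sending the request, so the server only ever sees the namespace URL. The sketch below uses Python's standard `urllib.parse.urldefrag`; the helper name `document_url` is made up for this illustration.

```python
from urllib.parse import urldefrag


def document_url(iri: str) -> str:
    """Return the URL an HTTP client actually requests for an IRI.

    The fragment (the part after '#') is handled client-side and is
    never sent to the server.
    """
    url, _fragment = urldefrag(iri)
    return url


# Hash IRI: every term in the namespace maps to the same request.
print(document_url("https://w3id.org/dpv#JointDataControllers"))
# Slash IRI: the term is part of the path, so it is requested on its own.
print(document_url("https://schema.org/Person"))
```

With a hash namespace, every term resolves to one request for `https://w3id.org/dpv`, which is why the response cannot be tailored to the term on the server side.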

For me, these examples from QUDT and Schema.org are self-evidently better in this regard, and I'd certainly consider those vocabs (and gist) as 'authoritative' as any other vocabs today.

So to summarize my counter-arguments to Harsh's concerns:
a) it would completely break the existing IRIs and anything that uses them;

  • Yep, but DPV already made just such a change recently, and everyone seemed to cope (I assume!).

b) there are not enough benefits to offset breaking things;

  • I think the benefits are self-evident from the examples from QUDT and Schema.org.
  • It provides far more efficient/concise responses to vocab lookups (i.e., just the info you ask for, and no more).
  • It's simply the 'more correct thing to do' - i.e., every IRI anywhere should be uniquely dereferenceable (and hash fragments mess with that fundamental principle of Linked Data).

c) this will delay our v1 release schedule well into 2023 - which I would prefer not to happen.

  • I don't know why this would be the case, but assuming it is true, then yeah sure, probably just not worth 'fixing' this issue now!

And yeah, I'll post to the Semantic Web mailing list too, as I'd be very interested to hear thoughts from there too.

@coolharsh55
Collaborator

Hi Pat. Thanks for opening the issue for discussion, and creating a thread on the sem-web mailing list.

I encourage people to express their preference either by participating in this discussion, or using the reaction emojis (thumb up/down) to express your opinion.

To address specific points:

a) breaking existing IRIs and that we recently had a change in IRIs - yes, but this was a necessity since w3 namespaces were not being maintained or updated in time, so the best solution there was to move to w3id. Not doing this would have meant not being able to provide new concepts/vocabularies/documentation. Also, at that time, there were only two vocabularies (dpv and dpv-gdpr) - where for both the w3 IRIs still resolve to DPV. So the amount of stuff breaking was minimal - if any. Whereas we now have potentially ~12 different IRI sets, which means this change would break more things. We also discussed the earlier change for several months before implementing it. Despite that, I still see people use the older IRIs in their work! So hoping that everyone follows the change may not always work out well or in time.

b) cost vs benefits: yes, there are benefits, but the question is whether they are enough to justify and offset breaking stuff (as above). This is a matter of opinion and there is no correct quantitative answer.

c) release schedule: the delay will be from this discussion taking time for reaching a conclusion, and also because we will have to wait to hear from the community and provide them sufficient time to discover that we are proposing a major breaking change that will make things backwards incompatible, then give them more time to participate in this discussion and express their preference. Why is a "v1" important? Because until then DPV is an "in-development draft" vocabulary, and in order to encourage its usage and adoption, there needs to be stability and maturity - which is what we want to indicate with a "v1". Hence the strong push from me.
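To make the breakage in point (a) concrete, here is a minimal sketch of the rewrite that existing data would need if the namespace moved from hash to slash. The target pattern (`https://w3id.org/dpv/Term`) is an assumption for illustration only; the thread does not fix the exact form of the new IRIs, and `to_slash_iri` is a hypothetical helper.

```python
def to_slash_iri(iri: str) -> str:
    """Rewrite 'namespace#Term' as 'namespace/Term'.

    IRIs without a fragment, or with an empty one, are returned unchanged.
    The slash form produced here is an assumed pattern, not an agreed one.
    """
    namespace, sep, term = iri.partition("#")
    if not sep or not term:
        return iri
    return f"{namespace.rstrip('/')}/{term}"


print(to_slash_iri("https://w3id.org/dpv#JointDataControllers"))
```

Because the fragment is never sent to the server, a server-side redirect cannot map old hash IRIs onto new slash ones term by term; a rewrite like this would have to happen in each consumer's own data.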

As I said on the DPV mailing list, I have no objections to your arguments per se, and they definitely make sense to me. But we do incur a significant penalty for "doing the right thing". And if everyone (or enough of us) who uses or wants to use DPV supports making such a change - then it makes it easier to implement as opposed to people being caught unaware. Were this question raised last year, it would have been significantly easier to implement the proposed change for everyone involved. Since this proposal has only arrived now, it needs to be considered in the context of who will be affected and whether they are okay with this.

@bertvannuffelen

bertvannuffelen commented Oct 3, 2022

Hi all,

(This is a very old discussion for which no unanimous answer exists.)

There are pros and cons for fragment-based URIs versus slash-based URIs. One could also add HTTP versus HTTPS to the discussion.

In the end it comes down to balancing what your user community expects against the editorial effort required.

Without a long discussion on pros and cons for each, the core of the discussion is: how does the editor maintain persistence, or, put differently, how can the editor guarantee semantic coherence over time?
And it is the case that the fragment approach is generally more "editor" (manual editing) friendly than the slash approach. It is all in one file ;-)

So unless the editors invest in tooling and publication methods to support slash-based URIs, I can understand their preference for the fragment-based approach.

BTW, it might not even be a choice to be made by DPV but one by https://w3id.org, as the domain is https://w3id.org/dpv and this is probably redirected to GitHub Pages.
So this discussion is probably a discussion with the maintainers of https://w3id.org. And generally it would be even worse for the community if https://w3id.org/{dom1} and https://w3id.org/{dom2} implemented slash URIs differently.

In addition, I have to mention that unfortunately there are W3C recommendations where the URIs are not dereferenceable :-(

@coolharsh55
Collaborator

coolharsh55 commented Oct 3, 2022

Bert's comment above made me reflect on why we ended up with this design in the first place. I think the rationale was basically following what the community does - in this case the use of ReSpec for design, though LODE/WIDOCO were also tested for their outputs. We ended up with a custom set of scripts to produce the output because the community tools don't even support the separation of 'topics' or 'sections' within an ontology (e.g. compare WIDOCO output with DPV segregated by sections). The problem here is, as Bert noted, that tools and practices are editor-centric.

Now, the issue seems to be how to provide convenient information based on (only) what is asked. Here I generalise Pat's argument as follows:

  • If the request is for a concept, the output should be a page detailing only that concept, e.g. DataController
  • If the request is for a section, the output should be a page detailing only that section and its concepts, e.g. LegalEntities
  • If the request is for the entire spec, the output should be a page detailing all sections and their concepts

This makes me wonder if we can keep the hash-based IRIs and still do the above. I think it is feasible to have a core page with some JS that checks the IRI, identifies the fragment, and pulls the necessary data to populate what is asked for. This should allow us to keep the existing IRIs, while still providing the sort of behaviour Pat is asking for in presenting documentation of concepts. The pros of this are what Pat has outlined. The con is that one cannot get static pages for anything without first running the JS (e.g. curl/wget will not return what has been asked for, but only the basic page without JS - something I don't feel good about).
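As a rough sketch of the lookup such a page script would perform, here is the fragment-to-content resolution written in Python rather than browser JavaScript. The `SECTIONS` index and the `resolve` function are hypothetical; the section-to-concept mapping is illustrative and not taken from the actual DPV structure.

```python
from typing import Dict, List, Tuple
from urllib.parse import urldefrag

# Hypothetical index of section name -> concepts (illustrative only).
SECTIONS: Dict[str, List[str]] = {
    "LegalEntities": ["DataController", "JointDataControllers"],
}


def resolve(iri: str) -> Tuple[str, List[str]]:
    """Classify a request as 'spec', 'section', or 'concept'
    and return the concepts that the page should render."""
    _, fragment = urldefrag(iri)
    if not fragment:
        # No fragment: the entire specification was requested.
        return "spec", [c for concepts in SECTIONS.values() for c in concepts]
    if fragment in SECTIONS:
        # The fragment names a section: render that section's concepts.
        return "section", list(SECTIONS[fragment])
    for concepts in SECTIONS.values():
        if fragment in concepts:
            # The fragment names a single concept.
            return "concept", [fragment]
    raise KeyError(f"unknown term: {fragment}")


print(resolve("https://w3id.org/dpv#DataController"))
print(resolve("https://w3id.org/dpv#LegalEntities"))
print(resolve("https://w3id.org/dpv"))
```

Since this dispatch depends on the fragment, it can only run on the client, which is the reason curl/wget would still receive the unfiltered base page.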

@coolharsh55
Collaborator

// Closed as stale //


3 participants