Values for dcterms:conformsTo of instances dcat:DataService (sparql endpoints, for example) #1211

Open
nfreire opened this issue Jan 24, 2020 · 40 comments

Comments

@nfreire

nfreire commented Jan 24, 2020

Dear DXWG,

I'm working on a profile for dataset description in cultural heritage (within Europeana).
I'm looking for guidance on which values dcterms:conformsTo should take in order to ensure machine interoperability.

I've observed some real-life cases and noticed that sometimes the values are namespaces and in other cases they are links to the specifications.
Since we are interested in machine interoperability, namespaces look more appropriate but I'm far from certain.

Here is an example. In the case of a SPARQL endpoint...
... should namespaces defined by the protocol be used:

Is there a common practice, or recommendation, within DCAT?

Thanks in advance,
Nuno

@aisaac aisaac added the dcat label Jan 24, 2020
@aisaac
Contributor

aisaac commented Jan 24, 2020

It seems that DCAT uses mostly URLs of specifications as in Example 45. Is it a best practice we should follow?

@andrea-perego andrea-perego added this to To do in DCAT revision via automation Feb 7, 2020
@andrea-perego andrea-perego added feedback Issues stemming from external feedback to the WG future-work issue deferred to the next standardization round labels Feb 7, 2020
@nicholascar
Contributor

My preference would be for the use of specification URIs, not namespace URIs. This is because, according to all of the profiles thinking we have done in DXWG, a specification/profile is fundamentally a different thing from a namespace. Obviously specifications might have namespaces for their technical content but a specification is a "larger" thing than just a namespace.

The problem here is, of course, that most specifications' URIs are not set up for any sort of machine actioning so the best you could currently hope for there is that the URI provide a universally unique identifier.

I hope that, in time, specifications that wish to be machine-readable will provide Linked Data functionality for their specification URI, so that you can get both a human-readable specification document, as you can at present for a spec like, say, DCAT, and an RDF version of the specification. That RDF version should probably be something like a Profiles Vocabulary description, which then tells you where all the other profile parts, such as machine-actionable constraints, are.

This question is interesting for the profiles work so I'm tagging it profiles-vocabulary too.

@nicholascar nicholascar added the profiles-vocabulary For discussion of profile description vocabulary label Feb 18, 2020
@dr-shorthair
Contributor

dr-shorthair commented Feb 19, 2020

Namespaces just give you a collection of elements with some axioms - i.e. it is the ingredients, not really the recipe. It would usually be the recipe that you want to conform to - the patterns and usage rules. These might be expressed in a machine actionable form (e.g. SHACL) but will often be supplemented by additional instructions that are expressed in natural language. As @nicholascar says, there will typically be a suite of artefacts associated with a specification. The profiles-vocabulary provides one way to describe and 'package' them. I would expect a document conforming-to the profiles vocabulary to be accessible from the specification URI (i.e. the /TR/ URI, not the /ns/ URI) (using conneg by profile of course ...)

@luizbonino

In our case we would like to use dcterms:conformsTo to refer to specifications of the metadata schema expressed as a SHACL document. The goal is to make explicit which metadata schema the provided metadata record should conform to. This is used not only as information but also to support validation of the metadata content against its schema.

@nicholascar
Contributor

nicholascar commented Feb 19, 2020

@luizbonino you have a noble goal in mind here, but the issue is that the use you're describing is not standardised, so if you do this, others might not. The proposal in the Profiles Vocabulary, which is aiming at being a standard, is to identify specifications and then link things like SHACL documents to them. The documents can have roles, so you can know that Specification_X has a Resource_Y that is a SHACL file used for validation (as opposed to something else that you could use SHACL for, like transformation).

I think the Profiles Vocabulary can cater for your requirement and it would have you define your specifications of interest (and profiles of them) and the relevant validation resources.
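To make the "resources with roles" idea concrete, here is a minimal sketch in plain Python, without any RDF library. The specification URI and artifact URLs are hypothetical, and the dictionary keys only loosely mirror PROF terms (a resource descriptor with an artifact, a role, and an optional conformsTo); the role URIs are assumed to come from the PROF roles vocabulary. It is an illustration of the lookup a client might do, not a PROF implementation.

```python
# Assumed namespace of the PROF roles vocabulary.
PROF_ROLE = "http://www.w3.org/ns/dx/prof/role/"

# Hypothetical specification and resources, for illustration only.
specification = {
    "uri": "https://example.org/spec/metadata-schema",
    "resources": [
        {
            "artifact": "https://example.org/spec/metadata-schema/shapes.ttl",
            "role": PROF_ROLE + "validation",
            "conformsTo": "https://www.w3.org/TR/shacl/",
        },
        {
            "artifact": "https://example.org/spec/metadata-schema/guide.html",
            "role": PROF_ROLE + "guidance",
            "conformsTo": None,
        },
    ],
}


def artifacts_with_role(spec, role_uri):
    """Return the artifact URLs of every resource that plays the given role."""
    return [r["artifact"] for r in spec["resources"] if r["role"] == role_uri]


# A validator would ask only for the resources whose role is "validation".
validation_files = artifacts_with_role(specification, PROF_ROLE + "validation")
print(validation_files)
```

The point of the role is that the same SHACL file format could serve other purposes, so a client filters on the role rather than on the file type.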

@nfreire
Author

nfreire commented Feb 19, 2020

@luizbonino We have a similar requirement to yours. In our work around Europeana, we also have difficulties with the use of dcterms:conformsTo for data resources. Some data sources will state that the data conforms to a namespace, others use the URL of an XML Schema. And we also know that none of these choices will be appropriate in cases of general data container formats. It is also necessary to know the data profile in use.

@nicholascar you pointed out the key aspects of unique identification and machine readability.
I'm looking forward to seeing the Profiles Vocabulary become a standard.

@rob-metalinkage
Contributor

@nicholascar has pointed out that this use case is a motivation for the Profiles Vocabulary as an information resource that can address such use cases. It's worth pointing out that implicit in this is the httpRange-14 problem: you need to reference the conceptual specification, as @dr-shorthair suggested, and then decide which information resource best represents it.

When you are talking about "schema", that's a perfect example of where flexibility is needed for the different resources that form part of the recipe for a specification. For XML that will be XSD, for JSON that will be JSON Schema, for RDF it's RDFS, OWL and/or SHACL. But if using JSON you also might want JSON-LD contexts.

At this point you can describe all the available resources using PROF, but you can also dereference the specification URI using content negotiation and content negotiation by profile to access resources directly, without processing the PROF description.

Ultimately you need a framework for disseminating the identifiers to the community - having a URL link to some information resource is inherently fragile and non-extensible, so you need to manage non-information-resource URIs and infrastructure to dereference these.

At the OGC I am working through publishing profile descriptions for specifications, and setting up such infrastructure - but there is a legacy of embedded document references to think about too.

Some members of the DXWG are interested in best practices for publishing specifications from a practical sense, but whether this emerges as a formal deliverable in some guidance document is not clear yet. Please include us in reviews of any proposals to formalise requirements.

@kcoyle
Contributor

kcoyle commented Feb 20, 2020

The intent of dct:conformsTo has to do with "established standards". The full definition is:

"An established standard to which the described resource conforms."

This is where you can say that your resource conforms to ISO 2709 or Oasis-Open "legaldocml-akomantoso". An internal document or application, such as a SHACL file, doesn't really fit this definition. The profiles vocabulary may be better suited to this. I don't see PROF as a substitute for this DC term, but a statement with a different meaning.

@luizbonino

@kcoyle , thank you Karen, this distinction between established standards and internal documents is key for our needs. I think that, for our case, PROF would be appropriate. PROF's example 1 (https://www.w3.org/TR/dx-prof/#eg-initial-example) is very close to what we intend.

@smrgeoinfo
Contributor

It seems to me the important question is from the client perspective. The client parses a metadata document and finds various distributions for a resource of interest; the dc:conformsTo property should provide the criteria to identify a distribution that the client will be able to parse and use. The problem is that there is a dependency on the sophistication of the client: some clients might require a very specific serialization of the resource representation to work (e.g. XML according to schema X, using vocabulary Y for property values), while others might have the logic necessary to handle any application/xml. Thus the dc:conformsTo profile URI needs to be hierarchical or multivalued. Established standards are good, but for a particular community and client, as long as the client recognizes a conformsTo URI for a distribution it can work with, it really doesn't matter if it's a 'standard'.
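A minimal sketch of that client-side selection, assuming the multivalued option: each distribution lists the conformsTo URIs it declares, and the client keeps those that share at least one URI with what it can handle. All identifiers and URIs below are hypothetical, and the matching is plain string comparison, with no hierarchy between profiles.

```python
def usable_distributions(distributions, supported):
    """Keep distributions declaring at least one conformsTo URI the client supports."""
    supported = set(supported)
    return [d for d in distributions if supported & set(d["conformsTo"])]


# Hypothetical distributions found in a metadata record.
distributions = [
    {
        "id": "dist-xml",
        "conformsTo": [
            "https://example.org/spec/schema-x",
            "https://example.org/spec/vocabulary-y",
        ],
    },
    {"id": "dist-csv", "conformsTo": ["https://example.org/spec/csv-profile"]},
    {"id": "dist-unknown", "conformsTo": []},
]

# Hypothetical client that only understands schema X.
client_supports = ["https://example.org/spec/schema-x"]

matches = [d["id"] for d in usable_distributions(distributions, client_supports)]
print(matches)
```

A hierarchical scheme would need more than this, for example a way to learn that one profile is a specialisation of another, which string comparison alone cannot provide.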

@kcoyle
Contributor

kcoyle commented Feb 20, 2020

@smrgeoinfo What does matter is not to change the semantics of the dct:conformsTo, since that is defined by DCMI. If you need different semantics then you need a different property. So, yes, it does matter that it is a 'standard' although the interpretation of 'established standard' is somewhat loose. I would say that using dct:conformsTo to link from a dataset to a specific processing program or internal schema is more than a stretch, and doesn't help the client select an application vs a defined standard. Obviously there are no Dublin Core Cops to stop you, but you may not be communicating well outside of your own narrow community if you use dct:conformsTo with two very different meanings.

@nicholascar
Contributor

nicholascar commented Feb 20, 2020

@kcoyle

I don't see PROF as a substitute for this DC term, but a statement with a different meaning.

PROF uses dcterms:conformsTo directly and as intended by its definition.

To achieve what @smrgeoinfo wants, we just need to see more things that are used as "An established standard...", as per the dcterms:conformsTo, allocated a URI.

I imagine that for a "big" standard like ISO 2709, you would use PROF to indicate conformance to it with dcterms:conformsTo and also the conformance of things to profiles of it that communities might make and ID with a URI.

@nicholascar
Contributor

I'll just add that I've joined ISO's Technical Committee 211 with a view to ensuring that the ISO 19* series standards (about geographic information) have sensible URIs to use for conformance claims. This is partly already the case, see https://def.isotc211.org/ which shows how to generate URIs for parts of 19* standards.

@rob-metalinkage
Contributor

rob-metalinkage commented Feb 20, 2020

Just remember there are three things happening - all of which are fairly common sense unless mixed up...

  1. identification of some conformance concept expected by some community (this is actually all a dct:Standard is) - so there needs to be a published URI shared amongst a community (this matches @smrgeoinfo view)
  2. publication of resources in various forms to allow such a conformance target to be described (all the weird and wonderful platforms that evolve )
  3. A canonical way to describe a standard and find all the resources a client may need (this is what PROF does).

So PROF isn't a prerequisite of dct:conformsTo: it just makes the URI identifiers of conformance statement targets useful beyond simple string comparison use cases and human reading of specification documents.

@dr-shorthair
Contributor

dr-shorthair commented Feb 21, 2020

I'm concerned that the dots are not being fully connected here.
@kcoyle points out that dct:conformsTo should point to an 'established standard'.
But how is this done? With a URI reference!

That begs the question of what do you get when you dereference the URI denoting a 'standard'?
It might be a PDF or HTML page - i.e. a classic standard document.

But you could also do content-negotiation on the same URI and get a PROF resource, which is an alternative representation of a standard, in particular to support traversal to a variety of artefacts, some of which are executable in a particular environment.
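As a rough illustration of "same URI, different representations", the sketch below builds (but does not send) two HTTP requests for one specification URI: one asking for HTML, one asking for an RDF description. The specification and profile URIs are hypothetical. The `Accept-Profile` header is taken from the Content Negotiation by Profile draft; whether a given server honours it is an assumption.

```python
from urllib.request import Request


def build_spec_request(spec_uri, media_type, profile_uri=None):
    """Build a GET request for a specification URI without sending it."""
    headers = {"Accept": media_type}
    if profile_uri is not None:
        # Header name from the Content Negotiation by Profile draft (assumed support).
        headers["Accept-Profile"] = f"<{profile_uri}>"
    return Request(spec_uri, headers=headers, method="GET")


SPEC = "https://example.org/spec/metadata-schema"  # hypothetical specification URI

# A person (or browser) asks for the classic specification document.
human = build_spec_request(SPEC, "text/html")

# A machine asks for an RDF description following a hypothetical profile URI.
machine = build_spec_request(
    SPEC, "text/turtle", "https://example.org/profiles/prof-description"
)

print(human.get_header("Accept"))
print(machine.get_header("Accept"))
```

Both requests target the identical URI, so a dcterms:conformsTo value can stay a single identifier while serving documents and machine-readable descriptions alike.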

@rob-metalinkage
Contributor

@dr-shorthair - there is no actual grounding for "established": that's up to whoever uses the URI to determine if it meets their needs.

There is thus no control over what dereferencing will mean, only a best practice. Generally humans should be able to cope with an HTML rendering of PROF as a "landing page"; the only thing that would be upset would be automated document harvesters that are unable to ask for the right profile (which would be advertised). Still working through this implementation detail. At least we have canonical mechanisms available now; the question is only one of transition strategy ;-)

@dr-shorthair
Contributor

dr-shorthair commented Feb 21, 2020

Actually I was not particularly interested in the 'established' question.
I was trying to respond to what I understood to be the original concern of @nfreire , which is whether a dcterms:conformsTo property should point to a document or to something more machine readable. I'm attempting to point out that the answer can be both, through the magic of URI references and content negotiation. Certainly it needs practices to be established, but 'Standard' (established or not) does not mean only a specification document.

@makxdekkers
Contributor

@kcoyle
As I see it, the semantic definition of dct:Standard does not refer to a 'standard' or 'established standard'; it is defined as "A reference point against which other things can be evaluated or compared." Certainly, an ISO standard or W3C Recommendation would fit this definition, but so would an 'internal schema' or some other set of rules. Even the objective of a SHACL file is to compare things against, isn't it?
Having said that, I agree that if you want something more specific than dct:conformsTo, you could define a different property, possibly a sub-property like foo:conformsToSpecification or foo:conformsToSchema.

@aisaac
Contributor

aisaac commented Feb 21, 2020

Thanks everyone for the input! With @nfreire we've decided that we're going to put our bet on PROF. So ideally we would have URIs for the specifications used with dcterms:conformsTo, and expect PROF descriptions for them (after content negotiation). For the time being we're going to use URLs such as http://www.w3.org/TR/sparql11-query/, assuming that they are the closest we have to reference URIs for specs.

@nicholascar
Contributor

This is all food for the Profile Guidance document which looks like it’s still definitely needed!

@aisaac
Contributor

aisaac commented Feb 25, 2020

Yes good point @nicholascar ! I guess that instead of closing it we could keep it open and tag it with "guidance" so that we don't forget to add it there. And remove the tags "profile-vocabulary" and "DCAT"...

@plehegar plehegar removed dcat profiles-vocabulary For discussion of profile description vocabulary labels Feb 25, 2020
@andrea-perego
Contributor

andrea-perego commented Mar 26, 2021

@aisaac said:

@rob-metalinkage maybe this could be a different github issue?

Yep, better to create a separate issue, not to overload this one.

@rob-metalinkage , I've copy-pasted your comment in #1338 . I'll reply to you there.

@andrea-perego
Contributor

@aisaac said:

My question is about protocol vs query standard, for an access service. The URI that we agreed here for a SPARQL endpoint (and was in the DCAT spec) was that of the query language (http://www.w3.org/TR/sparql11-query/).
I've just seen that a recent change in the spec uses the URI for the SPARQL protocol instead (https://www.w3.org/TR/sparql11-protocol/):
https://github.com/w3c/dxwg/pull/1310/files

Does this represent a change of approach, or some recommendation that should be followed, on what is expected wrt dct:conformsTo for data services?

You're right, @aisaac . The revision you refer to stems from the discussion in #1225 (in particular, #1225 (comment)), and I'm afraid I forgot about what was discussed in this thread.

As a side question, I'm wondering whether there is anything here that the DCAT spec editors should be pushed to more specific DCAT-related guidelines such as the DCAT-AP ones. (maybe @andrea-perego or @makxdekkers know?)

Following the discussion in #1225 , we have updated the draft by providing some guidelines - see the last part of §13.2.1 (Conformance to a standard).

But those guidelines are not discussing specifically whether you should point to the protocol or the query language specification.

@aisaac
Contributor

aisaac commented Mar 28, 2021

@andrea-perego thanks for the answer. I must say I am mildly convinced by the discussion on #1225. The protocol URI seems to have appeared out of the blue there, without explanation.

I have tried to see if the European Data Portal could give us expectation of the usage of the protocol URI vs the query language URI. Firing the query

PREFIX dct: <http://purl.org/dc/terms/> 
SELECT distinct ?d ?s
WHERE { ?d dct:conformsTo ?s 
       FILTER contains( STR(?s), "sparql")}

at https://www.europeandataportal.eu/sparql-manager/en/
I obtained only results with https://www.w3.org/TR/sparql11-protocol/. But they seem to come from one catalogue, so I'm not sure this is a strong proof.

Maybe this is something we could ask advice from the wider group? Unless you can show me some more motivation, which I would probably accept :-)

@andrea-perego
Contributor

@aisaac said:

@andrea-perego thanks for the answer. I must say I am mildly convinced by the discussion on #1225. The protocol URI seems to have appeared out of the blue there, without explanation.

I think it came out after I made a reference - see #1225 (comment) - to the list of URIs maintained by OSGeo, where they use https://www.w3.org/TR/sparql11-protocol :

https://github.com/OSGeo/Cat-Interop/blob/master/LinkPropertyLookupTable.csv

I have tried to see if the European Data Portal could give us expectation of the usage of the protocol URI vs the query language URI. Firing the query

[...]

at https://www.europeandataportal.eu/sparql-manager/en/
I obtained only results with https://www.w3.org/TR/sparql11-protocol/. But they seem to come from one catalogue, so I'm not sure this is a strong proof.

Maybe this is something we could ask advice from the wider group? Unless you can show me some more motivation, which I would probably accept :-)

An argument could be that what distinguishes a service / API are the protocol and query parameters, whereas it is not necessarily bound to a specific query language (there are services / APIs that support different query languages).

But we should indeed ask input from the group.

@jakubklimek
Contributor

But they seem to come from one catalogue, so I'm not sure this is a strong proof.

@aisaac this is not proof as this comes from our Czech catalog where I used it in connection to the referenced discussion.

I agree with @andrea-perego in the argumentation that the protocol (e.g. HTTP methods and Media types) is what characterizes the data service more than the specification of the query language.
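To show what "HTTP methods and media types" means in practice, here is a hedged sketch of how the query quoted earlier in this thread could be carried by the SPARQL 1.1 Protocol: a GET request with the query text in the `query` parameter and a results media type in the `Accept` header. The endpoint URL is hypothetical, and the request is only built, never sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

QUERY = """PREFIX dct: <http://purl.org/dc/terms/>
SELECT DISTINCT ?d ?s
WHERE { ?d dct:conformsTo ?s
        FILTER contains(STR(?s), "sparql") }"""


def build_sparql_get(endpoint, query):
    """Build a SPARQL Protocol 'query via GET' request without sending it."""
    url = endpoint + "?" + urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return Request(url, headers=headers, method="GET")


# Hypothetical endpoint URL, for illustration only.
request = build_sparql_get("https://example.org/sparql", QUERY)

# Decoding the URL again shows that the query text survives the encoding step.
decoded_query = parse_qs(urlsplit(request.full_url).query)["query"][0]
print(decoded_query == QUERY)
```

The query language fixes what goes inside the `query` parameter; the protocol fixes the parameter name, the HTTP method, and the media types, which is the part a generic client needs in order to talk to the service at all.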

@aisaac
Contributor

aisaac commented Apr 3, 2021

Hi, my colleague @Abbe98 suggested we could have two dcterms:conformsTo statements, one for the language and one for the protocol. Would this be acceptable?
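For illustration, the two-statement option could look like the N-Triples produced by the sketch below. The service and endpoint URIs are hypothetical; the two specification URIs are the ones mentioned in this thread. Whether this pattern is acceptable is exactly the open question, so this only shows the shape of the data.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT = "http://www.w3.org/ns/dcat#"
DCT_CONFORMS_TO = "http://purl.org/dc/terms/conformsTo"

# Specification URIs as cited in the thread.
SPARQL_QUERY_SPEC = "http://www.w3.org/TR/sparql11-query/"
SPARQL_PROTOCOL_SPEC = "https://www.w3.org/TR/sparql11-protocol/"


def describe_sparql_service(service_uri, endpoint_url):
    """Serialise a data service with two conformsTo values as N-Triples."""
    triples = [
        (service_uri, RDF_TYPE, DCAT + "DataService"),
        (service_uri, DCAT + "endpointURL", endpoint_url),
        (service_uri, DCT_CONFORMS_TO, SPARQL_QUERY_SPEC),
        (service_uri, DCT_CONFORMS_TO, SPARQL_PROTOCOL_SPEC),
    ]
    # Every subject, predicate and object here is an IRI, so all use <...>.
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Hypothetical service and endpoint URIs.
ntriples = describe_sparql_service(
    "https://example.org/service/sparql", "https://example.org/sparql"
)
print(ntriples)
```

A consumer that filters on either specification URI would find the service, at the cost of not knowing which value names the language and which the protocol.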

@andrea-perego andrea-perego modified the milestones: DCAT3 2PWD, DCAT3 3PWD May 4, 2021
@andrea-perego
Contributor

Hi, my colleague @Abbe98 suggested we could have two dcterms:conformsTo statements, one for the language and one for the protocol. Would this be acceptable?

Sorry for my late reply, @aisaac .

Not sure dcterms:conformsTo is the most appropriate property. Probably, a specific one should be used (e.g., dcat:queryLanguage or dcat:supportedQueryLanguage). But this links to the more general issue (still under discussion) on whether DCAT should describe the specific characteristics of the service / API interface.

The current approach is minimal: the description of the service / API interface is meant to be included in the "document" linked to from dcat:endpointDescription, whose correct interpretation is meant to be indicated by dcterms:conformsTo (e.g., the endpoint description is specified via Swagger/OpenAPI, OGC WMS / WFS / WF* Capabilities, etc.).

That said, it might be worth extending the current approach by making DCAT able to specify at least a subset of the information in the endpoint description (such as the supported query languages, formats, and profiles) if relevant for uses in scope of DCAT (e.g., search / filtering services / APIs).

@aisaac , I wonder whether you could elaborate on the requirement of having the query languages included in DCAT records, and the related use cases.

PS: Probably, we'd better open a new issue on this specific point.

@andrea-perego
Contributor

The last DXWG plenary (https://www.w3.org/2021/05/18-dxwg-minutes#t02) discussed the option of creating a new GH repo for the Profile Guidance document and moving all the issues labelled with profile-guidance there.

As this one is also labelled with dcat, we have to decide whether to transfer it or not.

@kcoyle
Contributor

kcoyle commented May 19, 2021

I'm not sure why this is labelled profile-guidance. I don't expect that group to be discussing individual properties. I think we can remove profile-guidance from this, and if it comes up again (?) in that group we can create our own issue.

@agreiner
Contributor

I think it is there to remind us to explain how to go about describing a data service using DCAT.

@kcoyle
Contributor

kcoyle commented May 19, 2021

I don't think we'll be getting into that level of detail in the guidance document. I've been thinking more along the lines of general guidance, like:

  • whether you can ignore some DCAT-defined properties
  • whether you can add other properties
  • whether you can drop some DCAT-properties but replace them with a similar property
  • what the inheritance relationship is between an AP and a vocabulary, or an AP and a related AP

If we are going to get into specific aspects of DCAT, like how to define data services, that will be quite a bit more work, and will be less extensible to profiles in general.

I can go either way, but we should decide on our scope ASAP.

@agreiner
Contributor

Looking over the thread again, I think my initial impression was just wrong.

@andrea-perego
Contributor

Thanks, @kcoyle & @agreiner .

I'm removing the profile-guidance tag, then.

@rob-metalinkage
Contributor

The underlying issue is we (the wider world) have a hole around generic description of conformance and specifications and relationship to data, and another one around direct vs. service orientated access to resources. PROF helps but only handles two immediate needs around direct inheritance and qualified link to support annotation of resources that may be used to describe a specification.

Specification models - description of types of conformance target - seem to be the key - how to separately describe a range of specifications in play, including but not limited to:

  1. service API
  2. implemented profile of service API
  3. self-description interface (e.g. OpenAPI) for the service
  4. query/parameterisation model
  5. data model of parameters
  6. data model(s) of returned data - again base+ profile viewpoints
  7. encoding/formats
  8. descriptive metadata attached to these various aspects.

Probably a good thing for DCAT to separate the concerns here and consider canonical properties for different aspects, or a qualified role mechanism.

@aisaac
Contributor

aisaac commented May 30, 2021

Hi, sorry for having let so much time pass.

First, I can confirm that my comment was indeed not about profile guidance in general, only about the specific DCAT model.

Second, to react to @rob-metalinkage , while I see there might be a need for more advanced machinery, for the moment I can live with anything simple that allows me to indicate that a data service complies with SPARQL. This was the requirement, @andrea-perego ! And whether the solution based on a 'document' would be acceptable for data consumers, I guess this should be passed on to the owners of the European Data Portal, as I am not the owner of a catalogue of data services, just a mere data provider :-)

Originally we had agreed that this requirement was to be achieved by using the URI of the SPARQL language with dcterms:conformsTo. If this is deemed less appropriate than using the URI of the SPARQL protocol, I can live with it. Especially because there is a one-to-one relationship between the SPARQL protocol and the query language.
The first issue is that if there are protocols that allow special (profiles of) query languages, then the indication of the protocol alone wouldn't be enough. And then this may require that people who have used dcterms:conformsTo with a protocol URI add some statements with language URIs. But I guess it's part of the game.
The second issue, and probably harder to overcome, is that intuitively it makes a lot of sense that the data service would be related to protocols and languages by dcterms:conformsTo or some specialization of it, because the definition of that generic property in Dublin Core ("An established standard to which the described resource conforms.") does seem to match the requirement...

@aisaac
Contributor

aisaac commented Jan 14, 2022

Hi, I would like to know if this issue is still tracked somewhere. I'm still keen to discuss this with my colleague @nfreire !
For the moment we're still keeping the dcterms:conformsTo statement with the value http://www.w3.org/TR/sparql11-query/ for the distribution that corresponds to a SPARQL endpoint. But we can still change it later if the consensus here argues for it.

@andrea-perego andrea-perego modified the milestones: DCAT3 3PWD, DCAT3 4PWD Jan 26, 2022
@davebrowning davebrowning added this to To analyse in DCAT: Potential new requirements via automation Feb 13, 2023
Labels
dcat:DataService dcat dct:conformsTo feedback Issues stemming from external feedback to the WG future-work issue deferred to the next standardization round