
Use of dct:type with both Class and Concept #314

Open
lvdbrink opened this issue Aug 17, 2018 · 38 comments

Comments

@lvdbrink commented Aug 17, 2018

In DCAT revised edition, classifying-dataset-types, dct:type is used with a range of skos:Concept, while the range of dct:type is formally rdfs:Class as defined in Dublin Core.

In the examples, both rdfs:Classes and skos:Concepts are used as objects. While this may not be wrong per se, a consequence is that OWL Full comes into play: every instance of skos:Concept becomes an rdfs:Class as well.
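A minimal sketch of the situation (the dataset and concept IRIs are invented for illustration):

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/> .

ex:myDataset a dcat:Dataset ;
    dct:type ex:statisticalData .

ex:statisticalData a skos:Concept .

# Given  dct:type rdfs:range rdfs:Class . , RDFS entailment (rule rdfs3)
# adds:  ex:statisticalData a rdfs:Class .
# The concept is now both an individual and a class, i.e. OWL Full.
```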

I'm not sure if this is intended by the DCAT editors?

I see two solutions:

  • change this in Dublin Core allowing skos:Concept in the range of dct:type, but this is of course outside the direct influence of this group;
  • use dc:type instead of dct:type.

Thanks to @architolk for pointing this out.

@rob-metalinkage commented Aug 17, 2018

Really interested in the take of others here - there are a couple of ways of looking at this - but I have never been able to find a cogent argument why we shouldn't assume that an rdfs:Class is actually a type of skos:Concept - classes are nothing more than concepts that define sets of instances.

It seems to be explicitly supported in OWL 2 as "punning" [https://www.w3.org/2007/OWL/wiki/Punning]

It seems a perfectly natural Use Case to me to model skos:Concepts as types in systems, then generate RDFS and OWL class models only if and when we need to model additional behaviours.

skos:Concept is relevant for "soft-typing" and rdfs:Class for "hard-typing" - and the equivalence is actually a useful thing.
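A sketch of what accepting punning looks like in data (illustrative IRIs):

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/> .

# ex:Manuscript used as an individual (a unit of thought) in one axiom...
ex:Manuscript a skos:Concept .

# ...and as a class in another. OWL 2 punning treats the two uses of the
# same IRI as separate "views", so a DL reasoner can still cope.
ex:oldCodex a ex:Manuscript .
```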

Is OWL-Full really a problem? I think not for several reasons:

  1. I don't see evidence that OWL-DL (or any other flavour) inferencing is happening at run-time across distributed systems - all "semantic web" implementations I have seen cache any models they want to use.
  2. There is currently no way of telling a client, nor any specification constraining referenced models to be any particular profile of OWL - so no assumptions can be made anyway.
  3. With negotiation-by-profile the same concept can be served with a graph that conforms to SKOS, RDFS, OWL-DL, OWL-Full, SHACL and any other specific profile needed by a client.

IMHO there is a need to provide a specific example of the problem, why it's a problem, and how to handle the use cases of soft-typing.

My feeling here, although I can't prove it, is that negotiation by profile and OWL 2 punning are two sides of the same coin - implementation and theory - and essentially we can get out of the bind via the URI dereferencing architecture: OWL-DL reasoners can ask for the metaclass representation they want.

Default behaviour for OWL - i.e. if a client asks for OWL, perhaps an OWL-DL representation SHOULD be returned. I don't know if the profile guidance or content negotiation scope will allow us to go into this platform-specific detail - or where and who in W3C cares about the general architecture of distributed OWL models?

@dr-shorthair commented Aug 20, 2018

'Hard' typing just means using rdf:type for classification.
'Soft' typing means using anything else (e.g. dct:type).

The range of rdf:type is rdfs:Class and standard RDFS entailments mean that an individual is also a member of the super-classes of the asserted classifier.

As @lvdbrink points out, the range of dct:type is also rdfs:Class, but no other entailments follow.

The use of either rdf:type or dct:type entails that the value is an rdfs:Class regardless of whether it was originally declared as such - so if it was defined as a skos:Concept it also becomes a rdfs:Class.
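The contrast can be made concrete with a small sketch (IRIs and class names invented for illustration):

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/> .

# 'Hard' typing: rdf:type, with RDFS subclass entailments
ex:d1 a ex:StatisticalDataset .
ex:StatisticalDataset rdfs:subClassOf dcat:Dataset .
# RDFS entails: ex:d1 a dcat:Dataset .

# 'Soft' typing: dct:type - only the range entailment follows
ex:d2 dct:type ex:statistics .
# RDFS entails: ex:statistics a rdfs:Class .  (nothing further about ex:d2)
```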

The use of any predicate other than rdf:type for classification has no RDFS significance, but nevertheless might be given significance in a particular application.

This doesn't say anything y'all don't know already, but maybe puts it in perspective.

@dr-shorthair commented Aug 20, 2018

In an email that was not reflected into this issue, @kcoyle pointed out that the SKOS editors explicitly declined to introduce a constraint that a skos:Concept may not also be a Class:

"3.5.1. SKOS Concepts, OWL Classes and OWL Properties

Other than the assertion that skos:Concept is an instance of owl:Class, this specification does not make any additional statement about the formal relationship between the class of SKOS concepts and the class of OWL classes. The decision not to make any such statement has been made to allow applications the freedom to explore different design patterns for working with SKOS in combination with OWL."

https://www.w3.org/TR/2009/REC-skos-reference-20090818/#concepts

So I'm inclined to accept their (the SKOS editors) invitation and go with the flow - i.e. no change required to DCAT or Dublin Core because of dct:type entailments.

@rob-metalinkage commented Aug 20, 2018

+1

Is it, however, a DCAT profile guidance issue to note that use of skos:Concept is fine for dct:type, but that by doing so you are explicitly accepting OWL punning, and that if you need to keep class and instance models separate you probably need content negotiation by profile?

@jakubklimek commented Aug 20, 2018

@lvdbrink I would advise against considering usage of dc:type instead of dct:type, as the whole DC elements namespace was deprecated in favour of DC terms for use in Linked Data.

IMHO there is a need to provide a specific example of the problem and why its a problem, and how to handle the use cases of soft-typing.

@rob-metalinkage A specific example of a problem would be an inference-enabled Linked Data visualizer (or a repository such as RDF4J). A typical discovery query asks for all rdfs:Class and owl:Class instances in a SPARQL endpoint to see what data is there. With inferencing enabled, all instances of skos:Concept used in a DCAT-rev data catalog to categorize datasets via dct:type could be unintentionally returned as instances of rdfs:Class with no actual instances (those would use rdf:type), causing all kinds of confusion.
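A discovery query of the kind described might look like this (sketch):

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>

# List the "classes" in an endpoint with their instance counts.
# With inferencing on, every skos:Concept used via dct:type appears
# here as a class, typically with a count of 0.
SELECT ?class (COUNT(?instance) AS ?count)
WHERE {
  { ?class a rdfs:Class } UNION { ?class a owl:Class }
  OPTIONAL { ?instance a ?class }
}
GROUP BY ?class
ORDER BY DESC(?count)
```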

I would say that in this case, using dct:type is not worth it just for the sake of reusing an existing property, due to these unintentional side effects. I would suggest either

  1. Explicitly say that the dataset categories are rdfs:Classes, and they should be used as rdf:types
  2. Say that the dataset categories are not implicitly rdfs:Classes, and use another property for their attachment, with no "concealed" side effects such as OWL punning

The fact that, as of now, the group does not see an immediate problem does not mean that this will be OK in the near future, where, e.g., automated inferencing could become more widespread. IMHO it is better to be safe: unless the inference is intentional (which it is not in this case), I would steer clear of it.

@kcoyle commented Aug 20, 2018

I don't know if this helps or hurts in this particular situation, but note that the DCMI community is on the verge of revising DC Terms to move from the standard RDF "range" definitions to a schema.org-like use of "expected values". This means that the stated "range" would no longer be suitable for inferencing (IMO) but instead serves the role of conformance. I believe the upshot of this is that all properties with "expected values" would be annotation properties.

This has not yet been entirely agreed, but is currently under discussion.

@rob-metalinkage commented Aug 20, 2018

@jakubklimek - I think you have summed up the underlying issue with "A typical discovery query is asking for all rdfs:Class and owl:Class instances in a SPARQL endpoint to see what data is there." We have seen such patterns across a range of platforms - generally there seems to need to be an implicit contract that the client has some sense of the scope and size of a dataset before issuing a query.

I would suggest:

  1. You could make a tighter query if you only wanted classes with instances - losing expressivity without a strong driver and a documented "contract" between producer and consumer is a poor trade-off.
  2. You should document what's behind an endpoint - "discovery queries" are unsafe (I'm used to spatial and observation data, where unbounded discovery queries could return petabytes of data just as easily). RDF Data Cube, VoID etc. would be a start.
  3. IMHO Linked Data is about making links replace discovery - making relationships explicit, transparent and dereferenceable as documentation of the link meaning.
  4. A client has no means of knowing if inferencing is enabled anyway - so it can't really trust any results AFAICT.
  5. There is a fundamental scalability issue with discovery queries - it might work for a few desktop projects, but there is no evidence I can see that it scales up to either the heterogeneity or the numbers of the real world.

(I go into a bit of detail here because this is generally relevant to the drivers for DCAT - the ability to describe what's in a dataset and accessible via its endpoint. My opinion is that if we wanted to consider such an architectural constraint, we would need a formal documented use case that we can discuss and accept as within scope. I'd be fascinated to see a compelling Use Case for client discovery and access of content starting with, as per your example, an explicit contract that a class model is available at all. If we can see a workable approach, it would probably tell us the bare minimum MUST-have metadata that a dataset description needs to support such an architecture.)

@architolk commented Aug 22, 2018

@pmaria commented Aug 24, 2018

As @architolk states, in our (Dutch government related) case, we apply a modeling approach in which we maintain a clear distinction between instances of skos:Concept and instances of rdfs:Class. For us a skos:Concept represents a unit of thought, and an rdfs:Class a set of particular things which may or may not contain manifestations of a unit of thought. We do this for a variety of reasons, one of which is to keep the door open for OWL-DL reasoning, should we, or our consumers, wish to apply it.

IMHO a fundamentally important standard/recommendation like DCAT shouldn't "force" a more complicated reasoning pattern onto its users. At least, not without a very good reason.

The question for use cases is understandable; however, it's quite hard to imagine upfront what data consumers will want to do with the data. We really do not know. That's why we strive for an open modeling approach that closes as few doors as possible.

Therefore, I strongly agree with the suggestions that @jakubklimek makes above. And I have a strong preference for his second suggestion.

@dr-shorthair commented Aug 26, 2018

There is clearly a tension here between the part of the community who still see some promise in web-scale reasoning, and who therefore wish to keep entailments clean and be protected from punning, and the part of the community which is comfortable with gentler semantics as promulgated by schema.org.

SKOS had hedged its bets until now. DC started soft, then veered into stronger semantics with DC-TERMS, but @kcoyle is now reporting a plan to revert to a softer position aligned with schema.org.

Since it used both SKOS and DC-TERMS, I had assumed that DCAT fell into the soft semantics camp. An implication of that is that anyone who wants stronger reasoning must be selective about what is loaded, and maybe also cull the graph of things like dct:type rdfs:range rdfs:Class . prior to inferencing.
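The culling step mentioned could be as simple as a SPARQL Update run before handing the graph to a reasoner (sketch):

```sparql
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Remove the range axiom so that dct:type objects are no longer
# entailed to be classes.
DELETE DATA {
  dct:type rdfs:range rdfs:Class .
}
```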

@jakubklimek, @architolk and @pmaria now appear to be advocating that we take DCAT in a stricter direction. Given its heritage and installed base this could be a significant change, so we need to be clear about potential side effects and be careful about these.

(FWIW @architolk I don't think we can be strongly influenced by the behaviour of a single IDE like TopBraid. I'm a TopBraid user myself, but am aware that it makes a bunch of assumptions not all of which are useful and which are not necessarily aligned with the broad community understandings.)

@rob-metalinkage commented Aug 27, 2018

There is nothing that forces anyone to resolve a dct:type object reference and do any inferencing over it.

If you choose to load both the dcterms RDFS model and resolve the dct:type reference, and find the RDFS model for the referred object, then you would naturally have to accept the intention of the data publisher (not the DCAT specification) that any skos:Concepts are indeed "units of thought" that represent (and entail) rdfs:Class.

Distributed reasoning means there must be a sophisticated contract that makes URI references to objects, which are intrinsically instances of things, link to a class model.

TopBraid, for example, has a bunch of built-in assumptions that some graphs are loaded - a mixture of explicit OWL imports (perhaps also TBC-controlled imports smuggled into comments in TTL files?) and reflection based on "magic" patterns in project file names, whose Eclipse-UI-controlled open-or-closed state determines whether they are loaded. It's kind of horrible, but seems a fair reflection of the reality that the application context is responsible for determining the graph to reason over, and any entailment regimes.

So, I don't see a reason why the project discussed cannot make up its mind that all references must have an rdfs:Class axiomatisation, and that resources resolved from the URIs it uses are constrained to be OWL-DL. To flip the argument on its head, it seems unwise to force such assumptions on everyone. DCAT users should be free to use whatever reasoning and entailments they choose.

That said, examples that do implicitly rely on (perfectly legal) OWL punning interpretations should carry an explanation that they assume an environment where punning between skos:Concept and rdfs:Class is allowed, and that DCAT itself does not dictate this either way.

@architolk commented Aug 27, 2018

@jakubklimek commented Aug 27, 2018

@dr-shorthair Yes, I am always for a stricter version. It is actually the use of dct:type with skos:Concepts which has the (completely unintentional) side effects.

@kcoyle Thanks for the heads up. Nevertheless, since the loosening of dcterms rdfs:ranges is still in the discussion phase, it can go either way. I personally would be against any loosening, which only allows more mess to be created and makes any reasonable application on top of such data complicated (too many options).

Regarding TopBraid behaviour, this only emphasizes what can happen when these side effects are ignored. My exploration query use case is another one.

I strongly believe that inference is not something we can ignore because "it can be turned off if it causes problems". It needs to be taken into account as a natural consequence of using RDF and RDFS. It is quite simple really. We need to start from a use case. The use case is that we want to classify datasets with skos:Concepts. That is fine, but the property dcterms:type is not a good fit for this, because it has a range of rdfs:Class, which would make all used concepts classes, which is something unintended. Therefore, we need another property without such side effects.

@rob-metalinkage I think that placing exploratory queries on unknown endpoints is perfectly legal, and it is in fact the only way of determining what is stored inside - leveraging vocabulary reuse and inferencing (more on that topic in our Survey of Tools for Linked Data Consumption, btw). I admit I am a bit lost in your extensive argumentation above, but the situation often is that you have a URL of some foreign SPARQL endpoint and you want to see (automatically) at least something about what is inside - the contract therefore is "here is a SPARQL endpoint, do your best with SPARQL".

@makxdekkers commented Aug 27, 2018

@jakubklimek Is there a use case that says we want to classify datasets with skos:Concept? Such a use case is not in https://w3c.github.io/dxwg/ucr/.
In the current draft of the new specification at https://w3c.github.io/dxwg/dcat/#Property:resource_type, there is no mention of skos:Concept as range of dct:type. It states correctly that the range of dct:type is rdfs:Class. So far, so good.
Maybe the problem is that there is mention of MARC intellectual resource types, which are indeed explicitly defined as skos:Concept. Could the solution be to remove that example?
Of the other four examples, the terms of the DCMI Type vocabulary are defined as rdfs:Class. I don't know whether the other three (ISO19115, DataCite, re3data) have been published as RDF at all. The links point to text or XML enumerations.

@jakubklimek commented Aug 27, 2018

@makxdekkers I was referring to the initial issue by @lvdbrink; I did not investigate further. If there is no such requirement, maybe it can indeed be resolved by removing the MARC examples. But then another issue arises, and that is how to classify datasets using MARC intellectual resource types - and we are back at specifying a new property for that, unless we say it is out of scope.

@kcoyle commented Aug 27, 2018

@jakubklimek We have heard the argument that you make about loosening, but the fact is that the inclusion of precise ranges does not constrain the use of the terms. Our preference is that constraints take place in application profiles rather than in the definition of the terms, since usage patterns show that there are often different range needs. But I must say that both arguments make sense, and some years in the future, after linked data has matured, we will find out which one we should have chosen.

@kcoyle commented Aug 27, 2018

@jakubklimek @makxdekkers I see no reason why the MARC types cannot be dct:type(s). skos:Concept is an instance of owl:Class, so there would be no conflict. No?

@jakubklimek commented Aug 27, 2018

skos:Concept is an instance of owl:Class, so there would be no conflict.

@kcoyle actually that is exactly the issue. skos:Concept is an instance of owl:Class - it is a class of all concepts. Then, individual concepts are instances, not subclasses of skos:Concept. Specifically, they themselves are not owl:Classes, unless specified explicitly somewhere else, which is by design.

However, using the individual concepts with dcterms:type entails the concept used is an rdfs:Class - it can have instances. And that is the issue - mere usage of a concept with DCAT this way would unintentionally define that it can have instances, which might not be desirable, as described by at least two use cases here.

@dr-shorthair commented Aug 27, 2018

@kcoyle indeed, I already included MARC in the examples - see https://w3c.github.io/dxwg/dcat/#classifying-dataset-types

@rob-metalinkage commented Aug 27, 2018

The issue is that it is legal to use a skos:Concept where a range is rdfs:Class, and this then is an explicit case of OWL punning. (I believe this is "intended".)

Whether there is another unstated contract that OWL-Full semantics may not be used as intended is a separate matter - i.e. is there a Use Case from which we may derive a requirement that OWL-DL semantics MUST be supported by use of DCAT?

I'm not stating that this is unreasonable, just that we don't have evidence to force us to make such a constraint at this stage.

I also wonder whether this is a case where the best approach could be an explicit OWL-DL profile of DCAT, where OWL-DL reasoning can be assumed?

I also think validation or identification of OWL profile is probably a necessary infrastructure demand if we want to enforce anything - it's not, IMHO, reasonable to make all stakeholders experts in these matters.

@dr-shorthair commented Aug 28, 2018

the best approach could be an explicit OWL-DL profile of DCAT, where OWL-DL reasoning can be assumed

Indeed.

@jakubklimek commented Aug 28, 2018

The issue is that it is legal to use a skos:Concept where a range is rdfs:Class, and this then is an explicit case of OWL punning. (I believe this is "intended".)

Sure, it is legal to use anything with any RDFS range definition. Only the consequence is that the "anything" becomes an instance of the RDFS range. And that is what I am talking about here. Using an instance of skos:Concept as the value of a property with an RDFS range of rdfs:Class makes that instance an rdfs:Class. That, in my opinion, is unintended. People will simply want to describe datasets with metadata. And many people will not realize this (as they are not experts).

Whether or not one discovers such an effect, i.e. performs inference, is another matter, but the problem is already there regardless. So, my argument is not to cause such side effects just for the sake of reusing dct:type. If we are going to write examples where skos:Concepts are to be used, in a sense other than with dcat:theme, we should have another property for that. Then you do not have to assume anything more.

@makxdekkers commented Aug 28, 2018

@jakubklimek I am just wondering why you think it is 'unintended' that an instance of skos:Concept used to classify datasets would also be an instance of rdfs:Class.
It's not a case of us saying that all instances of any skos:Concept are also instances of rdfs:Class, just the ones that are being used with dct:type. Could it be solved with a warning at https://w3c.github.io/dxwg/dcat/#Property:resource_type?
As far as I am aware, I don't think there was an explicit intention at DCMI to exclude the use of skos:Concept as object for dct:type. We could ask the people over there, e.g. @tombaker?

@jakubklimek commented Aug 28, 2018

@makxdekkers

I am just wondering why you think it is 'unintended' that an instance of skos:Concept used to classify datasets would also be an instance of rdfs:Class

OK, let me try to explain it another way. Let's say I have my own skos:ConceptScheme for classifying datasets. There are skos:Concepts for, e.g. genres. According to SKOS, it is intentionally undefined whether those concepts are also rdfs:Classes or not, i.e. it is up to the publisher. So, since it is my skos:ConceptScheme, I decide I do not want them to be rdfs:Classes.

Now, I want to use those concepts to classify my datasets using DCAT. I find dct:type, since that is the property DCAT users will expect for this according to its description in DCAT, and I want them to be able to understand my data. But I still have no intention of my concepts becoming rdfs:Classes. However, by using dct:type to link to them, I effectively make them rdfs:Classes thanks to dct:type's rdfs:range definition.
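Spelled out in data, the scenario is (scheme and concept IRIs invented for illustration):

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/> .

ex:genres a skos:ConceptScheme .
ex:comedy a skos:Concept ;
    skos:inScheme ex:genres .          # deliberately NOT an rdfs:Class

ex:myDataset a dcat:Dataset ;
    dct:type ex:comedy .
# Range entailment: ex:comedy a rdfs:Class . - exactly what the
# publisher did not intend.
```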

My question here is: "Why do my concepts have to become rdfs:Classes just because I want to use them with DCAT recommended property for genre classification?" Classifying dataset with a genre should keep the genre intact, and not imply something about it that was not there before I used it.

The bottom line here I think is, if I wanted the genre concepts to be used as classes, I would have made them classes myself, explicitly, and then probably used them with rdf:type, not dct:type.

Could it be solved with a warning

Well, a warning is the least I would expect there. But the question is why there should be something that needs a usage warning at all. Do we need the property for classification to be dct:type so badly?

I don't think there was an explicit intention at DCMI to exclude the use of skos:Concept as object for dct:type

And I do not say that the use of skos:Concepts as objects for dct:type is, or should be, excluded.
I say that it unintentionally entails information about the used concepts that might not have been there before.
If the concepts were also classes before usage, everything is fine.
But if they were not classes before usage, they become classes just because they were used with DCAT and dct:type. That, I think, will not be the intention of users of DCAT, and often enough they will not be able to foresee the effects of this, nor should they have to.

@kcoyle commented Aug 28, 2018

I've finally gotten my head around this (sorry it took so long). The problem as I see it is not in dct nor in DCAT, but in the fact that some communities are still using controlled term lists rather than classes to "classify" types. These are generally carry-overs from older metadata practices that had no concept of "class". A loosening of dct:type could make a choice like this more "valid":
dct:type http://purl.org/dc/dcmitype/Text ;
dct:type http://id.loc.gov/vocabulary/marcgt/man ;
dct:type http://registry.it.csiro.au/def/datacite/resourceType/Text ;
dct:type http://registry.it.csiro.au/def/re3data/contentType/doc ;

but I don't think that is the main issue here. The question that I see, instead, is whether there is a negative to be found in declaring something like http://id.loc.gov/vocabulary/marcgt/man to be a class when it is used in that way. To me, it serves the same conceptual role as a class and re-casting it as a class is taking it in the direction in which it should go when used in RDF.

I do not think it would be better to have two properties -

  • dct:type for types that are defined as classes
  • dcat:(someName) for types that are defined as instances

And I don't see a way to have a single property that has a range of both classes and instances unless defined as an annotation property, which has no advantages whatsoever, AFAIK.

My other comment is that although term lists like those at id.loc.gov already exist, unless one intends to use a significant number of the terms it may be best to define one's own list of classes, with some links to related classes or terms from other environments. I don't know if there is an analysis of the types that are likely to be useful to DCAT, but my gut feeling is that few of the id.loc.gov content types[1], carrier types[2] or media types[3] will be appropriate, so adding these to the DCAT mix may cause more problems than it solves. These lists and others like them should be replaced by RDF-appropriate classes as their communities move to the use of RDF (although I would not place bets on that happening in my lifetime).

My vote would be: don't use lists from outdated metadata practices; do the right thing and create RDF-appropriate classes.

[1] http://id.loc.gov/vocabulary/contentTypes.html
[2] http://id.loc.gov/vocabulary/carriers.html
[3] http://id.loc.gov/vocabulary/mediaTypes.html

@dr-shorthair commented Aug 28, 2018

@kcoyle wrote -

the fact that some communities are still using controlled term lists rather than classes ...
... don't use lists from outdated metadata practices; ...

Karen - not sure which world you are living in, but it sounds like a very enlightened and privileged one ;-)
Back in the one where I spend my days it is hard enough getting term-vocabularies published on the web at all, so SKOS is often all we can hope for. In fact, you can get a long way with SKOS++ and there is significant innovation in this space - look at QUDT for example where all the key classes are sub-classes of skos:Concept, so individual classifiers are all individual skos:Concepts. And look at the NERC Vocabulary Service which has about 40,000 skos:Concepts and is used widely in the earth and environmental sciences. Now I agree that not all of these would be used as high-level classifiers in the dct:type slot, but some of them would.

Overall, if we disallow the use of SKOS for classification vocabularies I believe we consign DCAT to oblivion.

I also think it is a big mistake to propose minting our own sets of term lists where respectable authorities have already published lists with a URI per term. The fact that they are sometimes not described in perfectly DL-conformant OWL is much less important than the fact that an important authority (like LoC) is providing an important service to the linked data community by carrying over its legacy of analysis onto a more modern platform. And we don't take on an additional maintenance burden.

Don't let "perfect" be the enemy of "good-enough" - actually more like "really quite good given the level of organizational engagement and community acceptance".

@makxdekkers commented Aug 29, 2018

I agree with @dr-shorthair that DCAT will lose relevance if we are too strict.
On the other hand, some of the machinery that we're using does care about strict rules; for example, using SHACL you can only validate that the object of a particular statement is an instance of a certain, explicitly defined class: the skos:Concept http://id.loc.gov/vocabulary/marcgt/man fails the test for rdfs:Class. A human observer may not object to it, but SHACL definitely does. I've seen a work-around in SHACL to just test whether there is a URI, and not look further into it. So, you could stick any URI into the statement and SHACL would not be able to catch it.
In a way, using rangeIncludes instead of rdfs:range makes the problem go away, but it would make the validation of objects with SHACL less clear (maybe impossible?).
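A shape along the lines described might look like this (sketch; the shape IRI is invented):

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/> .

ex:DatasetShape a sh:NodeShape ;
    sh:targetClass dcat:Dataset ;
    sh:property [
        sh:path dct:type ;
        # Strict check: the value must be explicitly typed rdfs:Class;
        # a plain skos:Concept such as marcgt:man fails this.
        sh:class rdfs:Class ;
        # The looser work-around mentioned above would instead be:
        #   sh:nodeKind sh:IRI ;
    ] .
```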

@kcoyle commented Aug 29, 2018

@makxdekkers Since "rangeIncludes" is not an RDF standard concept, it wouldn't be treated as a rdfs:range by SHACL. rangeIncludes could be defined in a validation document to be whatever you want it to be. It would be just another locally defined property, which is what it is in schema.org.

But in any case, to use a less strict definition it seems that DCAT should define its own property, because dct:type is already defined with a specific range. If you want to include as values skos:Concepts, URIs for classes, and perhaps also literals, you'll need a property with no rdfs:range, AFAIK.

@makxdekkers makxdekkers commented Aug 29, 2018

@kcoyle We could create a new property dcat:datasettype or something similar, based on the arguments in this discussion. However, there is already existing practice: the EU DCAT-AP specifies the use of dct:type with skos:Concept and the EU GeoDCAT-AP uses dct:type with objects from http://inspire.ec.europa.eu/codelist/ResourceType and http://inspire.ec.europa.eu/codelist/SpatialDataServiceType, both of which are defined as skos:ConceptScheme.
If this is wrong practice, those profiles will have to be revised. The question is how easy it will be to convince people that it is necessary -- given that it took us in this group two weeks to get our heads around the issue.
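The GeoDCAT-AP-style usage mentioned above looks roughly like this (the dataset name is made up; the specific INSPIRE code-list member URI is meant only as an example):

```turtle
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ex:   <http://example.org/> .

ex:someDataset a dcat:Dataset ;
    # the object resolves to a skos:Concept in the INSPIRE registry,
    # not to an rdfs:Class, hence the range mismatch under discussion
    dct:type <http://inspire.ec.europa.eu/codelist/ResourceType/dataset> .
```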

@kcoyle kcoyle commented Aug 29, 2018

@makxdekkers sigh This is a perfect example of why minimum semantics on property definitions is better. It also further convinces me that application profiles are where ALL constraints should be defined. I would prefer that APs use AP-specific constraint terms and not RDF domains and ranges because an AP is defining constraints not axioms for inferencing. This is what schema.org does - it uses very little from RDF, and defines its own terms for literals (schema:Text), URLs (schema:URL) and integers (schema:Integer), as well as for domains and ranges.

Meanwhile, a lot of the use of DC terms does not adhere to the domains and ranges by which they are defined. The world may be ending, but not for that reason. ;-)

@jakubklimek jakubklimek commented Aug 30, 2018

If this is wrong practice, those profiles will have to be revised.

@makxdekkers I think those profiles will have to be revised after DCAT revision anyway.

@kcoyle Isn't the main goal of DCAT to increase interoperability? If there is a set of different APs, each with a different set of restrictions and specifics, those will not be interoperable. What is the point then?

I actually view what schema.org does as a bit evil. It creates a mess in the data, making it so hard to process that only people with a lot of resources are able to do so. I think that having a semantically clean model is a must, and publishers have to be properly motivated to publish their data right. This means there have to be applications able to (easily) process the data (think e.g. simple catalog software). This, in my opinion, will not be possible if we allow things like properties without ranges, or properties allowing both resources and literals as values.

Anyway, we digressed a bit here. The original issue was with the dct:type property. This property is already established. If it was used in a wrong way in DCAT2014, and this was then "imported" into DCAT-AP, and from there into GeoDCAT-AP, I think now is the time to correct that by introducing a new property for this, with a clearly established range (skos:Concept) - and I really do not think this is "too strict".

Btw. errors like this in DCAT2014 were already fixed in the revision (e.g. the literal used as a media type in the DCAT2014 examples, #170, which some implementations had started using), so this would not be the first case.
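A sketch of what such a new property definition could look like; the name dcat:datasettype is borrowed from an earlier suggestion in this thread and is purely hypothetical:

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .

# hypothetical DCAT-minted replacement for dct:type on datasets,
# with the range set explicitly to skos:Concept
dcat:datasettype a rdf:Property ;
    rdfs:label "dataset type"@en ;
    rdfs:domain dcat:Dataset ;
    rdfs:range skos:Concept .
```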

@rob-metalinkage rob-metalinkage commented Aug 30, 2018

It sounds like the nice solution would be if dct:type were relaxed to have range rdfs:Resource.

However - it is legal to have a skos:Concept as the target of dct:type - it just needs to be recognised that the intent is thus to treat these targets as rdfs:Classes too.

If the target is declared to be both a skos:Concept and a rdfs:Class already - then having two different predicates adds a complication - do you need to fill in both?
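For example, a target declared as both at once, which is exactly the OWL 2 punning pattern (all names made up):

```turtle
@prefix rdfs: @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ex:   <http://example.org/> .

# the same IRI used as both a concept and a class
ex:LandCoverMap a skos:Concept , rdfs:Class ;
    skos:prefLabel "Land cover map"@en .

ex:someDataset a dcat:Dataset ;
    dct:type ex:LandCoverMap .   # now satisfies the rdfs:Class range
```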

If we want to enforce an OWL-DL compatible profile of DCAT, whereby referenced resources are also resolved to return OWL-DL compatible resources, how do we specify, enforce and validate this? This is why I think there needs to be an explicit Use Case for OWL-DL semantics to give us requirements - because the current examples are not "used in a wrong way", they are just used in an OWL-Full way, and nothing at this stage says that is actually wrong AFAICT.

So, if we are to support an OWL-DL compatibility constraint, we need to establish the exact requirements and explore available solutions. A new predicate is a specific solution to a requirement we don't formally recognise (at this stage) IMHO.

If we have such a requirement, we then need to make a decision if this is a matter for DCAT core or for a profile of DCAT.

Perhaps writing an OWL-DL profile of DCAT that enforces such constraints would be a good exercise anyway - then we can consider how much could be migrated to DCAT core. Right now we are guessing a little at how it would be testable in practice, which is a requirement for W3C recommendations.

@makxdekkers makxdekkers commented Aug 30, 2018

@jakubklimek It is true that existing profiles may have to be revised as a result of the revision of DCAT, but I was hoping to keep that to the absolutely necessary minimum. We always run the risk that implementers do not feel it is useful to convert data they already have. I agree that if they've done it 'wrong', the new DCAT should not endorse the practice -- e.g in this case of dct:type -- but there is no guarantee that people will have the resources to make the change.

@makxdekkers makxdekkers commented Aug 30, 2018

@jakubklimek You wrote "If there is a set of different APs, each with a different set of restrictions and specifics, those will not be interoperable. What is the point then?".
People will always implement a standard in their own way, maybe because of their (mis)understanding or maybe because their situation is a little different. Documenting their assumptions and decisions in a profile makes it easier for people with a different set of assumptions and decisions to understand how to interoperate with them, maybe using mapping or conversion tools.
There should be a sweet spot: keeping the standard flexible where possible and strict where necessary.

@jakubklimek jakubklimek commented Aug 30, 2018

@makxdekkers You are right in both points.

People will always implement a standard in their own way, maybe because of their (mis)understanding or maybe because their situation is a little different.

I like your optimism. In my experience, people usually implement standards only to the minimum degree where no important users/decision makers complain. And decision makers usually complain only when there is a web page with something about their data written in red (i.e. validation errors), making them look bad. So for me, it is important to be able to validate as much as possible automatically, because I know that what cannot be validated automatically, will not get fixed/published right.

Otherwise, I understand the motivation to find the sweet spot and to use application profiles.

@tombaker tombaker commented Aug 30, 2018

@kcoyle kcoyle commented Aug 30, 2018

@jakubklimek The DCMI goal is to be able to use application profiles directly for validation - thus validation works on the metadata creator's set of rules, and presumably both the creator and the user are using the same rules. It seems to me that this would improve interoperability because the metadata definitions would be explicit and testable.

@starred75 starred75 commented Aug 30, 2018

@tombaker brings good news for the future of dcterms.

I think that at the root there is a huge problem with the collision of the local names of two very relevant properties. Yes, they are different URIs, but I would not use any locally named "type" property when there is such a cumbersome fellow called "rdf:type" around.
Certainly no one is to blame: DC was born after RDF (but RDF's specs are younger than DC's); surely when they got married, both of them already had their own "type" properties.

Turning to the logical aspects, I'm with @jakubklimek and @lvdbrink. Yes, there's punning, skos:Concepts can be rdfs:Class or owl:Class (it's written in the SKOS specs as well), etc.
Yet having a property that is meant to point to rdfs:Class, and then pointing it at generic resources (probably owl:Individuals if we were in OWL; I read in some comment that not all of the suggested examples are SKOS concepts, yet I don't think we are expecting classes) is not going to help keep things clean. It is one thing to say, for the purposes of an ontology merged/imported with a thesaurus, that a certain skos:Concept is also a class; it is another thing to deliberately use a property in the (IMHO) wrong way.
Tom is right: DCMI probably went too far at that time.

Additionally, looking at it from a purely terminological point of view, is it really a type? From the suggested target datasets and their data, I see many things that could be topics or genres; "type" would not be what I had in mind for those (yet others could be).

I'd be in favor of using another property or, simply, dc:type (not dct).
@makxdekkers made a point about existing profiles using dct:type, but at least it was not in the specs of the original DCAT, so the AP maintainers could in theory keep using their property with its semantics (possibly redundantly with the new one here when adopting the new DCAT for old data) or, when updating the AP specs to the new DCAT, switch properties.

P.S. I disagree about local standards and APs with embedded semantics. This may and will always happen, but at least in principle they should try to be understandable as widely as the World Wide Web.
APs are useful for putting together different ontologies, discarding some unused properties and telling users which ones to use, concretely suggesting potential target datasets for some properties (as in this case), etc. They should not redefine semantics.
I prefer a world in which, if I pull from anywhere a triple such as:
:Mario foaf:knows :Luigi
I am totally sure that :Luigi is a foaf:Person and not :Mario's pet rabbit.
In my experience, all the soft spots in available standards, all those "do it as you wish" provisions, create more issues than they solve. But I'd better stop this "P.S."; I'm on the verge of going off-topic ;-)
