Suggestion to add dct:format and dcat:mediaType to dcat:DataService #1381
Comments
A convention to use an OpenAPI service description as the dcat:endpointDescription would address the issue.
Thanks for the suggestion, @smrgeoinfo. Yes, we do recommend using OpenAPI. However, we also need to support DataServices that are described using other description languages, among others WSDL 2.0 and WADL from W3C. As stated above, you need to know the formats/mediaTypes of a DataService already when deciding whether it is (re)usable. My main point, which might not be clear enough in my proposal above, is that inside the catalog our users may need to:
We need these values stored in the catalog, instead of extracting them on the fly from external endpointDescriptions (which are based on different description languages) each and every time a user needs to display, sort or query on formats/mediaTypes.
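To make the point about storing the values in the catalog concrete, here is a minimal sketch in Python. It assumes a hypothetical in-memory catalog where each DataService record already carries its media types; the record layout, the identifiers (`service-a`, ...) and the function names are illustrative assumptions, not part of DCAT or of any existing catalog implementation.

```python
from collections import defaultdict

# Hypothetical catalog records. In a real catalog these values would come
# from the stored DataService metadata, not from parsing each external
# endpointDescription (OpenAPI, WSDL, WADL, ...) at query time.
CATALOG = [
    {"id": "service-a", "media_types": ["application/json", "application/xml"]},
    {"id": "service-b", "media_types": ["text/csv"]},
    {"id": "service-c", "media_types": []},  # nothing recorded for this service
]


def services_supporting(catalog, media_type):
    """Return the ids of services whose stored metadata lists media_type."""
    return [rec["id"] for rec in catalog if media_type in rec["media_types"]]


def index_by_media_type(catalog):
    """Build a media type -> service ids index, e.g. for facets or sorting."""
    index = defaultdict(list)
    for rec in catalog:
        for media_type in rec["media_types"]:
            index[media_type].append(rec["id"])
    return dict(index)


json_services = services_supporting(CATALOG, "application/json")
facets = index_by_media_type(CATALOG)
```

With the values held in the catalog, both the filter and the facet index are simple lookups; without them, each call would first have to fetch and interpret every service's external description.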
@jimjyang In Flanders we have created a DCAT profile, https://data.vlaanderen.be/doc/applicatieprofiel/metadata-dcat, covering open data, geospatial data, closed data and standalone services. When we discussed data services, this aspect was kept as part of the endpoint description, that is, outside the metadata collected in the DCAT catalogue. We observed that those technical details differ greatly depending on the ecosystem the data service was designed in. Moreover, many data services offer content negotiation, so the actual syntax (XML, JSON, etc.) is less important than knowing the business setting or the technical standards the service follows. For example, it is more important to know that it is an OGC:WFS service than that it is a service which returns XML.
Personally, when exploring a catalog of data services, I am not interested in the actual data syntax that is returned. As a technical person I am more interested in what kind of data is provided, whether it is easily linkable with other services, whether there are shared identifiers in the data, etc. The technical format only adds to the amount of work and the number of libraries that have to be supported. Note that even if two services share the same syntax, there is no guarantee that one can connect them. So I consider the format of a data service a low-value property.
Looking to the future, we thought of constraints on the endpoint description (e.g. SOAP WSDL descriptions, OpenAPI descriptions, ...) so that data service builders can use their native ecosystem documentation tools in such a way that generic metadata useful in the DCAT catalog context can be derived.
Thanks for sharing your experiences, @bertvannuffelen! I am not technical, but our catalog users (application developers) report that they need to know the formats/mediaTypes of a DataService in order to decide whether or not to use it. Our suggestion is not at all to let DCAT replace or standardize endpointDescriptions (or the practices around them), but to make it possible to provide catalog users with the metadata they need without forcing them to leave the catalog and read external documentation.
The challenge I see with the proposal is that I am not sure it will address your catalog users' request. Are they really searching for data services that provide data as "application/atom+xml" but not as "application/xml"? It seems natural to include more technical details of data services in a metadata description, but this has its drawbacks. To illustrate the topic, see for instance the formats in https://data.europa.eu/data/datasets?query=&locale=en&minScoring=0 Although I can understand your request, I am reluctant to include it as specific properties for data services. For this case I would rather first apply it in a profile and see how it turns out, before adopting it in DCAT at this level. For me this request is connected with the conformsTo discussion, which I think is the more valuable discussion, as the format is actually a consequence of that value. If the data service conforms to the Norwegian REST API guidelines, I know as a developer a lot more than the format: I know that it will handle errors in a certain way, which headers are supported, etc. An example of such guidelines is here: https://www.gcloud.belgium.be/rest/ Illustrating the above with the first example:
For me the second data service description is more informative than the first. The first reflects one detail of the second. This is the core of my reluctance: with the format property proposal, the effort in the catalog metadata goes towards derived information rather than towards recording the source of the decision. Another source of reluctance is that we should not try to transfer all properties of the DCAT 1.0 Distribution to DCAT 2.0 DataServices. If we do that, the distinction between the two entities becomes blurrier. The fact that a DataService does not have a format in core DCAT, while a Distribution does, may be a good aid in distinguishing them. I would rather avoid lifting all the technical aspects of data services into core DCAT and leave that to profiles.
Thanks again for sharing your experiences/solutions, @bertvannuffelen! We do share your concerns, and we are also restrictive about introducing (national) extensions. We also share your point about improving the natural documentation of the DataServices, which is another discussion (in general, about the metadata quality of the resources that are listed in, and referred to from, a catalog).
Not necessarily while searching, but at least when evaluating whether a DataService found in the catalog is reusable. A catalog (at least ours) supports its users in 1) searching for potentially reusable Datasets/Distributions/DataServices, 2) evaluating whether or not a particular Dataset/Distribution/DataService is reusable and 3) using a Dataset/Distribution/DataService that is evaluated as reusable. As we wrote under the Problem statement, "Which formats/mediaTypes a DataService may support is one of the considerations you need to take when deciding whether a DataService is (re)usable or not."
As we understand it, dct:conformsTo does not have the same meaning as dct:format and dcat:mediaType.
Your example and arguments against having dct:format/dcat:mediaType in dcat:DataService may also apply to dcat:Distribution, which does have dct:format and dcat:mediaType in addition to dct:conformsTo. In the context of this discussion, the user need which we suggest covering in DCAT3 is to provide the user with the same set of metadata (dct:format, dcat:mediaType, dct:conformsTo) for a dcat:DataService as is already done for a dcat:Distribution.
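As a rough illustration of what "the same set of metadata" could look like, the Python sketch below renders a Turtle description carrying dct:format, dcat:mediaType and dct:conformsTo for any subject. It only shows the shape proposed in this thread; it does not assert that DCAT defines dct:format or dcat:mediaType on dcat:DataService. The `describe` helper and all `example.org` URIs are hypothetical; the IANA media type URI and the EU file-type authority URI are used as plausible values.

```python
PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dct: <http://purl.org/dc/terms/> .\n"
)


def describe(subject, rdf_class, formats, media_types, conforms_to):
    """Render one resource as Turtle with the three properties discussed.

    Hypothetical helper: all values are expected to be absolute IRIs.
    Properties with no values are omitted.
    """
    lines = [f"<{subject}> a {rdf_class}"]
    for prop, values in (
        ("dct:format", formats),
        ("dcat:mediaType", media_types),
        ("dct:conformsTo", conforms_to),
    ):
        if values:
            objects = ", ".join(f"<{v}>" for v in values)
            lines.append(f"    {prop} {objects}")
    # Predicate-object pairs are separated by ';', the statement ends with '.'
    return PREFIXES + " ;\n".join(lines) + " .\n"


JSON_FORMAT = "http://publications.europa.eu/resource/authority/file-type/JSON"
JSON_MEDIA_TYPE = "https://www.iana.org/assignments/media-types/application/json"
GUIDELINE = "https://example.org/rest-api-guidelines"  # hypothetical URI

# The same call shape serves both classes, which is the user need stated above.
service_ttl = describe("https://example.org/api", "dcat:DataService",
                       [JSON_FORMAT], [JSON_MEDIA_TYPE], [GUIDELINE])
distribution_ttl = describe("https://example.org/file", "dcat:Distribution",
                            [JSON_FORMAT], [JSON_MEDIA_TYPE], [GUIDELINE])
print(service_ttl)
```

A catalog user would then see the same three facts for a service as for a file, with dct:conformsTo still available to carry the richer information argued for elsewhere in the thread.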
Although I personally would not go for collecting this information for that purpose, I understand your request.
Indeed, but maybe it refers to more interesting information for deciding whether the service is reusable.
Yes, I know. I gave this example to illustrate that the solution does not per se lie in adding new properties to DCAT, but often in the ability of the data providers to provide meaningful/useful data. And since that is non-trivial for distributions (files), I expect it to be more difficult for APIs. Many services will offer/accept data in many different formats. If we follow your suggestion, then we had best take a look at all related properties of a Distribution and discuss their semantics for a Data Service:
What would be good definitions for them? Are all of them meaningful? Would they make the distinction between Distribution and Data Service vaguer or clearer?
@bertvannuffelen Sorry for my late reply (because of the summer vacation).
Project/Milestone modified. Explanation: As DCAT v3 moves through review and hopefully ratification, we want to make sure that open issues and feedback that have yet to be completely addressed are properly recorded and tagged/assigned in GitHub, both to clarify their status and to help review and prioritise them as a source of improvements and new requirements in future DCAT versions.
formats and mediaTypes of a DataService
Status: In real use in the Norwegian national Data Catalog. Added as national extensions in DCAT-AP-NO.
Creator: Jim J. Yang
Deliverable(s): DCAT3
Stakeholders
Providers of DataServices/APIs
Consumers of DataServices/APIs
Providers of DataService Catalogs
Consumers of DataService Catalogs
Problem statement
Which formats/mediaTypes a DataService may support is one of the considerations you need to take when deciding whether a DataService is (re)usable or not.
In the 2nd Working Draft of DCAT3, dcat:DataService (still) doesn’t have properties dct:format and dcat:mediaType as in dcat:Distribution. Possible ways to describe formats/mediaTypes of a DataService are thus:
Requirements
Add the following properties to dcat:DataService:
Related use cases
"How to express available formats for a dcat:DataService (Issue #1055)", which at that time (Sep 2019) was already “tagged as a future priority”.