Improve discovery of datacatalogs by registering well-known suffix 'datacatalog' #1290
Thanks for contributing this proposal, @coret. We discussed it during the WG call (https://www.w3.org/2021/02/03-dxwgdcat-minutes#t03), and we would like to ask you to elaborate on your use case, so that we can better understand whether this requirement falls within the scope of DCAT. We checked the issue you point to (netwerk-digitaal-erfgoed/dataset-register#36) and your spec (https://netwerk-digitaal-erfgoed.github.io/requirements-datasets/), but we were not able to find enough information.
The Dutch Digital Heritage Network (Netwerk Digitaal Erfgoed) is a partnership in the Netherlands that develops a system of national facilities and services to improve the visibility, usability, and sustainability of digital heritage. The network is open to all institutions and organisations in the digital heritage field. Together we can make the most of our digital heritage and preserve it for future generations.
One of the goals is to get a better view of the datasets available in the digital heritage field. With a better understanding, datasets can be reused and links between data(sets) can be made; Linked Open Data is important in this strategy. The "Register" project encourages institutions and organisations in the digital heritage field to publish their dataset descriptions (and data catalogs) online. We formulate requirements (this is where schema.org/Dataset and DCAT Application Profiles play an important role) and educate the organisations and their IT suppliers.
To collect the dataset descriptions (and, in the long term, build a knowledge graph), we have an API that organisations can use to register their dataset descriptions. The system contains a validator (SHACL) and a crawler that fetches (and regularly updates) the dataset descriptions, which are stored in a public triple store. This is the reactive side of our crawler.
To make our crawler more proactive in finding dataset descriptions, we could have it check the websites of Dutch heritage organisations. But instead of spidering a whole website (as Google does), it would be more efficient if the data catalog had a fixed URI on each website. This is where the .well-known/datacatalog scheme can help. I can imagine a paragraph in the DCAT specification encouraging the use of .well-known/datacatalog as a means of making data catalogs more discoverable. This would benefit both the publishers of data catalogs and the automated use of data catalogs.
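The proactive discovery described above could look roughly like the sketch below. It assumes the proposed (not yet registered) suffix, so each site's catalog is looked for at /.well-known/datacatalog on the site's origin; RFC 5785 (since obsoleted by RFC 8615) places well-known URIs at the root of the origin's path hierarchy. The names `well_known_url`, `discover_catalogs` and `http_fetch` are hypothetical; the fetch function is injectable so the probing logic can be tested without network access.

```python
import urllib.request
from typing import Callable, Iterable, Optional
from urllib.parse import urlsplit, urlunsplit

# Proposed well-known path; "datacatalog" is the suffix under discussion, not a registered one.
WELL_KNOWN_PATH = "/.well-known/datacatalog"


def well_known_url(site: str) -> str:
    """Return the well-known data catalog URI for the origin of `site`."""
    parts = urlsplit(site)
    # Well-known URIs live at the root of the origin, so drop path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, WELL_KNOWN_PATH, "", ""))


def http_fetch(url: str) -> Optional[str]:
    """GET `url`; return the final URL after any redirects, or None on failure."""
    try:
        # urlopen follows HTTP redirects by default, so geturl() is the resolved location.
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.geturl()
    except OSError:  # URLError, HTTPError (e.g. 404) and socket timeouts are all OSErrors
        return None


def discover_catalogs(
    sites: Iterable[str],
    fetch: Callable[[str], Optional[str]] = http_fetch,
) -> dict[str, str]:
    """Probe each site's well-known location and map site -> resolved catalog URL."""
    found = {}
    for site in sites:
        resolved = fetch(well_known_url(site))
        if resolved is not None:  # sites without a well-known catalog are skipped
            found[site] = resolved
    return found
```

With a live network, `discover_catalogs(["https://example.org/"])` would issue one request per site instead of spidering each site; the sketch does not check what kind of catalog (DCAT, schema.org, or something else) the redirect target actually serves.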
Many thanks, @coret. If I understand correctly, this well-known URI is meant to advertise any data catalogue, irrespective of its thematic content and of the metadata schema(s) it uses or supports. If this is the case, do you plan to put in place mechanisms (besides harvesting only selected Web sites) to verify (a) whether they fit into your domain and (b) whether they use a metadata schema you support? /cc @nicholascar, @rob-metalinkage, @aisaac: could you please give your perspective on this use case in relation to PROF and CONNEG?
Does this presuppose that a domain can host a maximum of one data catalog? |
@andrea-perego I think this is largely orthogonal to connegp, which allows resources to self-describe alternative views rather than list different collections. A data catalogue view of the website itself would be an option to avoid having to specify a 'well-known' sub-resource.
@makxdekkers yes, a well-known URI points (redirects) to one resource (the same as the other well-known URIs on the IANA Well-Known URIs list). But if I'm not mistaken, a dcat:Catalog can contain multiple catalogs.
@rob-metalinkage where on a website could one find a data catalogue view? Is it the root of a website, or can it be any URI? In the latter case, well-known is a mechanism to specify a URI that redirects to the resource: .well-known/datacatalog helps machines discover data catalogs.
That's correct.
Our crawler will be "confined" to heritage institutions and will be able to process dataset descriptions in DCAT 2 and schema.org/Dataset; the latter will be converted to DCAT so we can more easily query a uniform set of dataset descriptions to get insights. For the .well-known/datacatalog registration, I think it's wise not to be too limiting with respect to data catalog vocabularies. I would imagine that products like Google Dataset Search would also benefit from the easy discovery of data catalogs. Google Dataset Search is of course not limited to a domain and handles schema.org/Dataset (preferred) and DCAT (limited).
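The conversion of schema.org/Dataset descriptions to DCAT can be pictured as a property mapping. The sketch below is a minimal, hypothetical version and not the Register project's converter: it assumes compact JSON-LD with schema.org as the default vocabulary, uses an illustrative subset of mappings, and leaves nested distributions unconverted.

```python
# Illustrative schema.org -> DCAT property mapping (an assumption for this sketch).
SCHEMA_TO_DCAT = {
    "name": "dct:title",
    "description": "dct:description",
    "license": "dct:license",
    "keywords": "dcat:keyword",
    "datePublished": "dct:issued",
    "dateModified": "dct:modified",
    "publisher": "dct:publisher",
    "distribution": "dcat:distribution",
}


def schema_dataset_to_dcat(doc: dict) -> dict:
    """Map a compact schema.org Dataset (JSON-LD parsed into a dict) to DCAT-style keys."""
    if doc.get("@type") != "Dataset":
        raise ValueError("expected a schema.org Dataset")
    result = {"@type": "dcat:Dataset"}
    for key, value in doc.items():
        target = SCHEMA_TO_DCAT.get(key)
        if target is None:
            continue  # unmapped properties (and @context) are dropped in this sketch
        if key == "keywords" and isinstance(value, str):
            # schema.org allows a comma-separated string; dcat:keyword takes one literal per keyword.
            value = [k.strip() for k in value.split(",") if k.strip()]
        result[target] = value
    return result
```

A production converter would more likely work on RDF (for example with SPARQL CONSTRUCT queries) so that nested objects such as schema:DataDownload can be turned into dcat:Distribution resources as well.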
@coret yes, you could have any resource support connegp; you are correct that the "well-knownness" is the issue. connegp would certainly be relevant to allow any well-known location (either the site root or a known location, or both) to offer multiple different forms of data catalogue, as opposed to having many alternative well-known locations for different forms and needing to poll a range of them to find one a client can use.
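To illustrate how a single well-known location could serve several forms of a catalogue, here is a minimal server-side sketch of media-type negotiation on the HTTP Accept header. It is a simplification: it ignores the rule that more specific media ranges override wildcards, and it does not handle the Accept-Profile header that connegp adds for choosing between profiles (for example a DCAT application profile versus schema.org). The function names and the offered media types are hypothetical.

```python
from typing import Optional


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split an Accept header into (media range, q-value) pairs."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media = parts[0].lower()
        if not media:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((media, q))
    return prefs


def negotiate(header: str, offered: list[str]) -> Optional[str]:
    """Pick the offered representation with the highest client weight, or None."""
    best, best_q = None, 0.0
    for media, q in parse_accept(header):
        for candidate in offered:
            matches = media in (candidate, "*/*") or (
                media.endswith("/*") and candidate.startswith(media[:-1])
            )
            if matches and q > best_q:  # ties keep the earlier match
                best, best_q = candidate, q
    return best
```

For example, a client sending `Accept: text/turtle` to the well-known location would receive Turtle, while a `None` result would correspond to a 406 Not Acceptable response.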
How would a system know that it is encountering a data catalog that includes other data catalogs and then find those catalogs efficiently? |
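One possible answer: DCAT 2 provides the dcat:catalog property for a catalog whose contents are of interest in the context of another catalog, so a client that finds one catalog at the well-known location could walk those links breadth-first. In this sketch the `children` callable is hypothetical; in practice it would fetch a catalog and return the objects of its dcat:catalog triples (with rdflib, something like `g.objects(URIRef(uri), DCAT.catalog)`). The `seen` set prevents endless loops when catalogs reference each other.

```python
from collections import deque
from typing import Callable, Iterable


def all_catalogs(root: str, children: Callable[[str], Iterable[str]]) -> list[str]:
    """Breadth-first walk over dcat:catalog links, starting at `root`."""
    seen = {root}
    order = []
    queue = deque([root])
    while queue:
        catalog = queue.popleft()
        order.append(catalog)
        for child in children(catalog):
            if child not in seen:  # guards against cycles and shared sub-catalogs
                seen.add(child)
                queue.append(child)
    return order
```

Each catalog is fetched at most once, so the cost grows with the number of distinct catalogs rather than with the number of links between them.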
Project/Milestone modified. Explanation: As DCAT v3 moves through review and, hopefully, ratification, we want to make sure that open issues and feedback that have yet to be completely addressed are properly recorded and tagged/assigned in GitHub, both to clarify their status and to help review and prioritise them as a source of improvements and new requirements in future DCAT versions.
RFC 5785 defines a mechanism for reserving 'well-known' URIs on any Web server. By registering the 'datacatalog' suffix and promoting its use, the discovery of data catalogs can be improved.
Although this proposal is not DCAT-specific (e.g. schema.org/DataCatalog would also benefit), we seek the support of the DCAT community for this proposal (as well as of the schema.org community; a similar issue has therefore been posted at schemaorg/schemaorg#2827).
We have drafted a text which could be included in a specification document (heavily inspired by https://www.w3.org/TR/void/#well-known):
Broad support for this proposal will help in getting the 'datacatalog' suffix registered. The registration procedure and template from Section 5.1 of RFC 5785 require a change controller and a specification document. Can this community assist in this process?