Thoughts on including mention of FAIR Principles
Peter Parslow, December 2021
Some resources are linked at SDW Best Practices Update: Add FAIR principles to the document · Issue #1290 · w3c/sdw (github.com)
Idea: scatter references throughout the Best Practice, as well as adding a section that draws them together
Add
“Following these guidelines should result in your data fitting more with the FAIR Principles.”
Add new sub-section
The FAIR Principles are described at FAIR Principles - GO FAIR (go-fair.org); they are widely adopted (or at least aimed for) when publishing scientific data, including environmental and earth observation data. The FAIR Principles concentrate on machine-readable data, whilst these best practices also cover “data for humans”; even so, there is a lot of overlap between the FAIR Principles and the best practices described in this paper.
Similarly, although not currently expressed in terms of the FAIR Principles, the Data on the Web Best Practices are also designed to make it easier for "data consumers to find, use and link to the data".
There have also been some suggestions for improving the FAIR Principles, and these are also discussed in this section.
F1. "(Meta)data are assigned a globally unique and persistent identifier" is partially fulfilled by Best Practice 1.
F2. "Data are described with rich metadata" is a close match to Best Practice 13.
F3. "Metadata clearly and explicitly include the identifier of the data they describe" is fulfilled by using the standard described in Best Practice 13.
F4. "(Meta)data are registered or indexed in a searchable resource" – the context of these best practices is publication on the web, which is by definition a searchable resource, as F4 itself acknowledges. It is also supported by Best Practice 2.
Taken together, following these best practice guidelines covers the F in the FAIR Principles. Doing so should make it easier for potential users to find your data.
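As a rough illustration of F1 to F3, here is a minimal sketch of a metadata record that carries its own identifier. The use of DCAT and Dublin Core terms is my assumption for the example, not something these notes settle on; the `example.org` URI, the field values and the helper name `build_dataset_metadata` are hypothetical placeholders.

```python
import json

# Hypothetical placeholder URI; example.org is reserved for documentation.
DATASET_URI = "https://example.org/id/dataset/flood-zones"


def build_dataset_metadata(dataset_uri, title, description, keywords):
    """Return a minimal DCAT-style JSON-LD description of one dataset."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        # F1: the record is keyed by a globally unique identifier.
        # (Keeping that identifier persistent is a publishing policy,
        # which no code sample can guarantee.)
        "@id": dataset_uri,
        "@type": "dcat:Dataset",
        # F3: the metadata states explicitly which data it describes.
        "dct:identifier": dataset_uri,
        # F2: descriptive metadata; a real record would carry far more.
        "dct:title": title,
        "dct:description": description,
        "dcat:keyword": list(keywords),
    }


record = build_dataset_metadata(
    DATASET_URI,
    title="Flood zones (illustrative)",
    description="Made-up example dataset used only to show the record shape.",
    keywords=["flood", "hydrology"],
)
print(json.dumps(record, indent=2))
```

F4 is not shown: indexing happens once a record like this is published on the web and picked up by catalogues or search engines.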
“Once the user finds the required data, she/he/they need to know how can they be accessed, possibly including authentication and authorisation.”
A1. "(Meta)data are retrievable by their identifier using a standardised communications protocol" is satisfied by publishing the data and metadata on the web.
See also DWBP Best Practice 1: Provide Metadata "to help tasks such as data discovery" (although the intended outcome focuses on 'understand the metadata' (!))
See also DWBP Best Practice 17: Provide Bulk Download, and Best Practice 18: Provide Subsets for Large Datasets
See also DWBP Best Practice 23: Make data available through an API
A2. "Metadata are accessible, even when the data are no longer available" – this is not covered in this best practice (because once the data is not available on the web, this BP no longer applies?)
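For A1, "retrievable by their identifier using a standardised communications protocol" on the web amounts to an HTTP GET on the identifier. The sketch below only prepares such a request with Python's standard library and does not send it. The `example.org` URI and the helper name `build_metadata_request` are hypothetical, and asking for JSON-LD via the `Accept` header is an assumption about what a publisher might offer.

```python
from urllib.request import Request

# Hypothetical placeholder identifier; no real dataset lives at this URI.
DATASET_URI = "https://example.org/id/dataset/flood-zones"


def build_metadata_request(identifier, media_type="application/ld+json"):
    """Prepare (but do not send) an HTTP GET for the resource behind an identifier."""
    # The Accept header asks the server for a machine-readable representation;
    # whether a given publisher honours it is not something this sketch can know.
    return Request(identifier, headers={"Accept": media_type}, method="GET")


request = build_metadata_request(DATASET_URI)
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

To actually retrieve something, the prepared request could be passed to `urllib.request.urlopen`, using a real identifier in place of the placeholder.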
This covers (minimally) data formats, for which see Best Practice 4 (DWBP 14), but also commonly used controlled vocabularies and “a good data model”. (Is there a SDWBP/DWBP for this? There is a little bit about vocabs for the geometry, and some of the metadata standards allow reference to vocab & data model)
I3. "(Meta)data include qualified references to other (meta)data" is similar to Best Practice 3 and Best Practice 10.
See DWBP Best Practice 3: Provide structural metadata, which doesn’t require the attribution to be 'richly described' or for there to be many attributes (let alone that they be accurate & relevant); but it does at least say that whatever attribution you do provide should be described.
See DWBP Best Practice 31: Enrich data by generating new data
DWBP Best Practice 4: Provide data license information; again, this doesn’t say that the license should be "clear", but it does say that it should be accessible - "attached to data" (via a link from the metadata or embedded in the metadata).
DWBP Best Practice 5: Provide data provenance information "Provide complete information about the origins of the data and any changes you have made."
See DWBP Best Practice 12: Use machine-readable standardized data formats "Make data available in a machine-readable, standardized data format that is well suited to its intended or potential use." This addresses the "low-level", data-format aspect of (community) standards.
DWBP Best Practice 15: Reuse vocabularies, preferably standardized ones "Use terms from shared vocabularies, preferably standardized ones, to encode data and metadata." This addresses more of the semantic standardization that exists in some community standards: "Use of vocabularies already in use by others captures and facilitates consensus in communities."
I’m not so sure of the match(es) here
A – human accessibility
The crawlers that search engine providers use to index the web are trained to look for human readable information. So the more “human readable” your metadata is, the more likely you are to be found in web searches – whether by humans or machines.
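One way to serve both audiences is a landing page that shows the metadata as ordinary text and also embeds the same facts as structured data. The sketch below assumes schema.org `Dataset` markup in a JSON-LD script element, which is my choice for illustration rather than something these notes prescribe; the helper name `render_dataset_page` and all values are hypothetical.

```python
import html
import json


def render_dataset_page(name, description, url):
    """Return an HTML page with visible text plus an embedded schema.org Dataset description."""
    structured = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
    }
    # Escape "</" so the JSON cannot close the <script> element early.
    json_ld = json.dumps(structured, indent=2).replace("</", "<\\/")
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        "<head>\n"
        f"<title>{html.escape(name)}</title>\n"
        f'<script type="application/ld+json">\n{json_ld}\n</script>\n'
        "</head>\n"
        "<body>\n"
        # The same facts, readable by people and by text-oriented crawlers.
        f"<h1>{html.escape(name)}</h1>\n"
        f"<p>{html.escape(description)}</p>\n"
        "</body>\n"
        "</html>\n"
    )


page = render_dataset_page(
    name="Flood zones (illustrative)",
    description="Made-up example dataset used only to show the page shape.",
    url="https://example.org/id/dataset/flood-zones",
)
print(page)
```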
When publishing data on the web, take account of web accessibility guidelines. It is currently challenging to make web visualisations of geospatial data accessible to assistive technology. “Maps on the Web” is looking at this?
Q – quality
No matter how easy it is to find, access, and even use your data, it is of little use unless it is of sufficient quality for the user’s task. However, what is “good” for one task is not necessarily “good” for another.
There are a variety of approaches in use to try to match users with data that will be useful to them. These range from telling the user a lot about the quality of the data to telling them what you (& others) have successfully used it for.
DWBP Best Practice 6: Provide data quality information
Quality often includes currency: see DWBP Best Practice 21: Provide data up to date