SDWBP FAIR thoughts.asciidoc

Spatial Data on the Web Best Practice: 2021/2022 revision

Thoughts on including mention of FAIR Principles

Peter Parslow, December 2021

Idea: scatter references through the Best Practice, as well as having a section that draws bits together

BP 1 Introduction

Add

“Following these guidelines should result in your data conforming more closely to the FAIR Principles.”

BP 3 Scope

Add new sub-section

3.x FAIR Principles

The FAIR Principles are described on the GO FAIR website (go-fair.org); they are widely adopted (or at least aimed for) when publishing scientific data, including environmental and earth observation data. The FAIR Principles concentrate on machine-readable data, whilst these best practices also cover “data for humans”; even so, there is a lot of overlap between the FAIR Principles and the best practices described in this paper.

Similarly, although not currently expressed in terms of the FAIR Principles, the Data on the Web Best Practices are also designed to make it easier for "data consumers to find, use and link to the data".

Some improvements to the FAIR Principles have also been suggested, and these are discussed in this section as well.

Findable

F2. “Data are described with rich metadata” is a close match to Best Practice 13.

F3. “Metadata clearly and explicitly include the identifier of the data they describe” is fulfilled by using the standard described in Best Practice 13.

F4. “(Meta)data are registered or indexed in a searchable resource”: the context of these best practices is publication on the web, which is by definition a searchable resource, as F4 itself acknowledges. It is also supported by Best Practice 2.

Taken together, following these best practice guidelines covers the F in the FAIR Principles. It should make it easier for potential users to find your data.
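As an illustration of F3 only, here is a minimal sketch. It assumes a metadata record held as a Python dictionary with DCAT-like key names, and it assumes that identifiers are HTTP(S) URIs, as is usual for web publication. The key names, the example.org URL and the helper `describes_identified_data` are hypothetical and are not taken from the FAIR Principles or from either best practice document.

```python
from urllib.parse import urlparse

# Hypothetical metadata record; the key names loosely follow DCAT but are
# assumptions made for this sketch only.
record = {
    "title": "Example flood risk zones",
    "description": "Polygons showing modelled flood risk.",
    "identifier": "https://example.org/dataset/flood-risk-zones",
    "keyword": ["flood", "risk"],
}


def describes_identified_data(metadata: dict) -> bool:
    """Return True if the metadata carries an HTTP(S) identifier for the data."""
    identifier = str(metadata.get("identifier", ""))
    parsed = urlparse(identifier)
    # Require both a web scheme and a host, so bare local names are rejected.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


f3_ok = describes_identified_data(record)
```

A check like this says nothing about how “rich” the metadata is (F2); it only tests that the record points explicitly at the data it describes.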

Accessible

“Once the user finds the required data, she/he/they need to know how they can be accessed, possibly including authentication and authorisation.”

See also DWBP Best Practice 1: Provide Metadata "to help tasks such as data discovery" (although the intended outcome focuses on 'understand the metadata' (!))

See also DWBP Best Practice 17: Provide Bulk Download, and Best Practice 18: Provide Subsets for Large Datasets

See also DWBP Best Practice 23: Make data available through an API

A2. “Metadata are accessible, even when the data are no longer available”: this is not covered in this best practice (because once the data is not available on the web, this BP no longer applies?)

Interoperable

This covers (minimally) data formats, for which see Best Practice 4 (DWBP 14), but also commonly used controlled vocabularies and “a good data model”. (Is there a SDWBP/DWBP for this? There is a little bit about vocabs for the geometry, and some of the metadata standards allow reference to vocab & data model)

I3. “(Meta)data include qualified references to other (meta)data” is similar to Best Practice 3 and Best Practice 10.
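To show what a “qualified” reference in I3 might look like, here is a minimal sketch under stated assumptions. The class `QualifiedReference`, the relation name `isDerivedFrom`, the example.org URL and the helper `is_qualified` are all hypothetical and chosen for illustration; they do not come from any particular vocabulary or standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualifiedReference:
    """A link that states how the source relates to the target (hypothetical)."""
    relation: str  # the nature of the relationship, e.g. "isDerivedFrom"
    target: str    # URI of the other (meta)data


# An unqualified reference is just a link: it says "related", but not how.
unqualified = "https://example.org/dataset/river-network"

# A qualified reference names the relationship as well as the target.
qualified = QualifiedReference(
    relation="isDerivedFrom",
    target="https://example.org/dataset/river-network",
)


def is_qualified(ref: object) -> bool:
    """Return True only for references that carry a non-empty relation."""
    return isinstance(ref, QualifiedReference) and bool(ref.relation)
```

In practice the relation would normally be a term from a shared vocabulary rather than a free-text string.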

Reusable

See DWBP Best Practice 3: Provide structural metadata, which doesn’t require the attribution to be 'richly described' or for there to be many attributes (let alone that they be accurate & relevant); but it does at least say that whatever attribution you do provide should be described.

See DWBP Best Practice 31: Enrich data by generating new data

DWBP Best Practice 4: Provide data license information; again, this doesn’t say that the license should be "clear", but it does say that it should be accessible: "attached to data" (via a link from the metadata or embedded in the metadata).
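The two ways of attaching a license mentioned above (a link from the metadata, or text embedded in it) can be sketched as a simple check. This is a minimal sketch, assuming a metadata record held as a Python dictionary; the key names `license` and `license_text`, the example.org URL and the helper `license_is_attached` are hypothetical.

```python
def license_is_attached(metadata: dict) -> bool:
    """Return True if license information is linked from, or embedded in, the metadata."""
    link = str(metadata.get("license", ""))
    text = str(metadata.get("license_text", ""))
    # Either a web link to the license or non-empty embedded text is enough.
    return link.startswith(("http://", "https://")) or bool(text.strip())


# Hypothetical records covering the linked, embedded and missing cases.
linked_record = {"title": "Example boundaries", "license": "https://example.org/licence/open"}
embedded_record = {"title": "Example boundaries", "license_text": "Free to reuse with attribution."}
missing_record = {"title": "Example boundaries"}
```

As with the best practice itself, this only tests that a license is attached; it cannot tell whether the license is "clear".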

DWBP Best Practice 5: Provide data provenance information "Provide complete information about the origins of the data and any changes you have made."

See DWBP Best Practice 12: Use machine-readable standardized data formats "Make data available in a machine-readable, standardized data format that is well suited to its intended or potential use." This addresses the "low-level" data format aspect of (community) standards.

DWBP Best Practice 15: Reuse vocabularies, preferably standardized ones "Use terms from shared vocabularies, preferably standardized ones, to encode data and metadata." This addresses more of the semantic standardization that exists in some community standards: "Use of vocabularies already in use by others captures and facilitates consensus in communities."

I’m not so sure of the match(es) here

FAIR Challenges/extensions

A – human accessibility

The crawlers that search engine providers use to index the web are trained to look for human readable information. So the more “human readable” your metadata is, the more likely you are to be found in web searches – whether by humans or machines.

When publishing data on the web, take account of web accessibility guidelines. It is currently challenging to make web visualisations of geospatial data accessible to assistive technology. “Maps on the Web” is looking at this?

Q – quality

No matter how easy it is to find, access, and even use your data, it is of little use unless it is of sufficient quality for the user’s task. However, what is “good” for one task is not necessarily “good” for another.

There are a variety of approaches in use to try to match users with data that will be useful to them. These range from telling the user a lot about the quality of the data to telling them what you (& others) have successfully used it for.

DWBP Best Practice 6: Provide data quality information

Quality often includes how up to date the data is: see DWBP Best Practice 21: Provide data up to date