# DRAFT Analysis Report

## Context

## Objective

- Identify the endpoints where the data are stored and/or catalogued,
- Harvest data from these endpoints where possible,
- Analyse the harvested data to assess the FAIR-interoperability work done so far,
- Report the results and translate them into recommendations.
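For services that page their results, harvesting mostly means following the paging mechanism. A minimal sketch, assuming an OGC SensorThings API endpoint, which returns a collection as JSON with the entities in a `value` array and, when more results exist, the URL of the next page in `@iot.nextLink`. `harvest` and `fetch_json` are hypothetical helper names. Other service types (ERDDAP, S3 buckets, ...) would need their own logic.

```python
import json
from typing import Callable, Iterator
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """Download a URL and parse the response body as JSON."""
    with urlopen(url) as response:
        return json.load(response)


def harvest(url: str, fetch: Callable[[str], dict] = fetch_json) -> Iterator[dict]:
    """Yield every entity of a paged SensorThings collection.

    Each page is assumed to hold its entities in "value" and, if more pages
    exist, the URL of the next page in "@iot.nextLink".
    """
    next_url = url
    while next_url:
        page = fetch(next_url)
        yield from page.get("value", [])
        next_url = page.get("@iot.nextLink")
```

Usage would be along the lines of `list(harvest("https://example.org/v1.1/Datastreams"))`, where the URL is a placeholder. The `fetch` parameter lets the paging logic be tested without network access.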

## Analysis methods
- For each of the given endpoints, assess the steps needed to find and access the data, as well as other aspects that reduce the FAIRness of the data:
    - manual assessment of pitfalls regarding data interoperability and reusability
    - are the data easily findable and accessible?
    - what is the granularity of the available data?
        - metadata on the dataset
        - dataset as a file
        - individual datapoints
    - are the semantics of the data offered by the endpoint clear and unambiguous?
    - can the data easily be integrated with other sources?
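The checklist can be recorded per endpoint in a small data structure, which keeps the assessments of different endpoints comparable. The sketch below is hypothetical. The class and field names are illustrative and are not taken from the notebooks.

```python
from dataclasses import dataclass, field
from enum import Enum


class Granularity(Enum):
    """Finest level at which an endpoint offers its data."""
    DATASET_METADATA = "metadata on the dataset"
    DATA_FILE = "dataset as a file"
    DATAPOINT = "datapoints"


@dataclass
class EndpointAssessment:
    """Outcome of the manual assessment of one endpoint."""
    endpoint_url: str
    findable: bool
    accessible: bool
    granularity: Granularity
    semantics_clear: bool
    easily_integrated: bool
    pitfalls: list[str] = field(default_factory=list)  # free-text notes

    def open_issues(self) -> list[str]:
        """List the checks that failed, followed by the noted pitfalls."""
        checks = {
            "not easily findable": self.findable,
            "not easily accessible": self.accessible,
            "semantics unclear or ambiguous": self.semantics_clear,
            "hard to integrate with other sources": self.easily_integrated,
        }
        return [issue for issue, passed in checks.items() if not passed] + self.pitfalls
```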

These analyses can be consulted in the Jupyter notebooks.

## Results & Discussion

The analyses indicated a good overall level of FAIRness at a basic level: given the endpoints and their documentation, the data are generally findable and accessible.

There is a certain level of standardization in the way data are made available:

- at the level of the service: some services were developed following a standard (e.g. SensorThings API, Swagger APIs, ERDDAP);
- at the level of the data offered by the service: e.g. the same column headers are used across similar kinds of services.

However, there is still room for improvement, as the semantics are not always clear.
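Because these services follow a standard, requests can be built the same way for every server that implements it. A minimal sketch, assuming the OGC SensorThings API query options (`$top`, `$filter`, `$expand`, ...) and ERDDAP's tabledap URL pattern (`<server>/tabledap/<datasetID>.<fileType>?<variables>&<constraints>`). The helper names are hypothetical, and the server URLs and dataset ID used in the examples are placeholders. Check the percent-encoding against the documentation of the server actually used.

```python
from urllib.parse import quote, urlencode


def sensorthings_url(base_url: str, entity_set: str, **options: str) -> str:
    """Build an OGC SensorThings API request URL.

    Options are given without the leading '$' (e.g. top="10") and are
    prefixed here, since the standard names them $top, $filter, $expand, ...
    """
    query = urlencode(
        {f"${name}": value for name, value in options.items()},
        safe="$,()/",
        quote_via=quote,  # encode spaces as %20 rather than '+'
    )
    url = f"{base_url.rstrip('/')}/{entity_set}"
    return f"{url}?{query}" if query else url


def erddap_tabledap_url(server: str, dataset_id: str, file_type: str,
                        variables: list[str], constraints: list[str]) -> str:
    """Build an ERDDAP tabledap request: variable list, then '&'-joined constraints."""
    parts = [",".join(variables), *constraints]
    # Special characters such as '>' or ':' in constraints are percent-encoded.
    query = "&".join(quote(part, safe=",=") for part in parts)
    return f"{server.rstrip('/')}/tabledap/{dataset_id}.{file_type}?{query}"
```

For example, `sensorthings_url("https://example.org/v1.1", "Datastreams", top="5", expand="ObservedProperty")` gives `https://example.org/v1.1/Datastreams?$top=5&$expand=ObservedProperty`.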

Furthermore, finding, accessing and using data through the offered services assumes that the user has domain-specific knowledge of how to handle the data. A user who does not know how to handle, e.g., netCDF files will find accessing and using the data more difficult, and can only do so after a learning curve.
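Two CF conventions illustrate the kind of domain knowledge involved. netCDF files that follow them often store time as an offset from a reference date (e.g. `units = "days since 1950-01-01"`), and they may store values packed as integers, which are unpacked as `packed * scale_factor + add_offset`. Libraries such as xarray apply these conventions automatically, but a user unaware of them may misread the raw values. The standard-library sketch below decodes both by hand. It assumes a Gregorian calendar (dates after 1582) and an ISO-formatted reference date, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Seconds per unit for the CF time units handled in this sketch.
_UNIT_SECONDS = {"days": 86400, "hours": 3600, "minutes": 60, "seconds": 1}


def decode_cf_time(value: float, units: str) -> datetime:
    """Convert a numeric CF time value (e.g. units="days since 1950-01-01") to UTC.

    Simplified: assumes a Gregorian calendar and an ISO reference date,
    which real files do not always use.
    """
    unit, _, reference = units.partition(" since ")
    epoch = datetime.fromisoformat(reference.strip())
    if epoch.tzinfo is None:
        epoch = epoch.replace(tzinfo=timezone.utc)
    return epoch + timedelta(seconds=value * _UNIT_SECONDS[unit.strip()])


def unpack(packed: int, scale_factor: float = 1.0, add_offset: float = 0.0) -> float:
    """Apply the CF packing convention: actual = packed * scale_factor + add_offset."""
    return packed * scale_factor + add_offset
```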

-------
Data granularity goes down to the data file level ...

However, at the level of interoperability and reusability, mistakes are easily made:
- getting the data and using it in analyses requires domain-specific knowledge
- combining data requires many additional steps
- the semantics of the data are not always clear, which
    - makes mistakes during analysis more likely
    - imposes another threshold to combining data
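Some of the extra steps needed to combine data amount to mapping each source's column headers and units onto a shared convention. When semantics are not explicit, that mapping has to be worked out manually. A minimal sketch follows. The source names, the column headers and the choice of degrees Celsius as the shared unit are all hypothetical.

```python
from typing import Callable

Converter = Callable[[float], float]


def identity(value: float) -> float:
    return value


def kelvin_to_celsius(value: float) -> float:
    return value - 273.15


# Hypothetical per-source mapping: source column -> (shared name, converter to
# the shared unit). Assumed shared convention: sea_water_temperature in degC.
COLUMN_MAP: dict[str, dict[str, tuple[str, Converter]]] = {
    "source_a": {"TEMP": ("sea_water_temperature", identity)},
    "source_b": {"temperature_K": ("sea_water_temperature", kelvin_to_celsius)},
}


def harmonise(source: str, rows: list[dict[str, float]]) -> list[dict[str, float]]:
    """Rename and convert the mapped columns of each row; unmapped columns are dropped."""
    mapping = COLUMN_MAP[source]
    return [
        {mapping[col][0]: mapping[col][1](value) for col, value in row.items() if col in mapping}
        for row in rows
    ]


combined = harmonise("source_a", [{"TEMP": 12.5}]) + harmonise("source_b", [{"temperature_K": 285.65}])
```

Unmapped columns are dropped silently here. In practice it is safer to report them, because an unrecognised header is exactly where semantic mistakes creep in.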
-------

## Recommendations

Overall, interoperability within services is good at a basic level: the data are generally findable, accessible and usable.

- If the data are to be used on a wider scale, they should be more self-descriptive: wider use means domain knowledge can no longer be assumed, and without more self-descriptive data, analysis mistakes are very likely to occur.
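One way to make tabular data more self-descriptive is to publish machine-readable metadata alongside each file. The sketch below assumes the W3C CSV on the Web (CSVW) metadata format. The file name, the column values and the vocabulary IRI are placeholders, and `csvw_metadata` is a hypothetical helper.

```python
import json


def csvw_metadata(csv_url: str, columns: list[dict[str, str]]) -> str:
    """Return CSVW metadata (as JSON text) describing every column of a CSV file."""
    doc = {
        "@context": "http://www.w3.org/ns/csvw",
        "url": csv_url,
        "tableSchema": {
            "columns": [
                {
                    "name": col["name"],                   # identifier used in the metadata
                    "titles": col["header"],               # header as it appears in the CSV
                    "datatype": col["datatype"],           # e.g. "double", "dateTime"
                    "propertyUrl": col["property"],        # IRI of the measured property
                    "dc:description": col["description"],  # human-readable meaning and unit
                }
                for col in columns
            ]
        },
    }
    return json.dumps(doc, indent=2)


metadata = csvw_metadata(
    "observations.csv",
    [
        {
            "name": "sea_water_temperature",
            "header": "TEMP",
            "datatype": "double",
            "property": "https://example.org/vocab/sea_water_temperature",
            "description": "Sea water temperature in degrees Celsius",
        }
    ],
)
```

By CSVW convention, such a document can be published next to the data as `observations.csv-metadata.json`, so the meaning and unit of `TEMP` no longer depend on the reader's domain knowledge.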

There is, however, still room for improvement. To increase interoperability, we make recommendations at two levels:

1. Description of services
    - describe the offered services (i.e. the analysed endpoints) as Linked Data (LD) (todo: provide an example)
    - this is a quick fix
    - it makes it easier to find your way around the available services and data, and to determine more quickly which service can be used given the in-house knowledge (e.g. having someone who can work with JSON APIs, S3 buckets, ...)
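A minimal sketch of such a service description, assuming the W3C DCAT vocabulary serialised as JSON-LD. Each endpoint becomes a `dcat:DataService` with its endpoint URL, the standard it conforms to and the datasets it serves. All IRIs below are placeholders, and `describe_service` is a hypothetical helper.

```python
import json

DCAT_CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def describe_service(service_iri: str, title: str, endpoint_url: str,
                     standard_iri: str, dataset_iris: list[str]) -> dict:
    """Describe one endpoint as a dcat:DataService in JSON-LD."""
    return {
        "@context": DCAT_CONTEXT,
        "@id": service_iri,
        "@type": "dcat:DataService",
        "dct:title": title,
        "dcat:endpointURL": {"@id": endpoint_url},
        # Specification the service implements, e.g. the OGC SensorThings API.
        "dct:conformsTo": {"@id": standard_iri},
        "dcat:servesDataset": [{"@id": iri} for iri in dataset_iris],
    }


service = describe_service(
    "https://example.org/services/sta",
    "Example SensorThings API service",
    "https://example.org/sta/v1.1",
    "https://example.org/standards/sensorthings-api",
    ["https://example.org/datasets/temperature"],
)
print(json.dumps(service, indent=2))
```

Stating the standard explicitly through `dct:conformsTo` is what lets a reader match a service to the in-house skills available.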

2. Common data model
    - develop a data model, together with various stakeholders from the community, to better describe the offered data: agree on the common entities to be described (instruments, events, observations, ...) and on the properties used to describe them
    - this would make it easier to integrate data from different sources
    - and would allow these data to be made available as Linked Open Data (LOD) in the future
        - inspiration can be drawn from the ARGO and ICOS data models, available in their respective SPARQL endpoints
        - other common ontologies:
            - prov (PROV-O)
            - dct (DCMI Metadata Terms)
            - ssn (Semantic Sensor Network ontology)
            - qube (RDF Data Cube Vocabulary): very suitable for describing netCDF files
            - dcat (Data Catalog Vocabulary)
            - ...
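The sketch below gives an idea of what the common data model could look like as Linked Data. It describes a single observation with SOSA, the core module of the SSN ontology, serialised as JSON-LD. The IRIs are placeholders, `observation_jsonld` is a hypothetical helper, and the choice of properties is an assumption rather than an agreed community model.

```python
import json

SOSA_CONTEXT = {
    "sosa": "http://www.w3.org/ns/sosa/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def observation_jsonld(obs_iri: str, sensor_iri: str, property_iri: str,
                       feature_iri: str, result: float, result_time: str) -> dict:
    """Describe one observation with SOSA terms (JSON-LD)."""
    return {
        "@context": SOSA_CONTEXT,
        "@id": obs_iri,
        "@type": "sosa:Observation",
        "sosa:madeBySensor": {"@id": sensor_iri},           # the instrument
        "sosa:observedProperty": {"@id": property_iri},     # what was measured
        "sosa:hasFeatureOfInterest": {"@id": feature_iri},  # what it was measured on
        "sosa:hasSimpleResult": result,
        "sosa:resultTime": {"@value": result_time, "@type": "xsd:dateTime"},
    }


observation = observation_jsonld(
    "https://example.org/observations/1",
    "https://example.org/instruments/ctd-42",
    "https://example.org/vocab/sea_water_temperature",
    "https://example.org/stations/A",
    12.5,
    "2020-01-01T00:00:00Z",
)
print(json.dumps(observation, indent=2))
```

Once different sources describe instruments, observed properties and features of interest with the same terms, their records can be merged without a hand-written mapping per source.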
