# Listing NVS content

## Basic Python setup

In [1]:
from pykg2tbl import DefaultSparqlBuilder, KGSource, QueryResult
from pathlib import Path
from pandas import DataFrame


# SPARQL endpoint to use, wrapped as a knowledge-graph 'source'
NSV_ENDPOINT: str = "https://vocab.nerc.ac.uk/sparql/sparql"
NSV: KGSource = KGSource.build(NSV_ENDPOINT)

TEMPLATES_FOLDER = str(Path().absolute())
GENERATOR = DefaultSparqlBuilder(templates_folder=TEMPLATES_FOLDER)


def generate_sparql(name: str, **variables) -> str:
    """Build the SPARQL text from the named template, applying the given variables."""
    return GENERATOR.build_syntax(name, **variables)


def execute_to_df(name: str, **variables) -> DataFrame:
    """Build the SPARQL, execute it against the `NSV` source, and return the result as a DataFrame."""
    sparql = generate_sparql(name, **variables)
    result: QueryResult = NSV.query(sparql=sparql)
    return result.to_dataframe()


## Finding collections

The included SPARQL template `./nsv-list-collections.sparql` lists the collections available at the [NVS vocab server](https://vocab.nerc.ac.uk/).
You can call it:
* without any arguments to just get the full list
* optionally with arguments `N` (limit) and `O` (offset) to paginate the result
* optionally with argument `regex` to narrow down to collections whose title matches your regex
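
The `N`/`O` pagination described above can be wrapped in a small helper. The following is a minimal sketch, not part of this notebook or of `pykg2tbl`: `page_args` and `iter_pages` are hypothetical names, and `fetch` stands for any callable that accepts `N` and `O` keyword arguments and returns something with a length (for example a DataFrame).

```python
from typing import Callable, Iterator, Sequence


def page_args(page: int, size: int) -> dict:
    """Translate a 1-based page number into the template's N (limit) and O (offset)."""
    if page < 1 or size < 1:
        raise ValueError("page and size must both be >= 1")
    return {"N": size, "O": (page - 1) * size}


def iter_pages(fetch: Callable[..., Sequence], size: int) -> Iterator[Sequence]:
    """Yield successive pages from fetch(N=..., O=...) until a page is short or empty."""
    page = 1
    while True:
        rows = fetch(**page_args(page, size))
        if len(rows) == 0:
            return  # nothing left to read
        yield rows
        if len(rows) < size:
            return  # a short page is taken to be the last one
        page += 1
```

With the setup from this notebook, `fetch` could be `lambda **kw: execute_to_df("nsv-list-collections.sparql", **kw)`. Stopping on a short page assumes the endpoint returns exactly the requested number of rows whenever that many remain.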


In [2]:
# full list
execute_to_df("nsv-list-collections.sparql")

Unnamed: 0,Collection,Title,Description
0,A01,International Coastal Atlas Network Coastal Er...,Terms used at all hierarchical levels in the I...
1,A02,Oregon Coastal Atlas Coastal Erosion Thesaurus...,Terms used at all hierarchical levels in the O...
2,A03,Oregon Coastal Atlas Coastal Erosion Thesaurus...,Terms used at all hierarchical levels in the O...
3,A04,MIDA Coastal Erosion Thesaurus,A collection of terms used by the Irish Marine...
4,A05,AtlantOS Essential Variables,Collection of terms used to group key measurem...
...,...,...,...
292,W07,SensorML Identification Section Terms,Terms used in SensorML to identify an observat...
293,W08,SensorML Contact Section Terms,Terms used in SensorML to describe the role an...
294,W09,Sensor Web Enablement Observations and Measure...,Terms used in SWE Observations and Measurement...
295,W10,SensorML Output Section Data Interface Terms,Terms used in SensorML to describe the softwar...


In [3]:
# page 3 of size 20
size = 20
page = 3
execute_to_df("nsv-list-collections.sparql", N=size, O=(page - 1) * size)

Unnamed: 0,Collection,Title,Description
0,C41,BODC marine pollution sources,Terms developed by BODC to provide a standard ...
1,C43,BODC oilspill quantity,Terms developed by BODC to classify the magnit...
2,C45,Marine Strategy Framework Directive descriptor...,Concepts specified as descriptors of good envi...
3,C46,Marine Strategy Framework Directive criteria 2...,Concepts specified as criteria to be used to m...
4,C47,Marine Strategy Framework Directive indicators...,Concepts specified as indicators to be used to...
5,C48,MEDIN Data Guidelines,Documents produced by the Marine Environmental...
6,C59,BODC organisation roles within activities and ...,Generic terms used by BODC to describe the rel...
7,C60,NERC DataGrid vocabulary term relationships,Terms used in the NDG project to describe the ...
8,C61,BODC post town names,Labels used by BODC to populate 'city' or 'pos...
9,C62,BODC administrative region names,Labels used by BODC to populate 'county' metad...


In [4]:
# matching for "platform"
term = "platform"
execute_to_df("nsv-list-collections.sparql", regex=term)

Unnamed: 0,Collection,Title,Description
0,B76,BODC Platform Models,Terms used to describe designs or versions of ...
1,C17,ICES Platform Codes,Identifiers and metadata for platform instance...
2,EL1,SeaVoX Sampling and Observation Platform Event...,Terms used to identify actions performed on sa...
3,EL2,SeaVoX Sampling and Observation Platform Event...,Terms used to identify processes performed on ...
4,L06,SeaVoX Platform Categories,2-level grouping term hierarchy used for vehic...
5,P19,Global Change Master Directory platforms,Terms used to describe sensor-bearing platform...
6,R22,Argo platform family,List of platform family/category of Argo float...
7,R23,Argo platform type,List of Argo float types. Argo netCDF variable...
8,R24,Argo platform maker,List of Argo float manufacturers. Argo netCDF ...


## Listing terms in selected collections

The included SPARQL template `./nsv-listing.sparql` lists the terms in a selected collection at the [NVS vocab server](https://vocab.nerc.ac.uk/).
You can call it:
* with at least the `cc` argument indicating the code of the selected collection you want to list
* optionally with arguments `N` (limit) and `O` (offset) to paginate the result
* optionally with argument `lang` to get term labels in another language (defaults to 'en')

In [5]:
execute_to_df("nsv-listing.sparql", cc="P06")

Unnamed: 0,IDENTIFIER,PreferedLabelTranslated,AlternateLabel,IsDeprecated,ConceptIRI,AlternateLabelTranslated
0,SDN:P06::BQSM,Becquerels per square metre,Bq/m^2,false,http://vocab.nerc.ac.uk/collection/P06/current...,
1,SDN:P06::BSM3,Becquerels second per cubic metre,Bq.s/m^3,false,http://vocab.nerc.ac.uk/collection/P06/current...,
2,SDN:P06::KMP2,Kilograms per square metre,kg/m^2,false,http://vocab.nerc.ac.uk/collection/P06/current...,kg/m^2
3,SDN:P06::UKMC,Kilograms per cubic metre,kg/m^3,false,http://vocab.nerc.ac.uk/collection/P06/current...,kg/m^3
4,SDN:P06::UUUU,Dimensionless,Dmnless,false,http://vocab.nerc.ac.uk/collection/P06/current...,Dmnless
...,...,...,...,...,...,...
391,SDN:P06::UMIN,Minutes,min,false,http://vocab.nerc.ac.uk/collection/P06/current...,
392,SDN:P06::SQCC,Square centimetres per cubic centimetre,cm^2/cm^3,false,http://vocab.nerc.ac.uk/collection/P06/current...,
393,SDN:P06::MGAL,MilliGals,mGal,false,http://vocab.nerc.ac.uk/collection/P06/current...,
394,SDN:P06::UNDX,Number per square metre per day,#/m^2/d,false,http://vocab.nerc.ac.uk/collection/P06/current...,


## Searching for matching terms across collections

The included SPARQL template `./nsv-find.sparql` finds matching concepts in one or more collections from the [NVS vocab server](https://vocab.nerc.ac.uk/).
You can call it:
* with at least the `collections` argument indicating a list of collections you want to search in - if '*' is passed the search goes across all collections
* optionally with argument `lang` to get term labels in another language (defaults to 'en')
* optionally with argument `regex` to pass the regex that should match the prefLabel of the Concept
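
Because regexes are easy to get subtly wrong, it can help to preview a pattern locally before sending it to the endpoint. The sketch below uses Python's `re.search`, which, like SPARQL's `REGEX`, matches anywhere in the string unless the pattern is anchored. It is only an approximation: SPARQL uses XPath regular-expression syntax, and whether the template matches case-insensitively is not stated here, so the `ignore_case` flag is an assumption. `preview_matches` and the sample labels are hypothetical.

```python
import re
from typing import Iterable, List


def preview_matches(pattern: str, labels: Iterable[str], ignore_case: bool = False) -> List[str]:
    """Return the labels in which `pattern` matches anywhere (unanchored search)."""
    flags = re.IGNORECASE if ignore_case else 0
    compiled = re.compile(pattern, flags)
    return [label for label in labels if compiled.search(label)]


# Hypothetical sample labels, made up for illustration only.
sample_labels = [
    "Width of entity specified elsewhere",
    "Identifier of entity specified elsewhere",
    "Temperature of the water body",
]

case_sensitive = preview_matches("id.* specified elsewhere", sample_labels)
case_insensitive = preview_matches("id.* specified elsewhere", sample_labels, ignore_case=True)
```

Comparing the two result lists shows how much a case flag changes the hits for a short fragment like `id`, which is worth knowing before interpreting what the server returns.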

In [8]:
term="id.* specified elsewhere"
execute_to_df("nsv-find.sparql", regex=term, collections=['P01'])

Unnamed: 0,ConceptIRI,Identifier,PreferedLabelTranslated
0,http://vocab.nerc.ac.uk/collection/P01/current...,SDN:P01::CAPWID01,Width of biological entity specified elsewhere...
1,http://vocab.nerc.ac.uk/collection/P01/current...,SDN:P01::LIDYXX01,Concentration of lipids per unit dry weight of...
2,http://vocab.nerc.ac.uk/collection/P01/current...,SDN:P01::LIWTXX01,Concentration of lipids per unit wet weight of...
3,http://vocab.nerc.ac.uk/collection/P01/current...,SDN:P01::NCOLIVXX,Count of colonial individuals per colony of bi...
4,http://vocab.nerc.ac.uk/collection/P01/current...,SDN:P01::SNANID01,Identifier (LSID) of biological entity specifi...
5,http://vocab.nerc.ac.uk/collection/P01/current...,SDN:P01::SOVO0004,Count (January) {midwinter count} of biologica...
6,http://vocab.nerc.ac.uk/collection/P01/current...,SDN:P01::WDPIXEL1,Width (expressed as pixels) of biological enti...


## Collect some details on a known list of terms (identified by URI)

The included SPARQL template `nsv-get-details.sparql` lists some extra information on specific NVS terms for which you have the identifying URI. You need to call it:
* with a non-empty list of `uris` that match the concept identifiers you are looking up
* optionally with argument `lang` to get term labels in another language (defaults to 'en')
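
If you start from identifiers such as `SDN:L05::180` rather than URIs, the identifier/URI pairs that appear in this notebook suggest a simple mapping between the two. The sketch below assumes that the pattern `SDN:<collection>::<code>` corresponds to `http://vocab.nerc.ac.uk/collection/<collection>/current/<code>/` for the terms you look up. That pattern is inferred from the examples here, not taken from NVS documentation, and `identifier_to_uri` is a hypothetical helper.

```python
import re

# Assumed identifier shape, inferred from values such as "SDN:L05::180".
_IDENTIFIER = re.compile(r"^SDN:(?P<collection>[^:]+)::(?P<code>.+)$")


def identifier_to_uri(identifier: str) -> str:
    """Build a concept URI from an 'SDN:<collection>::<code>' identifier."""
    match = _IDENTIFIER.match(identifier)
    if match is None:
        raise ValueError(f"unexpected identifier format: {identifier!r}")
    collection = match.group("collection")
    code = match.group("code")
    return f"http://vocab.nerc.ac.uk/collection/{collection}/current/{code}/"


identifiers = ["SDN:L05::180", "SDN:L05::181", "SDN:L05::22"]
uris_from_identifiers = [identifier_to_uri(i) for i in identifiers]
```

The resulting list could then be passed as the `uris` argument when calling the template.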

In [12]:
uris = ['http://vocab.nerc.ac.uk/collection/L05/current/180/',
        'http://vocab.nerc.ac.uk/collection/L05/current/181/',
        'http://vocab.nerc.ac.uk/collection/L05/current/22/']
execute_to_df("nsv-get-details.sparql", uris=uris)
#print(generate_sparql("nsv-get-details.sparql", uris=uris))

Unnamed: 0,ConceptIRI,Identifier,PreferedLabelTranslated,isDeprecated
0,http://vocab.nerc.ac.uk/collection/L05/current...,SDN:L05::180,underwater cameras,True
1,http://vocab.nerc.ac.uk/collection/L05/current...,SDN:L05::181,nutrient analysers,False
2,http://vocab.nerc.ac.uk/collection/L05/current...,SDN:L05::22,plankton nets,False
