# Task 1: How to use the datapicker in SoRa
- The **datapicker** is a component of the Geolinking Service SoRa infrastructure. It provides metadata about the available datasets and linking methods.
- You can use it to search for datasets and for available, compatible linking methods.
### Two ways to use the datapicker
- Via the web interface: https://sora.gesis.org/unofficial/datapicker/
- Directly in R, using the SoRa R functions to analyse the metadata

### Further Knowledge: Infrastructure of Geolinking Service SoRa
- The arrows indicate the dependencies between the components.

[Infrastructure graphic of Geolinking Service SoRa](../images/SoRa_Graphic_Diagram_Infrastructure_EN_240906.html)

## Load functions from SoRa R package
These steps are currently required to load all R functions from the `/R/` directory. In the future, the SoRa R package will be installed directly.

In [1]:
# load R functions from SoRa R Package
path <- "/home/jovyan/R/"
sora_functions  <- dir(path)
for (i in sora_functions) {
  source(paste0(path, i))
}
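The loop above sources every entry that `dir()` returns, including any non-R files. A slightly more defensive variant is sketched below: it restricts the listing to `.R` files and uses full paths. The temporary directory and the helper `hello_sora()` are stand-ins invented for this illustration so that the snippet runs anywhere; in the notebook, `path` would stay `/home/jovyan/R/`.

```r
# Illustration only: a temporary directory stands in for "/home/jovyan/R/".
path <- file.path(tempdir(), "sora_demo_R")
dir.create(path, showWarnings = FALSE)

# Hypothetical helper file, written here just so there is something to source.
writeLines("hello_sora <- function() 'functions loaded'",
           file.path(path, "hello_sora.R"))
# A non-R file that should be skipped.
writeLines("not R code", file.path(path, "notes.txt"))

# List only files ending in .R, with their full paths.
r_files <- list.files(path, pattern = "\\.[Rr]$", full.names = TRUE)

# Source each file into the global environment.
for (f in r_files) {
  source(f)
}

hello_sora()
```

Using `full.names = TRUE` removes the need to paste the directory and file name together by hand.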

## Check your changed SORA_API_KEY
- The environment variable is read from the `.Renviron` file.


In [2]:
#check environment variable for SORA_API_KEY
Sys.getenv("SORA_API_KEY")
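Printing the key shows it in the notebook output. If you only want to confirm that a key has been set, a check along the following lines avoids displaying it. This is a minimal sketch; the helper name `has_sora_key()` is made up for this example and is not part of SoRa.

```r
# Hypothetical helper (not part of the SoRa package): TRUE if the
# environment variable is set to a non-empty value.
has_sora_key <- function(var = "SORA_API_KEY") {
  nzchar(Sys.getenv(var, unset = ""))
}

if (has_sora_key()) {
  message("SORA_API_KEY is set.")
} else {
  message("SORA_API_KEY is empty; check your .Renviron file and restart R.")
}
```

R reads `.Renviron` at startup, so a changed key only becomes visible after the R session (the notebook kernel) has been restarted.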

## Load metadata about available geospatial datasets

In [3]:
## Four CSV files (`provided`, `admin`, `spatial`, `linking`) can be loaded in R. The default is the `provided` file,
## which holds the information on the geospatial datasets.
## Load the datapicker table (here with content = "spatial") and save it in dp:
dp <- sora_datapicker(content = "spatial")

In [4]:
## show the column names of the saved table
names(dp)

## Question 1: How many geospatial datasets are currently available at the spatial resolution "points" (Points of Interest)?
Hints:
- Look up the exact spelling of the column names for **Title** and **Spatial Resolution** in the output of `names(dp)` in the previous code cell.

In [5]:
## get the cross table:
dp_overview_1 <- sora_dp_overview(arg_1 = "title",
                                  arg_2 = "spatial_resolution")

In [6]:
## show cross table:
dp_overview_1

| title | 100m Raster | 200m Raster | 500m Raster | 1000m Raster | 5000m Raster | 10000m Raster | Cities (> 50k inhabitants) | Districts | Municipal association | Municipal level | points | Quarters | Spatial planning regions | States |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` |
| Agriculture per resident | 0 | 0 | 0 | 0 | 0 | 0 | 17 | 17 | 17 | 17 | 0 | 0 | 17 | 17 |
| Building density in reference area | 8 | 8 | 8 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 0 | 0 | 11 | 11 |
| Built-up settlement area and transport space per resident | 0 | 0 | 0 | 0 | 0 | 0 | 17 | 17 | 17 | 17 | 0 | 0 | 17 | 17 |
| Density of total roadway network for motorised vehicles in reference area | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 0 | 18 | 18 | 18 |
| Density of track network in reference area | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 0 | 18 | 18 | 18 |
| Density of use of transport infrastructure | 0 | 0 | 0 | 0 | 0 | 0 | 14 | 14 | 14 | 14 | 0 | 14 | 14 | 14 |
| E02RG | 18 | 18 | 18 | 18 | 17 | 18 | 16 | 16 | 16 | 16 | 0 | 0 | 16 | 16 |
| Effective mesh size of forests (modified) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 5 | 5 |
| Effective mesh size of open spaces (modified) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 5 | 5 |
| Forest and woodland per resident | 0 | 0 | 0 | 0 | 0 | 0 | 17 | 17 | 17 | 17 | 0 | 0 | 17 | 17 |


In [7]:
## get sum of column "points"
count_datasets_points <- sum(dp_overview_1$points)
count_datasets_points
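To see what such a cross table contains, here is a sketch in base R on a small made-up table. It is not the implementation of `sora_dp_overview()`; the rows in `toy_dp` are invented, and only the column names `title` and `spatial_resolution` come from the datapicker table.

```r
# Invented example rows; only the column names mirror the datapicker table.
toy_dp <- data.frame(
  title = c("Indicator A", "Indicator A", "Indicator B",
            "Indicator B", "Indicator B"),
  spatial_resolution = c("points", "Districts", "points",
                         "points", "States"),
  stringsAsFactors = FALSE
)

# Count rows per title (table rows) and spatial resolution (table columns).
toy_overview <- table(toy_dp$title, toy_dp$spatial_resolution)
toy_overview

# Number of datasets at resolution "points": sum over that column.
toy_points <- sum(toy_overview[, "points"])
toy_points
```

In this toy table, three rows have the resolution `points`, so `toy_points` is 3.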

## Question 2: How many geospatial datasets are currently available at the spatial resolution of municipalities for the year 2020?
Hints:
- Look up the exact spelling of the column names for **Time** and **Spatial Resolution** in the output of `names(dp)`.
- Check the name of the column for municipalities in the generated table.


In [8]:
## get the cross table:
dp_overview_2 <- sora_dp_overview(arg_1 = "time_frame",
                                  arg_2 = "spatial_resolution")

In [9]:
## show cross table:
dp_overview_2

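| time_frame | 100m Raster | 200m Raster | 500m Raster | 1000m Raster | 5000m Raster | 10000m Raster | Cities (> 50k inhabitants) | Districts | Municipal association | Municipal level | points | Quarters | Spatial planning regions | States |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` |
| 2000 | 21 | 21 | 21 | 21 | 21 | 21 | 39 | 45 | 42 | 42 | 0 | 26 | 44 | 45 |
| 2006 | 30 | 30 | 30 | 30 | 30 | 30 | 51 | 52 | 51 | 51 | 0 | 29 | 51 | 52 |
| 2008 | 28 | 28 | 28 | 28 | 28 | 28 | 48 | 54 | 51 | 51 | 0 | 28 | 53 | 54 |
| 2009 | 31 | 31 | 31 | 31 | 31 | 31 | 52 | 53 | 52 | 52 | 0 | 28 | 52 | 53 |
| 2010 | 27 | 27 | 27 | 27 | 27 | 27 | 47 | 48 | 47 | 47 | 0 | 28 | 47 | 48 |
| 2011 | 28 | 28 | 28 | 29 | 29 | 29 | 51 | 52 | 51 | 51 | 0 | 28 | 51 | 52 |
| 2012 | 34 | 34 | 34 | 37 | 37 | 37 | 56 | 61 | 58 | 58 | 0 | 28 | 60 | 61 |
| 2013 | 28 | 28 | 28 | 31 | 30 | 31 | 57 | 57 | 56 | 56 | 0 | 28 | 56 | 57 |
| 2014 | 28 | 28 | 28 | 31 | 31 | 31 | 57 | 57 | 56 | 56 | 0 | 28 | 56 | 57 |
| 2015 | 34 | 34 | 34 | 34 | 34 | 34 | 61 | 61 | 60 | 60 | 2 | 28 | 60 | 61 |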


In [10]:
# reduce table to given year
row_2020 <- dp_overview_2[dp_overview_2$time_frame == 2020, ]

In [12]:
# return value for municipalities
row_2020$`Municipal level`
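If the requested year is missing from the overview, the subset has zero rows and the lookup returns an empty vector rather than an error. Below is a small sketch of a guarded lookup on an invented table: the numbers in `toy_overview_2` are made up, and only the column names follow the output above. Column names that contain spaces need backticks or `[[ ]]`.

```r
# Invented counts; check.names = FALSE keeps the space in "Municipal level".
toy_overview_2 <- data.frame(
  time_frame = c("2019", "2020"),
  `Municipal level` = c(3L, 5L),
  points = c(0L, 2L),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Keep the row for one year.
toy_row <- toy_overview_2[toy_overview_2$time_frame == "2020", ]

# Stop early with a clear message if the year is not in the table.
if (nrow(toy_row) == 0) {
  stop("No row for the requested year.")
}

# Two equivalent ways to read a column whose name contains a space.
toy_row$`Municipal level`
toy_row[["Municipal level"]]
```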

## Question 3: What is the geospatial dataset ID for the indicator "Green per inhabitant" at the spatial resolution Cities (> 50k inhabitants) in 2018?
Hints:
- The function `sora_dp_get_id()` returns the dataset ID.

In [13]:
## select a specific indicator by its value in the `title` column
dp_select_indicator <- sora_dp_get_id(data_dp = dp, indicator = "Green per inhabitant")

In [14]:
## get an overview of all dataset_id for that specific indicator
dp_select_indicator

| title | time_frame | spatial_resolution | dataset_id |
|---|---|---|---|
| `<chr>` | `<int>` | `<chr>` | `<chr>` |
| Green per inhabitant | 2018 | Cities (> 50k inhabitants) | ioer-monitor-p01mt-2018-g50 |
| Green per inhabitant | 2018 | Quarters | ioer-monitor-p01mt-2018-stt |
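The result still lists every spatial resolution for the indicator. To pick the single ID asked for in the question, the table can be filtered further. Here is a sketch in base R: `ids` re-creates the two rows printed above by hand so that the snippet runs on its own; in the notebook you would filter `dp_select_indicator` instead.

```r
# The two rows from the output above, typed in by hand for this illustration.
ids <- data.frame(
  title = c("Green per inhabitant", "Green per inhabitant"),
  time_frame = c(2018L, 2018L),
  spatial_resolution = c("Cities (> 50k inhabitants)", "Quarters"),
  dataset_id = c("ioer-monitor-p01mt-2018-g50",
                 "ioer-monitor-p01mt-2018-stt"),
  stringsAsFactors = FALSE
)

# Keep the row matching both the year and the spatial resolution.
hit <- ids[ids$time_frame == 2018 &
             ids$spatial_resolution == "Cities (> 50k inhabitants)", ]

hit$dataset_id
```

With these two rows, the filter leaves the ID `ioer-monitor-p01mt-2018-g50`.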


## Use the online Web-GUI of the datapicker

You can also open the online Web-GUI to explore the datasets and linking methods:

https://sora.gesis.org/unofficial/datapicker/