In [103]:
from kgap_tools import execute_to_df, generate_sparql
import pandas as pd

# Analysis of a Special Collection

#### What is a special collection?

At VLIZ we manually curate **special collections**: collections that group together datasets, publications, projects, people, institutes and/or events around a certain topic.

##### Example:

The special collection we are going to explore is the *Trophos special collection*. This collection contains everything related to the *Trophos project*.   
This was a four-year research project involving six institutes that focused on obtaining a better understanding of the processes structuring the higher trophic levels in the North Sea (more info: https://www.vliz.be/projects/trophos/index.php).

#### General information
First we're going to have a look at general information about the special collection.

To get this information from the graph, we construct a SPARQL query, which you can find in `/queries/spcol_info.sparql`.  
The function `execute_to_df()` runs this query and returns the results as a pandas DataFrame (a spreadsheet-like table).
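`execute_to_df()` is imported from `kgap_tools`, and its internals are not shown in this notebook. To give an idea of what such a helper has to do, the sketch below turns a query result in the standard SPARQL 1.1 JSON results format into a pandas DataFrame. The function name `sparql_json_to_df` and the sample result are hypothetical; the real `execute_to_df()` may fetch and parse results differently.

```python
import pandas as pd


def sparql_json_to_df(result: dict) -> pd.DataFrame:
    """Hypothetical helper: flatten a SPARQL 1.1 JSON result into a DataFrame.

    Each variable in result["head"]["vars"] becomes a column and each binding
    becomes a row. Variables left unbound in a binding become missing values.
    """
    columns = result["head"]["vars"]
    rows = [
        {var: binding[var]["value"] if var in binding else None for var in columns}
        for binding in result["results"]["bindings"]
    ]
    return pd.DataFrame(rows, columns=columns)


# Hypothetical, hand-written result in the SPARQL 1.1 JSON results format
sample = {
    "head": {"vars": ["spcol", "title"]},
    "results": {
        "bindings": [
            {
                "spcol": {"type": "uri", "value": "http://dev.marineinfo.org/id/collection/35"},
                "title": {"type": "literal", "value": "Trophos"},
            }
        ]
    },
}

df = sparql_json_to_df(sample)
print(df)
```

Variables that an `OPTIONAL` pattern leaves unbound end up as missing values in the corresponding column.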


In [2]:
spcol_info = execute_to_df("spcol_info.sparql")
spcol_info

| | spcol | type | title | internal_id | keywords | urls |
|---|---|---|---|---|---|---|
| 0 | http://dev.marineinfo.org/id/collection/35 | http://purl.org/dc/dcmitype/#Collection | trophos | 35 | https://marineregions.org/mrgid/3293 | http://dev.marineinfo.org/id/collection/www.af... |
| 1 | http://dev.marineinfo.org/id/collection/35 | http://purl.org/dc/dcmitype/#Collection | trophos | 35 | https://marineregions.org/mrgid/3293 | http://dev.marineinfo.org/id/collection/www.vl... |
| 2 | http://dev.marineinfo.org/id/collection/35 | http://purl.org/dc/dcmitype/#Collection | trophos | 35 | https://marineregions.org/mrgid/3293 | http://dev.marineinfo.org/id/collection/www.na... |
| 3 | http://dev.marineinfo.org/id/collection/35 | http://purl.org/dc/dcmitype/#Collection | trophos | 35 | https://marineregions.org/mrgid/3293 | http://dev.marineinfo.org/id/collection/www.vl... |
| 4 | http://dev.marineinfo.org/id/collection/35 | http://purl.org/dc/dcmitype/#Collection | trophos | 35 | https://marineregions.org/mrgid/3293 | http://dev.marineinfo.org/id/collection/www.ul... |
| ... | ... | ... | ... | ... | ... | ... |
| 3327 | http://dev.marineinfo.org/id/collection/35 | http://purl.org/dc/dcmitype/#Collection | Trophos | 35 | https://marineregions.org/mrgid/5506 | https://dx.doi.org/10.1016/s1385-1101(02)00165-x |
| 3328 | http://dev.marineinfo.org/id/collection/35 | http://purl.org/dc/dcmitype/#Collection | Trophos | 35 | https://marineregions.org/mrgid/5506 | http://dev.marineinfo.org/id/collection/www.in... |
| 3329 | http://dev.marineinfo.org/id/collection/35 | http://purl.org/dc/dcmitype/#Collection | Trophos | 35 | https://marineregions.org/mrgid/5506 | http://dev.marineinfo.org/id/collection/www.vl... |
| 3330 | http://dev.marineinfo.org/id/collection/35 | http://purl.org/dc/dcmitype/#Collection | Trophos | 35 | https://marineregions.org/mrgid/5506 | http://dev.marineinfo.org/id/collection/www.we... |


In [133]:
keywords = list(spcol_info.keywords.drop_duplicates())
urls = list(spcol_info.urls.drop_duplicates())

print(f"Name: {spcol_info['title'][0]}\n\nType: {spcol_info['type'][0]}\n\nInternal-ID: {spcol_info['internal_id'][0]}\n\nLinks: {urls}\n\nKeywords: {keywords}")

Name: trophos

Type: http://purl.org/dc/dcmitype/#Collection

Internal-ID: 35

Links: ['http://dev.marineinfo.org/id/collection/www.afdelingkust.be', 'http://dev.marineinfo.org/id/collection/www.vliz.be', 'http://dev.marineinfo.org/id/collection/www.natuurpunt.be', 'http://dev.marineinfo.org/id/collection/www.vliz.be/projects/trophos', 'http://dev.marineinfo.org/id/collection/www.ulb.ac.be/assoc/esa/index.htm', 'http://dev.marineinfo.org/id/collection/www.belspo.be/belspo/fedra/proj.asp?l=en&COD=EV/25', 'https://dx.doi.org/10.1016/s0020-7519(03)00253-4', 'https://dx.doi.org/10.1038/sj.hdy.6800496', 'http://dev.marineinfo.org/id/collection/www.ncbi.nlm.nih.gov', 'https://dx.doi.org/10.1016/j.seares.2004.02.004', 'http://dev.marineinfo.org/id/collection/www.belspo.be', 'http://dev.marineinfo.org/id/collection/www.mumm.ac.be', 'https://dx.doi.org/10.1016/s1385-1101(02)00165-x', 'http://dev.marineinfo.org/id/collection/www.inbo.be', 'http://dev.marineinfo.org/id/collection/www.vliz.be/even
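The table above runs to row 3330 although it describes a single collection, and its `title` column holds two spellings (`trophos` and `Trophos`). A likely explanation, assumed here because the query text is not shown, is that the query binds several multi-valued properties at once, so the result has one row per combination of title, keyword and URL. The toy DataFrame below uses made-up values to reproduce that effect and shows how `nunique()` and `unique()` collapse the repetition again.

```python
import itertools

import pandas as pd

# Made-up values standing in for multi-valued properties of one collection
titles = ["trophos", "Trophos"]
keywords = ["kw/1", "kw/2"]
urls = ["url/a", "url/b", "url/c"]

# Every combination becomes its own row, mimicking how a join over
# several multi-valued properties multiplies the result rows
rows = list(itertools.product(titles, keywords, urls))
toy = pd.DataFrame(rows, columns=["title", "keywords", "urls"])

print(len(toy))                       # 2 * 2 * 3 = 12 rows
print(toy.nunique())                  # number of distinct values per column
print(toy["urls"].unique().tolist())  # distinct URLs, in order of first appearance
```

Because `spcol_info['title'][0]` only reads the first row, the summary printed above shows just one of the two title spellings.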

#### Resource information 
Next, we look at what kinds of resources are included in the Trophos special collection.

For this, we execute two SPARQL queries and concatenate the results: `/queries/spcol_resource_datasets.sparql` and `/queries/spcol_resource_other.sparql`.  
We then check whether the special collection contains projects, publications, people, institutes, events and/or datasets.

In [137]:
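# Resources other than datasets (projects, publications, events, ...)
other_resources = execute_to_df("spcol_resource_other.sparql")

# Datasets come from a separate query; align the column name before concatenating
datasets = execute_to_df("spcol_resource_datasets.sparql")
datasets = datasets.rename(columns={"dataset": "resource"})

spcol_resource = pd.concat([other_resources, datasets])
spcol_resource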

| | s | resource |
|---|---|---|
| 0 | http://dev.marineinfo.org/id/collection/35 | http://dev.marineinfo.org/id/project/2051 |
| 1 | http://dev.marineinfo.org/id/collection/35 | http://dev.marineinfo.org/id/event/185 |
| 2 | http://dev.marineinfo.org/id/collection/35 | http://dev.marineinfo.org/id/publication/57828 |
| 3 | http://dev.marineinfo.org/id/collection/35 | http://dev.marineinfo.org/id/publication/71914 |
| 4 | http://dev.marineinfo.org/id/collection/35 | http://dev.marineinfo.org/id/publication/72052 |
| ... | ... | ... |
| 31 | http://dev.marineinfo.org/id/collection/35 | http://dev.marineinfo.org/id/dataset/148 |
| 32 | http://dev.marineinfo.org/id/collection/35 | http://dev.marineinfo.org/id/dataset/36 |
| 33 | http://dev.marineinfo.org/id/collection/35 | http://dev.marineinfo.org/id/dataset/71 |
| 34 | http://dev.marineinfo.org/id/collection/35 | http://dev.marineinfo.org/id/dataset/1848 |
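`pd.concat` keeps each input's original row labels, so the labels start again at 0 where the dataset rows begin. This is why the dataset rows filtered out later are numbered from 0 while the event rows keep labels such as 1, 9 and 24. If unique labels are preferred, `ignore_index=True` renumbers the rows, and `drop_duplicates()` removes repeated (collection, resource) pairs. The sketch uses small made-up frames with the same column names as the cell above.

```python
import pandas as pd

# Made-up stand-ins for the two query results
other = pd.DataFrame({
    "s": ["collection/35"] * 3,
    "resource": ["project/2051", "event/185", "project/2051"],  # one repeated row
})
ds = pd.DataFrame({"s": ["collection/35"] * 2, "dataset": ["dataset/55", "dataset/209"]})

# Align column names before concatenating, as in the cell above
ds = ds.rename(columns={"dataset": "resource"})

kept = pd.concat([other, ds])                        # original labels are kept
renumbered = pd.concat([other, ds], ignore_index=True)
deduplicated = renumbered.drop_duplicates(ignore_index=True)

print(kept.index.tolist())          # [0, 1, 2, 0, 1]
print(renumbered.index.tolist())    # [0, 1, 2, 3, 4]
print(deduplicated.index.tolist())  # [0, 1, 2, 3]
```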


In [135]:
# Number of unique resources in the special collection
n_unique = spcol_resource['resource'].nunique()
print(f"The special collection contains {n_unique} unique resources.")

The special collection contains 174 unique resources.
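Instead of filtering one type at a time, the resource type can be read from the URI itself. Assuming every resource URI follows the `http://dev.marineinfo.org/id/<type>/<number>` pattern seen in the outputs here, `str.extract` pulls out the type and `value_counts` tallies the unique resources per type; reindexing over the six expected types makes absent types show up as 0. The type names `person` and `institute` are assumptions that mirror the filters used later in this notebook, and the URIs are a small sample copied from the outputs, not the full collection.

```python
import pandas as pd

# A few resource URIs copied from the outputs in this notebook
resources = pd.Series([
    "http://dev.marineinfo.org/id/project/2051",
    "http://dev.marineinfo.org/id/event/185",
    "http://dev.marineinfo.org/id/publication/57828",
    "http://dev.marineinfo.org/id/publication/71914",
    "http://dev.marineinfo.org/id/dataset/148",
    "http://dev.marineinfo.org/id/dataset/148",  # duplicate on purpose
])

# Assumed URI pattern: .../id/<type>/<number>; count each resource once
types = resources.drop_duplicates().str.extract(r"/id/([^/]+)/\d+$", expand=False)

# Assumed type names; missing types are filled in with a count of 0
expected = ["project", "publication", "person", "institute", "event", "dataset"]
counts = types.value_counts().reindex(expected, fill_value=0)
print(counts)
```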


In [123]:
# Projects
projects = spcol_resource[spcol_resource.resource.str.contains('project')]
projects.groupby('resource').nunique()

| resource | s |
|---|---|
| http://dev.marineinfo.org/id/project/1074 | 1 |
| http://dev.marineinfo.org/id/project/1075 | 1 |
| http://dev.marineinfo.org/id/project/2051 | 1 |
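Here `groupby('resource').nunique()` mainly serves to list each resource once; its `s` column counts the distinct collections per resource, which is always 1 because only collection 35 is queried. A plain `unique()` on the column gives the same resources more directly, as this small made-up example shows.

```python
import pandas as pd

# Made-up rows: one collection, one project listed twice
df = pd.DataFrame({
    "s": ["collection/35"] * 3,
    "resource": ["project/1074", "project/2051", "project/1074"],
})

grouped = df.groupby("resource").nunique()
print(grouped)  # one row per resource, s == 1 everywhere

listed = df["resource"].unique().tolist()
print(listed)   # ['project/1074', 'project/2051']

# groupby sorts its keys, while unique() keeps the order of first appearance
assert sorted(listed) == grouped.index.tolist()
```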


In [136]:
# Publications
publications = spcol_resource[spcol_resource.resource.str.contains('publication')]
publications.groupby('resource').nunique()

| resource | s |
|---|---|
| http://dev.marineinfo.org/id/publication/103102 | 1 |
| http://dev.marineinfo.org/id/publication/108115 | 1 |
| http://dev.marineinfo.org/id/publication/108179 | 1 |
| http://dev.marineinfo.org/id/publication/110144 | 1 |
| http://dev.marineinfo.org/id/publication/110149 | 1 |
| ... | ... |
| http://dev.marineinfo.org/id/publication/97429 | 1 |
| http://dev.marineinfo.org/id/publication/97433 | 1 |
| http://dev.marineinfo.org/id/publication/97552 | 1 |
| http://dev.marineinfo.org/id/publication/97958 | 1 |


In [125]:
# People
people = spcol_resource[spcol_resource.resource.str.contains('person')]
people

Empty DataFrame
Columns: [s, resource]
Index: []


In [126]:
# Organizations/Institutes
institutes = spcol_resource[spcol_resource.resource.str.contains('institute')]
institutes

Empty DataFrame
Columns: [s, resource]
Index: []


In [127]:
# Events
events = spcol_resource[spcol_resource.resource.str.contains('event')]
events

| | s | resource |
|---|---|---|
| 1 | http://dev.marineinfo.org/id/collection/35 | http://dev.marineinfo.org/id/event/185 |
| 9 | http://dev.marineinfo.org/id/collection/35 | http://dev.marineinfo.org/id/event/363 |
| 24 | http://dev.marineinfo.org/id/collection/35 | http://dev.marineinfo.org/id/event/356 |
| 26 | http://dev.marineinfo.org/id/collection/35 | http://dev.marineinfo.org/id/event/355 |
| 35 | http://dev.marineinfo.org/id/collection/35 | http://dev.marineinfo.org/id/event/360 |
| 43 | http://dev.marineinfo.org/id/collection/35 | http://dev.marineinfo.org/id/event/352 |
| 49 | http://dev.marineinfo.org/id/collection/35 | http://dev.marineinfo.org/id/event/359 |
| 60 | http://dev.marineinfo.org/id/collection/35 | http://dev.marineinfo.org/id/event/358 |
| 69 | http://dev.marineinfo.org/id/collection/35 | http://dev.marineinfo.org/id/event/353 |
| 70 | http://dev.marineinfo.org/id/collection/35 | http://dev.marineinfo.org/id/event/362 |
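`str.contains('event')` matches the word anywhere in the string, so a URI that merely contains those letters, for instance inside `prevention`, would also be selected. That does not appear to happen with the `/id/<type>/<number>` URIs in this collection, but matching the whole path segment with `regex=False` is a safer filter. The second URI below is hypothetical and only serves to trigger the false match.

```python
import pandas as pd

resources = pd.Series([
    "http://dev.marineinfo.org/id/event/185",
    "http://example.org/id/publication/prevention-report",  # hypothetical URI
])

# Substring match: 'prevention' contains the letters 'event'
loose = resources[resources.str.contains("event")]

# Match the full path segment literally instead of any substring
strict = resources[resources.str.contains("/id/event/", regex=False)]

print(len(loose))   # 2
print(len(strict))  # 1
```

The same applies to the other filters, e.g. `'/id/project/'` instead of `'project'`.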


In [128]:
# Datasets
datasets = spcol_resource[spcol_resource.resource.str.contains('dataset')]
datasets

| | s | resource |
|---|---|---|
| 0 | http://dev.marineinfo.org/id/collection/35 | http://dev.marineinfo.org/id/dataset/55 |
| 1 | http://dev.marineinfo.org/id/collection/35 | http://dev.marineinfo.org/id/dataset/209 |
| 2 | http://dev.marineinfo.org/id/collection/35 | http://dev.marineinfo.org/id/dataset/70 |
| 3 | http://dev.marineinfo.org/id/collection/35 | http://dev.marineinfo.org/id/dataset/72 |
| 4 | http://dev.marineinfo.org/id/collection/35 | http://dev.marineinfo.org/id/dataset/62 |
| 5 | http://dev.marineinfo.org/id/collection/35 | http://dev.marineinfo.org/id/dataset/1428 |
| 6 | http://dev.marineinfo.org/id/collection/35 | http://dev.marineinfo.org/id/dataset/156 |
| 7 | http://dev.marineinfo.org/id/collection/35 | http://dev.marineinfo.org/id/dataset/23 |
| 8 | http://dev.marineinfo.org/id/collection/35 | http://dev.marineinfo.org/id/dataset/210 |
| 9 | http://dev.marineinfo.org/id/collection/35 | http://dev.marineinfo.org/id/dataset/472 |
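The dataset URIs end in a numeric identifier. To get a sorted list of these IDs, they can be extracted with a regular expression and converted to integers; sorting the raw strings would instead put `1428` before `209`. The sketch assumes every URI ends in `/dataset/<digits>` and uses a few URIs from the output above.

```python
import pandas as pd

dataset_uris = pd.Series([
    "http://dev.marineinfo.org/id/dataset/55",
    "http://dev.marineinfo.org/id/dataset/209",
    "http://dev.marineinfo.org/id/dataset/70",
    "http://dev.marineinfo.org/id/dataset/1428",
])

# Assumes each URI ends in /dataset/<digits>; convert the captured digits to int
ids = dataset_uris.str.extract(r"/dataset/(\d+)$", expand=False).astype(int)

print(sorted(ids.tolist()))  # [55, 70, 209, 1428]
```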
