# Use the Getty ULAN SPARQL endpoint to correct painters' data
- Test run on 2023-06-16; code added on 2023-07-03
- SPARQL (“SPARQL Protocol And RDF Query Language”) is a W3C standard for querying RDF and can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware
- SPARQLWrapper is a simple Python wrapper around a SPARQL service for remote query execution. Not only does it enable us to write more complex queries to extract information from RDF than those exposed through a library like rdflib, it can also convert query results into other formats like JSON and CSV!

## Literature
- https://rebeccabilbro.github.io/sparql-from-python/
- https://groups.google.com/g/gettyvocablod/c/mSnqx3rd8lM/m/LKPstWJyAwAJ
- https://sparqlwrapper.readthedocs.io/en/stable/main.html
- https://github.com/RDFLib/sparqlwrapper/blob/master/scripts/example.py


## CSV example
```python
sparql.setReturnFormat(CSV)
results = sparql.query().convert()
print(results)
```
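
With the CSV return format, `convert()` hands back the raw response body rather than a parsed structure (bytes, as far as I can tell from the SPARQLWrapper documentation). A minimal sketch of turning such a payload into row dictionaries with the standard library follows; the helper name `csv_bytes_to_rows` and the sample payload are made up for illustration.

```python
import csv
import io


def csv_bytes_to_rows(raw: bytes) -> list:
    """Decode a SPARQL CSV response and return one dict per result row."""
    text = raw.decode("utf-8")
    # newline="" lets the csv module handle the \r\n line endings itself
    return list(csv.DictReader(io.StringIO(text, newline="")))


# Hypothetical payload shaped like a CSV response for `SELECT ?p`
sample = (
    b"p\r\n"
    b"http://vocab.getty.edu/ulan/500116327\r\n"
    b"http://vocab.getty.edu/ulan/500771934\r\n"
)
rows = csv_bytes_to_rows(sample)
print(rows[0]["p"])
```

In the notebook this would be called as `csv_bytes_to_rows(results)` right after the CSV `convert()` call.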

# Import

## Import libraries

In [3]:
# import jq
import pickle
import pandas as pd
from SPARQLWrapper import SPARQLWrapper, JSON, CSV

# Tests

## Test | SPARQLWrapper on Wikidata

In [6]:
sparql = SPARQLWrapper("https://query.wikidata.org/sparql")

# Below we SELECT both the hot sauce items & their labels
# in the WHERE clause we specify that we want labels as well as items
sparql.setQuery("""
SELECT ?item ?itemLabel

WHERE {
  ?item wdt:P279 wd:Q522171.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
""")
# sparql.setReturnFormat(CSV)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

results_df = pd.json_normalize(results['results']['bindings'])

results_df.head()

Unnamed: 0,item.type,item.value,itemLabel.xml:lang,itemLabel.type,itemLabel.value
0,uri,http://www.wikidata.org/entity/Q249114,en,literal,salsa
1,uri,http://www.wikidata.org/entity/Q335016,en,literal,Tabasco sauce
2,uri,http://www.wikidata.org/entity/Q360459,en,literal,Adobo
3,uri,http://www.wikidata.org/entity/Q460439,en,literal,Blair's 16 Million Reserve
4,uri,http://www.wikidata.org/entity/Q736782,en,literal,Llajua
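
`pd.json_normalize` keeps every key of each binding, which is why the table above carries `.type` and `xml:lang` columns next to the values. If only the values matter, the bindings can be reduced before building the DataFrame. This is a sketch under the assumption that the response follows the standard SPARQL JSON results layout (`head.vars` and `results.bindings`); `bindings_to_records` is a hypothetical helper name.

```python
def bindings_to_records(results: dict) -> list:
    """Reduce SPARQL JSON results to one {variable: value} dict per row.

    Variables that are unbound in a row (e.g. from an OPTIONAL) become None.
    """
    variables = results["head"]["vars"]
    records = []
    for binding in results["results"]["bindings"]:
        records.append(
            {var: binding.get(var, {}).get("value") for var in variables}
        )
    return records


# Small sample in the shape of the Wikidata response shown above
sample = {
    "head": {"vars": ["item", "itemLabel"]},
    "results": {"bindings": [
        {"item": {"type": "uri",
                  "value": "http://www.wikidata.org/entity/Q249114"},
         "itemLabel": {"xml:lang": "en", "type": "literal", "value": "salsa"}},
        {"item": {"type": "uri",
                  "value": "http://www.wikidata.org/entity/Q335016"}},
    ]},
}
records = bindings_to_records(sample)
print(records)
```

`pd.DataFrame(bindings_to_records(results))` then yields one column per query variable.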


## Test | SPARQLWrapper on ULAN endpoint, works
- Retrieves the URI of every member of the Persons, Artists facet (ULAN).

In [11]:
sparql = SPARQLWrapper("http://vocab.getty.edu/sparql")

# Below we SELECT every member (?p) of the ULAN facet 500000002 (Persons, Artists).
# No PREFIX lines are given, so ulan: and skos: must be resolved by the endpoint itself.
sparql.setQuery("""
SELECT * WHERE { ulan:500000002 skos:member ?p . }
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

KeyboardInterrupt: 

In [8]:
results

{'head': {'vars': ['p']},
 'results': {'bindings': [{'p': {'type': 'uri',
     'value': 'http://vocab.getty.edu/ulan/500116327'}},
   {'p': {'type': 'uri', 'value': 'http://vocab.getty.edu/ulan/500771934'}},
   {'p': {'type': 'uri', 'value': 'http://vocab.getty.edu/ulan/500557119'}},
   {'p': {'type': 'uri', 'value': 'http://vocab.getty.edu/ulan/500632715'}},
   {'p': {'type': 'uri', 'value': 'http://vocab.getty.edu/ulan/500546545'}},
   {'p': {'type': 'uri', 'value': 'http://vocab.getty.edu/ulan/500108834'}},
   {'p': {'type': 'uri', 'value': 'http://vocab.getty.edu/ulan/500385137'}},
   {'p': {'type': 'uri', 'value': 'http://vocab.getty.edu/ulan/500239508'}},
   {'p': {'type': 'uri', 'value': 'http://vocab.getty.edu/ulan/500127905'}},
   {'p': {'type': 'uri', 'value': 'http://vocab.getty.edu/ulan/500084293'}},
   {'p': {'type': 'uri', 'value': 'http://vocab.getty.edu/ulan/500017424'}},
   {'p': {'type': 'uri', 'value': 'http://vocab.getty.edu/ulan/500153340'}},
   {'p': {'type': 'uri

In [12]:
results_df = pd.json_normalize(results['results']['bindings'])
results_df

Unnamed: 0,p.type,p.value
0,uri,http://vocab.getty.edu/ulan/500116327
1,uri,http://vocab.getty.edu/ulan/500771934
2,uri,http://vocab.getty.edu/ulan/500557119
3,uri,http://vocab.getty.edu/ulan/500632715
4,uri,http://vocab.getty.edu/ulan/500546545
...,...,...
276589,uri,http://vocab.getty.edu/ulan/500077242
276590,uri,http://vocab.getty.edu/ulan/500426555
276591,uri,http://vocab.getty.edu/ulan/500548610
276592,uri,http://vocab.getty.edu/ulan/500602361
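
The facet query returns more than 276,000 rows in a single response, and one run of the cell above ended in a `KeyboardInterrupt`. One way to keep each request small is to page through the results with `LIMIT` and `OFFSET`. The generator below is only a sketch: `paged_queries` is a hypothetical helper, and without an `ORDER BY` the endpoint is not obliged to return rows in a stable order between pages.

```python
def paged_queries(base_query: str, page_size: int, max_pages: int):
    """Yield copies of base_query with LIMIT/OFFSET appended, page by page."""
    if page_size <= 0 or max_pages <= 0:
        raise ValueError("page_size and max_pages must be positive")
    body = base_query.strip()
    for page in range(max_pages):
        yield f"{body}\nLIMIT {page_size} OFFSET {page * page_size}"


base = "SELECT * WHERE { ulan:500000002 skos:member ?p . } ORDER BY ?p"
pages = list(paged_queries(base, page_size=1000, max_pages=3))
print(pages[2])

# Usage with the notebook's objects (not executed here):
# sparql.setTimeout(60)  # give up on a single request after 60 seconds
# for query in paged_queries(base, 50000, 10):
#     sparql.setQuery(query)
#     bindings = sparql.query().convert()["results"]["bindings"]
#     if not bindings:
#         break
```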


# Getty ULAN SPARQL queries via SPARQLWrapper, both work

## Getty, full query, returns all subjects within ULAN
- https://www.getty.edu/vow/ULANFullDisplay?find=&role=&nation=&page=1&subjectid=500000002

**Persons, Artists (ULAN facet)** Note: Records under this level represent information for individuals involved in the creation or production of works of fine art or architecture, for example painters, sculptors, printmakers, and architects. Included are individuals whose biographies are well known (e.g., Rembrandt van Rijn (Dutch painter and printmaker, 1606-1669)) as well as anonymous creators with identified oeuvres but whose names are unknown and whose biography is surmised (e.g., Master of Alkmaar (North Netherlandish painter, active ca. 1490-ca. 1510)). Craftsmen, artisans, engineers, and others who create visual works are included here, even if their works are not considered fine art per se. People whose primary life roles were other than "artist" or "architect," but who created or designed art or architecture in a professional or amateur capacity, are included here with a non-preferred relationship to this facet (e.g., Thomas Jefferson (American statesman, architect, and draftsman, 1743-1826)). Performance artists are included here. 

### Initial SPARQL-query

In [34]:
# set sparql endpoint
sparql = SPARQLWrapper("http://vocab.getty.edu/sparql")

# query
sparql.setQuery("""
PREFIX tgn: <http://vocab.getty.edu/tgn/>
PREFIX gvp: <http://vocab.getty.edu/ontology#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX iso: <http://purl.org/iso25964/skos-thes#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX schema: <http://schema.org/>
PREFIX aat: <http://vocab.getty.edu/aat/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

select ?p ?name ?birth ?death ?ScopeNote ?related ?rname ?rbirth ?rdeath ?relatedScopeNote
{ ulan:500000002 skos:member ?p .
optional {?p gvp:prefLabelGVP/xl:literalForm ?name;
     	foaf:focus/gvp:biographyPreferred [
       	schema:description ?bio;
       	gvp:estStart ?birth].}
optional { ?p gvp:prefLabelGVP/xl:literalForm ?name;
     	foaf:focus/gvp:biographyPreferred [
       	schema:description ?bio;
       	gvp:estEnd ?death]. }
optional {?p skos:related ?related . 
         	?related skos:scopeNote [dct:language gvp_lang:en; 
rdf:value ?relatedScopeNote]}  
optional {?related gvp:prefLabelGVP/xl:literalForm ?rname;
 	foaf:focus/gvp:biographyPreferred [
       	schema:description ?bio;
       	gvp:estStart ?rbirth].}
optional { ?related gvp:prefLabelGVP/xl:literalForm ?rname;
     	foaf:focus/gvp:biographyPreferred [
       	schema:description ?bio;
       	gvp:estEnd ?rdeath]. }
optional {?p skos:scopeNote [dct:language gvp_lang:en; rdf:value ?ScopeNote]}}
""")

# return the results as JSON
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

In [35]:
# flatten the JSON bindings into a DataFrame
results_df = pd.json_normalize(results['results']['bindings'])
results_df

Unnamed: 0,p.type,p.value,name.type,name.value,birth.datatype,birth.type,birth.value,death.datatype,death.type,death.value,...,related.value,rname.type,rname.value,rbirth.datatype,rbirth.type,rbirth.value,rdeath.datatype,rdeath.type,rdeath.value,rname.xml:lang
0,uri,http://vocab.getty.edu/ulan/500116327,literal,A1-53167,http://www.w3.org/2001/XMLSchema#gYear,literal,1964,http://www.w3.org/2001/XMLSchema#gYear,literal,2090,...,http://vocab.getty.edu/ulan/500116327,literal,A1-53167,http://www.w3.org/2001/XMLSchema#gYear,literal,1964,http://www.w3.org/2001/XMLSchema#gYear,literal,2090,
1,uri,http://vocab.getty.edu/ulan/500116327,literal,A1-53167,http://www.w3.org/2001/XMLSchema#gYear,literal,1964,http://www.w3.org/2001/XMLSchema#gYear,literal,2090,...,http://vocab.getty.edu/ulan/500072887,literal,"Juarez, Agustin",http://www.w3.org/2001/XMLSchema#gYear,literal,1920,http://www.w3.org/2001/XMLSchema#gYear,literal,2090,
2,uri,http://vocab.getty.edu/ulan/500116327,literal,A1-53167,http://www.w3.org/2001/XMLSchema#gYear,literal,1964,http://www.w3.org/2001/XMLSchema#gYear,literal,2090,...,http://vocab.getty.edu/ulan/500116328,literal,"Bismuth, Pierre",http://www.w3.org/2001/XMLSchema#gYear,literal,1964,http://www.w3.org/2001/XMLSchema#gYear,literal,2090,nl
3,uri,http://vocab.getty.edu/ulan/500116327,literal,A1-53167,http://www.w3.org/2001/XMLSchema#gYear,literal,1964,http://www.w3.org/2001/XMLSchema#gYear,literal,2090,...,http://vocab.getty.edu/ulan/500116329,literal,"Buchanan, Roderick",http://www.w3.org/2001/XMLSchema#gYear,literal,1965,http://www.w3.org/2001/XMLSchema#gYear,literal,2090,nl
4,uri,http://vocab.getty.edu/ulan/500116327,literal,A1-53167,http://www.w3.org/2001/XMLSchema#gYear,literal,1964,http://www.w3.org/2001/XMLSchema#gYear,literal,2090,...,http://vocab.getty.edu/ulan/500116330,literal,"Löhr, Christiane",http://www.w3.org/2001/XMLSchema#gYear,literal,1965,http://www.w3.org/2001/XMLSchema#gYear,literal,2090,nl
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1655856,uri,http://vocab.getty.edu/ulan/500116327,literal,A1-53167,http://www.w3.org/2001/XMLSchema#gYear,literal,1964,http://www.w3.org/2001/XMLSchema#gYear,literal,2090,...,http://vocab.getty.edu/ulan/500245534,literal,"Westergaard, Steffen",http://www.w3.org/2001/XMLSchema#gYear,literal,1377,http://www.w3.org/2001/XMLSchema#gYear,literal,2080,
1655857,uri,http://vocab.getty.edu/ulan/500116327,literal,A1-53167,http://www.w3.org/2001/XMLSchema#gYear,literal,1964,http://www.w3.org/2001/XMLSchema#gYear,literal,2090,...,http://vocab.getty.edu/ulan/500245537,literal,"Troelstrup, Bjarne",http://www.w3.org/2001/XMLSchema#gYear,literal,1377,http://www.w3.org/2001/XMLSchema#gYear,literal,2080,
1655858,uri,http://vocab.getty.edu/ulan/500116327,literal,A1-53167,http://www.w3.org/2001/XMLSchema#gYear,literal,1964,http://www.w3.org/2001/XMLSchema#gYear,literal,2090,...,http://vocab.getty.edu/ulan/500245540,literal,"Helweg-Larsen, Klavs",http://www.w3.org/2001/XMLSchema#gYear,literal,1377,http://www.w3.org/2001/XMLSchema#gYear,literal,2080,
1655859,uri,http://vocab.getty.edu/ulan/500116327,literal,A1-53167,http://www.w3.org/2001/XMLSchema#gYear,literal,1964,http://www.w3.org/2001/XMLSchema#gYear,literal,2090,...,http://vocab.getty.edu/ulan/500245542,literal,"Debray, Régis",http://www.w3.org/2001/XMLSchema#gYear,literal,1377,http://www.w3.org/2001/XMLSchema#gYear,literal,2080,


### Save and load the results as a pickle

In [36]:
# with open('result_json.pickle', 'wb') as handle:
#     pickle.dump(results, handle, protocol=pickle.HIGHEST_PROTOCOL)

# with open('result_json.pickle', 'rb') as handle:
#     b = pickle.load(handle)

with open('results_df_all_ulan.pickle', 'wb') as handle:
    pickle.dump(results_df, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open('results_df_all_ulan.pickle', 'rb') as handle:
    df = pickle.load(handle)

In [40]:
df.shape

(1655861, 21)

In [39]:
df.head()

Unnamed: 0,p.type,p.value,name.type,name.value,birth.datatype,birth.type,birth.value,death.datatype,death.type,death.value,...,related.value,rname.type,rname.value,rbirth.datatype,rbirth.type,rbirth.value,rdeath.datatype,rdeath.type,rdeath.value,rname.xml:lang
0,uri,http://vocab.getty.edu/ulan/500116327,literal,A1-53167,http://www.w3.org/2001/XMLSchema#gYear,literal,1964,http://www.w3.org/2001/XMLSchema#gYear,literal,2090,...,http://vocab.getty.edu/ulan/500116327,literal,A1-53167,http://www.w3.org/2001/XMLSchema#gYear,literal,1964,http://www.w3.org/2001/XMLSchema#gYear,literal,2090,
1,uri,http://vocab.getty.edu/ulan/500116327,literal,A1-53167,http://www.w3.org/2001/XMLSchema#gYear,literal,1964,http://www.w3.org/2001/XMLSchema#gYear,literal,2090,...,http://vocab.getty.edu/ulan/500072887,literal,"Juarez, Agustin",http://www.w3.org/2001/XMLSchema#gYear,literal,1920,http://www.w3.org/2001/XMLSchema#gYear,literal,2090,
2,uri,http://vocab.getty.edu/ulan/500116327,literal,A1-53167,http://www.w3.org/2001/XMLSchema#gYear,literal,1964,http://www.w3.org/2001/XMLSchema#gYear,literal,2090,...,http://vocab.getty.edu/ulan/500116328,literal,"Bismuth, Pierre",http://www.w3.org/2001/XMLSchema#gYear,literal,1964,http://www.w3.org/2001/XMLSchema#gYear,literal,2090,nl
3,uri,http://vocab.getty.edu/ulan/500116327,literal,A1-53167,http://www.w3.org/2001/XMLSchema#gYear,literal,1964,http://www.w3.org/2001/XMLSchema#gYear,literal,2090,...,http://vocab.getty.edu/ulan/500116329,literal,"Buchanan, Roderick",http://www.w3.org/2001/XMLSchema#gYear,literal,1965,http://www.w3.org/2001/XMLSchema#gYear,literal,2090,nl
4,uri,http://vocab.getty.edu/ulan/500116327,literal,A1-53167,http://www.w3.org/2001/XMLSchema#gYear,literal,1964,http://www.w3.org/2001/XMLSchema#gYear,literal,2090,...,http://vocab.getty.edu/ulan/500116330,literal,"Löhr, Christiane",http://www.w3.org/2001/XMLSchema#gYear,literal,1965,http://www.w3.org/2001/XMLSchema#gYear,literal,2090,nl
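
Because the full query takes a long time, the save-then-load pattern above can be wrapped so that the endpoint is only queried when no pickle exists yet. A minimal sketch, assuming a hypothetical helper `load_or_compute` and a caller-supplied function that runs the query; only unpickle files you created yourself, since `pickle.load` can execute arbitrary code.

```python
import os
import pickle


def load_or_compute(path, compute):
    """Return the pickled object at path, or compute, store and return it."""
    if os.path.exists(path):
        with open(path, "rb") as handle:
            return pickle.load(handle)
    value = compute()
    with open(path, "wb") as handle:
        pickle.dump(value, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return value


# Usage with the notebook's objects (not executed here):
# def run_full_query():
#     results = sparql.query().convert()
#     return pd.json_normalize(results['results']['bindings'])
#
# df = load_or_compute('results_df_all_ulan.pickle', run_full_query)
```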


## ULAN: artists active in Rome, born between 1400 and 1800
**Rome (inhabited place)** 
Note: City positioned on 7 hills over the swampy Tiber river area; one of the oldest continuously occupied sites in Europe. Archaeological evidence attests to human occupation of the area from ca. 14,000 years ago, but the dense layer of later debris obscures Palaeolithic and Neolithic sites. Was an Etruscan city by 8th cen. BCE, their kings expelled and republic established by 500 BCE; soon ruled vast area and was center of Empire from 31 BCE; declined when capital moved to Constantinople in 330 CE; revived under popes.

### Initial SPARQL-query

In [41]:
# set sparql endpoint
sparql = SPARQLWrapper("http://vocab.getty.edu/sparql")

# query
sparql.setQuery("""
PREFIX tgn: <http://vocab.getty.edu/tgn/>
PREFIX gvp: <http://vocab.getty.edu/ontology#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX iso: <http://purl.org/iso25964/skos-thes#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX schema: <http://schema.org/>
PREFIX aat: <http://vocab.getty.edu/aat/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>


SELECT ?id ?name ?bio ?birth ?death {
{SELECT DISTINCT ?id
         {?id foaf:focus/bio:event/(schema:location|(schema:location/gvp:broaderExtended)) tgn:7000874-place}}
OPTIONAL { ?id gvp:prefLabelGVP/xl:literalForm ?name;
          foaf:focus/gvp:biographyPreferred [
          schema:description ?bio;
          gvp:estStart ?birth] . }
OPTIONAL { ?id gvp:prefLabelGVP/xl:literalForm ?name;
          foaf:focus/gvp:biographyPreferred [
		  schema:description ?bio;
          gvp:estEnd ?death] . }
FILTER ("1400"^^xsd:gYear <= ?birth && ?birth <= "1800"^^xsd:gYear)}

""")

# return the results as JSON
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

# flatten the JSON bindings into a DataFrame
results_df = pd.json_normalize(results['results']['bindings'])
results_df
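
The Rome query hard-codes the TGN place ID (`7000874`) and the year range in its `FILTER`. To reuse it for another place or period, the same query text can be turned into a template. This is a sketch, not a tested variant of the query: `build_place_query` is a hypothetical helper, and like the original it relies on the endpoint resolving the `xl:`, `bio:`, `xsd:` and `tgn:` prefixes.

```python
from string import Template

# Same query body as above, with the place ID and year bounds as placeholders
PLACE_QUERY = Template("""
SELECT ?id ?name ?bio ?birth ?death {
  {SELECT DISTINCT ?id
    {?id foaf:focus/bio:event/(schema:location|(schema:location/gvp:broaderExtended)) tgn:${tgn_id}-place}}
  OPTIONAL { ?id gvp:prefLabelGVP/xl:literalForm ?name;
             foaf:focus/gvp:biographyPreferred [
               schema:description ?bio;
               gvp:estStart ?birth] . }
  OPTIONAL { ?id gvp:prefLabelGVP/xl:literalForm ?name;
             foaf:focus/gvp:biographyPreferred [
               schema:description ?bio;
               gvp:estEnd ?death] . }
  FILTER ("${start}"^^xsd:gYear <= ?birth && ?birth <= "${end}"^^xsd:gYear)}
""")


def build_place_query(tgn_id: int, start: int, end: int) -> str:
    """Fill in the place ID and the inclusive range for ?birth (gvp:estStart)."""
    if start > end:
        raise ValueError("start year must not be later than end year")
    # int() rejects anything that is not a plain number before it reaches the query
    return PLACE_QUERY.substitute(
        tgn_id=int(tgn_id), start=f"{int(start):04d}", end=f"{int(end):04d}"
    )


query = build_place_query(7000874, 1400, 1800)
print(query)
```

The result can be passed to `sparql.setQuery(query)` in place of the literal string.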



### Save and load the results as a pickle

In [42]:
# dumps df as a pickle file
with open('results_df_roman.pickle', 'wb') as handle:
    pickle.dump(results_df, handle, protocol=pickle.HIGHEST_PROTOCOL)
    
# reads pickle file
with open('results_df_roman.pickle', 'rb') as handle:
    df = pickle.load(handle)

In [43]:
df.head()

Unnamed: 0,id.type,id.value,name.type,name.value,bio.type,bio.value,birth.datatype,birth.type,birth.value,death.datatype,death.type,death.value,name.xml:lang
0,uri,http://vocab.getty.edu/ulan/500000009,literal,"Morelli, Francesco",literal,"Italian painter, active ca. 1581-1584",http://www.w3.org/2001/XMLSchema#gYear,literal,1540,http://www.w3.org/2001/XMLSchema#gYear,literal,1630,
1,uri,http://vocab.getty.edu/ulan/500000021,literal,"Céspedes, Pablo de",literal,"Spanish painter and writer, before 1548-1608",http://www.w3.org/2001/XMLSchema#gYear,literal,1538,http://www.w3.org/2001/XMLSchema#gYear,literal,1608,nl
2,uri,http://vocab.getty.edu/ulan/500000030,literal,Perino del Vaga,literal,"Italian painter, decorative artist, and drafts...",http://www.w3.org/2001/XMLSchema#gYear,literal,1495,http://www.w3.org/2001/XMLSchema#gYear,literal,1547,nl
3,uri,http://vocab.getty.edu/ulan/500000040,literal,"Terwesten, Augustinus",literal,"Dutch painter, etcher, and draftsman, 1649-1711",http://www.w3.org/2001/XMLSchema#gYear,literal,1649,http://www.w3.org/2001/XMLSchema#gYear,literal,1711,
4,uri,http://vocab.getty.edu/ulan/500000055,literal,"Breenbergh, Bartholomeus",literal,"Dutch painter, printmaker, 1598-1657",http://www.w3.org/2001/XMLSchema#gYear,literal,1598,http://www.w3.org/2001/XMLSchema#gYear,literal,1657,nl


### Subselect data

In [25]:
# keep only the columns describing the related artist
df = df[['related.value', 'rname.value', 'rbirth.value', 'rdeath.value']]

# convert the years to float so that missing values stay NaN
df['rbirth.value'] = df['rbirth.value'].astype(float)
df['rdeath.value'] = df['rdeath.value'].astype(float)

# drop rows with a missing birth or death year
df = df[(df['rdeath.value'].notnull()) &
        (df['rbirth.value'].notnull())]

# keep artists born after 1400 who died before 1775
df = df[(df['rdeath.value'].astype(int) < 1775) &
        (df['rbirth.value'].astype(int) > 1400)]

# split the name on ', ' into up to four parts
df[['last_name','first_name','addition','comment']] = df['rname.value'].str.split(', ', expand=True)

In [26]:
df

Unnamed: 0,related.value,rname.type,rname.value,rbirth.value,rdeath.value,last_name,first_name,addition,comment
5,http://vocab.getty.edu/ulan/500089548,literal,"Nicolai, Elias",1590.0,1670.0,Nicolai,Elias,,
32,http://vocab.getty.edu/ulan/500066452,literal,"Guascone, Nicolò",1516.0,1596.0,Guascone,Nicolò,,
62,http://vocab.getty.edu/ulan/500022906,literal,"Bagnato, Johann Caspar",1696.0,1757.0,Bagnato,Johann Caspar,,
67,http://vocab.getty.edu/ulan/500024071,literal,Cristoforo da Seregno,1418.0,1510.0,Cristoforo da Seregno,,,
91,http://vocab.getty.edu/ulan/500057086,literal,"Freman, G.",1630.0,1760.0,Freman,G.,,
...,...,...,...,...,...,...,...,...,...
154148,http://vocab.getty.edu/ulan/500372740,literal,"Giunti, Filippo",1450.0,1517.0,Giunti,Filippo,,
154149,http://vocab.getty.edu/ulan/500372741,literal,Franceso d'Angelo Cecca,1446.0,1488.0,Franceso d'Angelo Cecca,,,
154164,http://vocab.getty.edu/ulan/500448192,literal,"Agard, Joseph-Gabriel",1700.0,1750.0,Agard,Joseph-Gabriel,,
154168,http://vocab.getty.edu/ulan/500360258,literal,"Alleyn, Edward",1566.0,1626.0,Alleyn,Edward,,
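
The four-column assignment in the split step only works when `str.split(', ', expand=True)` happens to produce exactly four columns for the data at hand. A plain-Python splitter that always returns a fixed number of parts avoids that dependency. This is a sketch; `split_name` is a hypothetical helper, and merging surplus parts into the last field is my own choice, not something the notebook prescribes.

```python
def split_name(name: str, width: int = 4) -> list:
    """Split a ULAN-style name on ', ' into exactly `width` parts.

    Missing parts are padded with None; surplus parts are merged into the last.
    """
    parts = name.split(", ")
    if len(parts) > width:
        parts = parts[:width - 1] + [", ".join(parts[width - 1:])]
    return parts + [None] * (width - len(parts))


print(split_name("Bagnato, Johann Caspar"))
print(split_name("Cristoforo da Seregno"))

# Usage with the notebook's DataFrame (not executed here):
# cols = ['last_name', 'first_name', 'addition', 'comment']
# df[cols] = pd.DataFrame(df['rname.value'].map(split_name).tolist(),
#                         index=df.index, columns=cols)
```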
