<h1><span style="color:#e8cdda"><u>The Martrioska Project</u></span></h1>

<h2><u>Abstract</u></h2>

<b>Gender inequality</b> is a sadly topical issue, affecting many and diverse sectors. Our research investigates the presence or absence of this phenomenon within the <b>art history field.</b> Through a methodical approach (i.e., from a superficial level to an increasingly specific one), we will try to ascertain whether or not there is a <b>balance in the visibility</b> of men and women art historians.

<h2><u>01. General Overview</u></h2>
<h2><span style="color:#e8cdda">(Giulia)</span></h2>

Let’s start with a simple and preliminary search. We want to understand <b>how many art historians there are in the world</b> divided by gender. We will use <b>Wikidata</b>, an open knowledge base that provides extensive information about the entities it contains, including gender information.

<h3><u>Querying the remote Wikidata SPARQL endpoint</u></h3>

<p>We will use SPARQLWrapper (a Python wrapper around SPARQL services, maintained within the RDFLib project). On the one hand, it allows us to query a <b>remote SPARQL endpoint</b> (in this case the Wikidata one) and to get <b>up-to-date</b> result data in <b>JSON format</b>. On the other hand, it does not require us to <b>separate the code</b> for collecting the data from the code for manipulating results.</p>

<p>To do so, we take the <b>URL of the API</b> of the SPARQL endpoint and prepare the <b>SPARQL query</b> for the worldwide number of male and female art historians. We then <b>create the wrapper</b> around the SPARQL API via the SPARQLWrapper library, <b>send the query</b> and get the <b>JSON results</b>.</p>

In [1]:
from SPARQLWrapper import SPARQLWrapper, JSON
import ssl
import pprint as pp

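# Workaround: disable HTTPS certificate verification for this session (not advisable outside a throwaway notebook).
ssl._create_default_https_context = ssl._create_unverified_context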

wikidata_endpoint = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"

total_query = """
SELECT DISTINCT ?gendername (COUNT(?person) AS ?tot)
WHERE
{
       ?person wdt:P31 wd:Q5 ;
              wdt:P21 ?gender ;
              wdt:P106/wdt:P279* wd:Q1792450. 
       ?gender rdfs:label ?gendername . 
       FILTER (lang(?gendername) = 'en')
}
GROUP BY ?gendername
ORDER BY DESC(?tot)
"""

sparql_wd = SPARQLWrapper(wikidata_endpoint, agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36')
sparql_wd.setQuery(total_query)
sparql_wd.setReturnFormat(JSON)
totalResults = sparql_wd.query().convert()
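
<p>The JSON returned by the endpoint nests every value under <i>results</i> → <i>bindings</i> → variable name → <i>value</i>. The following is a minimal sketch of that layout, not part of the original pipeline: <i>sample_results</i> is a hand-written stand-in for <i>totalResults</i> (the counts are copied from the output printed by the next cell), and <i>bindings_to_rows</i> is a hypothetical helper name.</p>

```python
# Hand-written stand-in for the endpoint response (assumption: same layout
# as totalResults; counts copied from the printed output of the next cell).
sample_results = {
    "head": {"vars": ["gendername", "tot"]},
    "results": {
        "bindings": [
            {"gendername": {"type": "literal", "value": "male"},
             "tot": {"type": "literal", "value": "10538"}},
            {"gendername": {"type": "literal", "value": "female"},
             "tot": {"type": "literal", "value": "4818"}},
            {"gendername": {"type": "literal", "value": "non-binary"},
             "tot": {"type": "literal", "value": "2"}},
        ]
    },
}


def bindings_to_rows(results):
    """Flatten SPARQL JSON bindings into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


rows = bindings_to_rows(sample_results)
for row in rows:
    # Values arrive as strings, so counts need int() before any arithmetic.
    print(row["gendername"], int(row["tot"]))
```

<p>Note that every value, including the counts, is delivered as a string, which is why the cells below call <i>int()</i> before summing.</p>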

<h3><u>Manipulating the results</u></h3>

<p> Let's manipulate our results to get a clearer idea of what we found. We create a <b>dictionary</b> containing our numbers per gender to which we'll add the total number obtained through the sum of the gender-divided results.</p>

In [2]:
artHistorians = dict()
totalNumList = list()

for result in totalResults["results"]["bindings"]:
    gender = result["gendername"]["value"]
    total = result["tot"]["value"]
    artHistorians[gender] = total
    totalNumList.append(int(total))
    print("There are " + total +" "+ gender + " art historians.")
    
totalNum = sum(totalNumList)
artHistorians['tot'] = str(totalNum)
print("There is a total of " + str(totalNum) +" art historians.")

final_json = dict()
final_json["historians"] = artHistorians
pp.pprint(artHistorians)

There are 10538 male art historians.
There are 4818 female art historians.
There are 2 non-binary art historians.
There is a total of 15358 art historians.
{'female': '4818', 'male': '10538', 'non-binary': '2', 'tot': '15358'}


We also need the results as percentages:

In [3]:
percList = {'female': [], 'male': [], 'non-binary': []}

for key in artHistorians.keys():
    if key != 'tot':
        num = int(artHistorians.get(key))
        quot = num*100
        perc = quot / totalNum
        percList[key] = round(perc)

final_json["historians_prop"] = percList
print(percList)


{'female': 31, 'male': 69, 'non-binary': 0}
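
<p>Rounding to whole numbers makes the two non-binary art historians disappear (0%). As a hedged alternative, the sketch below keeps two decimal places; <i>gender_percentages</i> is a hypothetical helper, and the counts are the ones printed above.</p>

```python
def gender_percentages(counts, ndigits=2):
    """Return each count as a percentage of the total, rounded to ndigits."""
    total = sum(counts.values())
    return {gender: round(n * 100 / total, ndigits) for gender, n in counts.items()}


# Counts taken from the output of the previous cells.
counts = {"male": 10538, "female": 4818, "non-binary": 2}
shares = gender_percentages(counts)
print(shares)
```

<p>With two decimals the non-binary share becomes 0.01 instead of 0, so the category stays visible in the data even if it remains too thin to see in a chart.</p>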


<h3><u>Visualizing the results (percentage)</u></h3>

To visualize the results we use the <b>bokeh library</b>, and choose a <b>pie chart</b> including the percentage of men, women and (not to forget!) non-binary art historians:

In [4]:
from math import pi

import pandas as pd

from bokeh.io import output_notebook, show
from bokeh.palettes import Category20c
from bokeh.plotting import figure
from bokeh.transform import cumsum

chart_colors = ['#e8cdda', '#c4ddda', '#989898']

output_notebook()
x = {'female': percList["female"], 'male': percList["male"], 'non-binary': percList["non-binary"]}

data = pd.Series(x).reset_index(name='value').rename(columns={'index':'genders'})
data['angle'] = data['value']/data['value'].sum() * 2*pi
data['color'] = chart_colors[:len(x)]

p = figure(plot_height=350, title="Gender overview in percentages", toolbar_location=None,
           tools="hover", tooltips="@value %", x_range=(-0.5, 1.0))

p.wedge(x=0, y=1, radius=0.4,
        start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'),
        line_color="white", color = "color", legend_field='genders', source=data)

p.axis.axis_label= None
p.axis.visible= False
p.grid.grid_line_color = None

show(p)
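
<p>The wedge angles in the cell above come from a simple proportion: each value is turned into a fraction of the full circle (2π), and the running sum of those angles gives where each wedge starts and ends. The sketch below reproduces that arithmetic in plain Python, without Bokeh, as an illustration of what <i>cumsum</i> is doing for us; the variable names are our own.</p>

```python
from itertools import accumulate
from math import pi

# Rounded percentages from the previous section.
values = {"female": 31, "male": 69, "non-binary": 0}

total = sum(values.values())
# Angle of each wedge: its share of the full circle.
angles = [v / total * 2 * pi for v in values.values()]

# Running sum of the angles gives the end of each wedge;
# shifting it by one position (starting at 0) gives the start.
end_angles = list(accumulate(angles))
start_angles = [0.0] + end_angles[:-1]

wedges = dict(zip(values, zip(start_angles, end_angles)))
for gender, (start, end) in wedges.items():
    print(f"{gender}: {start:.3f} -> {end:.3f} rad")
```

<p>The last wedge ends at 2π, and the non-binary wedge has zero width because its rounded percentage is 0.</p>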

<h3><u>Visualizing the results (real numbers)</u></h3>

<p> We set a simple but effective <b>bar chart</b> for visualizing the results in real numbers:</p>

In [5]:
genders = ['male','female','non-binary']

totalm = int(artHistorians.get('male'))
totalf = int(artHistorians.get('female'))
totalx = int(artHistorians.get('non-binary'))

color_list = ['#c4ddda','#e8cdda', '#989898']

p = figure(x_range=genders, plot_height=500, title="Gender overview")
p.vbar(x=genders, top=[totalm, totalf, totalx], color=color_list, width=0.6)

p.xgrid.grid_line_color = None
p.y_range.start = 0

show(p)

<h3><u>First section conclusions</u></h3>

<p>Through this first query, we understand that the number of <b>male art historians</b> in the world (or rather, among those recorded in Wikidata) <b>is greater</b> than that of women. Let’s continue our investigation and go a little deeper.</p>

<h2><u>02. Geographical distribution</u></h2>
<h2><span style="color:#c4ddda">(Marco)</span></h2>


<p>After looking at the world situation, let us look more closely and focus <b>on the gender distribution in Europe</b>, by country (the queries below select countries through their European Union membership statement on Wikidata). We want to see whether the worldwide trend also holds in this narrower context, or whether the balance between women and men is greater.</p>

<h3><u>Querying the remote Wikidata SPARQL endpoint</u></h3>

<p>Again, we will use SPARQLWrapper to query the <b>Wikidata endpoint</b> and get the result data in <b>JSON format</b>.</p>
<p>We take the URL of the API of the SPARQL endpoint, set the query for the number of male and female art historians in Europe per country, create the wrapper, send the query and get the JSON results.</p>

<h4><u>The female query</u></h4>

For greater clarity and to avoid weighing down the query, we will divide our research into two parts, starting with the number of <b>female art historians in Europe</b>.

In [6]:
female_query = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
SELECT DISTINCT ?countryCode ?countryLabel (COUNT(?person) AS ?totwomen)
WHERE
{
       ?person wdt:P31 wd:Q5 ;
            wdt:P21 wd:Q6581072 ;
            wdt:P106/wdt:P279* wd:Q1792450;
            wdt:P27 ?country.
       ?country wdt:P463 wd:Q458 .
       ?country wdt:P297 ?countryCode.
       SERVICE wikibase:label {bd:serviceParam wikibase:language "en" }
}
GROUP BY ?countryCode ?countryLabel
ORDER BY DESC(?totwomen)
"""

sparql_wd = SPARQLWrapper(wikidata_endpoint, agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36')
sparql_wd.setQuery(female_query)
sparql_wd.setReturnFormat(JSON)
femaleResults = sparql_wd.query().convert()

for result in femaleResults["results"]["bindings"]:
    country = result["countryCode"]["value"]
    country_label = result["countryLabel"]["value"]
    totaln = result["totwomen"]["value"]
    print("Female art historians in " + country_label + " (" + country + ")" ": " +totaln)


Female art historians in Germany (DE): 465
Female art historians in France (FR): 199
Female art historians in Spain (ES): 147
Female art historians in Italy (IT): 140
Female art historians in Slovenia (SI): 136
Female art historians in Poland (PL): 121
Female art historians in United Kingdom (GB): 108
Female art historians in Austria (AT): 100
Female art historians in Kingdom of the Netherlands (NL): 73
Female art historians in Czech Republic (CZ): 71
Female art historians in Hungary (HU): 56
Female art historians in Sweden (SE): 54
Female art historians in Denmark (DK): 52
Female art historians in Estonia (EE): 41
Female art historians in Finland (FI): 28
Female art historians in Belgium (BE): 23
Female art historians in Lithuania (LT): 13
Female art historians in Greece (GR): 13
Female art historians in Bulgaria (BG): 12
Female art historians in Romania (RO): 10
Female art historians in Slovakia (SK): 8
Female art historians in Portugal (PT): 7
Female art historians in Croatia (HR): 

<h4><u>The male query</u></h4>

Now let's reproduce the same query for <b>male art historians in European countries</b>.

In [7]:
male_query = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
SELECT DISTINCT ?countryCode ?countryLabel (COUNT(?person) AS ?totmen)
WHERE
{
       ?person wdt:P31 wd:Q5 ;
            wdt:P21 wd:Q6581097 ;
            wdt:P106/wdt:P279* wd:Q1792450;
            wdt:P27 ?country.
       ?country wdt:P463 wd:Q458 .
       ?country wdt:P297 ?countryCode.
       SERVICE wikibase:label {bd:serviceParam wikibase:language "en" }
}
GROUP BY ?countryCode ?countryLabel
ORDER BY DESC(?totmen)
"""


sparql_wd = SPARQLWrapper(wikidata_endpoint, agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36')
sparql_wd.setQuery(male_query)
sparql_wd.setReturnFormat(JSON)
maleResults = sparql_wd.query().convert()

for result in maleResults["results"]["bindings"]:
    country = result["countryCode"]["value"]
    country_label = result["countryLabel"]["value"]
    totaln = result["totmen"]["value"]
    print("Male art historians in " + country_label + " (" + country + ")" ": " +totaln)


Male art historians in Germany (DE): 1879
Male art historians in France (FR): 768
Male art historians in United Kingdom (GB): 384
Male art historians in Italy (IT): 340
Male art historians in Austria (AT): 339
Male art historians in Poland (PL): 257
Male art historians in Spain (ES): 239
Male art historians in Kingdom of the Netherlands (NL): 192
Male art historians in Hungary (HU): 155
Male art historians in Sweden (SE): 146
Male art historians in Czech Republic (CZ): 126
Male art historians in Belgium (BE): 121
Male art historians in Denmark (DK): 103
Male art historians in Slovenia (SI): 102
Male art historians in Greece (GR): 45
Male art historians in Romania (RO): 44
Male art historians in Finland (FI): 43
Male art historians in Estonia (EE): 37
Male art historians in Bulgaria (BG): 25
Male art historians in Croatia (HR): 19
Male art historians in Lithuania (LT): 17
Male art historians in Latvia (LV): 15
Male art historians in Portugal (PT): 13
Male art historians in Slovakia (SK)

<h3><u>Manipulation of the results</u></h3>

Now we manipulate our results by <b>joining</b> them in a single dictionary that will contain as <b>keys</b> the individual countries and as <b>values</b> the numbers obtained for men and women.

In [8]:
import pprint as pp
countries = dict()
for result in maleResults["results"]["bindings"]:
    country = result["countryLabel"]["value"]
    totaln = result["totmen"]["value"]
    countries[country] = [totaln] #male
for result in femaleResults["results"]["bindings"]:
    country = result["countryLabel"]["value"]
    totaln = result["totwomen"]["value"]
    countries[country].append(totaln) #female

pp.pprint(countries)

{'Austria': ['339', '100'],
 'Belgium': ['121', '23'],
 'Bulgaria': ['25', '12'],
 'Croatia': ['19', '7'],
 'Czech Republic': ['126', '71'],
 'Denmark': ['103', '52'],
 'Estonia': ['37', '41'],
 'Finland': ['43', '28'],
 'France': ['768', '199'],
 'Germany': ['1879', '465'],
 'Greece': ['45', '13'],
 'Hungary': ['155', '56'],
 'Ireland': ['7', '3'],
 'Italy': ['340', '140'],
 'Kingdom of the Netherlands': ['192', '73'],
 'Latvia': ['15', '5'],
 'Lithuania': ['17', '13'],
 'Luxembourg': ['5', '3'],
 'Poland': ['257', '121'],
 'Portugal': ['13', '7'],
 'Romania': ['44', '10'],
 'Slovakia': ['9', '8'],
 'Slovenia': ['102', '136'],
 'Spain': ['239', '147'],
 'Sweden': ['146', '54'],
 'United Kingdom': ['384', '108']}
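
<p>The join above assumes that every country in the female results also appears in the male results; otherwise the <i>append</i> would raise a <i>KeyError</i>. As a hedged sketch of a more defensive join, the snippet below uses <i>setdefault</i> with a zero default. The helper name <i>merge_counts</i> and the countries "Exampleland" and "Otherland" are made up for illustration; the Germany and Estonia counts come from the output above.</p>

```python
def merge_counts(male_rows, female_rows):
    """Join (country, count) pairs by country, defaulting missing counts to 0."""
    merged = {}
    for gender, rows in (("male", male_rows), ("female", female_rows)):
        for country, count in rows:
            entry = merged.setdefault(country, {"male": 0, "female": 0})
            entry[gender] = int(count)  # SPARQL counts arrive as strings
    return merged


# "Exampleland" and "Otherland" are hypothetical, added to show the
# one-sided cases; the other counts are from the notebook output.
male_rows = [("Germany", "1879"), ("Estonia", "37"), ("Exampleland", "4")]
female_rows = [("Germany", "465"), ("Estonia", "41"), ("Otherland", "2")]

merged = merge_counts(male_rows, female_rows)
for country, entry in merged.items():
    print(country, entry)
```

<p>Whether a missing count should be read as zero or as "unknown" is a design choice: zero keeps the later arithmetic simple, but it hides the difference between "no entries" and "no data".</p>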


We <b>enrich</b> our dictionary with information that we will need for visualization purposes. We add the <b>country code</b> and label each count explicitly with its gender, turning the original dictionary into a <b>dictionary of dictionaries</b>.

In [9]:
import json
countries = dict()
for result in maleResults["results"]["bindings"]:
    country = result["countryLabel"]["value"]
    country_code = result["countryCode"]["value"]
    totaln = result["totmen"]["value"]
    countries[country] = {"id": country_code}
    countries[country]["male"] = totaln #male
for result in femaleResults["results"]["bindings"]:
    country = result["countryLabel"]["value"]
    if country not in countries.keys():
        country_code = result["countryCode"]["value"]
        countries[country] = {"id": country_code}
    totaln = result["totwomen"]["value"]
    countries[country]["female"] = totaln #female

pp.pprint(countries)

{'Austria': {'female': '100', 'id': 'AT', 'male': '339'},
 'Belgium': {'female': '23', 'id': 'BE', 'male': '121'},
 'Bulgaria': {'female': '12', 'id': 'BG', 'male': '25'},
 'Croatia': {'female': '7', 'id': 'HR', 'male': '19'},
 'Czech Republic': {'female': '71', 'id': 'CZ', 'male': '126'},
 'Denmark': {'female': '52', 'id': 'DK', 'male': '103'},
 'Estonia': {'female': '41', 'id': 'EE', 'male': '37'},
 'Finland': {'female': '28', 'id': 'FI', 'male': '43'},
 'France': {'female': '199', 'id': 'FR', 'male': '768'},
 'Germany': {'female': '465', 'id': 'DE', 'male': '1879'},
 'Greece': {'female': '13', 'id': 'GR', 'male': '45'},
 'Hungary': {'female': '56', 'id': 'HU', 'male': '155'},
 'Ireland': {'female': '3', 'id': 'IE', 'male': '7'},
 'Italy': {'female': '140', 'id': 'IT', 'male': '340'},
 'Kingdom of the Netherlands': {'female': '73', 'id': 'NL', 'male': '192'},
 'Latvia': {'female': '5', 'id': 'LV', 'male': '15'},
 'Lithuania': {'female': '13', 'id': 'LT', 'male': '17'},
 'Luxembourg':

In [10]:
json_list = list()
for country, data in countries.items():
    d = dict()
    if "Netherlands" in country:
        d["name"] = "Netherlands"
    else:
        d["name"] = country
    d["id"] = data["id"]
    if "male" in data.keys():
        d["male"] = data["male"]
    if "female" in data.keys():
        d["female"] = data["female"]
    json_list.append(d)

final_json["geo_data"] = json_list
pp.pprint(json_list)

[{'female': '465', 'id': 'DE', 'male': '1879', 'name': 'Germany'},
 {'female': '199', 'id': 'FR', 'male': '768', 'name': 'France'},
 {'female': '108', 'id': 'GB', 'male': '384', 'name': 'United Kingdom'},
 {'female': '140', 'id': 'IT', 'male': '340', 'name': 'Italy'},
 {'female': '100', 'id': 'AT', 'male': '339', 'name': 'Austria'},
 {'female': '121', 'id': 'PL', 'male': '257', 'name': 'Poland'},
 {'female': '147', 'id': 'ES', 'male': '239', 'name': 'Spain'},
 {'female': '73', 'id': 'NL', 'male': '192', 'name': 'Netherlands'},
 {'female': '56', 'id': 'HU', 'male': '155', 'name': 'Hungary'},
 {'female': '54', 'id': 'SE', 'male': '146', 'name': 'Sweden'},
 {'female': '71', 'id': 'CZ', 'male': '126', 'name': 'Czech Republic'},
 {'female': '23', 'id': 'BE', 'male': '121', 'name': 'Belgium'},
 {'female': '52', 'id': 'DK', 'male': '103', 'name': 'Denmark'},
 {'female': '136', 'id': 'SI', 'male': '102', 'name': 'Slovenia'},
 {'female': '13', 'id': 'GR', 'male': '45', 'name': 'Greece'},
 {'fem

In [11]:
my_countries = list(countries.keys())
males = list()
females = list()
for element in list(countries.values()):
    males.append(element['male'])
    females.append(element['female'])
    
males = list(map(int, males)) 
females = list(map(int, females))

print(my_countries)
print(males)
print(females)

['Germany', 'France', 'United Kingdom', 'Italy', 'Austria', 'Poland', 'Spain', 'Kingdom of the Netherlands', 'Hungary', 'Sweden', 'Czech Republic', 'Belgium', 'Denmark', 'Slovenia', 'Greece', 'Romania', 'Finland', 'Estonia', 'Bulgaria', 'Croatia', 'Lithuania', 'Latvia', 'Portugal', 'Slovakia', 'Ireland', 'Luxembourg']
[1879, 768, 384, 340, 339, 257, 239, 192, 155, 146, 126, 121, 103, 102, 45, 44, 43, 37, 25, 19, 17, 15, 13, 9, 7, 5]
[465, 199, 108, 140, 100, 121, 147, 73, 56, 54, 71, 23, 52, 136, 13, 10, 28, 41, 12, 7, 13, 5, 7, 8, 3, 3]


<h3><u>Visualizing the results (real numbers)</u></h3>

<p> We set a <b>stacked bar chart</b> for visualizing our results:</p>

In [12]:
from math import pi

import pandas as pd

from bokeh.io import output_notebook, show
from bokeh.palettes import Category20c
from bokeh.plotting import figure
from bokeh.transform import cumsum

output_notebook()
gender = ["male", "female"]
colors = ["#c4ddda", "#e8cdda"]

data = {'countries' : my_countries,
        'male'   : males,
        'female' : females}

p = figure(x_range=my_countries,  plot_width=1000, plot_height=500, title="Gender divided art historians per country",
           toolbar_location=None, tools="hover")

p.vbar_stack(gender, x='countries', width=0.9, color=colors, source=data,
             legend_label=gender)

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.axis.minor_tick_line_color = None
p.outline_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"
p.xaxis.major_label_orientation = "vertical"

show(p)

In this case too, we want the results as <b>percentages</b>, which we <b>round</b> to the nearest integer.

In [13]:
tot = list()
for element in list(countries.values()):
    tot.append(int(element['female'])+int(element['male']))
print(tot)

percentages = dict()

for x in range(len(my_countries)):  # avoid hard-coding the number of countries
    quot1 = males[x] *100
    quot2 = females[x] *100
    total = tot[x]
    percentage1 = quot1/total
    percentage2 = quot2/total
    key = my_countries[x]
    percentages[key] =[round(percentage1), round(percentage2)]
    
pp.pprint(percentages)

male_perc = list()
female_perc = list()

for element in list(percentages.values()):
    male_perc.append(element[0])
    female_perc.append(element[1])
    
print(male_perc)
print(female_perc)

[2344, 967, 492, 480, 439, 378, 386, 265, 211, 200, 197, 144, 155, 238, 58, 54, 71, 78, 37, 26, 30, 20, 20, 17, 10, 8]
{'Austria': [77, 23],
 'Belgium': [84, 16],
 'Bulgaria': [68, 32],
 'Croatia': [73, 27],
 'Czech Republic': [64, 36],
 'Denmark': [66, 34],
 'Estonia': [47, 53],
 'Finland': [61, 39],
 'France': [79, 21],
 'Germany': [80, 20],
 'Greece': [78, 22],
 'Hungary': [73, 27],
 'Ireland': [70, 30],
 'Italy': [71, 29],
 'Kingdom of the Netherlands': [72, 28],
 'Latvia': [75, 25],
 'Lithuania': [57, 43],
 'Luxembourg': [62, 38],
 'Poland': [68, 32],
 'Portugal': [65, 35],
 'Romania': [81, 19],
 'Slovakia': [53, 47],
 'Slovenia': [43, 57],
 'Spain': [62, 38],
 'Sweden': [73, 27],
 'United Kingdom': [78, 22]}
[80, 79, 78, 71, 77, 68, 62, 72, 73, 73, 64, 84, 66, 43, 78, 81, 61, 47, 68, 73, 57, 75, 65, 53, 70, 62]
[20, 21, 22, 29, 23, 32, 38, 28, 27, 27, 36, 16, 34, 57, 22, 19, 39, 53, 32, 27, 43, 25, 35, 47, 30, 38]


<h3><u>Visualizing the results (percentage)</u></h3>

<p> We set another <b>stacked bar chart</b> for visualizing our results in percentage:</p>

In [14]:
gender = ["male", "female"]
colors = ["#c4ddda", "#e8cdda"]

data = {'countries' : my_countries,
        'male'   : male_perc,
        'female' : female_perc}

p = figure(x_range=my_countries,  plot_width=1000, plot_height=500, title="Percentages of women and men in art history per country",
           toolbar_location=None)

p.vbar_stack(gender, x='countries', width=0.7, color=colors, source=data,
             legend_label=gender)

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.axis.minor_tick_line_color = None
p.outline_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"
p.xaxis.major_label_orientation = "vertical"

show(p)

<h3><u>Second section conclusions</u></h3>

<p>Through this second query, we understand that the number of <b>male art historians</b> in almost all countries <b>is greater</b> than that of women. Let’s continue our investigation and shift the focus a bit.</p>
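
<p>The "almost all" can be made explicit by listing the countries where women outnumber men. The sketch below is a small check on the three lists printed earlier in this section (country names, male counts, female counts), copied here so that the snippet runs on its own; the name <i>female_majority</i> is ours.</p>

```python
# Lists copied from the output of the manipulation step in this section.
my_countries = ['Germany', 'France', 'United Kingdom', 'Italy', 'Austria',
                'Poland', 'Spain', 'Kingdom of the Netherlands', 'Hungary',
                'Sweden', 'Czech Republic', 'Belgium', 'Denmark', 'Slovenia',
                'Greece', 'Romania', 'Finland', 'Estonia', 'Bulgaria',
                'Croatia', 'Lithuania', 'Latvia', 'Portugal', 'Slovakia',
                'Ireland', 'Luxembourg']
males = [1879, 768, 384, 340, 339, 257, 239, 192, 155, 146, 126, 121, 103,
         102, 45, 44, 43, 37, 25, 19, 17, 15, 13, 9, 7, 5]
females = [465, 199, 108, 140, 100, 121, 147, 73, 56, 54, 71, 23, 52,
           136, 13, 10, 28, 41, 12, 7, 13, 5, 7, 8, 3, 3]

# Countries where the female count is strictly greater than the male count.
female_majority = [
    country
    for country, m, f in zip(my_countries, males, females)
    if f > m
]

print(female_majority)
print(len(my_countries) - len(female_majority), "of", len(my_countries),
      "countries list more men than women")
```

<p>On these numbers only Slovenia and Estonia list more women than men, which leaves 24 of the 26 countries with a male majority.</p>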

<h2><u>03. Occupation</u></h2>
<h2><span style="color:#e8cdda">(Nooshin, Giulia<span style="color:#c4ddda">, Marco</span>)</span></h2>

<p>We have hitherto thought geographically and observed the worldwide and European distribution of art historians. Now let us reflect on another important topic in which we often see a <b>gender inequality</b>, that of <b>occupations</b>.</p> 
<p>We realized by browsing Wikidata that art historians are often <b>multifaceted figures</b>, operating in an extremely <b>interdisciplinary field</b>. We will therefore try to see first which other occupations both females and males tend to have and whether they <b>match or differ</b>. Based on the results we get, we’ll proceed with the investigation.</p>

<h3><u>Querying the remote Wikidata SPARQL endpoint</u></h3>

<p>Again, we will use SPARQLWrapper to query the <b>Wikidata endpoint</b> and get the result data in <b>JSON format</b>.</p>
<p>We take the URL of the API of the SPARQL endpoint, set the query for the other occupations of male and female art historians, create the wrapper, send the query and get the JSON results.</p>

<h4><u>The male query</u></h4>

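For greater clarity and to avoid weighing down the query, we will divide our research into two parts, starting with the ten most common <b>occupations of male art historians</b>. The query limit is 11 because <i>art historian</i> itself appears among the results and is removed during the manipulation step.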

In [15]:
from collections import OrderedDict

male_occupation_query = """
SELECT DISTINCT ?occupationLabel (COUNT(?man) AS ?totman)
WHERE
{
       ?man wdt:P31 wd:Q5 ;
              wdt:P21 wd:Q6581097 ;
              wdt:P106/wdt:P279* wd:Q1792450;
              wdt:P106 ?occupation ;
              wdt:P27 ?country.
       ?country wdt:P463 wd:Q458 .
       SERVICE wikibase:label {bd:serviceParam wikibase:language "en" }
}
GROUP BY ?occupationLabel
ORDER BY DESC(?totman)
LIMIT 11
"""

sparql_wd.setQuery(male_occupation_query)
sparql_wd.setReturnFormat(JSON)
occupationResults = sparql_wd.query().convert()

artOccupations = {'male': {},'female': {}}

for result in occupationResults["results"]["bindings"]:
    occupation = result["occupationLabel"]["value"]
    total = result["totman"]["value"]
    artOccupations["male"][occupation] = int(total)


<h4><u>The female query</u></h4>

We do the same with the most common <b>occupations of female art historians</b>:

In [16]:
female_occupation_query = """
SELECT DISTINCT ?occupationLabel (COUNT(?woman) AS ?totwoman)
WHERE
{
       ?woman wdt:P31 wd:Q5 ;
              wdt:P21 wd:Q6581072 ;
              wdt:P106/wdt:P279* wd:Q1792450;
              wdt:P106 ?occupation ;
              wdt:P27 ?country.
       ?country wdt:P463 wd:Q458 .
       SERVICE wikibase:label {bd:serviceParam wikibase:language "en" }
}
GROUP BY ?occupationLabel
ORDER BY DESC(?totwoman)
LIMIT 11
"""

sparql_wd.setQuery(female_occupation_query)
sparql_wd.setReturnFormat(JSON)
occupationResults = sparql_wd.query().convert()


<h3><u>Manipulating the results</u></h3>

We manipulate the results and unify them into a single dictionary of dictionaries (<i>artOccupations</i>):

In [17]:
for result in occupationResults["results"]["bindings"]:
    occupation = result["occupationLabel"]["value"]
    total = result["totwoman"]["value"]
    artOccupations["female"][occupation] = int(total)


if 'art historian' in artOccupations["female"]:
    del artOccupations["female"]['art historian']
if 'art historian' in artOccupations["male"]:
    del artOccupations["male"]['art historian']

pp.pprint(artOccupations)

{'female': {'archaeologist': 100,
            'architectural historian': 54,
            'art critic': 75,
            'curator': 116,
            'exhibition curator': 127,
            'historian': 92,
            'journalist': 41,
            'translator': 44,
            'university teacher': 246,
            'writer': 188},
 'male': {'archaeologist': 757,
          'architect': 348,
          'architectural historian': 465,
          'art critic': 278,
          'curator': 256,
          'exhibition curator': 208,
          'historian': 399,
          'painter': 281,
          'university teacher': 1357,
          'writer': 652}}


<p>We sort the dictionary in descending order, so as to immediately view the most common occupations for both males and females:</p>

In [18]:
f_occupation_list = sorted(artOccupations["female"].items(), key=lambda x:x[1],reverse=True)
m_occupation_list = sorted(artOccupations["male"].items(), key=lambda x:x[1],reverse=True)

print("""
Most common occupations for female art historians:
""")
f_occ_list = list()
f_count_list = list()

for occValue in f_occupation_list:
    occ = occValue[0]
    value = str(occValue[1])
    f_occ_list.append(occ)
    f_count_list.append(value)
    print(occ+"("+value+")")
    
f_occ_list.reverse()
f_count_list.reverse()

print(f_occ_list)
print(f_count_list)


print("""
Most common occupations for male art historians:
""")
m_occ_list = list()
m_count_list = list()
for occValue in m_occupation_list:
    occ = occValue[0]
    value = str(occValue[1])
    m_occ_list.append(occ)
    m_count_list.append(value)
    print(occ+"("+value+")")
    
m_occ_list.reverse()
m_count_list.reverse() 

print(m_occ_list)
print(m_count_list)


Most common occupations for female art historians:

university teacher(246)
writer(188)
exhibition curator(127)
curator(116)
archaeologist(100)
historian(92)
art critic(75)
architectural historian(54)
translator(44)
journalist(41)
['journalist', 'translator', 'architectural historian', 'art critic', 'historian', 'archaeologist', 'curator', 'exhibition curator', 'writer', 'university teacher']
['41', '44', '54', '75', '92', '100', '116', '127', '188', '246']

Most common occupations for male art historians:

university teacher(1357)
archaeologist(757)
writer(652)
architectural historian(465)
historian(399)
architect(348)
painter(281)
art critic(278)
curator(256)
exhibition curator(208)
['exhibition curator', 'curator', 'art critic', 'painter', 'architect', 'historian', 'architectural historian', 'writer', 'archaeologist', 'university teacher']
['208', '256', '278', '281', '348', '399', '465', '652', '757', '1357']
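
<p>To see at a glance whether the two rankings <b>match or differ</b>, we can compare them as sets. This is a minimal sketch using the two top-ten dictionaries printed above (copied here so the snippet is self-contained); the variable names are our own.</p>

```python
# Top-ten occupations copied from the artOccupations output above.
female_top = {'archaeologist': 100, 'architectural historian': 54,
              'art critic': 75, 'curator': 116, 'exhibition curator': 127,
              'historian': 92, 'journalist': 41, 'translator': 44,
              'university teacher': 246, 'writer': 188}
male_top = {'archaeologist': 757, 'architect': 348,
            'architectural historian': 465, 'art critic': 278,
            'curator': 256, 'exhibition curator': 208, 'historian': 399,
            'painter': 281, 'university teacher': 1357, 'writer': 652}

# Set operations on the dictionary keys (the occupation labels).
shared = sorted(set(female_top) & set(male_top))
only_female = sorted(set(female_top) - set(male_top))
only_male = sorted(set(male_top) - set(female_top))

print("Shared:", shared)
print("Only in the female top ten:", only_female)
print("Only in the male top ten:", only_male)
```

<p>Eight of the ten occupations are shared. Journalist and translator appear only in the female ranking, while architect and painter appear only in the male one.</p>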


<h3><u>Visualizing the results (male)</u></h3>

<p> We set a <b>horizontal bar chart</b> for visualizing the male occupation results:</p>

In [19]:
from bokeh.embed import components
from bokeh.plotting import output_notebook
from bokeh.models import FactorRange

output_notebook()

occupations = m_occ_list
counts = list(map(int, m_count_list))  # the counts were stored as strings; the bar lengths must be numeric

p = figure(y_range=FactorRange(factors=occupations), plot_height=300, title="Most common occupations for men",
            toolbar_location=None, tools="")

p.hbar(y=occupations, right=counts, height=0.7, fill_color="#c4ddda", line_color="#c4ddda")

show(p)

<h3><u>Visualizing the results (female)</u></h3>

<p> And another <b>horizontal bar chart</b> for visualizing the female occupation results:</p>

In [20]:
from bokeh.embed import components
from bokeh.plotting import output_notebook
from bokeh.models import FactorRange

output_notebook()

occupations = f_occ_list
counts = list(map(int, f_count_list))  # the counts were stored as strings; the bar lengths must be numeric

p = figure(y_range=FactorRange(factors=occupations), plot_height=300, title="Most common occupations for women",
            toolbar_location=None, tools="")

p.hbar(y=occupations, right=counts, height=0.7, fill_color="#e8cdda", line_color="#e8cdda")

show(p)

In [21]:
import json

json_out_female = list()
for occ in f_occupation_list:
    json_out_female.append({
        'gender': 'female',
        'occupation': occ[0],
        'number': occ[1],
    })
json_out_male = list()
for occ in m_occupation_list:
    json_out_male.append({
        'gender': 'male',
        'occupation': occ[0],
        'number': occ[1],
    })
    
final_json['occ_data_female'] = json_out_female
final_json['occ_data_male'] = json_out_male

    

<h3><u>First observations</u></h3>

From this first analysis we observe that, for both genders, <b>university teaching</b>, and therefore belonging to an academic-institutional context, prevails. On the basis of this result, we want to examine the visibility of men and women in some <b>representative institutions</b> in European countries. We will then combine these results with those previously obtained for the same countries in the <u>02. Geographical distribution</u> section.
<p>Our goal is to understand whether the proportion between men and women found at country level is maintained within organizations, or whether the gap widens or narrows.</p>

<h3><u>Querying the remote Wikidata SPARQL endpoint</u></h3>

<p>We will use SPARQLWrapper to query the <b>Wikidata endpoint</b> and get the result data in <b>JSON format</b>.</p>
<p>We take the URL of the API of the SPARQL endpoint and set the query for male and female art historians who are members of academies, institutions or organizations. We create the wrapper, send the query and get the JSON results.</p>

In [22]:
membership_query = """
SELECT DISTINCT ?institutionLabel ?countryLabel ?genderLabel (count(?historian) as ?count) WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  ?historian wdt:P31 wd:Q5;
             wdt:P21 ?gender;
             wdt:P106/wdt:P279* wd:Q1792450;
             wdt:P463 ?institution.
  ?institution wdt:P17 ?country
  
}

group by ?institutionLabel ?countryLabel ?genderLabel order by DESC(?count)
"""


sparql_wd.setQuery(membership_query)
sparql_wd.setReturnFormat(JSON)
membership_query_results = sparql_wd.query().convert()

orgs = dict()

for result in membership_query_results["results"]["bindings"]:
    if result["institutionLabel"]["value"] not in orgs.keys():
        orgs[result["institutionLabel"]["value"]] = dict()
        orgs[result["institutionLabel"]["value"]]["country"] = result["countryLabel"]["value"]
    orgs[result["institutionLabel"]["value"]][result["genderLabel"]["value"]] = result["count"]["value"]
    
print(orgs)

{'German Archaeological Institute': {'country': 'Germany', 'male': '427', 'female': '48'}, 'American Academy of Arts and Sciences': {'country': 'United States of America', 'male': '135', 'female': '18'}, 'Académie des Inscriptions et Belles-Lettres': {'country': 'France', 'male': '93', 'female': '4'}, 'Royal Swedish Academy of Letters, History and Antiquities': {'country': 'Sweden', 'male': '88', 'female': '12'}, 'Bavarian Academy of Sciences and Humanities': {'country': 'Germany', 'male': '84', 'female': '1'}, 'Austrian Archaeological Institute': {'country': 'Austria', 'male': '66', 'female': '4'}, 'Royal Prussian Academy of Sciences': {'country': 'Germany', 'male': '56'}, 'Lincean Academy': {'country': 'Italy', 'male': '54', 'female': '4'}, 'Real Academia de Bellas Artes de San Fernando': {'country': 'Spain', 'male': '53', 'female': '3'}, 'British Academy': {'country': 'United Kingdom', 'male': '51', 'female': '8'}, 'Royal Netherlands Academy of Arts and Sciences': {'country': 'Nethe

<p>Based on the results, we select a sample of the institutions that best fit our research:</p>

In [23]:
import json

lst = ['German Archaeological Institute', 'Académie des Inscriptions et Belles-Lettres',
       'Royal Swedish Academy of Letters, History and Antiquities','Austrian Archaeological Institute', 
       'Real Academia de Bellas Artes de San Fernando', 'British Academy',
       'Royal Netherlands Academy of Arts and Sciences', 'Lincean Academy', 
       'Hungarian Academy of Sciences']

    
json_out = list()
for org in lst:
    json_out.append({
        'institution': org,
        'male': orgs[org]['male'],
        'female': orgs[org]['female'],
        'country': orgs[org]['country']
    })
    
final_json['inst_data'] = json_out
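
Not every institution in `orgs` has a count for both genders: in the output above, the Royal Prussian Academy of Sciences has no 'female' entry, so a direct lookup such as `orgs[org]['female']` would raise a `KeyError` if it were added to the sample. Below is a minimal sketch of a more defensive version, assuming that a missing count should be read as zero. The helper name `build_sample` and the toy dictionary `orgs_example` are ours and serve only as illustration.

```python
def build_sample(orgs, names):
    """Return one record per institution, reading a missing gender count as 0."""
    records = []
    for name in names:
        entry = orgs[name]
        records.append({
            'institution': name,
            'male': int(entry.get('male', 0)),
            'female': int(entry.get('female', 0)),
            'country': entry['country'],
        })
    return records


# Toy data shaped like `orgs`; values copied from the output above
orgs_example = {
    'British Academy': {'country': 'United Kingdom', 'male': '51', 'female': '8'},
    'Royal Prussian Academy of Sciences': {'country': 'Germany', 'male': '56'},
}

sample = build_sample(orgs_example, list(orgs_example))
print(sample)
```

Unlike the cell above, this sketch also converts the counts to integers, which is a design choice rather than a requirement of the JSON export.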


<h3><u>Visualizing the results</u></h3>

For clarity, we visualize here the membership numbers of the selected institutions, to be read alongside the earlier chart of per-country data (02. Geographical distribution).

In [24]:
males_in_org = list()
females_in_org = list()

for org in lst:
    males_in_org.append(int(orgs[org]['male']))
    females_in_org.append(int(orgs[org]['female']))
    
pp.pprint(males_in_org)   
pp.pprint(females_in_org)    

[427, 93, 88, 66, 53, 51, 48, 54, 31]
[48, 4, 12, 4, 3, 8, 4, 4, 1]
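
Raw counts are hard to compare across institutions of very different size. As a hedged sketch, the share of women in each institution can be derived from the two lists printed above; the function name `female_share` and the `_example` variables are hypothetical and only mirror the printed values.

```python
# Counts copied from the output above, in the same order as `lst`
males_example = [427, 93, 88, 66, 53, 51, 48, 54, 31]
females_example = [48, 4, 12, 4, 3, 8, 4, 4, 1]


def female_share(males, females):
    """Percentage of women among the recorded members of each institution."""
    return [100 * f / (m + f) for m, f in zip(males, females)]


shares = female_share(males_example, females_example)
for share in shares:
    print(round(share, 1))
```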


In [25]:
from math import pi

import pandas as pd

from bokeh.io import output_notebook, show
from bokeh.palettes import Category20c
from bokeh.plotting import figure
from bokeh.transform import cumsum

output_notebook()
gender = ["male", "female"]
colors = ["#c4ddda", "#e8cdda"]

data = {'institutions' : lst,
        'male'   : males_in_org,
        'female' : females_in_org}

p = figure(x_range=lst,  plot_width=1000, plot_height=600, title="Art historians per institution, by gender",
           toolbar_location=None, tools="hover")

p.vbar_stack(gender, x='institutions', width=0.5, color=colors, source=data,
             legend_label=gender)

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.axis.minor_tick_line_color = None
p.outline_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"
p.xaxis.major_label_orientation = "vertical"

show(p)

<h2><u>04. Scholarly Works</u></h2>
<h2><span style="color:#e8cdda">(Nooshin)</span></h2>


<p>Research within the institutions/academies has shown that the visibility of women in this context is <b>further reduced</b> compared to that of the host country of the organisation. We now change the frame of reference, replacing the geographical component with <b>temporal coordinates</b> and measuring visibility concretely through the academic articles produced by each gender.</p>

<h3><u>Querying the remote Wikidata SPARQL endpoint</u></h3>

<p>Again, we will use SPARQLWrapper to query the <b>Wikidata endpoint</b> and get the result data in <b>JSON format</b>.</p>
<p>We get the URL of the API of the SPARQL endpoint, we set the query for extracting all the scholarly works created by art historians, we create the wrapper, send the query and get the JSON results.</p>

In [26]:
import sys
print(sys.version)

from SPARQLWrapper import SPARQLWrapper, JSON
import ssl

ssl._create_default_https_context = ssl._create_unverified_context


wikidata_endpoint = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"


scholarly_works_query = """
SELECT distinct ?year ?genderLabel (count(?historian) as ?count) WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  ?historian wdt:P31 wd:Q5;
                    wdt:P21 ?gender;
                    wdt:P106/wdt:P279* wd:Q1792450.
                    
  ?authorial_work wdt:P50 ?historian.
  ?authorial_work wdt:P31/wdt:P279* wd:Q55915575.
  ?authorial_work wdt:P577 ?publication_date.
}

group by (year(xsd:dateTime(?publication_date)) as ?year) ?genderLabel order by (?year)
"""

sparql_wd = SPARQLWrapper(wikidata_endpoint, agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36')
sparql_wd.setQuery(scholarly_works_query)
sparql_wd.setReturnFormat(JSON)
scholarly_works_results = sparql_wd.query().convert()

for line in scholarly_works_results["results"]["bindings"]:
    print('Number of scholarly works created by', line['genderLabel']['value'],\
          'art historians in year ', line['year']['value'], ':', line['count']['value'])


3.8.1 (v3.8.1:1b293b6006, Dec 18 2019, 14:08:53) 
[Clang 6.0 (clang-600.0.57)]
Number of scholarly works created by male art historians in year  196 : 2
Number of scholarly works created by male art historians in year  198 : 2
Number of scholarly works created by female art historians in year  198 : 2
Number of scholarly works created by male art historians in year  1829 : 2
Number of scholarly works created by male art historians in year  1855 : 2
Number of scholarly works created by male art historians in year  1856 : 2
Number of scholarly works created by male art historians in year  1857 : 2
Number of scholarly works created by male art historians in year  1859 : 2
Number of scholarly works created by male art historians in year  1860 : 4
Number of scholarly works created by male art historians in year  1861 : 2
Number of scholarly works created by male art historians in year  1864 : 6
Number of scholarly works created by male art historians in year  1865 : 4
Number of scholarly wo
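
Each entry of `["results"]["bindings"]` maps a variable name to a small dictionary whose `value` field is a string, which is why the counts need `int()` before any arithmetic. Below is a minimal sketch of flattening such rows into typed tuples. `flatten_rows` is a hypothetical helper, and the two sample bindings are reduced to the `value` field (real bindings carry additional keys such as `type`), with values copied from the output above.

```python
def flatten_rows(bindings):
    """Turn SPARQL JSON bindings into (year, gender, count) tuples with integer fields."""
    return [
        (int(b['year']['value']), b['genderLabel']['value'], int(b['count']['value']))
        for b in bindings
    ]


example_bindings = [
    {'year': {'value': '1829'}, 'genderLabel': {'value': 'male'}, 'count': {'value': '2'}},
    {'year': {'value': '1860'}, 'genderLabel': {'value': 'male'}, 'count': {'value': '4'}},
]

rows = flatten_rows(example_bindings)
print(rows)
```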

<h3><u>Cleaning the data</u></h3>

As we can see, the data contains two rows with the presumably incorrect publication year 198 (the row for year 196 looks similarly suspect, but the filter below leaves it in place). We first remove the 198 rows in order to have a cleaner set.

In [27]:
scholarly_works_results = [i for i in scholarly_works_results["results"]["bindings"] if i["year"]["value"] != "198"]
for line in scholarly_works_results:
    print('Number of scholarly works created by', line['genderLabel']['value'],\
          'art historians in year ', line['year']['value'], ':', line['count']['value'])


Number of scholarly works created by male art historians in year  196 : 2
Number of scholarly works created by male art historians in year  1829 : 2
Number of scholarly works created by male art historians in year  1855 : 2
Number of scholarly works created by male art historians in year  1856 : 2
Number of scholarly works created by male art historians in year  1857 : 2
Number of scholarly works created by male art historians in year  1859 : 2
Number of scholarly works created by male art historians in year  1860 : 4
Number of scholarly works created by male art historians in year  1861 : 2
Number of scholarly works created by male art historians in year  1864 : 6
Number of scholarly works created by male art historians in year  1865 : 4
Number of scholarly works created by male art historians in year  1866 : 4
Number of scholarly works created by male art historians in year  1867 : 8
Number of scholarly works created by male art historians in year  1868 : 2
Number of scholarly works 

Number of scholarly works created by male art historians in year  2006 : 102
Number of scholarly works created by female art historians in year  2006 : 24
Number of scholarly works created by male art historians in year  2007 : 92
Number of scholarly works created by female art historians in year  2007 : 27
Number of scholarly works created by female art historians in year  2008 : 26
Number of scholarly works created by male art historians in year  2008 : 87
Number of scholarly works created by female art historians in year  2009 : 19
Number of scholarly works created by male art historians in year  2009 : 56
Number of scholarly works created by male art historians in year  2010 : 72
Number of scholarly works created by female art historians in year  2010 : 30
Number of scholarly works created by female art historians in year  2011 : 22
Number of scholarly works created by male art historians in year  2011 : 90
Number of scholarly works created by female art historians in year  2012 : 
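
The filter above matches the single value "198". A more general alternative keeps only the rows whose year falls inside a plausible range. This is sketched here under the assumption that any year outside the range covered by the later charts (1820 to 2039) is a data error. The function name and the bounds are illustrative choices, not part of the original notebook.

```python
def drop_implausible_years(rows, min_year=1820, max_year=2039):
    """Keep only result rows whose publication year lies in [min_year, max_year]."""
    return [r for r in rows if min_year <= int(r['year']['value']) <= max_year]


# Sample rows shaped like the query bindings, with values taken from the output above
example_rows = [
    {'year': {'value': '196'}, 'genderLabel': {'value': 'male'}, 'count': {'value': '2'}},
    {'year': {'value': '198'}, 'genderLabel': {'value': 'female'}, 'count': {'value': '2'}},
    {'year': {'value': '1829'}, 'genderLabel': {'value': 'male'}, 'count': {'value': '2'}},
]

cleaned = drop_implausible_years(example_rows)
print([r['year']['value'] for r in cleaned])
```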

<h3><u>Visualizing the results (real numbers)</u></h3>

We show these results in a stacked bar chart, divided into <b>20-year periods</b>.

In [28]:
gender = ["male", "female"]
colors = ["#c4ddda", "#e8cdda"]
    
scores = [(i, i+19) for i in range(1820, 2040, 20)]

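current_range_index = 0
score_data = [{'range': i, 'male': 0, 'female': 0} for i in scores]
# The results are ordered by year, so a single running index is enough to walk the periods
for line in scholarly_works_results:
    # Skip every period that ends before this year (also covers gaps longer than 20 years)
    while int(line['year']['value']) > scores[current_range_index][1]:
        current_range_index += 1
    score_data[current_range_index][line['genderLabel']['value']] += int(line['count']['value'])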


data = {'20-year-periods' : [str(i) for i in scores],
        'male'   : [i['male'] for i in score_data],
        'female' : [i['female'] for i in score_data]}

p = figure(x_range=data['20-year-periods'],  plot_width=1000, plot_height=500, title="Scholarly work per 20-year period",
           toolbar_location=None, tools="hover")

p.vbar_stack(gender, x='20-year-periods', width=0.9, color=colors, source=data,
             legend_label=gender)

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.axis.minor_tick_line_color = None
p.outline_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"
p.xaxis.major_label_orientation = "vertical"

show(p)
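
The period of a given year can also be computed directly, instead of walking the sorted results with a running index. Below is a minimal sketch, assuming the same 20-year periods starting in 1820. `period_index` and `bin_counts` are hypothetical names, and skipping rows outside the covered range is our choice rather than the behaviour of the cell above.

```python
START, STEP, END = 1820, 20, 2040
periods = [(y, y + STEP - 1) for y in range(START, END, STEP)]


def period_index(year):
    """Index of the 20-year period containing `year`, or None if it is out of range."""
    if START <= year < END:
        return (year - START) // STEP
    return None


def bin_counts(rows):
    """Sum (year, gender, count) tuples into per-period totals."""
    totals = [{'range': p, 'male': 0, 'female': 0} for p in periods]
    for year, gender, count in rows:
        idx = period_index(year)
        if idx is not None and gender in ('male', 'female'):
            totals[idx][gender] += count
    return totals


# Sample rows with values copied from the printed query results
binned = bin_counts([(196, 'male', 2), (1829, 'male', 2),
                     (2006, 'male', 102), (2006, 'female', 24)])
print(binned[0])
print(binned[9])
```

Because each row is placed independently, this version does not rely on the results being sorted by year.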

In [29]:
json_out = list()
for score in score_data:
    json_out.append({
        'range': str(score['range'][0]) + ' - ' + str(score['range'][1]),
        'male': score['male'],
        'female': score['female']
    })
    
final_json['scholarly_normal'] = json_out

<h3><u>Visualizing the results (percentage)</u></h3>

To make the change in the proportion of male to female authors easier to read, we also create the same plot with percentage values.

In [30]:
import numpy as np 

true_f = [i['female'] for i in score_data]
true_m = [i['male'] for i in score_data]
true_len = [f+m for (f, m) in zip(true_f, true_m)]

female_ratio = [np.true_divide(i['female'], l)*100 for (i, l) in zip(score_data, true_len)]
male_ratio = [np.true_divide(i['male'], l)*100 for (i, l) in zip(score_data, true_len)]

data = {'20-year-periods' : [str(i) for i in scores],
        'male'   : male_ratio,
        'female' : female_ratio}

p = figure(x_range=data['20-year-periods'],  plot_width=1000, plot_height=500, title="Scholarly work per 20-year period",
           toolbar_location=None, tools="hover")

p.vbar_stack(gender, x='20-year-periods', width=0.9, color=colors, source=data,
             legend_label=gender)

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.axis.minor_tick_line_color = None
p.outline_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"
p.xaxis.major_label_orientation = "vertical"

show(p)

In [31]:
json_out = list()
for score, male, female in zip(scores, male_ratio, female_ratio):
    json_out.append({
        'range': str(score[0]) + ' - ' + str(score[1]),
        'male': str(round(male, 2)),
        'female': str(round(female, 2))
    })
final_json['scholarly_prop'] = json_out
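
One caveat with the percentage computation is that a period with no works at all gives a zero denominator, in which case `np.true_divide` yields `nan` (with a runtime warning) rather than a number. Below is a minimal sketch of a guard, assuming an empty period should simply be reported as 0% for both genders. `to_percentages` is a hypothetical helper.

```python
def to_percentages(male, female):
    """Return (male %, female %) for one period, or (0.0, 0.0) when the period is empty."""
    total = male + female
    if total == 0:
        return 0.0, 0.0
    return 100 * male / total, 100 * female / total


# Counts for 2006 taken from the printed results, followed by an empty period
print(to_percentages(102, 24))
print(to_percentages(0, 0))
```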

Now we would like to compare this data with the art historians' birth years (as a reasonable proxy for the art historians active in each period). We first <b>extract the birth years of historians</b>. Since our previous dataset only includes data from the 1820s onward, we filter out art historians born before 1790.

In [32]:
ssl._create_default_https_context = ssl._create_unverified_context

wikidata_endpoint = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"

birthdate_query = """
SELECT distinct ?year ?genderLabel (count(?historian) as ?count) WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  ?historian wdt:P31 wd:Q5;
                    wdt:P21 ?gender;
                    wdt:P106/wdt:P279* wd:Q1792450;
             wdt:P569 ?date. FILTER (?date > "1790-01-01"^^xsd:dateTime)
}

group by (year(xsd:dateTime(?date)) as ?year) ?genderLabel order by (?year)
"""

sparql_wd = SPARQLWrapper(wikidata_endpoint,agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36')
sparql_wd.setQuery(birthdate_query)
sparql_wd.setReturnFormat(JSON)
birthdate_results = sparql_wd.query().convert()

for line in birthdate_results["results"]["bindings"]:
    print('Number of', line['genderLabel']['value'], 'art historians born in',\
          line['year']['value'], ':', line['count']['value'])

Number of male art historians born in 1790 : 2
Number of male art historians born in 1791 : 4
Number of male art historians born in 1792 : 4
Number of male art historians born in 1793 : 5
Number of female art historians born in 1794 : 3
Number of male art historians born in 1794 : 6
Number of male art historians born in 1795 : 5
Number of male art historians born in 1796 : 6
Number of female art historians born in 1797 : 1
Number of male art historians born in 1797 : 9
Number of male art historians born in 1798 : 7
Number of male art historians born in 1799 : 4
Number of male art historians born in 1800 : 9
Number of male art historians born in 1801 : 9
Number of female art historians born in 1801 : 1
Number of male art historians born in 1802 : 8
Number of male art historians born in 1803 : 6
Number of male art historians born in 1804 : 13
Number of male art historians born in 1805 : 14
Number of female art historians born in 1805 : 1
Number of male art historians born in 1806 : 14
Nu

Number of female art historians born in 1961 : 82
Number of male art historians born in 1962 : 99
Number of female art historians born in 1962 : 63
Number of male art historians born in 1963 : 82
Number of female art historians born in 1963 : 79
Number of male art historians born in 1964 : 99
Number of female art historians born in 1964 : 60
Number of male art historians born in 1965 : 106
Number of female art historians born in 1965 : 62
Number of male art historians born in 1966 : 88
Number of female art historians born in 1966 : 50
Number of male art historians born in 1967 : 74
Number of female art historians born in 1967 : 72
Number of male art historians born in 1968 : 58
Number of female art historians born in 1968 : 59
Number of male art historians born in 1969 : 47
Number of female art historians born in 1969 : 64
Number of male art historians born in 1970 : 64
Number of female art historians born in 1970 : 60
Number of female art historians born in 1971 : 61
Number of male ar

<h3><u>Visualizing the results</u></h3>

We will draw a <b>line chart</b> of this data, shifted by 30 years in order to show the approximate date of each art historian's entry into the scholarly sphere.

In [33]:
graph = figure(title = "Art Historians by Date", y_range=[0, 140], plot_width=800, plot_height=500, toolbar_location=None, tools="hover")  
     
# Birth years are shifted by 30; counts are converted to int so that they fit the numeric y_range
female_births = [int(line['year']['value'])+30 for line in birthdate_results["results"]["bindings"] if line['genderLabel']['value'] == 'female']
female_nums = [int(line['count']['value']) for line in birthdate_results["results"]["bindings"] if line['genderLabel']['value'] == 'female']
male_births = [int(line['year']['value'])+30 for line in birthdate_results["results"]["bindings"] if line['genderLabel']['value'] == 'male']
male_nums = [int(line['count']['value']) for line in birthdate_results["results"]["bindings"] if line['genderLabel']['value'] == 'male']

graph.line(female_births, female_nums, color="#e8cdda", legend_label='female')  
graph.line(male_births, male_nums, color="#c4ddda", legend_label='male')  
graph.legend.location = "top_left"
graph.legend.orientation = "horizontal"
graph.xaxis.major_label_orientation = "vertical"
     
show(graph) 

<h3><u>Querying the remote Wikidata SPARQL endpoint</u></h3>

Although this information is useful, we want <b>a plot</b> that can be compared more directly with our previous data. We therefore run another query that takes into account the <b>entire lifespan</b> of the art historians.

In [34]:
import ssl
from SPARQLWrapper import SPARQLWrapper, JSON
ssl._create_default_https_context = ssl._create_unverified_context

wikidata_endpoint = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"

lifespan_query = """
SELECT DISTINCT * WHERE {

{ SELECT distinct (year(xsd:dateTime(?dateOfBirth)) as ?yearOfBirth) (year(xsd:dateTime(?dateOfDeath)) as ?yearOfDeath) ?genderLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  ?historian wdt:P31 wd:Q5;
             wdt:P21 ?gender;
             wdt:P106/wdt:P279* wd:Q1792450;
             wdt:P569 ?dateOfBirth;
             wdt:P570 ?dateOfDeath. FILTER (?dateOfDeath > "1825-01-01"^^xsd:dateTime)
} } UNION 

{ SELECT distinct (year(xsd:dateTime(?dateOfBirth)) as ?yearOfBirth) ?genderLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  ?historian wdt:P31 wd:Q5;
                    wdt:P21 ?gender;
                    wdt:P106/wdt:P279* wd:Q1792450;
             wdt:P569 ?dateOfBirth. FILTER (?dateOfBirth > "1930-01-01"^^xsd:dateTime)
} }
}
"""

sparql_wd = SPARQLWrapper(wikidata_endpoint, agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36')
sparql_wd.setQuery(lifespan_query)
sparql_wd.setReturnFormat(JSON)
lifespan_results = sparql_wd.query().convert()

scores = [(i, i+19) for i in range(1820, 2040, 20)]
score_data = [{'range': i, 'male': 0, 'female': 0, 'non-binary': 0} for i in scores]

for idx, span in enumerate(scores):
    for historian in lifespan_results["results"]["bindings"]:
        if int(historian['yearOfBirth']['value'])+30 <= span[1] and ('yearOfDeath' not in historian or int(historian['yearOfDeath']['value']) > span[0]):
            score_data[idx][historian['genderLabel']['value']] += 1
            
for line in score_data:
    print('Number of male art historians active in the timespan', line['range'], ':', line['male'])
    print('Number of female art historians active in the timespan', line['range'], ':', line['female'])


Number of male art historians active in the timespan (1820, 1839) : 212
Number of female art historians active in the timespan (1820, 1839) : 6
Number of male art historians active in the timespan (1840, 1859) : 353
Number of female art historians active in the timespan (1840, 1859) : 7
Number of male art historians active in the timespan (1860, 1879) : 560
Number of female art historians active in the timespan (1860, 1879) : 14
Number of male art historians active in the timespan (1880, 1899) : 882
Number of female art historians active in the timespan (1880, 1899) : 44
Number of male art historians active in the timespan (1900, 1919) : 1317
Number of female art historians active in the timespan (1900, 1919) : 125
Number of male art historians active in the timespan (1920, 1939) : 1706
Number of female art historians active in the timespan (1920, 1939) : 365
Number of male art historians active in the timespan (1940, 1959) : 1945
Number of female art historians active in the timespan 
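
The activity test used in the loop above can be read in isolation. A historian counts as active in a period if they turned 30 (the assumed age of entry into the scholarly sphere) no later than the period's last year, and either have no recorded death or died after the period's first year. Below is a minimal sketch of that condition. `is_active` is a hypothetical helper written for illustration, and the birth and death years are made-up sample values.

```python
def is_active(year_of_birth, year_of_death, span, entry_age=30):
    """True if a historian with the given birth/death years is active during `span`.

    `year_of_death` is None when no death date is recorded.
    """
    started = year_of_birth + entry_age <= span[1]
    still_alive = year_of_death is None or year_of_death > span[0]
    return started and still_alive


print(is_active(1800, 1870, (1820, 1839)))
print(is_active(1800, 1870, (1880, 1899)))
print(is_active(1950, None, (2000, 2019)))
```

Note that, as in the loop above, the comparison with the death year is strict, so someone who died in the first year of a period is not counted in it.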

In [35]:
json_out = list()
for score in score_data:
    json_out.append({
        'range': str(score['range'][0]) + ' - ' + str(score['range'][1]),
        'male': score['male'],
        'female': score['female']
    })
    
final_json['active_normal'] = json_out

<h3><u>Visualizing the results (real numbers)</u></h3>
<p>Now we draw the same plots as before:</p>

In [36]:
data = {'20-year-periods' : [str(i) for i in scores],
        'male'   : [i['male'] for i in score_data],
        'female' : [i['female'] for i in score_data]}

p = figure(x_range=data['20-year-periods'],  plot_width=800, plot_height=500, title="Active Span of Art Historians",
           toolbar_location=None, tools="hover")

p.vbar_stack(gender, x='20-year-periods', width=0.9, color=colors, source=data,
             legend_label=gender)

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.axis.minor_tick_line_color = None
p.outline_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"
p.xaxis.major_label_orientation = "vertical"

show(p)

<h3><u>Visualizing the results (percentages)</u></h3>

In [37]:
true_f = [i['female'] for i in score_data]
true_m = [i['male'] for i in score_data]
true_len = [f+m for (f, m) in zip(true_f, true_m)]

female_ratio = [np.true_divide(i['female'], l)*100 for (i, l) in zip(score_data, true_len)]
male_ratio = [np.true_divide(i['male'], l)*100 for (i, l) in zip(score_data, true_len)]

data = {'20-year-periods' : [str(i) for i in scores],
        'male'   : male_ratio,
        'female' : female_ratio}

p = figure(x_range=data['20-year-periods'],  plot_width=800, plot_height=500, title="Active Span of Art Historians",
           toolbar_location=None, tools="hover")

p.vbar_stack(gender, x='20-year-periods', width=0.9, color=colors, source=data,
             legend_label=gender)

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.axis.minor_tick_line_color = None
p.outline_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"
p.xaxis.major_label_orientation = "vertical"

show(p)

In [38]:
json_out = list()
for score, male, female in zip(scores, male_ratio, female_ratio):
    json_out.append({
        'range': str(score[0]) + ' - ' + str(score[1]),
        'male': str(round(male, 2)),
        'female': str(round(female, 2))
    })
final_json['active_prop'] = json_out

<h2><u>05. ARTchives integration</u></h2>
<h2><span style="color:#e8cdda">(Giulia,<span style="color:#c4ddda"> Marco</span>)</span></h2>

<h3><u>Querying the remote ARTchives SPARQL endpoint</u></h3>

<p>We will use SPARQLWrapper to get <b>up-to-date</b> result data in <b>JSON format</b>.</p>

<p>We get the <b>URL of the API</b> of the SPARQL endpoint, we prepare the <b>SPARQL query</b> regarding the classes used and number of their elements inside ARTchives. We then <b>create the wrapper</b> around the SPARQL API via SPARQLWrapper library, <b>send the query</b> and get the <b>JSON results</b>.</p>

In [39]:
import pprint as pp
import csv
from pathlib import Path

from SPARQLWrapper import SPARQLWrapper, JSON
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
from rdflib import Namespace , Literal , URIRef
from rdflib.namespace import RDF , RDFS


artchives_endpoint = "http://artchives.fondazionezeri.unibo.it/sparql"


count_individuals_by_class_query = """
SELECT ?class (COUNT(?individual) AS ?tot)
WHERE { ?individual a ?class .}
GROUP BY ?class
"""

sparql_wd = SPARQLWrapper(artchives_endpoint, agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36')
sparql_wd.setQuery(count_individuals_by_class_query)
sparql_wd.setReturnFormat(JSON)
results = sparql_wd.query().convert()

artchives_tot = dict()
for result in results["results"]["bindings"]:
    artc_class = result["class"]["value"]
    total = result["tot"]["value"]
    artchives_tot[artc_class] = total
    
final_json["artchives_stats"] = {
    "collections": artchives_tot['http://www.wikidata.org/entity/Q9388534'], 
    "collectors": artchives_tot['http://www.wikidata.org/entity/Q5'], 
    "keepers": artchives_tot['http://www.wikidata.org/entity/Q31855']}
    
pp.pprint(artchives_tot)   

{'http://www.wikidata.org/entity/Q31855': '7',
 'http://www.wikidata.org/entity/Q5': '26',
 'http://www.wikidata.org/entity/Q9388534': '27'}


We already know that humans inside ARTchives are described by the Wikidata class <b>Q5</b>. We extract the list of the art historians' URIs and use them in a second query, this time against Wikidata, to see how many of them are male and how many are female.

In [40]:
count_individuals_by_class_query = """
PREFIX wd: <http://www.wikidata.org/entity/>
SELECT ?individual
WHERE { ?individual a wd:Q5 .}
"""

sparql_wd = SPARQLWrapper(artchives_endpoint)
sparql_wd.setQuery(count_individuals_by_class_query)
sparql_wd.setReturnFormat(JSON)
results = sparql_wd.query().convert()

artchives_tot = list()
for result in results["results"]["bindings"]:
    artchives_tot.append('<' + result["individual"]["value"] + '>')

artchives_tot = ' '.join(artchives_tot)
artchives_gender_query = """
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT ?historian ?genderLabel
WHERE {
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    VALUES ?historian {"""+artchives_tot+"""} .
    ?historian wdt:P21 ?gender .
    } 
"""

sparql_wd = SPARQLWrapper(wikidata_endpoint, agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36')
sparql_wd.setQuery(artchives_gender_query)
sparql_wd.setReturnFormat(JSON)
results = sparql_wd.query().convert()

artchives_genders = {'male': 0, 'female': 0}
for result in results["results"]["bindings"]:
    # Count only the two labels we report; any other label is skipped instead of being counted as female
    label = result['genderLabel']['value']
    if label in artchives_genders:
        artchives_genders[label] += 1
        
pp.pprint(artchives_genders)
final_json['artchives_genders'] = artchives_genders


{'female': 1, 'male': 25}
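
The `VALUES` clause in the query above is built by wrapping each URI in angle brackets and joining the results with spaces. Below is a minimal sketch of that step as a reusable function, together with a gender tally that groups unexpected labels separately. `values_clause` and `tally_genders` are hypothetical names, and the two URIs and the label list are sample input only.

```python
from collections import Counter


def values_clause(variable, uris):
    """Build a SPARQL VALUES clause binding `variable` to the given URIs."""
    return 'VALUES ?%s { %s }' % (variable, ' '.join('<%s>' % u for u in uris))


def tally_genders(labels, known=('male', 'female')):
    """Count gender labels, grouping anything unexpected under 'other'."""
    return Counter(label if label in known else 'other' for label in labels)


clause = values_clause('historian', [
    'http://www.wikidata.org/entity/Q1089074',
    'http://www.wikidata.org/entity/Q3057287',
])
print(clause)

counts = tally_genders(['male', 'male', 'female', 'non-binary'])
print(dict(counts))
```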


At first glance it would seem that there is only one woman on the platform.

We will try to understand whether any other women are hidden in the data and, if so, give them the visibility they deserve.

We will focus on the ARTchives property "hasSubjectPeople", which lets us extract information about the people who are the subject of collections.

<h3><u>N.B.</u></h3>

For disambiguation purposes, from now on we will refer to the 26 art historians found before as <b>collectors</b> and to the new group of people related to the collections as <b>subjects</b>.

In [41]:
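# Note: ?collector is not linked to ?collection in this query, so the result pairs
# every subject with every Q5 individual (hence the repeated URIs in the output).
artc_query = """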
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX art:<https://w3id.org/artchives/>
SELECT ?person ?collector
WHERE {
?collection art:hasSubjectPeople ?person .
?collector a wd:Q5 .
}
"""

sparql_wd = SPARQLWrapper(artchives_endpoint, agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36')
sparql_wd.setQuery(artc_query)
sparql_wd.setReturnFormat(JSON)
results = sparql_wd.query().convert()

subjects = list()
collectors = list()

for result in results["results"]["bindings"]:
    person_code = result["person"]["value"]
    collector_code = result["collector"]["value"]
    subjects.append(person_code)
    collectors.append(collector_code)
    
pp.pprint(subjects)
pp.pprint(collectors)


['http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity

 'https://w3id.org/artchives/MD1559568562249',
 'https://w3id.org/artchives/MD1559568562249',
 'https://w3id.org/artchives/MD1559568562249',
 'https://w3id.org/artchives/MD1559568562249',
 'https://w3id.org/artchives/MD1559568562249',
 'https://w3id.org/artchives/MD1559568562249',
 'https://w3id.org/artchives/MD1559568562249',
 'https://w3id.org/artchives/MD1559568562249',
 'http://www.wikidata.org/entity/Q3057287',
 'http://www.wikidata.org/entity/Q3057287',
 'http://www.wikidata.org/entity/Q3057287',
 'http://www.wikidata.org/entity/Q3057287',
 'http://www.wikidata.org/entity/Q3057287',
 'http://www.wikidata.org/entity/Q3057287',
 'http://www.wikidata.org/entity/Q3057287',
 'http://www.wikidata.org/entity/Q3057287',
 'http://www.wikidata.org/entity/Q3057287',
 'http://www.wikidata.org/entity/Q3057287',
 'http://www.wikidata.org/entity/Q3057287',
 'http://www.wikidata.org/entity/Q3057287',
 'http://www.wikidata.org/entity/Q3057287',
 'http://www.wikidata.org/entity/Q3057287',
 'http:/

 'https://w3id.org/artchives/MD1559564704717',
 'https://w3id.org/artchives/MD1559564704717',
 'https://w3id.org/artchives/MD1559564704717',
 'https://w3id.org/artchives/MD1559564704717',
 'https://w3id.org/artchives/MD1559564704717',
 'https://w3id.org/artchives/MD1559564704717',
 'https://w3id.org/artchives/MD1559564704717',
 'https://w3id.org/artchives/MD1559564704717',
 'https://w3id.org/artchives/MD1559564704717',
 'https://w3id.org/artchives/MD1559564704717',
 'https://w3id.org/artchives/MD1559564713607',
 'https://w3id.org/artchives/MD1559564713607',
 'https://w3id.org/artchives/MD1559564713607',
 'https://w3id.org/artchives/MD1559564713607',
 'https://w3id.org/artchives/MD1559564713607',
 'https://w3id.org/artchives/MD1559564713607',
 'https://w3id.org/artchives/MD1559564713607',
 'https://w3id.org/artchives/MD1559564713607',
 'https://w3id.org/artchives/MD1559564713607',
 'https://w3id.org/artchives/MD1559564713607',
 'https://w3id.org/artchives/MD1559564713607',
 'https://w3i

 'https://w3id.org/artchives/MD1559580018646',
 'https://w3id.org/artchives/MD1559580018646',
 'https://w3id.org/artchives/MD1559580018646',
 'https://w3id.org/artchives/MD1559580018646',
 'https://w3id.org/artchives/MD1559580018646',
 'https://w3id.org/artchives/MD1559580018646',
 'https://w3id.org/artchives/MD1559580018646',
 'https://w3id.org/artchives/MD1559580018646',
 'https://w3id.org/artchives/MD1559580018646',
 'https://w3id.org/artchives/MD1559580018646',
 'https://w3id.org/artchives/MD1559580018646',
 'https://w3id.org/artchives/MD1559580018646',
 'https://w3id.org/artchives/MD1559580018646',
 'https://w3id.org/artchives/MD1559580026847',
 'https://w3id.org/artchives/MD1559580026847',
 'https://w3id.org/artchives/MD1559580026847',
 'https://w3id.org/artchives/MD1559580026847',
 'https://w3id.org/artchives/MD1559580026847',
 'https://w3id.org/artchives/MD1559580026847',
 'https://w3id.org/artchives/MD1559580026847',
 'https://w3id.org/artchives/MD1559580026847',
 'https://w3i

 'https://w3id.org/artchives/MD1559580784999',
 'https://w3id.org/artchives/MD1559580784999',
 'https://w3id.org/artchives/MD1559580784999',
 'https://w3id.org/artchives/MD1559580784999',
 'https://w3id.org/artchives/MD1559580784999',
 'https://w3id.org/artchives/MD1559580784999',
 'https://w3id.org/artchives/MD1559580784999',
 'https://w3id.org/artchives/MD1559580784999',
 'https://w3id.org/artchives/MD1559580784999',
 'https://w3id.org/artchives/MD1559580784999',
 'https://w3id.org/artchives/MD1559580784999',
 'https://w3id.org/artchives/MD1559580784999',
 'https://w3id.org/artchives/MD1559580784999',
 'https://w3id.org/artchives/MD1559580795394',
 'https://w3id.org/artchives/MD1559580795394',
 'https://w3id.org/artchives/MD1559580795394',
 'https://w3id.org/artchives/MD1559580795394',
 'https://w3id.org/artchives/MD1559580795394',
 'https://w3id.org/artchives/MD1559580795394',
 'https://w3id.org/artchives/MD1559580795394',
 'https://w3id.org/artchives/MD1559580795394',
 'https://w3i

 'https://w3id.org/artchives/MD1559657359130',
 'https://w3id.org/artchives/MD1559657359130',
 'https://w3id.org/artchives/MD1559657359130',
 'https://w3id.org/artchives/MD1559657359130',
 'https://w3id.org/artchives/MD1559657359130',
 'https://w3id.org/artchives/MD1559657359130',
 'https://w3id.org/artchives/MD1559657359130',
 'https://w3id.org/artchives/MD1559657359130',
 'https://w3id.org/artchives/MD1559657359130',
 'https://w3id.org/artchives/MD1559657359130',
 'https://w3id.org/artchives/MD1559657359130',
 'https://w3id.org/artchives/MD1559657359130',
 'https://w3id.org/artchives/MD1559657359130',
 'https://w3id.org/artchives/MD1559657359130',
 'https://w3id.org/artchives/MD1559657359130',
 'https://w3id.org/artchives/MD1559657359130',
 'https://w3id.org/artchives/MD1559657359130',
 'https://w3id.org/artchives/MD1559657359130',
 'https://w3id.org/artchives/MD1559657359130',
 'https://w3id.org/artchives/MD1559657359130',
 'https://w3id.org/artchives/MD1559657366763',
 'https://w3i

 'https://w3id.org/artchives/MD1559579415039',
 'https://w3id.org/artchives/MD1559579415039',
 'https://w3id.org/artchives/MD1559579415039',
 'https://w3id.org/artchives/MD1559579415039',
 'https://w3id.org/artchives/MD1559579415039',
 'https://w3id.org/artchives/MD1559579415039',
 'https://w3id.org/artchives/MD1559579415039',
 'https://w3id.org/artchives/MD1559579415039',
 'https://w3id.org/artchives/MD1559579415039',
 'https://w3id.org/artchives/MD1559579415039',
 'https://w3id.org/artchives/MD1559579415039',
 'https://w3id.org/artchives/MD1559579415039',
 'https://w3id.org/artchives/MD1559579415039',
 'https://w3id.org/artchives/MD1559579415039',
 'https://w3id.org/artchives/MD1559579415039',
 'https://w3id.org/artchives/MD1559579415039',
 'https://w3id.org/artchives/MD1559579415039',
 'https://w3id.org/artchives/MD1559579415039',
 'https://w3id.org/artchives/MD1559579415039',
 'https://w3id.org/artchives/MD1559579415039',
 'https://w3id.org/artchives/MD1559579429597',
 'https://w3i

 'https://w3id.org/artchives/MD1559659786916',
 'https://w3id.org/artchives/MD1559659786916',
 'https://w3id.org/artchives/MD1559659786916',
 'https://w3id.org/artchives/MD1559659786916',
 'https://w3id.org/artchives/MD1559659786916',
 'https://w3id.org/artchives/MD1559659786916',
 'https://w3id.org/artchives/MD1559659786916',
 'https://w3id.org/artchives/MD1559659786916',
 'https://w3id.org/artchives/MD1559659786916',
 'https://w3id.org/artchives/MD1559659786916',
 'https://w3id.org/artchives/MD1559659786916',
 'https://w3id.org/artchives/MD1559659786916',
 'https://w3id.org/artchives/MD1559659786916',
 'https://w3id.org/artchives/MD1559659786916',
 'https://w3id.org/artchives/MD1559659786916',
 'https://w3id.org/artchives/MD1559659786916',
 'https://w3id.org/artchives/MD1559659786916',
 'https://w3id.org/artchives/MD1559659786916',
 'https://w3id.org/artchives/MD1559659786916',
 'https://w3id.org/artchives/MD1559659786916',
 'https://w3id.org/artchives/MD1559659786916',
 'https://w3i

 'http://www.wikidata.org/entity/Q50914856',
 'http://www.wikidata.org/entity/Q50914856',
 'http://www.wikidata.org/entity/Q50914856',
 'http://www.wikidata.org/entity/Q50914856',
 'http://www.wikidata.org/entity/Q50914856',
 'http://www.wikidata.org/entity/Q50914856',
 'http://www.wikidata.org/entity/Q64607',
 'http://www.wikidata.org/entity/Q64607',
 'http://www.wikidata.org/entity/Q64607',
 'http://www.wikidata.org/entity/Q64607',
 'http://www.wikidata.org/entity/Q64607',
 'http://www.wikidata.org/entity/Q64607',
 'http://www.wikidata.org/entity/Q64607',
 'http://www.wikidata.org/entity/Q64607',
 'http://www.wikidata.org/entity/Q64607',
 'http://www.wikidata.org/entity/Q64607',
 'http://www.wikidata.org/entity/Q64607',
 'http://www.wikidata.org/entity/Q64607',
 'http://www.wikidata.org/entity/Q64607',
 'http://www.wikidata.org/entity/Q64607',
 'http://www.wikidata.org/entity/Q64607',
 'http://www.wikidata.org/entity/Q64607',
 'http://www.wikidata.org/entity/Q64607',
 'http://www.wik

 'https://w3id.org/artchives/MD1559658636971',
 'https://w3id.org/artchives/MD1559658636971',
 'https://w3id.org/artchives/MD1559658646697',
 'https://w3id.org/artchives/MD1559658646697',
 'https://w3id.org/artchives/MD1559658646697',
 'https://w3id.org/artchives/MD1559658646697',
 'https://w3id.org/artchives/MD1559658646697',
 'https://w3id.org/artchives/MD1559658646697',
 'https://w3id.org/artchives/MD1559658646697',
 'https://w3id.org/artchives/MD1559658646697',
 'https://w3id.org/artchives/MD1559658646697',
 'https://w3id.org/artchives/MD1559658646697',
 'https://w3id.org/artchives/MD1559658646697',
 'https://w3id.org/artchives/MD1559658646697',
 'https://w3id.org/artchives/MD1559658646697',
 'https://w3id.org/artchives/MD1559658646697',
 'https://w3id.org/artchives/MD1559658646697',
 'https://w3id.org/artchives/MD1559658646697',
 'https://w3id.org/artchives/MD1559658646697',
 'https://w3id.org/artchives/MD1559658646697',
 'https://w3id.org/artchives/MD1559658646697',
 'https://w3i

 'http://www.wikidata.org/entity/Q442213',
 'http://www.wikidata.org/entity/Q442213',
 'http://www.wikidata.org/entity/Q442213',
 'http://www.wikidata.org/entity/Q442213',
 'http://www.wikidata.org/entity/Q442213',
 'http://www.wikidata.org/entity/Q442213',
 'http://www.wikidata.org/entity/Q442213',
 'http://www.wikidata.org/entity/Q442213',
 'http://www.wikidata.org/entity/Q442213',
 'http://www.wikidata.org/entity/Q442213',
 'http://www.wikidata.org/entity/Q442213',
 'http://www.wikidata.org/entity/Q442213',
 'http://www.wikidata.org/entity/Q442213',
 'http://www.wikidata.org/entity/Q442213',
 'http://www.wikidata.org/entity/Q442213',
 'http://www.wikidata.org/entity/Q442213',
 'http://www.wikidata.org/entity/Q4868805',
 'http://www.wikidata.org/entity/Q4868805',
 'http://www.wikidata.org/entity/Q4868805',
 'http://www.wikidata.org/entity/Q4868805',
 'http://www.wikidata.org/entity/Q4868805',
 'http://www.wikidata.org/entity/Q4868805',
 'http://www.wikidata.org/entity/Q4868805',
 'ht

 'http://www.wikidata.org/entity/Q1361667',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q19997512',
 'http://www.wikidata.org/entity/Q55453618',
 'http://www.wikidata.org/entity/Q41616785',
 'http://www.wikidata.org/entity/Q1715096',
 'http://www.wikidata.org/entity/Q3057287',
 'http://www.wikidata.org/entity/Q1712683',
 'http://www.wikidata.org/entity/Q537874',
 'http://www.wikidata.org/entity/Q90407',
 'http://www.wikidata.org/entity/Q1373290',
 'http://www.wikidata.org/entity/Q2824734',
 'http://www.wikidata.org/entity/Q61913691',
 'http://www.wikidata.org/entity/Q1296486',
 'http://www.wikidata.org/entity/Q995470',
 'http://www.wikidata.org/entity/Q1629748',
 'http://www.wikidata.org/entity/Q85761254',
 'http://www.wikidata.org/entity/Q1271052',
 'http://www.wikidata.org/entity/Q6700132',
 'http://www.wikidata.org/entity/Q88907',
 'http://www.wikidata.org/entity/Q60185',
 'http://www.wikidata.org/entity/Q3051533',
 'http://www.wikidata.org/entity/Q4

 'http://www.wikidata.org/entity/Q61913691',
 'http://www.wikidata.org/entity/Q1296486',
 'http://www.wikidata.org/entity/Q995470',
 'http://www.wikidata.org/entity/Q1629748',
 'http://www.wikidata.org/entity/Q85761254',
 'http://www.wikidata.org/entity/Q1271052',
 'http://www.wikidata.org/entity/Q6700132',
 'http://www.wikidata.org/entity/Q88907',
 'http://www.wikidata.org/entity/Q60185',
 'http://www.wikidata.org/entity/Q3051533',
 'http://www.wikidata.org/entity/Q457739',
 'http://www.wikidata.org/entity/Q1641821',
 'http://www.wikidata.org/entity/Q18935222',
 'http://www.wikidata.org/entity/Q2527217',
 'http://www.wikidata.org/entity/Q1361667',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q19997512',
 'http://www.wikidata.org/entity/Q55453618',
 'http://www.wikidata.org/entity/Q41616785',
 'http://www.wikidata.org/entity/Q1715096',
 'http://www.wikidata.org/entity/Q3057287',
 'http://www.wikidata.org/entity/Q1712683',
 'http://www.wikidata.org/entity

 'http://www.wikidata.org/entity/Q995470',
 'http://www.wikidata.org/entity/Q1629748',
 'http://www.wikidata.org/entity/Q85761254',
 'http://www.wikidata.org/entity/Q1271052',
 'http://www.wikidata.org/entity/Q6700132',
 'http://www.wikidata.org/entity/Q88907',
 'http://www.wikidata.org/entity/Q60185',
 'http://www.wikidata.org/entity/Q3051533',
 'http://www.wikidata.org/entity/Q457739',
 'http://www.wikidata.org/entity/Q1641821',
 'http://www.wikidata.org/entity/Q18935222',
 'http://www.wikidata.org/entity/Q2527217',
 'http://www.wikidata.org/entity/Q1361667',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q19997512',
 'http://www.wikidata.org/entity/Q55453618',
 'http://www.wikidata.org/entity/Q41616785',
 'http://www.wikidata.org/entity/Q1715096',
 'http://www.wikidata.org/entity/Q3057287',
 'http://www.wikidata.org/entity/Q1712683',
 'http://www.wikidata.org/entity/Q537874',
 'http://www.wikidata.org/entity/Q90407',
 'http://www.wikidata.org/entity/Q13

 'http://www.wikidata.org/entity/Q1296486',
 'http://www.wikidata.org/entity/Q995470',
 'http://www.wikidata.org/entity/Q1629748',
 'http://www.wikidata.org/entity/Q85761254',
 'http://www.wikidata.org/entity/Q1271052',
 'http://www.wikidata.org/entity/Q6700132',
 'http://www.wikidata.org/entity/Q88907',
 'http://www.wikidata.org/entity/Q60185',
 'http://www.wikidata.org/entity/Q3051533',
 'http://www.wikidata.org/entity/Q457739',
 'http://www.wikidata.org/entity/Q1641821',
 'http://www.wikidata.org/entity/Q18935222',
 'http://www.wikidata.org/entity/Q2527217',
 'http://www.wikidata.org/entity/Q1361667',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q19997512',
 'http://www.wikidata.org/entity/Q55453618',
 'http://www.wikidata.org/entity/Q41616785',
 'http://www.wikidata.org/entity/Q1715096',
 'http://www.wikidata.org/entity/Q3057287',
 'http://www.wikidata.org/entity/Q1712683',
 'http://www.wikidata.org/entity/Q537874',
 'http://www.wikidata.org/entity/Q

 'http://www.wikidata.org/entity/Q2824734',
 'http://www.wikidata.org/entity/Q61913691',
 'http://www.wikidata.org/entity/Q1296486',
 'http://www.wikidata.org/entity/Q995470',
 'http://www.wikidata.org/entity/Q1629748',
 'http://www.wikidata.org/entity/Q85761254',
 'http://www.wikidata.org/entity/Q1271052',
 'http://www.wikidata.org/entity/Q6700132',
 'http://www.wikidata.org/entity/Q88907',
 'http://www.wikidata.org/entity/Q60185',
 'http://www.wikidata.org/entity/Q3051533',
 'http://www.wikidata.org/entity/Q457739',
 'http://www.wikidata.org/entity/Q1641821',
 'http://www.wikidata.org/entity/Q18935222',
 'http://www.wikidata.org/entity/Q2527217',
 'http://www.wikidata.org/entity/Q1361667',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q19997512',
 'http://www.wikidata.org/entity/Q55453618',
 'http://www.wikidata.org/entity/Q41616785',
 'http://www.wikidata.org/entity/Q1715096',
 'http://www.wikidata.org/entity/Q3057287',
 'http://www.wikidata.org/entity

 'http://www.wikidata.org/entity/Q60185',
 'http://www.wikidata.org/entity/Q3051533',
 'http://www.wikidata.org/entity/Q457739',
 'http://www.wikidata.org/entity/Q1641821',
 'http://www.wikidata.org/entity/Q18935222',
 'http://www.wikidata.org/entity/Q2527217',
 'http://www.wikidata.org/entity/Q1361667',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q19997512',
 'http://www.wikidata.org/entity/Q55453618',
 'http://www.wikidata.org/entity/Q41616785',
 'http://www.wikidata.org/entity/Q1715096',
 'http://www.wikidata.org/entity/Q3057287',
 'http://www.wikidata.org/entity/Q1712683',
 'http://www.wikidata.org/entity/Q537874',
 'http://www.wikidata.org/entity/Q90407',
 'http://www.wikidata.org/entity/Q1373290',
 'http://www.wikidata.org/entity/Q2824734',
 'http://www.wikidata.org/entity/Q61913691',
 'http://www.wikidata.org/entity/Q1296486',
 'http://www.wikidata.org/entity/Q995470',
 'http://www.wikidata.org/entity/Q1629748',
 'http://www.wikidata.org/entity/Q

 'http://www.wikidata.org/entity/Q18935222',
 'http://www.wikidata.org/entity/Q2527217',
 'http://www.wikidata.org/entity/Q1361667',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q19997512',
 'http://www.wikidata.org/entity/Q55453618',
 'http://www.wikidata.org/entity/Q41616785',
 'http://www.wikidata.org/entity/Q1715096',
 'http://www.wikidata.org/entity/Q3057287',
 'http://www.wikidata.org/entity/Q1712683',
 'http://www.wikidata.org/entity/Q537874',
 'http://www.wikidata.org/entity/Q90407',
 'http://www.wikidata.org/entity/Q1373290',
 'http://www.wikidata.org/entity/Q2824734',
 'http://www.wikidata.org/entity/Q61913691',
 'http://www.wikidata.org/entity/Q1296486',
 'http://www.wikidata.org/entity/Q995470',
 'http://www.wikidata.org/entity/Q1629748',
 'http://www.wikidata.org/entity/Q85761254',
 'http://www.wikidata.org/entity/Q1271052',
 'http://www.wikidata.org/entity/Q6700132',
 'http://www.wikidata.org/entity/Q88907',
 'http://www.wikidata.org/entity

 'http://www.wikidata.org/entity/Q3051533',
 'http://www.wikidata.org/entity/Q457739',
 'http://www.wikidata.org/entity/Q1641821',
 'http://www.wikidata.org/entity/Q18935222',
 'http://www.wikidata.org/entity/Q2527217',
 'http://www.wikidata.org/entity/Q1361667',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q19997512',
 'http://www.wikidata.org/entity/Q55453618',
 'http://www.wikidata.org/entity/Q41616785',
 'http://www.wikidata.org/entity/Q1715096',
 'http://www.wikidata.org/entity/Q3057287',
 'http://www.wikidata.org/entity/Q1712683',
 'http://www.wikidata.org/entity/Q537874',
 'http://www.wikidata.org/entity/Q90407',
 'http://www.wikidata.org/entity/Q1373290',
 'http://www.wikidata.org/entity/Q2824734',
 'http://www.wikidata.org/entity/Q61913691',
 'http://www.wikidata.org/entity/Q1296486',
 'http://www.wikidata.org/entity/Q995470',
 'http://www.wikidata.org/entity/Q1629748',
 'http://www.wikidata.org/entity/Q85761254',
 'http://www.wikidata.org/entit

 'http://www.wikidata.org/entity/Q1296486',
 'http://www.wikidata.org/entity/Q995470',
 'http://www.wikidata.org/entity/Q1629748',
 'http://www.wikidata.org/entity/Q85761254',
 'http://www.wikidata.org/entity/Q1271052',
 'http://www.wikidata.org/entity/Q6700132',
 'http://www.wikidata.org/entity/Q88907',
 'http://www.wikidata.org/entity/Q60185',
 'http://www.wikidata.org/entity/Q3051533',
 'http://www.wikidata.org/entity/Q457739',
 'http://www.wikidata.org/entity/Q1641821',
 'http://www.wikidata.org/entity/Q18935222',
 'http://www.wikidata.org/entity/Q2527217',
 'http://www.wikidata.org/entity/Q1361667',
 'http://www.wikidata.org/entity/Q1089074',
 'http://www.wikidata.org/entity/Q19997512',
 'http://www.wikidata.org/entity/Q55453618',
 'http://www.wikidata.org/entity/Q41616785',
 'http://www.wikidata.org/entity/Q1715096',
 'http://www.wikidata.org/entity/Q3057287',
 'http://www.wikidata.org/entity/Q1712683',
 'http://www.wikidata.org/entity/Q537874',
 'http://www.wikidata.org/entity/Q

<h3><u>Manipulating and Cleaning the data</u></h3>

We clean the data by adding every entity to a set, which removes duplicates.

Then we keep only the entities that are identified by a Wikidata URI.

In [42]:
wiki_subjects = set()
wiki_collectors = set()

for person in subjects:
    if "wikidata.org/entity/" in person:
        uri = "<" + person + ">"
        wiki_subjects.add(uri)
for person in collectors:
    if "wikidata.org/entity/" in person:
        uri = "<" + person + ">"
        wiki_collectors.add(uri)

pp.pprint("There are " + str(len(wiki_collectors)) + " collectors in artchives.")
pp.pprint(wiki_subjects)
pp.pprint(wiki_collectors)

'There are 26 collectors in artchives.'
{'<http://www.wikidata.org/entity/Q100511>',
 '<http://www.wikidata.org/entity/Q103498>',
 '<http://www.wikidata.org/entity/Q105944>',
 '<http://www.wikidata.org/entity/Q107970>',
 '<http://www.wikidata.org/entity/Q10860738>',
 '<http://www.wikidata.org/entity/Q108748>',
 '<http://www.wikidata.org/entity/Q1089074>',
 '<http://www.wikidata.org/entity/Q113817>',
 '<http://www.wikidata.org/entity/Q1165526>',
 '<http://www.wikidata.org/entity/Q1177671>',
 '<http://www.wikidata.org/entity/Q123466>',
 '<http://www.wikidata.org/entity/Q1280671>',
 '<http://www.wikidata.org/entity/Q1296486>',
 '<http://www.wikidata.org/entity/Q1361667>',
 '<http://www.wikidata.org/entity/Q1425008>',
 '<http://www.wikidata.org/entity/Q1435690>',
 '<http://www.wikidata.org/entity/Q1465270>',
 '<http://www.wikidata.org/entity/Q1510138>',
 '<http://www.wikidata.org/entity/Q1515397>',
 '<http://www.wikidata.org/entity/Q15453236>',
 '<http://www.wikidata.org/entity/Q160236>',
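
The filtering step above repeats the same loop for subjects and collectors. As a minimal sketch, the logic could be factored into one helper. The function name `to_wikidata_values` and the short `sample` list are illustrative assumptions, not part of the original notebook.

```python
def to_wikidata_values(uris):
    """Return the Wikidata entity URIs found in `uris`, deduplicated and wrapped in angle brackets."""
    return {"<" + uri + ">" for uri in uris if "wikidata.org/entity/" in uri}


# Illustrative input: one Wikidata URI repeated twice and one ARTchives URI.
sample = [
    "http://www.wikidata.org/entity/Q1089074",
    "http://www.wikidata.org/entity/Q1089074",
    "https://w3id.org/artchives/MD1559580026847",
]

wiki_sample = to_wikidata_values(sample)
print(wiki_sample)
```

Because the helper returns a set, duplicates disappear without an explicit check, and the same call works for both the subjects and the collectors lists.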


<h3><u>Integrating gender information and labels</u></h3>

At this point, since ARTchives holds no information about gender, we query Wikidata to track down the women among these entities.

We do this for both <b>collectors</b> and <b>subjects</b>, retrieving each person's gender and name.

We create two dictionaries in which to group the results.

In [43]:
wd = Namespace("http://www.wikidata.org/entity/") 
wdt = Namespace("http://www.wikidata.org/prop/direct/")

historians = ' '.join(wiki_subjects)

wikidata_endpoint = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"

wikidata_query = """
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT ?historian ?historianname ?gendername
WHERE {
    VALUES ?historian {"""+historians+"""} . 
    ?historian wdt:P21 ?gender . 
    ?historian rdfs:label ?historianname .
    ?gender rdfs:label ?gendername .
    FILTER (langMatches(lang(?gendername), "EN"))
    FILTER (langMatches(lang(?historianname), "EN"))
    
    } 
"""

sparql_wd = SPARQLWrapper(wikidata_endpoint, agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36')
sparql_wd.setQuery(wikidata_query)
sparql_wd.setReturnFormat(JSON)
gender_results = sparql_wd.query().convert()

womenuris = set()
subjectsDict = dict()
femalehistoriansDict = dict()

for result in gender_results["results"]["bindings"]:
    historian = result["historian"]["value"]
    gender = result["gendername"]["value"]
    name = result["historianname"]["value"]
    subjectsDict[historian] = (name,gender)
    if gender == "female":
        womenuris.add("<" + historian + ">")
        femalehistoriansDict[historian] = (name,gender)

        
        
    
pp.pprint(subjectsDict)

{'http://www.wikidata.org/entity/Q100511': ('Willibald Sauerländer', 'male'),
 'http://www.wikidata.org/entity/Q103498': ('Henriette Hertz', 'female'),
 'http://www.wikidata.org/entity/Q105944': ('Richard Hamann', 'male'),
 'http://www.wikidata.org/entity/Q107970': ('Paul Frankl', 'male'),
 'http://www.wikidata.org/entity/Q10860738': ('James Pope-Hennessy', 'male'),
 'http://www.wikidata.org/entity/Q108748': ('Paul Fridolin Kehr', 'male'),
 'http://www.wikidata.org/entity/Q1089074': ('Federico Zeri', 'male'),
 'http://www.wikidata.org/entity/Q113817': ('Justus Bier', 'male'),
 'http://www.wikidata.org/entity/Q123466': ('Heinrich Wölfflin', 'male'),
 'http://www.wikidata.org/entity/Q1280671': ('Karl Leo Heinrich Lehmann',
                                             'male'),
 'http://www.wikidata.org/entity/Q1296486': ('Wolfgang Lotz', 'male'),
 'http://www.wikidata.org/entity/Q1361667': ('Roberto Longhi', 'male'),
 'http://www.wikidata.org/entity/Q1425008': ('Hans Tietze', 'male'),
 'h
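
The subjects cell above and the collectors cell that follows it share the same pattern: build a `VALUES` clause, query for gender and label, then split the rows by gender. A hedged sketch of how that pattern could be written once is given here. The names `build_gender_query` and `split_by_gender` are hypothetical, and the `mock` response is hand-written in the standard SPARQL JSON results layout (using two people from the output above) so that the example runs without contacting the endpoint.

```python
def build_gender_query(values):
    """Build a SPARQL query asking for the English name and gender of each URI in `values`."""
    return """
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?person ?personname ?gendername
WHERE {
    VALUES ?person { """ + " ".join(sorted(values)) + """ } .
    ?person wdt:P21 ?gender ;
            rdfs:label ?personname .
    ?gender rdfs:label ?gendername .
    FILTER (langMatches(lang(?gendername), "EN"))
    FILTER (langMatches(lang(?personname), "EN"))
}
"""


def split_by_gender(results):
    """Return (everyone, women_only) dictionaries mapping URI -> (name, gender)."""
    people, women = {}, {}
    for row in results["results"]["bindings"]:
        uri = row["person"]["value"]
        entry = (row["personname"]["value"], row["gendername"]["value"])
        people[uri] = entry
        if entry[1] == "female":
            women[uri] = entry
    return people, women


# Hand-written mock of a query response, standing in for sparql.query().convert().
mock = {"results": {"bindings": [
    {"person": {"value": "http://www.wikidata.org/entity/Q103498"},
     "personname": {"value": "Henriette Hertz"},
     "gendername": {"value": "female"}},
    {"person": {"value": "http://www.wikidata.org/entity/Q1089074"},
     "personname": {"value": "Federico Zeri"},
     "gendername": {"value": "male"}},
]}}

people, women = split_by_gender(mock)
print(women)
```

In the notebook, the string returned by `build_gender_query(wiki_subjects)` would be passed to `setQuery`, and the converted JSON result to `split_by_gender`.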

In [44]:
wd = Namespace("http://www.wikidata.org/entity/") 
wdt = Namespace("http://www.wikidata.org/prop/direct/")

# Use a separate name so that the `collectors` list from the earlier cells is not overwritten.
collectors_values = ' '.join(wiki_collectors)

wikidata_endpoint = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"

collectors_query = """
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT ?collector ?collectorname ?gendername
WHERE {
    VALUES ?collector {"""+collectors_values+"""} . 
    ?collector wdt:P21 ?gender . 
    ?collector rdfs:label ?collectorname .
    ?gender rdfs:label ?gendername .
    FILTER (langMatches(lang(?gendername), "EN"))
    FILTER (langMatches(lang(?collectorname), "EN"))
    
    } 
"""

sparql_coll = SPARQLWrapper(wikidata_endpoint, agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36')
sparql_coll.setQuery(collectors_query)
sparql_coll.setReturnFormat(JSON)
gender_c_results = sparql_coll.query().convert()

female_collectors_uris = set()
collectorsDict = dict()
femalecollectorsDict = dict()

for result in gender_c_results["results"]["bindings"]:
    collector = result["collector"]["value"]
    gender = result["gendername"]["value"]
    name = result["collectorname"]["value"]
    collectorsDict[collector] = (name,gender)
    if gender == "female":
        female_collectors_uris.add("<" + collector + ">")
        femalecollectorsDict[collector] = (name,gender)

        
        
    
pp.pprint(collectorsDict)
pp.pprint(len(collectorsDict))

{'http://www.wikidata.org/entity/Q1089074': ('Federico Zeri', 'male'),
 'http://www.wikidata.org/entity/Q1271052': ('Fritz Heinemann', 'male'),
 'http://www.wikidata.org/entity/Q1296486': ('Wolfgang Lotz', 'male'),
 'http://www.wikidata.org/entity/Q1361667': ('Roberto Longhi', 'male'),
 'http://www.wikidata.org/entity/Q1373290': ('Eugenio Battisti', 'male'),
 'http://www.wikidata.org/entity/Q1629748': ('Kurt Badt', 'male'),
 'http://www.wikidata.org/entity/Q1641821': ('Otto Lehmann-Brockhaus', 'male'),
 'http://www.wikidata.org/entity/Q1712683': ('Julius S. Held', 'male'),
 'http://www.wikidata.org/entity/Q1715096': ('Ulrich Middeldorf', 'male'),
 'http://www.wikidata.org/entity/Q18935222': ('Werner Cohn', 'male'),
 'http://www.wikidata.org/entity/Q19997512': ('Everett Fahy', 'male'),
 'http://www.wikidata.org/entity/Q2527217': ('Lionello Venturi', 'male'),
 'http://www.wikidata.org/entity/Q2824734': ('Adolfo Venturi', 'male'),
 'http://www.wikidata.org/entity/Q3051533': ('Ellis Waterh

We then narrow the data down once more: the women are kept in two separate dictionaries, one for subjects and one for collectors.

In [45]:
pp.pprint(femalehistoriansDict)
pp.pprint(femalecollectorsDict)

{'http://www.wikidata.org/entity/Q103498': ('Henriette Hertz', 'female'),
 'http://www.wikidata.org/entity/Q18342317': ('Elizabeth McGrath', 'female'),
 'http://www.wikidata.org/entity/Q19997511': ('Evelyn Sandberg Vavalà',
                                              'female'),
 'http://www.wikidata.org/entity/Q21176228': ('Annette Michelson', 'female'),
 'http://www.wikidata.org/entity/Q21264725': ('Steffi Roettgen', 'female'),
 'http://www.wikidata.org/entity/Q236958': ('Katherine Anne Porter', 'female'),
 'http://www.wikidata.org/entity/Q449499': ('Svetlana Alpers', 'female'),
 'http://www.wikidata.org/entity/Q61481008': ('Anna Ottani Cavina', 'female'),
 'http://www.wikidata.org/entity/Q7882150': ('Una Pope-Hennessy', 'female')}
{'http://www.wikidata.org/entity/Q61913691': ('Luisa Vertova', 'female')}


We prepare the data for the enrichment process by creating reusable lists of URIs.

Then we merge the two collections of URIs into one final list.

In [46]:
womenurisList = list(womenuris)

women_collectors_urisList = list(female_collectors_uris)

women_in_artchives = womenurisList + women_collectors_urisList
pp.pprint(women_in_artchives)

['<http://www.wikidata.org/entity/Q103498>',
 '<http://www.wikidata.org/entity/Q236958>',
 '<http://www.wikidata.org/entity/Q21264725>',
 '<http://www.wikidata.org/entity/Q21176228>',
 '<http://www.wikidata.org/entity/Q7882150>',
 '<http://www.wikidata.org/entity/Q18342317>',
 '<http://www.wikidata.org/entity/Q61481008>',
 '<http://www.wikidata.org/entity/Q449499>',
 '<http://www.wikidata.org/entity/Q19997511>',
 '<http://www.wikidata.org/entity/Q61913691>']
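
Concatenating the two lists would keep a URI twice if the same woman appeared both as a subject and as a collector. The output above shows no such overlap, so the result here is unaffected. As a sketch of a more defensive alternative, a set union removes any repeated URI. The variable names are illustrative, and the overlap in the sample data is hypothetical, added only to show the deduplication.

```python
# Sample sets; Q449499 is placed in both only to illustrate an overlap (hypothetical).
subject_women = {
    "<http://www.wikidata.org/entity/Q103498>",
    "<http://www.wikidata.org/entity/Q449499>",
}
collector_women = {
    "<http://www.wikidata.org/entity/Q61913691>",
    "<http://www.wikidata.org/entity/Q449499>",
}

# The union keeps each URI once; sorting gives a stable, reproducible order.
all_women = sorted(subject_women | collector_women)
print(all_women)
```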


We have already achieved an important result: we have identified nine other women present in ARTchives in addition to the art historian Luisa Vertova, the only woman who is actually visible on ARTchives.

This is where the interesting part begins. The women we have identified now have a name and an identity, but we want to enrich them with the information on which we have based our whole Martrioska project, namely:
- their names
- their gender
- their country of origin
- their occupations
- whether or not they are members of institutions or academies
- whether they have produced scholarly works.

In [47]:
wd = Namespace("http://www.wikidata.org/entity/") 
wdt = Namespace("http://www.wikidata.org/prop/direct/")

women = ' '.join(women_in_artchives)

wikidata_endpoint = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"

women_query = """
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?woman ?name ?genderLabel ?countryLabel ?memberofLabel ?occupationLabel 
WHERE { 
      VALUES ?woman {"""+women+"""} . 
      ?woman rdfs:label ?name;
              wdt:P21 ?gender;
              wdt:P27 ?country;
              wdt:P106 ?occupation;
              wdt:P463 ?memberof. 
  FILTER (langMatches(lang(?name), "EN"))
  SERVICE wikibase:label {bd:serviceParam wikibase:language "en" }              
}

"""

sparql_women = SPARQLWrapper(wikidata_endpoint, agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36')
sparql_women.setQuery(women_query)
sparql_women.setReturnFormat(JSON)
women_results = sparql_women.query().convert()

final_dict = {}

# Build one entry per woman; multi-valued properties are accumulated in sets.
for result in women_results["results"]["bindings"]:
    woman = result["woman"]["value"]
    name = result["name"]["value"]
    gender = result["genderLabel"]["value"]
    country = result["countryLabel"]["value"]
    occupation = result["occupationLabel"]["value"]
    memberof = result["memberofLabel"]["value"]
    # Create the entry the first time a woman is seen, using the values of her own row.
    entry = final_dict.setdefault(
        woman,
        {"name": name, "gender": gender, "occupation": set(), "country": set(), "memberof": set()},
    )
    entry["name"] = name
    entry["gender"] = gender
    entry["occupation"].add(occupation)
    entry["country"].add(country)
    entry["memberof"].add(memberof)
    

         
pp.pprint(final_dict)

{'http://www.wikidata.org/entity/Q103498': {'country': {'German Empire'},
                                            'gender': 'female',
                                            'memberof': {'Bibliotheca '
                                                         'Hertziana – Max '
                                                         'Planck Institute of '
                                                         'Art History'},
                                            'name': 'Henriette Hertz',
                                            'occupation': {'art collector',
                                                           'patron of the '
                                                           'arts'}},
 'http://www.wikidata.org/entity/Q18342317': {'country': {'United Kingdom'},
                                              'gender': 'female',
                                              'memberof': {'British Academy',
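
The query in this cell requires every woman to have a country (P27), an occupation (P106) and a membership (P463) statement, so anyone missing one of these is absent from its results. One possible way to keep them, sketched here under assumptions, is to wrap those properties in `OPTIONAL` and to tolerate missing keys while aggregating. The helper `aggregate_rows`, the `lit` shorthand and the second person in the mock rows are hypothetical. The query fragment is shown for illustration and has not been run against the endpoint.

```python
# Pattern fragment that would replace the mandatory triples in the WHERE clause.
OPTIONAL_PATTERN = """
      ?woman rdfs:label ?name ;
             wdt:P21 ?gender .
      OPTIONAL { ?woman wdt:P27 ?country . }
      OPTIONAL { ?woman wdt:P106 ?occupation . }
      OPTIONAL { ?woman wdt:P463 ?memberof . }
"""


def aggregate_rows(bindings):
    """Group SPARQL JSON rows by woman, collecting multi-valued fields in sets."""
    out = {}
    for row in bindings:
        uri = row["woman"]["value"]
        entry = out.setdefault(uri, {
            "name": row["name"]["value"],
            "gender": row["genderLabel"]["value"],
            "country": set(), "occupation": set(), "memberof": set(),
        })
        for key in ("country", "occupation", "memberof"):
            cell = row.get(key + "Label")  # unbound variables are absent from the row
            if cell is not None:
                entry[key].add(cell["value"])
    return out


def lit(value):
    """Shorthand for one cell of a SPARQL JSON binding."""
    return {"type": "literal", "value": value}


hertz = "http://www.wikidata.org/entity/Q103498"
hypothetical = "http://example.org/hypothetical-woman"  # not a real entity
rows = [
    {"woman": lit(hertz), "name": lit("Henriette Hertz"), "genderLabel": lit("female"),
     "countryLabel": lit("German Empire"), "occupationLabel": lit("art collector"),
     "memberofLabel": lit("Bibliotheca Hertziana – Max Planck Institute of Art History")},
    {"woman": lit(hertz), "name": lit("Henriette Hertz"), "genderLabel": lit("female"),
     "countryLabel": lit("German Empire"), "occupationLabel": lit("patron of the arts"),
     "memberofLabel": lit("Bibliotheca Hertziana – Max Planck Institute of Art History")},
    # Hypothetical row with no country and no membership statement.
    {"woman": lit(hypothetical), "name": lit("Example Person"),
     "genderLabel": lit("female"), "occupationLabel": lit("art historian")},
]

enriched = aggregate_rows(rows)
print(enriched[hypothetical])
```

With this approach a woman who lacks a membership statement still gets an entry, with an empty `memberof` set, instead of being dropped from the results.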
                                                

In [48]:
women_works_query = """
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?woman ?scholarly_workLabel
WHERE { 
      VALUES ?woman {"""+women+"""} . 
      ?woman rdfs:label ?name.
      ?scholarly_work wdt:P50 ?woman.
      ?scholarly_work wdt:P31/wdt:P279* wd:Q55915575.
  FILTER (langMatches(lang(?name), "EN"))
  SERVICE wikibase:label {bd:serviceParam wikibase:language "en" }              
}

"""


sparql_women_works = SPARQLWrapper(wikidata_endpoint, agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36')
sparql_women_works.setQuery(women_works_query)
sparql_women_works.setReturnFormat(JSON)
women_works_results = sparql_women_works.query().convert()

for result in women_works_results["results"]["bindings"]:
    woman = result['woman']['value']
    if woman in final_dict:
        # Create the 'works' list on first use, then append the title.
        final_dict[woman].setdefault('works', []).append(result['scholarly_workLabel']['value'])
            
pp.pprint(final_dict)


{'http://www.wikidata.org/entity/Q103498': {'country': {'German Empire'},
                                            'gender': 'female',
                                            'memberof': {'Bibliotheca '
                                                         'Hertziana – Max '
                                                         'Planck Institute of '
                                                         'Art History'},
                                            'name': 'Henriette Hertz',
                                            'occupation': {'art collector',
                                                           'patron of the '
                                                           'arts'}},
 'http://www.wikidata.org/entity/Q18342317': {'country': {'United Kingdom'},
                                              'gender': 'female',
                                              'memberof': {'British Academy',
                                                

We export all the JSON results collected so far into a single, reusable JSON file, which we will use to populate our website.

In [49]:
with open('data.json', 'w') as f:
    json.dump(final_json, f, indent=4)
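
If the exported structure contains Python sets, as `final_dict` does, `json.dump` raises a `TypeError`, because sets have no JSON equivalent. A minimal sketch of one workaround is to pass a `default` hook that turns each set into a sorted list. The helper name `encode_sets`, the file name `women_sample.json` and the reduced `sample` dictionary are assumptions for illustration.

```python
import json
import os
import tempfile


def encode_sets(obj):
    """Fallback used by json.dump for objects it cannot serialize natively."""
    if isinstance(obj, set):
        return sorted(obj)  # sorted list keeps the output stable between runs
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Reduced example shaped like one entry of the enriched dictionary.
sample = {
    "http://www.wikidata.org/entity/Q103498": {
        "name": "Henriette Hertz",
        "gender": "female",
        "country": {"German Empire"},
        "occupation": {"art collector", "patron of the arts"},
    }
}

path = os.path.join(tempfile.gettempdir(), "women_sample.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(sample, f, indent=4, ensure_ascii=False, default=encode_sets)

# Read the file back to check that it is valid JSON.
with open(path, encoding="utf-8") as f:
    reloaded = json.load(f)
print(reloaded)
```

Sets come back as lists after reloading, which is usually what a website consuming the file expects.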