<font color='red'>NOTE: Please do not edit this file. </font> Go to <font color='blue'>*File > Save a copy in Drive*</font>.

# **openHPI Course: Knowledge Graphs 2023**

## **Week 3: Querying Knowledge Graphs with SPARQL**
### **Notebook 3.2: DBpedia**
---

This is the Python notebook for week 3 (Querying Knowledge Graphs with SPARQL) in the openHPI course **Knowledge Graphs 2023**.

In this Colab notebook you will learn how to query the DBpedia knowledge graph.

*Please make a copy of this notebook to try out your own adaptations via "File > Save a copy in Drive".*

In [None]:
!pip install -q sparqlwrapper    # install SPARQLWrapper

In [None]:
from SPARQLWrapper import SPARQLWrapper, JSON, XML, RDF
import pandas as pd

In [None]:
sparql = SPARQLWrapper("https://dbpedia.org/sparql")  # set the SPARQL endpoint
sparql.setReturnFormat(JSON)  # request query results in JSON format
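Under the hood, a SPARQL endpoint is queried over HTTP as described by the SPARQL 1.1 Protocol: the query text is sent URL-encoded as the `query` parameter, and the result format can be negotiated with an `Accept: application/sparql-results+json` header. The sketch below only builds such a GET request with the standard library and does not send it. It is a rough illustration of the protocol, not SPARQLWrapper's actual implementation, and `build_sparql_request` is a hypothetical helper name.

```python
from urllib.parse import urlencode, urlparse, parse_qs
from urllib.request import Request


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol GET request."""
    # The query text is URL-encoded into the "query" parameter
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the SPARQL 1.1 JSON results format via content negotiation
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_sparql_request(
    "https://dbpedia.org/sparql",
    "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1",
)
print(req.full_url)
print(req.get_header("Accept"))
# Decode the parameter again to see what the endpoint would receive
print(parse_qs(urlparse(req.full_url).query)["query"][0])
```

Passing `req` to `urllib.request.urlopen` would actually send it; SPARQLWrapper takes care of this step (and of parsing the answer) for you.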

#### **Query 1**: Show all Nobel laureates in Literature over the years and, where available, print their thumbnail and description ####

In [None]:
# SPARQL query to be executed
sparql.setQuery("""
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbc: <http://dbpedia.org/resource/Category:>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT DISTINCT ?date ?author ?authorName ?thumbnail ?description
WHERE {
  ?author rdf:type dbo:Writer ;
          dct:subject dbc:Nobel_laureates_in_Literature ;
          rdfs:label ?authorName ;
          dbo:wikiPageWikiLink ?link ;
          rdfs:comment ?description .
  FILTER ((lang(?authorName) = "en") && (lang(?description) = "en"))

  ?link dct:subject dbc:Nobel_Prize_in_Literature ;
        dbp:holderLabel ?date .
  OPTIONAL { ?author dbo:thumbnail ?thumbnail . }
}
ORDER BY ?date
""")

sparql.setReturnFormat(JSON)   # Return format is JSON
results = sparql.query().convert()   # execute SPARQL query and write result to "results"
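`convert()` returns the SPARQL 1.1 Query Results JSON Format as a Python dict: the variable names are listed under `head.vars`, and each row in `results.bindings` maps every *bound* variable to an object with a `type`, a `value` and, for language-tagged literals, an `xml:lang` key. Unbound variables, such as a missing `?thumbnail` from the `OPTIONAL` block, are simply absent from the row. The sample below is a hand-written stand-in for a DBpedia response (the author and values are made up), and `row_values` is a hypothetical helper.

```python
# Hand-written sample in the SPARQL 1.1 JSON results format (not real DBpedia data)
sample_results = {
    "head": {"vars": ["date", "author", "authorName", "thumbnail", "description"]},
    "results": {
        "bindings": [
            {
                "date": {"type": "literal", "value": "1901"},
                "author": {"type": "uri", "value": "http://dbpedia.org/resource/Example_Author"},
                "authorName": {"type": "literal", "xml:lang": "en", "value": "Example Author"},
                "description": {"type": "literal", "xml:lang": "en", "value": "An example description."},
                # no "thumbnail" key: the OPTIONAL pattern did not match
            }
        ]
    },
}


def row_values(binding, variables, default=None):
    """Return the plain values of one result row, using `default` for unbound variables."""
    return {var: binding[var]["value"] if var in binding else default for var in variables}


variables = sample_results["head"]["vars"]
for binding in sample_results["results"]["bindings"]:
    print(row_values(binding, variables))
```

The same helper should work on the real response, e.g. `row_values(b, results["head"]["vars"])` for each `b` in `results["results"]["bindings"]`.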

Let's try out another way of visualising the query results: formatting them as an HTML page.

In [None]:
from google.colab import files
import IPython

In [None]:
import html  # escape names and descriptions before writing them into the page

# Fallback image shown when an author has no thumbnail
PLACEHOLDER_PIC = 'https://upload.wikimedia.org/wikipedia/commons/thumb/b/b0/Question_mark2.svg/71px-Question_mark2.svg.png'

with open('authors.html', 'w', encoding='utf-8') as f:
    # Create HTML output (UTF-8, since many names contain non-ASCII characters)
    f.write('<html><head><meta charset="utf-8"><title>Nobel Laureates in Literature</title></head>')
    f.write('<body><h1>Nobel Laureates in Literature over the Years</h1>')
    f.write('<ul>')
    for result in results["results"]["bindings"]:
        if "author" in result:
            # Create a Wikipedia link from the last segment of the DBpedia resource URI
            wikiurl = "http://en.wikipedia.org/wiki/" + result["author"]["value"].split('/')[-1]
        else:
            wikiurl = 'NONE'
        # Unbound variables are missing from a result row, so fall back to defaults
        name = result.get("authorName", {}).get("value", 'NONE')
        date = result.get("date", {}).get("value", 'NONE')
        description = result.get("description", {}).get("value", ' ')
        pic = result.get("thumbnail", {}).get("value", PLACEHOLDER_PIC)
        # DBpedia thumbnail URLs typically end in "?width=300"; request a smaller image
        pic = pic.replace("width=300", "width=60")

        f.write('<li><b>{}</b> -- <img src="{}" height="60px"> <a href="{}">{}</a>, {} </li>'.format(
            html.escape(date), html.escape(pic), html.escape(wikiurl), html.escape(name), html.escape(description)))

    f.write('</ul>')
    f.write('</body></html>')

files.download('authors.html')
IPython.display.HTML(filename='authors.html')
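The cell above writes a bulleted list by hand. As a variant, the sketch below turns any set of SPARQL JSON bindings into an HTML table and escapes every value with `html.escape`, so characters such as `<` or `&` in a description cannot break the markup. `bindings_to_html_table` is a hypothetical helper and the example row is made up.

```python
import html


def bindings_to_html_table(variables, bindings, missing="NONE"):
    """Render SPARQL JSON result bindings as an HTML table, escaping every cell."""
    header = "".join(f"<th>{html.escape(var)}</th>" for var in variables)
    rows = []
    for binding in bindings:
        cells = []
        for var in variables:
            # Unbound variables are absent from a binding, so use the placeholder
            value = binding[var]["value"] if var in binding else missing
            cells.append(f"<td>{html.escape(value)}</td>")
        rows.append("<tr>" + "".join(cells) + "</tr>")
    body = "".join(rows)
    return f"<table><tr>{header}</tr>{body}</table>"


# Made-up example row containing a character that must be escaped
example = [{"authorName": {"type": "literal", "value": "A & B"}}]
print(bindings_to_html_table(["authorName", "date"], example))
```

With the real query results, `IPython.display.HTML(bindings_to_html_table(results["head"]["vars"], results["results"]["bindings"]))` should render the same data as a table directly in the notebook.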




#### **Query 2**: List all the poets in DBpedia ####

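Remember, there is no single correct way to model a given fact! In DBpedia, for example, the fact that a person is or was a poet is modelled in at least three different ways: with the class `dbo:Poet` (`rdf:type dbo:Poet`), with the resource `dbr:Poet` as the value of `dbo:occupation`, and as free text in `dbp:occupation`.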
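Of the three modelling variants (`rdf:type dbo:Poet`, `dbo:occupation dbr:Poet` and a free-text `dbp:occupation` literal), only the last one binds `?occupation`. A filter on `?occupation` placed outside the `UNION`, as in the query below, therefore also removes every match of the first two patterns. The sketch below assembles a variant in which the language and regex filters live inside the literal branch, so that all three patterns can contribute. `build_poet_query`, `POET_PATTERNS` and the `LIMIT` are assumptions for illustration; the query has not been checked against DBpedia's result-size or timeout limits.

```python
PREFIXES = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX dbr: <http://dbpedia.org/resource/>
"""

# The three ways DBpedia records that a person is a poet
POET_PATTERNS = [
    "?person rdf:type dbo:Poet .",
    "?person dbo:occupation dbr:Poet .",
    # Only this branch binds ?occupation, so its filters are scoped to it
    '?person dbp:occupation ?occupation . '
    'FILTER (LANG(?occupation) = "en" && REGEX(?occupation, "[pP]oet"))',
]


def build_poet_query(limit: int = 100) -> str:
    """Combine the patterns with UNION; ?occupation stays unbound for the first two."""
    union = "\n  UNION\n".join("  { " + p + " }" for p in POET_PATTERNS)
    return (
        PREFIXES
        + "SELECT DISTINCT ?person ?name ?occupation\nWHERE {\n"
        + union
        + "\n  ?person rdfs:label ?name .\n"
        + '  FILTER (LANG(?name) = "en")\n'
        + f"}}\nLIMIT {limit}\n"
    )


print(build_poet_query(limit=10))
```

To try it, pass the string to `sparql.setQuery(build_poet_query())` and run `sparql.query().convert()` as in the other cells; rows that come from the first two patterns will have no `occupation` value.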


In [None]:
sparql.setQuery("""
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT DISTINCT ?name ?occupation
WHERE {
  { ?person rdf:type dbo:Poet }
  UNION
  { ?person dbo:occupation dbr:Poet }
  UNION
  { ?person dbp:occupation ?occupation } .
  ?person rdfs:label ?name .
  # ?occupation is only bound by the dbp:occupation pattern, so these
  # filters also discard the matches of the first two patterns
  FILTER ((LANG(?name) = "en") && (LANG(?occupation) = "en")) .
  FILTER (REGEX(?occupation, "[pP]oet"))
}
""")
results = sparql.query().convert()   # execute the SPARQL query and store the result in "results"
results_df = pd.json_normalize(results['results']['bindings'])  # flatten the JSON bindings into a DataFrame
results_df[['name.value', 'occupation.value']]



|  | name.value | occupation.value |
|---|---|---|
| 0 | Caetano da Costa Alegre | poet |
| 1 | Cai Yan | Composer, poet, writer |
| 2 | Cale Young Rice | poet and dramatist |
| 3 | Camerina Pavón y Oviedo | Poet |
| 4 | Camil Petrescu | poet |
| ... | ... | ... |
| 9491 | Telimxan | Iranian Poet |
| 9492 | Taije Silverman | Poet, Translator, Professor |
| 9493 | Xavier de Magallon | Poet, translator, politician |
| 9494 | Yan Shu | Calligrapher, essayist, poet, and politician |
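The column names `name.value` and `occupation.value` come from `pd.json_normalize`, which flattens nested dicts by joining the keys with a `.` separator. For a single binding, this flattening can be sketched with plain Python as follows. `flatten_binding` is a hypothetical helper that mimics the naming for the one level of nesting in SPARQL bindings; it is not pandas' implementation, and the `type`/`xml:lang` keys of the sample row are assumed from the JSON results format and the `LANG(...) = "en"` filters.

```python
def flatten_binding(binding, sep="."):
    """Flatten {"var": {"key": value}} into {"var.key": value}, as json_normalize names columns."""
    return {
        f"{var}{sep}{key}": value
        for var, term in binding.items()
        for key, value in term.items()
    }


row = {
    "name": {"type": "literal", "xml:lang": "en", "value": "Camil Petrescu"},
    "occupation": {"type": "literal", "xml:lang": "en", "value": "poet"},
}
print(flatten_binding(row))
```

This is also why `results_df` contains further columns such as `name.type` and `name.xml:lang` next to the two columns selected above.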
