# Working with the RDFLib package, or: in the beginning was the triple


## About the RDFLib package

RDFLib is a Python package for working with the RDF data model. The main interface in the package is `Graph`. A graph in RDFLib is an unordered set of triples:

```
[
 (subject_0, predicate_0, object_0),
 (subject_1, predicate_1, object_1),
 ...
 (subject_N, predicate_N, object_N)
]
```

You can therefore work with it much like an ordinary set: `add()` adds a triple to the graph, and other methods search for triples and return them in arbitrary order.


The RDFLib package includes:
- parsers and serializers for RDF/XML, N3, NTriples, N-Quads, Turtle, TriX, RDFa, Microdata and JSON-LD
- a triple store for keeping triples in memory, as well as for persistent storage in a Berkeley DB database
- inference mechanisms for both a single graph and multiple named graphs
- an implementation of SPARQL 1.1, including support for SELECT and UPDATE queries.
     



The package documentation and the discussions of the community that develops it are available at:
- [https://rdflib.readthedocs.io/en/stable/](https://rdflib.readthedocs.io/en/stable/)
- [https://github.com/RDFLib/rdflib/](https://github.com/RDFLib/rdflib/)
- [https://www.w3.org/RDF/](https://www.w3.org/RDF/)
- [http://groups.google.com/group/rdflib-dev](http://groups.google.com/group/rdflib-dev)

## Installation

Install RDFLib with the command ```!pip3 install rdflib``` (or ```!pip install rdflib```).

In [None]:
# !pip3 install rdflib

To use the package, import it as follows:

In [13]:
import rdflib

## Parsing an existing graph

First, take a look at what is behind the link http://www.w3.org/People/Berners-Lee/card. You will see a rather unexciting web page. There is something interesting about it, though: it was generated from a graph! The Semantic Web gives us the tools to process graphs like this in any way we like. Let's try!

In [15]:
# Create an empty graph named g_tbl ("tbl" for Tim Berners-Lee).
from rdflib import Graph
g_tbl = Graph()

# Fill the graph with the triples of the graph behind "http://www.w3.org/People/Berners-Lee/card".
g_tbl.parse(source='http://www.w3.org/People/Berners-Lee/card', format='xml')

<Graph identifier=N5b7775b942cb4ec8af86d76382c5ac14 (<class 'rdflib.graph.Graph'>)>

If you do not specify the format in which the graph was saved, the RDFLib version used for these cells assumes RDF/XML (so we did not have to add `format='xml'` in the cell above). Newer releases (6.0 and later) instead try to guess the format and fall back to Turtle, so passing `format` explicitly is the safer habit.

If we are not sure which serialization a graph uses, we can call the function `rdflib.util.guess_format()`, which guesses the serialization from the file extension (if all we have is a URL without an extension, such as "http://www.w3.org/People/Berners-Lee/card", the function will not help).

In [26]:
from rdflib.util import guess_format
print('The format of your graph is:', guess_format('my_file.rdf'))
print('The format of your graph is:', guess_format('my_file.ttl'))
print('The format of your graph is:', guess_format('http://www.w3.org/People/Berners-Lee/card'))

The format of your graph is: xml
The format of your graph is: turtle
The format of your graph is: None


But let's get back to our graph `g_tbl`.

In [27]:
# Check how many triples the graph g_tbl contains.
print("Graph g_tbl has {} triples.".format(len(g_tbl)))

Graph g_tbl has 86 triples.


In [28]:
# Display the contents of g_tbl in the Turtle serialization.
# (In RDFLib 6 and later serialize() already returns str, so .decode() is not needed there.)
print(g_tbl.serialize(format="turtle").decode("utf-8"))

@prefix : <http://xmlns.com/foaf/0.1/> .
@prefix cc: <http://creativecommons.org/ns#> .
@prefix cert: <http://www.w3.org/ns/auth/cert#> .
@prefix con: <http://www.w3.org/2000/10/swap/pim/contact#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix doap: <http://usefulinc.com/ns/doap#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix ldp: <http://www.w3.org/ns/ldp#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix sioc: <http://rdfs.org/sioc/ns#> .
@prefix solid: <http://www.w3.org/ns/solid/terms#> .
@prefix space: <http://www.w3.org/ns/pim/space#> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://dig.csail.mit.edu/2005/ajar/ajaw/data#Tabulator> doap:developer <https://www.w3.org/People/Berners-Lee/card#i> .

<http://dig.csail.mit.edu/2007/01/camp/data#course> :maker <https://www.w3.org/People/Berner

In [29]:
# Display the contents of g_tbl in the RDF/XML serialization.
print(g_tbl.serialize(format="xml").decode("utf-8"))

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
   xmlns="http://xmlns.com/foaf/0.1/"
   xmlns:cc="http://creativecommons.org/ns#"
   xmlns:cert="http://www.w3.org/ns/auth/cert#"
   xmlns:con="http://www.w3.org/2000/10/swap/pim/contact#"
   xmlns:dc="http://purl.org/dc/elements/1.1/"
   xmlns:dct="http://purl.org/dc/terms/"
   xmlns:doap="http://usefulinc.com/ns/doap#"
   xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
   xmlns:ldp="http://www.w3.org/ns/ldp#"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
   xmlns:schema="http://schema.org/"
   xmlns:sioc="http://rdfs.org/sioc/ns#"
   xmlns:solid="http://www.w3.org/ns/solid/terms#"
   xmlns:space="http://www.w3.org/ns/pim/space#"
   xmlns:vcard="http://www.w3.org/2006/vcard/ns#"
>
  <rdf:Description rdf:about="https://www.w3.org/People/Berners-Lee/card#i">
    <rdfs:seeAlso rdf:resource="https://timbl.com/timbl/Public/friends.ttl"/>
    <name>Timothy Berners-Lee</na

Note: if you do not remember what graph serializations are, go back to the introductory section of this chapter.

RDFLib supports the serializations ``turtle`` (``ttl`` for short), ``xml``, ``n3`` and ``ntriples`` (``nt`` for short). You can also use ``JSON-LD`` (which requires an additional plugin) and ``trix`` when dealing with named graphs or triple stores.

In [30]:
# Save the graph g_tbl in the Turtle serialization to a file named "tbl.ttl".
g_tbl.serialize(destination='tbl.ttl', format='turtle')

## Creating our own graph

To create our own graph we need to recall what triples in an RDF graph can look like. Triples can contain resource identifiers, blank nodes in the subject and object positions, and literals in the object position.

RDFLib lets you create
- an identifier of any resource with the ``URIRef`` class,
- a literal with the ``Literal`` class, and
- a blank node with the ``BNode`` class.

In [None]:
# Import the classes "URIRef", "Literal" and "BNode".
from rdflib import URIRef, Literal, BNode

In [None]:
uri_jana_kowalskiego = URIRef('http://ksiazka-si.pl/jan_kowalski')
uri_nazywa_sie = URIRef('http://ksiazka-si.pl/nazywa_sie')
uri_zna = URIRef('http://ksiazka-si.pl/zna')
literal_jan_kowalski_pl = Literal('Jan Kowalski', lang='pl')
bnode_kogos = BNode()
literal_mariana_zuka_pl = Literal('Marian Żuk', lang='pl')

Below we create a new graph stating that the resource identified by "http://ksiazka-si.pl/jan_kowalski" is named (in Polish) Jan Kowalski and knows someone who is named Marian Żuk.

In [None]:
# Create a new graph.
g = Graph()

# Add triples to the graph.
g.add((uri_jana_kowalskiego, uri_nazywa_sie, literal_jan_kowalski_pl))
g.add((uri_jana_kowalskiego, uri_zna, bnode_kogos))
g.add((bnode_kogos, uri_nazywa_sie, literal_mariana_zuka_pl))

# Display the contents of the graph in the Turtle serialization.
print(g.serialize(format='turtle').decode("utf-8"))

In the example above we used two relations that we made up ourselves, "http://ksiazka-si.pl/nazywa_sie" and "http://ksiazka-si.pl/zna". The idea of the Semantic Web, however, is to invent new relations only when no suitable ones exist yet. As it happens, counterparts of both relations can be found in the FOAF vocabulary (http://xmlns.com/foaf/spec/). RDFLib supports FOAF.

In [None]:
# Import the FOAF namespace.
from rdflib.namespace import FOAF

Let's now modify the example above so that it uses the relations from FOAF.

In [None]:
# Create a new graph.
g = Graph()

# Add triples to the graph.
g.add((uri_jana_kowalskiego, FOAF.name, literal_jan_kowalski_pl))
g.add((uri_jana_kowalskiego, FOAF.knows, bnode_kogos))
g.add((bnode_kogos, FOAF.name, literal_mariana_zuka_pl))

# Display the contents of the graph in the Turtle serialization.
print(g.serialize(format='turtle').decode("utf-8"))

To change the prefix of the FOAF namespace, we have to define it before we start filling the graph with triples.

In [None]:
# Create a new graph.
g = Graph()

# Bind the prefix "foaf" to the FOAF namespace.
g.bind('foaf', FOAF)

# Add triples to the graph.
g.add((uri_jana_kowalskiego, FOAF.name, literal_jan_kowalski_pl))
g.add((uri_jana_kowalskiego, FOAF.knows, bnode_kogos))
g.add((bnode_kogos, FOAF.name, literal_mariana_zuka_pl))

# Display the contents of the graph in the Turtle serialization.
print(g.serialize(format='turtle').decode("utf-8"))

In [None]:
from rdflib.namespace import CSVW, DC, DCAT, DCTERMS, DOAP, FOAF, ODRL2, ORG, OWL,\
                             PROF, PROV, RDF, RDFS, SDO, SH, SKOS, SOSA, SSN, TIME,\
                             VOID, XMLNS, XSD

print(RDF.type)
# = rdflib.term.URIRef("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")

print(FOAF.knows)
# = rdflib.term.URIRef("http://xmlns.com/foaf/0.1/knows")

print(PROF.isProfileOf)
# = rdflib.term.URIRef("http://www.w3.org/ns/dx/prof/isProfileOf")

print(SOSA.Sensor)
# = rdflib.term.URIRef("http://www.w3.org/ns/sosa/Sensor")

In [None]:
# g.add((ed, FOAF.nick, Literal("ed", datatype=XSD.string))) 
# g.add((ed, FOAF.mbox, URIRef("mailto:e.scissorhands@example.org")))

Literals can also be created from Python objects, which produces data-typed literals. For details of the mapping, see the Literals section of the RDFLib documentation.

For creating many URIRefs in the same `namespace`, i.e. URIs with the same prefix, RDFLib has the `rdflib.namespace.Namespace` class:

In [None]:
from rdflib import Namespace

n = Namespace('http://ksiazka-si.pl/')

uri_jana_kowalskiego = URIRef(n.jan_kowalski)
uri_nazywa_sie = URIRef(n.nazywa_sie)
uri_zna = URIRef(n.zna)
literal_jan_kowalski_pl = Literal('Jan Kowalski', lang='pl')
bnode_kogos = BNode()
literal_mariana_zuka_pl = Literal('Marian Żuk', lang='pl')

# Create a new graph.
g = Graph()

g.bind('foaf', FOAF)

# Add triples to the graph.
g.add((uri_jana_kowalskiego, FOAF.name, literal_jan_kowalski_pl))
g.add((uri_jana_kowalskiego, FOAF.knows, bnode_kogos))
g.add((bnode_kogos, FOAF.name, literal_mariana_zuka_pl))

# Display the contents of the graph in the Turtle serialization.
print(g.serialize(format='turtle').decode("utf-8"))

For some properties, only one value per resource makes sense (i.e. they are functional properties, or have a max-cardinality of 1). The `set()` method is useful for this. The first cell below shows what plain `add()` does instead: both values are kept.

In [None]:
# Create a new graph.
g = Graph()

g.bind('foaf', FOAF)

# add() keeps both values: the graph now holds two nicknames for the same resource.
g.add((uri_jana_kowalskiego, FOAF.nick, Literal('Wafel')))
g.add((uri_jana_kowalskiego, FOAF.nick, Literal('Chudy')))

# Display the contents of the graph in the Turtle serialization.
print(g.serialize(format='turtle').decode("utf-8"))

In [None]:
print(g.value(uri_jana_kowalskiego, FOAF.nick))

In [None]:
# Create a new graph.
g = Graph()

g.bind('foaf', FOAF)

g.add((uri_jana_kowalskiego, FOAF.nick, Literal('Wafel')))
g.add((uri_jana_kowalskiego, FOAF.nick, Literal('Chudy')))
g.set((uri_jana_kowalskiego, FOAF.nick, Literal('Szczypior')))  # replaces both nicknames added above

# Display the contents of the graph in the Turtle serialization.
print(g.serialize(format='turtle').decode("utf-8"))

## Removing triples from a graph

Triples are removed from a graph with the `remove()` method.

In [None]:
# Create a new graph.
g = Graph()

g.bind('foaf', FOAF)

# Add triples to the graph.
g.add((uri_jana_kowalskiego, FOAF.name, literal_jan_kowalski_pl))
g.add((uri_jana_kowalskiego, FOAF.knows, bnode_kogos))
g.add((bnode_kogos, FOAF.name, literal_mariana_zuka_pl))

# Display the contents of the graph in the Turtle serialization.
print('BEFORE:')
print(g.serialize(format='turtle').decode("utf-8"))

# Remove all triples whose subject is uri_jana_kowalskiego.
g.remove((uri_jana_kowalskiego, None, None))

# Display the contents of the graph in the Turtle serialization.
print('AFTER:')
print(g.serialize(format='turtle').decode("utf-8"))

## Filtering a graph

You will now meet a number of methods for filtering a graph, i.e. for finding the part of a graph that satisfies given conditions.

In [31]:
from rdflib import Graph
g_tbl = Graph()
g_tbl.parse(source='http://www.w3.org/People/Berners-Lee/card')

<Graph identifier=N2218f9d79e974237ac4b18273264d5fc (<class 'rdflib.graph.Graph'>)>

Start by printing all the triples. You can do it like this:

In [36]:
for triple in g_tbl:
    print(triple) 

(rdflib.term.URIRef('https://www.w3.org/People/Berners-Lee/card#i'), rdflib.term.URIRef('http://www.w3.org/ns/solid/terms#profileBackgroundColor'), rdflib.term.Literal('#ffffff'))
(rdflib.term.URIRef('http://www.ecs.soton.ac.uk/~dt2/dlstuff/www2006_data#panel-panelk01'), rdflib.term.URIRef('http://www.w3.org/2000/10/swap/pim/contact#participant'), rdflib.term.URIRef('https://www.w3.org/People/Berners-Lee/card#i'))
(rdflib.term.BNode('N9b604ad43d7145f18b6dc08d3b2dcf08'), rdflib.term.URIRef('http://www.w3.org/2000/10/swap/pim/contact#country'), rdflib.term.Literal('USA'))
(rdflib.term.URIRef('https://www.w3.org/People/Berners-Lee/card#i'), rdflib.term.URIRef('http://www.w3.org/2000/10/swap/pim/contact#office'), rdflib.term.BNode('N42fc8a4c1a84433bbc9871a4a6f7f21c'))
(rdflib.term.URIRef('https://www.w3.org/People/Berners-Lee/card#i'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/family_name'), rdflib.term.Literal('Berners-Lee'))
(rdflib.term.BNode('N9b604ad43d7145f18b6dc08d3b2dcf08'), rd

or like this:

In [34]:
for s, p, o in g_tbl:
    print("s =", s, "\n\t p =", p, "\n\t\t o =", o)

s = https://www.w3.org/People/Berners-Lee/card#i 
	 p = http://www.w3.org/ns/solid/terms#profileBackgroundColor 
		 o = #ffffff
s = http://www.ecs.soton.ac.uk/~dt2/dlstuff/www2006_data#panel-panelk01 
	 p = http://www.w3.org/2000/10/swap/pim/contact#participant 
		 o = https://www.w3.org/People/Berners-Lee/card#i
s = N9b604ad43d7145f18b6dc08d3b2dcf08 
	 p = http://www.w3.org/2000/10/swap/pim/contact#country 
		 o = USA
s = https://www.w3.org/People/Berners-Lee/card#i 
	 p = http://www.w3.org/2000/10/swap/pim/contact#office 
		 o = N42fc8a4c1a84433bbc9871a4a6f7f21c
s = https://www.w3.org/People/Berners-Lee/card#i 
	 p = http://xmlns.com/foaf/0.1/family_name 
		 o = Berners-Lee
s = N9b604ad43d7145f18b6dc08d3b2dcf08 
	 p = http://www.w3.org/2000/10/swap/pim/contact#street 
		 o = 32 Vassar Street
s = https://www.w3.org/People/Berners-Lee/card#i 
	 p = http://www.w3.org/ns/solid/terms#editableProfile 
		 o = https://timbl.com/timbl/Public/friends.ttl
s = https://www.w3.org/People/Berners-L

This is of course not filtering yet. But once you add an `if`, it becomes a filter of sorts:

In [55]:
from rdflib import URIRef
from rdflib.namespace import RDFS

tbl = URIRef('https://www.w3.org/People/Berners-Lee/card#i')

for s, p, o in g_tbl:
    if s == tbl and p == RDFS.label:
        print('"{}" is the label of the resource identified by {}'.format(o, s))

"Tim Berners-Lee" is the label of the resource identified by https://www.w3.org/People/Berners-Lee/card#i


Instead of iterating through all triples, RDFLib graphs support basic triple pattern matching with the `triples()` method. It is a generator of the triples that match the pattern passed as its argument. The components of the pattern are RDF terms that restrict which triples are returned; a component set to `None` acts as a wildcard. For example:

In [59]:
from rdflib.namespace import RDF

for s, p, o in g_tbl.triples((tbl, RDF.type, None)):
    print("{} is a {}".format(s, o)) 

https://www.w3.org/People/Berners-Lee/card#i is a http://www.w3.org/2000/10/swap/pim/contact#Male
https://www.w3.org/People/Berners-Lee/card#i is a http://xmlns.com/foaf/0.1/Person


In [None]:
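from rdflib import Graph, Literal, URIRef
from rdflib.namespace import FOAF, RDF

# Collect every triple of g_tbl that declares a foaf:Person.
peoplegraph = Graph()
peoplegraph += g_tbl.triples((None, RDF.type, FOAF.Person))
print(len(peoplegraph))

# Build a second, hand-made graph about bob.
bobgraph = Graph()
bob = URIRef("http://example.org/people/bob")
linda = URIRef("http://example.org/people/linda")
name = Literal("bob")
bobgraph.add((bob, RDF.type, FOAF.Person))
bobgraph.add((bob, FOAF.name, name))
bobgraph.add((bob, FOAF.knows, linda))
print(len(bobgraph))

# The + operator returns a new graph holding the union of both graphs.
newgraph = peoplegraph + bobgraph
print(len(newgraph))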


If you are not interested in whole triples, you can get only the parts you want with the methods `objects()`, `subjects()`, `predicates()`, `predicate_objects()`, etc. Each takes parameters for the components of the triple to constrain:

In [None]:
for person in g_tbl.subjects(RDF.type, FOAF.Person):
    print("{} is a person".format(person))

Finally, for some properties, only one value per resource makes sense (i.e they are functional properties, or have max-cardinality of 1). The `value()` method is useful for this, as it returns just a single node, not a generator:

In [None]:
bobgraph = Graph()
bob = URIRef("http://example.org/people/bob")
linda = URIRef("http://example.org/people/linda")
name = Literal("bob")
bobgraph.add((bob, RDF.type, FOAF.Person)) 
bobgraph.add((bob, FOAF.name, name))
bobgraph.add((bob, FOAF.knows, linda)) 

name = bobgraph.value(bob, FOAF.name) # get any name of bob
print(name)

# get the one person that knows linda; raises UniquenessError if more than one is found
knower = bobgraph.value(predicate=FOAF.knows, object=linda, any=False)
print(knower)

### Graph methods for accessing triples 

Here is a list of all convenience methods for querying Graphs:

- `Graph.label(subject, default='')`

Query for the RDFS.label of the subject

Return default if no label exists or any label if multiple exist.

- `Graph.preferredLabel(subject, lang=None, default=None, labelProperties=(rdflib.term.URIRef('http://www.w3.org/2004/02/skos/core#prefLabel'), rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#label')))`

Find the preferred label for subject.

By default prefers skos:prefLabels over rdfs:labels. In case at least one prefLabel is found returns those, else returns labels. In case a language string (e.g., “en”, “de” or even “” for no lang-tagged literals) is given, only such labels will be considered.

Return a list of (labelProp, label) pairs, where labelProp is either skos:prefLabel or rdfs:label.

In [None]:
from rdflib import ConjunctiveGraph, URIRef, RDFS, Literal
from rdflib.namespace import SKOS
from pprint import pprint
g = ConjunctiveGraph()
u = URIRef("http://example.com/foo")
g.add([u, RDFS.label, Literal("foo")])
g.add([u, RDFS.label, Literal("bar")])
pprint(sorted(g.preferredLabel(u)))

In [None]:
g.add([u, SKOS.prefLabel, Literal("bla")])
pprint(g.preferredLabel(u))

In [None]:
g.add([u, SKOS.hiddenLabel, Literal("blubb", lang="en")])
sorted(g.preferredLabel(u))

In [None]:
g.preferredLabel(u, lang="")

In [None]:
pprint(g.preferredLabel(u, lang="en"))

- `Graph.triples(triple)` 

Generator over the triple store

Returns triples that match the given triple pattern. If triple pattern does not provide a context, all contexts will be searched.

- `Graph.value(subject=None, predicate=rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#value'), object=None, default=None, any=True)`

Get a value for a pair of two criteria.

Exactly one of subject, predicate, object must be None. Useful if one knows that there may only be one value.

This situation occurs a lot, hence this macro-like utility.

Parameters: `subject`, `predicate`, `object` (exactly one must be None); `default` (the value returned if no values are found); `any` (if True, return any value when there is more than one, otherwise raise UniquenessError).

- `Graph.subjects(predicate=None, object=None)`

A generator of subjects with the given predicate and object

- `Graph.objects(subject=None, predicate=None)`

A generator of objects with the given subject and predicate

- `Graph.predicates(subject=None, object=None)`

A generator of predicates with the given subject and object

- `Graph.subject_objects(predicate=None)`

A generator of (subject, object) tuples for the given predicate

- `Graph.subject_predicates(object=None)`

A generator of (subject, predicate) tuples for the given object

- `Graph.predicate_objects(subject=None)`

A generator of (predicate, object) tuples for the given subject


## Querying with SPARQL <a class="anchor" id="chapter1.5"></a>

#### Run a Query

RDFLib comes with an implementation of the SPARQL 1.1 Query and SPARQL 1.1 Update languages. See: https://www.w3.org/TR/rdf-sparql-query/

Queries can be evaluated against a graph with the `rdflib.graph.Graph.query()` method, and updates with `rdflib.graph.Graph.update()`.

The query method returns a `rdflib.query.Result` instance. 

- For SELECT queries, iterating over the result returns `rdflib.query.ResultRow` instances, each containing a set of variable bindings.

- For CONSTRUCT/DESCRIBE queries, iterating over the result object gives the triples.

- For ASK queries, iterating yields the single boolean answer; the same answer is obtained by evaluating the result object in a boolean context (i.e. `bool(result)`).

The general shape of a SELECT query is:

```
PREFIX rdfs: <...>
PREFIX owl: <...>
PREFIX foaf: <...>

SELECT ?var1 ?var2 ...
FROM #graph
WHERE { 
    #query pattern with ?var1 and ?var2
}
```

In [None]:
import rdflib

g = rdflib.Graph()

# ... add some triples to g somehow ...
g.parse("http://www.w3.org/People/Berners-Lee/card")

In [None]:
print(g.serialize(format="turtle").decode("utf-8"))

In [None]:
print(g.value(rdflib.URIRef("https://www.w3.org/People/Berners-Lee/card#i"),rdflib.namespace.RDFS.label))

In [None]:
qres = g.query(
    """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX con: <http://www.w3.org/2000/10/swap/pim/contact#>
    SELECT DISTINCT ?aname ?astreet
        WHERE {
            ?a foaf:name ?aname ;
               con:office ?_office .
            ?_office con:address ?_address .
            ?_address con:street ?astreet .
        }""")

for row in qres:
    print("%s lives at %s." % row)

The results are tuples of values in the same order as the variables in your SELECT clause. Alternatively, the values can be accessed by variable name, either as attributes or as items: `row.aname` and `row["aname"]` are equivalent.

In [None]:
row["aname"]

In [None]:
row.aname

As an alternative to using `PREFIX` in the SPARQL query, namespace bindings can be passed in with the `initNs` kwarg.

Variables can also be pre-bound: the `initBindings` kwarg takes a dict of initial bindings. This is particularly useful for prepared queries, as described below.

#### Query a Remote Service

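The SERVICE keyword of SPARQL 1.1 can send a query to a remote SPARQL endpoint. The cells below take a different route: they open a graph backed by the `SPARQLStore` store, so every query on that graph is sent to the remote endpoint.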

In [10]:
import rdflib

g = rdflib.Graph(store="SPARQLStore")
g.open(configuration="https://lei.info/sparql")

qres = g.query('''
               SELECT DISTINCT * 
               WHERE {?s ?p ?o .} 
               LIMIT 13
               ''') 

for row in qres: 
    print(row)

(rdflib.term.BNode('bE23695BBx47929984'), rdflib.term.URIRef('http://lei.info/voc/l1/registrationAuthority'), rdflib.term.URIRef('http://lei.info/95980032D9G16EDM0J06'))
(rdflib.term.BNode('bE23695BBx47929983'), rdflib.term.URIRef('http://lei.info/voc/l1/legalAddress'), rdflib.term.URIRef('http://lei.info/95980032D9G16EDM0J06'))
(rdflib.term.Literal('DP3Q', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')), rdflib.term.URIRef('http://lei.info/voc/l1/entityLegalFormCode'), rdflib.term.URIRef('http://lei.info/95980032D9G16EDM0J06'))
(rdflib.term.URIRef('http://lei.info/95980032D9G16EDM0J06#lei'), rdflib.term.URIRef('http://lei.info/voc/l1/identifiedBy'), rdflib.term.URIRef('http://lei.info/95980032D9G16EDM0J06'))
(rdflib.term.URIRef('graph://lei.info/627aefeaa947098c706c956b81aaaf30926a6cdafc489ddca36fad52222944c1'), rdflib.term.URIRef('http://lei.info/voc/l1/latestGraph'), rdflib.term.URIRef('graph://lei.info/95980032D9G16EDM0J06'))
(rdflib.term.URIRef('graph://lei

In [12]:
g = rdflib.Graph()
g.parse("https://lei.info/X9AJL60ON2ZGVBEMAJ31")
for s,p,o in g:
    print(s, p, o)

Nd5e088ff3c0e448088207db098da4549 http://lei.info/voc/l1/city Basel
http://lei.info/X9AJL60ON2ZGVBEMAJ31 http://lei.info/voc/l1/registrationAuthority N6daad966bdcf484d883ec17501abad52
http://lei.info/X9AJL60ON2ZGVBEMAJ31 http://lei.info/voc/l1/primaryValidationAuthority N3fc2fb113f284a65a8e9d4efddfcfff2
http://lei.info/X9AJL60ON2ZGVBEMAJ31 http://lei.info/voc/l1/headquartersAddress Nd5e088ff3c0e448088207db098da4549
http://lei.info/X9AJL60ON2ZGVBEMAJ31 http://lei.info/voc/l1/identifiedBy http://lei.info/X9AJL60ON2ZGVBEMAJ31#lei
Nd5e088ff3c0e448088207db098da4549 http://lei.info/voc/l1/postalCode 4052
N40bb97ca35e94ceba01991c077264fec http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://lei.info/voc/l1/Address
N3fc2fb113f284a65a8e9d4efddfcfff2 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://lei.info/voc/l1/ValidationAuthority
Nd5e088ff3c0e448088207db098da4549 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://lei.info/voc/l1/Address
N40bb97ca35e94ceba01991c077264fec http://le

#### Prepared Queries

RDFLib lets you prepare queries before execution, which saves re-parsing the query and translating it into SPARQL algebra each time.

The function `rdflib.plugins.sparql.prepareQuery()` takes a query as a string and returns a `rdflib.plugins.sparql.sparql.Query` object. This can then be passed to the `rdflib.graph.Graph.query()` method.
The `initBindings` kwarg can be used to pass in a dict of initial bindings:

In [None]:
from rdflib.namespace import FOAF
from rdflib.plugins.sparql import prepareQuery

q = prepareQuery('SELECT ?s WHERE { ?person foaf:name ?s .}', initNs = {"foaf": FOAF})

g = rdflib.Graph() 
g.parse("http://www.w3.org/People/Berners-Lee/card")

tim = rdflib.URIRef("https://www.w3.org/People/Berners-Lee/card#i")

for row in g.query(q, initBindings={'person': tim}): 
    print(row)

#### Custom Evaluation Functions

For experts, it is possible to override how parts of the SPARQL algebra are evaluated. A custom function can be registered by using the setuptools entry point `rdf.plugins.sparqleval`, or simply by adding an entry to `rdflib.plugins.sparql.CUSTOM_EVALS`. The function will be called for each algebra component and may raise `NotImplementedError` to indicate that this part should be handled by the default implementation.

#### EXAMPLES

In [None]:
# sparqlstore_example.py

from rdflib import Graph, URIRef, Namespace
from rdflib.plugins.stores.sparqlstore import SPARQLStore

dbo = Namespace("http://dbpedia.org/ontology/")

In [3]:
# EXAMPLE 1: using a Graph with the Store type string set to "SPARQLStore"
graph = Graph("SPARQLStore", identifier="http://dbpedia.org")
graph.open("http://dbpedia.org/sparql")

pop = graph.value(URIRef("http://dbpedia.org/resource/Berlin"), dbo.populationTotal)

print("According to DBPedia, Berlin has a population of {0:,}.".format(int(pop)))


In [None]:
# # EXAMPLE 2: using a SPARQLStore object directly
# st = SPARQLStore("http://dbpedia.org/sparql")

# for p in st.objects(URIRef("http://dbpedia.org/resource/Brisbane"), dbo.populationTotal):
#     print(
#         "According to DBPedia, Brisbane has a population of "
#         "{0:,}".format(int(p))
#     )

In [None]:
# EXAMPLE 3: doing RDFlib triple navigation using SPARQLStore as a Graph()
from rdflib.namespace import RDF, SKOS

graph = Graph("SPARQLStore", identifier="http://dbpedia.org")
graph.open("http://dbpedia.org/sparql")

# we are asking DBPedia for 3 skos:Concept instances
count = 0

for s in graph.subjects(predicate=RDF.type, object=SKOS.Concept):
    count += 1
    print(s)
    if count >= 3:
        break

In [None]:
# # EXAMPLE 4: using a SPARQL endpoint that requires Basic HTTP authentication
# # NOTE: this example won't run since the endpoint isn't live (or real)
# s = SPARQLStore(query_endpoint="http://fake-sparql-endpoint.com/repository/x", auth=("my_username", "my_password"))
# # do normal Graph things

In [None]:
# sparql_update_example.py

import rdflib

g = rdflib.Graph()
g.parse("http://www.w3.org/People/Berners-Lee/card")

print("Initially there are {} triples in the graph".format(len(g)))

g.update(
    """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX dbpedia: <http://dbpedia.org/resource/>
    INSERT
        { ?s a dbpedia:Human . }
    WHERE
        { ?s a foaf:Person . }
    """
)
print("After the UPDATE, there are {} triples in the graph".format(len(g)))

### Utilities and convenience functions <a class="anchor" id="chapter1.6"></a>

#### User-friendly labels

Use `label()` to quickly look up the RDFS label of something, or better, use `preferredLabel()` to find a label using several different properties (e.g. rdfs:label, skos:prefLabel, dc:title, etc.).

In [None]:
import rdflib
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDFS, SKOS, DC

ns = Namespace("http://kul.pl/ontology/")

g = Graph()
g.add((ns.bob, RDFS.label, Literal("Bob")))
g.add((ns.bob, SKOS.prefLabel, Literal("Bobinio")))
g.add((ns.bob, DC.title, Literal("Bob Title")))

In [None]:
print("-", g.label(ns.bob))
print("-", g.preferredLabel(ns.bob))

#### Functional properties
Use `value()` and `set()` to work with functional properties, i.e. properties that can only occur once for a resource.

In [None]:
print(g.value(ns.bob, RDFS.label))
print(g.value(ns.bob, DC.title))

In [None]:
g.set((ns.bob, DC.title, Literal("New Bob's Title")))

In [None]:
print(g.value(ns.bob, DC.title))

#### Slicing graphs

Python allows slicing sequences with a slice object, a triple of start index, stop index and step size:
```
>>> list(range(10))[2:9:3]
[2, 5, 8]
```
 
RDFLib graphs override ``__getitem__`` and repurpose the slice triple as an RDF triple pattern. This makes slice syntax a shortcut for `triples()`, `subject_predicates()`, `__contains__()`, and other Graph query methods:

```
graph[:]
# same as
iter(graph)

graph[ns.bob]
# same as
graph.predicate_objects(ns.bob)

graph[ns.bob : FOAF.knows]
# same as
graph.objects(ns.bob, FOAF.knows)

graph[ns.bob : FOAF.knows : ns.bill]
# same as
(ns.bob, FOAF.knows, ns.bill) in graph

graph[:FOAF.knows]
# same as 
graph.subject_objects(FOAF.knows)
```

In [None]:
for p, o in g[ns.bob]: # g[ns.bob] same as g.predicate_objects(ns.bob)
    print(p, o)

In [None]:
for o in g[ns.bob : RDFS.label]: # g[ns.bob : RDFS.label] same as g.objects(ns.bob, RDFS.label)
    print(o)

In [None]:
from rdflib.namespace import FOAF
g[ns.bob : FOAF.knows : ns.bill] # same as (ns.bob, FOAF.knows, ns.bill) in g

In [None]:
for s, o in g[:RDFS.label]: # same as g.subject_objects(RDFS.label)
    print(s, o)

In [None]:
from rdflib import Graph
from rdflib.namespace import FOAF, RDF

graph = Graph()

# Load the FOAF profile into the same graph that is sliced below.
graph.parse("https://ebiquity.umbc.edu/person/foaf/Francis/Ferraro/foaf.rdf", format="xml")

for person in graph[: RDF.type : FOAF.Person]:
    # FOAF.knows * "+" / FOAF.name is a property path:
    # one or more foaf:knows steps followed by foaf:name.
    friends = list(graph[person : FOAF.knows * "+" / FOAF.name])
    if friends:
        print("%s's circle of friends:" % graph.value(person, FOAF.name))
        for name in friends:
            print(name)

#### Parsing data from a string

In [None]:
import rdflib
from rdflib import Graph

graph = Graph()
graph.parse(data = '<urn:a> <urn:p> <urn:b>.', format='n3')
for s, p, o in graph:
    print(s, p, o)

## Examples <a class="anchor" id="chapter1.7"></a>

### ConjunctiveGraph

## Containers & Collections
There are two convenience classes for RDF Containers & Collections which you can use instead of declaring each triple of a Container or a Collection individually:
- `Container()` (also `Bag, Seq & Alt`) and
- `Collection()`

See their documentation for how.

## Navigating Graphs

An RDF Graph is a set of RDF triples, and we try to mirror exactly this in RDFLib. The Python Graph() tries to emulate a container type.

#### Graphs as Iterators
RDFLib graphs override `__iter__()` in order to support iteration over the contained triples:

In [None]:
someGraph = Graph()
for subject, predicate, obj in someGraph:  # `obj` avoids shadowing the built-in `object`
    if (subject, predicate, obj) not in someGraph:
        raise Exception("Iterator / Container Protocols are Broken!!")

#### Contains check
Graphs implement `__contains__()`, so you can check whether a triple is in a graph with the `triple in graph` syntax:

In [None]:
from rdflib import Graph, URIRef
from rdflib.namespace import FOAF, RDF

graph = Graph()

bob = URIRef("http://example.org/people/bob")

graph.add((bob, RDF.type, FOAF.Person))

if (bob, RDF.type, FOAF.Person) in graph:
    print("This graph knows that Bob is a person!")

Note that this triple does not have to be completely bound:

In [None]:
if (bob, None, None) in graph:
    print("This graph contains triples about Bob!")

#### Set Operations on RDFLib Graphs

Graphs override several Python operators, such as `__iadd__()` and `__isub__()`. This supports addition, subtraction and other set operations on graphs:

<table>
<tr>
<th>operation</th>
<th>effect</th>
</tr>

<tr>
<td>G1 + G2</td>
<td>returns new graph with union</td>
</tr>

<tr>
<td>G1 += G2</td>
<td>in-place union / addition</td>
</tr>
    
<tr>
<td>G1 - G2</td>
<td>returns new graph with difference</td>
</tr>

<tr>
<td>G1 -= G2</td>
<td>in-place difference / subtraction</td>
</tr>
    
<tr>
<td>G1 & G2</td>
<td>intersection (triples in both graphs)</td>
</tr>

<tr>
<td>G1 ^ G2</td>
<td>xor (triples in either G1 or G2, but not in both)</td>
</tr>    
</table>

**Warning:** Set operations on graphs assume that blank nodes are shared between the graphs, which may or may not be what you want. See the "Merging graphs" section of the RDFLib documentation for details.

In [None]:
"""
An RDFLib ConjunctiveGraph is an (unnamed) aggregation of all the named graphs
within a Store. The :meth:`~rdflib.graph.ConjunctiveGraph.get_context`
method can be used to get a particular named graph for use such as to add
triples to, or the default graph can be used

This example shows how to create named graphs and work with the
conjunction (union) of all the graphs.
"""

from rdflib import Namespace, Literal, URIRef
from rdflib.graph import Graph, ConjunctiveGraph


ns = Namespace("http://love.com#")

mary = URIRef("http://love.com/lovers/mary")
john = URIRef("http://love.com/lovers/john")

# identifiers (contexts) of the two named graphs
cmary = URIRef("http://love.com/lovers/mary")
cjohn = URIRef("http://love.com/lovers/john")

# the default in-memory store is context-aware, so it can hold several named graphs
g = ConjunctiveGraph()
store = g.store

g.bind("love", ns)

# add a graph for Mary's facts to the Conjunctive Graph
gmary = Graph(store=store, identifier=cmary)

# Mary's graph only contains the URI of the person she loves, not his cute name
gmary.add((mary, ns["hasName"], Literal("Mary")))
gmary.add((mary, ns["loves"], john))



# add a graph for John's facts to the Conjunctive Graph
gjohn = Graph(store=store, identifier=cjohn)

# John's graph contains his cute name
gjohn.add((john, ns["hasCuteName"], Literal("Johnny Boy")))


print("\n1. ===================\n")
# enumerate contexts
for c in g.contexts():
    print("-- %s " % c)

print("\n2. ===================\n")
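def to_text(data):
    # serialize() returns bytes in RDFLib 5 and str in RDFLib 6+
    return data.decode("utf-8") if isinstance(data, bytes) else data


# separate graphs
print(to_text(gjohn.serialize(format="n3")))
print("===================")
print(to_text(gmary.serialize(format="n3")))

print("\n3. ===================\n")
# full graph
print(to_text(g.serialize(format="n3")))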

print("\n4. ===================\n")
# query the conjunction of all graphs
xx = None
for x in g[mary : ns.loves / ns.hasCuteName]:
    xx = x
print("Q: Who does Mary love?")
print("A: Mary loves {}".format(xx))

print("\n5. ===================\n")
# query only Mary's graph: John's cute name is stored elsewhere, so nothing matches
xx = None
for x in gmary[mary : ns.loves / ns.hasCuteName]:
    xx = x
print("Q: Who does Mary love?")
print("A: Mary loves {}".format(xx))

### Mapping between RDF data-typed literals and Python objects

In [None]:
"""
RDFLib can map between RDF data-typed literals and Python objects.

Mapping for integers, floats, dateTimes, etc. are already added, but
you can also add your own.

This example shows how :meth:`rdflib.term.bind` lets you register new
mappings between literal datatypes and Python objects
"""

from rdflib import Graph, Literal, Namespace, XSD
from rdflib.term import bind

# complex numbers are not registered by default
# no custom constructor/serializer needed since
# complex('(2+3j)') works fine
bind(XSD.complexNumber, complex)

ns = Namespace("urn:my:namespace:")

c = complex(2, 3)

l = Literal(c)

g = Graph()
g.add((ns.mysubject, ns.myprop, l))

n3 = g.serialize(format="n3")
print(n3)

# round-trip through n3 serialize/parse
g2 = Graph()
g2.parse(data=n3, format="n3")
    
l2 = list(g2)[0][2]

print(l2)

print(l2.value == c)  # back to a python complex object


### Custom evaluation function

In [None]:
"""
This example shows how a custom evaluation function can be added to
handle certain SPARQL Algebra elements.

A custom function is added that adds ``rdfs:subClassOf`` "inference" when
asking for ``rdf:type`` triples.

Here the custom eval function is added manually, normally you would use
setuptools and entry_points to do it:
i.e. in your setup.py::

    entry_points = {
        'rdf.plugins.sparqleval': [
            'myfunc =     mypackage:MyFunction',
            ],
    }
"""

import rdflib

from rdflib.plugins.sparql.evaluate import evalBGP
from rdflib.namespace import FOAF

inferredSubClass = rdflib.RDFS.subClassOf * "*"  # any number of rdfs.subClassOf


def customEval(ctx, part):
    """
    Rewrite triple patterns to get super-classes
    """

    if part.name == "BGP":

        # rewrite triples
        triples = []
        for t in part.triples:
            if t[1] == rdflib.RDF.type:
                bnode = rdflib.BNode()
                triples.append((t[0], t[1], bnode))
                triples.append((bnode, inferredSubClass, t[2]))
            else:
                triples.append(t)

        # delegate to normal evalBGP
        return evalBGP(ctx, triples)

    raise NotImplementedError()



# add function directly, normally we would use setuptools and entry_points
rdflib.plugins.sparql.CUSTOM_EVALS["exampleEval"] = customEval

g = rdflib.Graph()
g.parse("http://www.w3.org/People/Berners-Lee/card")

# Add the subClassStmt so that we can query for it!
g.add((FOAF.Person, rdflib.RDFS.subClassOf, FOAF.Agent))

# Find all FOAF Agents
for x in g.query("PREFIX foaf: <%s> SELECT * WHERE { ?s a foaf:Agent}" % FOAF):
    print(x)

### Manage your movie reviews

https://imdbpy.readthedocs.io/en/latest/

In [None]:
"""
Requires download and import of Python imdb library from
https://imdbpy.github.io/ - (warning: installation
will trigger automatic installation of several other packages)
"""

# !pip install imdbpy

In [None]:
import datetime
import os
import sys
import re
import time

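try:
    import imdb
except ImportError as exc:
    # the rest of this cell cannot run without the IMDbPY package
    raise ImportError("Install IMDbPY first: pip install imdbpy") from exc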

from rdflib import BNode, ConjunctiveGraph, URIRef, Literal, Namespace, RDF
from rdflib.namespace import FOAF, DC

storefn = os.path.expanduser("movies.n3")
storeuri = storefn

IMDB = Namespace("http://imdb.com/")

class Store:
    def __init__(self):
        self.graph = ConjunctiveGraph()
        if os.path.exists(storefn):
            self.graph.parse(storeuri, format="n3")
        self.graph.bind("dc", DC)
        self.graph.bind("foaf", FOAF)
        self.graph.bind("imdb", IMDB)

    def save(self):
        self.graph.serialize(storeuri, format="n3")

    def new_movie(self, movie):
        movieuri = URIRef("http://www.imdb.com/title/tt%s/" % movie.movieID)
        self.graph.add((movieuri, RDF.type, IMDB["Movie"]))
        self.graph.add((movieuri, DC["title"], Literal(movie["title"])))
        self.graph.add((movieuri, IMDB["year"], Literal(int(movie["year"]))))
        self.save()

    def movie_is_in(self, uri):
        return (URIRef(uri), RDF.type, IMDB["Movie"]) in self.graph


i = imdb.IMDb()
movie = i.get_movie("0133093")
# in Python 3 the values are already str, so no .encode() is needed for printing
print("%s (%s)" % (movie["title"], movie["year"]))

for director in movie["director"]:
    print("directed by: %s" % director["name"])

for writer in movie["writer"]:
    print("written by: %s" % writer["name"])

s = Store()    
s.new_movie(movie)

In [None]:
movie['rating']

In [None]:
movie.keys()

In [None]:
person = i.get_person("0424060")
print(person['name'])
person['filmography']

In [None]:
from rdflib import Graph
from rdflib.plugins.stores.sparqlstore import SPARQLStore
import pandas as pd

graph = Graph("SPARQLStore", identifier="http://dbpedia.org")
graph.open("http://dbpedia.org/sparql")

qres = graph.query('''
                    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
                    PREFIX dbo: <http://dbpedia.org/ontology/>
                    PREFIX yago: <http://dbpedia.org/class/yago/>
                    SELECT ?s
                    WHERE {?s ?p 'Scarlett Johansson'@en .        
                            }
                    '''
                  )

for row in qres:
    print(row)

In [None]:
person.keys()

## EXERCISES

#### EXERCISE 1

Use a SPARQL query to retrieve the following information about Polish poets from DBpedia:
- name (first name and surname)
- date of birth
- biography (in Polish)

Which relations and classes will you use?

Put the query result in a list, then save it as an Excel file.

In [None]:
# Ms. Alicja
from rdflib import Graph
import pandas as pd
graph = Graph("SPARQLStore", identifier="http://dbpedia.org")
graph.open("http://dbpedia.org/sparql")

lista = graph.query('''
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?birthName ?birthDate ?abstract
WHERE {?s rdf:type <http://dbpedia.org/class/yago/WikicatPolishPoets> ;
                    dbo:birthName ?birthName ;
                    dbo:birthDate ?birthDate ;
                    dbo:abstract ?abstract }
''')

poets_list = []
for row in lista:
    poets_list.append([row.birthName, row.birthDate, row.abstract])

# poets = {
#     'birthName': [],
#     'birthDate': [],
#     'abstract': []
# }

# for poet in poets_list:
#     poets['birthName'].append(poet[0])
#     poets['birthDate'].append(poet[1])
#     poets['abstract'].append(poet[2])

 
df = pd.DataFrame(poets_list, columns = ['birthName', 'birthDate', 'abstract'])
df.to_excel('poets_alicja.xlsx', index=False, header=True)

In [None]:
# Mr. Rafał

graph = Graph("SPARQLStore", identifier="http://dbpedia.org")
graph.open("http://dbpedia.org/sparql")

temp = graph.query('''
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?birthName ?birthDate ?abstract
WHERE {?ppoet rdf:type <http://dbpedia.org/class/yago/WikicatPolishPoets> ;
                    dbo:birthName ?birthName ;
                    dbo:birthDate ?birthDate ;
                    dbo:abstract ?abstract}'''
                  )

print(temp)

poets_list = []
for row in temp:
    poets_list.append([row.birthName, row.birthDate, row.abstract])

df = pd.DataFrame(poets_list, columns = ['Name', 'Birth date', 'Abstract'])
df.to_excel('Lista_poetow_Rafal.xlsx')

In [None]:
from rdflib import Graph
from rdflib.plugins.stores.sparqlstore import SPARQLStore
import pandas as pd

graph = Graph("SPARQLStore", identifier="http://dbpedia.org")
graph.open("http://dbpedia.org/sparql")

qres = graph.query('''
                    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
                    PREFIX dbo: <http://dbpedia.org/ontology/>
                    PREFIX yago: <http://dbpedia.org/class/yago/>
                    SELECT ?birthName ?birthDate ?abstract
                    WHERE {?ppoet rdf:type yago:WikicatPolishPoets ;
                                    dbo:birthName ?birthName ; 
                                    dbo:birthDate ?birthDate ;
                                    dbo:abstract ?abstract }
                    '''
                  )

poets_list = []

for row in qres:
    if row.abstract.language == "pl":
        poets_list.append([row.birthName.value, 
                           str(row.birthDate.value), 
                           row.abstract.value])   
    
poets_list[0]
df = pd.DataFrame(poets_list)
df.to_excel("polish_poets_pl.xlsx", header=["Imię i nazwisko", "Data urodzenia", "Biogram"])  

In [None]:
from rdflib import Graph
from rdflib.plugins.stores.sparqlstore import SPARQLStore
import pandas as pd

graph = Graph("SPARQLStore", identifier="http://dbpedia.org")
graph.open("http://dbpedia.org/sparql")

qres = graph.query('''
                    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
                    PREFIX dbo: <http://dbpedia.org/ontology/>
                    PREFIX yago: <http://dbpedia.org/class/yago/>
                    SELECT ?birthName ?birthDate ?abstract
                    WHERE {?ppoet rdf:type yago:WikicatPolishPoets ;
                                    dbo:birthName ?birthName ; 
                                    dbo:birthDate ?birthDate ;
                                    dbo:abstract ?abstract .
                           FILTER(lang(?abstract) = 'pl')         
                            }
                    '''
                  )

poets_list = []

for row in qres:
    poets_list.append([row.birthName.value, 
                       str(row.birthDate.value), 
                       row.abstract.value])   
    
poets_list[0]
df = pd.DataFrame(poets_list)
df.to_excel("polish_poets2_pl.xlsx", header=["Imię i nazwisko", "Data urodzenia", "Biogram"])  

In [None]:
from rdflib import Graph
from rdflib.plugins.stores.sparqlstore import SPARQLStore
import pandas as pd

graph = Graph("SPARQLStore", identifier="http://dbpedia.org")
graph.open("http://dbpedia.org/sparql")

qres = graph.query('''
                    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
                    PREFIX dbo: <http://dbpedia.org/ontology/>
                    PREFIX yago: <http://dbpedia.org/class/yago/>
                    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
                    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
                    SELECT ?foafName ?birthDate ?abstract
                    WHERE {?ppoet rdf:type yago:WikicatPolishPoets .
                           OPTIONAL { ?ppoet foaf:name ?foafName . }          
                           OPTIONAL { ?ppoet dbo:birthDate ?birthDate . }
                           OPTIONAL { ?ppoet dbo:abstract ?abstract . }
                           FILTER (lang(?abstract) = 'pl')
                           FILTER (REGEX(STR(?birthDate),"[0-9]{4}-[0-9]{2}-[0-9]{2}")).
                            }
                    '''
                  )

poets_list = []

for row in qres:
    poets_list.append([str(row.foafName), 
                       str(row.birthDate), 
                       row.abstract.value])   
    
df = pd.DataFrame(poets_list)
df.to_excel("polish_poets4_date.xlsx", header=["Imię i nazwisko", "Data urodzenia", "Biogram"])  

In [None]:
from rdflib import Graph
from rdflib.plugins.stores.sparqlstore import SPARQLStore
import pandas as pd

graph = Graph("SPARQLStore", identifier="http://dbpedia.org")
graph.open("http://dbpedia.org/sparql")

qres = graph.query('''
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?name ?birth ?description ?person 
WHERE {?person a dbo:Philosopher .
?person dbo:birthPlace dbr:Warsaw .
?person dbo:birthDate ?birth .
?person foaf:name ?name .
?person rdfs:comment ?description .
FILTER (LANG(?description) = 'pl') . 
} 
ORDER BY ?name
''')

for row in qres:
    print(row.name, "\n", " ", row.birth, "\n", "  ", row.person, "\n", "   ", row.description, "\n")

#### EXERCISE 2

- Load the ontology https://spec.edmcouncil.org/auto/ontology/master/latest/AboutAUTODev.rdf
- Build a list of all URLs imported by this ontology (use `owl:imports`)
- Using the list of imports, load all imported ontologies into a single graph
- Write 2 SPARQL queries that check whether the resources of these ontologies have `rdfs:label` and `skos:definition`; the queries must contain "WARN", similarly to the query https://github.com/edmcouncil/fibo/blob/master/etc/testing/hygiene/testHygiene1103.sparql

In [None]:
import rdflib
from rdflib import Graph
from rdflib.namespace import OWL

g = Graph()
g.parse("https://spec.edmcouncil.org/auto/ontology/master/latest/AboutAUTODev.rdf", format="xml")
print(len(g))

In [None]:
import_list = []
for ontology_import in g.objects(None, OWL.imports):
    import_list.append(ontology_import)
 
for ontology_import in import_list:
    print(ontology_import)
    
graph = Graph()
for ontology_import in import_list:
    graph.parse(ontology_import, format="xml")

In [None]:
label_query = graph.query('''
    PREFIX owl:   <http://www.w3.org/2002/07/owl#>
    PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
    SELECT DISTINCT ?error
    WHERE {
      ?class a owl:Class .
      FILTER NOT EXISTS {?class rdfs:label ?label}
      FILTER (!isBlank(?class))
      BIND (concat ("WARN: Class ", str(?class), " has no label.") AS ?error)
    }
    ''')

print(len(label_query))

for row in label_query:
    print(row.error)

In [None]:
definition_query = graph.query('''
                PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
                PREFIX owl:   <http://www.w3.org/2002/07/owl#>

                SELECT DISTINCT ?error
                WHERE {
                  ?class a owl:Class .
                  FILTER NOT EXISTS {?class skos:definition ?definition}
                  FILTER (!isBlank(?class))
                  BIND (concat ("WARN: Class ", str(?class), " has no definition.") AS ?error)
                }
                ''')

print(len(definition_query))

for row in definition_query:
    print(row.error)