In [1]:
# Define Neo4j connections
import pandas as pd
from neo4j import GraphDatabase
host = 'neo4j://localhost:7687'
user = 'neo4j'
password = 'letmein'
driver = GraphDatabase.driver(host, auth=(user, password))

In [2]:
def run_query(query):
    with driver.session() as session:
        result = session.run(query)
        return pd.DataFrame([r.values() for r in result], columns=result.keys())

## Agenda
- Install Neosemantics library
- Graph model
- Construct WikiData SPARQL query
- Import RDF Graph
- Reverse Geocode with OSM API
- Verify Data

### Install Neosemantics library
In this blog series, we will use the standard APOC and GDS libraries, which we can install with a single click in the Neo4j Desktop application. On top of that, we will add the Neosemantics library to our stack. It is used to interact with RDF data in the Neo4j environment: we can either import RDF data into Neo4j or export the property graph model in RDF format.
To install the Neosemantics library, we download the latest release and save it to the Neo4j plugins folder. We also need to add the following line to the Neo4j configuration file.

<code>dbms.unmanaged_extension_classes=n10s.endpoint=/rdf</code>

We are now ready to start our Neo4j instance. First, we need to initialize the Neosemantics configuration with the following Cypher procedure.

<code>CALL n10s.graphconfig.init({handleVocabUris: "IGNORE"})</code>

Take a look at the documentation for information about the configuration options.

### Graph model
Monuments are in the center of our graph. We have their name and the URL of the image stored as a node property. The monuments have been influenced by various architectural styles, which we indicate as a relationship to an Architecture node. We will save the city and the state of the monument as a two-level hierarchical location tree.
Graph model created with draw.io
The Neosemantics library requires a unique constraint on the property "uri" of the nodes labeled Resource. We will also add indexes for the State and City nodes. The <code>apoc.schema.assert</code> procedure allows us to define many indexes and unique constraints with a single call.

In [3]:
n10s_config_query = """

CALL n10s.graphconfig.init({handleVocabUris: "IGNORE"})
"""
run_query(n10s_config_query)

Unnamed: 0,param,value
0,handleVocabUris,IGNORE
1,handleMultival,OVERWRITE
2,handleRDFTypes,LABELS
3,keepLangTag,False
4,multivalPropList,
5,keepCustomDataTypes,False
6,customDataTypePropList,
7,applyNeo4jNaming,False
8,classLabel,Class
9,subClassOfRel,SCO


In [4]:
schema_assert_query = """

CALL apoc.schema.assert(
  {State:['id'], City:['id']},
  {Resource:['uri']})

"""

run_query(schema_assert_query)

Unnamed: 0,label,key,keys,unique,action
0,State,id,[id],False,DROPPED
1,City,id,[id],False,DROPPED
2,State,id,[id],False,CREATED
3,City,id,[id],False,CREATED
4,Resource,uri,[uri],True,KEPT


### Construct WikiData SPARQL query

For me, the easiest way to construct a new SPARQL query is using the WikiData query editor. It has a lovely autocomplete feature. It also helps with query debugging.
We want to retrieve all the instances of monuments located in Spain. I have found the easiest way to find various entities on WikiData is by simply using Google search. You can then inspect all the available properties of the entity on the website.
The first two lines in the WHERE clause define the entities we are looking for:
<pre>
# Entity is an instance of monument entity
?item wdt:P31 wd:Q4989906 .
# Entity is located in Spain
?item wdt:P17 wd:Q29 .
</pre>
Next, we also determine which properties of the entities we are interested in. In our case, we would like to retrieve the monument's name, image, location, and architectural style. If we run this query in the query editor, we get the following results.
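As a rough sketch of how such a query can be assembled and sent to the endpoint, the snippet below builds a SELECT query from the triple patterns used in this post and URL-encodes it, which is the same job `apoc.text.urlencode` does on the Cypher side. The SELECT wrapper, the `LIMIT 10`, and the variable names `sparql`, `endpoint`, and `url` are assumptions for illustration; the import itself uses a CONSTRUCT query.

```python
from urllib.parse import quote, unquote

# Triple patterns as used in this post; the SELECT wrapper and LIMIT
# are illustrative assumptions, not the query used for the import.
sparql = """
SELECT ?item ?monumentName ?location ?architecture ?image
WHERE {
  ?item wdt:P31 wd:Q4989906 .      # instance of monument
  ?item wdt:P17 wd:Q29 .           # located in Spain
  ?item rdfs:label ?monumentName .
  filter(lang(?monumentName) = "en")
  ?item wdt:P625 ?location .       # coordinate location
  ?item wdt:P149 ?architecture .   # architectural style
  ?item wdt:P18 ?image .           # image
}
LIMIT 10
"""

endpoint = "https://query.wikidata.org/sparql"

# Percent-encode the whole query so it can travel as a URL parameter
url = endpoint + "?query=" + quote(sparql)

# Decoding the parameter gives back the original query text
decoded = unquote(url.split("?query=", 1)[1])
print(decoded == sparql)
```

The resulting `url` can be pasted into a browser or passed to any HTTP client to inspect the rows before importing anything.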

Now we can go ahead and import the graph to Neo4j.

In [5]:
import_query = """

WITH 'PREFIX sch: <http://schema.org/> 
CONSTRUCT{ ?item a sch:Monument; 
            sch:name ?monumentName; 
            sch:location ?location; 
            sch:img ?imageAsStr; 
            sch:ARCHITECTURE ?architecture. 
           ?architecture a sch:Architecture; 
            sch:name ?architectureName } 
WHERE { ?item wdt:P31 wd:Q4989906 . 
        ?item wdt:P17 wd:Q29 . 
        ?item rdfs:label ?monumentName . 
         filter(lang(?monumentName) = "en") 
        ?item wdt:P625 ?location . 
        ?item wdt:P149 ?architecture . 
        ?architecture rdfs:label ?architectureName .
         filter(lang(?architectureName) = "en") 
        ?item wdt:P18 ?image . 
         bind(str(?image) as ?imageAsStr) }' AS sparql 
CALL n10s.rdf.import.fetch(
  "https://query.wikidata.org/sparql?query=" +   
   apoc.text.urlencode(sparql),"JSON-LD", 
   { headerParams: { Accept: "application/ld+json"} , 
     handleVocabUris: "IGNORE"}) 
YIELD terminationStatus, triplesLoaded
RETURN terminationStatus, triplesLoaded

"""

run_query(import_query)

Unnamed: 0,terminationStatus,triplesLoaded
0,OK,7542


Let's start with some exploratory graph queries. We will first count the number of monuments in our graph.

In [6]:
monument_count_query = """

MATCH (n:Monument) 
RETURN count(*)

"""

run_query(monument_count_query)

Unnamed: 0,count(*)
0,1401


We have imported 1401 monuments into our graph. Next, we will count the number of monuments per architectural style.

In [7]:
monument_count_style_query = """

MATCH (n:Architecture) 
RETURN n.name as architecture, 
       size((n)<--()) as count 
ORDER BY count DESC 
LIMIT 5

"""

run_query(monument_count_style_query)

Unnamed: 0,architecture,count
0,Gothic architecture,243
1,Romanesque architecture,223
2,baroque architecture,178
3,vernacular architecture,158
4,Renaissance architecture,127


In [8]:
architecture_subclass_import = """

MATCH (a:Architecture) 
WITH ' PREFIX sch: <http://schema.org/> 
CONSTRUCT { ?item a sch:Architecture; 
             sch:SUBCLASS_OF ?style. 
            ?style a sch:Architecture; 
             sch:name ?styleName;} 
WHERE { filter (?item = <' + a.uri + '>) 
        ?item wdt:P279 ?style . 
        ?style rdfs:label ?styleName 
         filter(lang(?styleName) = "en") } ' AS sparql 
CALL n10s.rdf.import.fetch(
  "https://query.wikidata.org/sparql?query=" + 
    apoc.text.urlencode(sparql),"JSON-LD", 
  { headerParams: { Accept: "application/ld+json"} , 
    handleVocabUris: "IGNORE"}) 
YIELD terminationStatus, triplesLoaded 
RETURN terminationStatus, triplesLoaded

"""

run_query(architecture_subclass_import)

Unnamed: 0,terminationStatus,triplesLoaded
0,OK,4
1,OK,0
2,OK,4
3,OK,0
4,OK,4
...,...,...
91,OK,7
92,OK,4
93,OK,0
94,OK,7


In [9]:
architecture_hierarchy_query = """

MATCH (a:Architecture)-[:SUBCLASS_OF]->(b:Architecture)
RETURN a.name as child_architecture,
       b.name as parent_architecture
LIMIT 5
"""

run_query(architecture_hierarchy_query)

Unnamed: 0,child_architecture,parent_architecture
0,pre-Romanesque art,medieval art
1,Art Nouveau,decorative arts
2,Victorian architecture,Gothic Revival architecture
3,Gothic art,medieval art
4,modernism,Art Nouveau


It seems that modernism is a child category of Art Nouveau, and Art Nouveau is a child category of decorative arts.
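To make the chain explicit, here is a minimal Python sketch that walks the hierarchy upward from a given style. The child/parent pairs are copied from the five rows returned above; the helper name `ancestors` is hypothetical, and the sketch assumes each style has a single parent, which the full graph does not guarantee.

```python
# child -> parent pairs copied from the query output above
subclass_of = {
    "pre-Romanesque art": "medieval art",
    "Art Nouveau": "decorative arts",
    "Victorian architecture": "Gothic Revival architecture",
    "Gothic art": "medieval art",
    "modernism": "Art Nouveau",
}


def ancestors(style, parents):
    """Follow parent links upward and return the chain of ancestors."""
    chain = []
    seen = {style}
    while style in parents:
        style = parents[style]
        if style in seen:  # guard against cycles in the data
            break
        seen.add(style)
        chain.append(style)
    return chain


print(ancestors("modernism", subclass_of))
# ['Art Nouveau', 'decorative arts']
```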
### Spatial enrichment
At first, I wanted to include the municipality information of monuments available on WikiData, but as it turned out, this information is relatively sparse. No worries though, I later realized we could use a reverse geocoding API to retrieve this information. APOC has a dedicated procedure available for reverse geocoding. By default, it uses the OpenStreetMap API, but we can customize it to work with other providers as well. Check the documentation for more information.
First, we have to transform the location information to a spatial point data type.

In [10]:
transform_location_point_query = """

MATCH (m:Monument) 
WITH m, 
   split(substring(m.location, 6, size(m.location) - 7)," ") as point 
SET m.location_point = point(
  {latitude: toFloat(point[1]), 
   longitude: toFloat(point[0])})

"""

run_query(transform_location_point_query)
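The string handling in the query above can be mirrored in plain Python. This sketch assumes the `location` property has the form `Point(<longitude> <latitude>)`, which is what the `substring` and `split` calls imply; the function name `parse_wkt_point` and the sample coordinates are hypothetical.

```python
def parse_wkt_point(wkt):
    """Parse 'Point(<longitude> <latitude>)' into (latitude, longitude)."""
    # Drop the 6-character 'Point(' prefix and the closing ')', as
    # substring(m.location, 6, size(m.location) - 7) does in Cypher
    inner = wkt[6:-1]
    # The first value is the longitude, the second is the latitude
    longitude, latitude = inner.split(" ")
    return float(latitude), float(longitude)


# Hypothetical coordinates for illustration
print(parse_wkt_point("Point(-3.7038 40.4168)"))
# (40.4168, -3.7038)
```

Note the order: the text stores longitude first, so the Cypher query reads `point[1]` as latitude and `point[0]` as longitude.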

Let's check a sample response from the OSM reverse geocoding API.

In [11]:
example_osm_response = """

MATCH (m:Monument)
WITH m LIMIT 1
CALL apoc.spatial.reverseGeocode(
  m.location_point.latitude,
  m.location_point.longitude)
YIELD data
RETURN data

"""

run_query(example_osm_response)

Unnamed: 0,data
0,"{'country': 'España', 'country_code': 'es', 'i..."


The OpenStreetMap API is a tad interesting, as it differentiates between cities, towns, and villages. Also, the monuments located in the Canaries have no state available but are a part of the Canaries archipelago. We will treat the archipelago as a state and lump city, town, and village under a single label, City. For batching purposes, the <code>apoc.periodic.iterate</code> procedure can be used; the cell below instead loads a city and state for each monument URI from a CSV file.
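The fallback rules described above can be sketched as a small Python function. The key names `city`, `town`, `village`, `state`, and `archipelago` are assumptions about the response layout, the function name `extract_city_state` is hypothetical, and the sample dictionaries are made up for illustration.

```python
def extract_city_state(address):
    """Collapse an OSM-style address dict into a (city, state) pair."""
    # City, town, and village are all treated as a single City level
    city = next(
        (address[key] for key in ("city", "town", "village") if address.get(key)),
        None,
    )
    # Fall back to the archipelago when no state is present
    state = address.get("state") or address.get("archipelago")
    return city, state


# Hypothetical responses for illustration
print(extract_city_state({"town": "Example Town", "state": "Catalunya"}))
print(extract_city_state({"village": "Example Village", "archipelago": "Canarias"}))
```

Returning `None` for a missing level keeps incomplete responses visible instead of silently dropping them.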

In [12]:
import_spatial_query = """

LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/tomasonjo/blogs/master/Traveling_tourist/traveling_tourist_cities.csv" as row
MATCH (m:Monument{uri:row.uri})
MERGE (c:City{id:row.city})
MERGE (s:State{id:row.state})
MERGE (m)-[:IS_IN]->(c)
MERGE (c)-[:IS_IN]->(s);

"""

run_query(import_spatial_query)

In [13]:
location_exists_check = """

MATCH (m:Monument) 
RETURN exists ((m)-[:IS_IN]->()) as has_location, 
       count(*) as count

"""

run_query(location_exists_check)

Unnamed: 0,has_location,count
0,True,1395
1,False,6
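As a quick sanity check, the two counts above should add up to the 1401 monuments counted earlier. The following lines do the arithmetic; the variable names are only for illustration.

```python
with_location = 1395    # monuments with an IS_IN relationship
without_location = 6    # monuments without one

total = with_location + without_location
share = with_location / total

print(f"{total} monuments, {share:.1%} with a location")
# 1401 monuments, 99.6% with a location
```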


In [14]:
catalunya_monument_style = """

MATCH (s:State{id:'Catalunya'})<-[:IS_IN*2..2]-(:Monument)-[:ARCHITECTURE]->(architecture)
RETURN architecture.name as architecture,
       count(*) as count
ORDER BY count DESC
LIMIT 5

"""

run_query(catalunya_monument_style)

Unnamed: 0,architecture,count
0,vernacular architecture,156
1,Romanesque architecture,117
2,Gothic architecture,66
3,Art Nouveau,31
4,modernism,21


In [15]:
states_monument_style = """

MATCH (s:State)
CALL {
  WITH s
  MATCH (s)<-[:IS_IN*2..2]-()-[:ARCHITECTURE]->(a)
  RETURN a.name as architecture, count(*) as count
  ORDER BY count DESC LIMIT 1
}
RETURN s.id as state, architecture, count
ORDER BY count DESC 
LIMIT 5

"""

run_query(states_monument_style)

Unnamed: 0,state,architecture,count
0,Catalunya,vernacular architecture,156
1,Euskadi,baroque architecture,59
2,Castilla y León,Romanesque architecture,48
3,Andalucía,Gothic architecture,32
4,Comunitat Valenciana,medieval architecture,27
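The `CALL { ... }` subquery above keeps only the most frequent style for each state. The same "top one per group" logic looks like this in Python; the `rows` list is hypothetical sample data, not the graph contents.

```python
from collections import Counter, defaultdict

# Hypothetical (state, architecture) pairs, one per monument-style link
rows = [
    ("Catalunya", "vernacular architecture"),
    ("Catalunya", "vernacular architecture"),
    ("Catalunya", "vernacular architecture"),
    ("Catalunya", "Romanesque architecture"),
    ("Euskadi", "baroque architecture"),
    ("Euskadi", "baroque architecture"),
    ("Euskadi", "Gothic architecture"),
]

# Count styles separately for every state
counts = defaultdict(Counter)
for state, architecture in rows:
    counts[state][architecture] += 1

# Keep the single most common style per state (ORDER BY count DESC LIMIT 1)
top = {state: styles.most_common(1)[0] for state, styles in counts.items()}

# Rank states by the count of their top style, highest first
ranked = sorted(top.items(), key=lambda item: item[1][1], reverse=True)

for state, (architecture, count) in ranked:
    print(state, architecture, count)
```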
