# Neo4J

* [neo4j Homepage](https://neo4j.com/)
* [py2neo](https://py2neo.org/2021.1/#)
* [Docker Image](https://hub.docker.com/_/neo4j/)
* [neo4j Cheatsheet](https://neo4j.com/docs/cypher-refcard/current/)
* [rq wiki](https://rq.bitplan.com/index.php/Neo4j)

## Docker Setup
1) Download the Docker image: `docker pull neo4j`
2) Start the Docker container: `docker run --publish=7474:7474 --publish=7687:7687 --volume=$HOME/neo4j/data:/data --env=NEO4J_AUTH=none neo4j`

Afterwards, the Neo4j server should be running and available at [localhost:7474](http://127.0.0.1:7474/browser/). `NEO4J_AUTH=none` starts the server with authentication disabled.
The graph data is stored on the host at `$HOME/neo4j/data`; to store it elsewhere, change the host path in the `--volume` parameter.
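The container can also be started from within Python. The following is a minimal sketch, assuming Docker is installed and on the `PATH`; the helper `neo4j_docker_run_args` is hypothetical. It only builds the argument list from step 2 and shows how `NEO4J_AUTH` switches between disabled authentication (`none`) and the `neo4j/<password>` form accepted by the Docker image.

```python
import os
import shlex


def neo4j_docker_run_args(data_dir, password=None, image="neo4j"):
    """Build the `docker run` argument list from step 2 (hypothetical helper).

    password=None keeps authentication disabled (NEO4J_AUTH=none); otherwise the
    server is started with the user `neo4j` and the given password.
    """
    auth = "none" if password is None else f"neo4j/{password}"
    return [
        "docker", "run",
        "--publish=7474:7474",          # HTTP port of the Neo4j Browser
        "--publish=7687:7687",          # Bolt port used by drivers such as py2neo
        f"--volume={data_dir}:/data",   # host directory that stores the graph data
        f"--env=NEO4J_AUTH={auth}",
        image,
    ]


args = neo4j_docker_run_args(os.path.expanduser("~/neo4j/data"))
print(shlex.join(args))
# To actually start the container: subprocess.run(args, check=True)
```

Passing a list instead of a single shell string avoids quoting problems when the data path contains spaces.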

## neo4j UI
After starting the container, you can open the [neo4j Browser UI](http://127.0.0.1:7474/browser/). The web interface provides easy access to the database for exploring the data and writing queries. It also offers example datasets that can be loaded into the graph, such as the Movie graph shown in the picture below.

<img src="./assets/img/neo4j_browser_ui_example.png" alt="neo4j browser UI" width=800 height=800 />

## Install python modules
To interact with the neo4j database from python you can use the library [py2neo](https://py2neo.org/2021.1/#).

```python
!pip install py2neo
```

```python
from py2neo import Graph

# The container above was started with NEO4J_AUTH=none, so no credentials are needed.
# If authentication is enabled, provide the corresponding credentials instead, e.g. auth=("neo4j", "<password>").
graph = Graph("bolt://localhost:7687", auth=("", ""))
```
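If the container is started with authentication, the same `NEO4J_AUTH` value can be translated into the connection arguments. A minimal sketch, assuming the `none` and `user/password` formats used by the Docker image; `graph_kwargs_from_neo4j_auth` is a hypothetical helper, and leaving out `auth` entirely for a server without authentication is an assumption about py2neo's defaults.

```python
def graph_kwargs_from_neo4j_auth(value):
    """Translate a NEO4J_AUTH value (as passed to `docker run`) into keyword arguments for Graph.

    "none" (authentication disabled) yields no `auth` argument at all;
    "user/password" yields auth=(user, password). Only the first "/" separates
    the user from the password.
    """
    if value.strip().lower() == "none":
        return {}
    user, sep, password = value.partition("/")
    if not sep or not user or not password:
        raise ValueError(f"expected 'none' or 'user/password', got {value!r}")
    return {"auth": (user, password)}


print(graph_kwargs_from_neo4j_auth("none"))
print(graph_kwargs_from_neo4j_auth("neo4j/secret123"))
# With a running server:
# from py2neo import Graph
# graph = Graph("bolt://localhost:7687", **graph_kwargs_from_neo4j_auth("neo4j/secret123"))
```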

# First Steps
Queries in neo4j are written in [Cypher](https://neo4j.com/docs/cypher-manual/current/). More details on the query language and syntax can be found at:
* [py2neo](https://py2neo.org/2021.1/#)
* [neo4j Cheatsheet](https://neo4j.com/docs/cypher-refcard/current/)
* [Cypher Manual](https://neo4j.com/docs/cypher-manual/current/)

## Creating a new Node

To create a new node, we use the `CREATE` clause. For example, suppose we want to add a person node with the name `Alice` and the birthday `2022-01-01`.
There are different ways to write the query to add this information.

* CREATE the node and then SET the properties:
```cypher
CREATE (p:Person)
SET p.name='Alice',
    p.birthday=date('2022-01-01')
```
* CREATE the node together with its properties:
```cypher
CREATE (p:Person {name:'Alice', birthday:date('2022-01-01')})
```
* CREATE the node and assign a map of properties:
```cypher
CREATE (p:Person)
SET p = {name:'Alice', birthday:date('2022-01-01')}
```
> Note: `SET p = {...}` replaces all existing properties of `p` with the given map, whereas `SET p += {name:'Alice', birthday:date('2022-01-01')}` merges the map into the existing properties of the node. The `+=` form is used later in a parameterized query.


> In `p:Person`, `p` is a query variable (similar to `?p` in SPARQL) that refers to the node and can be used elsewhere in the query to match it or add information to it. `Person` is the label of the node; it can be thought of as a class the node belongs to. A node can also have multiple labels, e.g. `(p:Person:Female {name:'Alice'})`.
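When the property values come from Python, it is safer to send them as query parameters than to paste them into the query string. The following is a minimal sketch; the helper `create_node_query` is hypothetical, and it assumes `birthday` arrives as an ISO date string. Labels cannot be passed as parameters in Cypher, so the sketch checks them against a conservative pattern before writing them into the query text.

```python
import re

LABEL_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def create_node_query(props, labels=("Person",)):
    """Return (query, parameters) that create one node from a property map.

    Property values are sent as the $props parameter; labels are validated
    and written into the query text because they cannot be parameters.
    """
    for label in labels:
        if not LABEL_PATTERN.fullmatch(label):
            raise ValueError(f"unsupported label: {label!r}")
    label_part = "".join(f":{label}" for label in labels)
    query = (
        f"CREATE (p{label_part})\n"
        "SET p = $props,\n"
        # assumption: birthday is an ISO string that should be stored as a Cypher date
        "    p.birthday = date($props.birthday)\n"
        "RETURN properties(p) AS properties, labels(p) AS labels"
    )
    return query, {"props": props}


query, params = create_node_query({"name": "Alice", "birthday": "2022-01-01"})
print(query)
# With a running server: graph.run(query, parameters=params).data()
```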

```python
query="""
CREATE (p:Person)
SET p.name='Alice',
    p.birthday=date('2022-01-01')
"""
graph.run(query).stats()
```

Instead of the statistics, the query can also return information from the variables in the query.

By adding `RETURN properties(p) AS properties, labels(p) AS labels`, the query response contains the properties and labels of `p`:
* [properties(p)](https://neo4j.com/docs/cypher-manual/current/functions/scalar/#functions-properties): returns a map of the properties of `p`
* [labels(p)](https://neo4j.com/docs/cypher-manual/current/functions/list/#functions-labels): returns the list of labels of `p`

```python
query="""
MATCH (p:Person)
RETURN properties(p) AS properties, labels(p) AS labels
"""
graph.run(query)  # append .data() to get the result as a list of dicts
```

## Adding a Relationship
Now we want to add the information that `Bob` is the `father` of `Alice`.
> Note: Here the `FATHER` relationship points from the child to its father.

* Match the existing node of Alice and then create the relationship:
```cypher
MATCH (c:Person {name:'Alice'})
CREATE (c)-[r:FATHER]->(f:Person {name:'Bob'})
RETURN *
```
* Create the new relationship with the MERGE clause:
```cypher
MERGE (c:Person {name:'Alice'})-[r:FATHER]->(f:Person {name:'Bob'})
RETURN *
```
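The MERGE clause tries to match the complete pattern and creates it if no match exists. Because the whole pattern is created in that case, this query adds a second `Alice` node if `Alice` already exists but has no `FATHER` relationship to a `Bob` node.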
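To reuse an existing `Alice` node, each node can get its own MERGE before the relationship is merged. The following is a minimal sketch of a parameterized version; the helper `father_query` and the parameter names `$child` and `$father` are hypothetical.

```python
def father_query(child, father):
    """Return (query, parameters) that link child -[:FATHER]-> father.

    Each node is merged on its own, so existing Person nodes are reused and
    only the missing relationship is created.
    """
    query = """
MERGE (c:Person {name: $child})
MERGE (f:Person {name: $father})
MERGE (c)-[r:FATHER]->(f)
RETURN c.name AS child, f.name AS father
"""
    return query, {"child": child, "father": father}


query, params = father_query("Alice", "Bob")
print(params)
# With a running server: graph.run(query, parameters=params).data()
```

Running the query twice with the same parameters should leave the graph unchanged the second time, since all three MERGE clauses then find a match.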


```python
query="""
MATCH (c:Person {name:'Alice'})
CREATE (c)-[r:FATHER]->(f:Person {name:'Bob'})
RETURN *
"""
graph.run(query)
```

## Query Over Node Relationship
Write a query that returns the names of all fathers and their children.

```python
# Execute this cell to add additional person data and relationships.
# Each node is merged separately so that the existing Bob node is reused.
query="""
MERGE (b:Person {name:'Bob'})
MERGE (e:Person {name:'Eve'})
MERGE (e)-[:FATHER]->(b)
MERGE (d:Person {name:'Daniel'})
MERGE (es:Person {name:'Eskil'})
MERGE (d)-[:FATHER]->(es)
"""
graph.run(query).stats()
```

```python
# Possible Solution
query="""
MATCH (c:Person)-[:FATHER]->(f:Person)
RETURN f.name, c.name
"""
graph.run(query)
```

> To aggregate the children of each father, we can use the aggregating function [collect()](https://neo4j.com/docs/cypher-manual/current/functions/aggregating/#functions-collect).

```python
# Possible Solution
query="""
MATCH (c:Person)-[:FATHER]->(f:Person)
RETURN f.name, collect(c.name)
"""
graph.run(query)
```
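The grouping that `collect()` performs per `f.name` can be mirrored in Python on the rows of the ungrouped solution. A minimal sketch: the rows below are hand-written samples in the shape that `graph.run(query).data()` returns (one dict per row, keyed by column name), and `group_children_by_father` is a hypothetical helper.

```python
from collections import defaultdict


def group_children_by_father(rows):
    """Group (f.name, c.name) rows into {father: [children]}, like collect(c.name)."""
    children = defaultdict(list)
    for row in rows:
        children[row["f.name"]].append(row["c.name"])
    return dict(children)


# sample rows shaped like the .data() output of the first solution
rows = [
    {"f.name": "Bob", "c.name": "Alice"},
    {"f.name": "Bob", "c.name": "Eve"},
    {"f.name": "Eskil", "c.name": "Daniel"},
]
print(group_children_by_father(rows))
```

Aggregating in Cypher is usually preferable, since only one row per father has to be transferred from the server.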

# Systematically adding data to the graph
Consider the following excerpt from the response of the RESTful query [https://conferencecorpus.bitplan.com/eventseries/ICEIS](https://conferencecorpus.bitplan.com/eventseries/ICEIS)
```json
{
    "acronym": "ICEIS",
    "city": "Heraklion",
    "cityWikidataid": "Q160544",
    "country": "Greece",
    "countryIso": "GR",
    "countryWikidataid": "Q41",
    "dblpSeriesId": "conf/iceis",
    "endDate": "2019-05-05",
    "eventId": "iceis2019",
    "location": "Heraklion, Greece",
    "ranks": "",
    "region": "Crete Region",
    "regionIso": "GR-M",
    "regionWikidataid": "Q1267522",
    "seriesId": "iceis",
    "seriesTitle": "International Conference on Enterprise Information Systems (ICEIS)",
    "source": "confref",
    "startDate": "2019-05-03",
    "submissionExtended": 0,
    "title": "International Conference on Enterprise Information Systems (ICEIS)",
    "url": "http://portal.confref.org/list/iceis2019",
    "year": 2019
}
```
This result contains data about different entities, such as the event itself and its locations (city, region, country). It also contains references to other entities, such as the series in dblp (`dblpSeriesId`) and the series in the same data source (`seriesId`).
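One way to see these entities is to split the flat record into one property map per entity. The following is a minimal sketch; the helper `split_record` and the chosen grouping of keys (including the `Region` and `Country` entities) are hypothetical illustrations, not part of the data source.

```python
def split_record(record):
    """Split one flat ConfRef record into per-entity property maps.

    Each mapping goes from the target property name to the source key in the
    record; missing and empty values are dropped.
    """
    def pick(mapping):
        return {prop: record[key] for prop, key in mapping.items()
                if record.get(key) not in (None, "")}

    return {
        "Event": pick({"eventId": "eventId", "acronym": "acronym", "title": "title",
                       "url": "url", "year": "year",
                       "startDate": "startDate", "endDate": "endDate"}),
        "City": pick({"wikidataid": "cityWikidataid", "name": "city"}),
        "Region": pick({"wikidataid": "regionWikidataid", "name": "region", "iso": "regionIso"}),
        "Country": pick({"wikidataid": "countryWikidataid", "name": "country", "iso": "countryIso"}),
        "EventSeries:ConfRef": pick({"acronym": "seriesId", "title": "seriesTitle"}),
        "EventSeries:DBLP": pick({"id": "dblpSeriesId"}),
    }


record = {"acronym": "ICEIS", "city": "Heraklion", "cityWikidataid": "Q160544",
          "dblpSeriesId": "conf/iceis", "eventId": "iceis2019", "seriesId": "iceis",
          "ranks": "", "year": 2019}
for entity, props in split_record(record).items():
    print(entity, props)
```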

Adding the entities and their relations can be done with different strategies.
### Create main entity, relations and related entities in one query
Multiple entities and relationships can be created in one query with MERGE clauses, but assigning the values to the right entities, especially under certain conditions, quickly becomes hard to maintain.
```cypher
MERGE (e:Event:ConfRef {acronym:"ICEIS", ordinal:21, year:2019})
MERGE (c:City {wikidataid:"Q160544"})
MERGE (s:EventSeries:ConfRef {acronym:"iceis"})
MERGE (sd:EventSeries:DBLP {id:"conf/iceis"})
MERGE (e)-[:city]->(c)
MERGE (e)-[:inEventSeries]->(s)
MERGE (s)-[sa:SAME_AS]-(sd)
SET e.title="International Conference on Enterprise Information Systems (ICEIS)",
    e.url="http://portal.confref.org/list/iceis2019",
    e.startDate=date("2019-05-03"),
    sd.acronym="iceis",
    c += {name:"Heraklion"},
    sa.definedBy = "conf/iceis"
```

> Note: `MERGE (e:Event:ConfRef {acronym:"ICEIS", ordinal:21, year:2019})-[:city]->(c:City {wikidataid:"Q160544"})` is not equivalent to `MERGE (e:Event:ConfRef {acronym:"ICEIS", ordinal:21, year:2019}) MERGE (c:City {wikidataid:"Q160544"}) MERGE (e)-[:city]->(c)`. A MERGE clause tries to match its complete pattern, and if it finds no match, it creates the whole pattern. In the first version, this means that if the relationship between event and city does not exist yet, new event and city nodes are created even if those nodes already exist. So either use a separate MERGE clause for each entity or use the MATCH clause for the entities that should not be created.
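The difference can be illustrated with a tiny in-memory model. This is a toy simulation under simplifying assumptions, not how Neo4j implements MERGE: `ToyGraph` and `node` are hypothetical, nodes are compared by their full property map (real MERGE only requires the given properties to be present), and only single-relationship patterns are supported.

```python
def node(label, **props):
    """Represent a node by its label and property map (hashable)."""
    return (label, frozenset(props.items()))


class ToyGraph:
    def __init__(self, nodes=()):
        self.nodes = list(nodes)  # node tuples; the list index acts as the node id
        self.rels = []            # (start_id, type, end_id)

    def merge_node(self, n):
        """MERGE of a single node: reuse an equal node or create it."""
        if n in self.nodes:
            return self.nodes.index(n)
        self.nodes.append(n)
        return len(self.nodes) - 1

    def merge_rel(self, start_id, rel_type, end_id):
        """MERGE (a)-[:TYPE]->(b) on nodes that are already bound."""
        if (start_id, rel_type, end_id) not in self.rels:
            self.rels.append((start_id, rel_type, end_id))

    def merge_pattern(self, start, rel_type, end):
        """Whole-pattern MERGE (a {...})-[:TYPE]->(b {...}) in a single clause."""
        for s, t, e in self.rels:
            if t == rel_type and self.nodes[s] == start and self.nodes[e] == end:
                return  # the complete path exists, nothing is created
        # no complete match: every element of the pattern is created anew
        self.nodes += [start, end]
        self.rels.append((len(self.nodes) - 2, rel_type, len(self.nodes) - 1))


event = node("Event", eventId="iceis2019")
city = node("City", wikidataid="Q160544")

single = ToyGraph([event, city])      # both nodes exist, but are not connected yet
single.merge_pattern(event, "city", city)
print(len(single.nodes))              # event and city were duplicated

separate = ToyGraph([event, city])
separate.merge_rel(separate.merge_node(event), "city", separate.merge_node(city))
print(len(separate.nodes))            # the existing nodes were reused
```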



```python
query="""
MERGE (e:Event:ConfRef {acronym:"ICEIS", ordinal:21, year:2019})
MERGE (c:City {wikidataid:"Q160544"})
MERGE (s:EventSeries:ConfRef {acronym:"iceis"})
MERGE (sd:EventSeries:DBLP {id:"conf/iceis"})
MERGE (e)-[:city]->(c)
MERGE (e)-[:inEventSeries]->(s)
MERGE (s)-[sa:SAME_AS]-(sd)
SET e.title="International Conference on Enterprise Information Systems (ICEIS)",
    e.url="http://portal.confref.org/list/iceis2019",
    e.startDate=date("2019-05-03"),
    sd.acronym="iceis",
    c += {name:"Heraklion"},
    sa.definedBy = "conf/iceis"
"""
graph.run(query)
```

### Decoupled Creation of Entities and Relations
Instead of creating the relationships while inserting the data, we can also insert the data first and then use separate queries to add the resulting relationships.
Decoupling insertion from relationship creation lets us write independent queries: one that inserts the records and one for each relationship type.
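The decoupled workflow can be driven by a small loader: insert every record first, then run each relationship query once. The following is a minimal sketch; `load_events` and `INSERT_QUERY` are hypothetical names, keying events by `eventId` is an assumption (the `acronym` is shared by every edition of a series), and `run` is any callable shaped like py2neo's `Graph.run(cypher, parameters=None)`, which keeps the function testable without a server.

```python
# Assumption: eventId identifies a single event edition.
INSERT_QUERY = """
MERGE (e:Event:ConfRef {eventId:$event.eventId})
ON CREATE SET e = $event
ON MATCH SET e += $event
"""


def load_events(run, events, relationship_queries):
    """Insert all events first, then run each relationship query once.

    `run` is e.g. graph.run; every relationship query scans all matching
    events itself, so a single call per query is enough.
    """
    for event in events:
        run(INSERT_QUERY, parameters={"event": event})
    for query in relationship_queries:
        run(query)
```

With a running server this could be called as `load_events(graph.run, events, [seriesRelQuery, locationRelQuery, seriesSameAsRelQuery])`. The order matters: the SAME_AS query matches on `inEventSeries`, so the series query has to run before it.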

#### Insert Query

```cypher
MERGE (e:Event:ConfRef {eventId:$event.eventId})
ON CREATE SET e = $event
ON MATCH SET e += $event
```
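Because `SET e = $event` copies the whole parameter map onto the node, every value in it must be storable as a node property. The following is a minimal sketch, assuming the input is plain JSON as returned by the REST API; the helper `to_property_map` is hypothetical. Neo4j property values cannot be nested maps, and a `null` property is simply absent, so `None` values are dropped and nested structures are rejected.

```python
ALLOWED = (str, int, float, bool)


def to_property_map(record):
    """Keep only values that can be stored as node properties.

    None values are dropped; maps and lists of non-primitive values raise
    a TypeError so the problem surfaces before the query is sent.
    """
    props = {}
    for key, value in record.items():
        if value is None:
            continue
        if isinstance(value, list):
            ok = all(isinstance(v, ALLOWED) for v in value)
        else:
            ok = isinstance(value, ALLOWED)
        if not ok:
            raise TypeError(f"{key!r}: {type(value).__name__} cannot be stored as a property")
        props[key] = value
    return props


print(to_property_map({"eventId": "iceis2019", "ranks": "", "region": None, "year": 2019}))
# With a running server: graph.run(insertQuery, parameters={"event": to_property_map(record)})
```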

#### Adding Relationships

##### Location Relationship
```cypher
MATCH (e:Event:ConfRef)
WHERE e.cityWikidataid IS NOT NULL
MERGE (c:City {wikidataid:e.cityWikidataid})
MERGE (e)-[:city]->(c)
```
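> Note: `WHERE e.cityWikidataid IS NOT NULL` ensures that the city node and the `city` relationship are only created for events that have the `cityWikidataid` property. Without it, MERGE would fail on events without that property, because it cannot merge a node on a null property value.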

##### Series Relationship
```cypher
MATCH (e:Event:ConfRef)
WHERE e.seriesId IS NOT NULL
MERGE (s:EventSeries:ConfRef {acronym:e.seriesId})
MERGE (e)-[:inEventSeries]->(s)
```
##### Equivalent Series in Different Datasource Relationship
```cypher
MATCH (e:Event:ConfRef)
WHERE e.dblpSeriesId IS NOT NULL
MATCH (e)-[:inEventSeries]->(s:EventSeries:ConfRef)
MERGE (sd:EventSeries:DBLP {id:e.dblpSeriesId})
MERGE (s)-[r:SAME_AS]-(sd)
ON CREATE SET r.definedBy = [e.eventId]
ON MATCH SET r.definedBy = r.definedBy + e.eventId
```

> Note: `r.definedBy` is a property of the relationship between `s` and `sd`. As in this example, such a relationship property can store the provenance of the relationship, here the IDs of the events that define it.
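As written, the `ON MATCH` branch appends `e.eventId` every time the query runs, so re-running it adds duplicate entries to `definedBy`. The following minimal sketch mimics the two branches in Python with a duplicate check; `merge_provenance` is a hypothetical helper, and `DEDUP_ON_MATCH` is an assumed Cypher variant of the `ON MATCH` line, not the query used in this notebook.

```python
def merge_provenance(rel_props, event_id):
    """Mimic ON CREATE / ON MATCH for r.definedBy, skipping duplicate event ids."""
    defined_by = rel_props.get("definedBy")
    if defined_by is None:              # ON CREATE SET r.definedBy = [e.eventId]
        rel_props["definedBy"] = [event_id]
    elif event_id not in defined_by:    # ON MATCH: append only unseen event ids
        rel_props["definedBy"] = defined_by + [event_id]
    return rel_props


# Assumed Cypher counterpart of the duplicate check (requires definedBy to be a list):
DEDUP_ON_MATCH = """
ON MATCH SET r.definedBy = CASE WHEN e.eventId IN r.definedBy
                                THEN r.definedBy
                                ELSE r.definedBy + e.eventId END
"""

rel = {}
for event_id in ["iceis2019", "iceis2019", "iceis2020"]:
    merge_provenance(rel, event_id)
print(rel)
```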

```python
params={
    "event":{
        "acronym": "ICEIS",
        "city": "Heraklion",
        "cityWikidataid": "Q160544",
        "country": "Greece",
        "countryIso": "GR",
        "countryWikidataid": "Q41",
        "dblpSeriesId": "conf/iceis",
        "endDate": "2019-05-05",
        "eventId": "iceis2019",
        "location": "Heraklion, Greece",
        "ranks": "",
        "seriesId": "iceis",
        "source": "confref",
        "startDate": "2019-05-03",
        "submissionExtended": 0,
        "title": "International Conference on Enterprise Information Systems (ICEIS)",
        "url": "http://portal.confref.org/list/iceis2019",
        "year": 2019
    }
}
# eventId identifies a single edition; the acronym is shared by all ICEIS events
insertQuery="""
MERGE (e:Event:ConfRef {eventId:$event.eventId})
ON CREATE SET e = $event
ON MATCH SET e += $event
"""
graph.run(insertQuery, parameters=params)
```

```python
seriesRelQuery="""
MATCH (e:Event:ConfRef)
WHERE e.seriesId IS NOT NULL
MERGE (s:EventSeries:ConfRef {acronym:e.seriesId})
MERGE (e)-[:inEventSeries]->(s)
"""
graph.run(seriesRelQuery)
```

```python
locationRelQuery="""
MATCH (e:Event:ConfRef)
WHERE e.cityWikidataid IS NOT NULL
MERGE (c:City {wikidataid:e.cityWikidataid})
MERGE (e)-[:city]->(c)
"""
graph.run(locationRelQuery)
```

```python
seriesSameAsRelQuery="""
MATCH (e:Event:ConfRef)
WHERE e.dblpSeriesId IS NOT NULL
MATCH (e)-[:inEventSeries]->(s:EventSeries:ConfRef)
MERGE (sd:EventSeries:DBLP {id:e.dblpSeriesId})
MERGE (s)-[r:SAME_AS]-(sd)
ON CREATE SET r.definedBy = [e.eventId]
ON MATCH SET r.definedBy = r.definedBy + e.eventId
"""
graph.run(seriesSameAsRelQuery)
```