# Practice: Query Neo4J-POLE

Neo4j is a graph database. It stores data in the form of a graph, and that graph can be very large.

To query the Neo4j database, we use Cypher. Cypher is designed to look like SQL, and it is easy to learn. The key is to understand why graph databases exist and what kinds of queries are made on them.

We will work on the Example Project that comes with Neo4j Desktop. The data in this database has the following structure:

POLE-50 Crime Investigation Knowledge Graph

The POLE-50 dataset represents a real-world crime investigation knowledge graph built on the POLE model — Person, Object, Location, and Event.
It is designed to capture relationships and patterns between people, places, things, and incidents that occur in a crime domain.

This dataset is stored in Neo4j, a native graph database that stores data as nodes (entities) and relationships (connections).
We query this graph using Cypher, Neo4j’s query language, which is intuitive and similar to SQL but optimized for graph traversal and pattern discovery.

The following figure summarizes this. 

In [1]:
!pip install pandas



In [2]:
import matplotlib.pyplot as plt

In [3]:
from neo4j import GraphDatabase

In [4]:
!pip install py2neo



In [5]:
import pandas as pd

In [6]:
import py2neo

In [10]:
# Only `import py2neo` ran above, so Graph must be referenced through the package.
graph_db = py2neo.Graph("bolt://localhost:7687", auth=("neo4j","Aayansh@12345"), name="pole-50")

We are now set. The modules are loaded. The database connection is established. We can query the database now. 
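The connection cell above hardcodes the password in the notebook. As a minimal sketch of an alternative, the settings could be read from environment variables instead. The variable names `NEO4J_URI`, `NEO4J_USER`, `NEO4J_PASSWORD` and `NEO4J_DATABASE` and the helper `load_neo4j_settings` are assumptions made for this illustration, not something py2neo or Neo4j defines.

```python
import os


def load_neo4j_settings(env=None):
    """Collect connection settings from environment variables.

    The variable names are hypothetical; pick whatever suits your setup.
    A missing NEO4J_PASSWORD raises KeyError rather than falling back
    to a default password.
    """
    env = os.environ if env is None else env
    return {
        "uri": env.get("NEO4J_URI", "bolt://localhost:7687"),
        "auth": (env.get("NEO4J_USER", "neo4j"), env["NEO4J_PASSWORD"]),
        "name": env.get("NEO4J_DATABASE", "pole-50"),
    }


# Usage (needs py2neo installed and the database running):
# import py2neo
# settings = load_neo4j_settings()
# graph_db = py2neo.Graph(settings["uri"], auth=settings["auth"], name=settings["name"])
```

Keeping the password out of the notebook means the file can be shared without leaking credentials.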

Let's start with finding the number of nodes and edges in the graph.

In [12]:
numNodes = """
MATCH (n)
RETURN count(n) as numNodes
"""
graph_db.run(numNodes)

numNodes
61521


In [13]:
numRels = """
MATCH ()-[r]-()
RETURN count(r) as numRels
"""
graph_db.run(numRels)

numRels
211680


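The undirected pattern `()-[r]-()` matches every relationship twice, once from each end, so 211680 is double the number of relationships actually stored.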

Let's now increase the granularity of our queries. The graph has several node labels. Let's query how many nodes belong to each of them, starting with Person.

In [63]:
numPersons = """
MATCH (p:Person)
RETURN count(p) as numPersons
"""
graph_db.run(numPersons)

numPersons
369


So 369 of the 61521 nodes are Person nodes. Let's count the Crime nodes next.

In [65]:
numCrimes = """
MATCH (c:Crime)
RETURN count(c) AS numCrimes
"""
graph_db.run(numCrimes)

numCrimes
28762


We can count Location nodes the same way, and then do a similar exercise for relationships.

In [66]:
numLocations = """
MATCH (l:Location)
RETURN count(l) AS numLocations
"""
graph_db.run(numLocations)

numLocations
14904
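The per-label counts above each need their own query. As a sketch, Cypher's `labels()` function together with `UNWIND` can count every label in one pass. The helper name `count_nodes_by_label` is made up for this example, and it assumes `graph` behaves like the py2neo `Graph` used in this notebook, whose `run()` returns a cursor with a `data()` method.

```python
LABEL_COUNTS = """
MATCH (n)
UNWIND labels(n) AS label
RETURN label, count(*) AS total
ORDER BY total DESC
"""


def count_nodes_by_label(graph):
    """Return {label: node count} for every label present in the graph."""
    rows = graph.run(LABEL_COUNTS).data()  # list of {"label": ..., "total": ...}
    return {row["label"]: row["total"] for row in rows}


# Usage: count_nodes_by_label(graph_db)
```

A node with more than one label is counted once under each of its labels, so the totals can add up to more than the number of nodes.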


In [14]:
numRelationships = """
MATCH ()-[r]->()
RETURN count(r) AS numRelationships
"""
graph_db.run(numRelationships)

numRelationships
105840
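The directed count here is exactly half of the 211680 reported by the earlier undirected query, because `()-[r]-()` binds each relationship once from each end. The sketch below runs both patterns side by side. The helper name is an assumption for this example, and it relies on py2neo's `Graph.evaluate()`, which returns the first value of the first record.

```python
DIRECTED_COUNT = "MATCH ()-[r]->() RETURN count(r)"
UNDIRECTED_COUNT = "MATCH ()-[r]-() RETURN count(r)"


def relationship_counts(graph):
    """Count relationships with the directed and the undirected pattern."""
    return {
        "directed": graph.evaluate(DIRECTED_COUNT),
        "undirected": graph.evaluate(UNDIRECTED_COUNT),
    }


# Usage:
# counts = relationship_counts(graph_db)
# counts["undirected"] / counts["directed"]
```

When the goal is the number of stored relationships, the directed pattern is the one to use.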


In [15]:
topCrimes = """
MATCH (c)
WHERE c.category IS NOT NULL OR c.type IS NOT NULL
RETURN coalesce(c.category, c.type) AS crimeType, count(*) AS total
ORDER BY total DESC
LIMIT 5
"""
graph_db.run(topCrimes)

crimeType,total
Violence and sexual offences,8765
Public order,4839
Criminal damage and arson,3587
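The query above scans every node and hardcodes the limit. A possible variation, sketched here, restricts the match to `Crime` nodes and passes the limit as a Cypher parameter. It assumes the crime category is stored in the `type` property of `Crime` nodes, which the output above suggests but does not confirm, and the helper name `top_crime_types` is hypothetical.

```python
TOP_CRIME_TYPES = """
MATCH (c:Crime)
WHERE c.type IS NOT NULL
RETURN c.type AS crimeType, count(*) AS total
ORDER BY total DESC
LIMIT $limit
"""


def top_crime_types(graph, limit=5):
    """Return the most frequent crime types as a list of dicts."""
    # Keyword arguments to run() are sent to the server as query parameters.
    return graph.run(TOP_CRIME_TYPES, limit=limit).data()


# Usage: top_crime_types(graph_db, limit=10)
```

Parameters keep the query text constant, so the same string can be reused with different limits.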


In [19]:
fivePeople = """
MATCH (people:Person)
RETURN people
LIMIT 5
"""
graph_db.run(fivePeople)

people
"(_0:Person {name: 'Todd', nhs_no: '117-66-8129', surname: 'Hamilton'})"
"(_2:Person {name: 'Benjamin', nhs_no: '991-70-5333', surname: 'Hamilton'})"
"(_5:Person {name: 'Nancy', nhs_no: '620-83-1546', surname: 'Campbell'})"


In [20]:
fivePeopleCursor = graph_db.run(fivePeople)
dfFivePeople = fivePeopleCursor.to_data_frame()
dfFivePeople

Unnamed: 0,people
0,"{'nhs_no': '117-66-8129', 'surname': 'Hamilton..."
1,"{'nhs_no': '991-70-5333', 'surname': 'Hamilton..."
2,"{'nhs_no': '620-83-1546', 'surname': 'Campbell..."
3,"{'nhs_no': '595-90-8809', 'surname': 'Garcia',..."
4,"{'nhs_no': '556-65-1110', 'surname': 'Turner',..."
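In the DataFrame above, each row holds a whole node in a single column, which is awkward to filter or sort. One way around this, sketched below, is to spread the node properties into separate columns before building the frame. The helper `flatten_nodes` is made up for this example, and it assumes each returned node can be converted with `dict()`, as py2neo nodes can.

```python
def flatten_nodes(records, key):
    """Turn [{key: node}, ...] into [{property: value, ...}, ...]."""
    return [dict(record[key]) for record in records]


# Usage (with the cursor from this notebook):
# rows = flatten_nodes(graph_db.run(fivePeople).data(), "people")
# pd.DataFrame(rows)   # one column per property: name, surname, nhs_no
```

Another option is to return the properties directly from Cypher, for example `RETURN people.name AS name, people.surname AS surname`.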


In [21]:
type(fivePeopleCursor)

py2neo.cypher.Cursor

In [22]:
fivePeopleCursor.to_table()

This is a more elegant solution compared to what I showed your seniors. Given the paucity of time, I did not explore enough and I jumped into my comfort zone - Pandas.

In [23]:
import pandas as pd

In [24]:
pd.DataFrame(graph_db.run(fivePeople))

Unnamed: 0,0
0,"{'nhs_no': '117-66-8129', 'surname': 'Hamilton..."
1,"{'nhs_no': '991-70-5333', 'surname': 'Hamilton..."
2,"{'nhs_no': '620-83-1546', 'surname': 'Campbell..."
3,"{'nhs_no': '595-90-8809', 'surname': 'Garcia',..."
4,"{'nhs_no': '556-65-1110', 'surname': 'Turner',..."


Ouch!!! No Forrest Gump there. Obviously an incomplete dataset. Hollywood would have made 38 movies in one year. The list of movies here is clearly not exhaustive. Anyway, we are just practising. Let's keep practising.

Notice that we used a function `type()` in the query. There are a lot of functions you can use to add details to your queries.
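In Cypher, `type(r)` returns the type name of a relationship `r`. As a sketch of how it can be used on this graph, the query below counts relationships per type. The helper name is an assumption for this example, and `graph` is expected to behave like the py2neo `Graph` used in this notebook.

```python
REL_TYPE_COUNTS = """
MATCH ()-[r]->()
RETURN type(r) AS relType, count(*) AS total
ORDER BY total DESC
"""


def count_relationships_by_type(graph):
    """Return (relationship type, count) pairs, most frequent first."""
    rows = graph.run(REL_TYPE_COUNTS).data()
    return [(row["relType"], row["total"]) for row in rows]


# Usage: count_relationships_by_type(graph_db)
```

The directed pattern `-[r]->` is used so that each relationship is counted once.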

Notice that Hanks directed a movie. Let's see if he was associated with movies in non-acting roles. 

Talking about directors, let's get back to Keanu Reeves and see who directed The Matrix.

So far we have done basic queries. We had specific questions to ask. The queries involved specific questions about nodes and their relationships. 

Now let's introduce the idea of __paths__. You can query a traversal that goes beyond a single hop.
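A multi-hop traversal is written in Cypher with a variable-length pattern such as `-[*1..2]-`, which follows between one and two relationships of any type. The sketch below looks up the people reachable from one person within a given number of hops. The helper name is hypothetical, and the `nhs_no`, `name` and `surname` properties are taken from the Person output shown earlier. The hop bound is checked and written into the query text because this sketch does not rely on parameters being accepted there.

```python
def people_within_hops(graph, nhs_no, max_hops=2):
    """Return distinct Person nodes reachable within max_hops relationships."""
    if not isinstance(max_hops, int) or max_hops < 1:
        raise ValueError("max_hops must be a positive integer")
    query = f"""
    MATCH (start:Person {{nhs_no: $nhs_no}})-[*1..{max_hops}]-(other:Person)
    WHERE other <> start
    RETURN DISTINCT other.name AS name, other.surname AS surname
    """
    # nhs_no is user data, so it goes through a query parameter.
    return graph.run(query, nhs_no=nhs_no).data()


# Usage: people_within_hops(graph_db, "117-66-8129", max_hops=2)
```

On a graph of this size, unbounded or deep traversals can become slow, so it is worth keeping the hop limit small.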

You can examine this and see that all the details of the people and the movies are captured. There are 59 colleagues in total with whom these relationships exist.