# Why Graph Databases?

In a world where connectivity between people dominates the technology landscape, a database that accurately represents user-to-user relationship data is essential. Without it, Facebook and LinkedIn could not let you add and browse mutual friends, look through user profiles, or view a curated list of posts at the speed they do. Relational databases (SQLite, Oracle) and document stores such as MongoDB have to compute relationships between records at query time, whereas graph databases store them directly as connections between nodes.

NOTE: Please install this edition of Neo4j from this link: https://neo4j.com/download/community-edition/

When you download it, open the application and start the server. It will prompt you to go to a localhost link where you can set your server password. The default password is 'neo4j'.

In this tutorial, we cover one of the most common graph databases, Neo4j, and examine its query language, Cypher. This will give us a fundamental understanding of a technology that more and more large companies are beginning to use.

What we will learn in this tutorial and when:
1. What is Cypher and how do we use it?
2. Basic Cypher queries and not-so-basic Cypher queries with py2neo
3. Theories and concepts in OGM and py2neo
4. Using jgraph to 3D model networks/graphs

We'll be going through many examples, which I've found through personal experience to be the easiest way to learn Cypher and understand graph databases.

We can install py2neo with pip with the command below.

In [67]:
!pip install py2neo



# Understanding Cypher and the Property Graph Model

Cypher is the query language for Neo4j, the graph database we will be using in this tutorial. It uses ASCII art to make queries easily readable yet fully functional.

The property graph model used by Neo4j has 2 main building blocks, Node and Relationship. A Node is the primary unit of data storage within a graph. It can carry labels and contains properties, which are key-value pairs: each key names an attribute and each value holds that attribute's data. For example, in the IMDB dataset, a show node might have the properties title ("Game of Thrones"), directors, lead actors, and star rating, while a reviewer node might have a user ID, name, location, and email address. A property such as the title or the user ID can serve to identify the node. A Relationship is a typed, directed connection (edge) between a pair of nodes. Like nodes, relationships may also contain a set of properties.
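
To make the model concrete before touching Neo4j, here is a minimal sketch of it in plain Python. The class names `PGNode` and `PGRelationship`, the `REVIEWED` relationship type, and the sample values are illustrative assumptions, not part of py2neo or of any real dataset.

```python
from dataclasses import dataclass, field


@dataclass
class PGNode:
    """A node: a set of labels plus a dict of properties (key-value pairs)."""
    labels: frozenset
    properties: dict = field(default_factory=dict)


@dataclass
class PGRelationship:
    """A typed, directed edge from start to end, with its own properties."""
    start: PGNode
    type: str
    end: PGNode
    properties: dict = field(default_factory=dict)


# Hypothetical sample data, for illustration only
show = PGNode(frozenset({"Show"}), {"title": "Game of Thrones"})
reviewer = PGNode(frozenset({"Reviewer"}), {"user_id": 42, "name": "Sam"})
review = PGRelationship(reviewer, "REVIEWED", show, {"stars": 5})

print(review.start.properties["name"], review.type, review.end.properties["title"])
```

Note that the direction lives in the relationship itself (`start` and `end`), and that both nodes and relationships carry their own property dictionaries.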

The code below shows how to create a couple of nodes and a relationship joining them. Each node has a single property, name, and is labelled as a Person. The relationship ab describes a KNOWS connection from the first node a to the second node b.

In [68]:
from py2neo import Node, Relationship, Graph, authenticate

# When you download Neo4j, your username will always be neo4j;
# the password is the one you set through the Neo4j site on your localhost.
authenticate("localhost:7474", "neo4j", "ingopoobla3")
sgraph = Graph("http://localhost:7474/db/data/")

# Two Person nodes, each with a single property, name
a = Node("Person", name="Lord_Varys")
b = Node("Person", name="Margaery_Tyrell")
# A KNOWS relationship from a to b
ab = Relationship(a, "KNOWS", b)
print(ab)

(lord_varys)-[:KNOWS]->(margaery_tyrell)


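Cypher uses ASCII art to represent patterns. We surround nodes with parentheses, which look like circles, e.g. (node). If we later want to refer to the node, we’ll give it a variable like (p) for person or (t) for thing. In real-world queries, we’ll probably use longer, more expressive variable names like (person) or (thing). If the node is not relevant to your question, you can also use empty parentheses (). Relationships are drawn as arrows with the type in square brackets, e.g. -[:KNOWS]->, just as in the printed output above. In py2neo, a relationship type can also be defined by subclassing Relationship, as the next cell does with WorksFor.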

In [69]:
c = Node("Person", name="Cersei_Lannister")
class WorksFor(Relationship): pass
ac = WorksFor(a, c)
print(ac)

(lord_varys)-[:WORKS_FOR]->(cersei_lannister)


A Subgraph is a collection of nodes and relationships. The simplest way to construct a subgraph is by combining nodes and relationships using standard set operations. For example, the output of the cell below shows the characters from Game of Thrones and their attributes, as well as all the relationships present in this graph:

In [70]:
s = ab | ac
print(s)

({(margaery_tyrell:Person {name:"Margaery_Tyrell"}), (cersei_lannister:Person {name:"Cersei_Lannister"}), (lord_varys:Person {name:"Lord_Varys"})}, {(lord_varys)-[:WORKS_FOR]->(cersei_lannister), (lord_varys)-[:KNOWS]->(margaery_tyrell)})
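
The set semantics behind `ab | ac` can be sketched without py2neo. In this simplified stand-in, a relationship is just a `(start, type, end)` tuple and a subgraph is a pair of frozensets; the helper names `subgraph` and `union` are hypothetical and only meant to mirror the idea.

```python
# Each relationship is a (start, type, end) tuple of plain strings
ab = ("Lord_Varys", "KNOWS", "Margaery_Tyrell")
ac = ("Lord_Varys", "WORKS_FOR", "Cersei_Lannister")


def subgraph(*relationships):
    """Build (nodes, relationships) as frozensets from relationship tuples."""
    rels = frozenset(relationships)
    nodes = frozenset(n for start, _, end in rels for n in (start, end))
    return nodes, rels


def union(s1, s2):
    """Union two subgraphs: shared nodes appear once in the result."""
    return s1[0] | s2[0], s1[1] | s2[1]


s = union(subgraph(ab), subgraph(ac))
print(sorted(s[0]))
print(sorted(s[1]))
```

Lord_Varys takes part in both relationships but shows up only once among the nodes, which is why the py2neo output above lists three nodes and two relationships.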


We can access the subgraph's nodes and relationships through these commands:

In [71]:
print("NODES",s.nodes())
print("RELATIONSHIPS",s.relationships())

('NODES', frozenset([(margaery_tyrell:Person {name:"Margaery_Tyrell"}), (cersei_lannister:Person {name:"Cersei_Lannister"}), (lord_varys:Person {name:"Lord_Varys"})]))
('RELATIONSHIPS', frozenset([(lord_varys)-[:WORKS_FOR]->(cersei_lannister), (lord_varys)-[:KNOWS]->(margaery_tyrell)]))


Let's now use a sample dataset provided by Neo4j to run queries and extract information. Note that graph.run returns a Cursor object, which we can iterate over to read the results.

In [72]:
from py2neo import Graph, authenticate

# Use the password you set for your own server
authenticate("localhost:7474", "neo4j", "ingopoobla3")
graph = Graph("http://localhost:7474/db/data/")
graph.run("MATCH (a:Person) RETURN a.name, a.born LIMIT 4")

<py2neo.database.Cursor at 0x115edfe10>

As we've seen in the 15-388/15-688 assignments, we can use pandas DataFrames to make data analysis faster. Pandas has a multitude of features that make it extremely useful for data analysis, and it keeps data organized in efficient structures like DataFrames. py2neo and pandas work together: graph.data returns the rows of a query result as a list of dictionaries, which we can pass straight to the DataFrame constructor. The code for doing this is shown below:

In [73]:
from pandas import DataFrame
DataFrame(graph.data("MATCH (a:Person) RETURN a.name, a.born LIMIT 4"))

|   | a.born | a.name |
|---|--------|--------|
| 0 | 1964 | Keanu Reeves |
| 1 | 1967 | Carrie-Anne Moss |
| 2 | 1961 | Laurence Fishburne |
| 3 | 1960 | Hugo Weaving |


A NodeSelector can be used to locate nodes that fulfil a specific set of criteria. A single node can be identified by passing a specific label and a property key-value pair. Any number of labels can be combined, and the where method accepts conditions written in the style of a Cypher WHERE clause, with the underscore (_) standing for the node being tested.

In [74]:
from py2neo import NodeSelector
graph = Graph()
selector = NodeSelector(graph)
selected = selector.select("Person").where("_.name =~ 'J.*'", "1960 <= _.born < 1970")
list(selected)[0:5]

[(e041381:Person {born:1967,name:"James Marshall"}),
 (f773abd:Person {born:1966,name:"John Cusack"}),
 (c1580f6:Person {born:1960,name:"John Goodman"}),
 (ac9bb3b:Person {born:1965,name:"John C. Reilly"}),
 (bab3262:Person {born:1967,name:"Julia Roberts"})]
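
To see what the two conditions mean, here is a rough plain-Python equivalent applied to a handful of hand-written records. The `select` helper and the record list are assumptions made for illustration. Cypher's `=~` operator matches the whole string against the regular expression, which corresponds to `re.fullmatch` in Python.

```python
import re

# A few hand-written sample records standing in for Person nodes
people = [
    {"name": "James Marshall", "born": 1967},
    {"name": "Keanu Reeves", "born": 1964},
    {"name": "John Goodman", "born": 1960},
    {"name": "Jack Nicholson", "born": 1937},
]


def select(records, name_pattern, born_from, born_before):
    """Keep records whose name fully matches the pattern and whose
    birth year satisfies born_from <= born < born_before."""
    pattern = re.compile(name_pattern)
    return [
        r for r in records
        if pattern.fullmatch(r["name"]) and born_from <= r["born"] < born_before
    ]


selected = select(people, r"J.*", 1960, 1970)
print([r["name"] for r in selected])
```

Keanu Reeves is dropped by the name condition and Jack Nicholson by the year condition, so both conditions must hold, just as the conditions passed to `where` are combined.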

The ogm module provides Object-Graph Mapping (OGM) features. Conceptually, a mapped object owns a single node within the graph along with all of that node’s outgoing relationships. These features are managed via a pair of attributes called node and rel, which store details of the mapped node and the outgoing relationships respectively. The only specific thing we need is a null constructor (one that takes no arguments) to make new instances. Here we import GraphObject together with the property and relationship descriptors, and define classes just as we would in ordinary Python. Each class inherits from GraphObject, which is what turns it into a mapped object.

In [75]:
# RelatedTo is needed by the Person class below
from py2neo.ogm import GraphObject, Property, RelatedFrom, RelatedTo

class Movie(GraphObject):
    __primarykey__ = "title"

    title = Property()
    tag_line = Property("tagline")
    released = Property()

    actors = RelatedFrom("Person", "ACTED_IN")
    directors = RelatedFrom("Person", "DIRECTED")
    producers = RelatedFrom("Person", "PRODUCED")


class Person(GraphObject):
    __primarykey__ = "name"

    name = Property()
    born = Property()

    acted_in = RelatedTo(Movie)
    directed = RelatedTo(Movie)
    produced = RelatedTo(Movie)
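
The idea of mapping attributes onto node properties can be illustrated with a toy descriptor in plain Python. This is not how py2neo implements it; `MappedObject`, the `_node` dictionary, and the sample values are assumptions chosen only to show why `tag_line = Property("tagline")` stores its value under a different key.

```python
class Property:
    """Toy descriptor mapping an attribute onto a key of the node's dict."""

    def __init__(self, key=None):
        self.key = key

    def __set_name__(self, owner, name):
        # Default to the attribute name when no explicit key is given
        if self.key is None:
            self.key = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance._node.get(self.key)

    def __set__(self, instance, value):
        instance._node[self.key] = value


class MappedObject:
    """Toy base class: each instance owns one node, stored as a dict."""

    def __init__(self):
        self._node = {}


class Movie(MappedObject):
    title = Property()
    tag_line = Property("tagline")  # attribute tag_line, stored as "tagline"


m = Movie()
m.title = "The Matrix"
m.tag_line = "Welcome to the Real World"
print(m._node)
```

The Python-friendly attribute name and the stored property key are decoupled, which is the essence of an object-graph mapping layer.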

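Here's another example graph from Marco Bonzanini, built with the neo4jrestclient library rather than py2neo. It follows the same pattern we've been discussing so far. We create 2 nodes, Marco and Daniela, under the "User" label, and 2 types of beer under the "Beer" label. We then create the relationships: each "likes" relationship points one way, from a user to a beer, while "friends" is meant to be read in both directions, as the comments in the code indicate. Finally, we run a sample query to find which beers Marco likes. This example shows the complete process of creating a graph so that you can follow along without having to jump through text and other cells. Note that every run of the cell creates a new set of nodes and relationships, so running it several times makes the query return the same rows repeatedly, as in the output below.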

In [76]:
from neo4jrestclient import client
from neo4jrestclient.client import GraphDatabase

db = GraphDatabase("http://localhost:7474", username="neo4j", password="ingopoobla3")

# Create some nodes with labels
user = db.labels.create("User")
u1 = db.nodes.create(name="Marco")
user.add(u1)
u2 = db.nodes.create(name="Daniela")
user.add(u2)

beer = db.labels.create("Beer")
b1 = db.nodes.create(name="Punk IPA")
b2 = db.nodes.create(name="Hoegaarden Rosee")
#You can associate a label with many nodes in one go
beer.add(b1, b2)
#User-likes->Beer relationships
u1.relationships.create("likes", b1)
u1.relationships.create("likes", b2)
u2.relationships.create("likes", b1)
#Bi-directional relationship
u1.relationships.create("friends", u2)

 
q = 'MATCH (u:User)-[r:likes]->(m:Beer) WHERE u.name="Marco" RETURN u, type(r), m'
# "db" as defined above
results = db.query(q, returns=(client.Node, str, client.Node))
for r in results:
    print("(%s)-[%s]->(%s)" % (r[0]["name"], r[1], r[2]["name"]))

(Marco)-[likes]->(Punk IPA)
(Marco)-[likes]->(Hoegaarden Rosee)
(Marco)-[likes]->(Punk IPA)
(Marco)-[likes]->(Hoegaarden Rosee)
(Marco)-[likes]->(Punk IPA)
(Marco)-[likes]->(Hoegaarden Rosee)
(Marco)-[likes]->(Punk IPA)
(Marco)-[likes]->(Hoegaarden Rosee)
(Marco)-[likes]->(Punk IPA)
(Marco)-[likes]->(Hoegaarden Rosee)
(Marco)-[likes]->(Punk IPA)
(Marco)-[likes]->(Hoegaarden Rosee)
(Marco)-[likes]->(Punk IPA)
(Marco)-[likes]->(Hoegaarden Rosee)


Here we have an example from Neo4j in which we first declare 2 nodes labelled "Person" and give them names and ages. We then create two other kinds of nodes, "Drink" and "Manufacturer". The drinks are named mtdew and cokezero, and the manufacturers coke and pepsi. We add everything to the graph with the create function, using the "|" operator to combine the individual nodes into a single subgraph.

In [77]:
from py2neo import Node

nicole = Node("Person", name="Nicole", age=24)
drew = Node("Person", name="Drew", age=20)

mtdew = Node("Drink", name="Mountain Dew", calories=9000)
cokezero = Node("Drink", name="Coke Zero", calories=0)

coke = Node("Manufacturer", name="Coca Cola")
pepsi = Node("Manufacturer", name="Pepsi")

graph.create(nicole | drew | mtdew | cokezero | coke | pepsi)

Now we create the relationships. Just like in the real world, we can make relationships like "Nicole LIKES Coke Zero" and represent it graphically, which is extremely powerful and intuitive. We create 5 relationships here, each connecting different entities in the graph. 

In [78]:
from py2neo import Relationship

graph.create(Relationship(nicole, "LIKES", cokezero))
graph.create(Relationship(nicole, "LIKES", mtdew))
graph.create(Relationship(drew, "LIKES", mtdew))
graph.create(Relationship(coke, "MAKES", cokezero))
graph.create(Relationship(pepsi, "MAKES", mtdew))


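This query matches every pattern in which a Person LIKES a Drink and returns the person's name alongside the drink's name. Because graph.create adds new nodes and relationships each time the cells above are run, running them more than once produces repeated rows like those in the output below.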

In [79]:
query = """
MATCH (person:Person)-[:LIKES]->(drink:Drink)
RETURN person.name AS name, drink.name AS drink
"""

data = graph.run(query)

for d in data:
    print(d)

(u'name': u'Nicole', u'drink': u'Coke Zero')
(u'name': u'Drew', u'drink': u'Mountain Dew')
(u'name': u'Nicole', u'drink': u'Mountain Dew')
(u'name': u'Drew', u'drink': u'Mountain Dew')
(u'name': u'Nicole', u'drink': u'Mountain Dew')
(u'name': u'Nicole', u'drink': u'Coke Zero')
(u'name': u'Drew', u'drink': u'Mountain Dew')
(u'name': u'Nicole', u'drink': u'Mountain Dew')
(u'name': u'Nicole', u'drink': u'Coke Zero')
(u'name': u'Drew', u'drink': u'Mountain Dew')
(u'name': u'Nicole', u'drink': u'Mountain Dew')
(u'name': u'Nicole', u'drink': u'Coke Zero')


Parameters can be passed to a Cypher query as additional keyword arguments to Graph.run. Parameters in Cypher are named; in the syntax used here they are wrapped in curly braces (Neo4j 4.0 and later write $name instead). We use {name} as the parameter and supply the actual value, name="Nicole", when we run the query. This gives more control over queries and makes them significantly more reusable. The query below finds the average number of calories of the drinks Nicole likes.

In [80]:
query = """
MATCH (p:Person)-[:LIKES]->(drink:Drink)
WHERE p.name = {name}
RETURN p.name AS name, AVG(drink.calories) AS avg_calories
"""

data = graph.run(query, name="Nicole")

for d in data:
    print(d)

(u'name': u'Nicole', u'avg_calories': 4500.0)
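
As a sanity check on the 4500.0 figure, the same computation can be sketched in plain Python over the nodes and LIKES relationships created earlier. The dictionary layout and the `avg_calories` helper are assumptions for illustration; the `name` argument plays the role of the query parameter.

```python
# Nodes keyed by the variable names used in the cells above
nodes = {
    "nicole": {"label": "Person", "name": "Nicole", "age": 24},
    "drew": {"label": "Person", "name": "Drew", "age": 20},
    "mtdew": {"label": "Drink", "name": "Mountain Dew", "calories": 9000},
    "cokezero": {"label": "Drink", "name": "Coke Zero", "calories": 0},
}

# (start, type, end) triples for the LIKES relationships
relationships = [
    ("nicole", "LIKES", "cokezero"),
    ("nicole", "LIKES", "mtdew"),
    ("drew", "LIKES", "mtdew"),
]


def avg_calories(name):
    """Average calories over (p:Person)-[:LIKES]->(drink:Drink) where p.name = name."""
    calories = [
        nodes[end]["calories"]
        for start, rel_type, end in relationships
        if rel_type == "LIKES"
        and nodes[start]["label"] == "Person"
        and nodes[end]["label"] == "Drink"
        and nodes[start]["name"] == name
    ]
    return sum(calories) / len(calories) if calories else None


print(avg_calories("Nicole"))  # (0 + 9000) / 2
```

Calling `avg_calories` with a different name reuses the same logic, which is the reusability that query parameters provide.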


We install ipython-cypher and jgraph as we'll be using them in the next segment. Run the cell below and make sure to wait for the library to download fully before proceeding.

In [82]:
!pip install ipython-cypher
!pip install jgraph



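Here's another query to show which person likes which drink. This time the columns are returned as source and target, so that each row can be collected into a (source, target) tuple describing one edge of the graph.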

In [83]:
from py2neo import Graph as PGraph
import jgraph 

neo4j = PGraph()

query = """
MATCH (person:Person)-[:LIKES]->(drink:Drink)
RETURN person.name AS source, drink.name AS target
"""
data = neo4j.run(query)
tups = []

for d in data:
    tups.append((d["source"], d["target"]))
print(tups)

[(u'Nicole', u'Coke Zero'), (u'Drew', u'Mountain Dew'), (u'Nicole', u'Mountain Dew'), (u'Drew', u'Mountain Dew'), (u'Nicole', u'Mountain Dew'), (u'Nicole', u'Coke Zero'), (u'Drew', u'Mountain Dew'), (u'Nicole', u'Mountain Dew'), (u'Nicole', u'Coke Zero'), (u'Drew', u'Mountain Dew'), (u'Nicole', u'Mountain Dew'), (u'Nicole', u'Coke Zero')]
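
The list above contains repeated edges because the creation cells were run more than once. One way to tidy it up on the Python side, sketched here on a shortened copy of the list, is to drop duplicates while keeping the first-seen order. Writing `RETURN DISTINCT` in the Cypher query would be an alternative that removes the repeated rows before they reach Python.

```python
# A shortened copy of the edge list printed above
tups = [
    ("Nicole", "Coke Zero"),
    ("Drew", "Mountain Dew"),
    ("Nicole", "Mountain Dew"),
    ("Drew", "Mountain Dew"),
    ("Nicole", "Mountain Dew"),
    ("Nicole", "Coke Zero"),
]

# dict keys are unique and keep insertion order, so this removes repeats
unique_edges = list(dict.fromkeys(tups))
print(unique_edges)
```

The resulting list has one tuple per distinct edge and has the same shape as the edge list passed to jgraph.draw in the last cell.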


We can use jgraph to see 3D visual representations of our graph. Because the example we've been using is rather complex, the notebook takes an unusually long time to draw it, so instead I've used a small example graph to show how the jgraph draw function works. When you experiment with Cypher later, you can draw your own networks and view them from any angle. Visualization is a powerful data science tool, and in this context it can help reveal important properties such as graph density.

In [84]:
import jgraph

jgraph.draw([(1, 2), (2, 3), (3, 4), (4, 1), (4, 5), (5, 2)])
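
Graph density, mentioned above, can also be computed directly from the same edge list. The sketch below assumes the graph is undirected and simple (no self-loops, no repeated edges), in which case density is 2E / (N(N - 1)) for N nodes and E edges. The `density` helper is a hypothetical name, not part of jgraph.

```python
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (4, 5), (5, 2)]


def density(edge_list):
    """Density of an undirected simple graph: 2E / (N * (N - 1))."""
    nodes = {n for edge in edge_list for n in edge}
    # Treat (a, b) and (b, a) as the same edge and ignore self-loops
    undirected = {frozenset(e) for e in edge_list if e[0] != e[1]}
    n = len(nodes)
    if n < 2:
        return 0.0
    return 2 * len(undirected) / (n * (n - 1))


print(density(edges))  # 5 nodes, 6 edges: 12 / 20
```

A density of 1.0 would mean every pair of nodes is connected, so this example graph has 60% of all possible edges.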