# Week 13 Problem 3

If you are not using the `Assignments` tab on the course JupyterHub server to read this notebook, read [Activating the assignments tab](https://github.com/lcdm-uiuc/info490-sp17/blob/master/help/act_assign_tab.md).

A few things you should keep in mind when working on assignments:

1. Make sure you fill in any place that says `YOUR CODE HERE`. Do **not** write your answer in anywhere else other than where it says `YOUR CODE HERE`. Anything you write anywhere else will be removed or overwritten by the autograder.

2. Before you submit your assignment, make sure everything runs as expected. Go to menubar, select _Kernel_, and restart the kernel and run all cells (_Restart & Run all_).

3. Do not change the title (i.e. file name) of this notebook.

4. Make sure that you save your work (in the menubar, select _File_  → _Save and CheckPoint_)

5. You are allowed to submit an assignment multiple times, but only the most recent submission will be graded.
-----
# Problem 13.3. Neo4J

In this problem, we will persist a NetworkX graph in Neo4J and then make queries using [CQL](https://www.tutorialspoint.com/neo4j/neo4j_cql_introduction.htm).

In [1]:
import networkx as nx
from py2neo import authenticate, Graph, Node, Relationship
from py2neo.database import cypher

from nose.tools import assert_equal, assert_true, assert_is_instance

First, let's get connected to the Neo4J database. 
In the following code cell, we read in the current user's netid to obtain a unique database name for this Notebook.
If you are not able to get connected, you should post in the forum and email TAs immediately. Try not to wait until the last minute, since there might be a lot of traffic that makes the server down.

In [2]:
# Filename containing user's netid
fname = '/home/data_scientist/users.txt'
with open(fname, 'r') as fin:
    netid = fin.readline().rstrip()

# We will delete our working directory if it exists before recreating.
dbname = '{0}'.format(netid)

host_ip = '141.142.211.60:7474'
username = 'neo4j'
password = 'Lcdm#info490'

# First we authenticate
authenticate(host_port=host_ip, user=username, password=password)

# Now create database URL
db_url = 'http://{0}/db/{1}'.format(host_ip, dbname)

print('Creating connection to {0}'.format(db_url))
graph = Graph(db_url)

version = graph.dbms.kernel_version
print('Neo4J Kernel version {0}.{1}.{2}'.format(version[0], version[1], version[2]))

Creating connection to http://141.142.211.60:7474/db/wyu13
Neo4J Kernel version 2.3.10


We use the social network of [Florentine Families](https://en.wikipedia.org/wiki/Category:Families_of_Florence) data set. For more information, see [Week 10 Problem 2](../Week10/assignments/w10p2.ipynb).

In [3]:
florentine_families = nx.florentine_families_graph()

## Persisting Graphs

Write a funtion named `persist_graph` that:
- Gets all nodes and edges from the NetworkX graph (`florentine_families`), and adds them to the Neo4J database,
- Provides a label `"families"` to all nodes,
- Provides a name using the node name read from the NetworkX graph to all nodes, and
- Creates a relationship of `"tied to"` for all edges.

In [4]:
def persist_graph(neo_graph, nx_graph):
    '''
    Persists a NetworkX graph in Neo4J.
    All nodes are labeled "families".
    All edges have connection type "tied to".
    
    Parameters
    ----------
    neo_graph: A py2neo.database.Graph instance.
    nx_graph: A networkx.Graph instance.
    '''
    
    # YOUR CODE HERE
    
    # Recreate graph by using a tranascation
    #tx = graph.begin()
    #nodes = []
    #for node in hc.nodes():
    #    nd = Node('NetworkX', name = str(node))
    #    nodes.append(nd)
    #    tx.create(nd)
    #for edge in hc.edges():
    #    tx.create(Relationship(nodes[edge[0]], 'Connects To', nodes[edge[1]]))
    #nodes = []
    for node in nx_graph.nodes():
        # Provides a label "families" to all nodes,
        nd = Node('families', name = str(node))
        #nodes.append(nd)
        neo_graph.create(nd)
        
    for edge in nx_graph.edges():
        # Creates a relationship of "tied to" for all edges
        n_edge0 = neo_graph.find('families', 'name', edge[0])
        n_edge1 = neo_graph.find('families', 'name', edge[1])
        neo_graph.create(Relationship(n_edge0, 'tied to', n_edge1))
        #neo_graph.create(Relationship(nodes[edge[0]], 'tied to', nodes[edge[1]]))

    return None

In [5]:
# clean out graph database
graph.delete_all()

In [6]:
# execute the function
persist_graph(graph, florentine_families)

In [7]:
# do a query to display all nodes and relationships in the database
for result in graph.run('START n=node(*) MATCH (n)-[r]->(m) RETURN n,r,m;'):
    print(result)

('n': (db33230 {name:"Bischeri"}), 'r': (db33230)-[:`tied to`]->(a5f96c7), 'm': (a5f96c7 {name:"Guadagni"}))
('n': (d613b30 {name:"Bischeri"}), 'r': (d613b30)-[:`tied to`]->(efe6ab9), 'm': (efe6ab9 {name:"Strozzi"}))
('n': (d257167 {name:"Bischeri"}), 'r': (d257167)-[:`tied to`]->(afe6dfb), 'm': (afe6dfb {name:"Peruzzi"}))
('n': (f169f7f {name:"Guadagni"}), 'r': (f169f7f)-[:`tied to`]->(b79dba7), 'm': (b79dba7 {name:"Lamberteschi"}))
('n': (e1f55da {name:"Guadagni"}), 'r': (e1f55da)-[:`tied to`]->(baf7501), 'm': (baf7501 {name:"Albizzi"}))
('n': (ad31a8b {name:"Guadagni"}), 'r': (ad31a8b)-[:`tied to`]->(a473cf5), 'm': (a473cf5 {name:"Tornabuoni"}))
('n': (c09857a {name:"Medici"}), 'r': (c09857a)-[:`tied to`]->(d5b9e14), 'm': (d5b9e14 {name:"Salviati"}))
('n': (f738c6d {name:"Medici"}), 'r': (f738c6d)-[:`tied to`]->(bad17d7), 'm': (bad17d7 {name:"Acciaiuoli"}))
('n': (d179aa0 {name:"Medici"}), 'r': (d179aa0)-[:`tied to`]->(dfc92c0), 'm': (dfc92c0 {name:"Barbadori"}))
('n': (d31f344 {nam

In [8]:
# test nodes
assert_true(all(isinstance(n['name'], str) for n in graph.find('families')))
node_names = [n['name'] for n in graph.find('families')]
assert_equal(len(node_names), len(florentine_families.nodes()))
assert_equal(set(node_names), set(florentine_families.nodes()))

In [9]:
# test relationships
edges = [e for e in graph.match(rel_type='tied to')]
start_nodes = [e.start_node()['name'] for e in edges]
end_nodes = [e.end_node()['name'] for e in edges]

assert_equal(len(edges), len(florentine_families.edges()))
assert_equal(set(start_nodes), {e[0] for e in florentine_families.edges()})
assert_equal(set(end_nodes), {e[1] for e in florentine_families.edges()})

## Querying Graphs

Write a funtion named `query_graph` that returns a CQL query string. The CQL query does the following:
- Finds the two nodes: `"Medici"` and `"Guadagni"`,
- Creates a new relationship `"business friend of"` between the two nodes, using `"Medici"` as start node and `"Guadagni"` as end node, and
- Returns the relationship record just created.

In [10]:
def query_graph():
    '''
    Constructs a CQL string that makes a query to the Neo4J database.
    Finds nodes "Medici" and "Guadagni" and makes a new relationship 
      "business friend of" between these two nodes.
    
    Ruturns
    ----------
    cql: A string.
    '''
    
    # YOUR CODE HERE
    
    # Create relationship between two nodes, where the nodes are found from a query
    # CQL = 'MATCH (a),(b) \
    #    WHERE a.name = "Jenny Doe" AND b.name = "Jim Doe" \
    #    CREATE (a)-[r:`sister of` {began:2002}]->(b) \
    #    RETURN r'
    cql = '''MATCH (a:families),(b:families)
                WHERE a.name = "Medici" AND b.name = "Guadagni"   
                CREATE (a)-[r:`business friend of`]->(b)
                RETURN r
          '''
    
    return cql

In [11]:
# run the query to add the new relationship to the database
cql = query_graph()
for result in (graph.run(cql)):
    print(result)

('r': (ccfcbbe)-[:`business friend of`]->(c3153cb))


In [12]:
# do a query to display all nodes and relationships in the database
for result in graph.run('START n=node(*) MATCH (n)-[r]->(m) RETURN n,r,m;'):
    print(result)

('n': (ccfcbbe:families {name:"Medici"}), 'r': (ccfcbbe)-[:`business friend of`]->(c3153cb), 'm': (c3153cb:families {name:"Guadagni"}))
('n': (db33230 {name:"Bischeri"}), 'r': (db33230)-[:`tied to`]->(a5f96c7), 'm': (a5f96c7 {name:"Guadagni"}))
('n': (d613b30 {name:"Bischeri"}), 'r': (d613b30)-[:`tied to`]->(efe6ab9), 'm': (efe6ab9 {name:"Strozzi"}))
('n': (d257167 {name:"Bischeri"}), 'r': (d257167)-[:`tied to`]->(afe6dfb), 'm': (afe6dfb {name:"Peruzzi"}))
('n': (f169f7f {name:"Guadagni"}), 'r': (f169f7f)-[:`tied to`]->(b79dba7), 'm': (b79dba7 {name:"Lamberteschi"}))
('n': (e1f55da {name:"Guadagni"}), 'r': (e1f55da)-[:`tied to`]->(baf7501), 'm': (baf7501 {name:"Albizzi"}))
('n': (ad31a8b {name:"Guadagni"}), 'r': (ad31a8b)-[:`tied to`]->(a473cf5), 'm': (a473cf5 {name:"Tornabuoni"}))
('n': (c09857a {name:"Medici"}), 'r': (c09857a)-[:`tied to`]->(d5b9e14), 'm': (d5b9e14 {name:"Salviati"}))
('n': (f738c6d {name:"Medici"}), 'r': (f738c6d)-[:`tied to`]->(bad17d7), 'm': (bad17d7 {name:"Acciai

In [13]:
# tests
assert_equal(type(cql), str)

new_edge = [e for e in graph.match(rel_type='business friend of')]
new_edge_start = [e.start_node()['name'] for e in new_edge]
new_edge_end = [e.end_node()['name'] for e in new_edge]

assert_equal(len(new_edge), 1)
assert_equal(new_edge_start[0], 'Medici')
assert_equal(new_edge_end[0], 'Guadagni')

## Cleanup

In [14]:
# clean out graph database
graph.delete_all()