# ResearchOps Community Toolbox Census + Graph
## Demo II - Pushing the Data to the Graph
#### Author: Pete Tunkis
#### Date: 2024-10-29

This notebook provides replicable code that pushes the previously created node and relationship tables to the graph, using their columns as query parameters. To do this, I connect to a running local instance of Neo4j (via Neo4j Desktop) and send commands to it through Neo4j's Python driver.

Please note that this notebook is not standalone; it is a collection of (mostly) tidied-up code that you can use as inspiration for your own project, improve upon, etc. All steps in this notebook assume that every step in the `01_data_ingest_xform.ipynb` notebook has completed successfully: the code below expects the node and relationship tables built there (e.g. `respondent_node`, `tool_rel`) to be available as pandas DataFrames.

In [1]:
### Environment
import os
from dotenv import load_dotenv
from neo4j import GraphDatabase

# Initialization
load_dotenv()

### Parameters
URI = os.getenv('NEO4J_URI')
USER = os.getenv('NEO4J_USER')
PASS = os.getenv('NEO4J_PASS')
AUTH = (USER, PASS)

# Neo4j GDBMS driver and session objects
driver = GraphDatabase.driver(URI, auth = AUTH)
session = driver.session(database = 'neo4j')

In [2]:
### Verify connection to Neo4j GDBMS
# Reuse the driver created above; wrapping a second driver in a `with` block
# would rebind `driver` to an instance that is closed when the block exits.
driver.verify_connectivity()
print('Connection established!')

Connection established!
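
If one of the three `.env` entries is missing, `os.getenv` quietly returns `None`, and the problem only shows up later as a less obvious driver or authentication error. A minimal sketch of a fail-fast check follows. `missing_env_vars` is a hypothetical helper that is not part of the original notebook, and the URI in the example is only a placeholder of the kind commonly used for local instances.

```python
import os

# Names of the entries the notebook reads from .env
REQUIRED_VARS = ("NEO4J_URI", "NEO4J_USER", "NEO4J_PASS")

def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Example with an explicit mapping (placeholder values, not real credentials)
example_env = {"NEO4J_URI": "bolt://localhost:7687", "NEO4J_USER": "neo4j"}
print(missing_env_vars(env=example_env))  # ['NEO4J_PASS']
```

Calling `missing_env_vars()` with no arguments right after `load_dotenv()` would check the real environment; raising an error when the returned list is non-empty is one way to stop before the driver is created.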


### First step: create indices/constraints
Before pushing any data to the graph, I create indices and uniqueness constraints. Indices speed up lookups (such as the `MATCH` clauses used later to find nodes when writing relationships), and uniqueness constraints prevent duplicate nodes. This is especially critical when working with large volumes of data. Spoiler alert: my data only have 800 nodes and some 9k relationships (before any algorithms), but this is still a good habit to maintain.

In [3]:
### Initial Graph Setup - Create uniqueness constraints, indices

# Respondent, Tool indices
session.run(
    '''
    CREATE INDEX Respondent_Index IF NOT EXISTS FOR (uxt:Respondent) ON (uxt.respondent_id);
    '''
    )
session.run(
    '''
    CREATE INDEX Tool_Index IF NOT EXISTS FOR (t:Tool) ON (t.tool_id);
    '''
    )

# Node constraints
session.run(
    '''
    CREATE CONSTRAINT Tool_Constraint IF NOT EXISTS FOR (t:Tool) REQUIRE t.tool IS UNIQUE;
    '''
    )
session.run(
    '''
    CREATE CONSTRAINT Business_Constraint IF NOT EXISTS FOR (b:Business) REQUIRE b.business IS UNIQUE;
    '''
    )
session.run(
    '''
    CREATE CONSTRAINT Industry_Constraint IF NOT EXISTS FOR (i:Industry) REQUIRE i.industry IS UNIQUE;
    '''
    )
session.run(
    '''
    CREATE CONSTRAINT Company_Constraint IF NOT EXISTS FOR (c:Company) REQUIRE c.company IS UNIQUE;
    '''
    )
session.run(
    '''
    CREATE CONSTRAINT Location_Constraint IF NOT EXISTS FOR (l:Location) REQUIRE l.location IS UNIQUE;
    '''
    )
session.run(
    '''
    CREATE CONSTRAINT Discipline_Constraint IF NOT EXISTS FOR (d:Discipline) REQUIRE d.discipline IS UNIQUE;
    '''
    )
session.run(
    '''
    CREATE CONSTRAINT Responsibility_Constraint IF NOT EXISTS FOR (r:Responsibility) REQUIRE r.responsibility IS UNIQUE;
    '''
    )
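
The seven constraint statements above differ only in the node label and the property name, so they could also be generated from a dictionary. The sketch below is one possible way to do that, not the approach used in this notebook; `UNIQUE_PROPS` and `constraint_statements` are names introduced here for illustration, and the generic node variable `n` replaces the per-label variables (`t`, `b`, ...) used above.

```python
# Label -> property that must be unique, mirroring the constraints above
UNIQUE_PROPS = {
    "Tool": "tool",
    "Business": "business",
    "Industry": "industry",
    "Company": "company",
    "Location": "location",
    "Discipline": "discipline",
    "Responsibility": "responsibility",
}

def constraint_statements(unique_props):
    """Build one CREATE CONSTRAINT statement per label/property pair."""
    return [
        f"CREATE CONSTRAINT {label}_Constraint IF NOT EXISTS "
        f"FOR (n:{label}) REQUIRE n.{prop} IS UNIQUE"
        for label, prop in unique_props.items()
    ]

statements = constraint_statements(UNIQUE_PROPS)
print(statements[0])

# With an open session (as in the cells above), each statement could be sent with:
# for stmt in statements:
#     session.run(stmt)
```

The labels and property names come from a fixed dictionary in the code rather than from user input, which is what makes string interpolation acceptable here.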

### Writing nodes to the graph
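As with the previous demo notebook, most of what follows could be looped using dictionaries, but for the sake of being explicit, I write out each `for` loop separately to illustrate each component. Note that these statements use `CREATE`, so re-running the cell against a graph that already holds these nodes will fail on the uniqueness constraints (or, for `Respondent`, which has an index but no constraint, create duplicates).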

In [4]:
### Write nodes to the GDBMS

# Respondents
for index, row in respondent_node.iterrows():
    session.run(
        '''
        CREATE (:Respondent {respondent_id: $respondent_id
                             , num_researchers: $num_researchers
                             , size_cat: $size_cat
                             , maturity: $maturity
                             , len_experience: $len_experience
                             , exp_cat: $exp_cat
                             , discipline: $discipline})
        '''
        , respondent_id = row['respondent-id']
        , num_researchers = row['num-researchers']
        , maturity = row['maturity']
        , len_experience = row['len-experience']
        , discipline = row['discipline']
        , size_cat = row['size-cat']
        , exp_cat = row['exp-cat']
        )

# Business types
for index, row in business_node.iterrows():
    session.run(
        '''
        CREATE (:Business {business_id: $business_id
                             , business: $business})
        '''
        , business_id = row['id']
        , business = row['business']
        )

# Industries
for index, row in industry_node.iterrows():
    session.run(
        '''
        CREATE (:Industry {industry_id: $industry_id
                             , industry: $industry})
        '''
        , industry_id = row['id']
        , industry = row['industry']
        )

# Companies
for index, row in company_node.iterrows():
    session.run(
        '''
        CREATE (:Company {company_id: $company_id
                             , company: $company})
        '''
        , company_id = row['id']
        , company = row['company']
        )

# Locations
for index, row in location_node.iterrows():
    session.run(
        '''
        CREATE (:Location {location_id: $location_id
                             , location: $location
                             , code: $code})
        '''
        , location_id = row['id']
        , location = row['location']
        , code = row['code']
        )

# Tools
for index, row in tool_node.iterrows():
    session.run(
        '''
        CREATE (:Tool {tool_id: $tool_id
                             , tool: $tool})
        '''
        , tool_id = row['id']
        , tool = row['tool']
        )

# Disciplines
for index, row in discipline_node.iterrows():
    session.run(
        '''
        CREATE (:Discipline {discipline_id: $discipline_id
                             , discipline: $discipline})
        '''
        , discipline_id = row['id']
        , discipline = row['discipline']
        )

# Responsibilities
for index, row in responsibility_node.iterrows():
    session.run(
        '''
        CREATE (:Responsibility {responsibility_id: $responsibility_id
                             , responsibility: $responsibility})
        '''
        , responsibility_id = row['id']
        , responsibility = row['responsibility']
        )
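
The loops above send one query per row, which is fine for roughly 800 nodes. For larger tables, a common alternative is to pass many rows as a single list parameter and let Cypher's `UNWIND` iterate over them. The following is a minimal sketch of that pattern for the `Tool` table, not the method used in this notebook; `chunked`, `write_in_batches`, `TOOL_QUERY`, and the batch size of 500 are all assumptions made for illustration.

```python
def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

# One query creates a Tool node for every map in the $rows list parameter
TOOL_QUERY = '''
UNWIND $rows AS row
CREATE (:Tool {tool_id: row.id, tool: row.tool})
'''

def write_in_batches(session, query, rows, batch_size=500):
    """Send `rows` (a list of dicts) to the graph, one query per batch."""
    batches = 0
    for batch in chunked(rows, batch_size):
        session.run(query, rows=batch)
        batches += 1
    return batches

# Intended use with the notebook's objects (not executed here):
# write_in_batches(session, TOOL_QUERY, tool_node.to_dict('records'))
```

Column names containing hyphens, such as `respondent-id`, would need backticks inside the query (``row.`respondent-id` ``) or renaming before the rows are sent.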

### Writing relationships to the graph
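I push relationships in much the same way as nodes, with one difference: each loop first `MATCH`es the two endpoint nodes by their properties and then `MERGE`s the relationship between them, so re-running the cell does not create duplicate relationships.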

In [None]:
### Write relationships to GDBMS

# Business
for index, row in business_rel.iterrows():
    session.run(
        '''
        MATCH (a:Respondent {respondent_id: $respondent_id})
        MATCH (b:Business {business: $business})
        MERGE (a)-[:IS_BUSINESS_TYPE]->(b)
        '''
        , respondent_id = row['respondent-id']
        , business = row['business']
        )

# Company
for index, row in company_rel.iterrows():
    session.run(
        '''
        MATCH (a:Respondent {respondent_id: $respondent_id})
        MATCH (c:Company {company: $company})
        MERGE (a)-[:IS_COMPANY_TYPE]->(c)
        '''
        , respondent_id = row['respondent-id']
        , company = row['company']
        )

# Discipline - via respondent node table since it's 1:1
for index, row in respondent_node.iterrows():
    session.run(
        '''
        MATCH (a:Respondent {respondent_id: $respondent_id})
        MATCH (d:Discipline {discipline: $discipline})
        MERGE (a)-[:IS_DISCIPLINE]->(d)
        '''
        , respondent_id = row['respondent-id']
        , discipline = row['discipline']
        )

# Industry
for index, row in industry_rel.iterrows():
    session.run(
        '''
        MATCH (a:Respondent {respondent_id: $respondent_id})
        MATCH (i:Industry {industry: $industry})
        MERGE (a)-[:IN_INDUSTRY]->(i)
        '''
        , respondent_id = row['respondent-id']
        , industry = row['industry']
        )

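# Locations - Participants and Researchers
# Relationship types cannot be passed as Cypher parameters, so the type is
# interpolated with an f-string (hence the doubled braces below). The values
# come only from this fixed dictionary.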
country_rel_dict = {'country-participants': 'HAS_PARTICIPANTS_IN'
                    , 'country-researchers': 'HAS_RESEARCHERS_IN'}

for key, value in country_rel_dict.items():
    for index, row in location_rel[location_rel['party'] == key].iterrows():
        session.run(
            f'''
            MATCH (a:Respondent {{respondent_id: $respondent_id}})
            MATCH (l:Location {{location: $location}})
            MERGE (a)-[:{value} {{party: $party}}]->(l)
            '''
            , respondent_id = row['respondent-id']
            , location = row['location']
            , party = row['party']
            )

# Tools
for index, row in tool_rel.iterrows():
    session.run(
        '''
        MATCH (a:Respondent {respondent_id: $respondent_id})
        MATCH (t:Tool {tool: $tool})
        MERGE (a)-[:USES {use_case: $use_case}]->(t)
        '''
        , respondent_id = row['respondent-id']
        , tool = row['tool']
        , use_case = row['purpose']
        )

# Responsibilities
for index, row in responsibility_rel.iterrows():
    session.run(
        '''
        MATCH (a:Respondent {respondent_id: $respondent_id})
        MATCH (r:Responsibility {responsibility: $responsibility})
        MERGE (a)-[:HAS_RESPONSIBILITY]->(r)
        '''
        , respondent_id = row['respondent-id']
        , responsibility = row['responsibility']
        )
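
The location loop above is the only place where part of the query text is built with an f-string, because Cypher parameters can stand in for values but not for relationship types. If the relationship type ever came from somewhere less controlled than a hard-coded dictionary, it would be worth checking it against a list of allowed types before interpolating. A minimal sketch of such a check follows; `ALLOWED_REL_TYPES` and `location_rel_query` are hypothetical names introduced for this example.

```python
# Relationship types used for the location relationships above
ALLOWED_REL_TYPES = {"HAS_PARTICIPANTS_IN", "HAS_RESEARCHERS_IN"}

def location_rel_query(rel_type):
    """Return the MERGE query for `rel_type`, rejecting unexpected values."""
    if rel_type not in ALLOWED_REL_TYPES:
        raise ValueError(f"Unexpected relationship type: {rel_type!r}")
    return (
        "MATCH (a:Respondent {respondent_id: $respondent_id})\n"
        "MATCH (l:Location {location: $location})\n"
        # Only this line is an f-string, so only here are the braces doubled
        f"MERGE (a)-[:{rel_type} {{party: $party}}]->(l)"
    )

print(location_rel_query("HAS_RESEARCHERS_IN"))
```

The returned string can be passed to `session.run` with the same `respondent_id`, `location`, and `party` keyword arguments as in the loop above.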

### We're done...
...uploading information to our graph, but there's more fun to come. Preliminary descriptive analysis of the graph and some of our data of interest follow in the next part of the demonstration!