## Establish the configuration for the demo

The demo is set up to run inside the directory ${HOME}/100M_accelerator

In [1]:
import json
import os

ROOT = os.path.join(os.path.expanduser('~'), '100M_accelerator')
%env ROOT=$ROOT

NUM_EDGES='100M'
NUM_NODES=10000000

env: ROOT=/home/ec2-user/100M_accelerator


In [2]:
!python3 -m pip install -U neo4j xgt pandas pyarrow

Defaulting to user installation because normal site-packages is not writeable


## Set up the Neo4j file structure

In [3]:
%rm -rf $ROOT/*
%mkdir -p $ROOT/neo4j
%cd -q $ROOT/neo4j
%mkdir -p data logs conf import plugins
%cd $ROOT

/home/ec2-user/100M_accelerator


In [4]:
# Download plugins
%cd -q $ROOT/neo4j/plugins
!wget --quiet https://github.com/neo4j-field/neo4j-arrow/releases/download/v4.1/neo4j-arrow-4.1-all.jar
#  !wget --quiet -O graph-data-science.jar https://s3-eu-west-1.amazonaws.com/com.neo4j.graphalgorithms.dist/graph-data-science/neo4j-graph-data-science-2.0.0.jar
%cd $ROOT

/home/ec2-user/100M_accelerator


## Set up the xgt file structure

In [5]:
%mkdir -p $ROOT/xgt/data $ROOT/xgt/logs

## Create the docker-compose.yml config file

In [6]:
with open(f"{ROOT}/docker-compose.yml", "w") as config:
    config.write(f"""# neo4j accelerator config
version: '3'
services:
  neo4j:
    image: neo4j:4.4.4
    # restart: unless-stopped
    user: "{os.getuid()}:{os.getgid()}"
    ports:
      - 7474:7474
      - 7687:7687
      - 9999:9999
    volumes:
      # - {ROOT}/neo4j/conf:/conf
      - {ROOT}/neo4j/data:/data
      - {ROOT}/neo4j/import:/import
      - {ROOT}/neo4j/logs:/logs
      - {ROOT}/neo4j/plugins:/plugins
    environment:
      - NEO4J_AUTH=neo4j/foo
      - NEO4JLABS_PLUGINS=["graph-data-science"]
      - HOST=0.0.0.0
      - NEO4J_dbms_default__listen__address=0.0.0.0
      - NEO4J_dbms_security_procedures_unrestricted=gds.*
      - NEO4J_dbms_security_procedures_allowlist=gds.*
      # Raise memory limits
      - NEO4J_dbms_memory_pagecache_size=1G
      - NEO4J_dbms_memory_heap_initial__size=8G
      - NEO4J_dbms_memory_heap_max__size=8G
      # Unset for writing to /plugins
      - SECURE_FILE_PERMISSIONS=
    # networks: ["neo4j-accelerator"]
  xgt:
    image: trovares/nonroot-xgt
    user: "{os.getuid()}:{os.getgid()}"
    ports:
      - 4367:4367
    volumes:
      - {ROOT}/xgt/logs:/var/log/xgtd
      - {ROOT}/xgt/data:/data
# networks: {{ neo4j-accelerator {{}} }}
#volumes:
#  neo4j-accelerator:
""")

In [7]:
!docker system prune -f

Total reclaimed space: 0B


In [8]:
# Return to notebook directory
%cd -0

# Download neo4j_arrow Python client
!wget --quiet --no-clobber https://github.com/neo4j-field/neo4j-arrow/releases/download/v4.1/neo4j_arrow.py

/home/ec2-user/accelerate_neo4j/demo


## Create metadata files for neo4j-admin import

In [9]:
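# Header for the relationship file: source ID, target ID, integer timestamp
with open(f"{ROOT}/neo4j/import/edge_header", "w") as hdr:
    hdr.write(':START_ID,:END_ID,timestamp:int\n')
# Header for the node file: a single ID column
with open(f"{ROOT}/neo4j/import/vertex_header", "w") as hdr:
    hdr.write('id:ID\n')
# Node data: one ID per line, 0 through NUM_NODES - 1
with open(f"{ROOT}/neo4j/import/vertex_data", "w") as data:
    for n in range(NUM_NODES):
        data.write(f"{n}\n")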

In [10]:
%%time
%cd -q $ROOT/neo4j/import
os.system(f"wget --quiet --no-clobber -O tt.csv http://datasets.trovares.com/TT/tt.{NUM_EDGES}")
%cd -q $ROOT/

CPU times: user 7.73 ms, sys: 220 µs, total: 7.95 ms
Wall time: 1min 8s


## Ingest the CSV file into Neo4j using the fastest ingest method

In [11]:
%%time
%cd -q $ROOT/
!docker-compose run neo4j neo4j-admin import --force --nodes=vertex=/import/vertex_header,/import/vertex_data --trim-strings=true --relationships=edge=/import/edge_header,/import/tt.csv --database=neo4j --id-type=INTEGER

Creating network "100m_accelerator_default" with the default driver
Creating 100m_accelerator_neo4j_run ... done
Fetching versions.json for Plugin 'graph-data-science' from https://s3-eu-west-1.amazonaws.com/com.neo4j.graphalgorithms.dist/graph-data-science/versions.json
Installing Plugin 'graph-data-science' from https://s3-eu-west-1.amazonaws.com/com.neo4j.graphalgorithms.dist/graph-data-science/neo4j-graph-data-science-2.0.0.jar to /plugins/graph-data-science.jar 
Applying default values for plugin graph-data-science to neo4j.conf
Selecting JVM - Version:11.0.14.1+1, Name:OpenJDK 64-Bit Server VM, Vendor:Oracle Corporation
Neo4j version: 4.4.4
Importing the contents of these files into /var/lib/neo4j/data/databases/neo4j:
Nodes:
  [vertex]:
  /import/vertex_header
  /import/vertex_data

Relationships:
  edge:
  /import/edge_header
  /import/tt.csv


Available resources:
  Total machine memory: 30.68GiB
  Free machine memory: 26.64GiB
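
The import command in the cell above is long enough that it can be easier to read and modify as an argument list. The sketch below assembles the same flags; the function name `build_import_command` is hypothetical, and the block only prints the command rather than running it.

```python
import shlex


def build_import_command(database="neo4j"):
    """Assemble the docker-compose / neo4j-admin import invocation as a list."""
    return [
        "docker-compose", "run", "neo4j",
        "neo4j-admin", "import", "--force",
        # Node files: header first, then data
        "--nodes=vertex=/import/vertex_header,/import/vertex_data",
        "--trim-strings=true",
        # Relationship files: header first, then the downloaded CSV
        "--relationships=edge=/import/edge_header,/import/tt.csv",
        f"--database={database}",
        "--id-type=INTEGER",
    ]


command = build_import_command()
print(shlex.join(command))
```

To execute it from Python instead of a `!` shell escape, the list could be passed to `subprocess.run(command, cwd=ROOT, check=True)`, assuming `ROOT` is defined as in the first cell.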


## Restart the Neo4j server

In [12]:
%cd -q $ROOT/
!docker-compose up -d --remove-orphans

Creating 100m_accelerator_xgt_1 ... 
Creating 100m_accelerator_neo4j_1 ... done
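
`docker-compose up -d` returns as soon as the containers are created, which can be before the servers accept connections. A minimal sketch of a readiness check, using only the standard library, is shown below. The helper name `wait_for_port` and its timeout defaults are assumptions; an open TCP port is only a rough signal and does not guarantee that the database has finished starting.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait and retry.
            time.sleep(interval)
    return False
```

For this demo, one might call `wait_for_port("localhost", 7687)` for the Neo4j Bolt port and `wait_for_port("localhost", 4367)` for xgt, matching the ports published in the compose file.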