# TigerGraph Graph Data Science Library 101 - Path Finding Algorithm

This notebook shows examples of the most common path finding algorithms in the TigerGraph Graph Data Science Library. More detailed explanations of these algorithms can be found in the official documentation (https://docs.tigergraph.com/graph-ml/current/pathfinding-algorithms/).

## Step 1: Setting things up
- Connect and Load data

In [1]:
from pyTigerGraph.datasets import Datasets

dataset = Datasets("ldbc_snb")

A folder with name ldbc_snb already exists in ./tmp. Skip downloading.


In [2]:
from pyTigerGraph import TigerGraphConnection
import json

# Read in DB configs
with open('../config.json', "r") as config_file:
    config = json.load(config_file)

conn = TigerGraphConnection(
    host=config["host"],
    username=config["username"],
    password=config["password"],
)

In [3]:
conn.ingestDataset(dataset, getToken=config["getToken"])

---- Checking database ----
A graph with name ldbc_snb already exists in the database. Skip ingestion.


- Visualize the graph schema 

In [4]:
from pyTigerGraph.visualization import drawSchema

drawSchema(conn.getSchema(force=True))

CytoscapeWidget(cytoscape_layout={'name': 'circle', 'animate': True, 'padding': 1}, cytoscape_style=[{'selecto…

- Get basic stats, e.g., counts of nodes & edges

In [5]:
vertices = conn.getVertexTypes()
for vertex in vertices:
    print("Node count: ({} : {}) ".format(vertex, conn.getVertexCount(vertex)))

Node count: (Comment : 2052169) 
Node count: (Post : 1003605) 
Node count: (Company : 1575) 
Node count: (University : 6380) 
Node count: (City : 1343) 
Node count: (Country : 111) 
Node count: (Continent : 6) 
Node count: (Forum : 90492) 
Node count: (Person : 9892) 
Node count: (Tag : 16080) 
Node count: (Tag_Class : 71) 


In [6]:
import pprint
print("Edges counts: ")
pprint.pprint(conn.getEdgeCount())

Edges counts: 
{'Container_Of': 1003605,
 'Container_Of_Reverse': 1003605,
 'Has_Creator': 3055774,
 'Has_Creator_Reverse': 3055774,
 'Has_Interest': 229166,
 'Has_Interest_Reverse': 229166,
 'Has_Member': 1611869,
 'Has_Member_Reverse': 1611869,
 'Has_Moderator': 90492,
 'Has_Moderator_Reverse': 90492,
 'Has_Tag': 3721417,
 'Has_Tag_Reverse': 3721417,
 'Has_Type': 16080,
 'Has_Type_Reverse': 16080,
 'Is_Located_In': 3073621,
 'Is_Located_In_Reverse': 3073621,
 'Is_Part_Of': 1454,
 'Is_Part_Of_Reverse': 1454,
 'Is_Subclass_Of': 70,
 'Is_Subclass_Of_Reverse': 70,
 'Knows': 180623,
 'Likes': 2190095,
 'Likes_Reverse': 2190095,
 'Reply_Of': 2052169,
 'Reply_Of_Reverse': 2052169,
 'Study_At': 7949,
 'Study_At_Reverse': 7949,
 'Work_At': 21654,
 'Work_At_Reverse': 21654}


## Step 2: Leveraging pyTigerGraph’s featurizer to run Path Finding algorithms


In [7]:
feat = conn.gds.featurizer()

In [8]:
feat.listAlgorithms("Path")

Available algorithms for Path:
  bfs:
    01. name: tg_bfs
  cycle_detection:
    02. name: tg_cycle_detection_count
  shortest_path:
    03. name: tg_shortest_ss_no_wt
Call runAlgorithm() with the algorithm name to execute it


## tg_bfs
Breadth-First Search Algorithm from a single source node
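
To make the role of `max_hops` concrete, here is a minimal pure-Python sketch of a hop-limited breadth-first search. It is an illustration only and not TigerGraph's implementation; the function name `bfs_hops` and the toy graph are hypothetical.

```python
from collections import deque

def bfs_hops(adjacency, start, max_hops):
    """Return {vertex: hop count} for every vertex within max_hops of start."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if hops[vertex] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(vertex, ()):
            if neighbor not in hops:  # first visit gives the smallest hop count
                hops[neighbor] = hops[vertex] + 1
                queue.append(neighbor)
    return hops

# Toy undirected graph (hypothetical data, not taken from ldbc_snb)
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a"],
    "d": ["b", "e"],
    "e": ["d"],
}
print(bfs_hops(toy_graph, "a", max_hops=2))
```

In this toy graph, vertex `e` is three hops from `a`, so it is left out when `max_hops` is 2.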

In [9]:
params = {
    "v_type_set": ["Person"],
    "e_type_set": ["Knows"],
    "max_hops": 2,
    "v_start": {"id": "21990232556463", "type": "Person"}, ##{"id": "vertex_id", "type": "vertex_type"}
    "print_results": True,
    "result_attribute": "",
    "file_path": "",
    "display_edges": False
}

In [10]:
import csv
import os
import time
import psutil
!pip install memory_profiler
%load_ext memory_profiler

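# CSV file that collects per-algorithm performance metrics for this job
algo_performance_out = '/home/tigergraph/GraphML/output/algorithm_' + config["job_id"] + '.csv'

start_time = time.time()

# Run the algorithm once under %memit; -o returns the measurement as an object
algo_memory = %memit -r 1 -o feat.runAlgorithm("tg_bfs", params=params)

# The string form looks like "peak memory: 124.59 MiB, increment: 2.52 MiB";
# keep only the peak value, which sits between the first ": " and the first "M"
algo_memory = str(algo_memory)
start = algo_memory.find(": ") + 1
end = algo_memory.find("M")
algo_memory = algo_memory[start:end].strip()

# Wall-clock time of the run, including query installation and %memit overhead
execution_time = time.time() - start_time

# System-wide CPU utilization (%) sampled over a 4-second window
cpu_usage = psutil.cpu_percent(4)
print('The CPU usage is: ', cpu_usage)

# print('RAM memory % used:', psutil.virtual_memory()[2])

# Host memory in use, in GB (index 3 of virtual_memory() is the `used` field)
host_memory = psutil.virtual_memory()[3]/1000000000
print('RAM Used (GB):', host_memory)

print('tg_bfs executed successfully')
print('execution time: ' + str(execution_time) + ' seconds\n')

# Identifiers that tie this row to the job, the notebook, and the algorithm
algo_id = "tg_bfs_" + config["job_id"]
nb_id = "pathfinding.ipynb_" + config["job_id"]
keyword = "tg_bfs"

data = [algo_id, "false", cpu_usage, algo_memory, execution_time, host_memory, "3.8", "no error", nb_id, keyword]

# Append one row of metrics to the performance CSV
with open(algo_performance_out, mode='a+', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(data)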


Installing and optimizing the queries, it might take a minute...
Queries installed successfully
peak memory: 124.59 MiB, increment: 2.52 MiB
The CPU usage is:  24.6
RAM Used (GB): 11.079008256
tg_bfs executed successfully
execution time: 37.20045328140259 seconds
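
The string slicing used above to pull the peak memory out of the `%memit` summary depends on the exact position of the first `": "` and the first `"M"`. A regular expression is one possible alternative. The sketch below assumes the summary line has the same shape as the `peak memory: ... MiB, increment: ... MiB` output shown above; `parse_memit` is a hypothetical helper, not part of memory_profiler.

```python
import re

def parse_memit(text):
    """Extract (peak, increment) in MiB from a %memit summary line."""
    match = re.search(
        r"peak memory:\s*([\d.]+)\s*MiB,\s*increment:\s*(-?[\d.]+)\s*MiB", text
    )
    if match is None:
        # Fail loudly instead of silently writing a bad value to the CSV
        raise ValueError(f"unrecognized %memit output: {text!r}")
    return float(match.group(1)), float(match.group(2))

peak, increment = parse_memit("peak memory: 124.59 MiB, increment: 2.52 MiB")
print(peak, increment)
```

Returning floats rather than a string slice also makes it easier to aggregate the values later.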



In [12]:
res = feat.runAlgorithm("tg_bfs", params=params)
len(res[0]['Start']), res[0]['Start'][:10]

(4069,
 [{'v_id': '30786325580605',
   'v_type': 'Person',
   'attributes': {'Start.@sum_step': 2}},
  {'v_id': '13194139540951',
   'v_type': 'Person',
   'attributes': {'Start.@sum_step': 2}},
  {'v_id': '6597069769055',
   'v_type': 'Person',
   'attributes': {'Start.@sum_step': 2}},
  {'v_id': '15393162796423',
   'v_type': 'Person',
   'attributes': {'Start.@sum_step': 2}},
  {'v_id': '15393162792715',
   'v_type': 'Person',
   'attributes': {'Start.@sum_step': 2}},
  {'v_id': '28587302332123',
   'v_type': 'Person',
   'attributes': {'Start.@sum_step': 2}},
  {'v_id': '6597069774914',
   'v_type': 'Person',
   'attributes': {'Start.@sum_step': 2}},
  {'v_id': '9079', 'v_type': 'Person', 'attributes': {'Start.@sum_step': 2}},
  {'v_id': '21990232561273',
   'v_type': 'Person',
   'attributes': {'Start.@sum_step': 2}},
  {'v_id': '15393162792433',
   'v_type': 'Person',
   'attributes': {'Start.@sum_step': 2}}])
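
Since each returned vertex carries its hop distance in `Start.@sum_step`, a quick way to summarize a result is to count vertices per hop level. The following is a small sketch that assumes the result layout shown above; `count_by_hops` is a hypothetical helper, and the last sample entry is made up for illustration.

```python
from collections import Counter

def count_by_hops(vertices, key="Start.@sum_step"):
    """Count how many vertices were reached at each hop distance."""
    return Counter(v["attributes"][key] for v in vertices)

# The first two entries are copied from the output above;
# the third is a hypothetical 1-hop neighbor added for illustration.
sample = [
    {"v_id": "30786325580605", "v_type": "Person",
     "attributes": {"Start.@sum_step": 2}},
    {"v_id": "13194139540951", "v_type": "Person",
     "attributes": {"Start.@sum_step": 2}},
    {"v_id": "hypothetical_id", "v_type": "Person",
     "attributes": {"Start.@sum_step": 1}},
]
print(count_by_hops(sample))
```

With the real result, the same call would be `count_by_hops(res[0]['Start'])`.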

## tg_shortest_ss_no_wt
Single-source shortest path algorithm, with unweighted edges.
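
With unweighted edges, single-source shortest paths can be found with a breadth-first search that records each vertex's predecessor. The sketch below illustrates that idea on a toy graph. It is not TigerGraph's implementation, and the name `shortest_paths` and the graph data are hypothetical.

```python
from collections import deque

def shortest_paths(adjacency, source):
    """Return {vertex: path from source} for all reachable vertices."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        for neighbor in adjacency.get(vertex, ()):
            if neighbor not in parent:  # first discovery is along a shortest path
                parent[neighbor] = vertex
                queue.append(neighbor)
    paths = {}
    for target in parent:
        # Walk predecessor links back to the source, then reverse
        path, node = [], target
        while node is not None:
            path.append(node)
            node = parent[node]
        paths[target] = path[::-1]
    return paths

# Toy undirected graph (hypothetical data, not taken from ldbc_snb)
toy_edges = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
toy_paths = shortest_paths(toy_edges, "a")
print(toy_paths["e"])
```

The distance to a vertex is the number of edges on its path, that is, `len(path) - 1`. When several shortest paths exist, which one is returned depends on the order of the neighbor lists.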

In [13]:
params = {
    "source": {"id": "21990232556463", "type": "Person"}, ##{"id": "vertex_id", "type": "vertex_type"}
    "v_type_set": ["Person"],
    "e_type_set": ["Knows"],
    "print_limit": 20,
    "print_results": True,
    "result_attribute": "",
    "file_path": "",
    "display_edges": False
}

In [14]:
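start_time = time.time()

# Run the algorithm once under %memit; -o returns the measurement as an object
algo_memory = %memit -r 1 -o feat.runAlgorithm("tg_shortest_ss_no_wt", params=params)

# Keep only the peak value (MiB) from "peak memory: ... MiB, increment: ... MiB"
algo_memory = str(algo_memory)
start = algo_memory.find(": ") + 1
end = algo_memory.find("M")
algo_memory = algo_memory[start:end].strip()

# Wall-clock time of the run, including query installation and %memit overhead
execution_time = time.time() - start_time

# System-wide CPU utilization (%) sampled over a 4-second window
cpu_usage = psutil.cpu_percent(4)
print('The CPU usage is: ', cpu_usage)

# print('RAM memory % used:', psutil.virtual_memory()[2])

# Host memory in use, in GB (index 3 of virtual_memory() is the `used` field)
host_memory = psutil.virtual_memory()[3]/1000000000
print('RAM Used (GB):', host_memory)

print('tg_shortest_ss_no_wt executed successfully')
print('execution time: ' + str(execution_time) + ' seconds\n')

# Identifiers that tie this row to the job, the notebook, and the algorithm
algo_id = "tg_shortest_ss_no_wt_" + config["job_id"]
nb_id = "pathfinding.ipynb_" + config["job_id"]
keyword = "tg_shortest_ss_no_wt"

data = [algo_id, "false", cpu_usage, algo_memory, execution_time, host_memory, "3.8", "no error", nb_id, keyword]

# Append one row of metrics to the performance CSV
with open(algo_performance_out, mode='a+', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(data)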

Installing and optimizing the queries, it might take a minute...
Queries installed successfully
peak memory: 131.82 MiB, increment: 0.40 MiB
The CPU usage is:  24.1
RAM Used (GB): 11.09198848
tg_shortest_ss_no_wt executed successfully
execution time: 40.10139513015747 seconds



In [15]:
res = feat.runAlgorithm("tg_shortest_ss_no_wt", params=params)
print(len(res[0]['ResultSet']))
res[0]['ResultSet'][:5]

20


[{'v_id': '15393162794623',
  'v_type': 'Person',
  'attributes': {'ResultSet.@min_dis': 3,
   'ResultSet.@path_list': ['21990232556463',
    '10995116278291',
    '19791209304170',
    '15393162794623']}},
 {'v_id': '13194139540583',
  'v_type': 'Person',
  'attributes': {'ResultSet.@min_dis': 3,
   'ResultSet.@path_list': ['21990232556463',
    '15393162795439',
    '24189255821300',
    '13194139540583']}},
 {'v_id': '30786325580605',
  'v_type': 'Person',
  'attributes': {'ResultSet.@min_dis': 2,
   'ResultSet.@path_list': ['21990232556463',
    '17592186045238',
    '30786325580605']}},
 {'v_id': '13194139540951',
  'v_type': 'Person',
  'attributes': {'ResultSet.@min_dis': 2,
   'ResultSet.@path_list': ['21990232556463',
    '26388279075075',
    '13194139540951']}},
 {'v_id': '6597069769055',
  'v_type': 'Person',
  'attributes': {'ResultSet.@min_dis': 2,
   'ResultSet.@path_list': ['21990232556463',
    '10995116286685',
    '6597069769055']}}]
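
Each entry in `ResultSet` pairs a target vertex with its distance (`@min_dis`) and the path from the source (`@path_list`). The sketch below flattens that structure into a dictionary and checks that each distance equals the number of edges on its path. It assumes the result layout shown above; `summarize_paths` is a hypothetical helper.

```python
def summarize_paths(result_set):
    """Map each target vertex id to its (distance, path) pair."""
    summary = {}
    for item in result_set:
        attrs = item["attributes"]
        summary[item["v_id"]] = (
            attrs["ResultSet.@min_dis"],
            attrs["ResultSet.@path_list"],
        )
    return summary

# Two entries copied from the output above
sample_result = [
    {"v_id": "15393162794623", "v_type": "Person",
     "attributes": {"ResultSet.@min_dis": 3,
                    "ResultSet.@path_list": ["21990232556463", "10995116278291",
                                             "19791209304170", "15393162794623"]}},
    {"v_id": "30786325580605", "v_type": "Person",
     "attributes": {"ResultSet.@min_dis": 2,
                    "ResultSet.@path_list": ["21990232556463", "17592186045238",
                                             "30786325580605"]}},
]
summary = summarize_paths(sample_result)

# A path with k edges lists k + 1 vertices
consistent = all(len(path) - 1 == dist for dist, path in summary.values())
print(consistent)
```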

## tg_cycle_detection_count
This is a distributed algorithm for counting the cycles in large-scale directed graphs. The `depth` parameter bounds the length of the cycles that are searched for.
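
To illustrate what counting cycles up to a given length means, here is a small single-machine sketch that counts simple directed cycles with at most `depth` edges using a bounded depth-first search. It is not the distributed algorithm that TigerGraph uses, and its counting convention (each cycle counted once, anchored at its smallest vertex) is an assumption made for this example. The name `count_cycles` and the toy graph are hypothetical.

```python
def count_cycles(adjacency, depth):
    """Count simple directed cycles that have at most `depth` edges.

    Vertex ids must be mutually comparable: each cycle is counted once,
    starting from its smallest vertex.
    """
    if depth < 1:
        return 0
    count = 0

    def extend(start, vertex, visited, length):
        # `length` is the number of edges walked from `start` to `vertex`
        nonlocal count
        for neighbor in adjacency.get(vertex, ()):
            if neighbor == start:
                count += 1  # closed a cycle of length + 1 edges
            elif (neighbor > start and neighbor not in visited
                  and length + 1 < depth):
                extend(start, neighbor, visited | {neighbor}, length + 1)

    for start in adjacency:
        extend(start, start, {start}, 0)
    return count

# Toy directed graph (hypothetical): 1 -> 2, 2 -> 1, 2 -> 3, 3 -> 1
toy_digraph = {1: [2], 2: [1, 3], 3: [1]}
print(count_cycles(toy_digraph, depth=2))
print(count_cycles(toy_digraph, depth=3))
```

Under this convention, a pair of opposite directed edges between two vertices counts as one cycle of length 2. This would be consistent with a depth-2 count on `Knows` matching the number of `Knows` edges, although it is an observation rather than a confirmed explanation of the library's behavior.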

In [16]:
params =  {
    "v_type_set": ["Person"],
    "e_type_set": ["Knows"],
    "depth": 2,
    "batches": 2,
    "print_results": True
}

In [17]:
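start_time = time.time()

# Run the algorithm once under %memit; -o returns the measurement as an object
algo_memory = %memit -r 1 -o feat.runAlgorithm("tg_cycle_detection_count", params=params)

# Keep only the peak value (MiB) from "peak memory: ... MiB, increment: ... MiB"
algo_memory = str(algo_memory)
start = algo_memory.find(": ") + 1
end = algo_memory.find("M")
algo_memory = algo_memory[start:end].strip()

# Wall-clock time of the run, including query installation and %memit overhead
execution_time = time.time() - start_time

# System-wide CPU utilization (%) sampled over a 4-second window
cpu_usage = psutil.cpu_percent(4)
print('The CPU usage is: ', cpu_usage)

# print('RAM memory % used:', psutil.virtual_memory()[2])

# Host memory in use, in GB (index 3 of virtual_memory() is the `used` field)
host_memory = psutil.virtual_memory()[3]/1000000000
print('RAM Used (GB):', host_memory)

print('tg_cycle_detection_count executed successfully')
print('execution time: ' + str(execution_time) + ' seconds\n')

# Identifiers that tie this row to the job, the notebook, and the algorithm
algo_id = "tg_cycle_detection_count_" + config["job_id"]
nb_id = "pathfinding.ipynb_" + config["job_id"]
keyword = "tg_cycle_detection_count"

data = [algo_id, "false", cpu_usage, algo_memory, execution_time, host_memory, "3.8", "no error", nb_id, keyword]

# Append one row of metrics to the performance CSV
with open(algo_performance_out, mode='a+', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(data)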

Installing and optimizing the queries, it might take a minute...
Queries installed successfully
peak memory: 130.04 MiB, increment: 0.08 MiB
The CPU usage is:  26.7
RAM Used (GB): 11.811397632
tg_cycle_detection_count executed successfully
execution time: 46.2645218372345 seconds



In [18]:
# Display Results
res = feat.runAlgorithm("tg_cycle_detection_count", params=params)
res

[{'cycles': 180623}]