<div class="alert alert-block alert-success">
    <h1>
        Example notebook - Reactome subgraph
    </h1>
</div>

# Import modules and functions

In [1]:
import os
import glob
import re
import networkx as nx
import pandas as pd
import time
from tqdm.auto import tqdm

from IPython.display import display, Markdown

from turingdb_examples.graph import (
    build_create_command_from_networkx,
)
from turingdb_examples.graph import split_cypher_commands
from turingdb_examples.llm import natural_language_to_cypher, query_llm

In [2]:
%load_ext autoreload
%autoreload 2

# Check data files are available

In [3]:
example_name = "reactome"
path_data = f"{os.getcwd()}/data/{example_name}"
if not os.path.exists(path_data):
    raise ValueError(f"{path_data} does not exist")

list_data_files = sorted(
    [os.path.basename(file) for file in glob.glob(os.path.join(path_data, "*"))]
)
if "entities_pairwise.gml" not in list_data_files:
    raise ValueError(
        f"entities_pairwise.gml is not available in {path_data}"
    )

# Import `gml` file

In [4]:
G = nx.read_gml(f"{path_data}/entities_pairwise.gml")
print(G)

MultiGraph with 40 nodes and 57 edges


# Graph Creation in TuringDB

## Build Cypher CREATE Commands

In [5]:
# Build CREATE command from networkx object
graph_CREATE_command = build_create_command_from_networkx(G, node_type_key="schemaClass")

print(f"""
Cypher CREATE command :
* size: {len(graph_CREATE_command.encode('utf-8'))/1024/1000:.4f} MB\n
{100 * '*'}
{graph_CREATE_command if len(graph_CREATE_command.split("\n")) < 10000 else "\n".join(graph_CREATE_command.split('\n')[:5]) + "\n...\n" + "\n".join(graph_CREATE_command.split('\n')[-5:])}
{100 * '*'}
""")

Cypher query will create graph with 40 nodes and 57 edges

Cypher CREATE command :
* size: 0.0215 MB

****************************************************************************************************
CREATE (:Reaction {id: "TP53 binds the PMAIP1 (NOXA) promoter", schemaClass: "Reaction", stId: "R-HSA-4331331", oldStId: "REACT_169265", releaseDate: "2016-03-23", name: "[ TP53 binds the PMAIP1 (NOXA) promoter ]", stIdVersion: "R-HSA-4331331.6", speciesName: "Homo sapiens", category: "binding", displayName: "TP53 binds the PMAIP1 (NOXA) promoter"}),
(:Complex {id: "p-S15,S20-TP53 Tetramer [nucleoplasm]", schemaClass: "Complex", stId: "R-HSA-3222171", name: "[ p-S15,S20-TP53 Tetramer ]", stIdVersion: "R-HSA-3222171.1", speciesName: "Homo sapiens", displayName: "p-S15,S20-TP53 Tetramer [nucleoplasm]"}),
(:Reaction {id: "TP53 binds the APAF1 gene promoter", schemaClass: "Reaction", stId: "R-HSA-6791349", releaseDate: "2016-03-23", name: "[ TP53 binds the APAF1 gene promoter ]", stIdVersio
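
The command above follows a simple shape: one `(:Label {key: "value", ...})` pattern per node, joined by commas after a single `CREATE`. The block below is a minimal sketch of how such a string could be assembled from plain dictionaries. It is not the implementation of `build_create_command_from_networkx`; the helper names `cypher_literal` and `node_to_create_pattern` are hypothetical, and the quote-escaping rule is an assumption about what the database accepts.

```python
def cypher_literal(value):
    """Render a Python value as a double-quoted string literal (assumed escaping)."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def node_to_create_pattern(properties, label_key="schemaClass", default_label="Node"):
    """Build one `(:Label {key: "value", ...})` pattern from a property dict."""
    # The label is read from one of the properties, falling back to a generic label.
    label = properties.get(label_key, default_label)
    body = ", ".join(f"{key}: {cypher_literal(val)}" for key, val in properties.items())
    return f"(:{label} {{{body}}})"


# Toy nodes using property names that appear in the Reactome data.
nodes = [
    {"id": "CYCS binds to APAF1", "schemaClass": "Reaction", "category": "binding"},
    {"id": "APAF1:CYCS [cytosol]", "schemaClass": "Complex"},
]
create_command = "CREATE " + ",\n".join(node_to_create_pattern(n) for n in nodes)
print(create_command)
```

Taking the label from a node property mirrors the `node_type_key="schemaClass"` argument used in the cell above.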

## Split command into chunks

In [6]:
%%time

chunks = split_cypher_commands(graph_CREATE_command, max_size_mb=1)

print(f"✓ Split into {len(chunks['node_chunks'])} node chunk(s) and {len(chunks['edge_chunks'])} edge chunk(s)")

print("\nNode chunks:")
for i, chunk in enumerate(chunks['node_chunks']):
    print(f"  Node chunk {i+1}: {len(chunk.encode('utf-8'))/1024:.1f} KB")
    if i == 10:
        print("  ...")
        break

print("\nEdge chunks:")
for i, chunk in enumerate(chunks['edge_chunks']):
    print(f"  Edge chunk {i+1}: {len(chunk.encode('utf-8'))/1024:.1f} KB")
    if i == 10:
        print("  ...")
        break

✓ Split into 1 node chunk(s) and 57 edge chunk(s)

Node chunks:
  Node chunk 1: 13.0 KB

Edge chunks:
  Edge chunk 1: 0.2 KB
  Edge chunk 2: 0.1 KB
  Edge chunk 3: 0.2 KB
  Edge chunk 4: 0.2 KB
  Edge chunk 5: 0.1 KB
  Edge chunk 6: 0.2 KB
  Edge chunk 7: 0.2 KB
  Edge chunk 8: 0.2 KB
  Edge chunk 9: 0.1 KB
  Edge chunk 10: 0.1 KB
  Edge chunk 11: 0.1 KB
  ...
CPU times: user 1.15 ms, sys: 0 ns, total: 1.15 ms
Wall time: 1.03 ms
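
Chunking keeps each request below a size limit, measured in UTF-8 bytes rather than characters. The sketch below shows one generic way to batch statements by byte size. It is an illustration only: `batch_by_size` is a hypothetical helper, not the logic of `split_cypher_commands` (which, judging by the output above, emits one chunk per edge), and it ignores the bytes taken by any separators between statements.

```python
def batch_by_size(statements, max_bytes):
    """Group statements into batches whose UTF-8 size stays within max_bytes.

    A single statement larger than max_bytes is placed alone in its own batch.
    """
    batches, current, current_size = [], [], 0
    for statement in statements:
        size = len(statement.encode("utf-8"))
        # Close the current batch if adding this statement would exceed the limit.
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(statement)
        current_size += size
    if current:
        batches.append(current)
    return batches


statements = ["a" * 400, "b" * 400, "c" * 400, "d" * 1500]
batches = batch_by_size(statements, max_bytes=1000)
print([sum(len(s.encode("utf-8")) for s in batch) for batch in batches])
# prints [800, 400, 1500]
```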


# Create graph using `turingdb` python package

<div class="alert alert-block alert-info">
    <h2>
        See the <a href="https://docs.turingdb.ai/quickstart">TuringDB Get Started documentation</a> for the important steps to follow:
    </h2>
    <h4>
        <ul>
            <li>Create your TuringDB account</li>
            <li>Create your instance in the <a href="https://console.turingdb.ai/auth">TuringDB Cloud UI</a></li>
            <li>Copy your Instance ID from the Database Instances management page</li>
            <li>Get your API key from the Settings page in the UI</li>
        </ul>
        Remember to keep your instance active while working in this notebook!
    </h4>
</div>

In [7]:
from turingdb import TuringDB

# Create TuringDB client
# set host parameter to the URL (as string) on which TuringDB is running,
# default "http://localhost:6666"
client = TuringDB(host="http://localhost:6666")
try:
    client.warmup()
except Exception as e:
    print(f"TuringDB not started ({e}); please run `uv run turingdb` in your terminal")

In [8]:
# Get list of available graphs
list_graphs = client.list_available_graphs()

In [9]:
client.list_loaded_graphs()

['wine_ontology1',
 'healthcare_dataset1',
 'crypto_orbitaal_fraud_detection1',
 'citeab_antibody1',
 'default']

In [10]:
# Set graph name
graph_name_prefix = example_name
graph_name_nb_suffix = str(
    max(
        [
            int(re.sub(graph_name_prefix, "", g))
            for g in list_graphs
            if g.startswith(graph_name_prefix)
            and re.sub(graph_name_prefix, "", g).isdigit()
        ]
        + [0]
    )
    + 1
)
graph_name = graph_name_prefix + graph_name_nb_suffix
graph_name = re.sub("-", "_", graph_name)
graph_name

'reactome1'
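
The cell above picks the next free name by taking the highest numeric suffix among existing graphs with the same prefix and adding one. The same rule can be sketched as a small function. `next_graph_name` is a hypothetical helper for illustration; unlike the cell above, which passes the prefix to `re.sub` as a regular expression, this sketch escapes the prefix and matches the whole name.

```python
import re


def next_graph_name(prefix, existing_names):
    """Return prefix + (highest numeric suffix already in use + 1)."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)")
    # Keep only names that are exactly the prefix followed by digits.
    used = [int(m.group(1)) for name in existing_names if (m := pattern.fullmatch(name))]
    # Hyphens are replaced by underscores, as in the cell above.
    return f"{prefix}{max(used, default=0) + 1}".replace("-", "_")


existing = ["wine_ontology1", "reactome1", "reactome7", "reactome_backup", "default"]
print(next_graph_name("reactome", existing))     # reactome8
print(next_graph_name("reactome", ["default"]))  # reactome1
```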

In [11]:
from turingdb.exceptions import TuringDBException

In [12]:
%%time

# Create the graph
try:
    client.create_graph(graph_name)
except TuringDBException as e:
    print(e)

# Set working graph
client.set_graph(graph_name)

CPU times: user 1.82 ms, sys: 60 μs, total: 1.88 ms
Wall time: 8.12 ms


In [13]:
%%time

# Create a new change on the graph
client.checkout()
change = client.new_change()
print(f"Current change {change}")

# Checkout into the change
client.checkout(change=change)

Current change 0
CPU times: user 1.69 ms, sys: 0 ns, total: 1.69 ms
Wall time: 1.4 ms


In [14]:
%%time

# Run CREATE command
print("\nExecuting query on TuringDB...")
start_time = time.time()

print(f"✓ Split into {len(chunks['node_chunks'])} node chunk(s) and {len(chunks['edge_chunks'])} edge chunk(s)")

# CREATE nodes
print("\nNode chunks:")
for i, chunk in enumerate(tqdm(chunks['node_chunks'])):
    result = client.query(chunk)
# Commit the change
client.query("COMMIT")
print(f"✓ {len(chunks['node_chunks'])} node chunks done")

# CREATE edges
print("\nEdge chunks:")
for i, chunk in enumerate(tqdm(chunks['edge_chunks'])):
    result = client.query(chunk)
# Commit the change
client.query("COMMIT")
print(f"✓ {len(chunks['edge_chunks'])} edge chunks done")

execution_time = time.time() - start_time
print(f"\n✓ Graph created successfully in {execution_time:.2f} seconds")

# Submit changes
start_time = time.time()
client.query("CHANGE SUBMIT")
execution_time = time.time() - start_time
print(f"\n✓ Changes successfully submitted in {execution_time:.2f} seconds")

# Checkout into main
client.checkout()


Executing query on TuringDB...
✓ Split into 1 node chunk(s) and 57 edge chunk(s)

Node chunks:


  0%|          | 0/1 [00:00<?, ?it/s]

✓ 1 node chunks done

Edge chunks:


  0%|          | 0/57 [00:00<?, ?it/s]

✓ 57 edge chunks done

✓ Graph created successfully in 0.06 seconds

✓ Changes successfully submitted in 0.06 seconds
CPU times: user 54.7 ms, sys: 7.04 ms, total: 61.7 ms
Wall time: 122 ms


In [15]:
# Returns the commit history
client.query("CALL db.history()")

Unnamed: 0,commit,nodeCount,edgeCount,partCount
0,c8d5bb6bf5109a99,0,0,0
1,d3f5359c8c55ea40,40,0,1
2,374526511fd88385,0,57,1
3,557b2cda9fc2e87,0,0,0


<div class="alert alert-block alert-info">
    <h2>
        Visualize your graph in the TuringDB Graph Visualizer! Now that your instance is running:
    </h2>
    <h3>
        <ul>
            <li>Go to <a href="https://console.turingdb.ai/databases">TuringDB Console - Database Instances</a></li>
            <li>In your current instance panel, click the "Open Visualizer" button</li>
            <li>The Visualizer opens; you can now choose your graph from the dropdown menu in the top-right corner</li>
        </ul>
        You can then explore your graph and visualize the nodes you want!
    </h3>
</div>

# Query TuringDB

## Use metaqueries to gain insight into the graph's overall structure

<h3>
    To learn more about 📮 Metaqueries, see the <a href="https://turingdb.mintlify.app/query/cypher_subset#%F0%9F%93%AE-metaqueries">TuringDB documentation</a>.
</h3>

In [16]:
%%time

# CALL db.propertyTypes() - returns all the different node and edge properties in the database, with their value types
command = """
CALL db.propertyTypes()
"""
df_propertyTypes = client.query(command)
if df_propertyTypes.empty:
    print("No result found")
else:
    display(df_propertyTypes)

Unnamed: 0,id,propertyType,valueType
0,0,displayName,String
1,1,stIdVersion,String
2,2,name,String
3,3,releaseDate,String
4,4,oldStId,String
5,5,category,String
6,6,stId,String
7,7,schemaClass,String
8,8,speciesName,String
9,9,id,String


CPU times: user 4.25 ms, sys: 22 μs, total: 4.27 ms
Wall time: 3.93 ms


In [17]:
# Get node properties
nodes_properties = df_propertyTypes["propertyType"].values.tolist()
print(f"Node properties: {nodes_properties}")

Node properties: ['displayName', 'stIdVersion', 'name', 'releaseDate', 'oldStId', 'category', 'stId', 'schemaClass', 'speciesName', 'id', 'searchSeed', 'referenceType']


In [18]:
%%time

# CALL db.labels() - returns a column of all the different node labels
command = """
CALL db.labels()
"""
df_labels = client.query(command)
if df_labels.empty:
    print("No result found")
else:
    display(df_labels)

Unnamed: 0,id,label
0,0,Reaction
1,1,Complex
2,2,BlackBoxEvent
3,3,EntityWithAccessionedSequence
4,4,PositiveGeneExpressionRegulation
5,5,PositiveRegulation


CPU times: user 3.54 ms, sys: 47 μs, total: 3.59 ms
Wall time: 3.26 ms


In [19]:
%%time

# CALL db.edgeTypes() - returns a column of all the different edge types (the edge equivalent of node labels)
command = """
CALL db.edgeTypes()
"""
df_edgeTypes = client.query(command)
if df_edgeTypes.empty:
    print("No result found")
else:
    display(df_edgeTypes)

Unnamed: 0,id,edgeType
0,0,CONNECTED


CPU times: user 3.56 ms, sys: 0 ns, total: 3.56 ms
Wall time: 3.16 ms


## Counts

In [20]:
%%time

# Find number of nodes and number of edges in the graph
n_nodes = len(client.query("MATCH (n) RETURN n"))
n_edges = len(client.query("MATCH (n)-->(m) RETURN n, m"))
print(f"Graph: {n_nodes:,} nodes and {n_edges:,} edges\n")

Graph: 40 nodes and 57 edges

CPU times: user 2.49 ms, sys: 45 μs, total: 2.54 ms
Wall time: 2.24 ms


In [21]:
%%time

# Count all nodes
command = """
MATCH (n)
RETURN COUNT(n)
"""
df_count_nodes = client.query(command)
display(df_count_nodes)

# Count all edges
command = """
MATCH (n)-->()
RETURN COUNT(n)
"""
df_count_edges = client.query(command)
display(df_count_edges)

# Find number of nodes and number of edges in the graph
n_nodes = int(df_count_nodes.loc[0, "COUNT(n)"])
n_edges = int(df_count_edges.loc[0, "COUNT(n)"])
print(f"Graph: {n_nodes:,} nodes and {n_edges:,} edges\n")

Unnamed: 0,COUNT(n)
0,40


Unnamed: 0,COUNT(n)
0,57


Graph: 40 nodes and 57 edges

CPU times: user 5.63 ms, sys: 1.06 ms, total: 6.69 ms
Wall time: 6.07 ms


In [22]:
# Count number of nodes for each label
for label in df_labels["label"]:
    print(100 * '-')
    print(f"label: {label}")
    df_curr_label = client.query(f"""
    MATCH (n:{label})
    RETURN n.name
    """)
    df_curr_label_count = client.query(f"""
    MATCH (n:{label})
    RETURN count(n)
    """)
    display(df_curr_label)
    display(df_curr_label_count)
    
    print()
print(100 * '-')

----------------------------------------------------------------------------------------------------
label: Reaction


Unnamed: 0,n.name
0,[ TP53 binds the PMAIP1 (NOXA) promoter ]
1,[ TP53 binds the APAF1 gene promoter ]
2,[ NRF1:PPARGC1B binds the CYCS promoter ]
3,[ CYCS binds to APAF1 ]
4,[ E2F1 binds APAF1 gene promoter ]
5,[ p38 MAPK phosphorylates PPARGC1A ]
6,"[ NRF1:p-PPARGC1A, NRF2 bind the TFB2M promoter ]"
7,[ E2F1 binds PMAIP1 (NOXA) promoter ]
8,[ Translocation of PMAIP1 (NOXA) to mitochondr...
9,[ BH3-only proteins associate with and inactiv...


Unnamed: 0,count(n)
0,12



----------------------------------------------------------------------------------------------------
label: Complex


Unnamed: 0,n.name
0,"[ E2F1:(TFDP1,TFDP2) ]"
1,"[ APAF1:CYCS , APAF1:Cytochrome C ]"
2,"[ CYCS gene:NRF1:PPARGC1B , NRF1:PGC-1beta:CYCS ]"
3,[ RORA:Coactivator ]
4,"[ p-S15,S20-TP53:EP300:PRMT1:CARM1:GADD45A Gene ]"
5,[ ESRRA:PPARGC1A ]
6,"[ p-S15,S20-TP53 Tetramer ]"
7,"[ p-S15,S20-TP53 Tetramer:PMAIP1 Gene ]"
8,"[ E2F1:TFDP1,TFDP2 , DP1/2:E2F1 ]"


Unnamed: 0,count(n)
0,9



----------------------------------------------------------------------------------------------------
label: BlackBoxEvent


Unnamed: 0,n.name
0,[ APAF1 gene expression is stimulated by E2F1 ...
1,[ Transactivation of PMAIP1 (NOXA) by E2F1 ]
2,"[ Expression of CYCS , Expression of Cytochrom..."
3,[ Expression of NRF1 ]
4,[ TP53 stimulates APAF1 gene expression ]
5,[ TP53 stimulates PMAIP1 (NOXA) expression ]


Unnamed: 0,count(n)
0,6



----------------------------------------------------------------------------------------------------
label: EntityWithAccessionedSequence


Unnamed: 0,n.name
0,"[ E2F1 , Transcription factor E2F1 , E2F-1 , R..."
1,"[ ESRRA , Steroid hormone receptor ERR1 , ERR1..."
2,"[ CYCS , Cytochrome c ]"
3,"[ EP300 , p300 , Histone acetyltransferase p30..."
4,"[ PMAIP1 Gene , NOXA Gene ]"
5,"[ PPARGC1A , Peroxisome proliferator-activated..."
6,"[ NRF1 , Nuclear respiratory factor 1 , NRF1_H..."
7,"[ CYCS , Cytochrome c ]"
8,"[ PMAIP1 , NOXA protein ]"
9,"[ APAF1 , Apaf-1 , Apoptotic protease activati..."


Unnamed: 0,count(n)
0,10



----------------------------------------------------------------------------------------------------
label: PositiveGeneExpressionRegulation


Unnamed: 0,n.name
0,
1,


Unnamed: 0,count(n)
0,2



----------------------------------------------------------------------------------------------------
label: PositiveRegulation


Unnamed: 0,n.name
0,


Unnamed: 0,count(n)
0,1



----------------------------------------------------------------------------------------------------


## Queries

In [23]:
%%time

# Match all edges and return them
command = """
MATCH (n)-[e]->(m)
RETURN n.displayName, e, m.displayName
"""
df = client.query(command)
if df.empty:
    print("No result found")
else:
    display(df)

Unnamed: 0,n.displayName,e,m.displayName
0,TP53 binds the PMAIP1 (NOXA) promoter,0,"p-S15,S20-TP53 Tetramer [nucleoplasm]"
1,TP53 binds the PMAIP1 (NOXA) promoter,1,PMAIP1 Gene [nucleoplasm]
2,TP53 binds the PMAIP1 (NOXA) promoter,2,"p-S15,S20-TP53 Tetramer:PMAIP1 Gene [nucleoplasm]"
3,TP53 binds the PMAIP1 (NOXA) promoter,3,TP53 stimulates PMAIP1 (NOXA) expression
4,TP53 binds the APAF1 gene promoter,4,TP53 stimulates APAF1 gene expression
5,NRF1:PPARGC1B binds the CYCS promoter,5,Expression of CYCS
6,NRF1:PPARGC1B binds the CYCS promoter,6,Expression of NRF1
7,CYCS binds to APAF1,7,Release of Cytochrome c from mitochondria
8,CYCS binds to APAF1,8,CYCS [cytosol]
9,E2F1 binds APAF1 gene promoter,9,APAF1 gene expression is stimulated by E2F1 an...


CPU times: user 4.73 ms, sys: 974 μs, total: 5.71 ms
Wall time: 5.3 ms


In [24]:
%%time

# Find all nodes of type "EntityWithAccessionedSequence" and referenceType "ReferenceGeneProduct"
command = """
MATCH (n:EntityWithAccessionedSequence)
WHERE n.referenceType = "ReferenceGeneProduct"
RETURN n.displayName, n.schemaClass, n.referenceType
"""
df = client.query(command)
if df.empty:
    print("No result found")
else:
    display(df)

Unnamed: 0,n.displayName,n.schemaClass,n.referenceType
0,E2F1 [nucleoplasm],EntityWithAccessionedSequence,ReferenceGeneProduct
1,ESRRA [nucleoplasm],EntityWithAccessionedSequence,ReferenceGeneProduct
2,CYCS [cytosol],EntityWithAccessionedSequence,ReferenceGeneProduct
3,EP300 [nucleoplasm],EntityWithAccessionedSequence,ReferenceGeneProduct
4,PPARGC1A [nucleoplasm],EntityWithAccessionedSequence,ReferenceGeneProduct
5,NRF1 [nucleoplasm],EntityWithAccessionedSequence,ReferenceGeneProduct
6,CYCS [mitochondrial intermembrane space],EntityWithAccessionedSequence,ReferenceGeneProduct
7,PMAIP1 [cytosol],EntityWithAccessionedSequence,ReferenceGeneProduct
8,APAF1 [cytosol],EntityWithAccessionedSequence,ReferenceGeneProduct


CPU times: user 3.65 ms, sys: 965 μs, total: 4.62 ms
Wall time: 4.21 ms


In [25]:
%%time

# Count nodes of each schemaClass
command = """
MATCH (n)
RETURN n.schemaClass
"""
df = client.query(command)
if df.empty:
    print("No result found")
else:
    display(pd.DataFrame(df.value_counts()))

Unnamed: 0_level_0,count
n.schemaClass,Unnamed: 1_level_1
Reaction,12
EntityWithAccessionedSequence,10
Complex,9
BlackBoxEvent,6
PositiveGeneExpressionRegulation,2
PositiveRegulation,1


CPU times: user 4.52 ms, sys: 52 μs, total: 4.57 ms
Wall time: 4.17 ms


In [26]:
%%time

# Find all nodes with label "PositiveGeneExpressionRegulation" (the label is also stored in the schemaClass property)
command = """
MATCH (n:PositiveGeneExpressionRegulation)
RETURN n, n.displayName, n.schemaClass
"""
df = client.query(command)
if df.empty:
    print("No result found")
else:
    display(df)

Unnamed: 0,n,n.displayName,n.schemaClass
0,37,"Positive gene expression regulation by p-S15,S...",PositiveGeneExpressionRegulation
1,38,Positive gene expression regulation by E2F1:TF...,PositiveGeneExpressionRegulation


CPU times: user 3.91 ms, sys: 92 μs, total: 4 ms
Wall time: 3.61 ms


In [27]:
%%time

# Find all edges involving a node of type Complex (an undirected pattern matches directed edges in both directions)
command = """
MATCH (n:Complex)-[e]-(m)
RETURN n.displayName, n.schemaClass, e, m.displayName, m.schemaClass, m.category
"""
df = client.query(command)

df

CPU times: user 1.83 ms, sys: 0 ns, total: 1.83 ms
Wall time: 1.55 ms


Unnamed: 0,n.displayName,n.schemaClass,e,m.displayName,m.schemaClass,m.category
0,"E2F1:(TFDP1,TFDP2) [nucleoplasm]",Complex,20,E2F1 binds APAF1 gene promoter,Reaction,binding
1,"E2F1:(TFDP1,TFDP2) [nucleoplasm]",Complex,39,E2F1 [nucleoplasm],EntityWithAccessionedSequence,
2,APAF1:CYCS [cytosol],Complex,42,CYCS [cytosol],EntityWithAccessionedSequence,
3,APAF1:CYCS [cytosol],Complex,54,APAF1 [cytosol],EntityWithAccessionedSequence,
4,CYCS gene:NRF1:PPARGC1B [nucleoplasm],Complex,21,NRF1:PPARGC1B binds the CYCS promoter,Reaction,binding
5,CYCS gene:NRF1:PPARGC1B [nucleoplasm],Complex,22,NRF1 [nucleoplasm],EntityWithAccessionedSequence,
6,RORA:Coactivator [nucleoplasm],Complex,23,EP300 [nucleoplasm],EntityWithAccessionedSequence,
7,RORA:Coactivator [nucleoplasm],Complex,49,PPARGC1A [nucleoplasm],EntityWithAccessionedSequence,
8,"p-S15,S20-TP53:EP300:PRMT1:CARM1:GADD45A Gene ...",Complex,44,EP300 [nucleoplasm],EntityWithAccessionedSequence,
9,"p-S15,S20-TP53:EP300:PRMT1:CARM1:GADD45A Gene ...",Complex,24,"p-S15,S20-TP53 Tetramer [nucleoplasm]",Complex,


In [28]:
%%time

# Find all edges going from a node of category "binding" to another node
command = """
MATCH (n)-[e]->(m)
WHERE n.category = "binding"
RETURN n.displayName, n.schemaClass, n.category, e, m.displayName, m.schemaClass, m.category
"""
df = client.query(command)
if df.empty:
    print("No result found")
else:
    display(df)

Unnamed: 0,n.displayName,n.schemaClass,n.category,e,m.displayName,m.schemaClass,m.category
0,TP53 binds the PMAIP1 (NOXA) promoter,Reaction,binding,0,"p-S15,S20-TP53 Tetramer [nucleoplasm]",Complex,
1,TP53 binds the PMAIP1 (NOXA) promoter,Reaction,binding,1,PMAIP1 Gene [nucleoplasm],EntityWithAccessionedSequence,
2,TP53 binds the PMAIP1 (NOXA) promoter,Reaction,binding,2,"p-S15,S20-TP53 Tetramer:PMAIP1 Gene [nucleoplasm]",Complex,
3,TP53 binds the PMAIP1 (NOXA) promoter,Reaction,binding,3,TP53 stimulates PMAIP1 (NOXA) expression,BlackBoxEvent,omitted
4,TP53 binds the APAF1 gene promoter,Reaction,binding,4,TP53 stimulates APAF1 gene expression,BlackBoxEvent,omitted
5,NRF1:PPARGC1B binds the CYCS promoter,Reaction,binding,5,Expression of CYCS,BlackBoxEvent,omitted
6,NRF1:PPARGC1B binds the CYCS promoter,Reaction,binding,6,Expression of NRF1,BlackBoxEvent,omitted
7,CYCS binds to APAF1,Reaction,binding,7,Release of Cytochrome c from mitochondria,Reaction,transition
8,CYCS binds to APAF1,Reaction,binding,8,CYCS [cytosol],EntityWithAccessionedSequence,
9,E2F1 binds APAF1 gene promoter,Reaction,binding,9,APAF1 gene expression is stimulated by E2F1 an...,BlackBoxEvent,omitted


CPU times: user 6.19 ms, sys: 20 μs, total: 6.21 ms
Wall time: 5.76 ms


## Complex queries

In [29]:
def build_query_chain(
    hop_count: int,
    start_node_label: str = None,
    start_node_property: str = None,
    start_node_value: str = None,
    end_node_label: str = None,
    end_node_property: str = None,
    end_node_value: str = None,
    edge_type: str = None,
    intermediate_node_label: str = None,
    return_properties: list = None,
) -> tuple[str, list[str]]:
    """
    Build a general query to find chains between nodes.

    Parameters:
    -----------
    hop_count : int
        Number of hops/edges in the path (REQUIRED)
    start_node_label : str, optional
        Label of the starting node (e.g., 'Reaction', 'Complex')
    start_node_property : str, optional
        Property name to match on start node (e.g., 'displayName', 'id')
    start_node_value : str, optional
        Value to match for start node
    end_node_label : str, optional
        Label of the ending node (None for any node)
    end_node_property : str, optional
        Property name to match on end node
    end_node_value : str, optional
        Value to match for end node
    edge_type : str, optional
        Type of edges to traverse (None for any edge type)
    intermediate_node_label : str, optional
        Label constraint for intermediate nodes (None for any label)
    return_properties : list, optional
        List of property names to return for nodes (default: ['displayName'])

    Returns:
    --------
    tuple : (query_string, column_names)
    """

    if return_properties is None:
        return_properties = ["displayName"]

    # Build MATCH clause
    query = "MATCH "

    # Start node - build based on what's provided
    start_parts = []
    if start_node_label:
        start_parts.append(f":{start_node_label}")
    if start_node_property and start_node_value:
        start_parts.append(f'{{{start_node_property}:"{start_node_value}"}}')

    if start_parts:
        query += f"(first{''.join(start_parts)})"
    else:
        query += "(first)"

    # Intermediate nodes and edges
    for k in range(1, hop_count + 1):
        # Edge
        if edge_type:
            query += f"-[e{k}:{edge_type}]->"
        else:
            query += f"-[e{k}]->"

        # Last hop - end node
        if k == hop_count:
            end_parts = []
            if end_node_label:
                end_parts.append(f":{end_node_label}")
            if end_node_property and end_node_value:
                end_parts.append(f'{{{end_node_property}:"{end_node_value}"}}')

            if end_parts:
                query += f"(last{''.join(end_parts)})"
            else:
                query += "(last)"
        # Intermediate nodes
        else:
            if intermediate_node_label:
                query += f"(n{k}:{intermediate_node_label})"
            else:
                query += f"(n{k})"

    # Build RETURN clause
    query += " RETURN first"
    column_names = ["first"]

    # Add start node properties
    for prop in return_properties:
        query += f", first.{prop}"
        column_names.append(f"first.{prop}")

    # Add intermediate nodes/edges
    for k in range(1, hop_count + 1):
        query += f", e{k}"
        column_names.append(f"e{k}")

        if k == hop_count:
            query += ", last"
            column_names.append("last")
            for prop in return_properties:
                query += f", last.{prop}"
                column_names.append(f"last.{prop}")
        else:
            query += f", n{k}"
            column_names.append(f"n{k}")
            for prop in return_properties:
                query += f", n{k}.{prop}"
                column_names.append(f"n{k}.{prop}")

    return query, column_names
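
To make the generated pattern concrete, the block below is a condensed, standalone sketch of only the `MATCH` part that `build_query_chain` assembles when no property filters, edge type, or intermediate label are given. `chain_pattern` is a hypothetical helper written for this illustration, not a replacement for the function above.

```python
def chain_pattern(hop_count, start_label=None, end_label=None):
    """Return the MATCH pattern for a directed chain of `hop_count` edges."""

    def node(name, label=None):
        return f"({name}:{label})" if label else f"({name})"

    parts = [node("first", start_label)]
    for k in range(1, hop_count + 1):
        parts.append(f"-[e{k}]->")
        # The final hop ends on `last`; earlier hops end on n1, n2, ...
        is_last = k == hop_count
        parts.append(node("last", end_label) if is_last else node(f"n{k}"))
    return "MATCH " + "".join(parts)


print(chain_pattern(1))
# MATCH (first)-[e1]->(last)
print(chain_pattern(3, "EntityWithAccessionedSequence", "BlackBoxEvent"))
# MATCH (first:EntityWithAccessionedSequence)-[e1]->(n1)-[e2]->(n2)-[e3]->(last:BlackBoxEvent)
```

Because every hop is written out explicitly, a query built for `hop_count=k` matches chains of exactly `k` edges, which is why the next cell loops over hop counts.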

In [30]:
%%time

# Find all paths of 1 to 14 hops (the range stops before max_hops = 15) from a node of type "EntityWithAccessionedSequence" to a node of type "BlackBoxEvent"
max_hops = 15
longest_df = None

for hop in range(1, max_hops):
    print(100 * "*")
    print(f"{hop} hop(s) :\n")

    query, cols = build_query_chain(
        hop_count=hop,
        start_node_label="EntityWithAccessionedSequence",
        end_node_label="BlackBoxEvent",
        return_properties=["id", "displayName", "schemaClass", "category"],
    )
    #print(query)
    
    # Use with client
    df = client.query(query)
    if df.empty:
        print("No result found")
    else:
        df.columns = cols
        display(df)
        longest_df = df

print(100 * "*")

****************************************************************************************************
1 hop(s) :



Unnamed: 0,first,first.id,first.displayName,first.schemaClass,first.category,e1,last,last.id,last.displayName,last.schemaClass,last.category
0,29,CYCS [cytosol],CYCS [cytosol],EntityWithAccessionedSequence,,43,23,Expression of CYCS,Expression of CYCS,BlackBoxEvent,omitted
1,31,PMAIP1 Gene [nucleoplasm],PMAIP1 Gene [nucleoplasm],EntityWithAccessionedSequence,,46,26,TP53 stimulates PMAIP1 (NOXA) expression,TP53 stimulates PMAIP1 (NOXA) expression,BlackBoxEvent,omitted
2,31,PMAIP1 Gene [nucleoplasm],PMAIP1 Gene [nucleoplasm],EntityWithAccessionedSequence,,47,22,Transactivation of PMAIP1 (NOXA) by E2F1,Transactivation of PMAIP1 (NOXA) by E2F1,BlackBoxEvent,omitted
3,33,NRF1 [nucleoplasm],NRF1 [nucleoplasm],EntityWithAccessionedSequence,,53,24,Expression of NRF1,Expression of NRF1,BlackBoxEvent,omitted


****************************************************************************************************
2 hop(s) :



Unnamed: 0,first,first.id,first.displayName,first.schemaClass,first.category,e1,n1,n1.id,n1.displayName,n1.schemaClass,n1.category,e2,last,last.id,last.displayName,last.schemaClass,last.category
0,33,NRF1 [nucleoplasm],NRF1 [nucleoplasm],EntityWithAccessionedSequence,,51,6,"NRF1:p-PPARGC1A, NRF2 bind the TFB2M promoter","NRF1:p-PPARGC1A, NRF2 bind the TFB2M promoter",Reaction,binding,13,24,Expression of NRF1,Expression of NRF1,BlackBoxEvent,omitted
1,33,NRF1 [nucleoplasm],NRF1 [nucleoplasm],EntityWithAccessionedSequence,,52,2,NRF1:PPARGC1B binds the CYCS promoter,NRF1:PPARGC1B binds the CYCS promoter,Reaction,binding,5,23,Expression of CYCS,Expression of CYCS,BlackBoxEvent,omitted
2,33,NRF1 [nucleoplasm],NRF1 [nucleoplasm],EntityWithAccessionedSequence,,52,2,NRF1:PPARGC1B binds the CYCS promoter,NRF1:PPARGC1B binds the CYCS promoter,Reaction,binding,6,24,Expression of NRF1,Expression of NRF1,BlackBoxEvent,omitted


****************************************************************************************************
3 hop(s) :



Unnamed: 0,first,first.id,first.displayName,first.schemaClass,first.category,e1,n1,n1.id,n1.displayName,n1.schemaClass,...,n2.id,n2.displayName,n2.schemaClass,n2.category,e3,last,last.id,last.displayName,last.schemaClass,last.category
0,27,E2F1 [nucleoplasm],E2F1 [nucleoplasm],EntityWithAccessionedSequence,,39,12,"E2F1:(TFDP1,TFDP2) [nucleoplasm]","E2F1:(TFDP1,TFDP2) [nucleoplasm]",Complex,...,E2F1 binds APAF1 gene promoter,E2F1 binds APAF1 gene promoter,Reaction,binding,9,21,APAF1 gene expression is stimulated by E2F1 an...,APAF1 gene expression is stimulated by E2F1 an...,BlackBoxEvent,omitted
1,31,PMAIP1 Gene [nucleoplasm],PMAIP1 Gene [nucleoplasm],EntityWithAccessionedSequence,,46,26,TP53 stimulates PMAIP1 (NOXA) expression,TP53 stimulates PMAIP1 (NOXA) expression,BlackBoxEvent,...,Translocation of PMAIP1 (NOXA) to mitochondria,Translocation of PMAIP1 (NOXA) to mitochondria,Reaction,transition,16,22,Transactivation of PMAIP1 (NOXA) by E2F1,Transactivation of PMAIP1 (NOXA) by E2F1,BlackBoxEvent,omitted
2,33,NRF1 [nucleoplasm],NRF1 [nucleoplasm],EntityWithAccessionedSequence,,51,6,"NRF1:p-PPARGC1A, NRF2 bind the TFB2M promoter","NRF1:p-PPARGC1A, NRF2 bind the TFB2M promoter",Reaction,...,p38 MAPK phosphorylates PPARGC1A,p38 MAPK phosphorylates PPARGC1A,Reaction,transition,10,24,Expression of NRF1,Expression of NRF1,BlackBoxEvent,omitted


****************************************************************************************************
4 hop(s) :

No result found
****************************************************************************************************
5 hop(s) :

No result found
****************************************************************************************************
6 hop(s) :

No result found
****************************************************************************************************
7 hop(s) :



Unnamed: 0,first,first.id,first.displayName,first.schemaClass,first.category,e1,n1,n1.id,n1.displayName,n1.schemaClass,...,n6.id,n6.displayName,n6.schemaClass,n6.category,e7,last,last.id,last.displayName,last.schemaClass,last.category
0,31,PMAIP1 Gene [nucleoplasm],PMAIP1 Gene [nucleoplasm],EntityWithAccessionedSequence,,47,22,Transactivation of PMAIP1 (NOXA) by E2F1,Transactivation of PMAIP1 (NOXA) by E2F1,BlackBoxEvent,...,E2F1 binds APAF1 gene promoter,E2F1 binds APAF1 gene promoter,Reaction,binding,9,21,APAF1 gene expression is stimulated by E2F1 an...,APAF1 gene expression is stimulated by E2F1 an...,BlackBoxEvent,omitted


****************************************************************************************************
8 hop(s) :

No result found
****************************************************************************************************
9 hop(s) :



Unnamed: 0,first,first.id,first.displayName,first.schemaClass,first.category,e1,n1,n1.id,n1.displayName,n1.schemaClass,...,n8.id,n8.displayName,n8.schemaClass,n8.category,e9,last,last.id,last.displayName,last.schemaClass,last.category
0,31,PMAIP1 Gene [nucleoplasm],PMAIP1 Gene [nucleoplasm],EntityWithAccessionedSequence,,46,26,TP53 stimulates PMAIP1 (NOXA) expression,TP53 stimulates PMAIP1 (NOXA) expression,BlackBoxEvent,...,E2F1 binds APAF1 gene promoter,E2F1 binds APAF1 gene promoter,Reaction,binding,9,21,APAF1 gene expression is stimulated by E2F1 an...,APAF1 gene expression is stimulated by E2F1 an...,BlackBoxEvent,omitted


****************************************************************************************************
10 hop(s) :

No result found
****************************************************************************************************
11 hop(s) :

No result found
****************************************************************************************************
12 hop(s) :

No result found
****************************************************************************************************
13 hop(s) :

No result found
****************************************************************************************************
14 hop(s) :

No result found
****************************************************************************************************
CPU times: user 109 ms, sys: 2.34 ms, total: 112 ms
Wall time: 112 ms


# Create subgraph to visualise

In [31]:
# Get subgraph: collect the node ids along the first row of the longest path found
subset_nodes = longest_df.filter(regex="id$", axis=1).iloc[0].values.tolist()
subG = G.subgraph(subset_nodes).copy()
print(subG)

# Build CREATE command from subgraph
create_command_subG = build_create_command_from_networkx(subG)
print(f"""
Cypher CREATE command :
* size: {len(create_command_subG.encode('utf-8'))/1024/1000:.4f} MB\n
{100 * '*'}
{create_command_subG \
if len(create_command_subG.split("\n")) < 10000 \
else "\n".join(create_command_subG.split('\n')[:5]) + "\n...\n" + "\n".join(create_command_subG.split('\n')[-5:])}
{100 * '*'}
""")

MultiGraph with 10 nodes and 10 edges
Cypher query will create graph with 10 nodes and 10 edges

Cypher CREATE command :
* size: 0.0049 MB

****************************************************************************************************
CREATE (:Node {id: "Translocation of PMAIP1 (NOXA) to mitochondria", schemaClass: "Reaction", stId: "R-HSA-140216", oldStId: "REACT_1585", releaseDate: "2004-10-27", name: "[ Translocation of PMAIP1 (NOXA) to mitochondria ]", stIdVersion: "R-HSA-140216.4", speciesName: "Homo sapiens", category: "transition", displayName: "Translocation of PMAIP1 (NOXA) to mitochondria"}),
(:Node {id: "E2F1:(TFDP1,TFDP2) [nucleoplasm]", schemaClass: "Complex", stId: "R-HSA-9007512", name: "[ E2F1:(TFDP1,TFDP2) ]", stIdVersion: "R-HSA-9007512.1", speciesName: "Homo sapiens", displayName: "E2F1:(TFDP1,TFDP2) [nucleoplasm]"}),
(:Node {id: "E2F1 binds APAF1 gene promoter", schemaClass: "Reaction", stId: "R-HSA-9007514", releaseDate: "2017-09-12", name: "[ E2F1 binds APAF
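
The `filter(regex="id$", axis=1)` call in the cell above keeps the columns whose name ends in `id`, so the first row yields the node ids along the longest path. A minimal standard-library sketch of that selection follows; the row dictionary and its placeholder values are hypothetical, and the tighter `\.id$` pattern is an optional variation rather than what the notebook uses.

```python
import re

# Hypothetical row shaped like one line of the hop tables (placeholder values).
row = {
    "first": 31,
    "first.id": "id-first",
    "first.displayName": "name-first",
    "e1": 46,
    "n1": 26,
    "n1.id": "id-n1",
    "last": 21,
    "last.id": "id-last",
}


def select_id_values(row, pattern=r"id$"):
    """Return the values of the columns whose name matches `pattern`, in column order."""
    return [value for column, value in row.items() if re.search(pattern, column)]


node_ids = select_id_values(row)
print(node_ids)
```

With the default pattern, a column such as the hypothetical `n1.uuid` would also be selected; anchoring on `\.id$` restricts the match to the `.id` columns.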

In [32]:
subgraph_name = f"{graph_name}_subgraph"
subgraph_name

'reactome1_subgraph'

In [33]:
%%time

# Set graph
try:
    client.create_graph(subgraph_name)
except TuringDBException as e:
    print(e)

# Set working graph
client.set_graph(subgraph_name)

# Create a new change on the graph
client.checkout()
change = client.new_change()
print(f"Current change {change}")

# Checkout into the change
client.checkout(change=change)

Current change 0
CPU times: user 2.82 ms, sys: 0 ns, total: 2.82 ms
Wall time: 9.19 ms


In [34]:
%%time

chunks = split_cypher_commands(create_command_subG, max_size_mb=1)

print(f"✓ Split into {len(chunks['node_chunks'])} node chunk(s) and {len(chunks['edge_chunks'])} edge chunk(s)")

print("\nNode chunks:")
for i, chunk in enumerate(chunks['node_chunks']):
    print(f"  Node chunk {i+1}: {len(chunk.encode('utf-8'))/1024:.1f} KB")
    if i == 10:
        print("  ...")
        break

print("\nEdge chunks:")
for i, chunk in enumerate(chunks['edge_chunks']):
    print(f"  Edge chunk {i+1}: {len(chunk.encode('utf-8'))/1024:.1f} KB")
    if i == 10:
        print("  ...")
        break

✓ Split into 1 node chunk(s) and 10 edge chunk(s)

Node chunks:
  Node chunk 1: 3.3 KB

Edge chunks:
  Edge chunk 1: 0.2 KB
  Edge chunk 2: 0.2 KB
  Edge chunk 3: 0.1 KB
  Edge chunk 4: 0.1 KB
  Edge chunk 5: 0.2 KB
  Edge chunk 6: 0.1 KB
  Edge chunk 7: 0.1 KB
  Edge chunk 8: 0.1 KB
  Edge chunk 9: 0.2 KB
  Edge chunk 10: 0.2 KB
CPU times: user 496 μs, sys: 0 ns, total: 496 μs
Wall time: 449 μs


In [35]:
%%time

# Run CREATE command
print("\nExecuting query on TuringDB...")
start_time = time.time()

print(f"✓ Split into {len(chunks['node_chunks'])} node chunk(s) and {len(chunks['edge_chunks'])} edge chunk(s)")

# CREATE nodes
print("\nNode chunks:")
for i, chunk in enumerate(tqdm(chunks['node_chunks'])):
    result = client.query(chunk)
# Commit the change
client.query("COMMIT")
print(f"✓ {len(chunks['node_chunks'])} node chunks done")

# CREATE edges
print("\nEdge chunks:")
for i, chunk in enumerate(tqdm(chunks['edge_chunks'])):
    result = client.query(chunk)
# Commit the change
client.query("COMMIT")
print(f"✓ {len(chunks['edge_chunks'])} edge chunks done")

execution_time = time.time() - start_time
print(f"\n✓ Graph created successfully in {execution_time:.2f} seconds")

# Submit changes
start_time = time.time()
client.query("CHANGE SUBMIT")
execution_time = time.time() - start_time
print(f"\n✓ Changes successfully submitted in {execution_time:.2f} seconds")

# Checkout into main
client.checkout()


Executing query on TuringDB...
✓ Split into 1 node chunk(s) and 10 edge chunk(s)

Node chunks:


  0%|          | 0/1 [00:00<?, ?it/s]

✓ 1 node chunks done

Edge chunks:


  0%|          | 0/10 [00:00<?, ?it/s]

✓ 10 edge chunks done

✓ Graph created successfully in 0.02 seconds

✓ Changes successfully submitted in 0.06 seconds
CPU times: user 25.7 ms, sys: 5.2 ms, total: 30.9 ms
Wall time: 89.6 ms
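
The cell above repeats the `start_time = time.time()` / `execution_time = ...` pattern for each step. As a sketch only, the same timing can be wrapped in a small context manager; `timed` is a hypothetical helper, not part of `turingdb_examples`, and the workload below is a stand-in for the `client.query(...)` calls.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Print how long the enclosed block took, in the notebook's message style."""
    start = time.perf_counter()
    yield
    # Reached only if the block did not raise, so the message signals success.
    print(f"✓ {label} in {time.perf_counter() - start:.2f} seconds")


# Stand-in workload; in the notebook the body would hold the query calls.
with timed("Stand-in work finished"):
    total = sum(range(1000))
```

If the enclosed block raises, the exception propagates and no success message is printed.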


<div class="alert alert-block alert-info">
    <h2>
        You can visualise the subgraph directly in the notebook below. For more details on nodes and edges, open the TuringDB visualizer running on your instance.
    </h2>
</div>

<div class="alert alert-block alert-info">
    <h2>
        Visualize your graph in the TuringDB Graph Visualizer! Now that your instance is running:
    </h2>
    <h3>
        <ul>
            <li>Go to <a href="https://console.turingdb.ai/databases">TuringDB Console - Database Instances</a></li>
            <li>In your current instance panel, click the "Open Visualizer" button</li>
            <li>Once the Visualizer opens, choose your graph from the dropdown menu in the top-right corner</li>
        </ul>
        You can then explore your graph and visualize the nodes you want!
    </h3>
</div>

In [36]:
from pyvis.network import Network
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

net = Network(
    height="750px",
    width="100%",
    notebook=True,
    bgcolor="#ffffff",
    font_color="#000000",
    directed=True,
)

# Choose your palette (can be changed easily)
palette_name = (
    "Pastel1"  # Options: 'tab10', 'Set3', 'Paired', 'viridis', 'plasma', etc.
)

# Get unique node types
unique_types = list(set(data.get("schemaClass") for _, data in G.nodes(data=True)))

# Get colors from matplotlib palette
cmap = plt.get_cmap(palette_name)
colors = [
    mcolors.rgb2hex(cmap(i / len(unique_types))) for i in range(len(unique_types))
]

# Map types to colors
type_colors = {node_type: colors[i] for i, node_type in enumerate(unique_types)}

# Then use in your visualization
for node, data in subG.nodes(data=True):
    node_type = data.get("schemaClass", "Unknown")
    color = type_colors.get(node_type, "#95a5a6")

    net.add_node(
        node,
        label=data.get("displayName", str(node)),
        title=f"{data.get('displayName', '')}",
        color=color,
        size=25,
    )

for source, target, data in subG.edges(data=True):
    net.add_edge(source, target, color="#95a5a6", width=3)

net.toggle_physics(status=True)
net.show(f"{example_name}_subgraph.html")

reactome_subgraph.html
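
In the cell above, `list(set(...))` does not guarantee a stable order, so the color assigned to each `schemaClass` can change between kernel restarts. One possible way to keep the mapping stable is to sort the types first. The sketch below assumes a hypothetical helper `build_type_colors` and uses `colorsys` from the standard library instead of a matplotlib palette, so it runs without extra dependencies.

```python
import colorsys


def build_type_colors(node_types, lightness=0.8, saturation=0.6):
    """Map each distinct node type to a hex color, in a stable (sorted) order."""
    # Replace missing types by "Unknown" so all values can be sorted together.
    unique_types = sorted({t if t is not None else "Unknown" for t in node_types})
    colors = {}
    for i, node_type in enumerate(unique_types):
        hue = i / len(unique_types)  # spread hues evenly around the color wheel
        r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
        colors[node_type] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    return colors


# Hypothetical input: the schemaClass values of a few nodes, one of them missing.
type_colors = build_type_colors(
    ["Reaction", "Complex", None, "Reaction", "BlackBoxEvent"]
)
print(type_colors)
```

The resulting dictionary can be used in place of `type_colors` in the visualization loop, since it is also keyed by node type.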


# Use LLM to generate Cypher query

Before running this section, create a `.env` file in the project root with your API keys:

```env
ANTHROPIC_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
MISTRAL_API_KEY=your_key_here
```

In [37]:
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv(override=True)

True

In [38]:
api_keys = {
    "Anthropic": os.getenv("ANTHROPIC_API_KEY"),
    "Mistral": os.getenv("MISTRAL_API_KEY"),
    "OpenAI": os.getenv("OPENAI_API_KEY"),
}

In [39]:
"""Build system prompt with TuringDB schema and examples"""

turingdb_cypher_system_prompt = """
You are an expert at converting natural language questions into TuringDB queries.

Your task is to generate syntactically correct TuringDB queries based on natural language input.

VERY IMPORTANT - YOU MUST FOLLOW THESE REQUESTS - TuringDB Syntax Guidelines:
1. Return ONLY the TuringDB query, no explanations or markdown formatting
2. Use MATCH, CREATE and WHERE operations only
3. Nodes: (n:Label {property = "value"}) or (n:Label {property: value})
4. Edges: Use DIRECTED syntax with ->
5. Pattern matching: MATCH (n)-[e]->(m)
6. Property matching: Use = operator for exact matching
7. Multiple constraints: (n:Person:Engineer {name = "John", age = 30})
8. Return all matched entities: RETURN n, e, m or use RETURN * for all
9. Filter using WHERE clause: MATCH (n:Person) WHERE n.name = 'John' RETURN n.firstname, n.lastname

VERY IMPORTANT - YOU ARE NOT ALLOWED TO USE THE FOLLOWING - FORBIDDEN in TuringDB:
- Do NOT use AS aliases
- Do NOT use LIMIT, SKIP clauses
- Do NOT use WITH clauses
- Do NOT use CALL (except for metaqueries)
- Do NOT use toLower() or other functions
- Do NOT use wildcard character (*)
- Do NOT use multi-hop patterns for edges: e.g. `-[e:CONNECTED*1..10]->`
- Do NOT use "end" or "s3" as variable names

Supported TuringDB Operations:
- MATCH queries: MATCH (n:Label)-[e:Type]->(m) RETURN n, m
- CREATE queries: CREATE (n:Label{property="value"})-[e:Type]->(m:Label)
- Metaqueries: CALL db.propertyTypes(), CALL db.labels(), CALL db.edgeTypes()
- Property types: String ("text" or `text`), Boolean (true/false), Integer (20), Double (20.5)

Examples for few-shot learning:
- Find all persons: MATCH (n:Person) RETURN n
- Find connections: MATCH (n:Person)-[e]->(m:Person) RETURN n, e, m
- Create person: CREATE (n:Person{name="John", age=30})
- Match person with specific name: MATCH (p:Person) WHERE p.name = "John" RETURN p
- Path with 1 hop between Station Paddington and Blackfriars: MATCH (first:Station{displayName:"Paddington"})-[e1:CONNECTED]->(last:Station{displayName="Blackfriars"}) RETURN first, first.displayName, first.Note, e1.Line, last, last.displayName, last.Note
- Path with 2 hops between Station Paddington and Blackfriars: MATCH (first:Station{displayName:"Paddington"})-[e1:CONNECTED]->(n1:Station)-[e2:CONNECTED]->(last:Station{displayName="Blackfriars"}) RETURN first, first.displayName, first.Note, e1.Line, n1, n1.displayName, n1.Note, e2.Line, last, last.displayName, last.Note
- Path with 8 hops between Station Paddington and Blackfriars: MATCH (first:Station{displayName:"Paddington"})-[e1:CONNECTED]->(n1:Station)-[e2:CONNECTED]->(n2:Station)-[e3:CONNECTED]->(n3:Station)-[e4:CONNECTED]->(n4:Station)-[e5:CONNECTED]->(n5:Station)-[e6:CONNECTED]->(n6:Station)-[e7:CONNECTED]->(n7:Station)-[e8:CONNECTED]->(last:Station{displayName="Blackfriars"}) RETURN first, first.displayName, first.Note, e1.Line, n1, n1.displayName, n1.Note, e2.Line, n2, n2.displayName, n2.Note, e3.Line, n3, n3.displayName, n3.Note, e4.Line, n4, n4.displayName, n4.Note, e5.Line, n5, n5.displayName, n5.Note, e6.Line, n6, n6.displayName, n6.Note, e7.Line, n7, n7.displayName, n7.Note, e8.Line, last, last.displayName, last.Note
- Find all Chinese providers and what they supply: MATCH (n{provider_country:"CHN"}) RETURN n, n.provider_name, n.displayName, n.share_provided, n.type
- Find all deposition tools and their types: MATCH (specific)-[e:IS_TYPE_OF]->(general:Tool_Resource{displayName:"Deposition tools"}) RETURN specific, specific.displayName, specific.provider_name, e, general, general.displayName
"""
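
Because the prompt forbids variable-length patterns such as `-[e:CONNECTED*1..10]->`, a path of N hops has to be written out hop by hop, as in the Paddington examples. The following sketch generates such a query for a given hop count. `build_path_query` is a hypothetical helper, the `first`/`n1`/`last` naming follows the hop tables shown earlier, and the `{id: "..."}` syntax is taken from the CREATE output; whether TuringDB accepts every generated query has not been verified here, and ids containing double quotes are not escaped.

```python
def build_path_query(n_hops, first_id, last_id, props=("id", "displayName")):
    """Unroll a fixed-length directed path into one MATCH ... RETURN query."""
    if n_hops < 1:
        raise ValueError("n_hops must be at least 1")
    # Node variables: first, n1, ..., n{n_hops - 1}, last
    nodes = ["first"] + [f"n{i}" for i in range(1, n_hops)] + ["last"]
    pattern = f'(first{{id: "{first_id}"}})'
    returned = ["first"] + [f"first.{p}" for p in props]
    for i in range(1, n_hops + 1):
        node = nodes[i]
        # Only the final node carries a property constraint.
        constraint = f'{{id: "{last_id}"}}' if node == "last" else ""
        pattern += f"-[e{i}]->({node}{constraint})"
        returned += [f"e{i}", node] + [f"{node}.{p}" for p in props]
    return f"MATCH {pattern} RETURN {', '.join(returned)}"


# Placeholder ids "A" and "B" stand in for real node ids.
print(build_path_query(2, "A", "B"))
```

Intermediate nodes are named `n1`, `n2`, and so on, which avoids the `s3` variable name that the prompt forbids.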

In [40]:
# Get subset of CREATE command to avoid exceeding context window
create_command_lines = create_command_subG.split("\n")
create_command_subset = "\n".join(create_command_lines[:5] + create_command_lines[-5:])

# Create system_prompt
system_prompt = f"""
TuringDB Cypher prompt :
{turingdb_cypher_system_prompt}

Here is a subset of the CREATE command used to create the graph, so that you know the graph structure.
Only a subset is passed because the whole command is too long :
{create_command_subset}

Here is also the output of the "CALL db.labels()" command, showing the different node types of the graph :
{client.query("CALL db.labels()")}

Here is also the output of the "CALL db.edgeTypes()" command, showing the different edge types of the graph :
{client.query("CALL db.edgeTypes()")}

Very important :
- You MUST follow current TuringDB Syntax Guidelines
- You MUST NOT USE what is FORBIDDEN in TuringDB
- By default, RETURN ALL THE MATCHED NODES AND EDGES AND THEIR PROPERTIES in the RETURN section (unless the user requests otherwise)
- Use the correct node and edge properties name in the MATCH section.
- Use the correct node and edge properties name in the RETURN section.
- Pay attention to which properties come from nodes or edges, to create a functioning query
- Pay attention to lower and uppercases in properties
- If some properties contain spaces, be careful to wrap them

Give me the query FOLLOWING TURINGDB GUIDELINES AND NOT USING WHAT IS FORBIDDEN for this specific question :
"""

In [41]:
question = """
Is there a path of any size linking gene PMAIP1 and gene E2F1 ?
"""

In [42]:
%%time

provider = "OpenAI"

cypher_query = natural_language_to_cypher(
    question=question,
    system_prompt=system_prompt,
    provider=provider,
    api_key=api_keys[provider],
    temperature=0.0,
)
print(f"cypher_query : {cypher_query}")

cypher_query : MATCH (n:Node {id: "PMAIP1 Gene [nucleoplasm]"})-[e:CONNECTED]->(m:Node {id: "E2F1 [nucleoplasm]"}) RETURN n, e, m
CPU times: user 363 ms, sys: 52.9 ms, total: 416 ms
Wall time: 2.68 s


In [44]:
%%time

# Set original graph
client.set_graph(graph_name)

try:
    df_path = client.query(cypher_query)

    if df_path.empty:
        print("--> No result found\n")
    else:
        display(df_path)

except TuringDBException:
    print("Query generated by LLM not supported.")

Query generated by LLM not supported.
CPU times: user 106 μs, sys: 2.87 ms, total: 2.97 ms
Wall time: 2.1 ms
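
A generated query can be checked against the FORBIDDEN list of the system prompt before it is sent to the database. The sketch below is a rough lint with hypothetical names (`FORBIDDEN_PATTERNS`, `find_forbidden`); the regular expressions are simplifying assumptions and cover only part of the list.

```python
import re

# Checks derived from the FORBIDDEN list in the system prompt; the regular
# expressions themselves are assumptions and deliberately simple.
FORBIDDEN_PATTERNS = {
    "AS alias": r"\bAS\b",
    "LIMIT/SKIP clause": r"\b(LIMIT|SKIP)\b",
    "WITH clause": r"\bWITH\b",
    "variable-length edge pattern": r"\[[^\]]*\*[^\]]*\]",
    "reserved variable name (end, s3)": r"\(\s*(end|s3)\b",
}


def find_forbidden(query):
    """Return the names of the forbidden constructs detected in `query`."""
    # Blank out quoted strings first so property values cannot trigger matches.
    stripped = re.sub(r'"[^"]*"|\'[^\']*\'', '""', query)
    return [
        name
        for name, pattern in FORBIDDEN_PATTERNS.items()
        if re.search(pattern, stripped, flags=re.IGNORECASE)
    ]


# Hypothetical query that breaks three of the rules.
print(find_forbidden("MATCH (a)-[e:CONNECTED*1..10]->(b) RETURN a AS x LIMIT 5"))
```

An empty list does not mean the query will run: the query generated above contains none of these constructs and was still rejected.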


# Use LLM to get graph summary

In [45]:
%%time

prompt = f"""
Give me a summary of this graph. It represents biological entities, tell me more about the entities involved and the interactions.
Here is the graph :
{G.nodes(data=True)} {G.edges(data=True)}
"""

system_prompt = """
You are a specialist in analysing graphs and their structure.
You will use your knowledge to add more information about the entities and relationships in the graph.
Add information only when you are sure it is relevant.
"""

provider = "OpenAI"

response = query_llm(
    prompt=prompt,
    system_prompt=system_prompt,
    provider=provider,
    api_key=api_keys[provider],
    temperature=0.0,
)

CPU times: user 105 ms, sys: 15.1 ms, total: 120 ms
Wall time: 17 s


In [46]:
display(Markdown(response))

The graph represents a complex network of interactions involving various biological entities primarily related to gene regulation, apoptosis, and cellular signaling in *Homo sapiens*. Here’s a summary of the key entities and their interactions:

### Key Entities:
1. **TP53 (Tumor Protein p53)**: A crucial tumor suppressor protein that regulates the cell cycle and functions in preventing cancer. It binds to the promoters of various genes, including PMAIP1 and APAF1, to stimulate their expression.
   - **p-S15,S20-TP53 Tetramer**: A phosphorylated form of TP53 that is active in gene regulation.

2. **PMAIP1 (NOXA)**: A pro-apoptotic gene that is regulated by TP53. It plays a role in apoptosis and is translocated to mitochondria to promote cell death.

3. **APAF1 (Apoptotic Protease Activating Factor 1)**: A key component in the apoptosome that activates caspases, leading to apoptosis. Its expression is stimulated by TP53 and E2F1.

4. **CYCS (Cytochrome c)**: A protein involved in the electron transport chain and apoptosis. It binds to APAF1 and is released from mitochondria during apoptosis.

5. **E2F1**: A transcription factor that regulates the expression of genes involved in cell cycle progression and apoptosis. It binds to the promoters of PMAIP1 and APAF1.

6. **NRF1 (Nuclear Respiratory Factor 1)**: A transcription factor that regulates the expression of genes involved in mitochondrial biogenesis and function, including CYCS.

7. **PPARGC1A (Peroxisome Proliferator-Activated Receptor Gamma Coactivator 1-alpha)**: A coactivator that regulates genes involved in energy metabolism and mitochondrial function.

8. **ESRRA (Estrogen-Related Receptor Alpha)**: A transcription factor that positively regulates the expression of various genes, including those involved in mitochondrial function.

### Key Interactions:
- **TP53 Binding**: TP53 binds to the promoters of PMAIP1 and APAF1, stimulating their expression, which is crucial for apoptosis.
- **E2F1 Regulation**: E2F1 also binds to the promoters of PMAIP1 and APAF1, indicating a collaborative role in regulating apoptosis and cell cycle.
- **CYCS and APAF1 Interaction**: CYCS binds to APAF1, leading to the formation of the apoptosome and subsequent activation of caspases.
- **Translocation Events**: PMAIP1 is translocated to mitochondria, where it can exert its pro-apoptotic effects, and CYCS is released from mitochondria during apoptosis.
- **Positive Regulation**: The interactions between TP53, E2F1, and other transcription factors like NRF1 and ESRRA indicate a complex regulatory network that ensures proper gene expression in response to cellular stress or damage.

### Conclusion:
This graph illustrates a network of interactions that highlight the roles of TP53, E2F1, and other transcription factors in regulating apoptosis and gene expression in response to cellular signals. The interplay between these entities is crucial for maintaining cellular homeostasis and preventing tumorigenesis.

In [47]:
print("Notebook finished !")

Notebook finished !
