**1. Converting OWL to a graph**

Before you can use HAN (Heterogeneous Graph Attention Network), the OWL file must be converted into a heterogeneous graph.
This means defining typed nodes (patients, proteins, etc.) and the typed relationships between them.
You can use the Python library `rdflib` to extract triples from the OWL file and then organize them into a `HeteroData` object for PyTorch Geometric.

`GSE54514_enriched_ontology_degfilter_v2.10_ovp0.2_ng4_stats.csv`   
`GSE54514_enriched_ontology_degfilter_v2.10_ovp0.2_ng4.owl`


In [7]:
import os
# Run from the project root so the relative paths below resolve
if os.getcwd().endswith('notebooks'):
    os.chdir('..')
owl_file = 'output/GSE54514_enriched_ontology_degfilter_v2.10_ovp0.2_ng4.owl'
stats_file = 'output/GSE54514_enriched_ontology_degfilter_v2.10_ovp0.2_ng4_stats.csv'

In [6]:
!head -n 10 output/GSE54514_enriched_ontology_degfilter_v2.10_ovp0.2_ng4_stats.csv

type;name;count
Class;GO_0003674;41
Class;GO_0005575;120
Class;GO_0008150;301
Class;GO_0006281;2
Class;GO_1902494;7
Class;GO_0005829;1
Class;GO_0065007;124
Class;GO_0051052;1
Class;GO_0000278;1


In [5]:
!head -n 20 output/GSE54514_enriched_ontology_degfilter_v2.10_ovp0.2_ng4.owl

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xml:base="http://purl.obolibrary.org/obo/go.owl"
         xmlns="http://purl.obolibrary.org/obo/go.owl#"
         xmlns:oboI="http://www.geneontology.org/formats/oboInOwl#"
         xmlns:obo="http://purl.obolibrary.org/obo/"
         xmlns:term="http://purl.org/dc/terms/"
         xmlns:x_1.1="http://purl.org/dc/elements/1.1/">

<owl:Ontology rdf:about="http://purl.obolibrary.org/obo/go.owl">
  <owl:versionIRI rdf:resource="http://purl.obolibrary.org/obo/go/releases/2025-07-22/go.owl"/>
  <obo:IAO_0000700 rdf:resource="http://purl.obolibrary.org/obo/GO_0003674"/>
  <obo:IAO_0000700 rdf:resource="http://purl.obolibrary.org/obo/GO_0005575"/>
  <obo:IAO_0000700 rdf:resource="http://purl.obolibrary.org/obo/GO_0008150"/>
  <term:li

In [10]:
import rdflib
from rdflib import Graph, URIRef, Literal
from rdflib.namespace import RDF, RDFS, OWL
# Parse the OWL file (RDF/XML) into an in-memory rdflib graph

g = Graph()
g.parse(owl_file)


<Graph identifier=Nedb95f41a6b741b3b30d9bb12c87294d (<class 'rdflib.graph.Graph'>)>

In [17]:
print(f"Graph has {len(g)} triples")
for s, p, o in g:
    print(s, p, o)
    break

Graph has 3391207 triples
N2b3f9f272e2a427d84cce2404cf52367 http://www.w3.org/2002/07/owl#annotatedProperty http://www.geneontology.org/formats/oboInOwl#hasDbXref


In [20]:
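# Map each entity to its rdf:type. An entity with several rdf:type triples
# keeps only the last one seen, because each assignment overwrites the previous.
entity_types = {}

for s, p, o in g.triples((None, RDF.type, None)):
    entity_types[s] = o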
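
An entity in an OWL file may carry more than one `rdf:type` triple (for example a generic OWL type plus a GO class), and a plain `dict` keeps only one of them. The sketch below shows one way to keep all of them. It is a minimal illustration on toy triples, not the notebook's method: `collect_types`, `RDF_TYPE` and the `ex:` identifiers are hypothetical names introduced here.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Toy triples standing in for the rdf:type triples of the real graph (hypothetical data)
toy_triples = [
    ("ex:gene1", RDF_TYPE, "http://www.w3.org/2002/07/owl#NamedIndividual"),
    ("ex:gene1", RDF_TYPE, "http://purl.obolibrary.org/obo/GO_0043168"),
    ("ex:GO_0043168", RDF_TYPE, "http://www.w3.org/2002/07/owl#Class"),
]


def collect_types(triples):
    """Map each subject to the set of all its rdf:type objects."""
    types = defaultdict(set)
    for s, p, o in triples:
        if str(p) == RDF_TYPE:
            types[s].add(o)
    return dict(types)


all_types = collect_types(toy_triples)
print(len(all_types["ex:gene1"]))  # 2: both types are kept
```

Because iterating an rdflib graph yields `(s, p, o)` tuples, passing `g` in place of `toy_triples` should work unchanged. How to pick a single node type from several is a modelling decision that this sketch leaves open.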

In [23]:
node_types = set(entity_types.values())
for t in node_types:
    print("Node type:", t)
    break

Node type: http://purl.obolibrary.org/obo/GO_0043168


In [26]:
predicates = set()

for s, p, o in g:
    predicates.add(p)

print("predicates:")
for p in predicates:
    print("-", p)
    break

predicates:
- http://www.w3.org/2002/07/owl#annotatedTarget
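
Printing a single predicate says little about which relations dominate the graph. A frequency count is a cheap way to find out before deciding which predicates should become edge types. The sketch below uses only the standard library and toy triples; `predicate_counts` and the sample data are hypothetical.

```python
from collections import Counter


def predicate_counts(triples):
    """Count how often each predicate occurs in an iterable of (s, p, o) triples."""
    return Counter(str(p) for _, p, _ in triples)


# Toy triples (hypothetical data); the rdflib graph `g` could be passed instead
toy_triples = [
    ("n1", "http://www.w3.org/2002/07/owl#annotatedTarget", "some text"),
    ("n2", "http://www.w3.org/2002/07/owl#annotatedTarget", "other text"),
    ("c1", "http://www.w3.org/2000/01/rdf-schema#subClassOf", "c2"),
]

counts = predicate_counts(toy_triples)
for predicate, n in counts.most_common():
    print(n, predicate)
```

Predicates that appear only in annotation axioms (such as `owl:annotatedTarget`) are likely candidates to drop before building a graph for HAN, though that choice depends on the task.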


In [40]:
# Keep only the local name of each type URI (the part after the last '/' or '#')
node_type_names = {str(t).split('/')[-1].split('#')[-1] for t in node_types}
# Rebind node_types: it now maps each type name to the list of entities of that type
node_types = {node_type_name: [] for node_type_name in node_type_names}

In [42]:
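for entity, t in entity_types.items():
    # Use the same local-name rule as for node_type_names, so that '/'-style URIs
    # such as http://purl.obolibrary.org/obo/GO_0043168 match their dictionary key
    t_str = str(t).split('/')[-1].split('#')[-1]
    if t_str in node_types:
        node_types[t_str].append(entity)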
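
The local name of a URI is extracted in several cells, so a single helper keeps the rule identical everywhere. The sketch below is one possible version; `local_name` is a hypothetical helper, not part of rdflib.

```python
def local_name(uri):
    """Return the part of a URI after its last '#' or '/', whichever comes later."""
    text = str(uri)
    cut = max(text.rfind("#"), text.rfind("/"))
    # rfind returns -1 when neither separator is present, so the whole string is kept
    return text[cut + 1:]


print(local_name("http://www.w3.org/2002/07/owl#Class"))
print(local_name("http://purl.obolibrary.org/obo/GO_0043168"))
```

Local names from different namespaces can collide in principle, so keeping the full URI as the key is the safer option when that matters.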


In [58]:
# node_types

In [44]:
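# Group (subject, object) pairs by the local name of their predicate
edges = {}

for s, p, o in g:
    # Split on both '/' and '#', since some namespaces (e.g. obo) end in '/'
    p_str = str(p).split('/')[-1].split('#')[-1]
    edges.setdefault(p_str, []).append((s, o))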

In [50]:
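# Collect every subject and object. Objects can also be literals
# (labels, definitions), not only entities.
all_nodes = set()

for s, p, o in g:
    all_nodes.add(s)
    all_nodes.add(o)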

In [51]:
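# Rebuild the entity -> type mapping from the rdf:type triples
entity_types = {}

for s, p, o in g.triples((None, RDF.type, None)):
    entity_types[s] = o

# Nodes that are never the subject of an rdf:type triple get a placeholder type
for n in all_nodes:
    if n not in entity_types:
        entity_types[n] = "unknown"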


In [54]:
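# Assign one integer id per node. The ids are global (shared across all types);
# only nodes listed in node_types receive one.
node_id = {}
counter = 0

for t, nodes in node_types.items():
    for n in nodes:
        node_id[n] = counter
        counter += 1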

In [56]:
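hetero_edges = {}

for rel, pairs in edges.items():
    # Keep only edges whose endpoints both have an integer id; literals and
    # untyped nodes are missing from node_id and would raise a KeyError
    hetero_edges[rel] = [
        (node_id[s], node_id[o]) for s, o in pairs
        if s in node_id and o in node_id
    ]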

In [59]:
# hetero_edges
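
A heterogeneous graph for PyTorch Geometric is usually organized by edge type, where each edge type is a `(source type, relation, destination type)` triple and node indices start at 0 within each node type. The sketch below shows that regrouping with the standard library only. It is an illustration under assumptions, not the notebook's final pipeline: `build_typed_edges` and the toy patient/protein data are hypothetical.

```python
from collections import defaultdict


def build_typed_edges(node_type_of, triples):
    """Group edges by (src_type, relation, dst_type) using per-type local indices.

    node_type_of: dict mapping each node to its type name
    triples: iterable of (subject, relation, object)
    """
    # type name -> {node: index within that type}
    local_index = defaultdict(dict)
    for node, type_name in node_type_of.items():
        local_index[type_name][node] = len(local_index[type_name])

    typed_edges = defaultdict(list)
    for s, rel, o in triples:
        if s not in node_type_of or o not in node_type_of:
            continue  # skip literals and nodes without a known type
        s_type, o_type = node_type_of[s], node_type_of[o]
        typed_edges[(s_type, rel, o_type)].append(
            (local_index[s_type][s], local_index[o_type][o])
        )
    return dict(local_index), dict(typed_edges)


# Toy data (hypothetical): two patients, one protein, one literal-valued triple
node_type_of = {"p1": "patient", "p2": "patient", "g1": "protein"}
triples = [
    ("p1", "expresses", "g1"),
    ("p2", "expresses", "g1"),
    ("p1", "label", "Patient 1"),  # the object is a literal, so the edge is skipped
]

local_index, typed_edges = build_typed_edges(node_type_of, triples)
print(typed_edges)
```

With PyTorch Geometric installed, each list of pairs could then be turned into a tensor of shape `[2, num_edges]`, for example with `torch.tensor(pairs, dtype=torch.long).t().contiguous()`, and assigned to `data[src_type, rel, dst_type].edge_index` on a `HeteroData` object. That step is left out here so the sketch has no extra dependencies.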