# Generating Graphs Tutorial

This tutorial is a template for generating graphs from edge list files.

In [1]:
!pip install tcrgnn # first install package



First, we show how to generate one graph from one edge file. Each edge file corresponds to one TCR graph.

We start by loading the PCA encoding of the AAindex. Each row of this matrix holds the principal-component scores of one amino acid's biophysicochemical properties (14 components, columns 0 to 13, in the output below). These scores are used to encode the node attributes in the graph.

In [2]:
from tcrgnn import load_pca_encoding
pca_encoding = load_pca_encoding("/scratch/project/tcr_ml/gnn_release/AAidx_PCA_2024.txt")
pca_encoding



Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13
A,-0.119337,4.766417,17.116718,2.295739,3.153466,3.641931,-2.857472,-0.510985,3.842872,-1.553891,0.760858,1.727918,2.143332,2.080794
L,17.827236,2.574882,11.241191,6.959181,-1.981687,2.180934,-0.098418,1.090348,0.559656,5.462775,7.368277,-1.595037,0.6615,0.083453
R,-8.188429,-15.255091,-1.522583,0.076711,-7.413312,-8.28445,-6.45411,-5.305068,-0.090436,-5.391594,5.400477,-1.503589,-2.789434,-0.039869
K,-11.832051,-13.376451,5.349272,1.943336,-5.254241,-2.259015,-3.690758,0.058316,-1.170751,9.12241,-4.000843,2.100945,-0.789188,3.080768
N,-15.229827,-0.08429,-3.488524,-6.198218,-1.827574,1.607748,4.176737,5.400174,-0.964957,4.007921,1.773937,-1.337021,-4.847988,-3.707884
M,16.174029,-4.974744,3.197186,-1.917002,6.432085,4.222104,-5.937735,4.27795,-0.505234,-3.175196,-3.868715,-6.915054,-2.011987,2.902042
D,-18.214603,-3.286554,-0.400137,-1.904439,4.706532,4.241386,9.203848,-2.939339,-1.991538,-2.311474,1.381515,-1.929361,-3.442823,5.299531
F,19.523669,-0.291606,-3.964076,1.787112,-2.300718,3.777835,1.369979,2.015799,-0.562367,-1.618873,3.587535,-1.685901,-0.189207,-3.736868
C,8.263782,8.355437,-5.918452,-14.414202,12.698391,-6.393775,-3.524156,-4.560058,-2.210811,4.12249,2.415289,0.575655,1.107817,-0.316855
P,-16.669545,11.992157,-16.756806,19.067663,6.3665,-0.264777,-4.297,0.456891,-0.831621,0.545217,0.335688,0.2799,-0.29678,0.280335
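To make the role of this matrix concrete, here is a minimal sketch of how a CDR3 sequence could be turned into node features by looking up one row per residue. It assumes a plain per-residue lookup, which may differ from what `tcrgnn` does internally. The helper `encode_sequence` and the dictionary `PCA_ROWS` are hypothetical names, and the three rows are copied from the table above but cut to the first three components for brevity.

```python
# Illustrative only: three rows copied from the table above, truncated to the
# first three components. The real matrix has 14 components per amino acid.
PCA_ROWS = {
    "A": [-0.119337, 4.766417, 17.116718],
    "C": [8.263782, 8.355437, -5.918452],
    "F": [19.523669, -0.291606, -3.964076],
}


def encode_sequence(sequence, table):
    """Return one feature row per residue, in sequence order (hypothetical helper)."""
    missing = sorted(set(sequence) - set(table))
    if missing:
        raise KeyError(f"no encoding for residues: {missing}")
    # Copy each row so that nodes sharing a residue do not share a list object.
    return [list(table[residue]) for residue in sequence]


features = encode_sequence("CAFF", PCA_ROWS)
print(len(features), len(features[0]))  # 4 3
```

With the full matrix, a sequence of N residues would give an N x 14 feature matrix under this assumption.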


## Loading one graph

We provide a function that generates a graph from an edgelist file. Pass the PCA-encoded matrix and the label to the function as well.

In [4]:
from tcrgnn import generate_graph_from_edge_file

graph = generate_graph_from_edge_file("/scratch/project/tcr_ml/gnn_release/sample_data/edgelists/edgefile/IID_H136574_T05_01_WT01_cdr3_1_unrelaxed_rank_001_alphafold2_ptm_model_2_seed_000_edge.txt", pca_encoding, label=1)
graph

Data(x=[13, 14], edge_index=[2, 42], y=[1], original_characters='CAIRTDMNTEAFF')
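In this summary, `x=[13, 14]` means 13 nodes (one per residue of `CAIRTDMNTEAFF`) with 14 features each, and `edge_index=[2, 42]` is a 2 x 42 array in which each column is one directed edge (source row, target row). The sketch below shows how undirected contacts can be expanded into that layout. It assumes every contact is stored in both directions, which is the usual PyTorch Geometric convention for undirected graphs but is not confirmed here for `tcrgnn`. The helper `to_edge_index` and the contact list are hypothetical.

```python
def to_edge_index(undirected_edges):
    """Expand (i, j) pairs into a 2 x E layout that stores both directions."""
    sources, targets = [], []
    for i, j in undirected_edges:
        sources += [i, j]
        targets += [j, i]
    return [sources, targets]


# Hypothetical contacts between residues 0-1, 1-2 and 0-2.
edge_index = to_edge_index([(0, 1), (1, 2), (0, 2)])
print(len(edge_index), len(edge_index[0]))  # 2 6
```

Under this assumption, three contacts become six columns, so 42 columns would correspond to 21 residue contacts.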

For training, we also provide a function that processes all the edgelist files in a directory in bulk. Typically, one directory of edgelist files represents one sample.

In [5]:
from tcrgnn import generate_graphs_from_edge_dir
graphs = generate_graphs_from_edge_dir("/scratch/project/tcr_ml/gnn_release/sample_data/edgelists/cancer/example", pca_encoding, label=1)
graphs

[Data(x=[16, 14], edge_index=[2, 58], y=[1], original_characters='CASSSTPSANTGELFF'),
 Data(x=[14, 14], edge_index=[2, 48], y=[1], original_characters='CATSRGHSNQPQHF'),
 Data(x=[13, 14], edge_index=[2, 42], y=[1], original_characters='CSVEGTRKDTQYF'),
 Data(x=[15, 14], edge_index=[2, 86], y=[1], original_characters='CSARSLLAGTGELFF'),
 Data(x=[14, 14], edge_index=[2, 52], y=[1], original_characters='CASSEFRGRNEQYF'),
 Data(x=[15, 14], edge_index=[2, 54], y=[1], original_characters='CASSLGLYSNQPQHF'),
 Data(x=[15, 14], edge_index=[2, 74], y=[1], original_characters='CASSVGPGGVTEAFF'),
 Data(x=[14, 14], edge_index=[2, 52], y=[1], original_characters='CSVVDDTAVGGYTF'),
 Data(x=[16, 14], edge_index=[2, 56], y=[1], original_characters='CASSSYGTENTDTQYF'),
 Data(x=[19, 14], edge_index=[2, 72], y=[1], original_characters='CASSQDPQVAGGSYNEQFF'),
 Data(x=[17, 14], edge_index=[2, 68], y=[1], original_characters='CASSYSPLLTGDYGYTF'),
 Data(x=[13, 14], edge_index=[2, 44], y=[1], original_characte
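As a rough picture of the directory-level step, the sketch below collects the edge files of one sample directory in a stable order. It is not the `tcrgnn` implementation. The helper `list_edge_files` is hypothetical, and the `*_edge.txt` pattern is an assumption based on the single file name used earlier in this tutorial. A temporary directory with empty placeholder files stands in for a real sample.

```python
import tempfile
from pathlib import Path


def list_edge_files(sample_dir, pattern="*_edge.txt"):
    """Return the edge files of one sample directory, sorted by path."""
    return sorted(Path(sample_dir).glob(pattern))


# Stand-in sample directory: two placeholder edge files and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("tcr_b_edge.txt", "tcr_a_edge.txt", "notes.md"):
        (Path(tmp) / name).write_text("")
    edge_files = [path.name for path in list_edge_files(tmp)]

print(edge_files)  # ['tcr_a_edge.txt', 'tcr_b_edge.txt']
```

Sorting makes the order of the resulting graphs reproducible between runs, which helps when comparing saved samples.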

In [6]:
# Save the graphs to disk as a PyTorch object with torch.save.
# Each sample is stored as one list of graphs.
import torch
torch.save(graphs, "sample_graphs.pt")