## Using GraphGym: An Exploration of PyG’s GraphGym
GraphGym is a powerful tool that allows users to easily design and evaluate Graph Neural Networks (GNNs). In this tutorial, we will walk through three use cases for GraphGym in three different domains. First, we will explore node-level prediction by determining the class a document belongs to, while learning the basics of GNNs in GraphGym. Then, we will learn how to load your own modules into a GraphGym GNN and how to evaluate those new modules on a link-prediction collaboration dataset. Finally, we will use GraphGym to find the best GNN for predicting a molecular property, using a graph-level GNN pipeline.

# Setup
Install the required packages and download GraphGym.

In [None]:
!pip install torch==1.13.1+cu116 -f https://download.pytorch.org/whl/torch_stable.html
!pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv torch_geometric -f https://data.pyg.org/whl/torch-1.13.0+cu116.html
!pip install pytorch_lightning
!git clone https://github.com/snap-stanford/GraphGym
%cd GraphGym
!pip install -r requirements.txt
!pip install -e .  # From latest version
%cd run

# Part 1: Node level prediction and implementing a GNN pipeline

## Data
For our first example, we will perform node classification on the PubMed dataset, proposed in “Revisiting Semi-Supervised Learning with Graph Embeddings”. Each node is a document whose features are a bag-of-words representation, and each edge is a citation link between two documents. The graph is undirected: if document i cites document j, the adjacency matrix has a 1 at both position ij and position ji. We then use a GNN to predict the class of each document. This dataset is built into PyG, so loading it is easier than loading a custom dataset, which makes it a good fit for this post, where the focus is on the GNN pipeline as a whole rather than on data loading.
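
To make the undirected convention concrete, here is a minimal sketch in plain Python. The three-document citation list and the `undirected_adjacency` helper are made up for illustration; they are not taken from PubMed or from PyG.

```python
# Toy citation list (hypothetical): each pair (i, j) means document i cites document j.
citations = [(0, 1), (2, 1)]
num_docs = 3

def undirected_adjacency(edges, n):
    """Build a symmetric 0/1 adjacency matrix from directed citation pairs."""
    adj = [[0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = 1  # i cites j
        adj[j][i] = 1  # mirrored entry makes the graph undirected
    return adj

adj = undirected_adjacency(citations, num_docs)
for row in adj:
    print(row)
```

Every citation ends up as two entries in the matrix, so message passing can flow in both directions along a citation link.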

## Defining Hyperparameters
The first step in getting your GNN pipeline running in GraphGym is to create a YAML file containing information about your dataset, GNN architecture, and hyperparameters for your initial experiments. For our PubMed dataset example, we will use a YAML file that looks like this, saved at configs/node.yaml:


```
out_dir: results
dataset:
  format: PyG
  name: PubMed
  task: node
  task_type: classification
train:
  batch_size: 128
  eval_period: 1
  ckpt_period: 100
  sampler: full_batch
model:
  type: gnn
  loss_fun: cross_entropy
gnn:
  layers_pre_mp: 1
  layers_mp: 3
  layers_post_mp: 1
  dim_inner: 128
  layer_type: sageconv
  stage_type: skipsum
  batchnorm: True
  act: prelu
  dropout: 0.1
  agg: mean
  normalize_adj: False
optim:
  optimizer: adam
  base_lr: 0.01
  max_epoch: 200
```



Walking through this YAML, we first see out_dir, the directory where the results of all our experiments will be saved. Next, we define some information about the dataset that we want to run our experiments on. In this case, we want to use the PubMed dataset, which is formatted as a PyG (PyTorch Geometric) dataset. Currently, GraphGym supports PyG and OGB formatted datasets, with dataset loaders pre-implemented for a variety of PyG and OGB datasets. Finally, because we set the dataset task to node and the task_type to classification, GraphGym will perform node classification.

Next, we define the hyperparameters we want to use for training in the train section of our YAML file. Specifically, in this case, we decide that we want to use a batch size of 128 and generate statistics on our model’s performance on the dev and test sets every epoch, along with checkpointing the model every 100 epochs. Finally, we will use the full batch for each epoch of training, instead of subsampling only part of our data in each epoch.

Then, we begin to define how our model will be built using the model and gnn sections of our YAML file. Note that while we define many hyperparameters in our specific example, in practice you don’t need to define any hyperparameters you don’t want to. For any hyperparameters that you leave unspecified, GraphGym will fill in default values. For more information about what default values GraphGym uses, see config.py in the GraphGym repository.
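
The idea of falling back to defaults can be sketched with plain dictionaries. This only illustrates the behavior described above: the `DEFAULTS` values and the `merge_config` helper are hypothetical, and GraphGym's real defaults and merging logic live in its own code (see config.py).

```python
import copy

# Hypothetical defaults, for illustration only (see config.py for the real ones).
DEFAULTS = {
    "gnn": {"layers_mp": 2, "dim_inner": 16, "act": "relu"},
    "optim": {"optimizer": "adam", "base_lr": 0.01},
}

def merge_config(defaults, overrides):
    """Return a copy of `defaults` with values from `overrides` applied recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged

# The user only specifies two hyperparameters; everything else keeps its default.
user_cfg = {"gnn": {"layers_mp": 3, "dim_inner": 128}}
cfg = merge_config(DEFAULTS, user_cfg)
print(cfg["gnn"])
print(cfg["optim"])
```

In this sketch the activation and the whole optim section come from the defaults, because the user's config never mentions them.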

For our example, we train the GNN with a cross-entropy loss (a standard loss function for classification tasks), as defined in the model section of the YAML file. In the gnn section, we use one neural network layer before message passing, three message-passing layers, and one neural network layer after message passing, with a hidden dimension of 128 inside our layers. We use the SAGEConv variety of GNN convolution layer, with skip-sum residual connections between layers. Next, we specify that we want batch normalization in our GNN, PReLU as our non-linear activation function, a dropout rate of 0.1, and mean aggregation for the aggregation step. Finally, we specify that we do not want to normalize the adjacency matrix of our graph.
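
As a rough conceptual sketch of the difference between plain stacking and a sum-style skip connection, consider the toy code below. The `layer` function is a hypothetical stand-in for a message-passing layer (it simply halves each feature), and none of this is GraphGym's actual implementation.

```python
def layer(h):
    """Stand-in for one message-passing layer (hypothetical: halves each feature)."""
    return [0.5 * x for x in h]

def stack_stage(h, num_layers):
    """Plain stacking: each layer only sees the previous layer's output."""
    for _ in range(num_layers):
        h = layer(h)
    return h

def skipsum_stage(h, num_layers):
    """Sum-style skip connection: add each layer's input back onto its output."""
    for _ in range(num_layers):
        h = [x + y for x, y in zip(h, layer(h))]
    return h

h0 = [1.0, 2.0]
print(stack_stage(h0, 3))    # [0.125, 0.25]
print(skipsum_stage(h0, 3))  # [3.375, 6.75]
```

With the skip connection, the original signal is carried forward through every layer instead of being repeatedly transformed away.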

Finally, in our YAML file, we define some hyperparameters that control how our GNN will be optimized. Specifically, we use the Adam optimizer with a learning rate of 0.01, and we run optimization for a maximum of 200 epochs.

## Running experiments

We will explore the different ways to run experiments in more depth later in this notebook, but for now, to test that we’ve run all the steps properly up to this point, we can run the following command. If all has been done correctly, you should see GraphGym train a model 3 different times with different random initializations for 200 epochs and save the results of those three runs into results/node. You should see separate logging information for each of the three runs stored separately, along with combined statistics about how the model performed across all three runs aggregated into a folder called agg (in the `run` directory we're currently using as our working directory).

In [None]:
!python main_pyg.py --cfg configs/node.yaml --repeat 3
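
The combined statistics across repeated runs can be pictured with a small sketch. The accuracy values below are placeholders rather than real results, and the use of the population standard deviation is an assumption made for this illustration.

```python
from statistics import mean, pstdev

# Placeholder validation accuracies per epoch for three seeds (made up, not real results).
runs = [
    [0.50, 0.60],  # seed 1
    [0.60, 0.80],  # seed 2
    [0.70, 1.00],  # seed 3
]

def aggregate(runs):
    """Return per-epoch (mean, std) across runs; population std is an assumption here."""
    stats = []
    for epoch_values in zip(*runs):
        stats.append((mean(epoch_values), pstdev(epoch_values)))
    return stats

for epoch, (avg, std) in enumerate(aggregate(runs)):
    print(f"epoch {epoch}: accuracy {avg:.3f} +/- {std:.3f}")
```

Reporting a mean together with a spread makes it easier to tell a real improvement from seed-to-seed noise.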

We can also run a full set of experiments (also explored in more depth later), by following this series of steps:
First, create a “grid” file to define how you will perturb your model hyperparameters during your set of experiments. For our example, we use the following grid file, saved at grids/node_grid.txt:
```
model.graph_pooling gp ['add','mean','max']
gnn.layers_pre_mp l_pre [1,2]
gnn.layers_mp l_mp [2,4,6]
gnn.layers_post_mp l_post [1,2]
gnn.stage_type stage ['stack','skipsum','skipconcat']
gnn.dim_inner dim [64,128,256]
optim.base_lr lr [0.001,0.01]
optim.max_epoch epoch [100,200,300]
```
In this grid file, we define which hyperparameters we want to perturb, along with the possible values we want those hyperparameters to be able to take on.
Then, we run:

In [None]:
!python configs_gen.py --config configs/node.yaml --grid grids/node_grid.txt --out_dir configs --sample --sample_num 20

This will generate a set of config files that correspond to the different ways that we perturbed the hyperparameters. If we don’t specify --sample, it will create a config file for every combination of the hyperparameter values specified in the grid file. However, since we do specify --sample with --sample_num 20, we will only generate 20 config files, which will be a random subsample of the total set of possible hyperparameter configurations.
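
To get a feel for why sampling helps, the sketch below parses the grid file shown earlier, counts the full set of combinations, and draws 20 of them at random. The `parse_grid` helper is hypothetical and only mimics the idea; the actual parsing and sampling in configs_gen.py may differ.

```python
import ast
import itertools
import random

GRID_TEXT = """\
model.graph_pooling gp ['add','mean','max']
gnn.layers_pre_mp l_pre [1,2]
gnn.layers_mp l_mp [2,4,6]
gnn.layers_post_mp l_post [1,2]
gnn.stage_type stage ['stack','skipsum','skipconcat']
gnn.dim_inner dim [64,128,256]
optim.base_lr lr [0.001,0.01]
optim.max_epoch epoch [100,200,300]
"""

def parse_grid(text):
    """Parse 'name alias values' rows into (name, alias, list_of_values) tuples."""
    rows = []
    for line in text.splitlines():
        line = line.split("#")[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        key, alias, values = line.split(" ", 2)
        rows.append((key, alias, ast.literal_eval(values)))
    return rows

grid = parse_grid(GRID_TEXT)
all_combos = list(itertools.product(*(values for _, _, values in grid)))
print(len(all_combos))  # 1944

random.seed(0)  # fixed seed so the illustration is repeatable
sampled = random.sample(all_combos, 20)
print(len(sampled))  # 20
```

Running every one of the 1944 combinations three times each would be expensive, which is why a random subsample of 20 is a practical starting point.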
Next, run:

In [None]:
!bash parallel.sh configs/node_grid_node_grid 3 1 main_pyg

This will run all the config files created by the previous command. In this case, because we specified 1 for the third parameter, the config files will be run in sequence, but if that number is increased you can run multiple experiments in parallel.
Finally, run:

In [None]:
!python agg_batch.py --dir results/node_grid_node_grid

This will aggregate all the results from the various hyperparameter configurations into a single set of CSVs, so that further analysis can be done on which hyperparameter configuration is best. We will explore this in greater detail later, including examples of how to visualize these CSVs and make good decisions about the configurations.

# Part 2: Edge level prediction and customizing modules

Now, let’s explore some more of GraphGym’s functionality. GraphGym allows us to define the dataset we are using, the training parameters, the model and GNN, and the optimizer. Additionally, we can run experiments across different datasets and different model parameters to compare them. GraphGym also creates edge-level training splits automatically, based on our dataset, when the dataset section specifies an edge-level (link prediction) task. After defining our dataset and model parameters, we can simply train our model using the prewritten run_single_pyg.sh script.
Now, navigate to the example_link.yaml file in the run/configs/pyg directory. This is where we can modify our model. Open the file, and an editor should appear on the right side of Colab. The YAML file contains a few subsections that define our experiment. To better understand these sections, review Part 1. In short, notice that we have a different style of GNN: the task is link prediction, we use a GCN convolution layer instead of the SAGE convolution, and the stage type is different (stack instead of skipsum). These differences matter for the GNN's performance on its specific task. We will now begin customizing these modules and evaluating different choices on our collaboration graph.


### Module modifications

From the run directory, we modify the files we want to change, and we can then run the experiments either individually or in batches.

First, we change dim_inner inside run/configs/pyg/example_link.yaml from 300 to 50, and max_epoch from 100 to 10. This lets the model train on Colab without purchasing a more powerful machine. The bottom of our .yaml file should now look like this:

```
gnn:
  layers_pre_mp: 1
  layers_mp: 2
  layers_post_mp: 1
  dim_inner: 50
  layer_type: gcnconv
  stage_type: stack
  batchnorm: True
  act: prelu
  dropout: 0.0
  agg: mean # This value can be modified! Try 'add' instead of 'mean'
  normalize_adj: False
optim:
  optimizer: adam
  base_lr: 0.01
  max_epoch: 10

```



Next, in run_batch_pyg.sh, we change CONFIG=example_node to CONFIG=example_link, and keep REPEAT=3 and MAX_JOBS=8. If this crashes your machine, consider lowering REPEAT or decreasing dim_inner.

The top of run_batch_pyg.sh should look like this:
```
CONFIG=example_link
GRID=example
REPEAT=3
MAX_JOBS=8
MAIN=main_pyg
```


In graphgym/contrib/act/example.py, add the following call. It registers a new activation function, "lrelu_05", which is a leaky ReLU with a negative slope of 0.5.
```
register_act('lrelu_05',
             nn.LeakyReLU(negative_slope=0.5, inplace=cfg.mem.inplace))
```
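
Conceptually, registering a module means storing it under a string key so that a config entry can refer to it by name. The sketch below illustrates that pattern in plain Python; `act_registry`, `register_activation`, and `leaky_relu` are hypothetical stand-ins, not GraphGym's own registry code.

```python
# Hypothetical stand-in for an activation registry (not GraphGym's implementation).
act_registry = {}

def register_activation(name, fn):
    """Store an activation under a string key so a config can refer to it by name."""
    if name in act_registry:
        raise KeyError(f"activation '{name}' is already registered")
    act_registry[name] = fn

def leaky_relu(negative_slope):
    """Return a scalar leaky ReLU with the given slope for negative inputs."""
    def fn(x):
        return x if x >= 0 else negative_slope * x
    return fn

register_activation("lrelu_05", leaky_relu(0.5))

# A config entry such as `act: lrelu_05` would then resolve to this lookup.
act = act_registry["lrelu_05"]
print(act(2.0), act(-2.0))  # 2.0 -1.0
```

The benefit of this pattern is that new modules can be tried out by changing a single string in a config or grid file.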


Finally, in run/grids/pyg/example.txt, comment out all the other gnn.* lines and add the activation functions we want to explore. The example.txt file should look like this:
```
# Format for each row: name in config.py; alias; range to search
# No spaces, except between these 3 fields
# Line breaks are used to union different grid search spaces
# Feel free to add '#' to add comments


gnn.act act ["swish","lrelu_03","lrelu_05"]
# gnn.layer_type layer ["exampleconv","gcnconv"]
# gnn.stage_type stage ['skipsum','skipconcat'] # If we wanted to experiment with other variables, we'd use this format!
# gnn.agg agg ['add','mean','max']
```

### Experiment
Now, we will compare these three activation functions in the experiment below! Run the following code block to begin evaluating our different models.
Note: Training three models three separate times each can be quite computationally expensive. If you are only using this Colab to learn, consider lowering REPEAT or dim_inner as described above.

In [None]:
!bash run_batch_pyg.sh 

### Results
To see our results, navigate to the run/results/example_link_grid_example/agg directory, which contains the aggregated experiment results. Specifically, we can look at the val_best.csv, which contains the best results from our validation set. 

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

val_best = pd.read_csv("/content/GraphGym/run/results/example_link_grid_example/agg/val_best.csv")
val_best.head()

### Plotting
Now, we will plot our results to make them easier to understand. In the next section we will use GraphGym's built-in visualizations, but for now we will make a simple bar plot.

In [None]:
bar_width = 0.25
fig, ax = plt.subplots(figsize=(8, 6))

# Bar heights and error bars, one entry per activation function
activation_funcs = val_best["act"]
loss = val_best["loss"]
loss_std = val_best["loss_std"]
accuracy = val_best["accuracy"]
acc_std = val_best["accuracy_std"]

# Bar positions on the x axis: each accuracy bar sits right next to its loss bar
br1 = np.arange(len(loss))
br2 = br1 + bar_width

# Make the plot
ax.bar(br1, loss, color='b', width=bar_width,
       edgecolor='grey', label='Loss', yerr=loss_std)
ax.bar(br2, accuracy, color='g', width=bar_width,
       edgecolor='grey', label='Accuracy', yerr=acc_std)

# Center one tick label under each pair of bars
ax.set_xlabel('Activation Function Experiment', fontweight='bold', fontsize=15)
ax.set_xticks(br1 + bar_width / 2)
ax.set_xticklabels(activation_funcs)

ax.legend()
plt.show()

While the difference in accuracy between our three activation functions is small, it is easy to see how GraphGym can help you explore and innovate in GNN design. Since GraphGym fixes all other hyperparameters and controls the computation budget, we know that the comparison is fair. This easy way to tune and compare otherwise identical GNNs is perfect for GNN researchers who want a simple way to test novel GNN ideas, without having to explicitly define and perfect other aspects of the model.

# Part 3: Graph-Level Prediction and Finding the Best GNN Design

Finally, let's discuss how GraphGym can help us find the best GNN design for a given problem.
To motivate this, we will use the "MoleculeNet HIV" dataset, which consists of 41,127 graphs each representing a molecule (where nodes are atoms and edges are chemical bonds between atoms). Node input features are 9-dimensional, containing atomic information such as the atomic number and formal charge, among others. The ultimate graph-level prediction task is to classify whether or not each molecule inhibits HIV. Because this dataset is so skewed (only 1.4% positive labels), ROC AUC is typically used as an evaluation metric instead of accuracy.
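
A small worked example shows why accuracy is misleading on skewed labels. The labels below are synthetic (14 positives out of 1000, mirroring the rate quoted above), and `accuracy` and `roc_auc` are minimal helper functions written for this illustration rather than library calls.

```python
def accuracy(labels, preds):
    """Fraction of predictions that match the labels."""
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)

def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic labels: 14 positives out of 1000 samples.
labels = [1] * 14 + [0] * 986
always_negative = [0] * 1000    # a "model" that never predicts the positive class
constant_scores = [0.0] * 1000  # the same model expressed as scores

print(accuracy(labels, always_negative))  # 0.986
print(roc_auc(labels, constant_scores))   # 0.5
```

A model that never finds a single active molecule still reaches 98.6% accuracy here, while its ROC AUC of 0.5 correctly signals that it is no better than chance.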

## The Base GNN
Let's follow along from the previous sections and re-enter the run directory within GraphGym. Navigate to the run/configs/pyg directory and open the example_graph.yaml file. This is where we can view the basic setup for our model. The YAML file contains a few subsections that define our experiment.



## The GNN Design Space
In order to explore multiple GNN designs at once, we'll need to open up a few different files. Within the run directory, open up run_batch_pyg.sh as well as grids/pyg/example.txt. After opening both, they should appear on the right-hand side of our Colab notebook.

1. Let's make a couple of updates to run_batch_pyg.sh for our purposes. First, we want to specify the proper configuration file, so we replace example_node with example_graph.
2. Second, if you want, you can change the values of REPEAT and MAX_JOBS. REPEAT specifies the number of random seeds for each experiment (e.g., with REPEAT=3, a single experiment is actually run three separate times, each with a different random seed). MAX_JOBS specifies the maximum number of experiments our system will run concurrently. For this example, we set REPEAT=1 for simplicity and keep MAX_JOBS the same.


Now, let's go ahead and open up grids/pyg/example.txt. This is where we can specify our experiments. We can define multiple values for our pre-processing layers, our MP layers, our post-processing layers, our hidden dimensions, our learning rates, our epochs, and more. We can also specify multiple stage types and aggregation functions. Note that the total number of experiments that will be run is equal to the number of unique combinations of these values. Feel free to explore different design spaces by modifying this file.

*Note: If you decide to create your own .txt file to specify your grid search space, rather than modify the example.txt file, make sure to make the necessary changes to run_batch_pyg.sh so that the program knows where to find your file.

Now, we can run our experiments with the following line:

In [None]:
!bash run_batch_pyg.sh #run our specified experiments

This will take some time, depending on our total number of experiments. The results of each experiment will be saved in the results/example_graph_grid_example directory, and, even more conveniently, the results will also be aggregated and saved in the results/example_graph_grid_example/agg directory. There, we can find CSV files such as val_best.csv, which records, for each of our experiments, the metrics from the epoch with the best validation performance.
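
Before plotting anything, a quick way to inspect an aggregated CSV is to pick the row with the highest metric. The sketch below does this with the standard library on an in-memory stand-in for val_best.csv. The numbers are placeholders rather than real results, the `best_row` helper is hypothetical, and the exact set of columns in your file depends on your grid.

```python
import csv
import io

# Placeholder CSV contents (made-up numbers, not real results).
SAMPLE_CSV = """l_mp,stage,auc,auc_std
2,stack,0.70,0.02
4,skipsum,0.75,0.01
6,skipconcat,0.72,0.03
"""

def best_row(csv_file, metric="auc"):
    """Return the row (as a dict) with the highest value in the `metric` column."""
    rows = list(csv.DictReader(csv_file))
    return max(rows, key=lambda r: float(r[metric]))

best = best_row(io.StringIO(SAMPLE_CSV))
print(best["l_mp"], best["stage"], best["auc"])  # 4 skipsum 0.75
```

To use it on real results, pass an opened file object for your own val_best.csv instead of the `io.StringIO` stand-in.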

## Visualization
In many cases, though, we'd ideally like to visualize the results from our experiments so that we can compare the different models. Outside of the run directory, in GraphGym/analysis, we can find a file titled example.ipynb that contains all of the relevant code to visualize our results! First, we can copy over the first cell from that file into our notebook and run it to define the relevant visualization functions:

In [None]:
# Example analysis for a batch of experiments
# We found the functionality below to be the most useful in practice
# It automatically provides an overview of the trade-offs for each design dimension
from IPython.display import display, HTML

display(HTML("<style>.container { width:95% !important; }</style>"))
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import copy
from scipy.stats import rankdata
from matplotlib.ticker import MaxNLocator

%matplotlib inline
sns.set(style='ticks',context='poster')
pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 500)
np.set_printoptions(precision=3, linewidth=200, suppress=True)


def list_exclude(a, b):
    return [item for item in a if item not in b]

def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
        
column_exclude_options = ['format','dataset', 'task', 'trans', 'feature', 'label',
            'epoch', 'loss', 'loss_std', 
            'params', 'time_iter', 'time_iter_std', 'accuracy', 'accuracy_std', 
            'precision', 'precision_std', 'recall', 'recall_std', 'f1', 'f1_std', 'auc', 'auc_std']

def name_mapping(name):
    # you can add additional name mappings for your customized configurations
    mapping = {'act': 'Activation', 'bn':'Batch Normalization', 'drop':'Dropout', 'agg':'Aggregation',
                'l_mp':'MP layers', 'l_pre':'Pre-process layers', 'l_post': 'Post-process layers', 'stage': 'Layer connectivity',
                'lr': 'Learning rate', 'batch':'Batch size', 'optim': 'Optimizer', 'epoch': 'Training epochs', 
                'direct': 'Direction', 'head':'Multi-task heads', 'l_final':'Att final', 'l_type':'layer_type',
               'l_finalbn': 'Final BN', 'task': 'Task', 'subgraph':'subgraph', 'margin':'margin',
               'order':'order', 'norm':'norm'}
    if name in mapping:
        return mapping[name]
    else:
        return name

def get_acc(df_pivot, name, ax, plot_type='performance', has_y=True, rank_resolution=0.001, verbose=False):
    accs_np = df_pivot.fillna(df_pivot.min()).values.round(4)
    options = df_pivot.columns.values

    ranks_raw = {'Model ID':[], 'Accuracy':[], 'Acc. Ranking':[], name_mapping(name):[]}
    
    for i,row in enumerate(accs_np):
        # (1) rank is ascending, so we negate the row
        rank_base = -row
        med = np.median(rank_base)
        for j in range(len(rank_base)):
            if abs(rank_base[j]-med) <= rank_resolution:
                rank_base[j] = med
        rank = rankdata(rank_base, method='min')
        for j in range(len(rank)):
            ranks_raw['Model ID'].append(i)
            ranks_raw['Accuracy'].append(accs_np[i,j])
            ranks_raw['Acc. Ranking'].append(rank[j])
            ranks_raw[name_mapping(name)].append(options[j])
    
    ranks_raw = pd.DataFrame(data=ranks_raw)     
    with sns.color_palette("muted"):
        if plot_type=='performance':
            splot = sns.violinplot(x=name_mapping(name), y="Accuracy",inner="box", data=ranks_raw, cut=0, ax=ax)
            ax.set_xlabel('',fontsize=48)
            if not has_y:
                ax.set_ylabel('',fontsize=48)
            else:
                ax.set_ylabel('AUC Dist.',fontsize=48)
        elif plot_type=='rank_bar':
            splot = sns.barplot(x=name_mapping(name), y="Acc. Ranking",data=ranks_raw, ax=ax)
            ax.set_ylim(bottom=1)
            ax.set_yticks([1,2])
            ax.set_xlabel('',fontsize=48)
            if not has_y:
                ax.set_ylabel('',fontsize=48)
            else:
                ax.set_ylabel('Rank Average',fontsize=48)
        elif plot_type=='rank_violin':
            sns.violinplot(x=name_mapping(name), y="Acc. Ranking",inner="box", data=ranks_raw, cut=0, ax=ax)
            ax.set_ylim(bottom=1)
            ax.yaxis.set_major_locator(MaxNLocator(integer=True))
            if not has_y:
                ax.set_ylabel('',fontsize=48)
            else:
                ax.set_ylabel('Rank Dist.',fontsize=48)
        ax.xaxis.label.set_size(48)
        ax.yaxis.label.set_size(48)
        for tick in ax.xaxis.get_major_ticks():
            tick.label1.set_fontsize(40)
        for tick in ax.yaxis.get_major_ticks():
            tick.label1.set_fontsize(40)

            
def plot_single(df, options_chunk, options, metric, rank_resolution):
    for names in options_chunk:
        col = 6
        row = 3
        f, axes = plt.subplots(nrows=row, ncols=col, figsize=(48, 14))
        for i,name in enumerate(names):
            name_others = copy.deepcopy(options)
            name_others.remove(name)
            df_pivot = pd.pivot_table(df, values=metric, index=name_others, columns=[name], aggfunc=np.mean)
            for j,plot_type in enumerate(['performance','rank_bar','rank_violin']):
                get_acc(df_pivot, name, axes[j, i], plot_type, has_y=True, rank_resolution=rank_resolution)
        plt.tight_layout()
        plt.subplots_adjust(wspace=0.5, hspace=0.2)
    #     f.savefig('figs/{}.png'.format(metric), dpi=150, bbox_inches='tight')
        plt.show()

def plot_analysis(fname, division='test', dataset=None, metric='accuracy', rank_resolution=0.001, filter=None, filter_rm=None):
    results_file_path = '../run/results/{}/agg/{}.csv'.format(fname, division)
    df = pd.read_csv(results_file_path)
    df = df.fillna(0)
    df['epoch'] += 1
    df.replace('skipconcat','skipcat',inplace=True)
    df.replace('add','sum',inplace=True)

    if filter is not None:
        for key, val in filter.items():
            if type(val) == list:
                df = df[df[key].isin(val)]
            else:
                df = df[df[key]==val]
    if filter_rm is not None:
        for key, val in filter_rm.items():
            if type(val) == list:
                df = df[~df[key].isin(val)]
            else:
                df = df[df[key]!=val]

    # create and filter design dimensions
    options_raw = list_exclude(list(df.columns), column_exclude_options)
    options = []
    for name in options_raw:
        column_temp = copy.deepcopy(options_raw)
        column_temp.remove(name)
        df_pivot = pd.pivot_table(df, values=metric, index=column_temp, columns=[name], aggfunc=np.mean)
        if len(df_pivot.columns)!=1:
            options.append(name)
    options_chunk = list(chunks(options, 6))
    print(division, dataset, options_chunk)
    
    if dataset is None:
        for dataset in df['dataset'].unique():
            df_dataset = df[df['dataset']==dataset]
            print('Dataset: {}'.format(dataset))
            plot_single(df_dataset, options_chunk, options, metric, rank_resolution)
    elif dataset=='all':
        plot_single(df, options_chunk, options, metric, rank_resolution)
    else:
        df_dataset = df[df['dataset']==dataset]
        print('Dataset: {}'.format(dataset))
        plot_single(df_dataset, options_chunk, options, metric, rank_resolution)

Then, we can run the following line to get our results:

In [None]:
# analysis for all datasets in the batch
experiment_name = 'example_graph_grid_example'
plot_analysis(experiment_name, division='val', dataset="all", metric='auc', rank_resolution=0)
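
The ranking step inside `get_acc` can be hard to read at a glance, so here is a dependency-free sketch of the same idea: negate the scores so that the best option gets rank 1, snap scores within `rank_resolution` of the median onto the median, and give tied scores the smallest shared rank. The `rank_options` helper is a hypothetical re-implementation for illustration and uses made-up scores; the notebook code itself relies on `scipy.stats.rankdata`.

```python
def rank_options(scores, rank_resolution=0.0):
    """Rank scores so the highest gets rank 1; tied scores share the smallest rank."""
    neg = [-s for s in scores]  # ranks ascend, so negate to rank high scores first
    ordered = sorted(neg)
    n = len(ordered)
    med = ordered[n // 2] if n % 2 else 0.5 * (ordered[n // 2 - 1] + ordered[n // 2])
    # Scores within `rank_resolution` of the median are treated as tied with it.
    neg = [med if abs(v - med) <= rank_resolution else v for v in neg]
    # "min" ranking: 1 + the number of strictly smaller values.
    return [1 + sum(other < v for other in neg) for v in neg]

print(rank_options([0.75, 0.80, 0.80]))            # [3, 1, 1]
print(rank_options([0.70, 0.7005, 0.80], 0.001))   # [2, 2, 1]
```

A larger `rank_resolution` treats near-identical scores as ties, which keeps tiny, noisy differences from being counted as wins.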