## Introduction

This Jupyter notebook reproduces the results of, and runs ablation experiments on, the paper "Semi-Supervised Classification with Graph Convolutional Networks" by Thomas N. Kipf and Max Welling. The paper was published in the Proceedings of the International Conference on Learning Representations (ICLR) and is available at https://arxiv.org/pdf/1609.02907.pdf.

The authors propose a semi-supervised classification algorithm based on graph convolutional networks (GCNs). Their original code is available at https://github.com/tkipf/gcn. In this notebook, we reuse most of that code with a few modifications.

This notebook is organized as follows:
1. We modified the code to run in compatibility mode with TensorFlow 2.12.0 and added options for exploring different ablations.
2. The original authors' code is placed under the `src/gcn` directory, also modified to be compatible with TensorFlow 2.12.0.
3. The "Methodology explanation and examples" section runs all of our ablations.

## Data

The data is included in the original GCN repository, so there is no need to download it separately. It can be found under the `gcn/data` directory. The datasets come from the article "Collective classification in network data" by Sen et al. (2008). Here is an overview of the datasets:

| Dataset  | Nodes  | Edges  | Features | Classes |
|----------|--------|--------|----------|---------|
| Citeseer | 3,327  | 4,732  | 3,703    | 6       |
| Cora     | 2,708  | 5,429  | 1,433    | 7       |
| Pubmed   | 19,717 | 44,338 | 500      | 3       |

## Reproducibility Summary

Running this notebook reproduces the GCN results on all three datasets. The "Reproduce of Original Model" section contains our reproduction of the original model, which achieves results very similar to those in the original paper. The "Methodology explanation and examples" section runs all of our ablations, and their results are reported in the "Results of Ablations" section.
- Activation functions: most alternatives (`tf.nn.leaky_relu`, `tf.nn.tanh`, `tf.nn.elu`) performed very similarly to the original `tf.nn.relu`, but the sigmoid activation yielded significantly worse results.
- Optimizers: Adam with a learning rate of 0.01 (the original setup) performed best. Among the alternatives, RMSProp came closest, while gradient descent and Adadelta fell well short at both learning rates tested (0.01 and 0.99).
- Number of layers: both removing a layer (1 layer) and adding a layer (3 layers) reduced accuracy relative to the original 2-layer model.


## Import Packages and Set Seeds

In [1]:
from gcn.inits import *
from gcn.utils import *
from gcn.metrics import *
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

seed = 123
np.random.seed(seed)
tf.set_random_seed(seed)

Instructions for updating:
non-resource variables are not supported in the long term


## Layers Class

### Helper Functions

In [2]:
# global unique layer ID dictionary for layer name assignment
_LAYER_UIDS = {}


def get_layer_uid(layer_name=''):
    """Helper function, assigns unique layer IDs."""
    if layer_name not in _LAYER_UIDS:
        _LAYER_UIDS[layer_name] = 1
        return 1
    else:
        _LAYER_UIDS[layer_name] += 1
        return _LAYER_UIDS[layer_name]


def sparse_dropout(x, keep_prob, noise_shape):
    """Dropout for sparse tensors."""
    random_tensor = keep_prob
    random_tensor += tf.random_uniform(noise_shape)
    dropout_mask = tf.cast(tf.floor(random_tensor), dtype=tf.bool)
    pre_out = tf.sparse_retain(x, dropout_mask)
    return pre_out * (1./keep_prob)


def dot(x, y, sparse=False):
    """Wrapper for tf.matmul (sparse vs dense)."""
    if sparse:
        res = tf.sparse_tensor_dense_matmul(x, y)
    else:
        res = tf.matmul(x, y)
    return res
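
The `sparse_dropout` helper above relies on a small trick: for a uniform draw `u` in [0, 1), `floor(keep_prob + u)` equals 1 with probability `keep_prob` and 0 otherwise. The following NumPy sketch illustrates only that mechanism. The name `sparse_dropout_np` is hypothetical, and unlike `tf.sparse_retain` it zeroes dropped entries instead of removing them from the sparse structure.

```python
import numpy as np

def sparse_dropout_np(values, keep_prob, rng):
    """Dropout on the non-zero values of a sparse matrix (illustrative sketch)."""
    # keep_prob + U[0, 1) lies in [keep_prob, 1 + keep_prob); its floor is 1
    # exactly when the draw is >= 1 - keep_prob, i.e. with probability keep_prob.
    mask = np.floor(keep_prob + rng.random(values.shape)).astype(bool)
    # Zero the dropped entries and rescale survivors by 1 / keep_prob so the
    # expected value of each entry is unchanged.
    return np.where(mask, values / keep_prob, 0.0), mask

rng = np.random.default_rng(0)
values = np.ones(100_000)
dropped, mask = sparse_dropout_np(values, keep_prob=0.5, rng=rng)
print(mask.mean())     # fraction of entries kept, close to keep_prob
print(dropped.mean())  # close to the original mean of 1.0
```

The rescaling by `1./keep_prob` is what lets the same weights be used at evaluation time, when the dropout placeholder defaults to 0.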

### Layer Class

In [3]:
class Layer(object):
    """Base layer class. Defines basic API for all layer objects.
    Implementation inspired by keras (http://keras.io).

    # Properties
        name: String, defines the variable scope of the layer.
        logging: Boolean, switches Tensorflow histogram logging on/off

    # Methods
        _call(inputs): Defines computation graph of layer
            (i.e. takes input, returns output)
        __call__(inputs): Wrapper for _call()
        _log_vars(): Log all variables
    """

    def __init__(self, **kwargs):
        allowed_kwargs = {'name', 'logging'}
        for kwarg in kwargs.keys():
            assert kwarg in allowed_kwargs, 'Invalid keyword argument: ' + kwarg
        name = kwargs.get('name')
        if not name:
            layer = self.__class__.__name__.lower()
            name = layer + '_' + str(get_layer_uid(layer))
        self.name = name
        self.vars = {}
        logging = kwargs.get('logging', False)
        self.logging = logging
        self.sparse_inputs = False

    def _call(self, inputs):
        return inputs

    def __call__(self, inputs):
        with tf.name_scope(self.name):
            if self.logging and not self.sparse_inputs:
                tf.summary.histogram(self.name + '/inputs', inputs)
            outputs = self._call(inputs)
            if self.logging:
                tf.summary.histogram(self.name + '/outputs', outputs)
            return outputs

    def _log_vars(self):
        for var in self.vars:
            tf.summary.histogram(self.name + '/vars/' + var, self.vars[var])

### Graph Convolutional Layer Class

In [4]:
class GraphConvolution(Layer):
    """Graph convolution layer."""
    def __init__(self, input_dim, output_dim, placeholders, dropout=0.,
                 sparse_inputs=False, act=tf.nn.relu, bias=False,
                 featureless=False, **kwargs):
        super(GraphConvolution, self).__init__(**kwargs)

        if dropout:
            self.dropout = placeholders['dropout']
        else:
            self.dropout = 0.

        self.act = act
        self.support = placeholders['support']
        self.sparse_inputs = sparse_inputs
        self.featureless = featureless
        self.bias = bias

        # helper variable for sparse dropout
        self.num_features_nonzero = placeholders['num_features_nonzero']

        with tf.variable_scope(self.name + '_vars'):
            for i in range(len(self.support)):
                self.vars['weights_' + str(i)] = glorot([input_dim, output_dim],
                                                        name='weights_' + str(i))
            if self.bias:
                self.vars['bias'] = zeros([output_dim], name='bias')

        if self.logging:
            self._log_vars()

    def _call(self, inputs):
        x = inputs

        # dropout
        if self.sparse_inputs:
            x = sparse_dropout(x, 1-self.dropout, self.num_features_nonzero)
        else:
            x = tf.nn.dropout(x, 1-self.dropout)

        # convolve
        supports = list()
        for i in range(len(self.support)):
            if not self.featureless:
                pre_sup = dot(x, self.vars['weights_' + str(i)],
                              sparse=self.sparse_inputs)
            else:
                pre_sup = self.vars['weights_' + str(i)]
            support = dot(self.support[i], pre_sup, sparse=True)
            supports.append(support)
        output = tf.add_n(supports)

        # bias
        if self.bias:
            output += self.vars['bias']

        return self.act(output)
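
Ignoring dropout and bias, the `_call` method above computes `act(sum_i support_i @ (x @ W_i))`. The dense NumPy sketch below shows that computation on a toy graph. It assumes the single support matrix is the renormalized adjacency D^{-1/2} (A + I) D^{-1/2} described in the paper, which is what `preprocess_adj` is presumed to return. The names `normalize_adj_np` and `graph_conv_np` are hypothetical.

```python
import numpy as np

def normalize_adj_np(adj):
    """Symmetrically normalize A + I as D^{-1/2} (A + I) D^{-1/2}."""
    adj_tilde = adj + np.eye(adj.shape[0])  # add self-loops
    deg = adj_tilde.sum(axis=1)             # degrees, >= 1 thanks to self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return adj_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def graph_conv_np(x, supports, weights, act=lambda z: np.maximum(z, 0.0)):
    """One graph convolution: act(sum_i support_i @ (x @ W_i))."""
    out = sum(s @ (x @ w) for s, w in zip(supports, weights))
    return act(out)

# Toy path graph 0 - 1 - 2 with 2 input features and 4 output units.
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
x = np.array([[1., 0.], [0., 1.], [1., 1.]])
w = np.random.default_rng(0).normal(size=(2, 4))

a_hat = normalize_adj_np(adj)
h = graph_conv_np(x, [a_hat], [w])
print(h.shape)  # (3, 4)
```

Multiplying `x @ w` first and then applying the support keeps the intermediate result small (nodes by output units), which matches the order used in the layer.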

## Models

In [6]:
class Model(object):
    def __init__(self, **kwargs):
        # print("new model, seed=", tf.get_default_graph().seed)
        allowed_kwargs = {'name', 'logging'}
        for kwarg in kwargs.keys():
            assert kwarg in allowed_kwargs, 'Invalid keyword argument: ' + kwarg
        name = kwargs.get('name')
        if not name:
            name = self.__class__.__name__.lower()
        self.name = name

        logging = kwargs.get('logging', False)
        self.logging = logging

        self.vars = {}
        self.placeholders = {}

        self.layers = []
        self.activations = []

        self.inputs = None
        self.outputs = None

        self.loss = 0
        self.accuracy = 0
        self.optimizer = None
        self.opt_op = None

    def _build(self):
        raise NotImplementedError

    def build(self):
        """ Wrapper for _build() """
        with tf.variable_scope(self.name):
            self._build()

        # Build sequential layer model
        self.activations.append(self.inputs)
        for layer in self.layers:
            hidden = layer(self.activations[-1])
            self.activations.append(hidden)
        self.outputs = self.activations[-1]

        # Store model variables for easy access
        variables = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope=self.name)
        self.vars = {var.name: var for var in variables}

        # Build metrics
        self._loss()
        self._accuracy()

        self.opt_op = self.optimizer.minimize(self.loss)

    def predict(self):
        pass

    def _loss(self):
        raise NotImplementedError

    def _accuracy(self):
        raise NotImplementedError

    def save(self, sess=None):
        if not sess:
            raise AttributeError("TensorFlow session not provided.")
        saver = tf.train.Saver(self.vars)
        save_path = saver.save(sess, "tmp/%s.ckpt" % self.name)
        print("Model saved in file: %s" % save_path)

    def load(self, sess=None):
        if not sess:
            raise AttributeError("TensorFlow session not provided.")
        saver = tf.train.Saver(self.vars)
        save_path = "tmp/%s.ckpt" % self.name
        saver.restore(sess, save_path)
        print("Model restored from file: %s" % save_path)

### GCN Model

In [7]:
class GCN(Model):
    def __init__(self, placeholders, input_dim, **kwargs):
        super(GCN, self).__init__(**kwargs)

        # print("new GCN")

        self.inputs = placeholders['features']
        self.input_dim = input_dim
        # self.input_dim = self.inputs.get_shape().as_list()[1]  # To be supported in future Tensorflow versions
        self.output_dim = placeholders['labels'].get_shape().as_list()[1]
        self.placeholders = placeholders

        self.optimizer = flags_optimizer(learning_rate=flags_learning_rate)

        self.build()

    def _loss(self):
        # Weight decay loss
        for var in self.layers[0].vars.values():
            self.loss += flags_weight_decay * tf.nn.l2_loss(var)

        # Cross entropy error
        self.loss += masked_softmax_cross_entropy(self.outputs, self.placeholders['labels'],
                                                  self.placeholders['labels_mask'])

    def _accuracy(self):
        self.accuracy = masked_accuracy(self.outputs, self.placeholders['labels'],
                                        self.placeholders['labels_mask'])

    def _build(self):

        self.layers=[]

        if flags_layers == 2:
            # Paper layer configuration
            self.layers.append(GraphConvolution(input_dim=self.input_dim,
                                                output_dim=flags_hidden1,
                                                placeholders=self.placeholders,
                                                act=flags_act_func,
                                                dropout=True,
                                                sparse_inputs=True,
                                                logging=self.logging))

            self.layers.append(GraphConvolution(input_dim=flags_hidden1,
                                                output_dim=self.output_dim,
                                                placeholders=self.placeholders,
                                                act=lambda x: x,
                                                dropout=True,
                                                logging=self.logging))
        elif flags_layers == 1:
            # Single layer configuration
            self.layers.append(GraphConvolution(input_dim=self.input_dim,
                                                output_dim=self.output_dim,
                                                placeholders=self.placeholders,
                                                act=flags_act_func,
                                                dropout=True,
                                                sparse_inputs=True,
                                                logging=self.logging))
        elif flags_layers == 3:
            # Triple layer configuration
            self.layers.append(GraphConvolution(input_dim=self.input_dim,
                                                output_dim=64,
                                                placeholders=self.placeholders,
                                                act=flags_act_func,
                                                dropout=True,
                                                sparse_inputs=True,
                                                logging=self.logging))

            self.layers.append(GraphConvolution(input_dim=64,
                                                output_dim=flags_hidden1,
                                                placeholders=self.placeholders,
                                                act=flags_act_func,
                                                dropout=True,
                                                logging=self.logging))

            self.layers.append(GraphConvolution(input_dim=flags_hidden1,
                                                output_dim=self.output_dim,
                                                placeholders=self.placeholders,
                                                act=lambda x: x,
                                                dropout=True,
                                                logging=self.logging))

    def predict(self):
        return tf.nn.softmax(self.outputs)
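
The loss and accuracy above are computed only on nodes selected by `labels_mask`, which is what makes the training semi-supervised. The actual helpers live in `gcn/metrics.py`; the NumPy sketch below assumes they average over the masked nodes only, and the function names ending in `_np` are hypothetical.

```python
import numpy as np

def masked_softmax_cross_entropy_np(logits, labels, mask):
    """Mean softmax cross-entropy over the nodes selected by `mask`."""
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_node = -(labels * log_probs).sum(axis=1)
    mask = mask.astype(float)
    return (per_node * mask).sum() / mask.sum()

def masked_accuracy_np(logits, labels, mask):
    """Fraction of masked nodes whose arg-max prediction matches the label."""
    correct = (logits.argmax(axis=1) == labels.argmax(axis=1)).astype(float)
    mask = mask.astype(float)
    return (correct * mask).sum() / mask.sum()

# Three nodes, two classes; the third node is unlabeled and must be ignored.
logits = np.array([[2.0, 0.0], [0.0, 2.0], [5.0, -5.0]])
labels = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
mask = np.array([True, True, False])

loss = masked_softmax_cross_entropy_np(logits, labels, mask)
acc = masked_accuracy_np(logits, labels, mask)
print(loss, acc)
```

In `GCN._loss`, this masked term is added to an L2 penalty that is applied to the variables of the first layer only.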

## Train

In [8]:
from __future__ import division
from __future__ import print_function

import time


### Training Parameters

In [9]:
# Default settings
flags_dataset = 'cora'  # Dataset string: 'cora', 'citeseer', 'pubmed'
flags_model = 'gcn'  # Model string: 'gcn', 'gcn_cheby', 'dense'
flags_learning_rate = 0.01  # Initial learning rate
flags_epochs = 200  # Number of epochs to train
flags_hidden1 = 16  # Number of units in hidden layer 1
flags_dropout = 0.5  # Dropout rate (1 - keep probability)
flags_weight_decay = 5e-4  # Weight for L2 loss on embedding matrix
flags_early_stopping = 10  # Tolerance for early stopping (# of epochs)
flags_max_degree = 3  # Maximum Chebyshev polynomial degree
flags_act_func = tf.nn.relu  # Activation function: tf.nn.relu, tf.nn.leaky_relu, tf.nn.sigmoid, tf.nn.tanh, tf.nn.elu
flags_optimizer = tf.train.AdamOptimizer  # Optimizer: tf.train.AdamOptimizer, tf.train.GradientDescentOptimizer, tf.train.AdadeltaOptimizer, tf.train.RMSPropOptimizer
flags_layers = 2  # Number of layers: 1, 2, 3

### Functions to Train Model

The following functions train and evaluate the model. They are adapted from the `train.py` script provided by the original authors, modified to fit into the Jupyter notebook.

In [10]:
# Define model evaluation function
def evaluate(features, support, labels, mask, placeholders, sess, model):
    t_test = time.time()
    feed_dict_val = construct_feed_dict(features, support, labels, mask, placeholders)
    outs_val = sess.run([model.loss, model.accuracy], feed_dict=feed_dict_val)
    return outs_val[0], outs_val[1], (time.time() - t_test)

def train(adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask):

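    # Reset the module-level layer-ID counter in place. A plain assignment
    # here would only create a local variable and leave the global untouched.
    _LAYER_UIDS.clear()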
    
    # Some preprocessing
    features = preprocess_features(features)
    support = [preprocess_adj(adj)]
    num_supports = 1

    # Define placeholders
    placeholders = {
        'support': [tf.sparse_placeholder(tf.float32) for _ in range(num_supports)],
        'features': tf.sparse_placeholder(tf.float32, shape=tf.constant(features[2], dtype=tf.int64)),
        'labels': tf.placeholder(tf.float32, shape=(None, y_train.shape[1])),
        'labels_mask': tf.placeholder(tf.int32),
        'dropout': tf.placeholder_with_default(0., shape=()),
        'num_features_nonzero': tf.placeholder(tf.int32)  # helper variable for sparse dropout
    }

    # Create model
    model = GCN(placeholders, input_dim=features[2][1], logging=True)

    # Initialize session
    sess = tf.Session()
    # Init variables
    sess.run(tf.global_variables_initializer())

    cost_val = []

    t_begin = time.time()

    # Train model
    for epoch in range(flags_epochs):

        t = time.time()
        # Construct feed dictionary
        feed_dict = construct_feed_dict(features, support, y_train, train_mask, placeholders)
        feed_dict.update({placeholders['dropout']: flags_dropout})

        # Training step
        outs = sess.run([model.opt_op, model.loss, model.accuracy], feed_dict=feed_dict)

        # Validation
        cost, acc, duration = evaluate(features, support, y_val, val_mask, placeholders, sess, model)
        cost_val.append(cost)

        # Print results
        # print("Epoch:", '%04d' % (epoch + 1), "train_loss=", "{:.5f}".format(outs[1]),
        #     "train_acc=", "{:.5f}".format(outs[2]), "val_loss=", "{:.5f}".format(cost),
        #     "val_acc=", "{:.5f}".format(acc), "time=", "{:.5f}".format(time.time() - t))
        if epoch % 5 == 0:
            print(".", end="")

        if epoch > flags_early_stopping and cost_val[-1] > np.mean(cost_val[-(flags_early_stopping+1):-1]):
            print("Early stopping...")
            break

    print("")

    # print("total train time {:.5f}".format(time.time() - t_begin))
    duration = time.time() - t_begin

    # Testing
    test_cost, test_acc, test_duration = evaluate(features, support, y_test, test_mask, placeholders, sess, model)
    sess.close()
    return test_cost, test_acc, duration
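
The early-stopping rule inside `train()` stops as soon as the latest validation loss exceeds the mean of the preceding `flags_early_stopping` validation losses. The standalone sketch below restates that rule; `should_stop` and `patience` are hypothetical names, and `epoch` is 0-indexed as in the training loop.

```python
def should_stop(cost_val, epoch, patience=10):
    """Return True when the newest validation loss exceeds the mean of
    the `patience` losses recorded just before it."""
    if epoch <= patience:
        return False  # not enough history yet
    window = cost_val[-(patience + 1):-1]  # the `patience` losses before the newest
    return cost_val[-1] > sum(window) / len(window)

# Loss falls for five epochs, then jumps: stop at epoch index 5 with patience 3.
rising = [1.0, 0.9, 0.8, 0.7, 0.6, 0.9]
falling = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]
print(should_stop(rising, epoch=5, patience=3))   # True
print(should_stop(falling, epoch=5, patience=3))  # False
```

Because the comparison is against a moving average and not the best loss seen so far, a single noisy epoch can trigger the stop, which may explain runs that end after only a few epochs.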

## Reproduce of Original Model

In [11]:
def resetAllRandomSeeds():
    seed = 123
    np.random.seed(seed)
    tf.reset_default_graph() # reset session
    tf.set_random_seed(seed)
    tf.config.set_soft_device_placement(True)

def loadDataAndTrain():
    adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask = load_data(flags_dataset)
    return train(adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask)

In [12]:
dataset_list = ['cora', 'citeseer', 'pubmed']

for flags_dataset in dataset_list:
    resetAllRandomSeeds()
    test_cost, test_acc, test_duration = loadDataAndTrain()
    print("[{}] Test set results: cost={cost:.5f}, accuracy={accuracy:.5f}, time={time:.5f}".format(
        flags_dataset, cost=test_cost, accuracy=test_acc, time=test_duration))


Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.





........................................
[cora] Test set results: cost=1.01651, accuracy=0.81700, time=2.48694




........................................
[citeseer] Test set results: cost=1.30973, accuracy=0.70600, time=2.82150
...............................Early stopping...

[pubmed] Test set results: cost=0.72828, accuracy=0.79400, time=9.65042


## Methodology explanation and examples

Here we loop over the three datasets and train the model on each with different optimizers (and learning rates), activation functions, and numbers of layers. Each ablation varies one setting at a time and resets it to its default afterwards; all other parameters match the original paper. Each run calls the `train()` function defined above, which builds the GCN model, trains it on the dataset, and returns the test cost, test accuracy, and training time. The test accuracies are reported in the "Results of Ablations" section.

In [13]:
dataset_list = ['cora', 'citeseer', 'pubmed']
optimizer_list = [tf.train.AdamOptimizer, tf.train.GradientDescentOptimizer, tf.train.AdadeltaOptimizer, tf.train.RMSPropOptimizer]
activation_list = [tf.nn.relu, tf.nn.leaky_relu, tf.nn.sigmoid, tf.nn.tanh, tf.nn.elu]
lr_list = [0.01, 0.99]
layers_list = [2, 1, 3]
result = {}

In [14]:

""" RESET GLOBALS """
_LAYER_UIDS = {}

for flags_dataset in dataset_list:
    print("=========== {} begin ===========".format(flags_dataset))
    if flags_dataset not in result.keys():
        result[flags_dataset] = {}
    print("+ [{}] activation function trial begin".format(flags_dataset))
    result[flags_dataset]['activation'] = {}
    for flags_act_func in activation_list:
        # adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask = load_data(flags_dataset)
        # test_cost, test_acc, test_duration = train(adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask)
        resetAllRandomSeeds()
        test_cost, test_acc, test_duration = loadDataAndTrain()
        print("+ + [{}][{}] Test set results: cost={cost:.5f}, accuracy={accuracy:.5f}, time={time:.5f}".format(
            flags_dataset, flags_act_func.__name__, cost=test_cost, accuracy=test_acc, time=test_duration))
        result[flags_dataset]['activation'][flags_act_func.__name__] = {"cost": test_cost, "accuracy": test_acc, "time": test_duration}
    # reset activation function
    flags_act_func = tf.nn.relu

    
    print("+ [{}] optimizer trial begin".format(flags_dataset))
    result[flags_dataset]['optimizer'] = {}
    for flags_optimizer in optimizer_list:
        result[flags_dataset]['optimizer'][flags_optimizer.__name__] = []
        for flags_learning_rate in lr_list:
            resetAllRandomSeeds()
            test_cost, test_acc, test_duration = loadDataAndTrain()
            print("+ + [{}][{}][{lr}] Test set results: cost={cost:.5f}, accuracy={accuracy:.5f}, time={time:.5f}".format(
                flags_dataset, flags_optimizer.__name__, cost=test_cost, accuracy=test_acc, time=test_duration, lr=flags_learning_rate))
            result[flags_dataset]['optimizer'][flags_optimizer.__name__].append({"cost": test_cost, "accuracy": test_acc, "time": test_duration})
    # reset optimizer
    flags_optimizer = tf.train.AdamOptimizer
    flags_learning_rate = 0.01
    
    print("+ [{}] layers trial begin".format(flags_dataset))
    result[flags_dataset]['layers'] = {}
    for flags_layers in layers_list:
        resetAllRandomSeeds()
        test_cost, test_acc, test_duration = loadDataAndTrain()
        print("+ + [{}][{}] Test set results: cost={cost:.5f}, accuracy={accuracy:.5f}, time={time:.5f}".format(
            flags_dataset, flags_layers, cost=test_cost, accuracy=test_acc, time=test_duration))
        result[flags_dataset]['layers'][flags_layers] = {"cost": test_cost, "accuracy": test_acc, "time": test_duration}
    # reset number of layers
    flags_layers = 2

    print("=========== {} end ===========".format(flags_dataset))

+ [cora] activation function trial begin
........................................
+ + [cora][relu] Test set results: cost=1.01649, accuracy=0.81700, time=1.77342
........................................
+ + [cora][leaky_relu] Test set results: cost=1.01414, accuracy=0.81300, time=1.73895
...Early stopping...

+ + [cora][sigmoid] Test set results: cost=1.95441, accuracy=0.09300, time=0.18706
........................................
+ + [cora][tanh] Test set results: cost=0.97253, accuracy=0.81600, time=1.83278
........................................
+ + [cora][elu] Test set results: cost=0.97768, accuracy=0.81700, time=1.80212
+ [cora] optimizer trial begin
........................................
+ + [cora][AdamOptimizer][0.01] Test set results: cost=1.01651, accuracy=0.81700, time=2.03768
......Early stopping...

+ + [cora][AdamOptimizer][0.99] Test set results: cost=2.62066, accuracy=0.71800, time=0.31936
........................................
+ + [cora][GradientDescentOptimizer][

## Results of Ablations

The cell below prints every result collected in the `result` dictionary during the ablation runs: for each dataset and each ablation setting, it lists the test cost, test accuracy, and training time.

In [15]:
for dataset in result.keys():
    print(dataset)
    for ablation in result[dataset]:
        print("+", ablation)
        if ablation == 'optimizer':
            for opt in result[dataset][ablation]:
                print("+", "+", opt)
                for lr_idx in range(len(result[dataset][ablation][opt])):
                    print("+", "+", "+", "lr={}".format(lr_list[lr_idx]))
                    for key, val in result[dataset][ablation][opt][lr_idx].items():
                        print("+", "+", "+", "+", key, val)
        if ablation == 'activation':
            for act in result[dataset][ablation]:
                print("+", "+", act)
                for key, val in result[dataset][ablation][act].items():
                    print("+", "+", "+", key, val)
        if ablation == "layers":
            for layer in result[dataset][ablation]:
                print("+", "+", layer)
                for key, val in result[dataset][ablation][layer].items():
                    print("+", "+", "+", key, val)

cora
+ activation
+ + relu
+ + + cost 1.0164862
+ + + accuracy 0.8169999
+ + + time 1.7734153270721436
+ + leaky_relu
+ + + cost 1.0141354
+ + + accuracy 0.813
+ + + time 1.738950490951538
+ + sigmoid
+ + + cost 1.9544058
+ + + accuracy 0.092999995
+ + + time 0.1870582103729248
+ + tanh
+ + + cost 0.97252643
+ + + accuracy 0.816
+ + + time 1.8327789306640625
+ + elu
+ + + cost 0.97768116
+ + + accuracy 0.81700003
+ + + time 1.8021161556243896
+ optimizer
+ + AdamOptimizer
+ + + lr=0.01
+ + + + cost 1.0165086
+ + + + accuracy 0.8169999
+ + + + time 2.0376811027526855
+ + + lr=0.99
+ + + + cost 2.6206574
+ + + + accuracy 0.718
+ + + + time 0.31935954093933105
+ + GradientDescentOptimizer
+ + + lr=0.01
+ + + + cost 1.9530557
+ + + + accuracy 0.144
+ + + + time 1.898348093032837
+ + + lr=0.99
+ + + + cost 1.9027857
+ + + + accuracy 0.44399998
+ + + + time 2.1450817584991455
+ + AdadeltaOptimizer
+ + + lr=0.01
+ + + + cost 1.9531313
+ + + + accuracy 0.145
+ + + + time 2.458294630050659
+ + 

The tables below summarize the test accuracies from the results above:

Exploration with different activation functions:

| Activation Function | Cora  | Citeseer | PubMed |
|---------------------|-------|----------|--------|
| ReLU                | 0.817 | 0.706    | 0.794  |
| Sigmoid             | 0.093 | 0.077    | 0.218  |
| Tanh                | 0.816 | 0.704    | 0.793  |
| Leaky ReLU          | 0.813 | 0.704    | 0.789  |
| ELU                 | 0.817 | 0.701    | 0.791  |

Exploration with different optimizers and learning rates:

| Optimizer (LR)          | Cora  | Citeseer | PubMed |
|-------------------------|-------|----------|--------|
| Adam (0.01)             | 0.817 | 0.706    | 0.794  |
| Adam (0.99)             | 0.718 | 0.679    | 0.666  |
| Adadelta (0.01)         | 0.145 | 0.268    | 0.327  |
| Adadelta (0.99)         | 0.388 | 0.433    | 0.608  |
| RMSProp (0.01)          | 0.760 | 0.660    | 0.774  |
| RMSProp (0.99)          | 0.729 | 0.697    | 0.765  |
| Gradient Descent (0.01) | 0.144 | 0.269    | 0.328  |
| Gradient Descent (0.99) | 0.444 | 0.445    | 0.547  |

Exploration with different number of layers:

| Number of Layers      | Cora  | Citeseer | PubMed |
|-----------------------|-------|----------|--------|
| 2 (original setup)    | 0.817 | 0.706    | 0.794  |
| 1 (2nd layer removed) | 0.742 | 0.652    | 0.724  |
| 3 (added ReLU layer)  | 0.796 | 0.670    | 0.747  |
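
Summary tables like these could be generated from the nested `result` dictionary instead of being transcribed by hand. The sketch below is one possible approach, not part of the original notebook: `accuracy_table` is a hypothetical helper, and it handles only ablations whose entries are single dictionaries (`activation` and `layers`), not the per-learning-rate lists stored for `optimizer`. The demo values are two Cora accuracies copied from the printed output above.

```python
def accuracy_table(result, ablation, datasets):
    """Render one ablation from a nested result dict as a Markdown table."""
    settings = list(result[datasets[0]][ablation])
    lines = ["| Setting | " + " | ".join(datasets) + " |",
             "|" + "---|" * (len(datasets) + 1)]
    for setting in settings:
        # Round each accuracy to three decimals, one cell per dataset.
        cells = ["{:.3f}".format(result[d][ablation][setting]["accuracy"])
                 for d in datasets]
        lines.append("| {} | {} |".format(setting, " | ".join(cells)))
    return "\n".join(lines)

demo = {"cora": {"activation": {"relu": {"accuracy": 0.8169999},
                                "tanh": {"accuracy": 0.816}}}}
table = accuracy_table(demo, "activation", ["cora"])
print(table)
```

Formatting with `{:.3f}` rounds to the nearest value, so an accuracy stored as 0.8169999 is reported as 0.817.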


## References

1. Kipf, T. N., & Welling, M. (2017). Semi-supervised classification with graph convolutional networks. In Proceedings of the International Conference on Learning Representations (ICLR).
2. Kipf, T. (n.d.). gcn: Implementation of Graph Convolutional Networks in TensorFlow. GitHub. https://github.com/tkipf/gcn
3. Sen, P., Namata, G., Bilgic, M., Getoor, L., Gallagher, B., & Eliassi-Rad, T. (2008). Collective classification in network data. AI Magazine, 29(3), 93.