# GCNN

In this exercise, we are going to implement the graph convolutional neural network (GCNN) proposed by **Kipf and Welling** that we have seen during the lectures.

To validate that our network is correct, we are going to train it on a common benchmark for GCNNs, **ENZYMES**. This dataset is composed of 600 graphs representing the 3D structure of different enzymes (proteins). The task is to classify those graphs into 6 different classes of enzymes.

First, we are going to load the different dependencies:

In [None]:
#Numpy
import numpy as np
#Library used to load the data
import h5py

#!pip install -q tensorflow-gpu==2.0.0-beta1
#Colab only: select TensorFlow 2.x. The magic must be alone on its line,
#since a trailing comment is parsed as part of its argument.
try:
  %tensorflow_version 2.x
except Exception:
  pass

import tensorflow as tf
print(tf.__version__)

2.4.0


Then, we will download the processed graphs from the **ENZYMES** dataset.

In [None]:
!wget https://www.dropbox.com/s/4j9xxjxx64yh2ns/Enzymes.zip
!unzip Enzymes.zip


Then, we will load the file and split the graphs into training and validation sets.

In [None]:
# Load data
dataset = h5py.File("Enzymes.hdf5", "r")
graph_labels = dataset['graph_labels'][:]
node_list = dataset['node_list'][:, :]
node_adj = dataset['node_adj'][:, :, :]
print("Graphs:", graph_labels.shape[0])

# Divide for training.
np.random.seed(0)
num_graphs = graph_labels.shape[0]
train_indices = np.random.choice(num_graphs, (num_graphs//5)*4, replace=False)
# Validation graphs are all the indices that were not drawn for training.
val_indices = np.setdiff1d(np.arange(num_graphs), train_indices)

print("Training:", train_indices.shape[0])
print("Validation:", val_indices.shape[0])

Graphs: 600
Training: 480
Validation: 120


At this point, we should have our dataset ready and we can start creating our network. As you can see, the network architecture is already defined and you only need to implement the function *graph_conv*. 

This function receives as input the matrices *H* (features) and *A* (adj_mat), and the number of output features (num_features), and you will need to implement the message passing algorithm of **Kipf and Welling** described in the lectures. Here are some tips:

*   To create the matrix *A_hat* (the adjacency matrix with self-loops added) you can use the function *tf.eye*, which creates an identity matrix or a batch of identity matrices.
*   To create the matrix *D_hat* (the diagonal degree matrix of *A_hat*) you can use the function *tf.linalg.diag*, which creates a batch of diagonal matrices from a batch of diagonal values.
*   To apply the learned weight matrix you can simply use *tf.keras.layers.Dense*.
*   To multiply the matrices you can use the function *tf.matmul*.
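
Before writing the TensorFlow version, it can help to see the propagation rule $H' = \mathrm{ReLU}(\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} H W)$ applied to one small graph. The following is a minimal NumPy sketch, not the exercise solution: the function name `gcn_propagate`, the three-node path graph, and the identity weight matrix are all assumptions chosen for illustration, and there is no batch dimension or bias term.

```python
import numpy as np

def gcn_propagate(adj, feats, weights):
    """One Kipf-Welling propagation step for a single graph (illustrative sketch)."""
    # A_hat = A + I adds a self-loop to every node.
    a_hat = adj + np.eye(adj.shape[0])
    # Node degrees in A_hat (row sums); the self-loops keep them >= 1.
    deg = a_hat.sum(axis=1)
    # D_hat^(-1/2) as a diagonal matrix.
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    # Symmetrically normalized adjacency matrix.
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt
    # Aggregate neighbor features, apply the weights, then ReLU.
    return np.maximum(a_norm @ feats @ weights, 0.0)

# Hypothetical toy graph: a path 0 - 1 - 2 with two features per node.
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
feats = np.array([[1., 0.],
                  [0., 1.],
                  [1., 1.]])
# Identity weights, so the output shows only the effect of the normalization.
weights = np.eye(2)

out = gcn_propagate(adj, feats, weights)
print(out)
```

Each output row is a degree-weighted mix of the node's own features and those of its neighbors. In the batched TensorFlow setting the same steps apply per graph, with *tf.linalg.diag* and *tf.matmul* operating on the leading batch dimension.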



In [None]:
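def graph_conv(features, adj_mat, num_features):
  ################ TODO
  #Create the a_hat matrix: adjacency plus self-loops.
  #The identity is sized from adj_mat instead of a hard-coded 126
  #and broadcasts over the batch dimension.
  a_hat = adj_mat + tf.eye(adj_mat.shape[-1])
  #Create the d_hat^(-1/2) matrix from the node degrees (row sums of a_hat).
  #Self-loops keep every degree >= 1, so there is no division by zero.
  degrees = tf.reduce_sum(a_hat, axis=2)
  d_root_inv = tf.linalg.diag(1.0 / tf.sqrt(degrees))
  #Message passing algorithm: D_hat^(-1/2) A_hat D_hat^(-1/2) H W
  a_norm = tf.matmul(tf.matmul(d_root_inv, a_hat), d_root_inv)
  hw = tf.keras.layers.Dense(num_features)(features)
  output = tf.matmul(a_norm, hw)
  x = tf.keras.layers.Activation('relu')(output)
  ################ END TODO
  
  #Return result
  return x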
  
#Input data
inputs_node_types = tf.keras.Input(shape=(node_list.shape[1],),
                                   name='batch_node_types',
                                   dtype=tf.int32)
inputs_adj_mat = tf.keras.Input(shape=(node_list.shape[1], node_list.shape[1]), 
                                   name='batch_adj_mat',
                                   dtype=tf.float32)

#Node embedding
max_node_id = np.amax(node_list)+1
node_features = tf.keras.layers.Embedding(max_node_id, 8)(inputs_node_types)

#GCNN layers
node_features = graph_conv(node_features, inputs_adj_mat, 128)
node_features = graph_conv(node_features, inputs_adj_mat, 256)
node_features = graph_conv(node_features, inputs_adj_mat, 512)

#Max pooling
graph_features = tf.reduce_max(node_features, axis=1)

#Last MLP
graph_features = tf.keras.layers.Dropout(0.2)(graph_features)
graph_features = tf.keras.layers.Dense(256)(graph_features)
graph_features = tf.keras.layers.BatchNormalization()(graph_features)
graph_features = tf.keras.layers.Activation('relu')(graph_features)

#Final probabilities
max_graph_label = np.amax(graph_labels)+1
outputs = tf.keras.layers.Dense(max_graph_label, activation='softmax')\
  (graph_features)

Lastly, we will train our network. It should reach between 60% and 67% accuracy on the validation set.

In [None]:
#Create the model.
model = tf.keras.Model(inputs=[inputs_node_types, inputs_adj_mat], 
                       outputs=outputs, name='shapenet_model')

#Compile the model.
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=tf.keras.optimizers.SGD(
                  learning_rate=0.0001, 
                  momentum=0.98),
              metrics=['accuracy'])

#Fit the model to the data.
model.fit({'batch_node_types': node_list[train_indices, :],
           'batch_adj_mat': node_adj[train_indices, :, :]}, 
            graph_labels[train_indices],
          batch_size=32,
          epochs=3000,
          validation_data=(
            {'batch_node_types': node_list[val_indices, :],
            'batch_adj_mat': node_adj[val_indices, :, :]}, 
            graph_labels[val_indices]))


<tensorflow.python.keras.callbacks.History at 0x7f38501036d8>
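
The `History` object returned by `model.fit` stores the per-epoch metrics in its `history` attribute, a dictionary of lists. Since validation accuracy can fluctuate over 3000 epochs, it can be useful to look up the best epoch rather than only the last one. The helper below is a small sketch: `best_validation_epoch` is a hypothetical name, the numbers in `example_history` are made up for illustration, and the key `val_accuracy` assumes the model was compiled with `metrics=['accuracy']` as above.

```python
def best_validation_epoch(history_dict, key="val_accuracy"):
    """Return (1-based epoch, value) of the highest entry stored under `key`."""
    values = history_dict[key]
    # Index of the maximum value; ties resolve to the earliest epoch.
    best_index = max(range(len(values)), key=lambda i: values[i])
    return best_index + 1, values[best_index]

# Made-up numbers for illustration only. With the model above, one would pass
# the `.history` attribute of the object returned by model.fit(...).
example_history = {
    "accuracy": [0.20, 0.35, 0.50, 0.58],
    "val_accuracy": [0.18, 0.30, 0.41, 0.39],
}

epoch, acc = best_validation_epoch(example_history)
print(f"best validation accuracy {acc:.2f} at epoch {epoch}")
```

To use it with the training run, assign the result of `model.fit(...)` to a variable and pass that variable's `.history` dictionary to the function.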