Notebook based on _Hands-On Graph Neural Networks Using Python_, by Maxime Labonne.

# Ch 5. Including Node Features with Vanilla Neural Networks

This notebook compares the performance of a vanilla neural network (VNN, aka multilayer perceptron) with a graph neural network (GNN) on the Cora and Facebook Page-Page datasets.

The VNN treats the data as being essentially tabular, ignoring network topology (edges). The GNN accounts for network topology.

## 5.1 Shared code

In [1]:
import pandas as pd
import torch
from torch.nn import Linear
import torch.nn.functional as F
from torch_geometric.utils import to_dense_adj

In [2]:
# Simple accuracy measure -- not intended for production use.
# The NNs below use this to report training, validation and test accuracy.
# (The optimizer itself only sees the loss.)
def accuracy(y_pred, y_true):
    return torch.sum(y_pred == y_true) / len(y_true)
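
As a quick sanity check, the function can be applied to a pair of small hand-made tensors. The labels below are made up for illustration, and the function is repeated so the snippet runs on its own.

```python
import torch

def accuracy(y_pred, y_true):
    # Same definition as in the cell above: fraction of matching labels.
    return torch.sum(y_pred == y_true) / len(y_true)

# Toy labels (made up): three of the four predictions match.
y_pred = torch.tensor([0, 1, 2, 2])
y_true = torch.tensor([0, 1, 1, 2])

acc = accuracy(y_pred, y_true)
print(acc)  # a 0-dim float tensor holding 0.75
```

Note that the result is a tensor rather than a Python float, which is why the final results dictionary at the end of the notebook contains `tensor(...)` values.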

In [3]:
# We'll use this to present the overall results at the end
accuracy_results = {}

### 5.1.1 Data

In [4]:
def summarize_dataset(dataset):
    data = dataset[0]
    
    print(f'Dataset: {dataset}')
    print('--------------------')
    print(f'# graphs   : {len(dataset)}')
    print(f'# nodes    : {data.x.shape[0]}')
    print(f'# features : {dataset.num_features}')
    print(f'# classes  : {dataset.num_classes}')
    print('')
    print(f'Graph:')
    print('--------------------')
    print(f'Edges are directed       : {data.is_directed()}')
    print(f'Graph has isolated nodes : {data.has_isolated_nodes()}')
    print(f'Graph has loops          : {data.has_self_loops()}')

In [5]:
def to_dense_adjacency_matrix(data):
    # https://pytorch-geometric.readthedocs.io/en/latest/_modules/torch_geometric/utils/_to_dense_adj.html
    adjacency = to_dense_adj(data.edge_index)[0]
    
    # Identity matrix for self loops so the central node is also considered.
    # https://pytorch.org/docs/stable/generated/torch.eye.html#torch.eye
    adjacency += torch.eye(len(adjacency))

    return adjacency
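
One detail worth checking: adding the identity assumes the graph has no self-loops yet. Where a node already has one, its diagonal entry becomes 2 rather than 1. The following is a minimal NumPy sketch on a made-up edge list. The helper `dense_adjacency` is a hypothetical stand-in for `to_dense_adj`, not part of PyTorch Geometric, and the `fill_diagonal` variant is just one possible alternative.

```python
import numpy as np

def dense_adjacency(edge_index, num_nodes):
    # Hypothetical NumPy stand-in for to_dense_adj: one entry per (source, target) pair.
    adjacency = np.zeros((num_nodes, num_nodes))
    src, dst = edge_index
    adjacency[src, dst] = 1.0
    return adjacency

# Made-up undirected graph 0 - 1 - 2, where node 2 already has a self-loop.
edge_index = np.array([
    [0, 1, 1, 2, 2],
    [1, 0, 2, 1, 2],
])
A = dense_adjacency(edge_index, num_nodes=3)

# Adding the identity, as in the cell above, counts node 2's self-loop twice.
A_plus_I = A + np.eye(3)
print(np.diag(A_plus_I))  # [1. 1. 2.]

# One alternative: force the diagonal to exactly 1.
A_fixed = A.copy()
np.fill_diagonal(A_fixed, 1.0)
print(np.diag(A_fixed))  # [1. 1. 1.]
```

Whether the doubled weight matters is a judgment call. It is relevant here because the Facebook Page-Page summary later in the notebook reports that the graph has loops.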

### 5.1.2 Vanilla neural network (multilayer perceptron)

Consider a basic neural network layer given by the transformation

$$
h_i = x_i W^T
$$

where $x_i$ is the input vector for node $i$, $W$ is the weight matrix, and $h_i$ is the embedding for node $i$.

Equivalently, in matrix notation we have

$$
H = X W^T
$$

where $X$ is the input matrix (rows are nodes, columns are features).

In [6]:
import numpy as np

# Input matrix (3 nodes, 4 features)
X = np.array([
    [3.1, 4.1, 5.9, 2.6],
    [5.3, 5.8, 9.7, 9.1],
    [2.3, 4.5, 6.7, 2.1],
])

# Weight matrix (use transpose to map 4d feature space to 7d embedding space)
W = np.array([
    [1.0, 0.0, 3.0, 4.9],
    [4.2, 0.2, 8.4, 1.6],
    [3.8, 2.0, 3.0, 0.2],
    [0.6, 0.0, 0.0, 4.0],
    [3.8, 0.0, 3.0, 0.2],
    [0.2, 1.2, 3.4, 2.7],
    [3.8, 2.0, 2.0, 1.2],
])

# Embedding data (3 nodes, 7d embedding space)
H = X @ np.transpose(W)
H

array([[ 33.54,  67.56,  38.2 ,  12.26,  30.  ,  32.62,  34.9 ],
       [ 78.99, 119.46,  62.66,  39.58,  51.06,  65.57,  62.06],
       [ 32.69,  70.2 ,  38.26,   9.78,  29.26,  34.31,  33.66]])

In [7]:
class MLP(torch.nn.Module):
    def __init__(self, dim_in, dim_h, dim_out):
        """
        Parameters:
        - dim_in: Input layer dimension
        - dim_h: Hidden layer dimension
        - dim_out: Output layer dimension
        """
        
        super().__init__()
        
        # Linear applies an affine transformation to the incoming data.
        # (I.e., it supports a bias term.)
        # https://pytorch.org/docs/stable/generated/torch.nn.Linear.html
        self.linear1 = Linear(dim_in, dim_h)
        self.linear2 = Linear(dim_h, dim_out)

    def forward(self, x: torch.Tensor):
        
        # Layer 1 linear + nonlinear
        x = self.linear1(x)
        x = torch.relu(x)

        # Layer 2 linear + log-softmax (logits -> log probabilities)
        x = self.linear2(x)
        # https://pytorch.org/docs/stable/generated/torch.nn.functional.log_softmax.html
        return F.log_softmax(x, dim=1)

    def fit(self, data, epochs):
        # Note that we apply the training and validation masks inside this method.
        # Calling code should simply pass in the entire dataset, rather than pre-applying
        # the masks.

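        # Computes cross-entropy loss between input logits and target.
        # Note that forward() already returns log probabilities. CrossEntropyLoss applies
        # log-softmax again internally, which leaves log probabilities unchanged.
        # https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#crossentropyloss
        criterion = torch.nn.CrossEntropyLoss()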

        # Updates weights during backpropagation.
        # https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Adam
        # https://arxiv.org/abs/1412.6980
        # https://pytorch.org/docs/stable/generated/torch.optim.Adam.html#torch.optim.Adam
        optimizer = torch.optim.Adam(self.parameters(), lr=0.01, weight_decay=5e-4)

        # Puts the module in training mode
        self.train()
        
        for epoch in range(epochs+1):

            # https://pytorch.org/docs/stable/generated/torch.optim.Adam.html#torch.optim.Adam.zero_grad
            optimizer.zero_grad()
            
            # This is syntactic sugar for self.forward(data.x), since torch.nn.Module overrides __call__.
            out = self(data.x)

            # CrossEntropyLoss is also a module, so this is a call to criterion.forward().
            train_loss = criterion(out[data.train_mask], data.y[data.train_mask])
            
            # Computes gradients (∂Loss/∂params).
            # Uses reverse-mode autodiff (efficient for deep networks).
            train_loss.backward()

            # Updates weights using the computed gradients.
            optimizer.step()

            if epoch % 20 == 0:
                train_acc = accuracy(out[data.train_mask].argmax(dim=1), data.y[data.train_mask])
                val_loss = criterion(out[data.val_mask], data.y[data.val_mask])
                val_acc = accuracy(out[data.val_mask].argmax(dim=1), data.y[data.val_mask])
                print(f'Epoch {epoch:>3} | Train Loss: {train_loss:.2f} | Train Acc: {train_acc*100:>5.2f}% | Val Loss: {val_loss:.2f} | Val Acc: {val_acc*100:.2f}%')

    def test(self, data):
        
        # Puts the module in evaluation mode
        self.eval()

        # This is syntactic sugar for self.forward(data.x), since torch.nn.Module overrides __call__.
        out = self(data.x)

        # argmax on the log probabilities gives us a classifier
        acc = accuracy(out.argmax(dim=1)[data.test_mask], data.y[data.test_mask])

        return acc
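
A note on the loss: `forward` returns log probabilities, while `CrossEntropyLoss` applies a log-softmax of its own. This does not change the loss value, because the log-softmax of a vector of log probabilities is that same vector. The NumPy sketch below illustrates the point with made-up logits; the helper names `log_softmax` and `nll` are illustrative and not PyTorch functions.

```python
import numpy as np

def log_softmax(z):
    # Row-wise log-softmax, shifted by the row max for numerical stability.
    shifted = z - z.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def nll(log_probs, targets):
    # Mean negative log-likelihood of the target classes.
    return -log_probs[np.arange(len(targets)), targets].mean()

# Made-up logits for 2 samples and 3 classes, with made-up targets.
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
targets = np.array([0, 2])

once = log_softmax(logits)
twice = log_softmax(once)

# Applying log-softmax a second time is a no-op, so the loss is the same.
print(np.allclose(once, twice))                             # True
print(np.isclose(nll(once, targets), nll(twice, targets)))  # True
```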

### 5.1.3 Graph neural network

Notice that in the vanilla neural network above, there's no reference to topological information at all: the embedding $h_i$ depends only on the input $x_i$. In graphs where homophily holds (i.e., nearby nodes are similar), we're leaving information on the table by neglecting to incorporate topology.

We can include topological information by considering node neighborhoods. The **graph linear layer** is given by

$$
h_i = \sum_{j \in \mathscr{N}_i} {x_j W^T}
$$

where $i$ is a node, $\mathscr{N}_i$ is the set of neighbors of node $i$, and $x_j$ is the input vector for node $j$.

We would like to write this in matrix notation, but we need a way to incorporate information about neighbors. We can do this by incorporating the adjacency matrix $A$, but we need to make one adjustment. Notice in the equation above, the embedding $h_i$ doesn't depend on the input $x_i$ at all. Intuitively that doesn't make sense. We can fix this by using an adjusted adjacency matrix $\tilde{A} = A + I$, which amounts to adding self-loops to every node in the graph. Then we have

$$
H = \tilde{A}^T X W^T
$$

Notice that there is no normalization here -- a node with lots of neighbors will generally have a larger sum than a node with a small number of neighbors. For now we simply note the fact. We'll return to it in the next chapter, which is on graph convolutional networks (GCNs).
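
The equivalence between the per-node sum and the matrix form can be checked numerically. The sketch below uses a made-up 3-node path graph and random features and weights (all values are assumptions for illustration).

```python
import numpy as np

# Made-up 3-node path graph 0 - 1 - 2, with self-loops added (A + I).
A_tilde = np.array([
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
    [0.0, 1.0, 1.0],
])
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))   # 3 nodes, 4 features
W = rng.normal(size=(2, 4))   # maps 4 features to a 2d embedding

# Per-node form: sum x_j W^T over node i's neighborhood (self included).
H_loop = np.zeros((3, 2))
for i in range(3):
    for j in range(3):
        if A_tilde[j, i] != 0:   # column i of A_tilde, i.e. row i of its transpose
            H_loop[i] += X[j] @ W.T

# Matrix form.
H_matrix = A_tilde.T @ X @ W.T

print(np.allclose(H_loop, H_matrix))  # True
```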

In [8]:
# Adjacency matrix. Gives us node neighborhoods, but not the current node.
A = np.array([
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
])

# Adjusted adjacency matrix. Includes current node by adding self-loops.
A_adj = A + np.identity(3)
A_adj

array([[1., 1., 0.],
       [1., 1., 1.],
       [0., 1., 1.]])

In [9]:
# Input matrix (3 nodes, 4 features)
X = np.array([
    [3.1, 4.1, 5.9, 2.6],
    [5.3, 5.8, 9.7, 9.1],
    [2.3, 4.5, 6.7, 2.1],
])

# Row i of the result is the sum of the feature vectors of node i and its neighbors.
np.transpose(A_adj) @ X

array([[ 8.4,  9.9, 15.6, 11.7],
       [10.7, 14.4, 22.3, 13.8],
       [ 7.6, 10.3, 16.4, 11.2]])

In [10]:
# Weight matrix (use transpose to map 4d feature space to 7d embedding space)
W = np.array([
    [1.0, 0.0, 3.0, 4.9],
    [4.2, 0.2, 8.4, 1.6],
    [3.8, 2.0, 3.0, 0.2],
    [0.6, 0.0, 0.0, 4.0],
    [3.8, 0.0, 3.0, 0.2],
    [0.2, 1.2, 3.4, 2.7],
    [3.8, 2.0, 2.0, 1.2],
])

# Embedding data (3 nodes, 7d embedding space)
H = np.transpose(A_adj) @ X @ np.transpose(W)
H

array([[112.53, 187.02, 100.86,  51.84,  81.06,  98.19,  96.96],
       [145.22, 257.22, 139.12,  61.62, 110.32, 132.5 , 130.62],
       [111.68, 189.66, 100.92,  49.36,  80.32,  99.88,  95.72]])
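
The lack of normalization noted earlier is visible in these numbers: node 1 has the largest neighborhood and also the largest embedding values. As a rough illustration, the sketch below divides each sum by the neighborhood size (mean aggregation). This is only one simple option, assumed here for illustration, and not necessarily the normalization that graph convolutional networks use.

```python
import numpy as np

A_adj = np.array([
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
    [0.0, 1.0, 1.0],
])
X = np.array([
    [3.1, 4.1, 5.9, 2.6],
    [5.3, 5.8, 9.7, 9.1],
    [2.3, 4.5, 6.7, 2.1],
])

# Number of terms in each node's sum: column sums of A_adj (rows of its transpose).
degree = A_adj.sum(axis=0)
print(degree)  # [2. 3. 2.]

# Sum aggregation (as above) versus mean aggregation (sum divided by degree).
summed = A_adj.T @ X
averaged = summed / degree[:, None]

print(summed.sum(axis=1))    # row totals grow with neighborhood size
print(averaged.sum(axis=1))  # row totals are on a comparable scale
```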

In [11]:
class VanillaGNN(torch.nn.Module):
    def __init__(self, dim_in, dim_h, dim_out):
        super().__init__()
        self.gnn1 = VanillaGNNLayer(dim_in, dim_h)
        self.gnn2 = VanillaGNNLayer(dim_h, dim_out)

    def forward(self, x, adjacency):
        h = self.gnn1(x, adjacency)
        h = torch.relu(h)
        h = self.gnn2(h, adjacency)
        return F.log_softmax(h, dim=1)

    def fit(self, data, adjacency, epochs):
        criterion = torch.nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(self.parameters(), lr=0.01, weight_decay=5e-4)
        self.train()
        for epoch in range(epochs+1):
            optimizer.zero_grad()
            out = self(data.x, adjacency)
            train_loss = criterion(out[data.train_mask], data.y[data.train_mask])
            train_loss.backward()
            optimizer.step()
            if epoch % 20 == 0:
                train_acc = accuracy(out[data.train_mask].argmax(dim=1), data.y[data.train_mask])
                val_loss = criterion(out[data.val_mask], data.y[data.val_mask])
                val_acc = accuracy(out[data.val_mask].argmax(dim=1), data.y[data.val_mask])
                print(f'Epoch {epoch:>3} | Train Loss: {train_loss:.3f} | Train Acc: {train_acc*100:>5.2f}% | Val Loss: {val_loss:.2f} | Val Acc: {val_acc*100:.2f}%')

    def test(self, data, adjacency):
        self.eval()
        out = self(data.x, adjacency)
        acc = accuracy(out.argmax(dim=1)[data.test_mask], data.y[data.test_mask])
        return acc


class VanillaGNNLayer(torch.nn.Module):
    def __init__(self, dim_in, dim_out):
        super().__init__()
        self.linear = Linear(dim_in, dim_out, bias=False)

    def forward(self, x, adjacency):
        x = self.linear(x)

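        # Neighborhood aggregation: multiply the adjacency matrix by the transformed features.
        # torch.sparse.mm is documented for a sparse first operand; the adjacency matrices
        # passed in this notebook are dense (built with to_dense_adj).
        # https://pytorch.org/docs/stable/generated/torch.sparse.mm.html
        x = torch.sparse.mm(adjacency, x)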
        return x
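
Note that the layer applies the linear transform first and aggregates afterwards, and that it multiplies by the adjacency matrix without transposing it. Matrix multiplication is associative, and the adjacency matrix of an undirected graph is symmetric, so the result agrees with $\tilde{A}^T X W^T$. The sketch below checks this in NumPy with made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up symmetric adjacency with self-loops (undirected 3-node path graph).
A_tilde = np.array([
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
    [0.0, 1.0, 1.0],
])
X = rng.normal(size=(3, 4))   # 3 nodes, 4 features
W = rng.normal(size=(2, 4))   # 4 features -> 2d embedding

# Order used by the layer: linear transform first, then neighborhood sum.
H_layer = A_tilde @ (X @ W.T)

# Order used in the formula: neighborhood sum first, then linear transform.
H_formula = (A_tilde.T @ X) @ W.T

print(np.allclose(H_layer, H_formula))  # True (A_tilde is symmetric)
```

Transforming first is the cheaper order when the output dimension is smaller than the input dimension: for Cora's first layer, the aggregation then runs over 16 columns instead of 1433.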

## 5.2 MLP vs GNN: Cora dataset

### 5.2.1 Get Cora dataset

In [12]:
# Citation network datasets
# https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.datasets.Planetoid.html
# https://arxiv.org/abs/1603.08861
from torch_geometric.datasets import Planetoid

In [13]:
# Note that the Cora data has training, validation and test masks by default, so we
# don't have to set those explicitly here. (Below with the Facebook Page-Page data,
# we will have to do it ourselves.)
cora_ds = Planetoid(root=".", name="Cora")
cora_data = cora_ds[0]

In [14]:
summarize_dataset(cora_ds)

Dataset: Cora()
--------------------
# graphs   : 1
# nodes    : 2708
# features : 1433
# classes  : 7

Graph:
--------------------
Edges are directed       : False
Graph has isolated nodes : False
Graph has loops          : False


In [15]:
# https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.data.Data.html
cora_data

Data(x=[2708, 1433], edge_index=[2, 10556], y=[2708], train_mask=[2708], val_mask=[2708], test_mask=[2708])

In [16]:
type(cora_data)

torch_geometric.data.data.Data

In [17]:
# https://pytorch.org/docs/stable/tensors.html
cora_data.x

tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]])

In [18]:
cora_data.x.shape

torch.Size([2708, 1433])

In [19]:
cora_data.y

tensor([3, 4, 4,  ..., 3, 3, 3])

### 5.2.2 Run MLP on Cora dataset

In [20]:
# View features and labels as a table (for inspection only; the MLP reads the tensors directly)
cora_df_x = pd.DataFrame(cora_data.x.numpy())
cora_df_x['label'] = pd.DataFrame(cora_data.y)
cora_df_x

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,1424,1425,1426,1427,1428,1429,1430,1431,1432,label
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
4,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2703,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3
2704,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3
2705,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3
2706,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3


In [21]:
# Fit MLP model
# dim_in, dim_h, dim_out
cora_mlp = MLP(cora_ds.num_features, 16, cora_ds.num_classes)
print(cora_mlp)
cora_mlp.fit(cora_data, epochs=100)

MLP(
  (linear1): Linear(in_features=1433, out_features=16, bias=True)
  (linear2): Linear(in_features=16, out_features=7, bias=True)
)
Epoch   0 | Train Loss: 1.96 | Train Acc: 17.14% | Val Loss: 1.95 | Val Acc: 5.20%
Epoch  20 | Train Loss: 0.10 | Train Acc: 100.00% | Val Loss: 1.41 | Val Acc: 52.40%
Epoch  40 | Train Loss: 0.01 | Train Acc: 100.00% | Val Loss: 1.47 | Val Acc: 51.60%
Epoch  60 | Train Loss: 0.01 | Train Acc: 100.00% | Val Loss: 1.47 | Val Acc: 50.20%
Epoch  80 | Train Loss: 0.01 | Train Acc: 100.00% | Val Loss: 1.39 | Val Acc: 52.00%
Epoch 100 | Train Loss: 0.01 | Train Acc: 100.00% | Val Loss: 1.35 | Val Acc: 52.80%


In [22]:
# Sample usage
# Here we run the MLP on the entire input tensor
y = cora_mlp.forward(cora_data.x)
y

tensor([[-6.5678e+00, -5.5498e+00, -6.8391e+00,  ..., -4.6271e+00,
         -7.7510e+00, -6.6441e+00],
        [-6.9242e+00, -7.3987e+00, -6.9618e+00,  ..., -3.1244e-03,
         -8.6277e+00, -1.2594e+01],
        [-5.5351e+00, -8.4512e+00, -7.0643e+00,  ..., -7.3588e-03,
         -7.3747e+00, -1.2271e+01],
        ...,
        [-2.2982e+00, -1.6605e+00, -5.3771e+00,  ..., -2.7791e+00,
         -6.4131e-01, -4.1096e+00],
        [-4.9298e+00, -3.5794e+00, -2.9710e+00,  ..., -3.9188e-01,
         -4.5190e+00, -6.1077e+00],
        [-2.6981e+00, -1.3331e+00, -3.3705e+00,  ..., -1.1810e+00,
         -3.8716e+00, -3.9363e+00]], grad_fn=<LogSoftmaxBackward0>)

In [23]:
y_classes = y.argmax(dim=1)
y_classes

tensor([3, 4, 4,  ..., 5, 4, 4])

In [24]:
y_classes.shape

torch.Size([2708])

In [25]:
# Measure MLP accuracy
acc = cora_mlp.test(cora_data)
print(f'Cora MLP test accuracy: {acc*100:.2f}%')
accuracy_results['cora_mlp'] = acc

Cora MLP test accuracy: 51.40%


### 5.2.3 Run GNN on Cora dataset

In [26]:
# Prepare dataset for GNN
cora_dense_adj = to_dense_adjacency_matrix(cora_data)
cora_dense_adj

tensor([[1., 0., 0.,  ..., 0., 0., 0.],
        [0., 1., 1.,  ..., 0., 0., 0.],
        [0., 1., 1.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 1., 0., 0.],
        [0., 0., 0.,  ..., 0., 1., 1.],
        [0., 0., 0.,  ..., 0., 1., 1.]])

In [27]:
# Fit GNN model
cora_gnn = VanillaGNN(cora_ds.num_features, 16, cora_ds.num_classes)
print(cora_gnn)
cora_gnn.fit(cora_data, cora_dense_adj, epochs=100)

VanillaGNN(
  (gnn1): VanillaGNNLayer(
    (linear): Linear(in_features=1433, out_features=16, bias=False)
  )
  (gnn2): VanillaGNNLayer(
    (linear): Linear(in_features=16, out_features=7, bias=False)
  )
)
Epoch   0 | Train Loss: 2.201 | Train Acc: 16.43% | Val Loss: 2.22 | Val Acc: 12.80%
Epoch  20 | Train Loss: 0.070 | Train Acc: 99.29% | Val Loss: 2.12 | Val Acc: 72.80%
Epoch  40 | Train Loss: 0.005 | Train Acc: 100.00% | Val Loss: 2.90 | Val Acc: 74.60%
Epoch  60 | Train Loss: 0.002 | Train Acc: 100.00% | Val Loss: 2.91 | Val Acc: 75.00%
Epoch  80 | Train Loss: 0.001 | Train Acc: 100.00% | Val Loss: 2.81 | Val Acc: 75.00%
Epoch 100 | Train Loss: 0.001 | Train Acc: 100.00% | Val Loss: 2.72 | Val Acc: 75.00%


In [28]:
# Measure GNN accuracy
acc = cora_gnn.test(cora_data, cora_dense_adj)
print(f'\nCora GNN test accuracy: {acc*100:.2f}%')
accuracy_results['cora_gnn'] = acc


Cora GNN test accuracy: 75.30%


## 5.3 MLP vs GNN: Facebook Page-Page dataset

### 5.3.1 Get Facebook dataset

In [29]:
from torch_geometric.datasets import FacebookPagePage

In [30]:
fb_ds = FacebookPagePage(root="./FacebookPagePage")
fb_data = fb_ds[0]

In [31]:
summarize_dataset(fb_ds)

Dataset: FacebookPagePage()
--------------------
# graphs   : 1
# nodes    : 22470
# features : 128
# classes  : 4

Graph:
--------------------
Edges are directed       : False
Graph has isolated nodes : False
Graph has loops          : True


In [32]:
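# We have to define these explicitly, because unlike the Cora data, the Facebook
# Page-Page data doesn't come with them. These are index ranges rather than boolean
# masks, but they select rows of a tensor in the same way.
# Note that nodes 18000 and 20000 fall in the gaps between the ranges and are unused.
fb_data.train_mask = range(18000)        # Training
fb_data.val_mask = range(18001, 20000)   # Validation
fb_data.test_mask = range(20001, 22470)  # Test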
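
The three ranges do not quite cover every node. The plain-Python check below lists the indices that end up in no split, and sketches a contiguous alternative. The alternative is an assumption for illustration and is not the split that produced the results in this notebook.

```python
num_nodes = 22470

train_idx = range(18000)
val_idx = range(18001, 20000)
test_idx = range(20001, num_nodes)

# Nodes that fall into none of the three splits.
covered = set(train_idx) | set(val_idx) | set(test_idx)
unused = sorted(set(range(num_nodes)) - covered)
print(unused)  # [18000, 20000]

# A contiguous alternative (an assumption, not the split used in this notebook).
val_idx_alt = range(18000, 20000)
test_idx_alt = range(20000, num_nodes)
print(len(train_idx) + len(val_idx_alt) + len(test_idx_alt))  # 22470
```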

### 5.3.2 Run MLP on Facebook dataset

In [33]:
# View features and labels as a table (for inspection only; the MLP reads the tensors directly)
fb_df_x = pd.DataFrame(fb_data.x.numpy())
fb_df_x['label'] = pd.DataFrame(fb_data.y)
fb_df_x

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,119,120,121,122,123,124,125,126,127,label
0,-0.262576,-0.276483,-0.262350,-0.299327,-0.299159,-0.270681,-0.307757,-0.269733,-0.25101,-0.308343,...,-0.273229,-0.223700,-0.284379,-0.224216,-0.209509,-0.255755,-0.215140,-0.375903,-0.223836,0
1,-0.262576,-0.276483,-0.262350,-0.299327,-0.299159,-0.270681,-0.307757,-0.269733,-0.25101,-0.308343,...,-0.234818,-0.223700,-0.284379,-0.197935,-0.147256,-0.255755,-0.215140,-0.364134,-0.128634,2
2,-0.262576,-0.265053,-0.262350,-0.299327,-0.299159,-0.270681,-0.307757,-0.210461,-0.25101,3.222161,...,-0.273229,-0.223700,-0.284379,-0.224216,-0.209509,-0.255755,-0.215140,-0.375903,-0.223836,1
3,-0.246378,-0.276483,-0.241991,-0.299327,-0.299159,-0.270681,-0.307051,-0.269733,-0.25101,-0.308343,...,-0.273229,-0.223700,-0.265534,-0.080353,-0.209509,-0.250560,-0.180260,-0.375903,-0.223836,2
4,-0.262576,-0.276483,-0.262350,-0.299327,-0.299159,-0.270681,-0.307757,-0.269733,-0.25101,-0.308343,...,-0.273229,-0.175312,-0.272613,-0.224216,-0.181153,-0.255755,-0.215140,-0.370639,-0.223836,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22465,-0.262576,-0.276483,-0.262350,-0.296955,-0.299159,-0.270681,-0.307757,-0.269733,-0.25101,-0.308343,...,-0.273229,-0.223700,-0.284379,-0.224216,-0.209509,-0.255755,-0.196685,-0.370115,-0.223836,3
22466,-0.262576,-0.276483,-0.262350,-0.299327,-0.299159,-0.270681,-0.307757,-0.269733,-0.25101,-0.308343,...,-0.273229,-0.221643,-0.284379,-0.224216,-0.209509,-0.255755,-0.215140,-0.375903,-0.223836,1
22467,-0.262576,-0.276483,-0.262350,-0.299327,-0.299159,-0.270681,-0.307757,-0.269733,-0.25101,-0.308343,...,-0.273229,-0.223700,-0.284379,-0.224216,-0.146793,-0.255755,-0.180389,-0.372097,-0.222613,2
22468,-0.262576,-0.276483,-0.262350,-0.299327,-0.299159,-0.270681,-0.307668,-0.269733,-0.25101,-0.308343,...,-0.273229,-0.223700,-0.284379,-0.224216,-0.209509,-0.252456,-0.215140,-0.375903,-0.218148,1


In [34]:
# Fit MLP model
fb_mlp = MLP(fb_ds.num_features, 16, fb_ds.num_classes)
print(fb_mlp)
fb_mlp.fit(fb_data, epochs=100)

MLP(
  (linear1): Linear(in_features=128, out_features=16, bias=True)
  (linear2): Linear(in_features=16, out_features=4, bias=True)
)
Epoch   0 | Train Loss: 1.41 | Train Acc: 18.78% | Val Loss: 1.41 | Val Acc: 19.26%
Epoch  20 | Train Loss: 0.68 | Train Acc: 73.12% | Val Loss: 0.69 | Val Acc: 71.89%
Epoch  40 | Train Loss: 0.59 | Train Acc: 76.56% | Val Loss: 0.63 | Val Acc: 74.34%
Epoch  60 | Train Loss: 0.55 | Train Acc: 78.08% | Val Loss: 0.61 | Val Acc: 75.84%
Epoch  80 | Train Loss: 0.53 | Train Acc: 78.70% | Val Loss: 0.60 | Val Acc: 75.44%
Epoch 100 | Train Loss: 0.52 | Train Acc: 79.23% | Val Loss: 0.60 | Val Acc: 75.94%


In [35]:
# Measure MLP accuracy
acc = fb_mlp.test(fb_data)
print(f'FBPP MLP test accuracy: {acc*100:.2f}%')
accuracy_results['fb_mlp'] = acc

FBPP MLP test accuracy: 75.21%


### 5.3.3 Run GNN on Facebook dataset

In [36]:
# Prepare data for GNN
fb_dense_adj = to_dense_adjacency_matrix(fb_data)
fb_dense_adj

tensor([[1., 0., 0.,  ..., 0., 0., 0.],
        [0., 1., 0.,  ..., 0., 0., 0.],
        [0., 0., 1.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 1., 0., 0.],
        [0., 0., 0.,  ..., 0., 1., 0.],
        [0., 0., 0.,  ..., 0., 0., 1.]])

In [37]:
# Fit GNN model
fb_gnn = VanillaGNN(fb_ds.num_features, 16, fb_ds.num_classes)
print(fb_gnn)
fb_gnn.fit(fb_data, fb_dense_adj, epochs=100)

VanillaGNN(
  (gnn1): VanillaGNNLayer(
    (linear): Linear(in_features=128, out_features=16, bias=False)
  )
  (gnn2): VanillaGNNLayer(
    (linear): Linear(in_features=16, out_features=4, bias=False)
  )
)
Epoch   0 | Train Loss: 47.811 | Train Acc: 28.28% | Val Loss: 46.54 | Val Acc: 26.91%
Epoch  20 | Train Loss: 3.934 | Train Acc: 81.36% | Val Loss: 3.26 | Val Acc: 81.74%
Epoch  40 | Train Loss: 1.685 | Train Acc: 82.42% | Val Loss: 1.44 | Val Acc: 81.89%
Epoch  60 | Train Loss: 1.875 | Train Acc: 82.77% | Val Loss: 1.12 | Val Acc: 83.49%
Epoch  80 | Train Loss: 0.762 | Train Acc: 85.64% | Val Loss: 0.68 | Val Acc: 85.84%
Epoch 100 | Train Loss: 0.592 | Train Acc: 86.46% | Val Loss: 0.53 | Val Acc: 86.49%


In [38]:
# Measure GNN accuracy
acc = fb_gnn.test(fb_data, fb_dense_adj)
print(f'\nFBPP GNN test accuracy: {acc*100:.2f}%')
accuracy_results['fb_gnn'] = acc


FBPP GNN test accuracy: 85.95%


## 5.4 Discussion

The GNN includes topological information (node neighborhood), whereas the MLP doesn't. Compared with using node features alone, including topology raises test accuracy by roughly 11 percentage points on Facebook Page-Page (75.21% to 85.95%) and roughly 24 percentage points on Cora (51.40% to 75.30%).

In [39]:
accuracy_results

{'cora_mlp': tensor(0.5140),
 'cora_gnn': tensor(0.7530),
 'fb_mlp': tensor(0.7521),
 'fb_gnn': tensor(0.8595)}
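
The size of the gap depends on whether it is measured in percentage points or relative to the MLP baseline. The short calculation below makes the distinction explicit, using the test accuracies reported in this notebook (copied in as plain floats).

```python
# Test accuracies reported above, as fractions.
results = {
    'cora_mlp': 0.5140,
    'cora_gnn': 0.7530,
    'fb_mlp': 0.7521,
    'fb_gnn': 0.8595,
}

gains = {}
for name in ('cora', 'fb'):
    mlp = results[f'{name}_mlp']
    gnn = results[f'{name}_gnn']
    points = (gnn - mlp) * 100          # absolute gain in percentage points
    relative = (gnn - mlp) / mlp * 100  # gain relative to the MLP accuracy
    gains[name] = (points, relative)
    print(f'{name}: +{points:.2f} points, +{relative:.1f}% relative')

# cora: +23.90 points, +46.5% relative
# fb: +10.74 points, +14.3% relative
```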