# Training a GCN-based edge classifier

## Using GCN for Edge Classification with `torch_geometric`

In this example, we demonstrate how to use a Graph Convolutional Network (GCN) from the `torch_geometric` library to perform edge classification in a graph. The implementation consists of two main components: the GCN model and a Multi-Layer Perceptron (MLP) for edge classification.

### Components

1. **GCN Model (`GCN`)**: This model utilizes two graph convolutional layers (`GCNConv`) to process node features and generate node embeddings. The forward pass involves applying the convolutional layers sequentially with a ReLU activation function between them.

2. **Edge Classification MLP (`EdgeMLP`)**: This module takes the concatenated embeddings of the node pairs that form an edge and passes them through a two-layer MLP to produce logits corresponding to the possible edge classes.

3. **Dummy Graph Creation (`create_dummy_graph`)**: This function creates a synthetic graph with a random number of nodes and edges. It generates random node features and random edges, and assigns a random label to each edge.

### Workflow

- **Node Embeddings**: The GCN processes the input node features and graph structure to produce embeddings for each node.
- **Edge Classification**: The embeddings of the node pairs forming an edge are concatenated and passed through the `EdgeMLP` to classify the edge.
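The concatenation step can be illustrated in isolation. The following is a minimal sketch with made-up toy sizes (4 nodes, 3-dimensional embeddings, 2 edges); the tensor values are placeholders rather than real GCN outputs.

```python
import torch

# Hypothetical toy sizes: 4 nodes with 3-dimensional embeddings.
node_embeddings = torch.arange(12, dtype=torch.float32).reshape(4, 3)

# Two edges in COO format: 0 -> 1 and 2 -> 3.
edge_index = torch.tensor([[0, 2],
                           [1, 3]])

# Index the embedding matrix with the head row and the tail row of edge_index.
head = node_embeddings[edge_index[0]]  # shape (num_edges, 3)
tail = node_embeddings[edge_index[1]]  # shape (num_edges, 3)

# One row per edge, twice as wide as a single node embedding.
edge_features = torch.cat([head, tail], dim=1)

print(edge_features.shape)  # torch.Size([2, 6])
```

This is why the `EdgeMLP` input dimension has to be twice the dimension of the node embeddings produced by the GCN.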


In [1]:
import random
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv


# Define the GCN model
class GCN(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int) -> None:
        """Init function for the GCN model.

        Args:
            input_dim: Dimension of the input features
            hidden_dim: Dimension of the hidden layer
        """
        super(GCN, self).__init__()
        self.conv1 = GCNConv(input_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        """Forward function for the GCN model.

        Args:
            x: Input node features
            edge_index: Graph edge indices (COO)

        Returns:
            Output (updated) node features with message passing.
        """
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = self.conv2(x, edge_index)
        return x


# Define the MLP module for edge classification
class EdgeMLP(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int, output_dim: int) -> None:
        """Init function for the EdgeMLP model.

        Args:
            input_dim: Dimension of the input features (concatenated embeddings)
            hidden_dim: Dimension of the hidden layer
            output_dim: Dimension of the output (number of classes)
        """
        super(EdgeMLP, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Forward function for the EdgeMLP model.

        Args:
            x: Input edge features (concatenated node embeddings)

        Returns:
            Output edge class logits
        """
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

In [2]:
# Function to create a single dummy graph
def create_dummy_graph(
    max_num_nodes: int, feature_dim: int, num_classes: int
) -> tuple[int, torch.Tensor, torch.Tensor, torch.Tensor]:
    """Create a dummy graph with random edges and labels.

    Args:
        max_num_nodes: Exclusive upper bound on the number of nodes (minimum is 2)
        feature_dim: Dimension of the node features
        num_classes: Number of target classes

    Returns:
        num_nodes: Number of nodes in the graph
        node_features: Node features tensor
        edge_index: Graph edge indices tensor
        edge_labels: Edge labels tensor

    """
    # Randomly create nodes and edges
    num_nodes = torch.randint(2, max_num_nodes, (1,)).item()
    node_features = torch.randn(num_nodes, feature_dim)

    num_edges = torch.randint(1, num_nodes * 2, (1,)).item()
    edge_index = torch.randint(0, num_nodes, (2, num_edges))

    # Assign random labels to edges
    edge_labels = torch.randint(0, num_classes, (num_edges,))

    return num_nodes, node_features, edge_index, edge_labels

## Training a GCN for Edge Classification

The example below demonstrates how to train a Graph Convolutional Network (GCN) combined with a Multi-Layer Perceptron (MLP) to perform edge classification on multiple graphs. The code outlines the key steps, including model initialization, dataset creation, and the training loop.

### Parameters and Model Initialization

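- **Number of Entities**: `num_total_entities = 32` is passed to `create_dummy_graph` as `max_num_nodes`, the exclusive upper bound on the number of nodes per graph, so each graph has between 2 and 31 nodes.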
- **Embedding Dimension**: `embedding_dim = 16` defines the size of each node's feature vector.
- **Number of Classes**: `num_classes = 2` sets the number of classes for edge classification.
- **Model Hyperparameters**:
  - `hidden_dim = 32`: The size of the hidden layer in both the GCN and MLP.
  - `num_graphs = 256`: The number of graphs in the dataset.
  - `batch_size = 128`: The number of graphs processed per batch.
  - `epochs = 1000`: The total number of training epochs.
  - `learning_rate = 0.001`: The learning rate for the optimizer.

### Model Components

- **GCN Model (`GCN`)**: Initialized to process the node features and generate node embeddings.
- **Edge Classification MLP (`EdgeMLP`)**: Initialized to classify edges based on the concatenated node embeddings.
- **Optimizer**: The Adam optimizer is used to update the model parameters, with a cross-entropy loss function.

### Dataset Creation

The dataset is generated using the `create_dummy_graph` function, which creates graphs with a random number of nodes and edges, random node features, and random edge labels. The dataset contains `num_graphs` graphs.
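Because the upper bound of `torch.randint` is exclusive, the graph sizes never reach `max_num_nodes` itself. A minimal sketch that repeats the same call many times to inspect the range (the seed and the sample count are arbitrary choices for illustration):

```python
import torch

torch.manual_seed(0)  # only to make the sketch repeatable

max_num_nodes = 32

# Same call as in create_dummy_graph, drawn many times at once.
samples = torch.randint(2, max_num_nodes, (10_000,))

# The smallest possible value is 2 and the largest is max_num_nodes - 1.
print(samples.min().item(), samples.max().item())
```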

### Batch Processing in GNNs

Batch processing in Graph Neural Networks (GNNs) is handled differently from traditional neural networks that operate on Euclidean data, such as Convolutional Neural Networks (CNNs). Instead of introducing a batch dimension, the node features of all graphs in a batch are concatenated along the node dimension, and each graph's edge indices are incremented by the number of nodes that precede it so that they still point to the right rows. The batch then behaves like one large graph made of disconnected components, which allows multiple graphs to be processed in parallel.
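A minimal sketch of this index shifting on two tiny hand-written graphs. The helper name `batch_edge_indices` is ours and is not part of `torch_geometric`; the library's own `DataLoader` performs the same kind of offsetting on its `Data` objects.

```python
import torch


def batch_edge_indices(edge_indices, num_nodes_per_graph):
    """Shift each graph's edge indices by the number of nodes before it."""
    shifted, offset = [], 0
    for edge_index, num_nodes in zip(edge_indices, num_nodes_per_graph):
        shifted.append(edge_index + offset)
        offset += num_nodes
    return torch.cat(shifted, dim=1)


# Graph A has 3 nodes (edges 0->1, 1->2); graph B has 2 nodes (edge 0->1).
edge_index_a = torch.tensor([[0, 1], [1, 2]])
edge_index_b = torch.tensor([[0], [1]])

batched = batch_edge_indices([edge_index_a, edge_index_b], [3, 2])
print(batched)
# tensor([[0, 1, 3],
#         [1, 2, 4]])
```

Graph B's edge `0 -> 1` becomes `3 -> 4` because graph A occupies rows 0 to 2 of the concatenated node feature matrix.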

### Training Loop

The training loop iterates over a defined number of epochs (`epochs`). For each epoch:

1. **Batch Processing**:
   - The dataset is shuffled and divided into batches of size `batch_size`.
   - Each batch is processed to extract node features, edge indices, and edge labels.

2. **Forward Pass**:
   - Node embeddings are computed using the GCN model.
   - Edge features are prepared by concatenating the embeddings of the connected nodes (head and tail) for each edge.
   - The edge features are passed through the MLP for classification.

3. **Loss Calculation and Backpropagation**:
   - The cross-entropy loss between the predicted edge classes and the true labels is computed.
   - Gradients are computed and the model parameters are updated using the Adam optimizer.

4. **Metrics Calculation**:
   - The loss and accuracy are computed and printed for each epoch.


In [3]:
# Define the total number of entities and embedding dimensions
num_total_entities = 32  # Exclusive upper bound on the number of nodes per graph
embedding_dim = 16  # Dimension of each embedding vector
num_classes = 2  # Number of target classes (for edge classification)

# Parameters
hidden_dim = 32
num_graphs = 256  # Number of graphs in the dataset
batch_size = 128
epochs = 1000
learning_rate = 0.001


# Initialize the models
gcn_model = GCN(input_dim=embedding_dim, hidden_dim=hidden_dim)
edge_mlp = EdgeMLP(
    input_dim=2 * hidden_dim, hidden_dim=hidden_dim, output_dim=num_classes
)

# Create dataset
dataset = [
    create_dummy_graph(num_total_entities, embedding_dim, num_classes)
    for _ in range(num_graphs)
]


optimizer = torch.optim.Adam(
    list(gcn_model.parameters()) + list(edge_mlp.parameters()),
    lr=learning_rate,
)
criterion = nn.CrossEntropyLoss()


# Training loop
for epoch in range(1, epochs + 1):
    total_loss = 0
    total_correct = 0
    total_samples = 0

    # iid shuffle of the dataset
    random.shuffle(dataset)
    batches = [dataset[i : i + batch_size] for i in range(0, len(dataset), batch_size)]

    for batch in batches:
        num_nodes_all = [graph[0] for graph in batch]
        node_features_all = [graph[1] for graph in batch]
        node_feature_batch = torch.cat(node_features_all, dim=0)
        edge_index_all = [graph[2] for graph in batch]
        edge_labels_all = [graph[3] for graph in batch]
        edge_labels_batch = torch.cat(edge_labels_all, dim=0)

        # This is a bit tricky: we need to update the edge indices to reflect the new
        # node ordering. We do this by adding the sum of the number of nodes in the
        # previous graphs. This is how batching is done in GNNs.
        edge_index_batch = []
        num_nodes_sum = 0
        for i, edge_index in enumerate(edge_index_all):
            edge_index_batch.append(edge_index + num_nodes_sum)
            num_nodes_sum += num_nodes_all[i]

        edge_index_batch = torch.cat(edge_index_batch, dim=1)

        # Forward pass through GCN to get node embeddings
        node_embeddings_out = gcn_model(node_feature_batch, edge_index_batch)

        # Prepare edge features by concatenating the embeddings of the head and tail nodes
        edge_embeddings = torch.cat(
            [
                node_embeddings_out[edge_index_batch[0]],
                node_embeddings_out[edge_index_batch[1]],
            ],
            dim=1,
        )

        # Forward pass through MLP for edge classification
        out = edge_mlp(edge_embeddings)

        # Compute loss
        loss = criterion(out, edge_labels_batch)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Accumulate metrics
        total_loss += loss.item()
        preds = out.argmax(dim=1)
        total_correct += (preds == edge_labels_batch).sum().item()
        total_samples += edge_labels_batch.size(0)

    avg_loss = total_loss / len(batches)
    accuracy = total_correct / total_samples

    print(f"Epoch {epoch:02d} | Loss: {avg_loss:.4f} | Accuracy: {accuracy:.4f}")

Epoch 01 | Loss: 0.6933 | Accuracy: 0.5119
Epoch 02 | Loss: 0.6921 | Accuracy: 0.5140
Epoch 03 | Loss: 0.6916 | Accuracy: 0.5282
Epoch 04 | Loss: 0.6911 | Accuracy: 0.5296
Epoch 05 | Loss: 0.6906 | Accuracy: 0.5308
Epoch 06 | Loss: 0.6901 | Accuracy: 0.5405
Epoch 07 | Loss: 0.6896 | Accuracy: 0.5446
Epoch 08 | Loss: 0.6891 | Accuracy: 0.5484
Epoch 09 | Loss: 0.6887 | Accuracy: 0.5500
Epoch 10 | Loss: 0.6883 | Accuracy: 0.5519
Epoch 11 | Loss: 0.6878 | Accuracy: 0.5519
Epoch 12 | Loss: 0.6872 | Accuracy: 0.5536
Epoch 13 | Loss: 0.6869 | Accuracy: 0.5543
Epoch 14 | Loss: 0.6864 | Accuracy: 0.5531
Epoch 15 | Loss: 0.6860 | Accuracy: 0.5526
Epoch 16 | Loss: 0.6857 | Accuracy: 0.5536
Epoch 17 | Loss: 0.6850 | Accuracy: 0.5529
Epoch 18 | Loss: 0.6847 | Accuracy: 0.5581
Epoch 19 | Loss: 0.6839 | Accuracy: 0.5614
Epoch 20 | Loss: 0.6834 | Accuracy: 0.5593
Epoch 21 | Loss: 0.6829 | Accuracy: 0.5605
Epoch 22 | Loss: 0.6821 | Accuracy: 0.5647
Epoch 23 | Loss: 0.6816 | Accuracy: 0.5699
Epoch 24 | 

## Alternative Approach: Using Learnable Node Embeddings for Edge Classification

In this alternative approach, we generate random graphs to create a dataset, but instead of generating random node features for each graph, we utilize learnable embedding vectors for the nodes. This method ensures that all nodes across different graphs share a common set of embeddings, which are updated during the training process.

### Key Differences from the Previous Example

- **Learnable Node Embeddings**: Unlike the first example where node features were randomly generated for each graph, here we initialize a fixed set of embedding vectors for all possible nodes in the dataset. These embeddings are trainable, meaning they are optimized during the training process to better represent the nodes across all graphs.
- **Shared Node Representations**: Since all graphs draw their nodes from the same set of entities, the node embeddings can capture features that generalize across graphs, which helps when the same node appears in different contexts.

### Benefits of this Approach

- **Consistency Across Graphs**: By using a shared embedding space for nodes, the model can learn more consistent representations, making it easier to transfer knowledge across different graphs.
- **Improved Learning**: With learnable embeddings, the model can adapt the node features during training, potentially leading to better performance in tasks like edge classification.
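A minimal sketch of what "shared and learnable" means in practice, using two hypothetical node-index lists that both contain entity 5. The same entity gets the same vector in both graphs, and the backward pass only touches the rows that were looked up.

```python
import torch
import torch.nn as nn

num_total_entities, embedding_dim = 32, 16
node_embeddings = nn.Embedding(num_total_entities, embedding_dim)

# Two hypothetical graphs that both contain entity 5.
graph_a_nodes = torch.tensor([5, 7, 9])
graph_b_nodes = torch.tensor([5, 1])

features_a = node_embeddings(graph_a_nodes)  # shape (3, 16)
features_b = node_embeddings(graph_b_nodes)  # shape (2, 16)

# Entity 5 is represented by the same row of the table in both graphs.
print(torch.equal(features_a[0], features_b[0]))  # True

# Gradients reach only the rows that were looked up.
(features_a.sum() + features_b.sum()).backward()
touched = node_embeddings.weight.grad.abs().sum(dim=1) > 0
print(touched.nonzero().flatten().tolist())  # [1, 5, 7, 9]
```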

### Example Implementation

In the following example, we demonstrate how to implement this approach using a similar setup as before, but with the introduction of a learnable embedding layer for the nodes. The training loop and other aspects remain largely the same, focusing on updating both the GCN and the node embeddings to improve edge classification accuracy.


In [4]:
# Function to create a single dummy graph whose nodes index a shared embedding table
def create_dummy_graph_with_embeddings(
    max_num_nodes: int, num_classes: int
) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
    """Create a dummy graph whose nodes are indices into a shared embedding table.

    Args:
        max_num_nodes: Exclusive upper bound on the number of nodes (minimum is 2)
        num_classes: Number of target classes

    Returns:
        node_indices: Tensor of node indices for embedding lookup
        edge_index: Graph edge indices tensor
        edge_labels: Edge labels tensor
    """
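    # Randomly create nodes and edges
    num_nodes = torch.randint(2, max_num_nodes, (1,)).item()
    # Note: this relies on the module-level `num_total_entities`. Entity ids are
    # sampled with replacement, so one entity can appear more than once per graph.
    node_indices = torch.randint(0, num_total_entities, (num_nodes,))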

    num_edges = torch.randint(1, num_nodes * 2, (1,)).item()
    edge_index = torch.randint(0, num_nodes, (2, num_edges))

    # Assign random labels to edges
    edge_labels = torch.randint(0, num_classes, (num_edges,))

    return node_indices, edge_index, edge_labels


In [5]:
# Define the total number of entities and embedding dimensions
num_total_entities = 32  # Total number of entities
embedding_dim = 16  # Dimension of each embedding vector
num_classes = 2  # Number of target classes (for edge classification)

# Parameters
hidden_dim = 32
num_graphs = 256  # Number of graphs in the dataset
batch_size = 128
epochs = 1000
learning_rate = 0.001

# Initialize the node embeddings
node_embeddings = nn.Embedding(num_total_entities, embedding_dim)

# Initialize the models
gcn_model = GCN(input_dim=embedding_dim, hidden_dim=hidden_dim)
edge_mlp = EdgeMLP(
    input_dim=2 * hidden_dim, hidden_dim=hidden_dim, output_dim=num_classes
)

# Create dataset
dataset = [
    create_dummy_graph_with_embeddings(num_total_entities, num_classes)
    for _ in range(num_graphs)
]

# Optimizer (includes GCN, EdgeMLP, and the learnable node embeddings)
optimizer = torch.optim.Adam(
    list(gcn_model.parameters())
    + list(edge_mlp.parameters())
    + list(node_embeddings.parameters()),  # Add node embeddings to the optimizer
    lr=learning_rate,
)
criterion = nn.CrossEntropyLoss()

# Training loop
for epoch in range(1, epochs + 1):
    total_loss = 0
    total_correct = 0
    total_samples = 0

    # iid shuffle of the dataset
    random.shuffle(dataset)
    batches = [dataset[i : i + batch_size] for i in range(0, len(dataset), batch_size)]

    for batch in batches:
        node_indices_all = [graph[0] for graph in batch]
        edge_index_all = [graph[1] for graph in batch]
        edge_labels_all = [graph[2] for graph in batch]

        node_feature_batch = node_embeddings(torch.cat(node_indices_all, dim=0))
        edge_labels_batch = torch.cat(edge_labels_all, dim=0)

        edge_index_batch = []
        num_nodes_sum = 0
        for i, edge_index in enumerate(edge_index_all):
            edge_index_batch.append(edge_index + num_nodes_sum)
            num_nodes_sum += node_indices_all[i].size(0)

        edge_index_batch = torch.cat(edge_index_batch, dim=1)

        # Forward pass through GCN to get node embeddings
        node_embeddings_out = gcn_model(node_feature_batch, edge_index_batch)

        # Prepare edge features by concatenating the embeddings of the head and tail nodes
        edge_embeddings = torch.cat(
            [
                node_embeddings_out[edge_index_batch[0]],
                node_embeddings_out[edge_index_batch[1]],
            ],
            dim=1,
        )

        # Forward pass through MLP for edge classification
        out = edge_mlp(edge_embeddings)

        # Compute loss
        loss = criterion(out, edge_labels_batch)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Accumulate metrics
        total_loss += loss.item()
        preds = out.argmax(dim=1)
        total_correct += (preds == edge_labels_batch).sum().item()
        total_samples += edge_labels_batch.size(0)

    avg_loss = total_loss / len(batches)
    accuracy = total_correct / total_samples

    print(f"Epoch {epoch:02d} | Loss: {avg_loss:.4f} | Accuracy: {accuracy:.4f}")

Epoch 01 | Loss: 0.7002 | Accuracy: 0.4843
Epoch 02 | Loss: 0.6973 | Accuracy: 0.4848
Epoch 03 | Loss: 0.6951 | Accuracy: 0.4870
Epoch 04 | Loss: 0.6938 | Accuracy: 0.4913
Epoch 05 | Loss: 0.6934 | Accuracy: 0.5057
Epoch 06 | Loss: 0.6925 | Accuracy: 0.5147
Epoch 07 | Loss: 0.6922 | Accuracy: 0.5267
Epoch 08 | Loss: 0.6918 | Accuracy: 0.5286
Epoch 09 | Loss: 0.6913 | Accuracy: 0.5289
Epoch 10 | Loss: 0.6908 | Accuracy: 0.5262
Epoch 11 | Loss: 0.6905 | Accuracy: 0.5296
Epoch 12 | Loss: 0.6901 | Accuracy: 0.5328
Epoch 13 | Loss: 0.6897 | Accuracy: 0.5342
Epoch 14 | Loss: 0.6893 | Accuracy: 0.5352
Epoch 15 | Loss: 0.6889 | Accuracy: 0.5403
Epoch 16 | Loss: 0.6887 | Accuracy: 0.5450
Epoch 17 | Loss: 0.6883 | Accuracy: 0.5457
Epoch 18 | Loss: 0.6879 | Accuracy: 0.5501
Epoch 19 | Loss: 0.6876 | Accuracy: 0.5520
Epoch 20 | Loss: 0.6871 | Accuracy: 0.5547
Epoch 21 | Loss: 0.6869 | Accuracy: 0.5520
Epoch 22 | Loss: 0.6865 | Accuracy: 0.5506
Epoch 23 | Loss: 0.6862 | Accuracy: 0.5533
Epoch 24 | 

As you can see, it takes longer to train, since we are also performing gradient descent on the initial node embeddings. However, there is no performance increase with this method. In the random data generation process, the edges between the nodes, and their labels, were generated purely at random. The relationships between the nodes therefore do not follow any meaningful pattern or structure that the learnable embeddings could capture, so their additional complexity does not improve the model's performance. In real-world data, by contrast, the connections between nodes often reflect meaningful relationships, which makes learnable embeddings valuable for capturing these underlying structures and can lead to significant performance improvements.
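For reference when reading the logged losses: a classifier that assigns equal probability to every class has a cross-entropy of ln(`num_classes`). A minimal sketch of that calculation (the helper name is ours, not part of the notebook):

```python
import math


def chance_level_cross_entropy(num_classes: int) -> float:
    """Cross-entropy of a uniform prediction over num_classes classes."""
    return -math.log(1.0 / num_classes)


print(f"{chance_level_cross_entropy(2):.4f}")  # 0.6931
```

Both runs above report losses close to this value, which is consistent with the random labels carrying no learnable signal.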

## Alternative Approach 2: Using Learnable and Reusable Node Embeddings for Edge Classification

In this approach, we utilize a set of learnable and reusable node embeddings for edge classification. Unlike the previous method, where each graph looked up its own subset of node features from the embedding table, here every graph and every batch uses the same full set of node embeddings. The intent is more efficient training and simpler batch processing.

### Key Features of This Approach

- **Reusable Node Embeddings**: We define a fixed set of node embeddings that are shared across all graphs in the dataset. These embeddings are learnable and updated during training, ensuring that they capture meaningful features relevant to the entire dataset.

- **Simplified Batch Processing**: For every batch, the method uses the same `node_feature_batch`, which is the full embedding matrix (`node_embeddings.weight`) holding one row per entity. Since all graphs in a batch reference the same node embeddings, the edge indices are not incremented as in the previous approach; they refer directly to the corresponding rows of the embedding matrix.
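A minimal sketch showing that the embedding weight matrix is the same thing as looking up every entity id in order, which is why it can be used as the node feature matrix without any per-graph lookup:

```python
import torch
import torch.nn as nn

num_total_entities, embedding_dim = 32, 16
node_embeddings = nn.Embedding(num_total_entities, embedding_dim)

# Looking up ids 0..31 in order returns the rows of the weight matrix.
all_ids = torch.arange(num_total_entities)
looked_up = node_embeddings(all_ids)

print(torch.equal(looked_up, node_embeddings.weight))  # True
print(node_embeddings.weight.shape)  # torch.Size([32, 16])
```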

### Example Implementation

The following code demonstrates how to implement this method, including the initialization of reusable node embeddings, GCN, and MLP models, as well as the training loop that leverages these reusable embeddings.


In [6]:
# Function to create a single dummy graph whose edges index the shared embeddings directly
def create_dummy_graph_without_incrementing(
    num_total_entities: int, num_classes: int
) -> tuple[torch.Tensor, torch.Tensor]:
    """Create a dummy graph with random edges and labels.

    Args:
        num_total_entities: Total number of entities (nodes)
        num_classes: Number of target classes

    Returns:
        edge_index: Graph edge indices tensor
        edge_labels: Edge labels tensor
    """
    num_edges = torch.randint(1, num_total_entities * 2, (1,)).item()
    edge_index = torch.randint(0, num_total_entities, (2, num_edges))

    # Assign random labels to edges
    edge_labels = torch.randint(0, num_classes, (num_edges,))

    return edge_index, edge_labels


# Define the total number of entities and embedding dimensions
num_total_entities = 32  # Total number of entities (nodes)
embedding_dim = 16  # Dimension of each embedding vector
num_classes = 2  # Number of target classes (for edge classification)

# Parameters
hidden_dim = 32
num_graphs = 256  # Number of graphs in the dataset
batch_size = 128
epochs = 1000
learning_rate = 0.001

# Initialize the node embeddings
node_embeddings = nn.Embedding(num_total_entities, embedding_dim)

# Initialize the models
gcn_model = GCN(input_dim=embedding_dim, hidden_dim=hidden_dim)
edge_mlp = EdgeMLP(
    input_dim=2 * hidden_dim, hidden_dim=hidden_dim, output_dim=num_classes
)

# Create dataset
dataset = [
    create_dummy_graph_without_incrementing(num_total_entities, num_classes)
    for _ in range(num_graphs)
]

# Optimizer (includes GCN, EdgeMLP, and the learnable node embeddings)
optimizer = torch.optim.Adam(
    list(gcn_model.parameters()) + list(edge_mlp.parameters()) + list(node_embeddings.parameters()),
    lr=learning_rate,
)
criterion = nn.CrossEntropyLoss()

# Prepare a fixed node feature batch using the same node embeddings for every batch
node_feature_batch = node_embeddings.weight

# Training loop
for epoch in range(1, epochs + 1):
    total_loss = 0
    total_correct = 0
    total_samples = 0

    # iid shuffle of the dataset
    random.shuffle(dataset)
    batches = [dataset[i : i + batch_size] for i in range(0, len(dataset), batch_size)]

    for batch in batches:
        edge_index_all = [graph[0] for graph in batch]
        edge_labels_all = [graph[1] for graph in batch]

        edge_index_batch = torch.cat(edge_index_all, dim=1)
        edge_labels_batch = torch.cat(edge_labels_all, dim=0)

        # Forward pass through GCN to get node embeddings
        node_embeddings_out = gcn_model(node_feature_batch, edge_index_batch)

        # Prepare edge features by concatenating the embeddings of the head and tail nodes
        edge_embeddings = torch.cat(
            [
                node_embeddings_out[edge_index_batch[0]],
                node_embeddings_out[edge_index_batch[1]],
            ],
            dim=1,
        )

        # Forward pass through MLP for edge classification
        out = edge_mlp(edge_embeddings)

        # Compute loss
        loss = criterion(out, edge_labels_batch)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Accumulate metrics
        total_loss += loss.item()
        preds = out.argmax(dim=1)
        total_correct += (preds == edge_labels_batch).sum().item()
        total_samples += edge_labels_batch.size(0)

    avg_loss = total_loss / len(batches)
    accuracy = total_correct / total_samples

    print(f"Epoch {epoch:02d} | Loss: {avg_loss:.4f} | Accuracy: {accuracy:.4f}")


Epoch 01 | Loss: 0.6930 | Accuracy: 0.5094
Epoch 02 | Loss: 0.6930 | Accuracy: 0.5094
Epoch 03 | Loss: 0.6930 | Accuracy: 0.5094
Epoch 04 | Loss: 0.6930 | Accuracy: 0.5094
Epoch 05 | Loss: 0.6930 | Accuracy: 0.5094
Epoch 06 | Loss: 0.6929 | Accuracy: 0.5094
Epoch 07 | Loss: 0.6929 | Accuracy: 0.5094
Epoch 08 | Loss: 0.6931 | Accuracy: 0.5094
Epoch 09 | Loss: 0.6930 | Accuracy: 0.5094
Epoch 10 | Loss: 0.6930 | Accuracy: 0.5094
Epoch 11 | Loss: 0.6929 | Accuracy: 0.5094
Epoch 12 | Loss: 0.6930 | Accuracy: 0.5094
Epoch 13 | Loss: 0.6929 | Accuracy: 0.5094
Epoch 14 | Loss: 0.6930 | Accuracy: 0.5094
Epoch 15 | Loss: 0.6930 | Accuracy: 0.5094
Epoch 16 | Loss: 0.6930 | Accuracy: 0.5094
Epoch 17 | Loss: 0.6930 | Accuracy: 0.5094
Epoch 18 | Loss: 0.6930 | Accuracy: 0.5094
Epoch 19 | Loss: 0.6930 | Accuracy: 0.5094
Epoch 20 | Loss: 0.6929 | Accuracy: 0.5094
Epoch 21 | Loss: 0.6930 | Accuracy: 0.5094
Epoch 22 | Loss: 0.6930 | Accuracy: 0.5094
Epoch 23 | Loss: 0.6929 | Accuracy: 0.5094
Epoch 24 | 

**This method doesn't work!!**

I've been thinking about why this doesn't work. I initially thought that this basically puts all graphs in a batch into one huge graph, a single graph with many, many edges.
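One way to check that intuition is to look at what the concatenated `edge_index_batch` actually contains. The sketch below is an assumption-laden reproduction, not the notebook's training code: it repeats the edge sampling of `create_dummy_graph_without_incrementing` for one batch, leaves out the labels, and uses an arbitrary seed.

```python
import torch

torch.manual_seed(0)  # only to make the sketch repeatable

num_total_entities, batch_size = 32, 128

# Same edge sampling as create_dummy_graph_without_incrementing, labels omitted.
edge_indices = []
for _ in range(batch_size):
    num_edges = torch.randint(1, num_total_entities * 2, (1,)).item()
    edge_indices.append(torch.randint(0, num_total_entities, (2, num_edges)))

# No offsets: every graph in the batch writes into the same 32 node ids.
merged = torch.cat(edge_indices, dim=1)

num_nodes_seen = merged.unique().numel()
num_distinct_pairs = torch.unique(merged, dim=1).size(1)
possible_pairs = num_total_entities**2

print(f"edges in batch: {merged.size(1)}")
print(f"distinct node ids: {num_nodes_seen} (at most {num_total_entities})")
print(f"distinct (head, tail) pairs: {num_distinct_pairs} of {possible_pairs}")
```

With these settings a batch holds about 128 × 32 ≈ 4,000 edges on average, spread over only 32 node ids and 1,024 possible ordered (head, tail) pairs, so the merged graph is dense and the same pair can recur with different random labels. This confirms the merge itself; it does not by itself establish why the accuracy stays flat.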

Can someone explain why?