#### Step 1: Load Model
Before we begin transfer learning, we first have to load the model. This can be done in two ways: (1) load `model.pth`, which stores the model's architecture together with its weights, or (2) instantiate the model class itself, defined in `pyg_model.py`. Either way works, but (2) is a bit safer when dealing with unknown files, since loading a fully pickled model can execute arbitrary code. The cell below uses (1). After loading the model, we load the state dictionary `model_state_dict.pth`, which maps each parameter name to its tensor. It lets us reference specific layers of the model and is crucial for examining, extracting, or modifying them.

In [5]:
import torch
import copy
import sys
sys.path.append("../src")

# PC
model_name = "VICRegT1"
model_path = rf"C:\Users\xmoot\Desktop\Data\ssl-seizure-detection\patient_pyg\test\model\{model_name}.pth"
model_dict_path = rf"C:\Users\xmoot\Desktop\Data\ssl-seizure-detection\patient_pyg\test\model\{model_name}_state_dict.pth"

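# Load the full pickled model (architecture + weights).
# Note: PyTorch 2.6+ defaults to weights_only=True, so this call may need
# weights_only=False there; only do that for files you trust.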
model = torch.load(model_path)

# Load state dictionary
model_dict = torch.load(model_dict_path)

# Set the state dictionary to the model
model.load_state_dict(model_dict)
model.eval()

VICRegT1(
  (embedder): gnn_embedder2(
    (edge_mlp): EdgeMLP(
      (mlp): Sequential(
        (0): Linear(in_features=3, out_features=128, bias=True)
        (1): ReLU()
        (2): Linear(in_features=128, out_features=64, bias=True)
        (3): ReLU()
        (4): Linear(in_features=64, out_features=576, bias=True)
      )
    )
    (conv1): NNConv(9, 64, aggr=add, nn=EdgeMLP(
      (mlp): Sequential(
        (0): Linear(in_features=3, out_features=128, bias=True)
        (1): ReLU()
        (2): Linear(in_features=128, out_features=64, bias=True)
        (3): ReLU()
        (4): Linear(in_features=64, out_features=576, bias=True)
      )
    ))
    (conv2): GATConv(64, 128, heads=1)
    (conv3): GATConv(128, 128, heads=1)
    (net_dropout): Dropout(p=0.1, inplace=False)
    (fc1): Linear(in_features=128, out_features=512, bias=True)
    (fc2): Linear(in_features=512, out_features=512, bias=True)
    (fc3): Linear(in_features=512, out_features=512, bias=True)
    (bn_graph1): Bat
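
Option (2) from the start of this step can be sketched as follows. `TinyNet` is a hypothetical stand-in for the class defined in `pyg_model.py`, and the temporary file path is made up for the demo; this is a minimal sketch under those assumptions, not the project's actual loading code.

```python
import os
import tempfile

import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the model class defined in `pyg_model.py`."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(3, 2)

    def forward(self, x):
        return self.fc(x)


# Save only the state dictionary (tensors keyed by parameter name).
trained = TinyNet()
state_path = os.path.join(tempfile.mkdtemp(), "tiny_state_dict.pth")
torch.save(trained.state_dict(), state_path)

# Option (2): build the architecture from the class, then load the weights.
# map_location="cpu" keeps this working on machines without a GPU.
restored = TinyNet()
restored.load_state_dict(torch.load(state_path, map_location="cpu"))
restored.eval()

weights_match = torch.equal(trained.fc.weight, restored.fc.weight)
print(weights_match)
```

Because only tensors are read from disk, no class definitions are unpickled from the file, which is why this route is the more cautious one for files of unknown origin.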

#### Step 2: Extract Layers
In this step we extract the layers we want to reuse in the supervised downstream model. Here we need the NNConv and GATConv layers, but since NNConv depends on a separate module called EdgeMLP (which is just a multilayer perceptron), we need that too: it is essentially part of the NNConv layer's parameters. You could assign it the old-fashioned way with `EdgeMLP_module = model.embedder.edge_mlp`, but that only creates a reference to the same module, which causes issues later when we want two independent copies of `EdgeMLP_module` (one frozen, one unfrozen). We use the `copy` package instead.

In [6]:
edge_mlp = copy.deepcopy(model.embedder.edge_mlp)
conv1 = copy.deepcopy(model.embedder.conv1)
conv2 = copy.deepcopy(model.embedder.conv2)
conv3 = copy.deepcopy(model.embedder.conv3)
bn_graph1 = copy.deepcopy(model.embedder.bn_graph1)
bn_graph2 = copy.deepcopy(model.embedder.bn_graph2)
bn_graph3 = copy.deepcopy(model.embedder.bn_graph3)
bn1 = copy.deepcopy(model.embedder.bn1)
bn2 = copy.deepcopy(model.embedder.bn2)
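
To illustrate why a plain assignment is not enough, here is a small sketch on a toy `nn.Sequential` (a hypothetical stand-in for the pretrained model). It shows that an alias shares its parameters with the source, while a deep copy does not.

```python
import copy

import torch
import torch.nn as nn

# Toy stand-in for a pretrained model (hypothetical, for illustration only).
source = nn.Sequential(nn.Linear(3, 4), nn.ReLU(), nn.Linear(4, 2))

alias = source[0]                 # plain assignment: the very same module object
clone = copy.deepcopy(source[0])  # independent module with its own tensors

# Freezing through the alias also freezes the layer inside `source`.
for param in alias.parameters():
    param.requires_grad = False

source_frozen = not source[0].weight.requires_grad  # True: alias and source are one object
clone_trainable = clone.weight.requires_grad        # True: the clone is unaffected

# The alias points at the same memory; the clone owns separate storage.
alias_shares_memory = alias.weight.data_ptr() == source[0].weight.data_ptr()
clone_shares_memory = clone.weight.data_ptr() == source[0].weight.data_ptr()

print(source_frozen, clone_trainable, alias_shares_memory, clone_shares_memory)
```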

We can examine the weights of a layer with the following:

In [7]:
from models import set_requires_grad

for param_tensor in edge_mlp.state_dict():
    print(param_tensor, "\t", edge_mlp.state_dict()[param_tensor].size())

mlp.0.weight 	 torch.Size([128, 3])
mlp.0.bias 	 torch.Size([128])
mlp.2.weight 	 torch.Size([64, 128])
mlp.2.bias 	 torch.Size([64])
mlp.4.weight 	 torch.Size([576, 64])
mlp.4.bias 	 torch.Size([576])
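
The final size of 576 is not arbitrary: `NNConv(9, 64, ...)` expects its edge network to output `in_channels * out_channels = 9 * 64 = 576` values per edge, which are then viewed as one 9×64 weight matrix per edge. The sketch below rebuilds a network with the same layer sizes as the printed EdgeMLP, but with fresh random weights (it does not use the project's `EdgeMLP` class), to check the shapes and the parameter count.

```python
import torch
import torch.nn as nn

in_channels, out_channels, num_edge_features = 9, 64, 3

# Same layer sizes as the printed EdgeMLP; weights are random, not pretrained.
edge_net = nn.Sequential(
    nn.Linear(num_edge_features, 128),
    nn.ReLU(),
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, in_channels * out_channels),
)

edge_attr = torch.randn(10, num_edge_features)  # 10 edges, 3 features each
flat = edge_net(edge_attr)                      # one row of 576 values per edge

# Reinterpret each row as a per-edge weight matrix of shape [9, 64].
per_edge_weights = flat.view(-1, in_channels, out_channels)

# (3*128 + 128) + (128*64 + 64) + (64*576 + 576) = 46208 parameters.
num_params = sum(p.numel() for p in edge_net.parameters())
print(flat.shape, per_edge_weights.shape, num_params)
```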


And here's a test running some random input through the EdgeMLP, to verify it's functional.

In [8]:
# Create some dummy data
dummy_edge_attr = torch.randn(10, 3)  # 10 edges, each with 3 features (`num_edge_features` = 3)

# Ensure everything is on the same device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # fall back to CPU if no GPU
dummy_edge_attr = dummy_edge_attr.to(device)
edge_mlp = edge_mlp.to(device)

# Run the data through the `edge_mlp` layer
output = edge_mlp(dummy_edge_attr)
print(output)

tensor([[ 90.8198, 166.7563, -65.1881,  ...,  33.9478,  27.4430, -26.9420],
        [ 51.7824,  96.8568, -40.4229,  ...,  19.8774,  12.1248, -11.4472],
        [ 64.5977, 119.4870, -45.8949,  ...,  25.7579,  21.9166, -21.7708],
        ...,
        [ 11.1861,  28.3259,  -8.5658,  ...,   9.3842,  10.6208,  -6.5623],
        [ 90.4008, 165.8797, -62.6314,  ...,  34.0480,  30.7454, -30.2114],
        [ -8.4273,  21.2802,  -8.8721,  ...,  21.9474,  16.3903,  -1.0682]],
       device='cuda:0', grad_fn=<AddmmBackward0>)


#### Step 3: Downstream Task
After extracting the layers and verifying everything is functional, we can either (1) use the layers and their weights as initialization, or (2) use the layers but freeze the weights (i.e. they won't be updated during training). The cell below uses method (1): our transferred layers become the initial layers of the network, and we add new (untrained) layers on top of them. I've opted to use another `NNConv` and `GATConv` layer from `PyG`, added onto the existing `NNConv` and `GATConv` layers, as well as a `global_mean_pool` layer and two fully connected layers. Now we're ready to go!

In [11]:
from models import downstream3

pretrained_layers = {
    "edge_mlp": edge_mlp,
    "conv1": conv1,
    "conv2": conv2,
    "conv3": conv3,
    "bn_graph1": bn_graph1,
    "bn_graph2": bn_graph2,
    "bn_graph3": bn_graph3,
}


config = {"classify": "multiclass", "head": "linear"}

model = downstream3(config, pretrained_layers=pretrained_layers, requires_grad=True).to(device)
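
The difference between methods (1) and (2) comes down to the `requires_grad` flag on each parameter. The following is a minimal sketch on toy linear layers; the `freeze` helper is hypothetical and is not necessarily how `downstream3` or `set_requires_grad` in `models` handle it.

```python
import torch
import torch.nn as nn


def freeze(module: nn.Module, requires_grad: bool = False) -> nn.Module:
    """Hypothetical helper: set requires_grad on every parameter of a module."""
    for param in module.parameters():
        param.requires_grad = requires_grad
    return module


encoder = freeze(nn.Linear(3, 8))  # stands in for the transferred layers
head = nn.Linear(8, 3)             # new, untrained layers
toy_model = nn.Sequential(encoder, nn.ReLU(), head)

# Hand only the trainable parameters to the optimizer.
trainable = [p for p in toy_model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)

before = encoder.weight.detach().clone()
loss = toy_model(torch.randn(16, 3)).sum()
loss.backward()
optimizer.step()

encoder_unchanged = torch.equal(before, encoder.weight)  # frozen weights did not move
encoder_has_no_grad = encoder.weight.grad is None        # no gradient was computed for them
print(encoder_unchanged, encoder_has_no_grad)
```

Calling `freeze(encoder, requires_grad=True)` instead would correspond to method (1), where the transferred weights only serve as the starting point.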

We can check whether the pretrained layers are frozen with the following.

In [15]:
def check_frozen_status(model):
    # Checking the encoder
    print("Checking Encoder:")
    for name, param in model.encoder.named_parameters():
        print(f"Layer: {name}, Frozen: {not param.requires_grad}")

    # Checking the classifier
    print("\nChecking Classifier:")
    for name, param in model.classifier.named_parameters():
        print(f"Layer: {name}, Frozen: {not param.requires_grad}")

# Rebuild the downstream model, this time with the pretrained layers frozen (requires_grad=False)

config = {"classify": "multiclass", "head": "softmax"}
model = downstream3(config, pretrained_layers=pretrained_layers, requires_grad=False).to(device)

# Check if the encoder and classifier layers are frozen or not
check_frozen_status(model)


Checking Encoder:
Layer: conv1.bias, Frozen: True
Layer: conv1.nn.mlp.0.weight, Frozen: True
Layer: conv1.nn.mlp.0.bias, Frozen: True
Layer: conv1.nn.mlp.2.weight, Frozen: True
Layer: conv1.nn.mlp.2.bias, Frozen: True
Layer: conv1.nn.mlp.4.weight, Frozen: True
Layer: conv1.nn.mlp.4.bias, Frozen: True
Layer: conv1.lin.weight, Frozen: True
Layer: conv1.edge_mlp.mlp.0.weight, Frozen: True
Layer: conv1.edge_mlp.mlp.0.bias, Frozen: True
Layer: conv1.edge_mlp.mlp.2.weight, Frozen: True
Layer: conv1.edge_mlp.mlp.2.bias, Frozen: True
Layer: conv1.edge_mlp.mlp.4.weight, Frozen: True
Layer: conv1.edge_mlp.mlp.4.bias, Frozen: True
Layer: conv2.att_src, Frozen: True
Layer: conv2.att_dst, Frozen: True
Layer: conv2.bias, Frozen: True
Layer: conv2.lin_src.weight, Frozen: True
Layer: conv3.att_src, Frozen: True
Layer: conv3.att_dst, Frozen: True
Layer: conv3.bias, Frozen: True
Layer: conv3.lin_src.weight, Frozen: True
Layer: bn_graph1.module.weight, Frozen: True
Layer: bn_graph1.module.bias, Frozen: T

#### Finetuning on Downstream Task

In [16]:
import sys
import torch
sys.path.append("../src")
from preprocess import create_data_loaders

# PC
data_path = r"C:\Users\xmoot\Desktop\Data\ssl-seizure-detection\patient_pyg\jh101\supervised\jh101_combined.pt"
data = torch.load(data_path)

loaders, _ = create_data_loaders(data, val_ratio=0.2, test_ratio=0.1, batch_size=32, num_workers=4, model_id="downstream3", train_ratio=0.2)

Total number of examples in dataset: 2307.
Total number of examples used: 2307.
Number of training examples: 461. Number of training batches: 15.
Number of validation examples: 461. Number of validation batches: 15.
Number of test examples: 230. Number of test batches: 8.
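
The printed counts are consistent with flooring each ratio times the dataset size and rounding the number of batches up. The arithmetic below reproduces them; that `create_data_loaders` computes the sizes exactly this way is an assumption inferred from the output, not taken from its source.

```python
import math

total, batch_size = 2307, 32
ratios = {"train": 0.2, "val": 0.2, "test": 0.1}

# Assumed rule: floor(total * ratio) examples per split.
sizes = {name: int(total * ratio) for name, ratio in ratios.items()}

# The last batch may be partial, so round the batch count up.
batches = {name: math.ceil(n / batch_size) for name, n in sizes.items()}

print(sizes)    # {'train': 461, 'val': 461, 'test': 230}
print(batches)  # {'train': 15, 'val': 15, 'test': 8}
```

Note that the three ratios sum to 0.5, so under this reading the loaders hold 1152 of the 2307 examples.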


In [17]:
train_loader, val_loader, test_loader = loaders

for batch in train_loader:
    batch = batch.to(device)
    output = model(batch)
    row_sums = torch.sum(output, dim=1)
    print(f"Output size: {output.size()}") # We should expect a size of [batch_size, 3]
    print(f"Model output: {output}") 
    print(f"Row sums: {row_sums}") # We should expect the row sums to be 1, by the softmax head
    break

Output size: torch.Size([32, 3])
Model output: tensor([[1.6486e-04, 7.0911e-05, 9.9976e-01],
        [4.8101e-06, 2.1530e-04, 9.9978e-01],
        [1.4486e-08, 1.2311e-06, 1.0000e+00],
        [2.5720e-01, 6.4257e-02, 6.7855e-01],
        [6.1876e-06, 2.4727e-04, 9.9975e-01],
        [4.1747e-07, 2.3232e-05, 9.9998e-01],
        [8.0231e-03, 2.0813e-02, 9.7116e-01],
        [2.2811e-08, 1.7397e-08, 1.0000e+00],
        [1.7307e-08, 1.9248e-07, 1.0000e+00],
        [2.0880e-09, 2.3274e-08, 1.0000e+00],
        [1.1480e-09, 3.4492e-08, 1.0000e+00],
        [8.6305e-07, 3.7996e-06, 1.0000e+00],
        [5.6311e-07, 3.5634e-08, 1.0000e+00],
        [3.1421e-15, 5.4503e-15, 1.0000e+00],
        [1.4451e-02, 3.9844e-02, 9.4570e-01],
        [2.5450e-03, 2.0745e-02, 9.7671e-01],
        [1.3459e-02, 4.0130e-06, 9.8654e-01],
        [3.4352e-07, 2.1390e-06, 1.0000e+00],
        [3.1483e-09, 1.3756e-07, 1.0000e+00],
        [4.2724e-07, 1.8677e-05, 9.9998e-01],
        [2.6758e-04, 1.6453e-07, 
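
Rather than checking the row sums by eye, the same check can be automated. This sketch uses random logits in place of real model output (an assumption for the demo) and verifies that a softmax over `dim=1` yields rows that sum to 1 within floating-point tolerance.

```python
import torch

torch.manual_seed(0)

# Random logits stand in for the pre-softmax output of the model.
logits = torch.randn(32, 3)           # [batch_size, num_classes]
probs = torch.softmax(logits, dim=1)  # normalise across the class dimension

row_sums = probs.sum(dim=1)
rows_sum_to_one = torch.allclose(row_sums, torch.ones(32), atol=1e-6)

# The predicted class is the index of the largest probability in each row.
predicted = probs.argmax(dim=1)
print(rows_sum_to_one, predicted.shape)
```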

#### Automatic Transfer Learning
If you want to do all of the above in one step, see below. Note that this is implemented in `train.py` when you select the `downstream1` or `downstream2` models.

In [18]:
import sys
import torch
sys.path.append("../src")

from preprocess import extract_layers
from models import downstream3

# PC
transfer_id = "VICRegT1"
model_path = rf"C:\Users\xmoot\Desktop\Data\ssl-seizure-detection\patient_pyg\test\model\{transfer_id}.pth"
model_dict_path = rf"C:\Users\xmoot\Desktop\Data\ssl-seizure-detection\patient_pyg\test\model\{transfer_id}_state_dict.pth"

# Extract pretrained layers
extracted_layers = extract_layers(model_path, model_dict_path, transfer_id)

config = {
    "classify": "multiclass",
    "head": "softmax",
}

# Create downstream model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = downstream3(config, pretrained_layers=extracted_layers, requires_grad=False).to(device)
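
Conceptually, a one-step extraction bundles the deep copies from Step 2 into a dictionary. The sketch below shows that idea on a toy model; `collect_layers`, `ToyEmbedder`, and `ToyPretrained` are hypothetical names, and this is not the implementation of `extract_layers` in `preprocess`.

```python
import copy

import torch
import torch.nn as nn


def collect_layers(model: nn.Module, paths):
    """Hypothetical helper: deep-copy submodules found at dotted paths.

    The returned dict is keyed by the last component of each path.
    """
    return {
        path.split(".")[-1]: copy.deepcopy(model.get_submodule(path))
        for path in paths
    }


class ToyEmbedder(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Linear(9, 64)    # linear layers stand in for graph convolutions
        self.conv2 = nn.Linear(64, 128)


class ToyPretrained(nn.Module):
    def __init__(self):
        super().__init__()
        self.embedder = ToyEmbedder()


pretrained = ToyPretrained()
layers = collect_layers(pretrained, ["embedder.conv1", "embedder.conv2"])

# Copies carry the same values but are separate objects from the originals.
same_values = torch.equal(layers["conv1"].weight, pretrained.embedder.conv1.weight)
is_separate = layers["conv1"] is not pretrained.embedder.conv1
print(sorted(layers), same_values, is_separate)
```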

Load the data for supervised learning.

In [None]:
# Paths
from preprocess import create_data_loaders

# Mac
# data_path = "/Users/xaviermootoo/Documents/Data/ssl-seizure-detection/patient_pyg/jh101/supervised/jh101_run1.pt"

# PC
data_path = r"C:\Users\xmoot\Desktop\Data\ssl-seizure-detection\patient_pyg\jh101\supervised\jh101_run1.pt"
data = torch.load(data_path)
loaders, _ = create_data_loaders(data, val_ratio=0.2, test_ratio=0.1, batch_size=32, num_workers=4, model_id="supervised")
train_loader = loaders[0]

And we can see that the model output is working!

In [20]:
for batch in train_loader:
    batch = batch.to(device)
    print(f"Model Output: {model(batch)}")
    break

Model Output: tensor([[6.9798e-01, 7.7237e-03, 2.9430e-01],
        [4.7733e-01, 2.4406e-01, 2.7862e-01],
        [6.1853e-01, 3.6772e-02, 3.4470e-01],
        [8.0432e-01, 1.8714e-02, 1.7696e-01],
        [6.0336e-01, 1.8149e-01, 2.1514e-01],
        [9.1654e-01, 2.0765e-02, 6.2692e-02],
        [2.1562e-01, 1.7578e-01, 6.0861e-01],
        [2.0229e-01, 5.4471e-02, 7.4324e-01],
        [3.5535e-03, 3.0523e-03, 9.9339e-01],
        [6.3338e-03, 3.1206e-03, 9.9055e-01],
        [2.5598e-01, 1.8944e-01, 5.5458e-01],
        [6.1665e-01, 5.4741e-02, 3.2861e-01],
        [5.9272e-01, 3.2995e-02, 3.7429e-01],
        [5.6594e-01, 1.9310e-01, 2.4096e-01],
        [5.1703e-01, 1.6244e-01, 3.2053e-01],
        [9.3663e-01, 3.0902e-03, 6.0276e-02],
        [7.8324e-01, 3.6755e-02, 1.8000e-01],
        [7.9359e-01, 1.8001e-02, 1.8841e-01],
        [4.6237e-01, 4.4573e-02, 4.9305e-01],
        [5.6424e-01, 1.1708e-01, 3.1868e-01],
        [7.3546e-01, 7.2142e-03, 2.5733e-01],
        [7.2357e-01,