#### Step 1: Load Model
Before we begin transfer learning, we first have to load the model. There are two ways to do this: (1) load `model.pth`, which stores the model's architecture and weights together as a pickled object, or (2) instantiate the model class defined in `pyg_model.py` and load only the weights into it. Either way works, but (2) is safer when dealing with files you don't fully trust, because unpickling a full model can execute arbitrary code. After loading the model, we load the state dictionary `model_state_dict.pth`. It maps each parameter name (for example `encoder.conv1.lin.weight`) to its tensor. These names are what let us reference specific layers of the model, which is crucial for examining, extracting, or modifying its underlying architecture.

In [1]:
import torch
import copy
import sys
sys.path.append("../src")

# Mac
model_path = "/Users/xaviermootoo/Documents/Data/ssl-seizure-detection/models/jh101/model/temporal_shuffling.pth"
model_dict_path = "/Users/xaviermootoo/Documents/Data/ssl-seizure-detection/models/jh101/model/temporal_shuffling_state_dict.pth"

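# Load the full model (architecture + weights). This unpickles arbitrary Python objects,
# and newer PyTorch versions default to weights_only=True, so it must be disabled here.
# Only do this with files you trust.
model = torch.load(model_path, map_location="cpu", weights_only=False)

# Load the state dictionary (parameter name -> tensor), which only contains tensors
model_dict = torch.load(model_dict_path, map_location="cpu", weights_only=True)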

# Set the state dictionary to the model
model.load_state_dict(model_dict)
model.eval()

temporal_shuffling(
  (encoder): gnn_encoder(
    (edge_mlp): EdgeMLP(
      (mlp): Sequential(
        (0): Linear(in_features=3, out_features=128, bias=True)
        (1): ReLU()
        (2): Linear(in_features=128, out_features=64, bias=True)
        (3): ReLU()
        (4): Linear(in_features=64, out_features=576, bias=True)
      )
    )
    (conv1): NNConv(9, 64, aggr=add, nn=EdgeMLP(
      (mlp): Sequential(
        (0): Linear(in_features=3, out_features=128, bias=True)
        (1): ReLU()
        (2): Linear(in_features=128, out_features=64, bias=True)
        (3): ReLU()
        (4): Linear(in_features=64, out_features=576, bias=True)
      )
    ))
    (conv2): GATConv(64, 32, heads=1)
    (fc1): Linear(in_features=32, out_features=64, bias=True)
    (fc2): Linear(in_features=64, out_features=128, bias=True)
    (fc3): Linear(in_features=128, out_features=256, bias=True)
  )
  (fc): Linear(in_features=512, out_features=1, bias=True)
)
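The cell above uses approach (1). Approach (2) rebuilds the architecture from code and loads only the weights. Below is a minimal sketch of that pattern, with a hypothetical `TinyModel` standing in for the class in `pyg_model.py` and a temporary file standing in for the real `.pth` paths:

```python
import os
import tempfile

import torch
import torch.nn as nn


# Hypothetical stand-in for the class defined in `pyg_model.py`;
# the real model has a different architecture.
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 1)

    def forward(self, x):
        return self.fc(x)


# Save only the weights (a state dict), not the pickled module object.
original = TinyModel()
path = os.path.join(tempfile.mkdtemp(), "tiny_state_dict.pth")
torch.save(original.state_dict(), path)

# Approach (2): rebuild the architecture from code, then load the weights.
# weights_only=True restricts unpickling to tensors and plain containers.
restored = TinyModel()
state_dict = torch.load(path, map_location="cpu", weights_only=True)
restored.load_state_dict(state_dict)
restored.eval()

print(list(state_dict.keys()))  # ['fc.weight', 'fc.bias']
```

Because the class comes from your own code, nothing in the file can define new Python objects; the file only supplies tensors.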

#### Step 2: Extract Layers
In this step we extract the layers we want to reuse in the downstream supervised model. Here we need the `NNConv` and `GATConv` layers from our model. Our `NNConv` also depends on a separate module called `EdgeMLP` (just a multilayer perceptron), which generates the `NNConv` weights from the edge features and is therefore effectively part of the `NNConv` layer's parameters, so we extract it too. You could assign it the old-fashioned way with `EdgeMLP_module = model.encoder.edge_mlp`, but that only creates a second reference to the same module, not a copy. Later, when we make two copies of the module for freezing and unfreezing, changes to one would then affect the other. So we use `copy.deepcopy` instead, which creates independent copies.

In [2]:
EdgeMLP_pretrained = copy.deepcopy(model.encoder.edge_mlp)
NNConv_pretrained = copy.deepcopy(model.encoder.conv1)
GATConv_pretrained = copy.deepcopy(model.encoder.conv2)
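The difference between plain assignment and `copy.deepcopy` is easy to see on a toy `nn.Sequential` standing in for `EdgeMLP` (illustrative only, not the real module):

```python
import copy

import torch.nn as nn

# Toy stand-in for the pretrained EdgeMLP submodule.
pretrained = nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, 4))

alias = pretrained                       # plain assignment: the same object
frozen_copy = copy.deepcopy(pretrained)  # independent module with its own parameters

# Freezing the deep copy leaves the original untouched.
for p in frozen_copy.parameters():
    p.requires_grad = False
trainable_after_copy_freeze = pretrained[0].weight.requires_grad

# Freezing through the alias also freezes the "original": they are one module.
for p in alias.parameters():
    p.requires_grad = False
trainable_after_alias_freeze = pretrained[0].weight.requires_grad

print(alias is pretrained)                            # True
print(frozen_copy[0].weight is pretrained[0].weight)  # False
print(trainable_after_copy_freeze)                    # True
print(trainable_after_alias_freeze)                   # False
```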

We can inspect the name and shape of each parameter in a layer with the following:

In [3]:
for name, tensor in EdgeMLP_pretrained.state_dict().items():
    print(name, "\t", tensor.size())


mlp.0.weight 	 torch.Size([128, 3])
mlp.0.bias 	 torch.Size([128])
mlp.2.weight 	 torch.Size([64, 128])
mlp.2.bias 	 torch.Size([64])
mlp.4.weight 	 torch.Size([576, 64])
mlp.4.bias 	 torch.Size([576])
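These shapes explain how `EdgeMLP` feeds `NNConv(9, 64)`: its last layer outputs 576 = 9 × 64 values per edge, which `NNConv` reshapes into a 9 × 64 weight matrix for that edge. The following is a plain-PyTorch sketch of the resulting message passing with random stand-in weights. It omits `NNConv`'s root weight (`lin`) and `bias`, and it is not PyG's actual implementation:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

in_channels, out_channels, num_edge_features = 9, 64, 3
assert in_channels * out_channels == 576  # width of EdgeMLP's last layer

# Stand-in EdgeMLP with the same layer sizes as the printed model (random weights).
edge_mlp = nn.Sequential(
    nn.Linear(num_edge_features, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, in_channels * out_channels),
)

num_nodes, num_edges = 5, 10
x = torch.randn(num_nodes, in_channels)                    # node features
edge_index = torch.randint(0, num_nodes, (2, num_edges))  # rows: source, target
edge_attr = torch.randn(num_edges, num_edge_features)

# One weight matrix per edge: (E, 576) -> (E, 9, 64)
edge_weights = edge_mlp(edge_attr).view(-1, in_channels, out_channels)

src, dst = edge_index
# Message for each edge: x_src (1 x 9) @ W_e (9 x 64) -> (64,)
messages = torch.bmm(x[src].unsqueeze(1), edge_weights).squeeze(1)

# aggr="add": sum the messages arriving at each target node
out = torch.zeros(num_nodes, out_channels).index_add(0, dst, messages)
print(edge_weights.shape, out.shape)  # torch.Size([10, 9, 64]) torch.Size([5, 64])
```

Each node's new features are the sum, over its incoming edges, of the neighbor's features multiplied by that edge's own matrix, which is why the pretrained `EdgeMLP` has to travel together with `NNConv`.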


As a quick check that the EdgeMLP is functional, we run some random input through it.

In [4]:
# Create some dummy data
dummy_edge_attr = torch.randn(10, 3)  # 10 edges, each with 3 edge features

# Run the data through the `edge_mlp` layer
output = EdgeMLP_pretrained(dummy_edge_attr)
print(output)

tensor([[ 0.0321,  0.0009,  0.0084,  ...,  0.0310, -0.0705,  0.0445],
        [-0.0479,  0.0043, -0.0925,  ...,  0.0205, -0.1029,  0.1922],
        [-0.0017, -0.1137, -0.0789,  ...,  0.1040, -0.0402,  0.0833],
        ...,
        [-0.0005, -0.0056,  0.0161,  ...,  0.0091, -0.0817,  0.0355],
        [-0.0350, -0.0275, -0.0349,  ...,  0.0347, -0.0643,  0.1101],
        [-0.1118,  0.0496, -0.0055,  ...,  0.0199,  0.0280,  0.0592]],
       grad_fn=<AddmmBackward0>)


#### Step 3: Downstream Task
After extracting the layers and verifying that they work, we can either (1) use the layers and their weights as an initialization that keeps being updated during training, or (2) use the layers but freeze their weights, so they are not updated during training. The code below uses method (1) (`frozen=False`): the transferred layers form the first layers of the network, and new, untrained layers are stacked on top. I've opted to add another `NNConv` and `GATConv` layer from `PyG` after the existing pretrained `NNConv` and `GATConv` layers, followed by a `global_mean_pool` layer and two fully connected layers. Now we're ready to go!

In [6]:
from models import supervised_downstream1
config = {
    "hidden_channels": [64, 64, 32],
    "dropout": 0.1,
}

pretrained_layers = [EdgeMLP_pretrained, NNConv_pretrained, GATConv_pretrained]

model = supervised_downstream1(config, pretrained_layers, frozen=False)
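The `frozen` flag selects between the two methods. Its implementation isn't shown here, so the following is only a plain-PyTorch sketch of what the two options typically mean, with a hypothetical `build_downstream` function and an `nn.Linear` standing in for the pretrained layers. With `frozen=False` the copied weights are just a starting point; with `frozen=True` their `requires_grad` is switched off, so the optimizer never changes them:

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a pretrained layer such as GATConv_pretrained.
pretrained = nn.Linear(8, 8)


def build_downstream(pretrained_layer: nn.Module, frozen: bool) -> nn.Sequential:
    """Stack a new head on a copy of a pretrained layer (illustrative only)."""
    layer = copy.deepcopy(pretrained_layer)
    for p in layer.parameters():
        p.requires_grad = not frozen  # frozen=True: weights stay fixed
    head = nn.Linear(8, 1)  # new, randomly initialized layer
    return nn.Sequential(layer, nn.ReLU(), head)


model_init = build_downstream(pretrained, frozen=False)   # method (1)
model_frozen = build_downstream(pretrained, frozen=True)  # method (2)

# Give the optimizer only the parameters that should be updated.
optimizer = torch.optim.Adam(
    (p for p in model_frozen.parameters() if p.requires_grad), lr=1e-3
)

x, y = torch.randn(16, 8), torch.randn(16, 1)
before = model_frozen[0].weight.detach().clone()
loss = nn.functional.mse_loss(model_frozen(x), y)
loss.backward()
optimizer.step()

print(torch.equal(before, model_frozen[0].weight))  # True: the frozen layer did not change
```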

In [28]:
def check_frozen_status(model):
    layers_to_check = ["conv1", "conv2"]  # Names of the layers in your model that are pretrained

    for layer_name in layers_to_check:
        layer = getattr(model, layer_name)
        for name, param in layer.named_parameters():
            print(f"Layer: {layer_name}, Parameter: {name}, Frozen: {not param.requires_grad}")

# Check if the pretrained layers are frozen or not
check_frozen_status(model)


Layer: conv1, Parameter: bias, Frozen: True
Layer: conv1, Parameter: nn.mlp.0.weight, Frozen: True
Layer: conv1, Parameter: nn.mlp.0.bias, Frozen: True
Layer: conv1, Parameter: nn.mlp.2.weight, Frozen: True
Layer: conv1, Parameter: nn.mlp.2.bias, Frozen: True
Layer: conv1, Parameter: nn.mlp.4.weight, Frozen: True
Layer: conv1, Parameter: nn.mlp.4.bias, Frozen: True
Layer: conv1, Parameter: lin.weight, Frozen: True
Layer: conv1, Parameter: edge_mlp.mlp.0.weight, Frozen: True
Layer: conv1, Parameter: edge_mlp.mlp.0.bias, Frozen: True
Layer: conv1, Parameter: edge_mlp.mlp.2.weight, Frozen: True
Layer: conv1, Parameter: edge_mlp.mlp.2.bias, Frozen: True
Layer: conv1, Parameter: edge_mlp.mlp.4.weight, Frozen: True
Layer: conv1, Parameter: edge_mlp.mlp.4.bias, Frozen: True
Layer: conv2, Parameter: att_src, Frozen: True
Layer: conv2, Parameter: att_dst, Frozen: True
Layer: conv2, Parameter: bias, Frozen: True
Layer: conv2, Parameter: lin_src.weight, Frozen: True
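`check_frozen_status` relies on hard-coded layer names. A more general sketch, using a hypothetical `summarize_trainable` helper, counts trainable and frozen parameter elements for every top-level submodule of any `nn.Module`:

```python
import torch.nn as nn


def summarize_trainable(model: nn.Module) -> dict:
    """Count trainable and frozen parameter elements per top-level child module."""
    summary = {}
    for child_name, child in model.named_children():
        trainable = sum(p.numel() for p in child.parameters() if p.requires_grad)
        frozen = sum(p.numel() for p in child.parameters() if not p.requires_grad)
        summary[child_name] = {"trainable": trainable, "frozen": frozen}
    return summary


# Toy model with a frozen "conv1" and a trainable "fc" (stand-ins, not the real layers).
toy = nn.Sequential()
toy.add_module("conv1", nn.Linear(4, 4))
toy.add_module("fc", nn.Linear(4, 1))
for p in toy.conv1.parameters():
    p.requires_grad = False

print(summarize_trainable(toy))
# {'conv1': {'trainable': 0, 'frozen': 20}, 'fc': {'trainable': 5, 'frozen': 0}}
```

Counting elements rather than listing every tensor keeps the output short for larger models, and a nonzero `frozen` count under an unexpected layer is easy to spot.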


#### Fine-Tuning on the Downstream Task

In [24]:
from preprocess import create_data_loaders

# Paths
data_path = "/Users/xaviermootoo/Documents/Data/ssl-seizure-detection/patient_pyg/jh101/supervised/jh101_run1.pt"
data = torch.load(data_path, weights_only=False)  # pickled dataset; newer PyTorch defaults to weights_only=True

loaders, _ = create_data_loaders(data, data_size=1.0, val_ratio=0.2, test_ratio=0.1, batch_size=32, num_workers=4, model_id="supervised")

Total number of examples in dataset: 1113.
Total number of examples used: 1113.
Number of training examples: 890. Number of training batches: 28.
Number of validation examples: 223. Number of validation batches: 7.
Number of test examples: 112. Number of test batches: 4.
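The batch counts follow from ceiling division of each split size by `batch_size=32`, assuming the loaders keep the final partial batch (the `drop_last=False` default of PyTorch's `DataLoader`). One consequence is that the last training batch holds only 26 graphs, so downstream code shouldn't assume a batch size of exactly 32:

```python
import math

batch_size = 32
split_sizes = {"train": 890, "val": 223, "test": 112}  # from the printed output above

# Number of batches when the last, partial batch is kept
num_batches = {name: math.ceil(n / batch_size) for name, n in split_sizes.items()}

# Size of the final training batch
last_train_batch = split_sizes["train"] - (num_batches["train"] - 1) * batch_size

print(num_batches)       # {'train': 28, 'val': 7, 'test': 4}
print(last_train_batch)  # 26
```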


In [25]:
train_loader, val_loader, test_loader = loaders

for batch in train_loader:
    print(f"Model output: {model(batch).size()}")
    break

Model output: torch.Size([32])
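The model returns one value per graph, as the `torch.Size([32])` output shows. As a rough sketch of one fine-tuning step, assuming that value is a raw logit and that the labels are binary, a standard choice is `nn.BCEWithLogitsLoss`. The model, features, and labels below are hypothetical stand-ins; with the real loaders you would pass a PyG batch to the model and take the labels from the batch (commonly `batch.y` for PyG `Data` objects, also an assumption here):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the downstream model: maps a feature vector to one logit.
toy_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(
    (p for p in toy_model.parameters() if p.requires_grad), lr=1e-3
)
criterion = nn.BCEWithLogitsLoss()  # applies the sigmoid internally

features = torch.randn(32, 16)               # stand-in for one batch of 32 graphs
labels = torch.randint(0, 2, (32,)).float()  # hypothetical binary labels

toy_model.train()
optimizer.zero_grad()
logits = toy_model(features).squeeze(-1)  # shape (32,), like the output above
loss = criterion(logits, labels)
loss.backward()
optimizer.step()

print(logits.shape, loss.item())
```

Filtering on `requires_grad` when building the optimizer means the same loop works whether the pretrained layers were created with `frozen=False` or `frozen=True`.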


#### Automatic Transfer
If you want to do all of the above in one step, see below. Note that this is implemented in `train.py` when you select the `downstream1` or `downstream2` model.

In [1]:
import sys
import torch
sys.path.append("../src")

from preprocess import extract_layers
from models import downstream1, downstream2

# Select the paths for the current machine
if sys.platform == "darwin":  # Mac
    model_path = "/Users/xaviermootoo/Documents/Data/ssl-seizure-detection/models/jh101/model/temporal_shuffling.pth"
    model_dict_path = "/Users/xaviermootoo/Documents/Data/ssl-seizure-detection/models/jh101/model/temporal_shuffling_state_dict.pth"
else:  # PC
    model_path = r"C:\Users\xmoot\Desktop\Models\ssl-seizure-detection\jh101\model\temporal_shuffling.pth"
    model_dict_path = r"C:\Users\xmoot\Desktop\Models\ssl-seizure-detection\jh101\model\temporal_shuffling_state_dict.pth"

extracted_layers = extract_layers(model_path, model_dict_path, "temporal_shuffling")

config1 = {
    "hidden_channels": [64, 64, 32],
    "dropout": 0.1,
}

config2 = {
    "hidden_channels": 32,
    "dropout": 0.1,
}

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model1 = downstream1(config1, extracted_layers, frozen=False).to(device)
model2 = downstream2(config2, extracted_layers, frozen=False).to(device)
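`extract_layers` wraps Steps 1 and 2, and its internals aren't shown here. A rough sketch of the extraction part, with a hypothetical `extract_submodules` helper that deep-copies submodules by their dotted names (the names mirror the printed `temporal_shuffling` model, but the toy modules do not):

```python
import copy

import torch.nn as nn


def extract_submodules(model: nn.Module, names):
    """Return independent deep copies of the named submodules (hypothetical helper)."""
    return [copy.deepcopy(model.get_submodule(name)) for name in names]


# Toy model with the same attribute layout as `model.encoder.*` above.
toy = nn.Module()
toy.encoder = nn.Module()
toy.encoder.edge_mlp = nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, 4))
toy.encoder.conv2 = nn.Linear(4, 2)

layers = extract_submodules(toy, ["encoder.edge_mlp", "encoder.conv2"])
print([type(m).__name__ for m in layers])            # ['Sequential', 'Linear']
print(layers[1].weight is toy.encoder.conv2.weight)  # False: a copy, not a reference
```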



Load the data for supervised learning.

In [2]:
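from preprocess import create_data_loaders

# Paths: select the data file for the current machine
if sys.platform == "darwin":  # Mac
    data_path = "/Users/xaviermootoo/Documents/Data/ssl-seizure-detection/patient_pyg/jh101/supervised/jh101_run1.pt"
else:  # PC
    data_path = r"C:\Users\xmoot\Desktop\Data\ssl-seizure-detection\patient_pyg\jh101\supervised\jh101_run1.pt"

# Pickled non-tensor objects need weights_only=False on newer PyTorch versions
data = torch.load(data_path, weights_only=False)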
loaders, _ = create_data_loaders(data, data_size=0.1, val_ratio=0.2, test_ratio=0.1, batch_size=32, num_workers=4, model_id="supervised")
train_loader = loaders[0]

Total number of examples in dataset: 1113.
Total number of examples used: 1113.
Number of training examples: 890. Number of training batches: 28.
Number of validation examples: 223. Number of validation batches: 7.
Number of test examples: 112. Number of test batches: 4.


Both models run on the batch and return one output per graph!

In [4]:
for batch in train_loader:
    batch = batch.to(device)
    print(f"Model Output: {model1(batch)}")
    print(f"Model Output: {model2(batch)}")
    break

Model Output: tensor([5.5096, 3.2366, 6.4307, 3.9484, 4.0148, 4.6887, 5.0815, 4.0649, 4.1764,
        3.5129, 5.6768, 6.5692, 3.6030, 4.9131, 2.8050, 3.2610, 3.0446, 2.3526,
        6.0145, 1.4550, 7.0167, 5.3131, 4.2521, 3.1954, 4.2855, 3.9693, 4.2903,
        2.9354, 2.5056, 4.2362, 5.4698, 3.6607], device='cuda:0',
       grad_fn=<SqueezeBackward1>)
Model Output: tensor([0.9967, 0.5494, 1.3112, 0.5155, 0.8721, 0.7237, 0.7408, 0.8231, 0.8585,
        0.9748, 0.8397, 0.9853, 0.4972, 0.6344, 0.6492, 0.8180, 1.0487, 0.8021,
        0.9034, 0.6751, 0.8598, 0.9663, 0.9806, 0.8835, 1.2074, 0.6641, 1.3122,
        0.7555, 0.9694, 0.6843, 0.7117, 0.7173], device='cuda:0',
       grad_fn=<SqueezeBackward1>)
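To check a fine-tuned model on the validation split, a minimal evaluation sketch could look like the following. It uses the same assumption that the model returns raw logits for a binary label. A toy linear model and a random `TensorDataset` stand in for the real model and `val_loader`, and the 0.5 threshold is an arbitrary choice:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-ins: one logit per example, random binary labels.
toy_model = nn.Sequential(nn.Linear(16, 1))
val_set = TensorDataset(torch.randn(100, 16), torch.randint(0, 2, (100,)).float())
toy_val_loader = DataLoader(val_set, batch_size=32)


@torch.no_grad()
def evaluate_accuracy(model: nn.Module, loader: DataLoader, threshold: float = 0.5) -> float:
    """Fraction of examples whose thresholded sigmoid(logit) matches the label."""
    model.eval()  # disables dropout
    correct, total = 0, 0
    for features, labels in loader:
        probs = torch.sigmoid(model(features).squeeze(-1))
        preds = (probs >= threshold).float()
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total


acc = evaluate_accuracy(toy_model, toy_val_loader)
print(f"Validation accuracy: {acc:.3f}")
```

Calling `model.eval()` matters here because the downstream configs use `dropout: 0.1`, which should be off when measuring validation performance.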
