## Example of GraphCL

In [1]:
from sslgraph.utils import Encoder, EvalSemisupevised, EvalUnsupevised, get_dataset
from sslgraph.contrastive.model import GraphCL

### 1. Semi-supervised learning

#### Load dataset

In this example, we evaluate model on NCI1 dataset in the semi-supervised setting.

In [2]:
dataset, dataset_pretrain = get_dataset('DD', task='semisupervised')
feat_dim = dataset[0].x.shape[1]
embed_dim = 128

#### Define your encoder and contrastive model (GraphCL)

For semi-supervised setting, GraphCL uses ResGCN. 

Available augmentation includes: dropN, maskN, permE, subgraph, random[2-4].

In [3]:
encoder = Encoder(feat_dim, embed_dim, n_layers=3, gnn='resgcn')
graphcl = GraphCL(embed_dim, aug_1='random2', aug_2='random2')

#### Define evaluatior instance

In this example, we use a label rate of 1%.

To setup configurations (num of epochs, learning rates, etc. for pretraining and finetuning), run


`evaluator.setup_train_config(batch_size = 128,
    p_optim = 'Adam', p_lr = 0.0001, p_weight_decay = 0, p_epoch = 100,
    f_optim = 'Adam', f_lr = 0.001, f_weight_decay = 0, f_epoch = 100)`


In [4]:
evaluator = EvalSemisupevised(dataset, dataset_pretrain, label_rate=0.1)

#### Perform evaluation

You can also perform evaluation with grid search on pre-training epoch and
learning rate by running
``
evaluator.grid_search(learning_model=graphcl, encoder=encoder, 
    p_lr_lst=[0.1,0.01,0.001,0.0001], p_epoch_lst=[20,40,60,80,100])
``

In [5]:
evaluator.grid_search(learning_model=graphcl, encoder=encoder,
                      p_lr_lst=[0.01,0.001,0.0001], p_epoch_lst=[20,40,60,80,100])

Pretraining: epoch 20: 100%|██████████| 20/20 [03:53<00:00, 11.67s/it, loss=2.863035]
Fold 1, finetuning: 100%|██████████| 100/100 [00:09<00:00, 10.40it/s, acc=0.6864, val_loss=1.1754]
Fold 2, finetuning: 100%|██████████| 100/100 [00:09<00:00, 10.70it/s, acc=0.7542, val_loss=0.9923]
Fold 3, finetuning: 100%|██████████| 100/100 [00:09<00:00, 10.29it/s, acc=0.6356, val_loss=1.7539]
Fold 4, finetuning: 100%|██████████| 100/100 [00:09<00:00, 10.19it/s, acc=0.7542, val_loss=0.9935]
Fold 5, finetuning: 100%|██████████| 100/100 [00:09<00:00, 10.25it/s, acc=0.8051, val_loss=0.9999]
Fold 6, finetuning: 100%|██████████| 100/100 [00:09<00:00, 10.47it/s, acc=0.7203, val_loss=1.1457]
Fold 7, finetuning: 100%|██████████| 100/100 [00:09<00:00, 10.47it/s, acc=0.6695, val_loss=1.3771]
Fold 8, finetuning: 100%|██████████| 100/100 [00:09<00:00, 10.33it/s, acc=0.7627, val_loss=0.8442]
Fold 9, finetuning: 100%|██████████| 100/100 [00:09<00:00, 10.76it/s, acc=0.7094, val_loss=1.7390]
Fold 10, finetuning: 10

Best paras: 20 epoch, lr=0.010000, acc=0.7631


(0.7631392478942871, 0.030191147699952126, (0.01, 20))

To reproduce results in the paper, you may want to perform grid search and run evaluation for 5 times and take the average.

#### Another example with a label rate of 10%.

In [6]:
encoder = Encoder(feat_dim, embed_dim, n_layers=3, gnn='resgcn')
graphcl = GraphCL(embed_dim, aug_1='random2', aug_2='random2')
evaluator = EvalSemisupevised(dataset, dataset_pretrain, label_rate=0.1)
evaluator.evaluate(learning_model=graphcl, encoder=encoder)

Pretraining: epoch 100: 100%|██████████| 100/100 [12:27<00:00,  7.47s/it, loss=2.694387]
Fold 1, finetuning: 100%|██████████| 100/100 [00:09<00:00, 10.11it/s, acc=0.7119, val_loss=1.5941]
Fold 2, finetuning: 100%|██████████| 100/100 [00:09<00:00, 11.04it/s, acc=0.7119, val_loss=1.2151]
Fold 3, finetuning: 100%|██████████| 100/100 [00:09<00:00, 10.53it/s, acc=0.7203, val_loss=1.7954]
Fold 4, finetuning: 100%|██████████| 100/100 [00:09<00:00, 10.53it/s, acc=0.7203, val_loss=1.0913]
Fold 5, finetuning: 100%|██████████| 100/100 [00:09<00:00, 11.08it/s, acc=0.7542, val_loss=1.3340]
Fold 6, finetuning: 100%|██████████| 100/100 [00:09<00:00, 10.73it/s, acc=0.7458, val_loss=1.1896]
Fold 7, finetuning: 100%|██████████| 100/100 [00:09<00:00, 10.21it/s, acc=0.8305, val_loss=1.0420]
Fold 8, finetuning: 100%|██████████| 100/100 [00:09<00:00, 10.46it/s, acc=0.7034, val_loss=1.3943]
Fold 9, finetuning: 100%|██████████| 100/100 [00:09<00:00, 10.39it/s, acc=0.6752, val_loss=1.6866]
Fold 10, finetuning:

(0.7554904222488403, 0.038286566734313965)

### 2. Unsupervised representation learning

#### Load dataset

In this example, we evaluate model on MUTAG dataset in the unsupervised setting.

In [7]:
dataset = get_dataset('MUTAG', task='unsupervised', feat_str='')

#### Define your encoder and contrastive model (GraphCL)

For unsupervised setting, GraphCL uses GIN with jumping knowledge (with output_dim = hidden_dim * n_layers). 

Available augmentation includes: dropN, maskN, permE, subgraph, random[2-4].

In [8]:
embed_dim = 128
encoder = Encoder(feat_dim=dataset[0].x.shape[1], hidden_dim=embed_dim, n_layers=3, gnn='gin')
graphcl = GraphCL(embed_dim*3, aug_1=None, aug_2='random2')

#### Perform evaluation with grid search

In [9]:
evaluator = EvalUnsupevised(dataset, log_interval=20)
evaluator.evaluate(learning_model=graphcl, encoder=encoder)

Pretraining: epoch 20: 100%|██████████| 20/20 [00:06<00:00,  3.09it/s, loss=5.231109]

Best epoch 20: acc 0.8626 +/-(0.0615)





(0.8625730994152047, 0.061473087841019416)