## Example for reproducing GraphCL

In [1]:
from sslgraph.utils import Encoder, EvalSemisupevised, EvalUnsupevised, get_dataset
from sslgraph.contrastive.model import GraphCL

### 1. Semi-supervised learning setting

#### Load dataset

In this example, we evaluate the model on the NCI1 dataset in the semi-supervised setting.

In [2]:
dataset, dataset_pretrain = get_dataset('NCI1', task='semisupervised')
feat_dim = dataset[0].x.shape[1]
n_class = dataset.num_classes
embed_dim = 128

#### Define your encoder and contrastive model (GraphCL)

In the semi-supervised setting, GraphCL uses ResGCN as the encoder.

Available augmentations: dropN, maskN, permE, subgraph, and random[2-4].
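
To give a feel for what an augmentation such as `dropN` does, here is a minimal, library-independent sketch of node dropping, assuming `dropN` refers to GraphCL-style node dropping. The function `drop_nodes`, its `drop_ratio` parameter, and the edge-list representation are hypothetical and chosen for illustration; they are not the sslgraph implementation.

```python
import random

def drop_nodes(num_nodes, edges, drop_ratio=0.2, seed=None):
    """Hypothetical helper: remove a random subset of nodes and their incident edges."""
    rng = random.Random(seed)
    n_drop = int(num_nodes * drop_ratio)  # assumed ratio, for illustration only
    dropped = set(rng.sample(range(num_nodes), n_drop))
    kept = [v for v in range(num_nodes) if v not in dropped]
    # Relabel surviving nodes to 0..len(kept)-1 so the edge list stays valid.
    relabel = {old: new for new, old in enumerate(kept)}
    new_edges = [(relabel[u], relabel[v]) for u, v in edges
                 if u in relabel and v in relabel]
    return kept, new_edges

# A 5-node path graph: 0-1-2-3-4
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
kept, new_edges = drop_nodes(5, edges, drop_ratio=0.2, seed=0)
print(len(kept), new_edges)
```

Two independently augmented views of the same graph produced this way would form a positive pair for the contrastive objective.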

In [3]:
encoder = Encoder(feat_dim, embed_dim, n_layers=3, gnn='resgcn')
graphcl = GraphCL(embed_dim, aug_1='dropN', aug_2='dropN')

#### Define evaluator instance

In this example, we use a label rate of 1%.

To set up the training configuration (number of epochs, learning rates, etc. for pretraining and finetuning), run


```python
evaluator.setup_train_config(batch_size = 128,
    p_optim = 'Adam', p_lr = 0.0001, p_weight_decay = 0, p_epoch = 100,
    f_optim = 'Adam', f_lr = 0.001, f_weight_decay = 0, f_epoch = 100)
```


In [4]:
evaluator = EvalSemisupevised(dataset, dataset_pretrain, 0.01, n_class)

#### Perform evaluation

You can also run the evaluation with a grid search over the number of pretraining epochs and the pretraining learning rate:

```python
evaluator.grid_search(learning_model=graphcl, encoder=encoder, 
    p_lr_lst=[0.1,0.01,0.001,0.0001], p_epoch_lst=[20,40,60,80,100])
```
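
Conceptually, a grid search of this kind scores every (learning rate, epoch count) pair and keeps the best one. The sketch below illustrates that idea without the library; `grid_search` and `toy_score` are hypothetical names, and `toy_score` is a made-up stand-in for an actual pretrain-and-evaluate run, not how `evaluator.grid_search` is implemented.

```python
from itertools import product

def grid_search(score_fn, lr_list, epoch_list):
    """Hypothetical helper: return (best_score, best_lr, best_epochs) over the grid."""
    best = None
    for lr, epochs in product(lr_list, epoch_list):
        score = score_fn(lr, epochs)
        if best is None or score > best[0]:
            best = (score, lr, epochs)
    return best

def toy_score(lr, epochs):
    # Made-up scoring function that peaks at lr=0.01, epochs=60.
    # In practice this would pretrain, finetune, and return an accuracy.
    return 1.0 - abs(lr - 0.01) - abs(epochs - 60) / 1000

best_score, best_lr, best_epochs = grid_search(
    toy_score,
    lr_list=[0.1, 0.01, 0.001, 0.0001],
    epoch_list=[20, 40, 60, 80, 100],
)
print(best_lr, best_epochs)
```

The cost grows with the product of the two list lengths, so the 4 x 5 grid above corresponds to 20 full evaluations.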

In [5]:
evaluator.evaluate(learning_model=graphcl, encoder=encoder)

Pretraining: epoch 100: 100%|██████████| 100/100 [09:03<00:00,  5.43s/it, loss=0.831101]
Fold 1, finetuning: 100%|██████████| 100/100 [00:16<00:00,  6.23it/s, acc=0.5888, val_loss=2.5162]
Fold 2, finetuning: 100%|██████████| 100/100 [00:14<00:00,  7.00it/s, acc=0.6399, val_loss=7.7734]
Fold 3, finetuning: 100%|██████████| 100/100 [00:15<00:00,  6.41it/s, acc=0.5207, val_loss=3.1536]
Fold 4, finetuning: 100%|██████████| 100/100 [00:16<00:00,  6.16it/s, acc=0.6131, val_loss=3.7445]
Fold 5, finetuning: 100%|██████████| 100/100 [00:14<00:00,  7.00it/s, acc=0.5888, val_loss=2.7536]
Fold 6, finetuning: 100%|██████████| 100/100 [00:16<00:00,  6.21it/s, acc=0.6229, val_loss=1.5371]
Fold 7, finetuning: 100%|██████████| 100/100 [00:15<00:00,  6.29it/s, acc=0.6229, val_loss=1.9644]
Fold 8, finetuning: 100%|██████████| 100/100 [00:15<00:00,  6.64it/s, acc=0.6302, val_loss=2.1328]
Fold 9, finetuning: 100%|██████████| 100/100 [00:15<00:00,  6.25it/s, acc=0.6156, val_loss=2.0700]
Fold 10, finetuning:

(0.6282238960266113, 0.04049957916140556)

To reproduce the results in the paper, you may want to perform a grid search, run the evaluation 5 times, and take the average.
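
Averaging over repeated runs can be sketched as follows. This is a library-independent illustration: `average_runs` and `placeholder_run` are hypothetical names, and `placeholder_run` returns a seeded random number rather than a real accuracy. In real use, the run function would presumably re-create the encoder and model, call `evaluator.evaluate(...)`, and return the first element of its result, assuming that element is the mean accuracy.

```python
import random
from statistics import mean, stdev

def average_runs(run_fn, n_runs=5):
    """Call run_fn(seed) n_runs times; return (mean, sample std, all scores)."""
    scores = [run_fn(seed) for seed in range(n_runs)]
    return mean(scores), stdev(scores), scores

def placeholder_run(seed):
    # Stand-in for one full pretrain + finetune evaluation.
    # It returns a seeded random number, NOT a real accuracy.
    return random.Random(seed).random()

avg, std, scores = average_runs(placeholder_run, n_runs=5)
print(f"{len(scores)} runs, mean={avg:.4f}, std={std:.4f}")
```

Re-initializing the encoder and contrastive model inside each run keeps the runs independent, so weights learned in one run do not leak into the next.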

#### Another example with a label rate of 10%

In [4]:
encoder = Encoder(feat_dim, embed_dim, n_layers=3, gnn='resgcn')
graphcl = GraphCL(embed_dim, aug_1='subgraph', aug_2='random2')
evaluator = EvalSemisupevised(dataset, dataset_pretrain, 0.1, n_class)
evaluator.evaluate(learning_model=graphcl, encoder=encoder)

Pretraining: epoch 100: 100%|██████████| 100/100 [10:48<00:00,  6.49s/it, loss=2.119937]
Fold 1, finetuning: 100%|██████████| 100/100 [00:15<00:00,  6.59it/s, acc=0.7859, val_loss=0.8773]
Fold 2, finetuning: 100%|██████████| 100/100 [00:15<00:00,  6.57it/s, acc=0.7226, val_loss=1.5608]
Fold 3, finetuning: 100%|██████████| 100/100 [00:15<00:00,  6.39it/s, acc=0.7348, val_loss=1.3909]
Fold 4, finetuning: 100%|██████████| 100/100 [00:15<00:00,  6.44it/s, acc=0.7421, val_loss=1.4355]
Fold 5, finetuning: 100%|██████████| 100/100 [00:15<00:00,  6.44it/s, acc=0.7372, val_loss=1.1765]
Fold 6, finetuning: 100%|██████████| 100/100 [00:15<00:00,  6.43it/s, acc=0.7105, val_loss=1.4596]
Fold 7, finetuning: 100%|██████████| 100/100 [00:15<00:00,  6.40it/s, acc=0.7056, val_loss=1.3956]
Fold 8, finetuning: 100%|██████████| 100/100 [00:15<00:00,  6.40it/s, acc=0.6740, val_loss=1.4866]
Fold 9, finetuning: 100%|██████████| 100/100 [00:15<00:00,  6.43it/s, acc=0.7397, val_loss=1.3395]
Fold 10, finetuning:

(0.7408758401870728, 0.025447474792599678)

### 2. Unsupervised learning setting

#### Load dataset

In this example, we evaluate the model on the MUTAG dataset in the unsupervised setting.

In [7]:
dataset = get_dataset('MUTAG', task='unsupervised')

#### Define your encoder and contrastive model (GraphCL)

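In the unsupervised setting, GraphCL uses GIN with jumping knowledge, so the encoder's output dimension is output_dim = hidden_dim * n_layers. This is why `GraphCL` is constructed with `embed_dim*3` for the 3-layer encoder.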
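
The dimension arithmetic can be checked with a small, library-independent sketch, assuming jumping knowledge here means concatenating the per-layer graph embeddings. The function `jumping_knowledge_concat` and the plain-list representation are hypothetical and only illustrate the shape bookkeeping.

```python
def jumping_knowledge_concat(layer_outputs):
    """Hypothetical helper: concatenate per-layer embeddings into one vector."""
    out = []
    for h in layer_outputs:
        out.extend(h)
    return out

hidden_dim, n_layers = 128, 3
# One dummy embedding of length hidden_dim per layer (values are arbitrary).
layer_outputs = [[float(layer)] * hidden_dim for layer in range(n_layers)]
z = jumping_knowledge_concat(layer_outputs)
print(len(z))  # 384 = hidden_dim * n_layers
```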

Available augmentations: dropN, maskN, permE, subgraph, and random[2-4].

In [8]:
embed_dim = 128
encoder = Encoder(feat_dim=dataset[0].x.shape[1], hidden_dim=embed_dim, n_layers=3, gnn='gin')
graphcl = GraphCL(embed_dim*3, aug_1=None, aug_2='random2')

#### Perform evaluation with grid search

In [9]:
evaluator = EvalUnsupevised(dataset, dataset.num_classes)
acc, sd, paras = evaluator.grid_search(learning_model=graphcl, encoder=encoder, 
                                       p_lr_lst=[0.01], p_epoch_lst=[5,10,15,20])

Pretraining: epoch 5: 100%|██████████| 5/5 [00:00<00:00,  6.69it/s, loss=5.231107]
Pretraining: epoch 10: 100%|██████████| 10/10 [00:01<00:00,  5.92it/s, loss=5.231107]
Pretraining: epoch 15: 100%|██████████| 15/15 [00:02<00:00,  6.58it/s, loss=5.222739]
Pretraining: epoch 20: 100%|██████████| 20/20 [00:03<00:00,  6.66it/s, loss=5.231109]


Best paras: 5 epoch, lr=0.010000, acc=0.8725
