## Example of GraphCL

In [1]:
from sslgraph.utils import Encoder, EvalSemisupevised, EvalUnsupevised, get_dataset
from sslgraph.contrastive.model import GraphCL

### 1. Semi-supervised learning

#### Load dataset

In this example, we evaluate the model on the NCI1 dataset in the semi-supervised setting.

In [2]:
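# Load NCI1 twice: one copy for fine-tuning/evaluation, one for pretraining
dataset, dataset_pretrain = get_dataset('NCI1', task='semisupervised')
feat_dim = dataset[0].x.shape[1]  # node feature dimension
n_class = dataset.num_classes     # number of graph classes
embed_dim = 128                   # embedding size used by the encoder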

#### Define your encoder and contrastive model (GraphCL)

For the semi-supervised setting, GraphCL uses ResGCN as the encoder.

Available augmentations are: dropN, maskN, permE, subgraph, and random[2-4].
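To give a feel for what an augmentation does to a graph, here is a minimal sketch of node dropping on a plain edge list. It assumes that dropN refers to node dropping as described in the GraphCL paper. The function `drop_nodes`, its `ratio` parameter, and the edge-list representation are hypothetical and are not the library's implementation.

```python
import random

def drop_nodes(num_nodes, edges, ratio=0.2, seed=None):
    """Illustrative node dropping (hypothetical helper, not the sslgraph code).

    Removes a random fraction `ratio` of the nodes together with every edge
    that touches a removed node, then relabels the surviving nodes 0..k-1.
    """
    rng = random.Random(seed)
    n_drop = int(num_nodes * ratio)
    dropped = set(rng.sample(range(num_nodes), n_drop))
    kept = [n for n in range(num_nodes) if n not in dropped]
    # Map old node ids to new contiguous ids
    relabel = {old: new for new, old in enumerate(kept)}
    new_edges = [(relabel[u], relabel[v])
                 for u, v in edges
                 if u in relabel and v in relabel]
    return len(kept), new_edges

# A 5-node ring; dropping 40% of the nodes leaves 3 nodes
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
n_kept, aug_edges = drop_nodes(5, ring, ratio=0.4, seed=0)
```

Two such randomly augmented views of the same graph form a positive pair for the contrastive objective.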

In [13]:
encoder = Encoder(feat_dim, embed_dim, n_layers=3, gnn='resgcn')
graphcl = GraphCL(embed_dim, aug_1='subgraph', aug_2='subgraph')

#### Define evaluator instance

In this example, we use a label rate of 1%.
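A label rate of 1% means that only about 1% of the training graphs keep their labels for fine-tuning. The snippet below is a minimal sketch of that idea and not the library's splitting code: `sample_labeled` is a hypothetical helper, and uniform sampling without class stratification is an assumption.

```python
import random

def sample_labeled(train_indices, label_rate, seed=None):
    """Pick a random subset of training indices that keep their labels.

    Hypothetical helper for illustration; the evaluator's own split
    (for example, whether it stratifies by class) may differ.
    """
    train_indices = list(train_indices)
    n_labeled = max(1, int(round(len(train_indices) * label_rate)))
    rng = random.Random(seed)
    return sorted(rng.sample(train_indices, n_labeled))

# With 1000 training graphs, a 1% label rate keeps 10 labeled graphs
labeled = sample_labeled(range(1000), 0.01, seed=0)
```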

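To set up the training configuration (number of epochs, learning rates, etc. for pretraining and fine-tuning), call `setup_train_config` on the evaluator. The `p_` arguments apply to pretraining and the `f_` arguments to fine-tuning:

```
evaluator.setup_train_config(batch_size = 128,
    p_optim = 'Adam', p_lr = 0.0001, p_weight_decay = 0, p_epoch = 100,
    f_optim = 'Adam', f_lr = 0.001, f_weight_decay = 0, f_epoch = 100)
```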


In [14]:
evaluator = EvalSemisupevised(dataset, dataset_pretrain, 0.01, n_class)  # 0.01 = label rate of 1%

#### Perform evaluation

You can also run the evaluation with a grid search over the number of pretraining epochs and the pretraining learning rate:

```
evaluator.grid_search(learning_model=graphcl, encoder=encoder,
    p_lr_lst=[0.1,0.01,0.001,0.0001], p_epoch_lst=[20,40,60,80,100])
```
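Conceptually, a grid search tries every combination of learning rate and epoch count and keeps the best-scoring one. The following is a minimal sketch of that loop under stated assumptions: `grid_search_sketch` and `toy_evaluate` are hypothetical names, and `toy_evaluate` is a deterministic stand-in for a real pretrain-and-evaluate run, not a measurement.

```python
from itertools import product

def grid_search_sketch(evaluate_fn, lr_list, epoch_list):
    """Try every (lr, epoch) pair and return (best_acc, its_sd, best_params)."""
    best = None
    for lr, n_epoch in product(lr_list, epoch_list):
        acc, sd = evaluate_fn(lr, n_epoch)
        if best is None or acc > best[0]:
            best = (acc, sd, (lr, n_epoch))
    return best

def toy_evaluate(lr, n_epoch):
    # Made-up scoring rule that peaks at lr=0.01 and 60 epochs (illustration only)
    acc = 1.0 - abs(lr - 0.01) - abs(n_epoch - 60) / 1000
    return acc, 0.0

acc, sd, paras = grid_search_sketch(
    toy_evaluate,
    lr_list=[0.1, 0.01, 0.001, 0.0001],
    epoch_list=[20, 40, 60, 80, 100],
)
```

The grid above has 4 x 5 = 20 combinations, each of which requires a full pretraining run, so the total cost grows quickly with the size of the lists.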

In [15]:
evaluator.evaluate(learning_model=graphcl, encoder=encoder)

Pretraining: epoch 100: 100%|██████████| 100/100 [12:41<00:00,  7.61s/it, loss=2.683487]
Fold 1, finetuning: 100%|██████████| 100/100 [00:12<00:00,  7.76it/s, acc=0.6229, val_loss=3.7151]
Fold 2, finetuning: 100%|██████████| 100/100 [00:11<00:00,  8.84it/s, acc=0.6156, val_loss=9.0472]
Fold 3, finetuning: 100%|██████████| 100/100 [00:10<00:00,  9.91it/s, acc=0.5547, val_loss=3.8888]
Fold 4, finetuning: 100%|██████████| 100/100 [00:09<00:00, 10.01it/s, acc=0.6083, val_loss=5.6933]
Fold 5, finetuning: 100%|██████████| 100/100 [00:10<00:00,  9.26it/s, acc=0.6058, val_loss=5.6814]
Fold 6, finetuning: 100%|██████████| 100/100 [00:12<00:00,  7.77it/s, acc=0.6326, val_loss=2.8538]
Fold 7, finetuning: 100%|██████████| 100/100 [00:10<00:00,  9.72it/s, acc=0.6083, val_loss=2.3298]
Fold 8, finetuning: 100%|██████████| 100/100 [00:10<00:00,  9.59it/s, acc=0.5961, val_loss=2.7519]
Fold 9, finetuning: 100%|██████████| 100/100 [00:10<00:00,  9.64it/s, acc=0.6715, val_loss=1.6425]
Fold 10, finetuning:

(0.6255474090576172, 0.036114975810050964)

To reproduce the results in the paper, you may want to perform a grid search, run the evaluation 5 times, and take the average.
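Averaging over repeated runs can be sketched as follows. This assumes each run returns a pair whose first element is the accuracy, as the tuple printed above suggests. `average_over_runs` and `placeholder_run` are hypothetical, and the accuracies in `placeholder_run` are made-up placeholder values, not results.

```python
from statistics import mean, stdev

def average_over_runs(run_fn, n_runs=5):
    """Call run_fn once per run and summarise the accuracies.

    run_fn(run_idx) is assumed to return (accuracy, std_across_folds).
    Returns the mean accuracy and the sample standard deviation across runs.
    """
    accs = [run_fn(run_idx)[0] for run_idx in range(n_runs)]
    return mean(accs), stdev(accs)

def placeholder_run(run_idx):
    # Placeholder numbers for illustration only. In practice this would
    # rebuild the encoder and model and call evaluator.evaluate(...).
    fake_accs = [0.60, 0.62, 0.61, 0.63, 0.64]
    return fake_accs[run_idx], 0.03

avg_acc, run_sd = average_over_runs(placeholder_run, n_runs=5)
```

Re-creating the encoder and the contrastive model inside each run keeps the runs independent, since reusing already-trained weights would carry state from one run into the next.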

#### Another example with a label rate of 10%

In [6]:
encoder = Encoder(feat_dim, embed_dim, n_layers=3, gnn='resgcn')
graphcl = GraphCL(embed_dim, aug_1='random2', aug_2='random2')
evaluator = EvalSemisupevised(dataset, dataset_pretrain, 0.1, n_class)
evaluator.evaluate(learning_model=graphcl, encoder=encoder)

Pretraining: epoch 100: 100%|██████████| 100/100 [10:42<00:00,  6.42s/it, loss=2.271644]
Fold 1, finetuning: 100%|██████████| 100/100 [00:14<00:00,  6.80it/s, acc=0.7883, val_loss=0.9015]
Fold 2, finetuning: 100%|██████████| 100/100 [00:14<00:00,  6.79it/s, acc=0.7324, val_loss=1.5312]
Fold 3, finetuning: 100%|██████████| 100/100 [00:14<00:00,  6.77it/s, acc=0.7494, val_loss=1.3493]
Fold 4, finetuning: 100%|██████████| 100/100 [00:14<00:00,  6.78it/s, acc=0.7445, val_loss=1.2314]
Fold 5, finetuning: 100%|██████████| 100/100 [00:16<00:00,  6.12it/s, acc=0.7421, val_loss=1.2051]
Fold 6, finetuning: 100%|██████████| 100/100 [00:14<00:00,  6.78it/s, acc=0.7275, val_loss=1.2190]
Fold 7, finetuning: 100%|██████████| 100/100 [00:16<00:00,  6.01it/s, acc=0.6861, val_loss=1.5297]
Fold 8, finetuning: 100%|██████████| 100/100 [00:15<00:00,  6.46it/s, acc=0.7105, val_loss=1.3143]
Fold 9, finetuning: 100%|██████████| 100/100 [00:16<00:00,  6.16it/s, acc=0.7640, val_loss=1.2336]
Fold 10, finetuning:

(0.7440389394760132, 0.02773675136268139)

### 2. Unsupervised representation learning

#### Load dataset

In this example, we evaluate the model on the MUTAG dataset in the unsupervised setting.

In [7]:
dataset = get_dataset('MUTAG', task='unsupervised')

#### Define your encoder and contrastive model (GraphCL)

For the unsupervised setting, GraphCL uses GIN with jumping knowledge, so the encoder's output dimension is output_dim = hidden_dim * n_layers.
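The output dimension follows from concatenating one embedding per layer. Here is a minimal sketch of that arithmetic using plain Python lists. It assumes concatenation-style jumping knowledge, which the formula above implies, and `jumping_knowledge_concat` is a hypothetical helper rather than part of the library.

```python
def jumping_knowledge_concat(layer_outputs):
    """Concatenate the per-layer embeddings of one graph into a single vector."""
    combined = []
    for layer_vec in layer_outputs:
        combined.extend(layer_vec)
    return combined

hidden_dim, n_layers = 128, 3
# One dummy embedding of size hidden_dim per GNN layer
layer_outputs = [[0.0] * hidden_dim for _ in range(n_layers)]
embedding = jumping_knowledge_concat(layer_outputs)
print(len(embedding))  # 384 = 128 * 3
```

This is why the contrastive model for this setting is constructed with `embed_dim*3` rather than `embed_dim`.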

Available augmentations are: dropN, maskN, permE, subgraph, and random[2-4].

In [8]:
embed_dim = 128
encoder = Encoder(feat_dim=dataset[0].x.shape[1], hidden_dim=embed_dim, n_layers=3, gnn='gin')
# The encoder output has size hidden_dim * n_layers = 128 * 3, so pass embed_dim*3
graphcl = GraphCL(embed_dim*3, aug_1=None, aug_2='random2')

#### Perform evaluation with grid search

In [9]:
evaluator = EvalUnsupevised(dataset, dataset.num_classes)
acc, sd, paras = evaluator.grid_search(learning_model=graphcl, encoder=encoder, 
                                       p_lr_lst=[0.01], p_epoch_lst=[5,10,15,20])

Pretraining: epoch 5: 100%|██████████| 5/5 [00:00<00:00,  5.45it/s, loss=5.231080]
Pretraining: epoch 10: 100%|██████████| 10/10 [00:01<00:00,  6.01it/s, loss=5.231108]
Pretraining: epoch 15: 100%|██████████| 15/15 [00:02<00:00,  6.57it/s, loss=5.231109]
Pretraining: epoch 20: 100%|██████████| 20/20 [00:03<00:00,  6.47it/s, loss=5.231109]


Best paras: 5 epoch, lr=0.010000, acc=0.8617
