## Tiny GIN for OGBG molhiv prediction - leaderboard submission specifics

Hyperparameter values used:

- num_layers: 2

- hidden_dim: 64

- dropout: 0.5

- learning_rate: 0.001

- epochs: 50

- batch_size: 32

- weight_decay: 1e-6

The network was trained from scratch on the CPU of a Google Colab L4 instance. The L4 GPU was used for speed during the hyperparameter search, and the CPU for deterministic final results. Rerunning on the CPU of a Google Colab L4 instance with the same software versions gives the same results (see the captured Google Colab notebook results); the seed matters because randomness affects training from scratch.

The random seeds used and the results obtained were:

```
seed 0: 0.792270 valid, 0.793741 test
seed 1: 0.808428 valid, 0.798843 test
seed 2: 0.810634 valid, 0.780251 test
seed 3: 0.790926 valid, 0.791958 test
seed 4: 0.802656 valid, 0.798700 test
seed 5: 0.805311 valid, 0.774121 test
seed 6: 0.807013 valid, 0.764565 test
seed 7: 0.804723 valid, 0.768452 test
seed 8: 0.788810 valid, 0.776000 test
seed 9: 0.798743 valid, 0.788814 test
```

If you like Jupyter notebooks and/or Google Colab, you can check the results above by copying this notebook and running its commands in Google Colab's environment (please use the CPU of an L4 instance to reproduce the randomness exactly):

(This notebook is https://colab.research.google.com/drive/1c3c9SO029Tv5TP_EV1Zeevof-r8CFvE4?usp=sharing ;

the original notebook, which reported rocauc to 4 decimal places, is at https://colab.research.google.com/drive/11lx7DRuEhfdRGDu1Q_oWvILGQKrP2IsP?usp=sharing )

Using `torch.mean()` and `torch.std()` to report the mean and unbiased sample standard deviation, one obtains:

```
>> import torch
>> test_rocaucs = torch.tensor([0.793741, 0.798843, 0.780251, 0.791958, 0.798700, 0.774121, 0.764565, 0.768452, 0.776000, 0.788814])
>> print(f"test rocauc mean: {test_rocaucs.mean():.6f}, test rocauc std: {test_rocaucs.std():.6f}")
>> valid_rocaucs = torch.tensor([0.792270, 0.808428, 0.810634, 0.790926, 0.802656, 0.805311, 0.807013, 0.804723, 0.788810, 0.798743])
>> print(f"valid rocauc mean: {valid_rocaucs.mean():.6f}, valid rocauc std: {valid_rocaucs.std():.6f}")

test rocauc mean: 0.783544, test rocauc std: 0.012520
valid rocauc mean: 0.800951, valid rocauc std: 0.007822
```
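As a cross-check that does not depend on PyTorch, the same statistics can be recomputed with Python's standard library. This is only a sketch: `statistics.stdev` divides by n - 1 (Bessel's correction), which corresponds to the default unbiased behaviour of `torch.std()` mentioned above. The function and variable names are illustrative, and because the arithmetic here is float64 rather than float32, the last printed digit may differ from the tensor output.

```python
import statistics

# Per-seed scores copied from the results table (seeds 0 through 9).
test_rocaucs = [0.793741, 0.798843, 0.780251, 0.791958, 0.798700,
                0.774121, 0.764565, 0.768452, 0.776000, 0.788814]
valid_rocaucs = [0.792270, 0.808428, 0.810634, 0.790926, 0.802656,
                 0.805311, 0.807013, 0.804723, 0.788810, 0.798743]


def summarize(scores):
    """Return (mean, sample standard deviation using the n - 1 denominator)."""
    return statistics.mean(scores), statistics.stdev(scores)


test_mean, test_std = summarize(test_rocaucs)
valid_mean, valid_std = summarize(valid_rocaucs)
print(f"test rocauc mean: {test_mean:.6f}, test rocauc std: {test_std:.6f}")
print(f"valid rocauc mean: {valid_mean:.6f}, valid rocauc std: {valid_std:.6f}")
```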

Note that the **test set performance is NEVER consulted or checked by the code in selecting the model**.
For each random seed, the held-out validation set performance is checked after every epoch of training. The epoch kept as the result is the one with the best validation performance observed so far whose training performance was not worse than that of the previous best epoch.
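The selection rule can be sketched in a few lines of Python. This is an illustration of the rule as described here, not the code in `main_gin.py`: the function name is hypothetical, and treating an equal training score as "not worse" is an assumption. The test score is carried along in each tuple but never read when choosing the epoch.

```python
def select_best_epoch(history):
    """Pick the kept epoch from per-epoch (train, valid, test) rocauc tuples.

    Returns the 1-based epoch number. Only train and valid scores are consulted.
    """
    best_epoch = None
    best_train = float("-inf")
    best_valid = float("-inf")
    for epoch, (train, valid, _test) in enumerate(history, start=1):
        improved_valid = valid > best_valid
        # Assumption: a tie in training score does not count as a regression.
        no_train_regression = train >= best_train
        if improved_valid and no_train_regression:
            best_epoch, best_train, best_valid = epoch, train, valid
    return best_epoch


# First two epochs of seed 3, copied from the notebook log.
# Epoch 2 has the higher test score, but validation did not improve, so epoch 1 is kept.
seed3 = [(0.629628, 0.559640, 0.485307), (0.687667, 0.559141, 0.608542)]
print(select_best_epoch(seed3))

# Made-up numbers: epoch 2 improves validation but its training score regresses,
# so it is skipped; epoch 3 beats epoch 1 on both counts and is kept.
made_up = [(0.70, 0.60, 0.55), (0.65, 0.62, 0.58), (0.72, 0.61, 0.57)]
print(select_best_epoch(made_up))
```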

The script generates CSVs of validation and test set predictions vs. ground truth. If you want examples for each of the seeds, I can provide them upon request. A .pkl with the model weights is generated at the end of the script and can be reused for inference. Example weights are also available upon request (or I can add them here if multiple people ask and would rather not generate them on their own CPU or in Google Colab).
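To sanity-check a predictions file by hand, the rocauc metric can be recomputed from (ground truth, prediction) pairs. The sketch below is a minimal pure-Python version based on the pairwise definition of ROC AUC: the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half. It takes plain lists because the CSV column layout is not described here, the function name is hypothetical, and the quadratic loop is meant for illustration rather than speed.

```python
def rocauc(labels, scores):
    """ROC AUC as the fraction of correctly ordered (positive, negative) pairs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy values, not taken from any generated CSV.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(rocauc(toy_labels, toy_scores))  # 3 of 4 pairs ordered correctly -> 0.75
```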

## Results

Test/validation CSVs and a .pkl with the weights are generated if you copy this notebook and run one of the cells below. Each cell trains the network from scratch and then tests it on the dataset (you can run multiple seeds to see the variation due to randomness across training runs).

## Warning on randomness (How to reproduce training from scratch exactly if needed)

Even with a fixed seed, running this on a CPU-only instance gives different random numbers than running it on the CPU of an L4 instance in Google Colab. On the CPU of an L4 instance I get the same results on different machines, on different runs, and on different days, and I have checked that the results below reproduce across multiple instances on multiple days.

I am tracking down the dependency that causes this. It is only potentially significant for tiebreaking in competitions with no central authority scoring the entries. One possibility is that something is initialized on the GPU before `.to('cpu')` is called and is not reset by `reset_parameters`. In the meantime, see the results captured below. To reproduce them exactly, please use an L4 instance of Google Colab with the dependencies recorded in this notebook; otherwise you may get different random numbers and a slightly different result, though it should be similar.
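When comparing two environments that disagree, it can help to record the interpreter, platform, and package versions next to each result. The following is a minimal sketch using only the standard library. It does not address the cause of the difference; it only makes runs easier to compare. The package list is an assumption based on the install log in this notebook plus `ogb`, and the function name is hypothetical.

```python
import platform
import sys
from importlib import metadata

# Assumed list of relevant packages; adjust to whatever the install script pulls in.
PACKAGES = ["torch", "torch-geometric", "torch-scatter", "torch-sparse", "ogb"]


def environment_report(packages=PACKAGES):
    """Collect Python, platform, and installed package versions into a dict."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Saving this output alongside the per-seed scores gives a concrete record to diff when a rerun does not match.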


In [None]:
!git clone https://github.com/willy-b/tiny-GIN-for-ogbg-molhiv
%cd tiny-GIN-for-ogbg-molhiv
!git checkout update-from-4-to-6-digits-of-precision-in-metrics-per-ogb-guideline

Cloning into 'tiny-GIN-for-ogbg-molhiv'...
remote: Enumerating objects: 13, done.
remote: Counting objects: 100% (13/13), done.
remote: Compressing objects: 100% (10/10), done.
remote: Total 13 (delta 4), reused 12 (delta 3), pack-reused 0
Receiving objects: 100% (13/13), 30.08 KiB | 603.00 KiB/s, done.
Resolving deltas: 100% (4/4), done.
/content/tiny-GIN-for-ogbg-molhiv
Branch 'update-from-4-to-6-digits-of-precision-in-metrics-per-ogb-guideline' set up to track remote branch 'update-from-4-to-6-digits-of-precision-in-metrics-per-ogb-guideline' from 'origin'.
Switched to a new branch 'update-from-4-to-6-digits-of-precision-in-metrics-per-ogb-guideline'


In [None]:
!chmod +x install_dependencies.sh
!./install_dependencies.sh

Looking in links: https://pytorch-geometric.com/whl/torch-2.2.1+cu121.html
Collecting torch-scatter
  Downloading https://data.pyg.org/whl/torch-2.2.0%2Bcu121/torch_scatter-2.1.2%2Bpt22cu121-cp310-cp310-linux_x86_64.whl (10.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.9/10.9 MB 51.8 MB/s eta 0:00:00
Installing collected packages: torch-scatter
Successfully installed torch-scatter-2.1.2+pt22cu121
Looking in links: https://pytorch-geometric.com/whl/torch-2.2.1+cu121.html
Collecting torch-sparse
  Downloading https://data.pyg.org/whl/torch-2.2.0%2Bcu121/torch_sparse-0.6.18%2Bpt22cu121-cp310-cp310-linux_x86_64.whl (5.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.0/5.0 MB 48.0 MB/s eta 0:00:00
Installing collected packages: torch-sparse
Successfully installed torch-sparse-0.6.18+pt22cu121
Collecting torch-geometric
  Downloading torch_geometric-2.5.3-py3-none-any.whl (1.1 MB)

In [None]:
!python main_gin.py --random_seed 0

{'device': 'cpu', 'dataset_id': 'ogbg-molhiv', 'num_layers': 2, 'hidden_dim': 64, 'dropout': 0.5, 'learning_rate': 0.001, 'epochs': 50, 'batch_size': 32, 'weight_decay': 1e-06}
Downloading http://snap.stanford.edu/ogb/data/graphproppred/csv_mol_download/hiv.zip
Downloaded 0.00 GB: 100% 3/3 [00:00<00:00, 12.95it/s]
Extracting dataset/hiv.zip
Processing...
Loading necessary files...
This might take a while.
Processing graphs...
100% 41127/41127 [00:00<00:00, 105789.34it/s]
Converting graphs into PyG objects...
100% 41127/41127 [00:01<00:00, 29768.76it/s]
Saving...
Done!
parameter count: 32385
Training batch: 100% 1029/1029 [00:14<00:00, 71.28it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 231.08it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 132.17it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 132.54it/s]
New best validation score: 0.548360033313737 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 1, Train: 0.631918 (rocauc), Valid: 0.548360 (rocauc), Te

In [None]:
!python main_gin.py --random_seed 1

{'device': 'cpu', 'dataset_id': 'ogbg-molhiv', 'num_layers': 2, 'hidden_dim': 64, 'dropout': 0.5, 'learning_rate': 0.001, 'epochs': 50, 'batch_size': 32, 'weight_decay': 1e-06}
parameter count: 32385
Training batch: 100% 1029/1029 [00:14<00:00, 70.84it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 220.96it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 130.69it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 130.11it/s]
New best validation score: 0.5971426121889084 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 1, Train: 0.596869 (rocauc), Valid: 0.597143 (rocauc), Test: 0.502619 (rocauc)
Training batch: 100% 1029/1029 [00:10<00:00, 94.18it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 228.53it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 226.58it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 232.43it/s]
New best validation score: 0.6858036939055457 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 2, Train: 0.702759 (rocauc), 

In [None]:
!python main_gin.py --random_seed 2

{'device': 'cpu', 'dataset_id': 'ogbg-molhiv', 'num_layers': 2, 'hidden_dim': 64, 'dropout': 0.5, 'learning_rate': 0.001, 'epochs': 50, 'batch_size': 32, 'weight_decay': 1e-06}
parameter count: 32385
Training batch: 100% 1029/1029 [00:16<00:00, 63.32it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 208.87it/s]
Evaluation batch: 100% 129/129 [00:01<00:00, 125.02it/s]
Evaluation batch: 100% 129/129 [00:01<00:00, 122.82it/s]
New best validation score: 0.5952289094650206 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 1, Train: 0.621742 (rocauc), Valid: 0.595229 (rocauc), Test: 0.568289 (rocauc)
Training batch: 100% 1029/1029 [00:12<00:00, 84.80it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 216.46it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 189.87it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 211.20it/s]
New best validation score: 0.6520184205369391 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 2, Train: 0.670385 (rocauc), 

In [None]:
!python main_gin.py --random_seed 3

{'device': 'cpu', 'dataset_id': 'ogbg-molhiv', 'num_layers': 2, 'hidden_dim': 64, 'dropout': 0.5, 'learning_rate': 0.001, 'epochs': 50, 'batch_size': 32, 'weight_decay': 1e-06}
parameter count: 32385
Training batch: 100% 1029/1029 [00:14<00:00, 71.39it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 223.50it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 130.65it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 130.55it/s]
New best validation score: 0.5596401626494218 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 1, Train: 0.629628 (rocauc), Valid: 0.559640 (rocauc), Test: 0.485307 (rocauc)
Training batch: 100% 1029/1029 [00:10<00:00, 94.33it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 229.90it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 224.47it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 233.05it/s]
Dataset ogbg-molhiv, Epoch: 2, Train: 0.687667 (rocauc), Valid: 0.559141 (rocauc), Test: 0.608542 (rocauc)
Training batch: 100% 1029/1029 [00:10<0

In [None]:
!python main_gin.py --random_seed 4

{'device': 'cpu', 'dataset_id': 'ogbg-molhiv', 'num_layers': 2, 'hidden_dim': 64, 'dropout': 0.5, 'learning_rate': 0.001, 'epochs': 50, 'batch_size': 32, 'weight_decay': 1e-06}
parameter count: 32385
Training batch: 100% 1029/1029 [00:14<00:00, 70.09it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 222.54it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 129.46it/s]
Evaluation batch: 100% 129/129 [00:01<00:00, 127.28it/s]
New best validation score: 0.6440451695081324 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 1, Train: 0.573958 (rocauc), Valid: 0.644045 (rocauc), Test: 0.568595 (rocauc)
Training batch: 100% 1029/1029 [00:10<00:00, 93.82it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 229.49it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 228.13it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 228.58it/s]
New best validation score: 0.6469723691945914 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 2, Train: 0.708294 (rocauc), 

In [None]:
!python main_gin.py --random_seed 5

{'device': 'cpu', 'dataset_id': 'ogbg-molhiv', 'num_layers': 2, 'hidden_dim': 64, 'dropout': 0.5, 'learning_rate': 0.001, 'epochs': 50, 'batch_size': 32, 'weight_decay': 1e-06}
parameter count: 32385
Training batch: 100% 1029/1029 [00:14<00:00, 68.80it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 219.53it/s]
Evaluation batch: 100% 129/129 [00:01<00:00, 126.49it/s]
Evaluation batch: 100% 129/129 [00:01<00:00, 126.94it/s]
New best validation score: 0.6342347638643935 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 1, Train: 0.665258 (rocauc), Valid: 0.634235 (rocauc), Test: 0.610234 (rocauc)
Training batch: 100% 1029/1029 [00:11<00:00, 90.91it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 223.93it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 211.31it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 226.54it/s]
New best validation score: 0.6595507544581618 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 2, Train: 0.689981 (rocauc), 

In [None]:
!python main_gin.py --random_seed 6

{'device': 'cpu', 'dataset_id': 'ogbg-molhiv', 'num_layers': 2, 'hidden_dim': 64, 'dropout': 0.5, 'learning_rate': 0.001, 'epochs': 50, 'batch_size': 32, 'weight_decay': 1e-06}
parameter count: 32385
Training batch: 100% 1029/1029 [00:14<00:00, 70.04it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 224.23it/s]
Evaluation batch: 100% 129/129 [00:01<00:00, 128.42it/s]
Evaluation batch: 100% 129/129 [00:01<00:00, 128.63it/s]
New best validation score: 0.6725149421908682 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 1, Train: 0.671991 (rocauc), Valid: 0.672515 (rocauc), Test: 0.589289 (rocauc)
Training batch: 100% 1029/1029 [00:11<00:00, 90.41it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 213.88it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 225.44it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 226.74it/s]
New best validation score: 0.7137177273172643 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 2, Train: 0.706749 (rocauc), 

In [None]:
!python main_gin.py --random_seed 7

{'device': 'cpu', 'dataset_id': 'ogbg-molhiv', 'num_layers': 2, 'hidden_dim': 64, 'dropout': 0.5, 'learning_rate': 0.001, 'epochs': 50, 'batch_size': 32, 'weight_decay': 1e-06}
parameter count: 32385
Training batch: 100% 1029/1029 [00:14<00:00, 69.24it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 220.43it/s]
Evaluation batch: 100% 129/129 [00:01<00:00, 126.87it/s]
Evaluation batch: 100% 129/129 [00:01<00:00, 126.02it/s]
New best validation score: 0.6461334019204389 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 1, Train: 0.631106 (rocauc), Valid: 0.646133 (rocauc), Test: 0.604656 (rocauc)
Training batch: 100% 1029/1029 [00:11<00:00, 91.79it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 227.30it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 226.63it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 212.28it/s]
New best validation score: 0.7059817754262199 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 2, Train: 0.659924 (rocauc), 

In [None]:
!python main_gin.py --random_seed 8

{'device': 'cpu', 'dataset_id': 'ogbg-molhiv', 'num_layers': 2, 'hidden_dim': 64, 'dropout': 0.5, 'learning_rate': 0.001, 'epochs': 50, 'batch_size': 32, 'weight_decay': 1e-06}
parameter count: 32385
Training batch: 100% 1029/1029 [00:14<00:00, 70.04it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 220.54it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 130.33it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 130.45it/s]
New best validation score: 0.6064569860866157 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 1, Train: 0.646691 (rocauc), Valid: 0.606457 (rocauc), Test: 0.559667 (rocauc)
Training batch: 100% 1029/1029 [00:11<00:00, 93.17it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 224.06it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 225.42it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 231.87it/s]
Dataset ogbg-molhiv, Epoch: 2, Train: 0.652216 (rocauc), Valid: 0.569910 (rocauc), Test: 0.508262 (rocauc)
Training batch: 100% 1029/1029 [00:10<0

In [13]:
!python main_gin.py --random_seed 9

{'device': 'cpu', 'dataset_id': 'ogbg-molhiv', 'num_layers': 2, 'hidden_dim': 64, 'dropout': 0.5, 'learning_rate': 0.001, 'epochs': 50, 'batch_size': 32, 'weight_decay': 1e-06}
parameter count: 32385
Training batch: 100% 1029/1029 [00:14<00:00, 69.95it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 224.11it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 129.17it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 130.35it/s]
New best validation score: 0.6133003870272389 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 1, Train: 0.622336 (rocauc), Valid: 0.613300 (rocauc), Test: 0.543469 (rocauc)
Training batch: 100% 1029/1029 [00:10<00:00, 94.78it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 229.11it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 228.98it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 226.29it/s]
New best validation score: 0.6894902508328434 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 2, Train: 0.732901 (rocauc), 

In [15]:
import torch
test_rocaucs = torch.tensor([0.793741, 0.798843, 0.780251, 0.791958, 0.798700, 0.774121, 0.764565, 0.768452, 0.776000, 0.788814])
print(f"test rocauc mean: {test_rocaucs.mean():.6f}, test rocauc std: {test_rocaucs.std():.6f}")
valid_rocaucs = torch.tensor([0.792270, 0.808428, 0.810634, 0.790926, 0.802656, 0.805311, 0.807013, 0.804723, 0.788810, 0.798743])
print(f"valid rocauc mean: {valid_rocaucs.mean():.6f}, valid rocauc std: {valid_rocaucs.std():.6f}")

test rocauc mean: 0.783544, test rocauc std: 0.012520
valid rocauc mean: 0.800951, valid rocauc std: 0.007822
