## Tiny GIN for OGBG molhiv prediction - leaderboard submission specifics

Hyperparameter values used:

- num_layers: 2

- hidden_dim: 64

- dropout: 0.5

- learning_rate: 0.001

- epochs: 50

- batch_size: 32

- weight_decay: 1e-6

The network was trained from scratch on the CPU of a Google Colab L4 instance (the L4 GPU was used for speed during hyperparameter search, the CPU for deterministic final results),
with the random seeds listed below, giving the following results:

(You should get the same results when using the CPU of a Google Colab L4 instance with the same software versions; see the recorded Google Colab notebook results. The results vary by seed because randomness affects the training process when training from scratch.)

```
seed 0: 0.7923 valid, 0.7937 test
seed 1: 0.8084 valid, 0.7988 test
seed 2: 0.8106 valid, 0.7803 test
seed 3: 0.7909 valid, 0.7920 test
seed 4: 0.8027 valid, 0.7987 test
seed 5: 0.8053 valid, 0.7741 test
seed 6: 0.8070 valid, 0.7646 test
seed 7: 0.8047 valid, 0.7685 test
seed 8: 0.7888 valid, 0.7760 test
seed 9: 0.7987 valid, 0.7888 test
```

If you like Jupyter notebooks and/or Google Colab, you can check the results above by copying this notebook and running its commands in Google Colab's environment (please use the CPU of an L4 instance to reproduce the randomness exactly):

(This notebook is https://colab.research.google.com/drive/11lx7DRuEhfdRGDu1Q_oWvILGQKrP2IsP?usp=sharing )

Using `torch.mean()` and `torch.std()` to report the mean and unbiased sample standard deviation of the ten test scores, one obtains:

```
>>> data = torch.tensor([0.7937, 0.7988, 0.7803, 0.7920, 0.7987, 0.7741, 0.7646, 0.7685, 0.7760, 0.7888])

>>> data.mean()
tensor(0.7835)

>>> data.std()
tensor(0.0125)
```
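As a cross-check that does not need PyTorch, the same statistics can be computed with the standard library. This is a minimal sketch; the variable names are illustrative, and the only data used are the ten test scores listed above. "Unbiased" here means the variance divides by n - 1 rather than n.

```python
import math
import statistics

# Test ROC-AUC scores for seeds 0-9, copied from the results listed above.
test_scores = [0.7937, 0.7988, 0.7803, 0.7920, 0.7987,
               0.7741, 0.7646, 0.7685, 0.7760, 0.7888]

n = len(test_scores)
mean = sum(test_scores) / n

# Unbiased sample variance divides by n - 1 (Bessel's correction).
sample_var = sum((x - mean) ** 2 for x in test_scores) / (n - 1)
sample_std = math.sqrt(sample_var)

# statistics.stdev uses the same n - 1 convention.
assert math.isclose(sample_std, statistics.stdev(test_scores))

print(f"mean: {mean:.5f}, sample std: {sample_std:.5f}")
```

Small differences in the last digits relative to the PyTorch output are expected, since PyTorch tensors default to 32-bit floats while Python floats are 64-bit.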

Note that the **test set performance is NEVER consulted or checked by the code in selecting the model**.
For each random seed, the held-out validation set performance is checked after every epoch during training. The result kept is the epoch with the best validation performance observed so far whose training performance was not worse than that of the previous best epoch.
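The selection rule described above can be sketched as follows. This is an illustration under stated assumptions, not the code in `main_gin.py`: the function name `select_epoch`, the strict `>` on validation score, and the `>=` on training score are my reading of the rule. The function deliberately takes no test scores, mirroring the point that test performance plays no part in selection.

```python
def select_epoch(history):
    """Pick an epoch using validation and training scores only.

    history: list of (epoch, train_score, valid_score) tuples in training order.
    Returns the selected tuple, or None if history is empty.
    """
    best = None
    for epoch, train, valid in history:
        if best is None:
            # The first epoch is always the initial best.
            best = (epoch, train, valid)
            continue
        _, best_train, best_valid = best
        # Assumption: validation must strictly improve, and the training score
        # must not be worse than the training score at the previous best epoch.
        if valid > best_valid and train >= best_train:
            best = (epoch, train, valid)
    return best


# Epochs 1 and 2 use the values logged for seed 3 in this notebook;
# epochs 3 and 4 are made-up values that exercise the rule.
history = [
    (1, 0.6296, 0.5596),
    (2, 0.6877, 0.5591),  # validation did not improve: rejected
    (3, 0.6100, 0.6000),  # hypothetical: better validation, worse training: rejected
    (4, 0.7000, 0.5900),  # hypothetical: better validation, training not worse: kept
]
print(select_epoch(history))
```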

CSVs of validation and test set predictions versus ground truth are generated by the script; if you want examples for each of the seeds, I can provide them upon request. A .pkl file with the model weights is generated at the end of the script and can be reused for inference. Example weights are also available upon request (or I can add them here if multiple people ask and would rather not generate them on their own CPU or in Google Colab).
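If you want to re-score a predictions file yourself, ROC-AUC can be computed directly from pairs of ground-truth labels and predicted scores. The sketch below is a plain-Python illustration that is independent of whatever evaluator the script uses; the function name `roc_auc` is hypothetical, and because the column layout of the generated CSVs is not described here, it takes two plain lists rather than reading a file.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 ground-truth values.
    scores: iterable of predicted scores, same length as labels.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: three of the four positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic in the number of examples, which is acceptable for a quick check on a validation or test split but slower than a rank-based implementation.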

## Results

Test/validation CSVs and a .pkl file with the weights will be generated if you copy this notebook and run one of the cells below, each of which trains the network from scratch and then tests it on the dataset (you can run multiple seeds to see the variation due to randomness across training runs).

## Warning on randomness (How to reproduce training from scratch exactly if needed)

Note that even with a fixed seed, running this on a CPU-only instance versus the CPU of an L4 instance in Google Colab gives different random numbers. On the CPU of an L4 instance, I get the same results on different machines, on different runs, on different days. I am tracking down the dependency that causes this; it is only potentially significant for tiebreaking in competitions with no central authority scoring the entries. In the meantime, see the results captured below and reproduce them in Google Colab using an L4 instance (or get something similar elsewhere).

It is possible that something is initialized on the GPU before `.to('cpu')` is called and is not reset by `reset_parameters`, so if you want to reproduce this result exactly, please use an L4 instance of Google Colab with the dependencies recorded in this notebook. Otherwise you may get different random numbers and a slightly different result.
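When comparing runs across machines, it can help to record the software environment next to the scores. The sketch below is one hedged way to do that with the standard library only; the function name `environment_fingerprint` is hypothetical, and the package list is an assumption (`torch-scatter`, `torch-sparse`, and `torch-geometric` appear in the install log below, while `torch`, `ogb`, and `numpy` are guesses at other relevant dependencies). It does not explain the CPU-only versus L4 difference; it only makes version mismatches easier to spot.

```python
import platform
import sys
from importlib import metadata


def environment_fingerprint(
    packages=("torch", "torch-geometric", "torch-scatter",
              "torch-sparse", "ogb", "numpy"),
):
    """Collect interpreter, platform, and package versions into a dict."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence instead of failing, so partial setups still report.
            info[name] = "not installed"
    return info


for key, value in environment_fingerprint().items():
    print(f"{key}: {value}")
```

Saving this output from two environments and comparing the two files is a quick first step before looking for less visible causes such as GPU-side initialization.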


In [None]:
!git clone https://github.com/willy-b/tiny-GIN-for-ogbg-molhiv
%cd tiny-GIN-for-ogbg-molhiv

Cloning into 'tiny-GIN-for-ogbg-molhiv'...
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 5 (delta 1), reused 5 (delta 1), pack-reused 0
Receiving objects: 100% (5/5), 4.63 KiB | 4.63 MiB/s, done.
Resolving deltas: 100% (1/1), done.
/content/tiny-GIN-for-ogbg-molhiv


In [None]:
!chmod +x install_dependencies.sh
!./install_dependencies.sh

Looking in links: https://pytorch-geometric.com/whl/torch-2.2.1+cu121.html
Collecting torch-scatter
  Downloading https://data.pyg.org/whl/torch-2.2.0%2Bcu121/torch_scatter-2.1.2%2Bpt22cu121-cp310-cp310-linux_x86_64.whl (10.9 MB)
Installing collected packages: torch-scatter
Successfully installed torch-scatter-2.1.2+pt22cu121
Looking in links: https://pytorch-geometric.com/whl/torch-2.2.1+cu121.html
Collecting torch-sparse
  Downloading https://data.pyg.org/whl/torch-2.2.0%2Bcu121/torch_sparse-0.6.18%2Bpt22cu121-cp310-cp310-linux_x86_64.whl (5.0 MB)
Installing collected packages: torch-sparse
Successfully installed torch-sparse-0.6.18+pt22cu121
Collecting torch-geometric
  Downloading torch_geometric-2.5.3-py3-none-any.whl (1.1 MB)

In [None]:
!python main_gin.py --random_seed 0

{'device': 'cpu', 'dataset_id': 'ogbg-molhiv', 'num_layers': 2, 'hidden_dim': 64, 'dropout': 0.5, 'learning_rate': 0.001, 'epochs': 50, 'batch_size': 32, 'weight_decay': 1e-06}
Downloading http://snap.stanford.edu/ogb/data/graphproppred/csv_mol_download/hiv.zip
Downloaded 0.00 GB: 100% 3/3 [00:03<00:00,  1.01s/it]
Extracting dataset/hiv.zip
Processing...
Loading necessary files...
This might take a while.
Processing graphs...
100% 41127/41127 [00:00<00:00, 101664.38it/s]
Converting graphs into PyG objects...
100% 41127/41127 [00:01<00:00, 29614.23it/s]
Saving...
Done!
parameter count: 32385
Training batch: 100% 1029/1029 [00:14<00:00, 70.07it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 233.03it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 132.33it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 131.02it/s]
New best validation score: 0.548360033313737 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 1, Train: 0.6319 (rocauc), Valid: 0.5484 (rocauc), Test: 

In [None]:
!python main_gin.py --random_seed 1

{'device': 'cpu', 'dataset_id': 'ogbg-molhiv', 'num_layers': 2, 'hidden_dim': 64, 'dropout': 0.5, 'learning_rate': 0.001, 'epochs': 50, 'batch_size': 32, 'weight_decay': 1e-06}
parameter count: 32385
Training batch: 100% 1029/1029 [00:14<00:00, 69.18it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 222.47it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 130.88it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 132.22it/s]
New best validation score: 0.5971426121889084 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 1, Train: 0.5969 (rocauc), Valid: 0.5971 (rocauc), Test: 0.5026 (rocauc)
Training batch: 100% 1029/1029 [00:10<00:00, 94.38it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 230.58it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 231.41it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 234.80it/s]
New best validation score: 0.6858036939055457 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 2, Train: 0.7028 (rocauc), Valid: 0

In [None]:
!python main_gin.py --random_seed 2

{'device': 'cpu', 'dataset_id': 'ogbg-molhiv', 'num_layers': 2, 'hidden_dim': 64, 'dropout': 0.5, 'learning_rate': 0.001, 'epochs': 50, 'batch_size': 32, 'weight_decay': 1e-06}
parameter count: 32385
Training batch: 100% 1029/1029 [00:14<00:00, 70.00it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 227.32it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 129.22it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 132.92it/s]
New best validation score: 0.5952289094650206 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 1, Train: 0.6217 (rocauc), Valid: 0.5952 (rocauc), Test: 0.5683 (rocauc)
Training batch: 100% 1029/1029 [00:10<00:00, 95.05it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 229.64it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 210.91it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 221.67it/s]
New best validation score: 0.6520184205369391 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 2, Train: 0.6704 (rocauc), Valid: 0

In [None]:
!python main_gin.py --random_seed 3

{'device': 'cpu', 'dataset_id': 'ogbg-molhiv', 'num_layers': 2, 'hidden_dim': 64, 'dropout': 0.5, 'learning_rate': 0.001, 'epochs': 50, 'batch_size': 32, 'weight_decay': 1e-06}
parameter count: 32385
Training batch: 100% 1029/1029 [00:14<00:00, 69.42it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 231.17it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 130.71it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 129.57it/s]
New best validation score: 0.5596401626494218 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 1, Train: 0.6296 (rocauc), Valid: 0.5596 (rocauc), Test: 0.4853 (rocauc)
Training batch: 100% 1029/1029 [00:11<00:00, 93.39it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 223.31it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 225.29it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 230.79it/s]
Dataset ogbg-molhiv, Epoch: 2, Train: 0.6877 (rocauc), Valid: 0.5591 (rocauc), Test: 0.6085 (rocauc)
Training batch: 100% 1029/1029 [00:11<00:00, 93.18i

In [None]:
!python main_gin.py --random_seed 4

{'device': 'cpu', 'dataset_id': 'ogbg-molhiv', 'num_layers': 2, 'hidden_dim': 64, 'dropout': 0.5, 'learning_rate': 0.001, 'epochs': 50, 'batch_size': 32, 'weight_decay': 1e-06}
parameter count: 32385
Training batch: 100% 1029/1029 [00:14<00:00, 69.74it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 225.43it/s]
Evaluation batch: 100% 129/129 [00:01<00:00, 124.37it/s]
Evaluation batch: 100% 129/129 [00:01<00:00, 123.75it/s]
New best validation score: 0.6440451695081324 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 1, Train: 0.5740 (rocauc), Valid: 0.6440 (rocauc), Test: 0.5686 (rocauc)
Training batch: 100% 1029/1029 [00:10<00:00, 94.67it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 223.48it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 226.02it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 230.16it/s]
New best validation score: 0.6469723691945914 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 2, Train: 0.7083 (rocauc), Valid: 0

In [None]:
!python main_gin.py --random_seed 5

{'device': 'cpu', 'dataset_id': 'ogbg-molhiv', 'num_layers': 2, 'hidden_dim': 64, 'dropout': 0.5, 'learning_rate': 0.001, 'epochs': 50, 'batch_size': 32, 'weight_decay': 1e-06}
parameter count: 32385
Training batch: 100% 1029/1029 [00:14<00:00, 70.68it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 225.98it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 130.30it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 132.55it/s]
New best validation score: 0.6342347638643935 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 1, Train: 0.6653 (rocauc), Valid: 0.6342 (rocauc), Test: 0.6102 (rocauc)
Training batch: 100% 1029/1029 [00:10<00:00, 95.48it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 230.74it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 230.10it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 234.44it/s]
New best validation score: 0.6595507544581618 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 2, Train: 0.6900 (rocauc), Valid: 0

In [None]:
!python main_gin.py --random_seed 6

{'device': 'cpu', 'dataset_id': 'ogbg-molhiv', 'num_layers': 2, 'hidden_dim': 64, 'dropout': 0.5, 'learning_rate': 0.001, 'epochs': 50, 'batch_size': 32, 'weight_decay': 1e-06}
parameter count: 32385
Training batch: 100% 1029/1029 [00:14<00:00, 68.81it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 228.54it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 131.68it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 131.27it/s]
New best validation score: 0.6725149421908682 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 1, Train: 0.6720 (rocauc), Valid: 0.6725 (rocauc), Test: 0.5893 (rocauc)
Training batch: 100% 1029/1029 [00:10<00:00, 95.09it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 223.02it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 204.31it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 215.47it/s]
New best validation score: 0.7137177273172643 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 2, Train: 0.7067 (rocauc), Valid: 0

In [None]:
!python main_gin.py --random_seed 7

{'device': 'cpu', 'dataset_id': 'ogbg-molhiv', 'num_layers': 2, 'hidden_dim': 64, 'dropout': 0.5, 'learning_rate': 0.001, 'epochs': 50, 'batch_size': 32, 'weight_decay': 1e-06}
parameter count: 32385
Training batch: 100% 1029/1029 [00:14<00:00, 71.03it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 226.31it/s]
Evaluation batch: 100% 129/129 [00:01<00:00, 128.61it/s]
Evaluation batch: 100% 129/129 [00:01<00:00, 128.98it/s]
New best validation score: 0.6461334019204389 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 1, Train: 0.6311 (rocauc), Valid: 0.6461 (rocauc), Test: 0.6047 (rocauc)
Training batch: 100% 1029/1029 [00:10<00:00, 96.68it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 226.35it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 233.52it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 236.01it/s]
New best validation score: 0.7059817754262199 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 2, Train: 0.6599 (rocauc), Valid: 0

In [None]:
!python main_gin.py --random_seed 8

{'device': 'cpu', 'dataset_id': 'ogbg-molhiv', 'num_layers': 2, 'hidden_dim': 64, 'dropout': 0.5, 'learning_rate': 0.001, 'epochs': 50, 'batch_size': 32, 'weight_decay': 1e-06}
parameter count: 32385
Training batch: 100% 1029/1029 [00:14<00:00, 69.64it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 226.55it/s]
Evaluation batch: 100% 129/129 [00:01<00:00, 125.16it/s]
Evaluation batch: 100% 129/129 [00:01<00:00, 105.20it/s]
New best validation score: 0.6064569860866157 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 1, Train: 0.6467 (rocauc), Valid: 0.6065 (rocauc), Test: 0.5597 (rocauc)
Training batch: 100% 1029/1029 [00:10<00:00, 94.32it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 225.74it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 228.94it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 226.79it/s]
Dataset ogbg-molhiv, Epoch: 2, Train: 0.6522 (rocauc), Valid: 0.5699 (rocauc), Test: 0.5083 (rocauc)
Training batch: 100% 1029/1029 [00:10<00:00, 94.91i

In [14]:
!python main_gin.py --random_seed 9

{'device': 'cpu', 'dataset_id': 'ogbg-molhiv', 'num_layers': 2, 'hidden_dim': 64, 'dropout': 0.5, 'learning_rate': 0.001, 'epochs': 50, 'batch_size': 32, 'weight_decay': 1e-06}
parameter count: 32385
Training batch: 100% 1029/1029 [00:14<00:00, 70.76it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 229.82it/s]
Evaluation batch: 100% 129/129 [00:01<00:00, 127.62it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 132.55it/s]
New best validation score: 0.6133003870272389 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 1, Train: 0.6223 (rocauc), Valid: 0.6133 (rocauc), Test: 0.5435 (rocauc)
Training batch: 100% 1029/1029 [00:10<00:00, 96.29it/s]
Evaluation batch: 100% 1029/1029 [00:04<00:00, 229.19it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 232.94it/s]
Evaluation batch: 100% 129/129 [00:00<00:00, 233.83it/s]
New best validation score: 0.6894902508328434 (rocauc) without training score regression
Dataset ogbg-molhiv, Epoch: 2, Train: 0.7329 (rocauc), Valid: 0

In [16]:
import torch
data = torch.tensor([0.7937, 0.7988, 0.7803, 0.7920, 0.7987, 0.7741, 0.7646, 0.7685, 0.7760, 0.7888])
print(f"mean: {data.mean()}, std: {data.std()}")

mean: 0.7835499048233032, std: 0.012501309625804424
