Use Python 3.12; other versions have not been tested and are therefore not guaranteed to work.
Install the requirements either in a conda environment or in a virtualenv. The latter can be done like this:
```bash
# ON LINUX
.../group6$ python -m venv venv
.../group6$ source venv/bin/activate
.../group6$ python -m pip install -r requirements.txt
```
```bat
:: ON WINDOWS cmdline, NOT powershell
...\group6> python -m venv venv
...\group6> .\venv\Scripts\activate.bat
:: in powershell use `.\venv\Scripts\Activate.ps1` instead
...\group6> python -m pip install -r requirements.txt
```
```bat
:: ON WINDOWS cmdline
...\group6> python src/main.py --task <TASK> --dataset <DATASET>
```

or

```bash
# ON LINUX
.../group6$ python src/main.py --task <TASK> --dataset <DATASET>
```
where `<TASK>` is either `node` or `link` for Ex.3 or Ex.4 respectively, and `<DATASET>` is one of `Cora` or `Citeseer` for Ex.3, and `Facebook` or `PPI` for Ex.4.
By default, node classification runs with p = 1.0, q = 0.1. Other values for p, q (as well as for other parameters) can be set independently for `Cora` & `Citeseer` by (un)commenting their respective dictionaries in the function `main()` in `node_class.py`.
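For illustration, such a per-dataset dictionary might look roughly like the following sketch. The key names are assumptions on our part (the actual ones live in `node_class.py`); the values are taken from the Ex.3 hyperparameter table in this README:

```python
# Hypothetical sketch of the per-dataset hyperparameter dicts chosen in
# main() of node_class.py (key names assumed, not the repo's actual code;
# values from the Ex.3 table). Uncomment the dict for the dataset you want.
cora_params = {
    "sched": "plateau", "C": 98.533, "batch_size": 8726,
    "delta": 0.005616, "dim": 128, "l": 5, "l_ns": 5,
    "lr": 0.006572, "n_epochs": 250, "p": 1.0, "q": 0.1,
}
# citeseer_params = {
#     "sched": "linear", "C": 48.541, "batch_size": 9742,
#     "delta": 0.00001324, "dim": 128, "l": 5, "l_ns": 5,
#     "lr": 0.0968, "n_epochs": 200, "p": 1.0, "q": 0.1,
# }
```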
All files in `src` are clearly named according to their respective exercises and can be executed independently to yield sample results.
For Exercise 3, the following hyperparameters, found via an HPO, were used for each dataset:
Dataset | sched | C | batch_size | delta | dim | l | l_ns | lr | n_epochs | p | q |
---|---|---|---|---|---|---|---|---|---|---|---|
Cora | plateau | 98.533 | 8726 | 0.005616 | 128 | 5 | 5 | 0.006572 | 250 | 1 | 0.1 |
CiteSeer | linear | 48.541 | 9742 | 0.00001324 | 128 | 5 | 5 | 0.0968 | 200 | 1 | 0.1 |
For Exercise 4, the following hyperparameters were used for each dataset; they were chosen by intuition informed by the HPO for task 3:
Dataset | sched | C | batch_size | delta | dim | l | l_ns | lr | n_epochs | p | q |
---|---|---|---|---|---|---|---|---|---|---|---|
Facebook | - | - | 2000 | - | 128 | 5 | 5 | 0.01 | 100 | 1.0 | 1.0 |
PPI | - | - | 2000 | - | 128 | 5 | 5 | 0.01 | 100 | 1.0 | 1.0 |
The values not given were left at their default values.
Mean ± StD of Accuracy (rounded in %)
Dataset ↓ , p,q → | 1.0, 0.1 | 0.1, 1.0 | 1.0, 1.0 |
---|---|---|---|
Cora | 85.3 ± 1.93 | 85.78 ± 1.73 | 84.38 ± 2.42 |
Citeseer | 63.56 ± 2.0 | 59.78 ± 2.31 | 63.22 ± 2.62 |
Mean ± StD (rounded in %) for p = q = 1.0
Dataset | Accuracy | ROC-AUC |
---|---|---|
Facebook | 97.7 ± 1.29 | 97.78 ± 1.28 |
PPI | 86.79 ± 4.4 | 86.8 ± 4.34 |
For Citeseer, finding good hyperparameters was difficult, which is why we ran an HPO for it. For Cora, the results clear the requested threshold by a wide margin, whereas for Citeseer the margin is relatively tight.
For link prediction, the results are very good, reaching high ROC-AUC scores and accuracies well above the requested thresholds. The hyperparameters were relatively easy to find.
It must be noted, however, that the computation of the tensor of Hadamard products `XX` is rather memory-inefficient and may thus lead to memory issues, e.g. RAM spillovers and associated slowdowns. We tried to fix this issue by finding a more elegant way of mapping/indexing edges into `XX`, but have yet to iron things out.
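To illustrate the memory issue (our own sketch, not the repo's code): with n nodes and d-dimensional embeddings, the full tensor of pairwise Hadamard products has shape (n, n, d), whereas only the products for the sampled edges are actually needed:

```python
import numpy as np

def edge_features_full(X):
    """All pairwise Hadamard products: an (n, n, d) tensor.

    Memory-hungry — this is the kind of computation that causes RAM
    spillovers on larger graphs.
    """
    return X[:, None, :] * X[None, :, :]

def edge_features_sampled(X, edges):
    """Hadamard products only for the sampled edges: an (m, d) array."""
    edges = np.asarray(edges)
    return X[edges[:, 0]] * X[edges[:, 1]]
```

The edge-wise variant reduces memory from O(n²·d) to O(m·d) for m sampled edges, which is the fix applied on `david/sheet4` as described further below.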
We initially had trouble reaching the desired thresholds until we ran the HPO for Ex.3 and intuited better hyperparameters for Ex.4, e.g. more and much larger batches.
For Ex.3 we again have wandb reports:
- Cora Report
- Citeseer Report
This task was more successful than the previous ones in achieving the desired results. The aforementioned memory issues aside, both the implementation itself and the HPO were relatively fast.
There were, however, some ambiguities in the exercise: for example, whether w & w' in the sum in the denominator of the loss function should be interpreted as sets or as sequences, i.e. whether they may contain repeated nodes. Moreover, all the graphs except the one for `Facebook` contained connected components with fewer than two edges (e.g. single nodes with self-loops), which thus could not satisfy the connectivity conditions set forth for edge sampling in link prediction.
Luckily, none of these issues seemed to lead to much of a performance loss, likely because (a) the computed loss values are still similar enough under both the set & sequence interpretations of w & w' in the denominator, and (b) these too-small connected components are negligible compared to the one or few largest ones.
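A minimal sketch (our own illustration, not the repo's code) of the two interpretations: given per-node scores and a walk window, the softmax denominator either counts repeated nodes once per occurrence (sequence) or only once (set):

```python
import math

def softmax_denominator(scores, window, as_set=False):
    """Sum exp(score) over a walk window, with or without deduplication.

    scores : dict mapping node id -> similarity score
    window : list of node ids from a random walk (may contain repeats)
    """
    nodes = set(window) if as_set else window
    return sum(math.exp(scores[w]) for w in nodes)
```

For windows with few repeated nodes, the two values stay close, consistent with observation (a) above.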
David laid much of the groundwork for random walks (Ex.1), node2vec (Ex.2), node classification (Ex.3) & link prediction (Ex.4).
Benedict greatly improved upon David's code by making it faster and adding more parallelization, especially for random walks.
He also did the HPO for Ex.3, intuited good hyperparameters for Ex.4, and wrote `main.py`.
Moreover, for Ex.4, Benedict improved the edge sampling by building spanning trees to avoid removing connecting edges within the train & eval edge sets.
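The idea can be sketched as follows (a hedged illustration with assumed names, not Benedict's actual implementation): a union-find pass builds a spanning forest, whose edges are forced into the train split, so holding out eval edges never disconnects a component:

```python
import random

def split_edges(num_nodes, edges, eval_frac=0.2, seed=0):
    """Split edges into (train, eval) while keeping components connected.

    Spanning-forest edges (found via union-find) always stay in train;
    only cycle edges are eligible for the eval split.
    """
    parent = list(range(num_nodes))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)

    tree, rest = [], []
    for u, v in shuffled:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v))   # spanning-forest edge: must stay in train
        else:
            rest.append((u, v))   # cycle edge: safe to hold out

    k = min(int(len(edges) * eval_frac), len(rest))
    return tree + rest[k:], rest[:k]  # (train_edges, eval_edges)
```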
Ahmet developed all of his code side-by-side, yielding some results of his own. Benedict cleaned up and submitted his forked code in the end.
On `david/sheet4`, the memory issue for the computation of `XX` in link prediction was resolved simply by taking the Hadamard products only over the sampled edges rather than the complete graph. However, for some unforeseen reason, the same issue could not yet be resolved for Benedict's version on `main` while maintaining the previous performance. We can only guess that it has something to do with the different edge-sampling strategies used in the two versions.