# L15d: Graph Classification with Graph Neural Networks (GNNs)
Fill me in.

___

## Task 1: Setup, Data, Prerequisites
We set up the computational environment by including the `Include.jl` file, loading any needed resources, such as sample datasets, and setting up any required constants. 
* The `Include.jl` file also loads external packages, various functions that we will use in the exercise, and custom types to model the components of our problem. It checks for a `Manifest.toml` file; if it finds one, the packages are loaded. Otherwise, the packages are downloaded and then loaded.

In [1]:
include("Include.jl");

### Data
One common task for graph neural networks is graph classification, e.g., molecular property prediction, in which _molecules are represented as graphs_ and the task may be to infer whether a molecule inhibits HIV replication.

[TU Dortmund University](https://www.tu-dortmund.de/en/) has collected a wide range of graph classification datasets, known as the [TUDatasets](https://chrsmrrs.github.io/datasets/), which are [accessible via the `MLDatasets.jl` package](https://juliaml.github.io/MLDatasets.jl/stable/). Let's load and inspect one of the smaller ones, the __MUTAG dataset__:

In [2]:
dataset = TUDataset("MUTAG"); # download the dataset

_What's in the `dataset` variable?_ This dataset provides _188 different graphs_, and the task is to classify each graph into _one of two classes_ (binary classification).

By inspecting the first graph object of the dataset, we can see that it comes with **17 nodes** and **38 edges**.
It also comes with exactly **one graph label**, and provides additional node labels (7 classes) and edge labels (4 classes).
However, for the sake of simplicity, we will not make use of edge labels.

We have some useful utilities for working with graph datasets, *e.g.*, we can shuffle the dataset and use the first 150 graphs as training graphs, while using the remaining ones for testing:

In [3]:
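# Convert the dataset entries into GNNGraph objects
graphs = mldataset2gnngraph(dataset)

# One-hot encode the node labels (7 classes, 0:6) as Float32 node features; drop the edge labels
graphs = [GNNGraph(g, 
					ndata=Float32.(onehotbatch(g.ndata.targets, 0:6)),
					edata=nothing) 
			for g in graphs]

# Shuffle the graph indices: the first 150 are used for training, the rest for testing
shuffled_idxs = randperm(length(graphs))
train_idxs = shuffled_idxs[1:150]
test_idxs = shuffled_idxs[151:end]
train_graphs = graphs[train_idxs]
test_graphs = graphs[test_idxs]

# One-hot encode the graph labels (classes -1 and 1)
ytrain = onehotbatch(dataset.graph_data.targets[train_idxs], [-1, 1])
ytest = onehotbatch(dataset.graph_data.targets[test_idxs], [-1, 1]);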
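
To make the encoding step above concrete, here is a minimal sketch in plain Julia of what one-hot encoding produces for node labels in `0:6` and graph labels in `[-1, 1]`. The helper `onehot_columns` and the sample labels are hypothetical stand-ins for illustration; the notebook itself uses `onehotbatch`, whose internals and return type may differ.

```julia
# Hypothetical helper: builds a dense one-hot matrix with one column per label.
# This is an illustration only, not the `onehotbatch` implementation.
function onehot_columns(labels::AbstractVector, classes)
    classes = collect(classes)
    encoded = zeros(Float32, length(classes), length(labels))
    for (j, label) in enumerate(labels)
        i = findfirst(==(label), classes)
        i === nothing && throw(ArgumentError("label $label is not in classes"))
        encoded[i, j] = 1f0
    end
    return encoded
end

node_labels = [0, 2, 6, 2]                 # made-up node labels in 0:6
X = onehot_columns(node_labels, 0:6)       # 7×4 matrix: 7 classes, 4 nodes

graph_labels = [-1, 1, 1]                  # made-up graph labels
Y = onehot_columns(graph_labels, [-1, 1])  # 2×3 matrix: 2 classes, 3 graphs
```

Each column contains a single `1` in the row of its class, which is why the node feature dimension is 7 and the graph label dimension is 2.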

## Task 2: Set up mini-batching of graphs

Since graphs in graph classification datasets are usually small, a good idea is to **batch the graphs** before inputting them into a Graph Neural Network to guarantee full GPU utilization.
In the image or language domain, this procedure is typically achieved by **rescaling** or **padding** each example into a set of equally-sized shapes, and examples are then grouped in an additional dimension.
The length of this dimension is then equal to the number of examples grouped in a mini-batch and is typically referred to as the `batchsize`.

However, for GNNs the two approaches described above are either not feasible or may result in a lot of unnecessary memory consumption.
Therefore, GNN.jl opts for another approach to achieve parallelization across a number of examples. Here, adjacency matrices are stacked in a diagonal fashion (creating a giant graph that holds multiple isolated subgraphs), and node and target features are simply concatenated in the node dimension (the last dimension).

This procedure has some crucial advantages over other batching procedures:

1. GNN operators that rely on a message passing scheme do not need to be modified since messages are not exchanged between two nodes that belong to different graphs.

2. There is no computational or memory overhead since adjacency matrices are saved in a sparse fashion holding only non-zero entries, *i.e.*, the edges.
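
A minimal sketch of the block-diagonal batching idea, using only the `SparseArrays` standard library rather than the graph package itself. The two toy graphs, their features, and the names `A1`, `A2`, `X1`, `X2`, and `graph_indicator` are made up for illustration. It checks that one round of sum aggregation on the batched graph matches the per-graph results, and that the batched adjacency matrix stores no extra non-zero entries.

```julia
using SparseArrays

# Two tiny undirected graphs as sparse adjacency matrices (made-up examples)
A1 = sparse([0 1 0; 1 0 1; 0 1 0])   # path on 3 nodes
A2 = sparse([0 1; 1 0])              # single edge on 2 nodes

# Node features: features along rows, one column per node
X1 = Float32[1 2 3; 0 1 0]           # 2 features × 3 nodes
X2 = Float32[4 5; 1 1]               # 2 features × 2 nodes

# Batch: stack adjacency matrices diagonally, concatenate features along the node dimension
A = blockdiag(A1, A2)                # 5×5 sparse matrix
X = hcat(X1, X2)                     # 2×5 feature matrix

# Which graph each node (column) belongs to
graph_indicator = vcat(fill(1, size(A1, 1)), fill(2, size(A2, 1)))

# One round of sum aggregation: column j becomes the sum of the features of j's neighbors
H = X * A

# No messages cross graph boundaries, and no extra entries are stored
same_as_separate = H == hcat(X1 * A1, X2 * A2)
no_extra_storage = nnz(A) == nnz(A1) + nnz(A2)
```

The `graph_indicator` vector records which graph each node column belongs to; a readout step needs this kind of bookkeeping to pool nodes per graph.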

GNN.jl can **batch multiple graphs into a single giant graph** with the help of the `collate` option of `DataLoader`, which implicitly calls `Flux.batch` on the data:

In [4]:
train_loader = DataLoader((train_graphs, ytrain), batchsize=64, shuffle=true, collate=true)
test_loader = DataLoader((test_graphs, ytest), batchsize=10, shuffle=false, collate=true)

4-element DataLoader(::Tuple{Vector{GNNGraph{Tuple{Vector{Int64}, Vector{Int64}, Nothing}}}, OneHotArrays.OneHotMatrix{UInt32, Vector{UInt32}}}, batchsize=10, collate=Val{true}())
  with first element:
  (GNNGraph{Tuple{Vector{Int64}, Vector{Int64}, Nothing}}, 2×10 OneHotMatrix(::Vector{UInt32}) with eltype Bool,)
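
The `4-element` count in the output above follows from the split sizes: 188 graphs minus 150 training graphs leaves 38 test graphs, grouped in batches of 10. The sketch below reproduces that arithmetic in plain Julia, assuming the loader keeps the final, smaller batch (which is consistent with the output shown). The variable names are illustrative.

```julia
num_graphs = 188
num_train = 150
num_test = num_graphs - num_train    # 38 test graphs

# Split the test indices into consecutive batches of at most 10
test_batches = collect(Iterators.partition(1:num_test, 10))
num_test_batches = length(test_batches)      # 4 batches
test_batch_sizes = length.(test_batches)     # sizes 10, 10, 10, 8

# Same count for the training loader with batchsize=64
num_train_batches = cld(num_train, 64)       # ceil(150 / 64) = 3
```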

## Task 3: Define and Train a GNN model for the Graph Classification task
In this task, we will define a GNN model for the graph classification task, train the model on the training set, and evaluate it on the test set.

Let's start by setting some constants for the model, such as the number of input features, the number of hidden features, and the number of output classes. See the comments next to the constants for more details on what they mean, permissible values, etc.

In [5]:
nin = 7 # dimension of the node feature vectors  
nout = 2 # output dimension for the system
nh = 64 # number of hidden units

64

Next, let's define and train the GNN model. We will use a [custom `MyCustomConvolutionLayerModel` layer](src/Types.jl). You complete me.
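
Graph classification needs one vector per graph, while the convolution layers produce one vector per node. As a hedged illustration of how a mean readout can bridge that gap on a batched graph, here is a minimal sketch in plain Julia. The helper `mean_readout`, the embedding matrix `H`, and `graph_indicator` are hypothetical and are not the library's pooling implementation; the sketch assumes every graph has at least one node.

```julia
using Statistics

# Hypothetical helper: averages node embeddings per graph.
# H has one column per node; graph_indicator[j] is the graph that node j belongs to.
function mean_readout(H::AbstractMatrix, graph_indicator::AbstractVector{<:Integer})
    num_graphs = maximum(graph_indicator)
    pooled = zeros(eltype(H), size(H, 1), num_graphs)
    for g in 1:num_graphs
        cols = findall(==(g), graph_indicator)
        pooled[:, g] = vec(mean(H[:, cols], dims=2))
    end
    return pooled
end

# Made-up node embeddings for a batch of two graphs (3 nodes and 2 nodes)
H = Float32[1 3 5 10 20;
            2 2 2  0  4]
graph_indicator = [1, 1, 1, 2, 2]

pooled = mean_readout(H, graph_indicator)   # 2×2: one column per graph
```

The result has one column per graph regardless of how many nodes each graph has, so a dense layer can then map it to class scores.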

In [None]:
mymodel = let
	
	Flux.@layer MyCustomConvolutionLayerModel

	inputlayer = MyCustomConvolutionLayerModel(nin => nh, relu);
	hiddenlayer = MyCustomConvolutionLayerModel(nh => nh, relu);
	outputlayer = MyCustomConvolutionLayerModel(nh => nh);

	model = GNNChain(inputlayer,
				hiddenlayer,
				outputlayer,
				GlobalPool(mean), # what is this doing?
				Dropout(0.5), # what is this doing?
				Dense(nh, nout))
				
	train!(model)
	model; # return the trained model
end;

# epoch = 0
train = (loss = 0.5919, acc = 50.0)
test = (loss = 0.7201, acc = 50.0)
# epoch = 10
train = (loss = 0.4301, acc = 51.0)
test = (loss = 0.5649, acc = 50.0)
# epoch = 20
train = (loss = 0.3357, acc = 57.67)
test = (loss = 0.5356, acc = 56.58)
# epoch = 30
train = (loss = 0.2771, acc = 80.33)
test = (loss = 0.4974, acc = 72.37)
# epoch = 40
train = (loss = 0.2529, acc = 84.67)
test = (loss = 0.3615, acc = 77.63)
# epoch = 50
train = (loss = 0.2359, acc = 90.33)
test = (loss = 0.4466, acc = 76.32)
# epoch = 60
train = (loss = 0.2006, acc = 87.67)
test = (loss = 0.4276, acc = 80.26)
# epoch = 70
train = (loss = 0.243, acc = 85.67)
test = (loss = 0.4435, acc = 82.89)
# epoch = 80
train = (loss = 0.1659, acc = 89.67)
test = (loss = 0.3884, acc = 82.89)
# epoch = 90
train = (loss = 0.175, acc = 92.0)
test = (loss = 0.5314, acc = 75.0)
# epoch = 100
train = (loss = 0.1706, acc = 92.0)
test = (loss = 0.4012, acc = 85.53)
# epoch = 110
train = (loss = 0.1439, acc = 91.67)
test = (loss