# A Gentle Introduction to Graph Neural Networks
This notebook follows https://distill.pub/2021/gnn-intro/ 

! A `Graph` represents the relations (_edges_) between a collection of entities (_nodes_).  
> A Graph has
- $V$: vertex (node) attributes, e.g., node identity
- $E$: edge attributes and directions
- $U$: global (master node) attributes

! Information in the form of scalars or embeddings can be stored at each graph node or edge

> Types of graphs:
- Directed
- Undirected

> Bad examples of graphs:
- Images (pixels as nodes, each with 8 connections to its neighbours)
- Texts (directed graph, connections between adjacent words)

> The adjacency matrix is $n_{\rm nodes}\times n_{\rm nodes}$. It is non-zero where two nodes share an edge.  
- For an image as a graph, the adjacency matrix has a __banded structure__ (non-zero entries lie in a few bands around the diagonal).
- For text, the adjacency matrix is a single line just off the main diagonal (each word connects only to the next one).
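To make the two structures concrete, here is a minimal NumPy sketch that builds both adjacency matrices for tiny inputs. The helper names `text_adjacency` and `image_adjacency` are made up for this note, and the row-major pixel numbering is an assumption.

```python
import numpy as np

def text_adjacency(n_words):
    """Directed chain: word i points to word i + 1."""
    A = np.zeros((n_words, n_words), dtype=int)
    idx = np.arange(n_words - 1)
    A[idx, idx + 1] = 1
    return A

def image_adjacency(height, width):
    """Undirected grid: each pixel is linked to its (up to) 8 neighbours.

    Pixels are numbered row by row (an assumption of this sketch).
    """
    n = height * width
    A = np.zeros((n, n), dtype=int)
    for r in range(height):
        for c in range(width):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < height and 0 <= cc < width
                    if (dr, dc) != (0, 0) and inside:
                        A[r * width + c, rr * width + cc] = 1
    return A

A_text = text_adjacency(4)
A_img = image_adjacency(3, 3)

print(A_text)              # ones only directly above the main diagonal
print(A_img.sum(axis=1))   # number of neighbours of each pixel

# Bandwidth: largest |i - j| that holds a non-zero entry.
rows, cols = np.nonzero(A_img)
print(np.abs(rows - cols).max())
```

With row-major numbering, every non-zero entry of the image matrix stays within `width + 1` of the diagonal, which is where the banded look comes from.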

Graphs are __redundant__ when the data structure is _regular_ (pixels and words are regular).  
They are most __useful__ when data is _heterogeneously_ structured (i.e., when the number of neighbours differs from node to node)  

> Good examples of graphs:
- Molecules (the adjacency matrix, while symmetric, is not banded anymore)
- Social networks as graphs (adjacency matrix _is not_ identical)
- Citation networks as graphs (who cited whom) (directed graph)
- In CV, tagged objects in a scene may be used as nodes in a graph
- ML models, math equations... (see _dataflow graph_)

> Tasks on graphs (solved with GNNs)
- Graph-level (property of the _entire_ graph; analogous to image classification or sentiment analysis)
- Node-level (property of a node; analogous to image segmentation (the role of each pixel in the image) or predicting the part of speech of each word)
- Edge-level (which nodes share an edge, and what the property of that edge is; analogous to image scene understanding)

> Information on a Graph
- Nodes (e.g., with a node feature matrix $N$ where each node has an index)
- Edges. Storing them in an adjacency matrix is non-trivial: the number of edges is variable, the matrix is space-inefficient when sparse, and it is non-unique (different node orderings give different matrices for the same graph).  
__Adjacency lists__ are a possible solution. Each connection is described with a _tuple_ (i,j). This representation is _permutation invariant_.  
- Global context
- Connectivity

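! In practice each node/edge/global attribute is a vector rather than a scalar, so, e.g., the node tensor has shape $[n_{\rm nodes}, {\rm node\_dim}]$ instead of $[n_{\rm nodes}]$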
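The storage scheme above can be sketched in a few lines of NumPy. The toy graph, the feature sizes (`node_dim`, `edge_dim`, `global_dim`) and the random values are all hypothetical, chosen only to show the shapes.

```python
import numpy as np

# Hypothetical toy graph: 4 nodes, 3 undirected edges (0-1, 1-2, 1-3).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]])

# Adjacency list: one (i, j) tuple per edge.
# For an undirected graph the upper triangle is enough.
edges = [(int(i), int(j)) for i, j in zip(*np.nonzero(np.triu(A)))]
print(edges)

# One feature vector per node, per edge, and for the whole graph.
node_dim, edge_dim, global_dim = 3, 2, 5   # arbitrary sizes for illustration
rng = np.random.default_rng(0)
V = rng.normal(size=(A.shape[0], node_dim))   # [n_nodes, node_dim]
E = rng.normal(size=(len(edges), edge_dim))   # [n_edges, edge_dim]
U = rng.normal(size=(global_dim,))            # [global_dim]

print(V.shape, E.shape, U.shape)
print(A.size, "matrix entries vs", len(edges), "tuples")
```

Row `k` of `E` belongs to the edge `edges[k]`, so the edge features and the adjacency list must be kept in the same order.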

> A GNN is an optimizable transformation on all attributes of the graph (nodes, edges, global-context) that preserves graph symmetries (permutation invariances). 

### Example (neglecting message passing)

Example of a construction method: a _message passing neural network_ using the _Graph Nets_ architecture.  
The model accepts a graph as input, with information loaded into its nodes, edges, and global context, and progressively transforms these embeddings _without_ changing the connectivity.  
Graph in; graph out.  

! The simplest GNN layer uses a _separate_ multilayer perceptron (MLP) $f$ on _each_ component of the graph.  
For each global, node, and edge vector, the corresponding $f$ is applied:
$$
\begin{aligned}
U_n &\rightarrow f_{U_n} \rightarrow U_{n+1} \\
V_n &\rightarrow f_{V_n} \rightarrow V_{n+1} \\
E_n &\rightarrow f_{E_n} \rightarrow E_{n+1}
\end{aligned}
$$
This is a _graph-independent layer_: it does not use the connectivity.  
The result is a new graph with updated embeddings, while the _adjacency list_ is __unchanged__ in this case. 
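A minimal NumPy sketch of such a layer, assuming each $f$ is a small two-layer MLP with ReLU. The helper `make_mlp`, the layer sizes, and the random, untrained weights are illustrative choices, not something the article prescribes.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(d_in, d_hidden, d_out):
    """Return a two-layer MLP with ReLU, applied row by row (random weights)."""
    W1 = rng.normal(scale=0.1, size=(d_in, d_hidden))
    b1 = np.zeros(d_hidden)
    W2 = rng.normal(scale=0.1, size=(d_hidden, d_out))
    b2 = np.zeros(d_out)

    def f(x):
        return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

    return f

# Hypothetical graph: 4 nodes, 3 edges, one global vector.
edges = [(0, 1), (1, 2), (1, 3)]
V = rng.normal(size=(4, 3))   # node embeddings
E = rng.normal(size=(3, 2))   # edge embeddings
U = rng.normal(size=(1, 5))   # global embedding

# One separate MLP per component of the graph.
f_V = make_mlp(3, 8, 6)
f_E = make_mlp(2, 8, 6)
f_U = make_mlp(5, 8, 6)

V_next, E_next, U_next = f_V(V), f_E(E), f_U(U)

# The connectivity is never read, so `edges` describes the output graph too.
print(V_next.shape, E_next.shape, U_next.shape)
```

Each row is transformed independently of all the others, so feeding a single node through `f_V` gives the same result as the matching row of `V_next`.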

Information can be exchanged between nodes and edges with __pooling__ $\rho$, which proceeds in two steps:
- Gather the embeddings of every item to be pooled and concatenate them into a matrix
- Aggregate the embeddings, e.g., via a sum operation

$$
\begin{cases}
V_{n} \\
E_{n}
\end{cases}
\rightarrow \rho_{E_n \to V_n}
\rightarrow C_{V_n}
\rightarrow \text{node prediction}
$$

where $\rho$ is a pooling operation and $C$ is a final classification operation
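The pipeline above can be sketched as follows, assuming sum pooling for $\rho$ and a single random linear map standing in for the classifier $C$. The helper `pool_edges_to_nodes`, the toy edge features, and the three output classes are hypothetical.

```python
import numpy as np

def pool_edges_to_nodes(E, edges, n_nodes):
    """rho_{E -> V}: sum the embeddings of all edges touching each node."""
    pooled = np.zeros((n_nodes, E.shape[1]))
    for k, (i, j) in enumerate(edges):
        pooled[i] += E[k]   # edge k touches node i ...
        pooled[j] += E[k]   # ... and node j (undirected graph)
    return pooled

edges = [(0, 1), (1, 2), (1, 3)]
E = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 2.0]])

pooled = pool_edges_to_nodes(E, edges, n_nodes=4)
print(pooled)

# Stand-in for the classifier C: an untrained linear map to 3 classes.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))
logits = pooled @ W
node_prediction = logits.argmax(axis=1)
print(node_prediction.shape)
```

Node 1 touches all three edges, so its pooled vector is the sum of every row of `E`. The other nodes each receive the single edge they belong to.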

! Global features can be predicted by pooling over _all_ node and/or edge embeddings, similar to `Global Average Pooling` in CNNs

! The key to a GNN is the way it passes information. Pooling is one of the ways. 

> Pooling _inside_ the GNN layer makes it possible to use connectivity information when learning

This is done by _message passing_, where neighbouring nodes/edges exchange information and influence each other's updated embeddings as follows:
- Collect all _neighbouring_ node embeddings (or _messages_) with a function $g()$
- Aggregate all messages with an aggregate function (e.g., sum)
- _All_ pooled messages are then passed through an _update function_, e.g., a NN.  

> This is the simplest message passing in a GNN layer (collect neighbours, aggregate/pool, update)
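The three steps can be sketched in NumPy as below. This assumes sum aggregation, a single dense layer with ReLU as the update function, and that a node's own embedding is added to the pooled messages. All of these are choices made for this sketch, and `message_passing_layer` is a made-up name.

```python
import numpy as np

def message_passing_layer(V, A, W, b):
    """One round of message passing on the node embeddings V.

    A is the adjacency matrix, so row i of (A @ V) is the sum of the
    embeddings of node i's neighbours (gather + aggregate in one step).
    """
    pooled = A @ V                                  # steps 1 and 2
    return np.maximum((V + pooled) @ W + b, 0.0)    # step 3: update

# Hypothetical toy graph: edges 0-1, 1-2, 1-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)

rng = np.random.default_rng(0)
V = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 3))   # untrained weights of the update function
b = np.zeros(3)

V_next = message_passing_layer(V, A, W, b)

# Relabelling the nodes only reorders the rows of the output.
perm = np.array([2, 0, 3, 1])
V_next_perm = message_passing_layer(V[perm], A[perm][:, perm], W, b)
print(np.allclose(V_next_perm, V_next[perm]))
```

The final check illustrates the symmetry requirement from the GNN definition: the layer does not depend on how the nodes happen to be numbered.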

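! Message passing is akin to convolution: both aggregate information from neighbouring elements to update an element's value 

! Stacking message-passing layers lets a node aggregate information from further away: after $k$ layers, a node has received information from nodes up to $k$ steps away, and eventually from the entire network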

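$$
\begin{aligned}
U_n &\rightarrow f_{U_n} \rightarrow U_{n+1} \\
V_n &\rightarrow \rho_{V_n \to V_n} \rightarrow f_{V_n} \rightarrow V_{n+1} \\
E_n &\rightarrow f_{E_n} \rightarrow E_{n+1}
\end{aligned}
$$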

where $\rho$ is a pooling function
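To see how stacking layers widens the neighbourhood a node can "see", here is a small sketch on a hypothetical 5-node path graph. It uses sum pooling and, to keep the effect visible, an identity update instead of a learned $f$ (an assumption of this sketch).

```python
import numpy as np

n = 5
# Path graph 0 - 1 - 2 - 3 - 4.
A = np.eye(n, k=1) + np.eye(n, k=-1)

# Put a signal on node 0 only.
h = np.zeros((n, 1))
h[0] = 1.0

reach = []
for layer in range(1, n):
    h = h + A @ h   # own embedding + sum of neighbour embeddings
    # Furthest node that has received any of node 0's signal so far.
    reach.append(int(np.flatnonzero(h[:, 0]).max()))

print(reach)
```

Each extra layer pushes the signal exactly one hop further along the path, so reaching the last node of this graph takes four layers.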


