## Message Passing Networks

&emsp;&emsp;Generalizing the convolution operator of convolutional neural networks to graphs relies on a neighborhood aggregation mechanism, also known as message passing. The core idea of aggregating neighbors is to generate node embeddings based on local network neighborhoods, as shown in the figure below:

<img src="../../images/12-graph_message_pass.png" width="60%">


&emsp;&emsp;For example, the embedding of node A in the figure is determined by its neighbors $\{B, C, D\}$, which in turn are influenced by their own neighbors.

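&emsp;&emsp;The "black boxes" in the figure are the operations that combine the information coming from a node's neighbors. They have one important property: the operation must be order invariant (permutation invariant), that is, its result must not depend on the order in which the neighbors are processed. Sum, mean, and max all satisfy this, and the transformation applied around such an aggregation can be learned by a neural network.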
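&emsp;&emsp;The order-invariance requirement can be checked by hand. The sketch below uses plain Python lists and made-up neighbor features (the values and the helper names `agg_sum`, `agg_mean`, `agg_max`, and `concat` are illustrative assumptions, not part of any library). It shows that sum, mean, and max give the same result for a reordered neighbor list, while concatenation does not:

```python
# Hypothetical 2-dimensional features of three neighbors (values are made up).
neighbor_feats = [[1.0, 4.0], [3.0, -2.0], [5.0, 0.5]]
shuffled = neighbor_feats[::-1]  # same neighbors, visited in a different order


def agg_sum(feats):
    # Element-wise sum over all neighbors.
    return [sum(col) for col in zip(*feats)]


def agg_mean(feats):
    # Element-wise mean over all neighbors.
    return [sum(col) / len(feats) for col in zip(*feats)]


def agg_max(feats):
    # Element-wise maximum over all neighbors.
    return [max(col) for col in zip(*feats)]


def concat(feats):
    # NOT permutation invariant: the result depends on the neighbor order.
    return [value for feat in feats for value in feat]


print(agg_sum(neighbor_feats), agg_sum(shuffled))    # [9.0, 2.5] [9.0, 2.5]
print(agg_max(neighbor_feats), agg_max(shuffled))    # [5.0, 4.0] [5.0, 4.0]
print(concat(neighbor_feats) == concat(shuffled))    # False
```

&emsp;&emsp;This is why concatenating neighbor features is not a suitable aggregation: a graph does not define an ordering of a node's neighbors.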

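&emsp;&emsp;The Layer-0 embedding of a node $\mu$ is its feature vector $x_{\mu}$. For example, the Layer-1 embedding of node B is computed by a neural network from the Layer-0 feature vector $x_{A}$ of node A and the feature vector $x_{C}$ of node C: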

<img src="../../images/12-graph_message_pass_1.png" width="60%">


&emsp;&emsp;In other words, the feature vector $\mathbf{x}_{i}^{(k)}$ of node $i$ at layer $k$ is:

$$
\mathbf{x}_{i}^{(k)}=\gamma^{(k)}\left(\mathbf{x}_{i}^{(k-1)}, \square_{j \in \mathcal{N}(i)} \phi^{(k)}\left(\mathbf{x}_{i}^{(k-1)}, \mathbf{x}_{j}^{(k-1)}, \mathbf{e}_{j, i}\right)\right)
$$

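&emsp;&emsp;where $\mathbf{x}_{i}^{(k-1)} \in \mathbb{R}^{D}$ is the feature vector of node $i$ at layer $k-1$; $\mathbf{e}_{j, i} \in \mathbb{R}^{D}$ is the (optional) feature vector of the edge from node $j$ to node $i$; $\square$ is a differentiable, permutation-invariant aggregation function such as sum, mean, or max; $\gamma$ is the `update` function and $\phi$ is the `message` function, both of which are differentiable functions such as multi-layer perceptrons (`MLP`).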
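&emsp;&emsp;To make the roles of $\phi$, $\square$, and $\gamma$ concrete, here is a minimal plain-Python sketch of one layer. It is an illustration under stated assumptions, not how any library implements it: edge features $\mathbf{e}_{j,i}$ are omitted, every node is assumed to have at least one incoming edge, and the particular choices of `message`, `aggregate`, and `update` (as well as the function name `message_passing_layer`) are made up for the example:

```python
def message_passing_layer(x, edges, message, aggregate, update):
    """Apply one message passing layer.

    x:     list of node feature vectors (lists of floats)
    edges: list of (j, i) pairs, one per directed edge j -> i
    """
    inbox = [[] for _ in x]
    for j, i in edges:
        # phi: build the message sent from node j to node i
        inbox[i].append(message(x[i], x[j]))
    # square: aggregate the inbox of every node, then gamma: update the node
    return [update(x[i], aggregate(inbox[i])) for i in range(len(x))]


def message(x_i, x_j):
    # Illustrative phi: forward the sender's features unchanged.
    return x_j


def aggregate(messages):
    # Illustrative square: element-wise sum (permutation invariant).
    return [sum(col) for col in zip(*messages)]


def update(x_i, aggregated):
    # Illustrative gamma: add the aggregated message to the node's own features.
    return [a + b for a, b in zip(x_i, aggregated)]


# Path graph 0 - 1 - 2, each undirected edge stored in both directions.
x = [[1.0], [2.0], [4.0]]
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
out = message_passing_layer(x, edges, message, aggregate, update)
print(out)  # [[3.0], [7.0], [6.0]]
```

&emsp;&emsp;Node 1, for instance, receives the messages `[1.0]` and `[4.0]` from nodes 0 and 2, sums them to `[5.0]`, and adds its own feature `[2.0]`.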

## The MessagePassing Base Class



&emsp;&emsp;`PyTorch Geometric` provides the base class `MessagePassing`, which implements the scheme above and takes care of propagating and aggregating messages:

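1. `MessagePassing(aggr="add", flow="source_to_target", node_dim=-2)`: 

    - aggr: the permutation-invariant aggregation function to use. The default is add; it can be set to add, mean, max, or None.
    - flow: the direction of message passing. The default is source_to_target; it can also be set to target_to_source.
    - node_dim: the axis along which to propagate.

2. `MessagePassing.propagate(edge_index, size=None, **kwargs)`: 

    - The initial call that starts propagating messages. It takes the edge indices and any additional data needed to construct messages and to update node embeddings.

3. `MessagePassing.message(...)`:

    - Defines how a message is constructed for each node pair $(x_{i}, x_{j})$, that is, for each edge.

4. `MessagePassing.update(aggr_out, ...)`:

    - Updates the embedding of each node using the aggregated messages.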
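&emsp;&emsp;Arguments of `message` whose names end in `_i` or `_j` (such as `x_i` and `x_j` in the layers implemented later in this section) refer to per-edge copies of the node features. The plain-Python sketch below illustrates that indexing convention for the two `flow` settings. The helper `lift` is hypothetical and only mimics the idea; it is not a PyG function:

```python
# Path graph 0 - 1 - 2, stored as two rows like a PyG edge_index.
edge_index = [[0, 1, 1, 2],
              [1, 0, 2, 1]]
x = [[-1.0], [0.0], [1.0]]  # one feature vector per node


def lift(x, edge_index, flow="source_to_target"):
    """Return per-edge feature lists (x_i, x_j).

    i is the node that aggregates the messages, j the node that sends them.
    With flow="source_to_target", j is read from row 0 and i from row 1;
    with flow="target_to_source", the two rows swap roles.
    """
    j_row, i_row = (0, 1) if flow == "source_to_target" else (1, 0)
    x_j = [x[n] for n in edge_index[j_row]]
    x_i = [x[n] for n in edge_index[i_row]]
    return x_i, x_j


x_i, x_j = lift(x, edge_index)
print(x_j)  # [[-1.0], [0.0], [0.0], [1.0]]
print(x_i)  # [[0.0], [-1.0], [1.0], [0.0]]
```

&emsp;&emsp;Both lists have one entry per edge, so a `message` function can combine them element by element without looping over edges itself.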

## Implementing the GCN Layer

&emsp;&emsp;Mathematically, the `GCN` layer is defined as:

$$
\mathbf{x}_{i}^{(k)}=\sum_{j \in \mathcal{N}(i) \cup\{i\}} \frac{1}{\sqrt{\operatorname{deg}(i)} \cdot \sqrt{\operatorname{deg}(j)}} \cdot\left(\boldsymbol{\Theta} \cdot \mathbf{x}_{j}^{(k-1)}\right)
$$

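&emsp;&emsp;Neighboring node features are first transformed by the weight matrix $\boldsymbol{\Theta}$, then normalized by the degrees of the two nodes $i$ and $j$, and finally summed up to give the `embedding` vector of node $i$ at this layer.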
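&emsp;&emsp;For reference, the node-wise rule above is often written in matrix form. The following is a sketch under two assumptions that the text does not state: node features are stacked as the rows of $\mathbf{X}$, and $\operatorname{deg}(i)$ counts the self-loop of node $i$ (as the code in this section does). The symbols $\hat{\mathbf{A}}$ and $\hat{\mathbf{D}}$ are introduced here only for illustration:

$$
\mathbf{X}^{(k)}=\hat{\mathbf{D}}^{-1 / 2} \hat{\mathbf{A}} \hat{\mathbf{D}}^{-1 / 2} \mathbf{X}^{(k-1)} \boldsymbol{\Theta}^{\top}, \quad \hat{\mathbf{A}}=\mathbf{A}+\mathbf{I}, \quad \hat{D}_{i i}=\sum_{j} \hat{A}_{i j}
$$

&emsp;&emsp;For a path graph with three nodes $0 - 1 - 2$, the degrees including self-loops are $2, 3, 2$, so entry $(i, j)$ of the normalized matrix is $\hat{A}_{ij} / \sqrt{\operatorname{deg}(i) \operatorname{deg}(j)}$:

$$
\hat{\mathbf{D}}^{-1 / 2} \hat{\mathbf{A}} \hat{\mathbf{D}}^{-1 / 2}=\left(\begin{array}{ccc}
1 / 2 & 1 / \sqrt{6} & 0 \\
1 / \sqrt{6} & 1 / 3 & 1 / \sqrt{6} \\
0 & 1 / \sqrt{6} & 1 / 2
\end{array}\right)
$$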

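&emsp;&emsp;This computation can be divided into five steps:

1. **Add self-loops to the adjacency matrix**: done with `torch_geometric.utils.add_self_loops`. This step preprocesses the adjacency matrix by adding a self-loop to every node.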

<img src="../../images/12-graph_example1.png" width="40%">


In [9]:
import torch
from torch_geometric.utils import add_self_loops

# The graph is undirected, so each edge is stored in both directions:
# (0 -> 1), (1 -> 0), (1 -> 2), (2 -> 1)
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)
# Node features: one scalar feature per node
x = torch.tensor([[-1], [0], [1]], dtype=torch.float)

print("x.size: ")
print(x.size())

print("original edge_index")
print(edge_index)

# Append one self-loop (i -> i) per node
edge_index, _ = add_self_loops(edge_index, num_nodes=x.size(0))

print("new edge_index")
print(edge_index)

x.size: 
torch.Size([3, 1])
original edge_index
tensor([[0, 1, 1, 2],
        [1, 0, 2, 1]])
new edge_index
tensor([[0, 1, 1, 2, 0, 1, 2],
        [1, 0, 2, 1, 0, 1, 2]])


2. **Linearly transform node feature matrix**: the node feature matrix is transformed linearly, using a `torch.nn.Linear` layer.

3. **Compute normalization coefficients**: the normalization coefficients are computed from the node degrees, which can be obtained with `torch_geometric.utils.degree`.

In [10]:
import torch
from torch_geometric.utils import add_self_loops, degree

x = torch.tensor([[-1], [0], [1]], dtype=torch.float)
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)

print("original edge_index ")
print(edge_index)

edge_index, _ = add_self_loops(edge_index, num_nodes=x.size(0))
print("new edge_index")
print(edge_index)

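# row holds the source node and col the target node of every edge
row, col = edge_index
print("row, col is :")
print(row, col)

# Degree of every node, counted over target indices (self-loops included)
deg = degree(col, x.size(0), dtype=x.dtype)
print("deg is :")
print(deg)

# deg^(-1/2) for every node
deg_inv_sqrt = deg.pow(-0.5)

print("deg_inv_sqrt is : ")
print(deg_inv_sqrt)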

print("deg_inv_sqrt row is : ")
print(deg_inv_sqrt[row])

print("deg_inv_sqrt col is :")
print(deg_inv_sqrt[col])

original edge_index 
tensor([[0, 1, 1, 2],
        [1, 0, 2, 1]])
new edge_index
tensor([[0, 1, 1, 2, 0, 1, 2],
        [1, 0, 2, 1, 0, 1, 2]])
row, col is :
tensor([0, 1, 1, 2, 0, 1, 2]) tensor([1, 0, 2, 1, 0, 1, 2])
deg is :
tensor([2., 3., 2.])
deg_inv_sqrt is : 
tensor([0.7071, 0.5774, 0.7071])
deg_inv_sqrt row is : 
tensor([0.7071, 0.5774, 0.5774, 0.7071, 0.7071, 0.5774, 0.7071])
deg_inv_sqrt col is :
tensor([0.5774, 0.7071, 0.7071, 0.5774, 0.7071, 0.5774, 0.7071])


4. **Normalize node features** in $\phi$.

5. **Sum up neighboring node features** (`"add"` aggregation).

&emsp;&emsp;The first three steps are preparatory operations carried out before `message passing`; steps 4 and 5 are handled by the methods of the `MessagePassing` class.
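&emsp;&emsp;Before turning to the PyG implementation, the five steps can be verified by hand. The sketch below uses only the Python standard library on the three-node graph from the examples above, with one scalar feature per node. It assumes that $\boldsymbol{\Theta}$ is the identity so that step 2 can be skipped; that simplification is made only for this check:

```python
import math

edges = [(0, 1), (1, 0), (1, 2), (2, 1)]  # (source j, target i)
x = [-1.0, 0.0, 1.0]                      # one scalar feature per node
num_nodes = len(x)

# Step 1: add one self-loop per node.
edges = edges + [(n, n) for n in range(num_nodes)]

# Step 2 is skipped: Theta is assumed to be the identity for this check.

# Step 3: degree of every target node, then one coefficient per edge.
deg = [0] * num_nodes
for _, i in edges:
    deg[i] += 1
norm = [1.0 / math.sqrt(deg[j] * deg[i]) for j, i in edges]

# Steps 4-5: scale each message x_j by its coefficient and sum at target i.
out = [0.0] * num_nodes
for (j, i), w in zip(edges, norm):
    out[i] += w * x[j]

print(deg)  # [2, 3, 2]
print(out)  # [-0.5, 0.0, 0.5]
```

&emsp;&emsp;The degrees match the `deg` tensor printed above. Node 1 ends up at zero because the contributions of its two neighbors, $-1/\sqrt{6}$ and $+1/\sqrt{6}$, cancel, and its own feature is zero.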

&emsp;&emsp;The complete code is shown below:


In [11]:
import torch
from torch_geometric.nn import MessagePassing
from torch_geometric.utils import add_self_loops, degree

class GCNConv(MessagePassing):
    def __init__(self, in_channels, out_channels):
        super().__init__(aggr='add')
        self.lin = torch.nn.Linear(in_channels, out_channels)
        
    def forward(self, x, edge_index):
        # x has shape [N, in_channels]
        # edge_index has shape [2, E]
        
        # step 1: Add self-loops to the adjacency matrix.
        edge_index, _ = add_self_loops(edge_index, num_nodes=x.size(0))
        print("edge_index:")
        print(edge_index)
        
        # step 2: Linearly transform node feature matrix
        print('x_pre\n', x)
        x = self.lin(x)
        print('x_aft\n', x)
        
        # step3: Compute normalization
        row, col = edge_index
        print("row, col \n", row, col)
        
        deg = degree(col, x.size(0), dtype=x.dtype)
        print('deg\n', deg)
        
        deg_inv_sqrt = deg.pow(-0.5)
        print('deg_inv_sqrt\n', deg_inv_sqrt)
        
        deg_inv_sqrt[deg_inv_sqrt == float('inf')] = 0
        print('deg_inv_sqrt[row]', deg_inv_sqrt[row])
        
        norm = deg_inv_sqrt[row] * deg_inv_sqrt[col]
        
        # Steps 4-5: Start propagating messages.
        return self.propagate(edge_index, x=x, norm=norm)
    
    def message(self, x_j, norm):
        # x_j has shape [E, out_channels]
        
        # Step 4: Normalize node features
        print('x_j\n', x_j)
        return norm.view(-1, 1) * x_j

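&emsp;&emsp;The `GCNConv` model built here inherits from the base class `MessagePassing` and uses summation as the $\square$ function, which is selected through `super().__init__(aggr='add')`. After steps `1-3` are done, it calls the `propagate()` method of `MessagePassing`, which carries out steps `4-5` and propagates the messages.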

&emsp;&emsp;The `message` function normalizes the information coming from each node's neighbors.


In [12]:
x = torch.rand(3, 2)  # random features: 3 nodes, 2 channels each
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)
conv = GCNConv(2, 4)
out_put = conv(x, edge_index)

edge_index:
tensor([[0, 1, 1, 2, 0, 1, 2],
        [1, 0, 2, 1, 0, 1, 2]])
x_pre
 tensor([[0.0962, 0.8522],
        [0.2487, 0.9627],
        [0.8714, 0.4037]])
x_aft
 tensor([[ 0.6273, -0.1826, -0.1128, -0.2466],
        [ 0.5738, -0.1906, -0.0918, -0.4002],
        [ 0.3286, -0.3220,  0.2442, -0.4737]], grad_fn=<AddmmBackward>)
row, col 
 tensor([0, 1, 1, 2, 0, 1, 2]) tensor([1, 0, 2, 1, 0, 1, 2])
deg
 tensor([2., 3., 2.])
deg_inv_sqrt
 tensor([0.7071, 0.5774, 0.7071])
deg_inv_sqrt[row] tensor([0.7071, 0.5774, 0.5774, 0.7071, 0.7071, 0.5774, 0.7071])
x_j
 tensor([[ 0.6273, -0.1826, -0.1128, -0.2466],
        [ 0.5738, -0.1906, -0.0918, -0.4002],
        [ 0.5738, -0.1906, -0.0918, -0.4002],
        [ 0.3286, -0.3220,  0.2442, -0.4737],
        [ 0.6273, -0.1826, -0.1128, -0.2466],
        [ 0.5738, -0.1906, -0.0918, -0.4002],
        [ 0.3286, -0.3220,  0.2442, -0.4737]], grad_fn=<IndexSelectBackward>)



<img src="../../images/12-graph_example_message_pass_1.png" width="60%">

## Implementing Edge Convolution



&emsp;&emsp;The edge convolution layer is defined mathematically as:

$$
\mathbf{x}_{i}^{(k)}=\max _{j \in \mathcal{N}(i)} h_{\Theta}\left(\mathbf{x}_{i}^{(k-1)}, \mathbf{x}_{j}^{(k-1)}-\mathbf{x}_{i}^{(k-1)}\right)
$$

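&emsp;&emsp;where $h_{\Theta}$ is a multi-layer perceptron. Like the `GCN` layer, the edge convolution layer inherits from the base class `MessagePassing`; the difference is that it uses `max` as the $\square$ function.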
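&emsp;&emsp;The formula can be traced on a tiny example in plain Python. In the sketch below, the function name `edge_conv` and the feature values are made up, every node is assumed to have at least one incoming edge, and `h` is an identity stand-in for the multi-layer perceptron $h_{\Theta}$ so that the intermediate values stay readable:

```python
def edge_conv(x, edges, h):
    """One edge convolution step with element-wise max aggregation.

    x:     list of node feature vectors
    edges: list of (j, i) pairs, one per directed edge j -> i
    h:     stand-in for the MLP, applied to the concatenation [x_i, x_j - x_i]
    """
    out = {}
    for j, i in edges:
        diff = [a - b for a, b in zip(x[j], x[i])]   # x_j - x_i
        msg = h(x[i] + diff)                         # list concatenation
        if i in out:
            out[i] = [max(a, b) for a, b in zip(out[i], msg)]
        else:
            out[i] = msg
    return [out[i] for i in range(len(x))]


def identity(v):
    # Hypothetical stand-in for h_Theta; a real layer would use a trained MLP.
    return v


# Path graph 0 - 1 - 2 with 2-dimensional node features.
x = [[0.0, 0.0], [1.0, 2.0], [3.0, 1.0]]
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
out = edge_conv(x, edges, identity)
print(out[1])  # [1.0, 2.0, 2.0, -1.0]
```

&emsp;&emsp;Node 1 receives `[1.0, 2.0, -1.0, -2.0]` from node 0 and `[1.0, 2.0, 2.0, -1.0]` from node 2; the element-wise maximum keeps the larger entry in each position.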

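&emsp;&emsp;The edge convolution layer comes from the paper [Dynamic Graph CNN for Learning on Point Clouds](https://arxiv.org/abs/1801.07829), which proposes the edge convolution (`EdgeConv`) operation to model the relationships between the points of a point cloud, so that the network can better learn local and global features.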

In [13]:
import torch
from torch.nn import Sequential as Seq, Linear, ReLU
from torch_geometric.nn import MessagePassing

class EdgeConv(MessagePassing):
    def __init__(self, in_channels, out_channels):
        super().__init__(aggr='max') #  "Max" aggregation.
        self.mlp = Seq(Linear(2 * in_channels, out_channels),
                       ReLU(),
                       Linear(out_channels, out_channels))

    def forward(self, x, edge_index):
        # x has shape [N, in_channels]
        # edge_index has shape [2, E]

        return self.propagate(edge_index, x=x)

    def message(self, x_i, x_j):
        # x_i has shape [E, in_channels]
        # x_j has shape [E, in_channels]

        tmp = torch.cat([x_i, x_j - x_i], dim=1)  # tmp has shape [E, 2 * in_channels]
        return self.mlp(tmp)

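&emsp;&emsp;Edge convolution is in fact a dynamic convolution: it recomputes the graph for each layer using nearest neighbors in the feature space. Fortunately, `PyG` comes with a `GPU`-accelerated, batch-wise `k-NN` graph generation method named `torch_geometric.nn.pool.knn_graph()`: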


In [14]:
from torch_geometric.nn import knn_graph

class DynamicEdgeConv(EdgeConv):
    def __init__(self, in_channels, out_channels, k=6):
        super().__init__(in_channels, out_channels)
        self.k = k

    def forward(self, x, batch=None):
        edge_index = knn_graph(x, self.k, batch, loop=False, flow=self.flow)
        return super().forward(x, edge_index)

&emsp;&emsp;Here, `knn_graph()` computes a nearest-neighbor graph, which is then passed on to the `forward()` method of `EdgeConv`.
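&emsp;&emsp;What a k-NN graph looks like can be shown with a brute-force version in plain Python. The function `knn_graph_bruteforce` below is a hypothetical illustration, not the PyG routine: it ignores batching, runs in quadratic time, and breaks distance ties by node index, which the library may handle differently. It is intended to mirror the `loop=False` and `flow="source_to_target"` settings used above, so each node is the target of edges coming from its `k` nearest neighbors:

```python
def knn_graph_bruteforce(points, k):
    """Return [sources, targets] with one edge j -> i per nearest neighbor j of i."""
    sources, targets = [], []
    for i, p in enumerate(points):
        # Squared Euclidean distance to every other point (no self-loops).
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        )
        for _, j in dists[:k]:
            sources.append(j)   # the neighbor sends the message
            targets.append(i)   # the center node receives it
    return [sources, targets]


# Four points on a line; the values are made up for the example.
points = [[0.0], [1.0], [3.0], [7.0]]
edge_index = knn_graph_bruteforce(points, k=1)
print(edge_index)  # [[1, 0, 1, 2], [0, 1, 2, 3]]
```

&emsp;&emsp;The result is not symmetric: node 3 receives a message from node 2, but node 2 listens to node 1, which is closer. Because the features change from layer to layer, recomputing this graph on the current features is what makes the convolution dynamic.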

## References

https://blog.csdn.net/Jenny_oxaza/article/details/107561125