Spatial Convolutional Graph Neural Networks

Spatial-based ConvGNN

1 Convolutional networks on graphs for learning molecular fingerprints

[Duvenaud D, 2015, 1] is one of the earliest papers to use graph representations of molecules in chemistry. Starting from the familiar circular fingerprint as an analogy, it designs a structurally similar neural graph fingerprint that can be trained end-to-end.
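As a concrete picture of the idea, here is a minimal numpy sketch of a neural fingerprint: the hash of a circular fingerprint is replaced by a smooth learned function and the discrete index write by a softmax. All shapes, weights, and the helper names below are illustrative, not the paper's code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def neural_fingerprint(A, X, Hs, Ws, fp_len):
    """A: (n, n) adjacency, X: (n, d) atom features,
    Hs/Ws: per-layer weights, fp_len: fingerprint length."""
    h = X.copy()
    fp = np.zeros(fp_len)
    for H, W in zip(Hs, Ws):            # one iteration per radius/layer
        r = h + A @ h                   # aggregate each atom with its neighbors
        h = sigmoid(r @ H)              # smooth stand-in for the hash function
        for a in range(h.shape[0]):     # soft "write" replaces discrete indexing
            fp += softmax(h[a] @ W)
    return fp

n, d, fp_len = 6, 4, 16
A = (np.random.rand(n, n) < 0.3).astype(float)
A = np.triu(A, 1); A = A + A.T          # random undirected molecule graph
X = np.random.randn(n, d)
Hs = [np.random.randn(d, d) for _ in range(2)]
Ws = [np.random.randn(d, fp_len) for _ in range(2)]
print(neural_fingerprint(A, X, Hs, Ws, fp_len).shape)  # (16,)
```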


2 Column Networks for Collective Classification

[Pham T, 2017, 2] takes the information on edges into account:

  • contextual features

For vertex $i$ and the $r$-th edge (relation) type:
$$ \vec{c}_{i,r}^{(t)} = \frac{1}{|\mathcal{N}_r(i)|} \sum_{j \in \mathcal{N}_r(i)} \vec{h}_j^{(t-1)}. \tag{2.1} $$

  • Convolution

$$ \vec{h}_i^{(t)} = g \left( \vec{b}^{(t)} + W^{(t)} \vec{h}_i^{(t-1)} + \frac{1}{z} \sum_{r=1}^{R} V_r^{(t)} \vec{c}_{i,r}^{(t)} \right). \tag{2.2} $$
where $z$ is a hyperparameter. At the final step $T$:
$$ P(y_i = l) = \text{softmax} \left( \vec{b}_l + W_l \vec{h}_i^{(T)} \right). \tag{2.3} $$


In addition, the mean in Eq. (2.1) can be replaced by a weighted sum:
$$ \vec{c}_{i,r}^{(t)} = \sum_{j \in \mathcal{N}_r(i)} \alpha_j \vec{h}_j^{(t-1)}, \qquad \sum_{j} \alpha_j = 1. \tag{2.4} $$

The paper also adds a gated (highway-style) skip connection:
$$ \begin{aligned} \vec{h}_i^{(t)} &= \vec{\alpha} \odot \tilde{\vec{h}}_i^{(t)} + (\vec{1} - \vec{\alpha}) \odot \vec{h}_i^{(t-1)}, \\ \vec{\alpha} &= \sigma \left( \vec{b}_{\alpha}^{(t)} + W_{\alpha}^{(t)} \vec{h}_i^{(t-1)} + \frac{1}{z} \sum_{r=1}^{R} V_{\alpha r}^{(t)} \vec{c}_{i,r}^{(t)} \right). \end{aligned} \tag{2.5} $$
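A minimal numpy sketch of one Column Network step, Eqs. (2.1)-(2.2), on a toy two-relation graph with random weights (the function name and data are ours, not the paper's):

```python
import numpy as np

def cln_step(h, neighbors, W, Vs, b, z):
    """h: (n, d) states; neighbors[r][i]: list of i's neighbors under relation r."""
    n, d = h.shape
    h_new = np.empty_like(h)
    for i in range(n):
        ctx = np.zeros(d)
        for r, V in enumerate(Vs):                 # relation types r = 1..R
            nbrs = neighbors[r][i]
            if nbrs:                               # Eq. (2.1): mean over N_r(i)
                c_ir = h[nbrs].mean(axis=0)
                ctx += V @ c_ir
        h_new[i] = np.tanh(b + W @ h[i] + ctx / z)  # Eq. (2.2), g = tanh
    return h_new

n, d, z = 4, 3, 2.0
h = np.random.randn(n, d)
neighbors = [{0: [1], 1: [0, 2], 2: [1], 3: []},    # relation 1
             {0: [3], 1: [], 2: [3], 3: [0, 2]}]    # relation 2
W, b = np.random.randn(d, d), np.random.randn(d)
Vs = [np.random.randn(d, d) for _ in range(2)]
print(cln_step(h, neighbors, W, Vs, b, z).shape)  # (4, 3)
```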

3 Learning Convolutional Neural Networks for Graphs

PATCHY-SAN, proposed by [Niepert M, 2016, 3], applies a conventional CNN to graphs. To make the CNN applicable, the graph structure is normalized by fixing both the number of vertices and the number of neighbors per vertex:

  • 1. Sort the vertices by a labeling and select a fixed number $w$ of them;
  • 2. For each of the $w$ vertices, collect at least $k$ neighbors with BFS;
  • 3. Using the best labeling, select exactly $k$ of those neighbors to form the normalized neighborhood graph.

After normalization we obtain two tensors of shapes $(w,k,a_v)$ and $(w,k,k,a_e)$, where $a_v$ and $a_e$ are the lengths of the vertex and edge feature vectors. These are reshaped into $(wk, a_v)$ and $(wk^2, a_e)$; treating $a_v$ and $a_e$ as channels, a 1-D convolution is applied along the first dimension of each.

The best labeling
The best labeling is the one for which, in expectation over pairs of graphs, the distance between the adjacency matrices it induces deviates least from the graph distance:
$$ \hat{l} = \arg \min_{l} \mathbb{E}_{\mathcal{G}} \left[ \left| d_A \left( A^l(G), A^l(G')\right) - d_G \left( G, G'\right) \right| \right]. $$
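Since a 1-D convolution with kernel size $k$ and stride $k$ over the length-$wk$ sequence touches each receptive field exactly once, the vertex convolution can be sketched as a reshape followed by a matmul; the shapes below are illustrative:

```python
import numpy as np

w, k, a_v, out_ch = 16, 5, 8, 32
nodes = np.random.randn(w * k, a_v)          # normalized receptive fields
patches = nodes.reshape(w, k * a_v)          # one row per receptive field
W_conv = np.random.randn(k * a_v, out_ch)    # kernel of width k, a_v channels
feature_maps = patches @ W_conv              # (w, out_ch): one value per field
print(feature_maps.shape)
```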


4 Diffusion-Convolutional Neural Networks

[Atwood J, 2016, 4] computes a degree-normalized transition matrix $P_t$ from the adjacency matrix $A_t$; its entries give the probability of jumping from vertex $i$ to vertex $j$.

$P_t^{*} \in \mathbb{R}^{N_t \times H \times N_t}$ stacks the power series of $P_t$. For node, graph, and edge classification the output is $Z \in \mathbb{R}^{N_t \times H \times F}$, $\mathbb{R}^{H \times F}$, and $\mathbb{R}^{M_t \times H \times F}$, respectively, with input features $X_t \in \mathbb{R}^{N_t \times F}$.

  • Node Classification

$$ \begin{aligned} Z_{t,ijk} &= f \left( W_{jk}^{c} \sum_{l=1}^{N_t} P_{t,ijl}^{*} X_{t,lk} \right), \\ \text{i.e. } Z_t &= f \left( W^{c} \odot P_{t}^{*} X_{t} \right). \end{aligned} \tag{4.1} $$
where $W^c \in \mathbb{R}^{H \times F}$.

Output:
$$ \begin{aligned} \hat{Y} &= \arg \max \left( f(W^D \odot Z)\right), \\ \mathbb{P}(Y | X) &= \text{softmax} \left( f(W^D \odot Z)\right). \end{aligned} \tag{4.2} $$

  • Graph Classification

$$ Z_t = f \left( W^c \odot \frac{\vec{1}_{N_t}^T P_{t}^{*} X_{t}}{N_t} \right). \tag{4.3} $$

  • Edge Classification and Edge Features


$$ A_t' = \begin{pmatrix} A_t & B_t^T\\ B_t & 0 \end{pmatrix} \tag{4.4} $$
Compute $P_t'$ from $A_t'$ and use it in place of $P_t$ in the convolution to classify vertices or edges.
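A numpy sketch of the node-classification case, Eq. (4.1), assuming $P_t^{*}$ stacks the first $H$ powers of $P_t$ and taking $f = \tanh$ (toy sizes, random weights):

```python
import numpy as np

def dcnn_node(A, X, Wc):
    """A: (N, N) adjacency, X: (N, F) features, Wc: (H, F) weights."""
    P = A / A.sum(axis=1, keepdims=True)      # degree-normalized transitions
    H = Wc.shape[0]
    Pstar = np.stack([np.linalg.matrix_power(P, h + 1) for h in range(H)])
    PX = np.einsum('hij,jf->ihf', Pstar, X)   # (N, H, F) diffused features
    return np.tanh(Wc[None] * PX)             # Z = f(W^c ⊙ P* X)

N, F, H = 5, 3, 2
A = np.ones((N, N)) - np.eye(N)               # toy complete graph
Z = dcnn_node(A, np.random.randn(N, F), np.random.randn(H, F))
print(Z.shape)  # (5, 2, 3)
```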


5 Quantum-Chemical Insights from Deep Tensor Neural Networks

[Schutt K T, 2017, 5] aggregates both vertex and edge information. A distance matrix $D=(\hat{\vec{d}}_{ij})_{|V| \times |V|}$ over the vertices $V$ is built with a Gaussian kernel expansion. Let $\vec{x}_i$ denote the feature vector of each vertex $v_i \in V$, and $\vec{y}_{ij}$ the feature vector on each edge $e_{ij} \in E$, representing the influence of vertex $v_j$ on $v_i$.

  • Vertex feature update for $\vec{x}_i$:

$$ \vec{x}_i^{(t+1)} = \vec{x}_i^{(t)} + \sum_{j \neq i} \vec{y}_{ij}. \tag{5.1} $$

  • Edge feature update for $\vec{y}_{ij}$:

$$ \vec{y}_{ij} = \tanh \left[ W^{fc} \left( \left( W^{cf} \vec{x}_j + \vec{v}^{f_1} \right) \odot \left( W^{df} \hat{\vec{d}}_{ij} + \vec{v}^{f_2} \right) \right) \right]. \tag{5.2} $$
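A numpy sketch of one DTNN interaction pass, Eqs. (5.1)-(5.2), with toy shapes and random weights (the helper name is ours):

```python
import numpy as np

def dtnn_step(X, D, Wfc, Wcf, Wdf, v1, v2):
    """X: (n, d) atom features, D: (n, n, k) Gaussian-expanded distances."""
    n, d = X.shape
    X_new = X.copy()
    for i in range(n):
        for j in range(n):
            if j == i:
                continue                          # sum over j != i
            y_ij = np.tanh(Wfc @ ((Wcf @ X[j] + v1) * (Wdf @ D[i, j] + v2)))
            X_new[i] += y_ij                      # Eq. (5.1)
    return X_new

n, d, k, m = 4, 5, 6, 5
X, D = np.random.randn(n, d), np.random.rand(n, n, k)
out = dtnn_step(X, D, np.random.randn(d, m), np.random.randn(m, d),
                np.random.randn(m, k), np.random.randn(m), np.random.randn(m))
print(out.shape)  # (4, 5)
```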


6 Interaction Networks for Learning about Objects, Relations and Physics

[Battaglia P W, 2016, 6] applies graph networks to the natural sciences. The IN module is:
$$ \begin{aligned} IN(G) &= \phi_{O} \left( a \left( G, X, \phi_{R} \left( m(G) \right) \right) \right);\\ m(G) &= B = \{b_k\}_{k = 1, \cdots, N_R}, \\ \phi_{R}(B) &= E = \{e_k\}_{k = 1, \cdots, N_R}, \\ a(G,X,E) &= C = \{c_j\}_{j = 1, \cdots, N_O}, \\ \phi_{O}(C) &= P = \{p_j\}_{j = 1, \cdots, N_O}. \end{aligned} \tag{6.1} $$

$N_O$ is the number of vertices (objects), $N_R$ the number of edges (relations), and $D_S, D_R$ the lengths of the vertex and edge feature vectors. Then $O \in \mathbb{R}^{D_S \times N_O}$ is the matrix whose columns are the vertex feature vectors, and $R_a \in \mathbb{R}^{D_R \times N_R}$ the matrix whose columns are the edge feature vectors. With the vertices indexed, the 0-1 receiver matrix $R_r \in \mathbb{R}^{N_O \times N_R}$ and sender matrix $R_s \in \mathbb{R}^{N_O \times N_R}$ encode which vertex receives and sends each relation.

With $\phi_R, \phi_O$ implemented as multi-layer fully connected networks:
$$ \begin{aligned} B &= m(G) = [OR_r; OR_s; R_a] && \in \mathbb{R}^{(2D_S + D_R) \times N_R}, \\ E &= \phi_R(B) && \in \mathbb{R}^{D_E \times N_R}, \\ \bar{E} &= E R_r^T && \in \mathbb{R}^{D_E \times N_O}, \\ C &= a(G,X,E) = [O; X; \bar{E}] && \in \mathbb{R}^{(D_S + D_X + D_E) \times N_O}, \\ P &= \phi_O(C) && \in \mathbb{R}^{D_P \times N_O}. \end{aligned} \tag{6.2} $$
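A numpy sketch of one pass of Eq. (6.2), with the MLPs $\phi_R, \phi_O$ collapsed to single linear maps for brevity (an assumption; all shapes and weights are toy values):

```python
import numpy as np

N_O, N_R, D_S, D_R, D_X, D_E, D_P = 3, 2, 4, 2, 1, 5, 4
O  = np.random.randn(D_S, N_O)              # object (vertex) feature columns
Ra = np.random.randn(D_R, N_R)              # relation (edge) feature columns
X  = np.random.randn(D_X, N_O)              # external effects
Rr = np.zeros((N_O, N_R)); Rs = np.zeros((N_O, N_R))
Rr[0, 0] = Rr[1, 1] = 1                     # receivers of relations 0 and 1
Rs[1, 0] = Rs[2, 1] = 1                     # senders of relations 0 and 1

W_R = np.random.randn(D_E, 2 * D_S + D_R)   # stands in for phi_R
W_O = np.random.randn(D_P, D_S + D_X + D_E)  # stands in for phi_O

B    = np.vstack([O @ Rr, O @ Rs, Ra])      # marshalling m(G)
E    = np.tanh(W_R @ B)                     # per-relation effects
Ebar = E @ Rr.T                             # aggregate effects per receiver
C    = np.vstack([O, X, Ebar])              # a(G, X, E)
P    = W_O @ C                              # per-object predictions
print(P.shape)  # (4, 3)
```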

7 Geometric deep learning on graphs and manifolds using mixture model cnns

[F. Monti, 2017, 7] proposes a convolution that works on both manifolds and graphs. Each neighbor $y \in \mathcal{N}(x)$ of a vertex $x$ is associated with a pseudo-coordinate vector $\vec{u}(x,y)$. A weight function $\vec{w}_{\Theta}(\vec{u}) = (w_1(\vec{u}), \cdots, w_J(\vec{u}))$ with learnable parameters $\Theta$ defines the patch operator:
$$ D_j(x)f = \sum_{y \in \mathcal{N}(x)} w_j(\vec{u}(x,y)) f(y), \quad j = 1, \cdots, J. \tag{7.1} $$
where $J$ is the dimension of the extracted patch. The paper chooses $w_j(\vec{u})$ to be a Gaussian kernel with learnable $\Sigma_j, \vec{\mu}_j$:
$$ w_j(\vec{u}) = \exp \left( -\frac{1}{2} \left( \vec{u} - \vec{\mu}_j \right)^T \Sigma_j^{-1} \left( \vec{u} - \vec{\mu}_j \right) \right). \tag{7.2} $$
The convolution is then:
$$ \left( f \star g \right)(x) = \sum_{j=1}^{J} g_j D_j(x) f. \tag{7.3} $$
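A numpy sketch of Eqs. (7.1)-(7.3), restricting $\Sigma_j$ to diagonal covariances and using toy pseudo-coordinates (the data and names are illustrative):

```python
import numpy as np

def monet_conv(f, neighbors, U, mus, sig2s, g):
    """f: (n,) signal; U[i]: (deg_i, 2) pseudo-coordinates of i's neighbors;
    mus: (J, 2) means; sig2s: (J, 2) diagonal variances; g: (J,) filter."""
    n, J = len(f), len(g)
    out = np.zeros(n)
    for i in range(n):
        for j in range(J):
            w = np.exp(-0.5 * ((U[i] - mus[j]) ** 2 / sig2s[j]).sum(axis=1))
            Djf = (w * f[neighbors[i]]).sum()   # Eq. (7.1), patch dimension j
            out[i] += g[j] * Djf                # Eq. (7.3)
    return out

n, J = 4, 3
f = np.random.randn(n)
neighbors = [np.array([1, 2]), np.array([0]), np.array([0, 3]), np.array([2])]
U = [np.random.randn(len(nb), 2) for nb in neighbors]
out = monet_conv(f, neighbors, U, np.random.randn(J, 2), np.ones((J, 2)),
                 np.random.randn(J))
print(out.shape)  # (4,)
```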

8 Molecular Graph Convolutions: Moving Beyond Fingerprints

[Kearnes S, 2017, 8] states three properties a model on graphs should satisfy:

  • Property 1 (Order invariance). The output of the model should be invariant to the order that the atom and bond information is encoded in the input.

  • Property 2 (Atom and pair permutation invariance). The values of an atom layer and a pair layer permute with the original input layer order. More precisely, if the inputs are permuted with a permutation operator $Q$, then for all layers $x$, $y$, $A^x$ and $P^y$ are permuted with operator $Q$ as well.

  • Property 3 (Pair order invariance). For all pair layers $y$, $P^y_{(a,b)} = P^y_{(b,a)}$.


$A^x, P^y$ denote the values of the $x$-th atom layer and the $y$-th pair layer, respectively. The network is built from four basic operations:

  • (AAA \rightarrow A)

$$ A_a^y = f \left( A_a^{x_1}, A_a^{x_2}, \cdots, A_a^{x_n} \right). \tag{8.1} $$

  • (APA \rightarrow P)

$$ P_{a,b}^y = g \left( f \left( A_{a}^{x}, A_{b}^{x} \right), f \left( A_{b}^{x}, A_{a}^{x} \right) \right). \tag{8.2} $$

  • (PAP \rightarrow A)

$$ A_{a}^y = g \left( f \left(P_{(a,b)}^{x}\right), f \left(P_{(a,c)}^{x}\right), f \left(P_{(a,d)}^{x}\right), \cdots \right). \tag{8.3} $$

  • (PPP \rightarrow P)

$$ P_{a,b}^y = f \left( P_{a,b}^{x_1}, P_{a,b}^{x_2}, \cdots, P_{a,b}^{x_n} \right). \tag{8.4} $$

where $f(\cdot)$ is an arbitrary function and $g(\cdot)$ an arbitrary commutative (order-invariant) function.

9 Dynamic Edge-Conditioned Filters in Convolutional Neural Networks on Graphs

[Simonovsky M, 2017, 9] uses edge features to construct the vertex convolution weights. For an edge feature $L(j,i) \in \mathbb{R}^{s}$:
$$ \Theta_{ji}^{l} = F^l(L(j,i)) \quad \in \mathbb{R}^{d_l \times d_{l-1}}. \tag{9.1} $$

The convolution is:
$$ \begin{aligned} X^l(i) &= \frac{1}{|\mathcal{N}(i)|} \sum_{j \in \mathcal{N}(i)} F^l(L(j,i)) X^{l-1}(j) + b^l \\ &= \frac{1}{|\mathcal{N}(i)|} \sum_{j \in \mathcal{N}(i)} \Theta_{ji}^{l} X^{l-1}(j) + b^l. \end{aligned} \tag{9.2} $$
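A numpy sketch of Eqs. (9.1)-(9.2), with the filter-generating network $F^l$ reduced to a single linear map (the paper uses an MLP; all names and data here are toy values):

```python
import numpy as np

def ecc_layer(X, neighbors, labels, W_gen, b):
    """X: (n, d_in); labels[(j, i)]: (s,) edge feature; W_gen: (d_out*d_in, s)."""
    n, d_in = X.shape
    d_out = b.shape[0]
    out = np.zeros((n, d_out))
    for i in range(n):
        for j in neighbors[i]:
            Theta = (W_gen @ labels[(j, i)]).reshape(d_out, d_in)  # Eq. (9.1)
            out[i] += Theta @ X[j]
        out[i] = out[i] / max(len(neighbors[i]), 1) + b            # Eq. (9.2)
    return out

n, d_in, d_out, s = 4, 3, 2, 5
X = np.random.randn(n, d_in)
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
labels = {(j, i): np.random.rand(s) for i in neighbors for j in neighbors[i]}
out = ecc_layer(X, neighbors, labels, np.random.randn(d_out * d_in, s),
                np.zeros(d_out))
print(out.shape)  # (4, 2)
```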


The pooling method described in the paper:

  • 1. Downsample or merge vertices;
  • 2. Create the new edge structure $E^{(h)}$ and its labels $L^{(h)}$ (so-called graph reduction);
  • 3. Map the original vertices to the new ones: $M^{(h)} : V^{(h-1)} \rightarrow V^{(h)}$.

10 Neural Message Passing for Quantum Chemistry

[Gilmer J, 2017, 10] proposes the MPNN framework, which splits a GNN into two phases: a message passing phase and a readout phase.

  • message passing phase

The message passing phase itself has two parts: a message function $M_t(\cdot)$ that aggregates the features of a vertex's neighbors and incident edges, and a vertex update function $U_t(\cdot)$:
$$ \begin{aligned} m_v^{(t+1)} &= \sum_{w \in \mathcal{N}(v)} M_t \left( h_v^{(t)}, h_w^{(t)}, e_{vw} \right), \\ h_v^{(t+1)} &= U_t \left( h_v^{(t)}, m_v^{(t+1)} \right). \end{aligned} \tag{10.1} $$

  • readout phase

In the readout phase, a readout function $R(\cdot)$ is finally applied over all vertices of the graph:
$$ \hat{y} = R \left( \{ h_v^{(T)} \mid v \in \mathcal{V}_G \}\right). \tag{10.2} $$
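A numpy sketch of the two phases, Eqs. (10.1)-(10.2), with deliberately simple choices that the framework leaves free: $M_t$ a linear map on $(h_w, e_{vw})$, $U_t$ a linear update, and $R$ a sum (all names and weights are toy values):

```python
import numpy as np

def mpnn(h, edges, e_feat, Wm, Wu, T):
    """h: (n, d) node states; edges: list of (v, w); e_feat[(v, w)]: (de,)."""
    n, d = h.shape
    for _ in range(T):
        m = np.zeros((n, d))
        for v, w in edges:                    # message phase, Eq. (10.1)
            m[v] += Wm @ np.concatenate([h[w], e_feat[(v, w)]])
        h = np.tanh(np.concatenate([h, m], axis=1) @ Wu)  # vertex update U_t
    return h.sum(axis=0)                      # readout R = sum, Eq. (10.2)

n, d, de = 4, 3, 2
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
e_feat = {e: np.random.rand(de) for e in edges}
y = mpnn(np.random.randn(n, d), edges, e_feat,
         np.random.randn(d, d + de), np.random.randn(2 * d, d), T=2)
print(y.shape)  # (3,)
```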

11 Inductive Representation Learning on Large Graphs

The method of [Hamilton W L, 2017, 11], GraphSAGE, is more scalable and performs notably well on node classification and link prediction. GraphSAGE can likewise be described in the MPNN framework:

  • Message aggregation:

$$ h_{\mathcal{N}(v)}^{(k)} \leftarrow \text{AGGREGATE}_k \left( h_u^{(k-1)}, \forall u \in \mathcal{N}(v) \right). \tag{11.1} $$

  • Vertex feature update:

$$ \begin{aligned} h_v^{(k)} & \leftarrow \sigma \left( W^{(k)} \cdot \text{CONCAT} \left( h_v^{(k-1)}, h_{\mathcal{N}(v)}^{(k)} \right)\right), \\ h_v^{(k)} & \leftarrow \frac{h_v^{(k)}}{\| h_v^{(k)} \|_2}, \quad \forall v \in \mathcal{V}. \end{aligned} \tag{11.2} $$

  • Readout:

$$ z_v \leftarrow h_v^{(K)}, \quad \forall v \in \mathcal{V}. \tag{11.3} $$


The paper provides three aggregation functions:

  • Mean aggregator:

$$ h_v^{(k)} \leftarrow \sigma \left( W \cdot \text{MEAN} \left( \{ h_v^{(k-1)} \} \cup \{ h_u^{(k-1)}, \forall u \in \mathcal{N}(v) \} \right) \right). \tag{11.4} $$

  • LSTM aggregator.

  • Pooling aggregator:

$$ \text{AGGREGATE}_k^{\text{pool}} = \max \left( \left\{ \sigma \left( W_{\text{pool}} h_{u_i}^{(k)} + b \right), \forall u_i \in \mathcal{N}(v) \right\} \right). \tag{11.5} $$

where $W_{\text{pool}}$ is a learnable parameter.
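A numpy sketch of one layer with the mean aggregator, Eqs. (11.2)/(11.4), using ReLU for $\sigma$ and random weights (toy data, names ours):

```python
import numpy as np

def sage_mean_layer(h, neighbors, W):
    """h: (n, d); neighbors[v]: list of v's neighbors; W: (d_out, d)."""
    out = np.empty((h.shape[0], W.shape[0]))
    for v, nbrs in enumerate(neighbors):
        mean = h[[v] + nbrs].mean(axis=0)         # MEAN({h_v} ∪ {h_u})
        z = np.maximum(W @ mean, 0.0)             # sigma = ReLU
        out[v] = z / (np.linalg.norm(z) + 1e-12)  # L2 normalization
    return out

h = np.random.randn(5, 4)
neighbors = [[1, 2], [0], [0, 3], [2, 4], [3]]
print(sage_mean_layer(h, neighbors, np.random.randn(6, 4)).shape)  # (5, 6)
```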

12 Robust Spatial Filtering With Graph Convolutional Neural Networks

The network proposed by [Such F P, 2017, 12] handles both homogeneous and heterogeneous graph datasets.

A separate adjacency matrix can be defined for each edge feature; these are stacked into $\mathcal{A} = (A_1,\cdots, A_L) \in \mathbb{R}^{N \times N \times L}$.

The convolution kernel $H \in \mathbb{R}^{N \times N \times C}$ is:
$$ H^{(c)} \approx \sum_{l=1}^{L} h_l^{(c)} A_l. \tag{12.1} $$

The convolution operation is:
$$ V_{\text{out}} = \sum_{c=1}^{C} H^{(c)} V_{\text{in}}^{(c)} + b. \tag{12.2} $$

Graph embedding pooling: to obtain $V_{emb} \in \mathbb{R}^{N \times N'}$, a kernel $H_{emb} \in \mathbb{R}^{N \times N \times C \times N'}$ is used:
$$ \begin{aligned} V_{emb}^{(n')} &= \sum_{c}^{C} H_{emb}^{(c,n')} V_{in}^{(c)} + b, \\ V_{emb}^{*} &= \sigma (V_{emb}), \\ V_{out} &= V_{emb}^{*T} V_{in}, \\ A_{out} &= V_{emb}^{*T} A_{in} V_{emb}^{*}. \end{aligned} \tag{12.3} $$
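A numpy sketch of Eq. (12.3), under strong simplifying assumptions: the embedding filter $H_{emb}$ is collapsed to one linear map on single-channel input, and $\sigma$ is taken as a row softmax so each vertex distributes its mass over the $N'$ clusters:

```python
import numpy as np

def softmax_rows(X):
    e = np.exp(X - X.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

N, Np, F = 6, 3, 4
V_in = np.random.randn(N, F)
A_in = np.random.rand(N, N); A_in = (A_in + A_in.T) / 2
V_emb = softmax_rows(V_in @ np.random.randn(F, Np))  # soft assignment (N, N')
V_out = V_emb.T @ V_in                               # pooled features (N', F)
A_out = V_emb.T @ A_in @ V_emb                       # pooled adjacency (N', N')
print(V_out.shape, A_out.shape)
```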


13 Large-Scale Learnable Graph Convolutional Networks

[Gao H, 2018, 13] likewise applies a conventional CNN to the vertex feature vectors.


Let $F_l$ be the length of the vertex feature vectors at layer $l$, and let each vertex select $k$ neighbors.

The convolution proceeds as follows (a sketch of steps 1-3 follows the list):

  1. Concatenate the feature vectors of each vertex's neighbors $\mathcal{N}$ into a tensor in $\mathbb{R}^{|\mathcal{N}| \times F_l}$;
  2. Sort each column of this tensor in descending order and keep the top $k$ values, giving $\mathbb{R}^{k \times F_l}$;
  3. Concatenate the vertex's own feature vector, giving $\mathbb{R}^{(k+1) \times F_l}$;
  4. $CONV_1 : \mathbb{R}^{ (k+1) \times F_l } \rightarrow \mathbb{R}^{ (\frac{k}{2}+1) \times k }$;
  5. $CONV_2 : \mathbb{R}^{ (\frac{k}{2}+1) \times k } \rightarrow \mathbb{R}^{ 1 \times F_{l+1} }$.
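A numpy sketch of steps 1-3, the per-column k-largest selection (zero-padding when a vertex has fewer than $k$ neighbors is our assumption here):

```python
import numpy as np

def lgcn_select(h_v, h_neighbors, k):
    """h_v: (F,) vertex features; h_neighbors: (|N|, F); returns (k+1, F)."""
    cols = np.sort(h_neighbors, axis=0)[::-1]   # sort each column descending
    if cols.shape[0] < k:                       # zero-pad if fewer than k
        pad = np.zeros((k - cols.shape[0], cols.shape[1]))
        cols = np.vstack([cols, pad])
    return np.vstack([h_v[None], cols[:k]])     # prepend the vertex itself

F, k = 4, 3
sel = lgcn_select(np.random.randn(F), np.random.randn(5, F), k)
print(sel.shape)  # (4, 4)
```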

14 Signed Graph Convolutional Network

[Derr T, 2018, 14] builds a neural network on signed graph data. On unsigned graphs a convolution simply aggregates neighbor information, but on signed graphs edges can be positive or negative with different meanings, so paths must be split into balanced and unbalanced ones.

The split relies on balance theory; intuitively: 1) a friend of my friend is my friend, and 2) a friend of my enemy is my enemy. A path containing an even number of negative links is balanced; one containing an odd number of negative links is considered unbalanced.


After the split, convolution aggregates each part separately. Balanced part:
$$ h_i^{B(l)} = \begin{cases} \sigma \left( W^{B(1)} \left[ \sum_{j \in \mathcal{N}_{i}^{+}} \frac{h_j^{(0)}}{|\mathcal{N}_{i}^{+}|}, h_i^{(0)} \right] \right) & l = 1;\\ \sigma \left( W^{B(l)} \left[ \sum_{j \in \mathcal{N}_{i}^{+}} \frac{h_j^{B(l-1)}}{|\mathcal{N}_{i}^{+}|}, \sum_{k \in \mathcal{N}_{i}^{-}} \frac{h_k^{U(l-1)}}{|\mathcal{N}_{i}^{-}|}, h_i^{B(l-1)} \right] \right) & l \neq 1. \end{cases} \tag{14.1} $$

Unbalanced part:
$$ h_i^{U(l)} = \begin{cases} \sigma \left( W^{U(1)} \left[ \sum_{j \in \mathcal{N}_{i}^{-}} \frac{h_j^{(0)}}{|\mathcal{N}_{i}^{-}|}, h_i^{(0)} \right] \right) & l = 1;\\ \sigma \left( W^{U(l)} \left[ \sum_{j \in \mathcal{N}_{i}^{+}} \frac{h_j^{U(l-1)}}{|\mathcal{N}_{i}^{+}|}, \sum_{k \in \mathcal{N}_{i}^{-}} \frac{h_k^{B(l-1)}}{|\mathcal{N}_{i}^{-}|}, h_i^{U(l-1)} \right] \right) & l \neq 1. \end{cases} \tag{14.2} $$

Readout:
$$ z_i \leftarrow [h_i^{B(L)}, h_i^{U(L)}], \quad \forall u_i \in \mathcal{U}. \tag{14.3} $$
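A numpy sketch of the first layer, i.e. the $l = 1$ cases of Eqs. (14.1)-(14.2), reading $[\cdot,\cdot]$ as concatenation (toy weights and a toy signed graph):

```python
import numpy as np

def sgcn_first_layer(h0, pos, neg, WB, WU):
    """h0: (n, d); pos/neg[i]: positive/negative neighbor lists; WB, WU: (d_out, 2d)."""
    n, d = h0.shape
    hB = np.empty((n, WB.shape[0])); hU = np.empty_like(hB)
    for i in range(n):
        mp = h0[pos[i]].mean(axis=0) if pos[i] else np.zeros(d)
        mn = h0[neg[i]].mean(axis=0) if neg[i] else np.zeros(d)
        hB[i] = np.tanh(WB @ np.concatenate([mp, h0[i]]))  # balanced state
        hU[i] = np.tanh(WU @ np.concatenate([mn, h0[i]]))  # unbalanced state
    return hB, hU

n, d, d_out = 4, 3, 2
h0 = np.random.randn(n, d)
pos = [[1], [0, 2], [1], []]; neg = [[3], [], [3], [0, 2]]
hB, hU = sgcn_first_layer(h0, pos, neg, np.random.randn(d_out, 2 * d),
                          np.random.randn(d_out, 2 * d))
print(hB.shape, hU.shape)  # (4, 2) (4, 2)
```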


15 GeniePath: Graph Neural Networks with Adaptive Receptive Paths

[Liu Z, 2018, 15] proposes adaptively choosing, in both breadth and depth, which vertices to aggregate features from.

  • Adaptive Breadth

Based on GAT, it learns the importance of one-hop neighbors' (hidden) features to decide which direction to keep exploring:
$$ h_i^{(\text{tmp})} = \tanh \left( {W^{(t)}}^{T} \sum_{j \in \mathcal{N}(i) \cup \{ i \}} \alpha \left( h_i^{(t)}, h_j^{(t)} \right) \cdot h_j^{(t)} \right). \tag{15.1} $$
where $\alpha \left( h_i^{(t)}, h_j^{(t)} \right)$ is an attention mechanism:
$$ \begin{aligned} \alpha (x,y) &= \text{softmax}_y \left( v^T \tanh \left( W_s^T x + W_d^T y \right) \right), \\ \text{softmax}_y(\cdot, y) &= \frac{\exp f(\cdot,y)}{\sum_{y'} \exp f(\cdot, y')}. \end{aligned} \tag{15.2} $$

  • Adaptive Depth

$$ \begin{aligned} i_i &= \sigma \left( {W_i^{(t)}}^{T} h_i^{(\text{tmp})} \right), \\ f_i &= \sigma \left( {W_f^{(t)}}^{T} h_i^{(\text{tmp})} \right), \\ o_i &= \sigma \left( {W_o^{(t)}}^{T} h_i^{(\text{tmp})} \right), \\ \widetilde{C} &= \tanh \left( {W_c^{(t)}}^{T} h_i^{(\text{tmp})} \right),\\ C_i^{(t+1)} &= f_i \odot C_i^{(t)} + i_i \odot \widetilde{C}, \\ h_i^{(t+1)} &= o_i \odot \tanh(C_i^{(t+1)}). \end{aligned} \tag{15.3} $$
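The gating in Eq. (15.3) is standard LSTM-style gating applied to the breadth-aggregated state; a numpy sketch with toy weights (names ours):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_depth(h_tmp, C, Wi, Wf, Wo, Wc):
    """One depth step: h_tmp is the output of Eq. (15.1), C the memory."""
    i = sigmoid(Wi.T @ h_tmp)                  # input gate
    f = sigmoid(Wf.T @ h_tmp)                  # forget gate
    o = sigmoid(Wo.T @ h_tmp)                  # output gate
    C_new = f * C + i * np.tanh(Wc.T @ h_tmp)  # memory update
    return o * np.tanh(C_new), C_new

d = 4
h_tmp, C = np.random.randn(d), np.zeros(d)
Ws = [np.random.randn(d, d) for _ in range(4)]
h_new, C_new = adaptive_depth(h_tmp, C, *Ws)
print(h_new.shape)  # (4,)
```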


The paper also gives a variant:
$$ \begin{aligned} \mu_i^{(0)} &= W_x^T X_i,\\ i_i &= \sigma \left( {W_i^{(t)}}^{T} \text{CONCAT} \left( h_i^{(t)}, \mu_i^{(t)} \right) \right), \\ f_i &= \sigma \left( {W_f^{(t)}}^{T} \text{CONCAT} \left( h_i^{(t)}, \mu_i^{(t)} \right) \right), \\ o_i &= \sigma \left( {W_o^{(t)}}^{T} \text{CONCAT} \left( h_i^{(t)}, \mu_i^{(t)} \right) \right), \\ \widetilde{C} &= \tanh \left( {W_c^{(t)}}^{T} \text{CONCAT} \left( h_i^{(t)}, \mu_i^{(t)} \right) \right),\\ C_i^{(t+1)} &= f_i \odot C_i^{(t)} + i_i \odot \widetilde{C}, \\ \mu_i^{(t+1)} &= o_i \odot \tanh(C_i^{(t+1)}). \end{aligned} $$

16 Fast Learning with Graph Convolutional Networks via Importance Sampling

To speed up training, [Chen J, 2018, 16] proposes a sampling method: only a subset of vertices participates in training.

The baseline algorithm samples vertices uniformly at random (algorithm listing omitted). The paper instead proposes importance sampling, which it proves yields lower variance (listing likewise omitted).

其中的采样概率为:
$$ q(u) = \frac{\| \hat{A}(:,u) \|^2}{ \sum_{u' \in \mathcal{V}} \| \hat{A}(:,u') \|^2}, \quad u \in \mathcal{V}. \tag{16.1} $$
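A numpy sketch of Eq. (16.1) and the resulting sampling step (the matrix here is a toy stand-in for $\hat{A}$):

```python
import numpy as np

N = 8
A_hat = np.random.rand(N, N)
col_norms_sq = (A_hat ** 2).sum(axis=0)   # ||A_hat(:, u)||^2 per column u
q = col_norms_sq / col_norms_sq.sum()     # Eq. (16.1)
sampled = np.random.choice(N, size=4, replace=True, p=q)
print(q.round(3), sampled)
```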

17 Stochastic Training of Graph Convolutional Networks with Variance Reduction

[Chen J, 2018, 17] argues that the importance sampling of [Chen J, 2018, 16] does not work well in practice, because some vertices may receive many samples while others receive none.

A^=A+IN,D^vv=uA^uv,P=D^12A^D^12\hat{A} = A + I_N, \hat{D}_{vv} = \sum_u \hat{A}_{uv}, P = \hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}}P^\hat{P}PP的无偏估计P^uv(l)=N(u)D(l)Puv\hat{P}_{uv}^{(l)} = \frac{|\mathcal{N}(u)|}{D^{(l)}} P_{uv},原文将hv(l)h_v^{(l)}分解成hv(l)=Δhv(l)+hˉv(l)h_v^{(l)} = \Delta h_v^{(l)} + \bar{h}_v^{(l)}。原始的卷积中hv(l)h_v^{(l)}的递归计算导致计算量过大,[Chen J, 2018, 17]将每层得到的hv(l)h_v^{(l)}计算hˉv(l)\bar{h}_v^{(l)}用于保存做历史的近似。
$$ \begin{aligned} (PH^{(l)})_u &= \sum_{v \in \mathcal{N}(u)} P_{uv} \Delta h_v^{(l)} + \sum_{v \in \mathcal{N}(u)} P_{uv} \bar{h}_v^{(l)} \\ &\approx \frac{|\mathcal{N}(u)|}{D^{(l)}} \sum_{v \in \hat{\mathcal{N}}(u)} P_{uv} \Delta h_v^{(l)} + \sum_{v \in \mathcal{N}(u)} P_{uv} \bar{h}_v^{(l)}. \end{aligned} $$
That is, only the $\Delta h_v^{(l)}$ term is estimated from randomly selected neighbors.

The convolution is:
$$ Z^{(l+1)} = \left( \hat{P}^{(l)} \left( H^{(l)} - \bar{H}^{(l)} \right) + P \bar{H}^{(l)} \right) W^{(l)} \tag{17.1} $$
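A numpy sketch of Eq. (17.1): a row-subsampled, rescaled $\hat{P}$ is applied only to the delta $H - \bar{H}$, while the stale history $\bar{H}$ propagates through the full $P$ (toy graph and weights; the construction of $\hat{P}$ here is our reading of the estimator above):

```python
import numpy as np

N, d_in, d_out, D_l = 6, 4, 3, 2
A = (np.random.rand(N, N) < 0.5).astype(float)
A = np.triu(A, 1); A = A + A.T                 # random undirected graph
A_hat = A + np.eye(N)                          # A + I_N
d_is = 1.0 / np.sqrt(A_hat.sum(axis=1))
P = d_is[:, None] * A_hat * d_is[None, :]      # D^{-1/2} A_hat D^{-1/2}

P_hat = np.zeros_like(P)                       # unbiased sparse estimate of P
for u in range(N):
    nbrs = np.flatnonzero(A_hat[u])
    pick = np.random.choice(nbrs, size=min(D_l, len(nbrs)), replace=False)
    P_hat[u, pick] = (len(nbrs) / len(pick)) * P[u, pick]

H = np.random.randn(N, d_in)                   # current activations
H_bar = H + 0.1 * np.random.randn(N, d_in)     # stale historical activations
W = np.random.randn(d_in, d_out)
Z = (P_hat @ (H - H_bar) + P @ H_bar) @ W      # Eq. (17.1)
print(Z.shape)  # (6, 3)
```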

Stochastic GCN:

  1. Randomly select a mini-batch of vertices $\mathcal{V}_B \subseteq \mathcal{V}_L$;
  2. Build a computation graph containing only the $h_v^{(l)}$ and $\bar{h}_v^{(l)}$ needed for the current mini-batch;
  3. Obtain predictions by forward propagation via Eq. (17.1);
  4. Obtain gradients by backpropagation and update the parameters with SGD;
  5. Update the historical values $\bar{h}_v^{(l)}$.

18 Adaptive Sampling Towards Fast Graph Representation Learning

[Huang W, 2018, 18] proposes adaptive sampling towards fast graph representation learning.

The generic convolution:
$$ h^{(l+1)}(v_i) = \sigma \left( \sum_{j=1}^{N} \hat{a}(v_i,u_j) h^{(l)}(u_j) W^{(l)} \right), \quad i = 1, \cdots, N. \tag{18.1} $$

  • Node-Wise Sampling

Rewrite Eq. (18.1) as:
$$ \begin{aligned} h^{(l+1)}(v_i) &= \sigma \left( \left( \sum_{j=1}^{N} \hat{a}(v_i,u_j) \right) \sum_{j=1}^{N} \frac{\hat{a}(v_i,u_j)}{ \sum_{j'=1}^{N} \hat{a}(v_i,u_{j'}) } h^{(l)}(u_j) W^{(l)} \right) \\ &\overset{N(v_i):= \sum_{j=1}^{N} \hat{a}(v_i,u_j)}{=} \sigma \left( N(v_i) \sum_{j=1}^{N} \frac{\hat{a}(v_i,u_j)}{ N(v_i) } h^{(l)}(u_j) W^{(l)} \right) \\ &\overset{p(u_j | v_i):= \frac{\hat{a}(v_i,u_j)}{ N(v_i) }}{=} \sigma \left( N(v_i) \left( \sum_{j=1}^{N} p(u_j | v_i) h^{(l)}(u_j) \right) W^{(l)} \right) \\ &= \sigma \left( N(v_i) \, \mathbb{E}_{p(u_j | v_i)} \left[ h^{(l)}(u_j) \right] W^{(l)} \right), \quad i = 1, \cdots, N. \end{aligned} \tag{18.2} $$

Using Monte Carlo sampling, with $\mu_p(v_i) = \mathbb{E}_{p(u_j | v_i)} \left[ h^{(l)}(u_j) \right]$:
$$ \hat{\mu}_p(v_i) = \frac{1}{n} \sum_{j=1}^{n} h^{(l)} (\hat{u}_j), \quad \hat{u}_j \sim p(u_j | v_i). \tag{18.3} $$

  • Layer-Wise Sampling
    Similarly:
    $$ h^{(l+1)}(v_i) = \sigma \left( N(v_i) \, \mathbb{E}_{q(u_j | v_1,\cdots,v_n)} \left[ \frac{p(u_j | v_i)}{q(u_j | v_1,\cdots,v_n)} h^{(l)}(u_j) \right] W^{(l)} \right), \quad i = 1, \cdots, N. \tag{18.4} $$
    Again by Monte Carlo sampling, with $\mu_q(v_i) = \mathbb{E}_{q(u_j | v_1,\cdots,v_n)} \left[ \frac{p(u_j | v_i)}{q(u_j | v_1,\cdots,v_n)} h^{(l)}(u_j) \right]$:
    $$ \hat{\mu}_q(v_i) = \frac{1}{n} \sum_{j=1}^{n} \frac{p(\hat{u}_j | v_i)}{q(\hat{u}_j | v_1,\cdots,v_n)} h^{(l)} (\hat{u}_j), \quad \hat{u}_j \sim q(u_j | v_1,\cdots,v_n). \tag{18.5} $$

The paper derives the optimal choice of $q(u_j)$:
$$ q^{*}(u_j) = \frac{ \sum_{i=1}^{n} p(u_j | v_i) \left| g(x(u_j)) \right| }{ \sum_{j=1}^{N} \sum_{i=1}^{n} p(u_j | v_i) \left| g(x(u_j)) \right| }. \tag{18.6} $$
where $g(\cdot)$ is a linear function.
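A numpy sketch of Eq. (18.6), with $p(u_j | v_i)$ and $|g(x(u_j))|$ as toy stand-ins:

```python
import numpy as np

n_parents, N = 3, 8
p = np.random.rand(n_parents, N)
p /= p.sum(axis=1, keepdims=True)       # p(u_j | v_i): each row sums to 1
g_abs = np.abs(np.random.randn(N))      # |g(x(u_j))|, g linear in the features
scores = p.sum(axis=0) * g_abs          # numerator of Eq. (18.6)
q_star = scores / scores.sum()          # optimal layer-wise proposal
sampled = np.random.choice(N, size=4, p=q_star)
print(q_star.round(3), sampled)
```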

References