Spatial Convolutional Graph Neural Networks

Spatial-based ConvGNN

1 Convolutional networks on graphs for learning molecular fingerprints

[Duvenaud D, 2015, 1] is one of the earliest papers to use graph representations of molecules in chemistry. Starting from the familiar circular fingerprint as an analogy, it designs a structurally similar neural graph fingerprint that can be trained end-to-end.
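As a concrete picture of the idea, here is a minimal numpy sketch of a neural fingerprint: the hash of a circular fingerprint is replaced by a smooth learned function and the discrete index write by a softmax. All shapes, weights, and the helper names below are illustrative, not the paper's code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def neural_fingerprint(A, X, Hs, Ws, fp_len):
    """A: (n, n) adjacency, X: (n, d) atom features,
    Hs/Ws: per-layer weights, fp_len: fingerprint length."""
    h = X.copy()
    fp = np.zeros(fp_len)
    for H, W in zip(Hs, Ws):            # one iteration per radius/layer
        r = h + A @ h                   # aggregate each atom with its neighbors
        h = sigmoid(r @ H)              # smooth stand-in for the hash function
        for a in range(h.shape[0]):     # soft "write" replaces discrete indexing
            fp += softmax(h[a] @ W)
    return fp

n, d, fp_len = 6, 4, 16
A = (np.random.rand(n, n) < 0.3).astype(float)
A = np.triu(A, 1); A = A + A.T          # random undirected molecule graph
X = np.random.randn(n, d)
Hs = [np.random.randn(d, d) for _ in range(2)]
Ws = [np.random.randn(d, fp_len) for _ in range(2)]
print(neural_fingerprint(A, X, Hs, Ws, fp_len).shape)  # (16,)
```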


2 Column Networks for Collective Classification

[Pham T, 2017, 2] takes the information on edges into account:

  • contextual features

For vertex $i$ and the $r$-th edge (relation) type:
$$ \vec{c}_{i,r}^{(t)} = \frac{1}{|\mathcal{N}_r(i)|} \sum_{j \in \mathcal{N}_r(i)} \vec{h}_j^{(t-1)}. \tag{2.1} $$

  • Convolution

$$ \vec{h}_i^{(t)} = g \left( \vec{b}^{(t)} + W^{(t)} \vec{h}_i^{(t-1)} + \frac{1}{z} \sum_{r=1}^{R} V_r^{(t)} \vec{c}_{i,r}^{(t)} \right). \tag{2.2} $$
where $z$ is a hyperparameter. At the final step $T$:
$$ P(y_i = l) = \text{softmax} \left( \vec{b}_l + W_l \vec{h}_i^{(T)} \right). \tag{2.3} $$


In addition, the mean in Eq. (2.1) can be replaced by a weighted sum:
$$ \vec{c}_{i,r}^{(t)} = \sum_{j \in \mathcal{N}_r(i)} \alpha_j \vec{h}_j^{(t-1)}, \qquad \sum_{j} \alpha_j = 1. \tag{2.4} $$

The paper also adds a gated (highway-style) skip connection:
$$ \begin{aligned} \vec{h}_i^{(t)} &= \vec{\alpha} \odot \tilde{\vec{h}}_i^{(t)} + (\vec{1} - \vec{\alpha}) \odot \vec{h}_i^{(t-1)}, \\ \vec{\alpha} &= \sigma \left( \vec{b}_{\alpha}^{(t)} + W_{\alpha}^{(t)} \vec{h}_i^{(t-1)} + \frac{1}{z} \sum_{r=1}^{R} V_{\alpha r}^{(t)} \vec{c}_{i,r}^{(t)} \right). \end{aligned} \tag{2.5} $$
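A minimal numpy sketch of one Column Network step, Eqs. (2.1)-(2.2), on a toy two-relation graph with random weights (the function name and data are ours, not the paper's):

```python
import numpy as np

def cln_step(h, neighbors, W, Vs, b, z):
    """h: (n, d) states; neighbors[r][i]: list of i's neighbors under relation r."""
    n, d = h.shape
    h_new = np.empty_like(h)
    for i in range(n):
        ctx = np.zeros(d)
        for r, V in enumerate(Vs):                 # relation types r = 1..R
            nbrs = neighbors[r][i]
            if nbrs:                               # Eq. (2.1): mean over N_r(i)
                c_ir = h[nbrs].mean(axis=0)
                ctx += V @ c_ir
        h_new[i] = np.tanh(b + W @ h[i] + ctx / z)  # Eq. (2.2), g = tanh
    return h_new

n, d, z = 4, 3, 2.0
h = np.random.randn(n, d)
neighbors = [{0: [1], 1: [0, 2], 2: [1], 3: []},    # relation 1
             {0: [3], 1: [], 2: [3], 3: [0, 2]}]    # relation 2
W, b = np.random.randn(d, d), np.random.randn(d)
Vs = [np.random.randn(d, d) for _ in range(2)]
print(cln_step(h, neighbors, W, Vs, b, z).shape)  # (4, 3)
```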

3 Learning Convolutional Neural Networks for Graphs

PATCHY-SAN, proposed by [Niepert M, 2016, 3], applies a conventional CNN to graphs. To make the CNN applicable, the graph structure is normalized by fixing both the number of vertices and the number of neighbors per vertex:

  • 1. Sort the vertices by a labeling and select a fixed number $w$ of them;
  • 2. For each of the $w$ vertices, collect at least $k$ neighbors with BFS;
  • 3. Using the best labeling, select exactly $k$ of those neighbors to form the normalized neighborhood graph.

After normalization we obtain two tensors of shapes $(w,k,a_v)$ and $(w,k,k,a_e)$, where $a_v$ and $a_e$ are the lengths of the vertex and edge feature vectors. These are reshaped into $(wk, a_v)$ and $(wk^2, a_e)$; treating $a_v$ and $a_e$ as channels, a 1-D convolution is applied along the first dimension of each.

The best labeling
The best labeling is the one for which, in expectation over pairs of graphs, the distance between the adjacency matrices it induces deviates least from the graph distance:
$$ \hat{l} = \arg \min_{l} \mathbb{E}_{\mathcal{G}} \left[ \left| d_A \left( A^l(G), A^l(G')\right) - d_G \left( G, G'\right) \right| \right]. $$
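Since a 1-D convolution with kernel size $k$ and stride $k$ over the length-$wk$ sequence touches each receptive field exactly once, the vertex convolution can be sketched as a reshape followed by a matmul; the shapes below are illustrative:

```python
import numpy as np

w, k, a_v, out_ch = 16, 5, 8, 32
nodes = np.random.randn(w * k, a_v)          # normalized receptive fields
patches = nodes.reshape(w, k * a_v)          # one row per receptive field
W_conv = np.random.randn(k * a_v, out_ch)    # kernel of width k, a_v channels
feature_maps = patches @ W_conv              # (w, out_ch): one value per field
print(feature_maps.shape)
```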


4 Diffusion-Convolutional Neural Networks

[Atwood J, 2016, 4] computes a degree-normalized transition matrix $P_t$ from the adjacency matrix $A_t$; its entries give the probability of jumping from vertex $i$ to vertex $j$.

$P_t^{*} \in \mathbb{R}^{N_t \times H \times N_t}$ stacks the power series of $P_t$. For node, graph, and edge classification the output is $Z \in \mathbb{R}^{N_t \times H \times F}$, $\mathbb{R}^{H \times F}$, and $\mathbb{R}^{M_t \times H \times F}$, respectively, with input features $X_t \in \mathbb{R}^{N_t \times F}$.

  • Node Classification

$$ \begin{aligned} Z_{t,ijk} &= f \left( W_{jk}^{c} \sum_{l=1}^{N_t} P_{t,ijl}^{*} X_{t,lk} \right), \\ \text{i.e. } Z_t &= f \left( W^{c} \odot P_{t}^{*} X_{t} \right). \end{aligned} \tag{4.1} $$
where $W^c \in \mathbb{R}^{H \times F}$.

Output:
$$ \begin{aligned} \hat{Y} &= \arg \max \left( f(W^D \odot Z)\right), \\ \mathbb{P}(Y | X) &= \text{softmax} \left( f(W^D \odot Z)\right). \end{aligned} \tag{4.2} $$

  • Graph Classification

$$ Z_t = f \left( W^c \odot \frac{\vec{1}_{N_t}^T P_{t}^{*} X_{t}}{N_t} \right). \tag{4.3} $$

  • Edge Classification and Edge Features


$$ A_t' = \begin{pmatrix} A_t & B_t^T\\ B_t & 0 \end{pmatrix} \tag{4.4} $$
Compute $P_t'$ from $A_t'$ and use it in place of $P_t$ in the convolution to classify vertices or edges.
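A numpy sketch of the node-classification case, Eq. (4.1), assuming $P_t^{*}$ stacks the first $H$ powers of $P_t$ and taking $f = \tanh$ (toy sizes, random weights):

```python
import numpy as np

def dcnn_node(A, X, Wc):
    """A: (N, N) adjacency, X: (N, F) features, Wc: (H, F) weights."""
    P = A / A.sum(axis=1, keepdims=True)      # degree-normalized transitions
    H = Wc.shape[0]
    Pstar = np.stack([np.linalg.matrix_power(P, h + 1) for h in range(H)])
    PX = np.einsum('hij,jf->ihf', Pstar, X)   # (N, H, F) diffused features
    return np.tanh(Wc[None] * PX)             # Z = f(W^c ⊙ P* X)

N, F, H = 5, 3, 2
A = np.ones((N, N)) - np.eye(N)               # toy complete graph
Z = dcnn_node(A, np.random.randn(N, F), np.random.randn(H, F))
print(Z.shape)  # (5, 2, 3)
```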


5 Quantum-Chemical Insights from Deep Tensor Neural Networks

[Schutt K T, 2017, 5] aggregates both vertex and edge information. A distance matrix $D=(\hat{\vec{d}}_{ij})_{|V| \times |V|}$ over the vertices $V$ is built with a Gaussian kernel expansion. Let $\vec{x}_i$ denote the feature vector of each vertex $v_i \in V$, and $\vec{y}_{ij}$ the feature vector on each edge $e_{ij} \in E$, representing the influence of vertex $v_j$ on $v_i$.

  • Vertex feature update for $\vec{x}_i$:

$$ \vec{x}_i^{(t+1)} = \vec{x}_i^{(t)} + \sum_{j \neq i} \vec{y}_{ij}. \tag{5.1} $$

  • Edge feature update for $\vec{y}_{ij}$:

$$ \vec{y}_{ij} = \tanh \left[ W^{fc} \left( \left( W^{cf} \vec{x}_j + \vec{v}^{f_1} \right) \odot \left( W^{df} \hat{\vec{d}}_{ij} + \vec{v}^{f_2} \right) \right) \right]. \tag{5.2} $$
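A numpy sketch of one DTNN interaction pass, Eqs. (5.1)-(5.2), with toy shapes and random weights (the helper name is ours):

```python
import numpy as np

def dtnn_step(X, D, Wfc, Wcf, Wdf, v1, v2):
    """X: (n, d) atom features, D: (n, n, k) Gaussian-expanded distances."""
    n, d = X.shape
    X_new = X.copy()
    for i in range(n):
        for j in range(n):
            if j == i:
                continue                          # sum over j != i
            y_ij = np.tanh(Wfc @ ((Wcf @ X[j] + v1) * (Wdf @ D[i, j] + v2)))
            X_new[i] += y_ij                      # Eq. (5.1)
    return X_new

n, d, k, m = 4, 5, 6, 5
X, D = np.random.randn(n, d), np.random.rand(n, n, k)
out = dtnn_step(X, D, np.random.randn(d, m), np.random.randn(m, d),
                np.random.randn(m, k), np.random.randn(m), np.random.randn(m))
print(out.shape)  # (4, 5)
```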


6 Interaction Networks for Learning about Objects, Relations and Physics

[Battaglia P W, 2016, 6] applies graph networks to the natural sciences. The IN module is:
$$ \begin{aligned} IN(G) &= \phi_{O} \left( a \left( G, X, \phi_{R} \left( m(G) \right) \right) \right);\\ m(G) &= B = \{b_k\}_{k = 1, \cdots, N_R}, \\ \phi_{R}(B) &= E = \{e_k\}_{k = 1, \cdots, N_R}, \\ a(G,X,E) &= C = \{c_j\}_{j = 1, \cdots, N_O}, \\ \phi_{O}(C) &= P = \{p_j\}_{j = 1, \cdots, N_O}. \end{aligned} \tag{6.1} $$

$N_O$ is the number of vertices (objects), $N_R$ the number of edges (relations), and $D_S, D_R$ the lengths of the vertex and edge feature vectors. Then $O \in \mathbb{R}^{D_S \times N_O}$ is the matrix whose columns are the vertex feature vectors, and $R_a \in \mathbb{R}^{D_R \times N_R}$ the matrix whose columns are the edge feature vectors. With the vertices indexed, the 0-1 receiver matrix $R_r \in \mathbb{R}^{N_O \times N_R}$ and sender matrix $R_s \in \mathbb{R}^{N_O \times N_R}$ encode which vertex receives and sends each relation.

With $\phi_R, \phi_O$ implemented as multi-layer fully connected networks:
$$ \begin{aligned} B &= m(G) = [OR_r; OR_s; R_a] && \in \mathbb{R}^{(2D_S + D_R) \times N_R}, \\ E &= \phi_R(B) && \in \mathbb{R}^{D_E \times N_R}, \\ \bar{E} &= E R_r^T && \in \mathbb{R}^{D_E \times N_O}, \\ C &= a(G,X,E) = [O; X; \bar{E}] && \in \mathbb{R}^{(D_S + D_X + D_E) \times N_O}, \\ P &= \phi_O(C) && \in \mathbb{R}^{D_P \times N_O}. \end{aligned} \tag{6.2} $$
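A numpy sketch of one pass of Eq. (6.2), with the MLPs $\phi_R, \phi_O$ collapsed to single linear maps for brevity (an assumption; all shapes and weights are toy values):

```python
import numpy as np

N_O, N_R, D_S, D_R, D_X, D_E, D_P = 3, 2, 4, 2, 1, 5, 4
O  = np.random.randn(D_S, N_O)              # object (vertex) feature columns
Ra = np.random.randn(D_R, N_R)              # relation (edge) feature columns
X  = np.random.randn(D_X, N_O)              # external effects
Rr = np.zeros((N_O, N_R)); Rs = np.zeros((N_O, N_R))
Rr[0, 0] = Rr[1, 1] = 1                     # receivers of relations 0 and 1
Rs[1, 0] = Rs[2, 1] = 1                     # senders of relations 0 and 1

W_R = np.random.randn(D_E, 2 * D_S + D_R)   # stands in for phi_R
W_O = np.random.randn(D_P, D_S + D_X + D_E)  # stands in for phi_O

B    = np.vstack([O @ Rr, O @ Rs, Ra])      # marshalling m(G)
E    = np.tanh(W_R @ B)                     # per-relation effects
Ebar = E @ Rr.T                             # aggregate effects per receiver
C    = np.vstack([O, X, Ebar])              # a(G, X, E)
P    = W_O @ C                              # per-object predictions
print(P.shape)  # (4, 3)
```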

7 Geometric deep learning on graphs and manifolds using mixture model cnns

[F. Monti, 2017, 7] proposes a convolution that works on both manifolds and graphs. Each neighbor $y \in \mathcal{N}(x)$ of a vertex $x$ is associated with a pseudo-coordinate vector $\vec{u}(x,y)$. A weight function $\vec{w}_{\Theta}(\vec{u}) = (w_1(\vec{u}), \cdots, w_J(\vec{u}))$ with learnable parameters $\Theta$ defines the patch operator:
$$ D_j(x)f = \sum_{y \in \mathcal{N}(x)} w_j(\vec{u}(x,y)) f(y), \quad j = 1, \cdots, J. \tag{7.1} $$
where $J$ is the dimension of the extracted patch. The paper chooses $w_j(\vec{u})$ to be a Gaussian kernel with learnable $\Sigma_j, \vec{\mu}_j$:
$$ w_j(\vec{u}) = \exp \left( -\frac{1}{2} \left( \vec{u} - \vec{\mu}_j \right)^T \Sigma_j^{-1} \left( \vec{u} - \vec{\mu}_j \right) \right). \tag{7.2} $$
The convolution is then:
$$ \left( f \star g \right)(x) = \sum_{j=1}^{J} g_j D_j(x) f. \tag{7.3} $$
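A numpy sketch of Eqs. (7.1)-(7.3), restricting $\Sigma_j$ to diagonal covariances and using toy pseudo-coordinates (the data and names are illustrative):

```python
import numpy as np

def monet_conv(f, neighbors, U, mus, sig2s, g):
    """f: (n,) signal; U[i]: (deg_i, 2) pseudo-coordinates of i's neighbors;
    mus: (J, 2) means; sig2s: (J, 2) diagonal variances; g: (J,) filter."""
    n, J = len(f), len(g)
    out = np.zeros(n)
    for i in range(n):
        for j in range(J):
            w = np.exp(-0.5 * ((U[i] - mus[j]) ** 2 / sig2s[j]).sum(axis=1))
            Djf = (w * f[neighbors[i]]).sum()   # Eq. (7.1), patch dimension j
            out[i] += g[j] * Djf                # Eq. (7.3)
    return out

n, J = 4, 3
f = np.random.randn(n)
neighbors = [np.array([1, 2]), np.array([0]), np.array([0, 3]), np.array([2])]
U = [np.random.randn(len(nb), 2) for nb in neighbors]
out = monet_conv(f, neighbors, U, np.random.randn(J, 2), np.ones((J, 2)),
                 np.random.randn(J))
print(out.shape)  # (4,)
```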

8 Molecular Graph Convolutions: Moving Beyond Fingerprints

[Kearnes S, 2017, 8] states three properties a model on graphs should satisfy:

  • Property 1 (Order invariance). The output of the model should be invariant to the order that the atom and bond information is encoded in the input.

  • Property 2 (Atom and pair permutation invariance). The values of an atom layer and a pair layer permute with the original input layer order. More precisely, if the inputs are permuted with a permutation operator $Q$, then for all layers $x$, $y$, $A^x$ and $P^y$ are permuted with operator $Q$ as well.

  • Property 3 (Pair order invariance). For all pair layers $y$, $P^y_{(a,b)} = P^y_{(b,a)}$.


$A^x, P^y$ denote the values of the $x$-th atom layer and the $y$-th pair layer, respectively. The network is built from four basic operations:

  • (AAA \rightarrow A)

$$ A_a^y = f \left( A_a^{x_1}, A_a^{x_2}, \cdots, A_a^{x_n} \right). \tag{8.1} $$

  • (APA \rightarrow P)

$$ P_{a,b}^y = g \left( f \left( A_{a}^{x}, A_{b}^{x} \right), f \left( A_{b}^{x}, A_{a}^{x} \right) \right). \tag{8.2} $$

  • (PAP \rightarrow A)

$$ A_{a}^y = g \left( f \left(P_{(a,b)}^{x}\right), f \left(P_{(a,c)}^{x}\right), f \left(P_{(a,d)}^{x}\right), \cdots \right). \tag{8.3} $$

  • (PPP \rightarrow P)

$$ P_{a,b}^y = f \left( P_{a,b}^{x_1}, P_{a,b}^{x_2}, \cdots, P_{a,b}^{x_n} \right). \tag{8.4} $$

where $f(\cdot)$ is an arbitrary function and $g(\cdot)$ an arbitrary commutative (order-invariant) function.

9 Dynamic Edge-Conditioned Filters in Convolutional Neural Networks on Graphs

[Simonovsky M, 2017, 9] uses edge features to construct the vertex convolution weights. For an edge feature $L(j,i) \in \mathbb{R}^{s}$:
$$ \Theta_{ji}^{l} = F^l(L(j,i)) \quad \in \mathbb{R}^{d_l \times d_{l-1}}. \tag{9.1} $$

The convolution is:
$$ \begin{aligned} X^l(i) &= \frac{1}{|\mathcal{N}(i)|} \sum_{j \in \mathcal{N}(i)} F^l(L(j,i)) X^{l-1}(j) + b^l \\ &= \frac{1}{|\mathcal{N}(i)|} \sum_{j \in \mathcal{N}(i)} \Theta_{ji}^{l} X^{l-1}(j) + b^l. \end{aligned} \tag{9.2} $$
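A numpy sketch of Eqs. (9.1)-(9.2), with the filter-generating network $F^l$ reduced to a single linear map (the paper uses an MLP; all names and data here are toy values):

```python
import numpy as np

def ecc_layer(X, neighbors, labels, W_gen, b):
    """X: (n, d_in); labels[(j, i)]: (s,) edge feature; W_gen: (d_out*d_in, s)."""
    n, d_in = X.shape
    d_out = b.shape[0]
    out = np.zeros((n, d_out))
    for i in range(n):
        for j in neighbors[i]:
            Theta = (W_gen @ labels[(j, i)]).reshape(d_out, d_in)  # Eq. (9.1)
            out[i] += Theta @ X[j]
        out[i] = out[i] / max(len(neighbors[i]), 1) + b            # Eq. (9.2)
    return out

n, d_in, d_out, s = 4, 3, 2, 5
X = np.random.randn(n, d_in)
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
labels = {(j, i): np.random.rand(s) for i in neighbors for j in neighbors[i]}
out = ecc_layer(X, neighbors, labels, np.random.randn(d_out * d_in, s),
                np.zeros(d_out))
print(out.shape)  # (4, 2)
```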


The pooling method described in the paper:

  • 1. Downsample or merge vertices;
  • 2. Create the new edge structure $E^{(h)}$ and its labels $L^{(h)}$ (so-called graph reduction);
  • 3. Map the original vertices to the new ones: $M^{(h)} : V^{(h-1)} \rightarrow V^{(h)}$.

10 Neural Message Passing for Quantum Chemistry

[Gilmer J, 2017, 10] proposes the MPNN framework, which splits a GNN into two phases: a message passing phase and a readout phase.

  • message passing phase

The message passing phase itself has two parts: a message function $M_t(\cdot)$ that aggregates the features of a vertex's neighbors and incident edges, and a vertex update function $U_t(\cdot)$:
$$ \begin{aligned} m_v^{(t+1)} &= \sum_{w \in \mathcal{N}(v)} M_t \left( h_v^{(t)}, h_w^{(t)}, e_{vw} \right), \\ h_v^{(t+1)} &= U_t \left( h_v^{(t)}, m_v^{(t+1)} \right). \end{aligned} \tag{10.1} $$

  • readout phase

In the readout phase, a readout function $R(\cdot)$ is finally applied over all vertices of the graph:
$$ \hat{y} = R \left( \{ h_v^{(T)} \mid v \in \mathcal{V}_G \}\right). \tag{10.2} $$
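A numpy sketch of the two phases, Eqs. (10.1)-(10.2), with deliberately simple choices that the framework leaves free: $M_t$ a linear map on $(h_w, e_{vw})$, $U_t$ a linear update, and $R$ a sum (all names and weights are toy values):

```python
import numpy as np

def mpnn(h, edges, e_feat, Wm, Wu, T):
    """h: (n, d) node states; edges: list of (v, w); e_feat[(v, w)]: (de,)."""
    n, d = h.shape
    for _ in range(T):
        m = np.zeros((n, d))
        for v, w in edges:                    # message phase, Eq. (10.1)
            m[v] += Wm @ np.concatenate([h[w], e_feat[(v, w)]])
        h = np.tanh(np.concatenate([h, m], axis=1) @ Wu)  # vertex update U_t
    return h.sum(axis=0)                      # readout R = sum, Eq. (10.2)

n, d, de = 4, 3, 2
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
e_feat = {e: np.random.rand(de) for e in edges}
y = mpnn(np.random.randn(n, d), edges, e_feat,
         np.random.randn(d, d + de), np.random.randn(2 * d, d), T=2)
print(y.shape)  # (3,)
```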

11 Inductive Representation Learning on Large Graphs

The method of [Hamilton W L, 2017, 11], GraphSAGE, is more scalable and performs notably well on node classification and link prediction. GraphSAGE can likewise be described in the MPNN framework:

  • Message aggregation:

$$ h_{\mathcal{N}(v)}^{(k)} \leftarrow \text{AGGREGATE}_k \left( h_u^{(k-1)}, \forall u \in \mathcal{N}(v) \right). \tag{11.1} $$

  • Vertex feature update:

$$ \begin{aligned} h_v^{(k)} & \leftarrow \sigma \left( W^{(k)} \cdot \text{CONCAT} \left( h_v^{(k-1)}, h_{\mathcal{N}(v)}^{(k)} \right)\right), \\ h_v^{(k)} & \leftarrow \frac{h_v^{(k)}}{\| h_v^{(k)} \|_2}, \quad \forall v \in \mathcal{V}. \end{aligned} \tag{11.2} $$

  • Readout:

$$ z_v \leftarrow h_v^{(K)}, \quad \forall v \in \mathcal{V}. \tag{11.3} $$


The paper provides three aggregation functions:

  • Mean aggregator:

$$ h_v^{(k)} \leftarrow \sigma \left( W \cdot \text{MEAN} \left( \{ h_v^{(k-1)} \} \cup \{ h_u^{(k-1)}, \forall u \in \mathcal{N}(v) \} \right) \right). \tag{11.4} $$

  • LSTM aggregator.

  • Pooling aggregator:

$$ \text{AGGREGATE}_k^{\text{pool}} = \max \left( \left\{ \sigma \left( W_{\text{pool}} h_{u_i}^{(k)} + b \right), \forall u_i \in \mathcal{N}(v) \right\} \right). \tag{11.5} $$

where $W_{\text{pool}}$ is a learnable parameter.
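A numpy sketch of one layer with the mean aggregator, Eqs. (11.2)/(11.4), using ReLU for $\sigma$ and random weights (toy data, names ours):

```python
import numpy as np

def sage_mean_layer(h, neighbors, W):
    """h: (n, d); neighbors[v]: list of v's neighbors; W: (d_out, d)."""
    out = np.empty((h.shape[0], W.shape[0]))
    for v, nbrs in enumerate(neighbors):
        mean = h[[v] + nbrs].mean(axis=0)         # MEAN({h_v} ∪ {h_u})
        z = np.maximum(W @ mean, 0.0)             # sigma = ReLU
        out[v] = z / (np.linalg.norm(z) + 1e-12)  # L2 normalization
    return out

h = np.random.randn(5, 4)
neighbors = [[1, 2], [0], [0, 3], [2, 4], [3]]
print(sage_mean_layer(h, neighbors, np.random.randn(6, 4)).shape)  # (5, 6)
```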

12 Robust Spatial Filtering With Graph Convolutional Neural Networks

The network proposed by [Such F P, 2017, 12] handles both homogeneous and heterogeneous graph datasets.

A separate adjacency matrix can be defined for each edge feature; these are stacked into $\mathcal{A} = (A_1,\cdots, A_L) \in \mathbb{R}^{N \times N \times L}$.

The convolution kernel $H \in \mathbb{R}^{N \times N \times C}$ is:
$$ H^{(c)} \approx \sum_{l=1}^{L} h_l^{(c)} A_l. \tag{12.1} $$

The convolution operation is:
$$ V_{\text{out}} = \sum_{c=1}^{C} H^{(c)} V_{\text{in}}^{(c)} + b. \tag{12.2} $$

Graph embedding pooling: to obtain $V_{emb} \in \mathbb{R}^{N \times N'}$, a kernel $H_{emb} \in \mathbb{R}^{N \times N \times C \times N'}$ is used:
$$ \begin{aligned} V_{emb}^{(n')} &= \sum_{c}^{C} H_{emb}^{(c,n')} V_{in}^{(c)} + b, \\ V_{emb}^{*} &= \sigma (V_{emb}), \\ V_{out} &= V_{emb}^{*T} V_{in}, \\ A_{out} &= V_{emb}^{*T} A_{in} V_{emb}^{*}. \end{aligned} \tag{12.3} $$
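A numpy sketch of Eq. (12.3), under strong simplifying assumptions: the embedding filter $H_{emb}$ is collapsed to one linear map on single-channel input, and $\sigma$ is taken as a row softmax so each vertex distributes its mass over the $N'$ clusters:

```python
import numpy as np

def softmax_rows(X):
    e = np.exp(X - X.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

N, Np, F = 6, 3, 4
V_in = np.random.randn(N, F)
A_in = np.random.rand(N, N); A_in = (A_in + A_in.T) / 2
V_emb = softmax_rows(V_in @ np.random.randn(F, Np))  # soft assignment (N, N')
V_out = V_emb.T @ V_in                               # pooled features (N', F)
A_out = V_emb.T @ A_in @ V_emb                       # pooled adjacency (N', N')
print(V_out.shape, A_out.shape)
```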


13 Large-Scale Learnable Graph Convolutional Networks

[Gao H, 2018, 13] likewise applies a conventional CNN to the vertex feature vectors.


Let $F_l$ be the length of the vertex feature vectors at layer $l$, and let each vertex select $k$ neighbors.

The convolution proceeds as follows (a sketch of steps 1-3 follows the list):

  1. Concatenate the feature vectors of each vertex's neighbors $\mathcal{N}$ into a tensor in $\mathbb{R}^{|\mathcal{N}| \times F_l}$;
  2. Sort each column of this tensor in descending order and keep the top $k$ values, giving $\mathbb{R}^{k \times F_l}$;
  3. Concatenate the vertex's own feature vector, giving $\mathbb{R}^{(k+1) \times F_l}$;
  4. $CONV_1 : \mathbb{R}^{ (k+1) \times F_l } \rightarrow \mathbb{R}^{ (\frac{k}{2}+1) \times k }$;
  5. $CONV_2 : \mathbb{R}^{ (\frac{k}{2}+1) \times k } \rightarrow \mathbb{R}^{ 1 \times F_{l+1} }$.
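A numpy sketch of steps 1-3, the per-column k-largest selection (zero-padding when a vertex has fewer than $k$ neighbors is our assumption here):

```python
import numpy as np

def lgcn_select(h_v, h_neighbors, k):
    """h_v: (F,) vertex features; h_neighbors: (|N|, F); returns (k+1, F)."""
    cols = np.sort(h_neighbors, axis=0)[::-1]   # sort each column descending
    if cols.shape[0] < k:                       # zero-pad if fewer than k
        pad = np.zeros((k - cols.shape[0], cols.shape[1]))
        cols = np.vstack([cols, pad])
    return np.vstack([h_v[None], cols[:k]])     # prepend the vertex itself

F, k = 4, 3
sel = lgcn_select(np.random.randn(F), np.random.randn(5, F), k)
print(sel.shape)  # (4, 4)
```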

14 Signed Graph Convolutional Network

[Derr T, 2018, 14] builds a neural network on signed graph data. On unsigned graphs a convolution simply aggregates neighbor information, but on signed graphs edges can be positive or negative with different meanings, so paths must be split into balanced and unbalanced ones.

The split relies on balance theory; intuitively: 1) a friend of my friend is my friend, and 2) a friend of my enemy is my enemy. A path containing an even number of negative links is balanced; one containing an odd number of negative links is considered unbalanced.


After the split, convolution aggregates each part separately. Balanced part:
$$ h_i^{B(l)} = \begin{cases} \sigma \left( W^{B(1)} \left[ \sum_{j \in \mathcal{N}_{i}^{+}} \frac{h_j^{(0)}}{|\mathcal{N}_{i}^{+}|}, h_i^{(0)} \right] \right) & l = 1;\\ \sigma \left( W^{B(l)} \left[ \sum_{j \in \mathcal{N}_{i}^{+}} \frac{h_j^{B(l-1)}}{|\mathcal{N}_{i}^{+}|}, \sum_{k \in \mathcal{N}_{i}^{-}} \frac{h_k^{U(l-1)}}{|\mathcal{N}_{i}^{-}|}, h_i^{B(l-1)} \right] \right) & l \neq 1. \end{cases} \tag{14.1} $$

Unbalanced part:
$$ h_i^{U(l)} = \begin{cases} \sigma \left( W^{U(1)} \left[ \sum_{j \in \mathcal{N}_{i}^{-}} \frac{h_j^{(0)}}{|\mathcal{N}_{i}^{-}|}, h_i^{(0)} \right] \right) & l = 1;\\ \sigma \left( W^{U(l)} \left[ \sum_{j \in \mathcal{N}_{i}^{+}} \frac{h_j^{U(l-1)}}{|\mathcal{N}_{i}^{+}|}, \sum_{k \in \mathcal{N}_{i}^{-}} \frac{h_k^{B(l-1)}}{|\mathcal{N}_{i}^{-}|}, h_i^{U(l-1)} \right] \right) & l \neq 1. \end{cases} \tag{14.2} $$

Readout:
$$ z_i \leftarrow [h_i^{B(L)}, h_i^{U(L)}], \quad \forall u_i \in \mathcal{U}. \tag{14.3} $$
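A numpy sketch of the first layer, i.e. the $l = 1$ cases of Eqs. (14.1)-(14.2), reading $[\cdot,\cdot]$ as concatenation (toy weights and a toy signed graph):

```python
import numpy as np

def sgcn_first_layer(h0, pos, neg, WB, WU):
    """h0: (n, d); pos/neg[i]: positive/negative neighbor lists; WB, WU: (d_out, 2d)."""
    n, d = h0.shape
    hB = np.empty((n, WB.shape[0])); hU = np.empty_like(hB)
    for i in range(n):
        mp = h0[pos[i]].mean(axis=0) if pos[i] else np.zeros(d)
        mn = h0[neg[i]].mean(axis=0) if neg[i] else np.zeros(d)
        hB[i] = np.tanh(WB @ np.concatenate([mp, h0[i]]))  # balanced state
        hU[i] = np.tanh(WU @ np.concatenate([mn, h0[i]]))  # unbalanced state
    return hB, hU

n, d, d_out = 4, 3, 2
h0 = np.random.randn(n, d)
pos = [[1], [0, 2], [1], []]; neg = [[3], [], [3], [0, 2]]
hB, hU = sgcn_first_layer(h0, pos, neg, np.random.randn(d_out, 2 * d),
                          np.random.randn(d_out, 2 * d))
print(hB.shape, hU.shape)  # (4, 2) (4, 2)
```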


15 GeniePath: Graph Neural Networks with Adaptive Receptive Paths

[Liu Z, 2018, 15] proposes adaptively choosing, in both breadth and depth, which vertices to aggregate features from.

  • Adaptive Breadth

Based on GAT, it learns the importance of one-hop neighbors' (hidden) features to decide which direction to keep exploring:
$$ h_i^{(\text{tmp})} = \tanh \left( {W^{(t)}}^{T} \sum_{j \in \mathcal{N}(i) \cup \{ i \}} \alpha \left( h_i^{(t)}, h_j^{(t)} \right) \cdot h_j^{(t)} \right). \tag{15.1} $$
where $\alpha \left( h_i^{(t)}, h_j^{(t)} \right)$ is an attention mechanism:
$$ \begin{aligned} \alpha (x,y) &= \text{softmax}_y \left( v^T \tanh \left( W_s^T x + W_d^T y \right) \right), \\ \text{softmax}_y(\cdot, y) &= \frac{\exp f(\cdot,y)}{\sum_{y'} \exp f(\cdot, y')}. \end{aligned} \tag{15.2} $$

  • Adaptive Depth

$$ \begin{aligned} i_i &= \sigma \left( {W_i^{(t)}}^{T} h_i^{(\text{tmp})} \right), \\ f_i &= \sigma \left( {W_f^{(t)}}^{T} h_i^{(\text{tmp})} \right), \\ o_i &= \sigma \left( {W_o^{(t)}}^{T} h_i^{(\text{tmp})} \right), \\ \widetilde{C} &= \tanh \left( {W_c^{(t)}}^{T} h_i^{(\text{tmp})} \right),\\ C_i^{(t+1)} &= f_i \odot C_i^{(t)} + i_i \odot \widetilde{C}, \\ h_i^{(t+1)} &= o_i \odot \tanh(C_i^{(t+1)}). \end{aligned} \tag{15.3} $$
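The gating in Eq. (15.3) is standard LSTM-style gating applied to the breadth-aggregated state; a numpy sketch with toy weights (names ours):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_depth(h_tmp, C, Wi, Wf, Wo, Wc):
    """One depth step: h_tmp is the output of Eq. (15.1), C the memory."""
    i = sigmoid(Wi.T @ h_tmp)                  # input gate
    f = sigmoid(Wf.T @ h_tmp)                  # forget gate
    o = sigmoid(Wo.T @ h_tmp)                  # output gate
    C_new = f * C + i * np.tanh(Wc.T @ h_tmp)  # memory update
    return o * np.tanh(C_new), C_new

d = 4
h_tmp, C = np.random.randn(d), np.zeros(d)
Ws = [np.random.randn(d, d) for _ in range(4)]
h_new, C_new = adaptive_depth(h_tmp, C, *Ws)
print(h_new.shape)  # (4,)
```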


The paper also gives a variant:
$$ \begin{aligned} \mu_i^{(0)} &= W_x^T X_i,\\ i_i &= \sigma \left( {W_i^{(t)}}^{T} \text{CONCAT} \left( h_i^{(t)}, \mu_i^{(t)} \right) \right), \\ f_i &= \sigma \left( {W_f^{(t)}}^{T} \text{CONCAT} \left( h_i^{(t)}, \mu_i^{(t)} \right) \right), \\ o_i &= \sigma \left( {W_o^{(t)}}^{T} \text{CONCAT} \left( h_i^{(t)}, \mu_i^{(t)} \right) \right), \\ \widetilde{C} &= \tanh \left( {W_c^{(t)}}^{T} \text{CONCAT} \left( h_i^{(t)}, \mu_i^{(t)} \right) \right),\\ C_i^{(t+1)} &= f_i \odot C_i^{(t)} + i_i \odot \widetilde{C}, \\ \mu_i^{(t+1)} &= o_i \odot \tanh(C_i^{(t+1)}). \end{aligned} $$

16 Fast Learning with Graph Convolutional Networks via Importance Sampling

To speed up training, [Chen J, 2018, 16] proposes a sampling method: only a subset of vertices participates in training.

The baseline algorithm samples vertices uniformly at random (algorithm listing omitted). The paper instead proposes importance sampling, which it proves yields lower variance (listing likewise omitted).

其中的采样概率为:
$$ q(u) = \frac{\| \hat{A}(:,u) \|^2}{ \sum_{u' \in \mathcal{V}} \| \hat{A}(:,u') \|^2}, \quad u \in \mathcal{V}. \tag{16.1} $$
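A numpy sketch of Eq. (16.1) and the resulting sampling step (the matrix here is a toy stand-in for $\hat{A}$):

```python
import numpy as np

N = 8
A_hat = np.random.rand(N, N)
col_norms_sq = (A_hat ** 2).sum(axis=0)   # ||A_hat(:, u)||^2 per column u
q = col_norms_sq / col_norms_sq.sum()     # Eq. (16.1)
sampled = np.random.choice(N, size=4, replace=True, p=q)
print(q.round(3), sampled)
```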

17 Stochastic Training of Graph Convolutional Networks with Variance Reduction

[Chen J, 2018, 17] argues that the importance sampling of [Chen J, 2018, 16] does not work well in practice, because some vertices may receive many samples while others receive none.

A^=A+IN,D^vv=uA^uv,P=D^12A^D^12\hat{A} = A + I_N, \hat{D}_{vv} = \sum_u \hat{A}_{uv}, P = \hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}}P^\hat{P}PP的无偏估计P^uv(l)=N(u)D(l)Puv\hat{P}_{uv}^{(l)} = \frac{|\mathcal{N}(u)|}{D^{(l)}} P_{uv},原文将hv(l)h_v^{(l)}分解成hv(l)=Δhv(l)+hˉv(l)h_v^{(l)} = \Delta h_v^{(l)} + \bar{h}_v^{(l)}。原始的卷积中hv(l)h_v^{(l)}的递归计算导致计算量过大,[Chen J, 2018, 17]将每层得到的hv(l)h_v^{(l)}计算hˉv(l)\bar{h}_v^{(l)}用于保存做历史的近似。
$$ \begin{aligned} (PH^{(l)})_u &= \sum_{v \in \mathcal{N}(u)} P_{uv} \Delta h_v^{(l)} + \sum_{v \in \mathcal{N}(u)} P_{uv} \bar{h}_v^{(l)} \\ &\approx \frac{|\mathcal{N}(u)|}{D^{(l)}} \sum_{v \in \hat{\mathcal{N}}(u)} P_{uv} \Delta h_v^{(l)} + \sum_{v \in \mathcal{N}(u)} P_{uv} \bar{h}_v^{(l)}. \end{aligned} $$
That is, only the $\Delta h_v^{(l)}$ term is estimated from randomly selected neighbors.

The convolution is:
$$ Z^{(l+1)} = \left( \hat{P}^{(l)} \left( H^{(l)} - \bar{H}^{(l)} \right) + P \bar{H}^{(l)} \right) W^{(l)} \tag{17.1} $$
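A numpy sketch of Eq. (17.1): a row-subsampled, rescaled $\hat{P}$ is applied only to the delta $H - \bar{H}$, while the stale history $\bar{H}$ propagates through the full $P$ (toy graph and weights; the construction of $\hat{P}$ here is our reading of the estimator above):

```python
import numpy as np

N, d_in, d_out, D_l = 6, 4, 3, 2
A = (np.random.rand(N, N) < 0.5).astype(float)
A = np.triu(A, 1); A = A + A.T                 # random undirected graph
A_hat = A + np.eye(N)                          # A + I_N
d_is = 1.0 / np.sqrt(A_hat.sum(axis=1))
P = d_is[:, None] * A_hat * d_is[None, :]      # D^{-1/2} A_hat D^{-1/2}

P_hat = np.zeros_like(P)                       # unbiased sparse estimate of P
for u in range(N):
    nbrs = np.flatnonzero(A_hat[u])
    pick = np.random.choice(nbrs, size=min(D_l, len(nbrs)), replace=False)
    P_hat[u, pick] = (len(nbrs) / len(pick)) * P[u, pick]

H = np.random.randn(N, d_in)                   # current activations
H_bar = H + 0.1 * np.random.randn(N, d_in)     # stale historical activations
W = np.random.randn(d_in, d_out)
Z = (P_hat @ (H - H_bar) + P @ H_bar) @ W      # Eq. (17.1)
print(Z.shape)  # (6, 3)
```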

Stochastic GCN:

  1. Randomly select a mini-batch of vertices $\mathcal{V}_B \subseteq \mathcal{V}_L$;
  2. Build a computation graph containing only the $h_v^{(l)}$ and $\bar{h}_v^{(l)}$ needed for the current mini-batch;
  3. Obtain predictions by forward propagation via Eq. (17.1);
  4. Obtain gradients by backpropagation and update the parameters with SGD;
  5. Update the historical values $\bar{h}_v^{(l)}$.

18 Adaptive Sampling Towards Fast Graph Representation Learning

[Huang W, 2018, 18] proposes adaptive sampling towards fast graph representation learning.

The generic convolution:
$$ h^{(l+1)}(v_i) = \sigma \left( \sum_{j=1}^{N} \hat{a}(v_i,u_j) h^{(l)}(u_j) W^{(l)} \right), \quad i = 1, \cdots, N. \tag{18.1} $$

  • Node-Wise Sampling

Rewrite Eq. (18.1) as:
$$ \begin{aligned} h^{(l+1)}(v_i) &= \sigma \left( \left( \sum_{j=1}^{N} \hat{a}(v_i,u_j) \right) \sum_{j=1}^{N} \frac{\hat{a}(v_i,u_j)}{ \sum_{j'=1}^{N} \hat{a}(v_i,u_{j'}) } h^{(l)}(u_j) W^{(l)} \right) \\ &\overset{N(v_i):= \sum_{j=1}^{N} \hat{a}(v_i,u_j)}{=} \sigma \left( N(v_i) \sum_{j=1}^{N} \frac{\hat{a}(v_i,u_j)}{ N(v_i) } h^{(l)}(u_j) W^{(l)} \right) \\ &\overset{p(u_j | v_i):= \frac{\hat{a}(v_i,u_j)}{ N(v_i) }}{=} \sigma \left( N(v_i) \left( \sum_{j=1}^{N} p(u_j | v_i) h^{(l)}(u_j) \right) W^{(l)} \right) \\ &= \sigma \left( N(v_i) \, \mathbb{E}_{p(u_j | v_i)} \left[ h^{(l)}(u_j) \right] W^{(l)} \right), \quad i = 1, \cdots, N. \end{aligned} \tag{18.2} $$

Using Monte Carlo sampling, with $\mu_p(v_i) = \mathbb{E}_{p(u_j | v_i)} \left[ h^{(l)}(u_j) \right]$:
$$ \hat{\mu}_p(v_i) = \frac{1}{n} \sum_{j=1}^{n} h^{(l)} (\hat{u}_j), \quad \hat{u}_j \sim p(u_j | v_i). \tag{18.3} $$

  • Layer-Wise Sampling
    Similarly:
    $$ h^{(l+1)}(v_i) = \sigma \left( N(v_i) \, \mathbb{E}_{q(u_j | v_1,\cdots,v_n)} \left[ \frac{p(u_j | v_i)}{q(u_j | v_1,\cdots,v_n)} h^{(l)}(u_j) \right] W^{(l)} \right), \quad i = 1, \cdots, N. \tag{18.4} $$
    Again by Monte Carlo sampling, with $\mu_q(v_i) = \mathbb{E}_{q(u_j | v_1,\cdots,v_n)} \left[ \frac{p(u_j | v_i)}{q(u_j | v_1,\cdots,v_n)} h^{(l)}(u_j) \right]$:
    $$ \hat{\mu}_q(v_i) = \frac{1}{n} \sum_{j=1}^{n} \frac{p(\hat{u}_j | v_i)}{q(\hat{u}_j | v_1,\cdots,v_n)} h^{(l)} (\hat{u}_j), \quad \hat{u}_j \sim q(u_j | v_1,\cdots,v_n). \tag{18.5} $$

The paper derives the optimal choice of $q(u_j)$:
$$ q^{*}(u_j) = \frac{ \sum_{i=1}^{n} p(u_j | v_i) \left| g(x(u_j)) \right| }{ \sum_{j=1}^{N} \sum_{i=1}^{n} p(u_j | v_i) \left| g(x(u_j)) \right| }. \tag{18.6} $$
where $g(\cdot)$ is a linear function.
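A numpy sketch of Eq. (18.6), with $p(u_j | v_i)$ and $|g(x(u_j))|$ as toy stand-ins:

```python
import numpy as np

n_parents, N = 3, 8
p = np.random.rand(n_parents, N)
p /= p.sum(axis=1, keepdims=True)       # p(u_j | v_i): each row sums to 1
g_abs = np.abs(np.random.randn(N))      # |g(x(u_j))|, g linear in the features
scores = p.sum(axis=0) * g_abs          # numerator of Eq. (18.6)
q_star = scores / scores.sum()          # optimal layer-wise proposal
sampled = np.random.choice(N, size=4, p=q_star)
print(q_star.round(3), sampled)
```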

References