Spatial-based ConvGNN
1 Convolutional networks on graphs for learning molecular fingerprints
[Duvenaud D, 2015, 1] is one of the earliest works to represent molecules as graphs for learning in chemistry. Taking the widely used circular fingerprint as an analogy, the paper designs a neural graph fingerprint that mirrors the circular-fingerprint construction but can be trained end-to-end.
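A minimal sketch of one neural-fingerprint layer, paraphrasing the idea (sum self and neighbor features, pass them through a smooth differentiable "hash", and add a soft one-hot contribution to the fingerprint). All array names (`H`, `W`, `fp`) are illustrative, not the authors' code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def neural_fingerprint_layer(h, neighbors, H, W, fp):
    """h: dict atom id -> feature vector; neighbors: dict atom id -> list of atom ids;
    H: (d, d) hidden update; W: (fp_len, d) readout; fp: (fp_len,) running fingerprint."""
    h_new = {}
    for a, feats in h.items():
        r = feats + sum((h[b] for b in neighbors[a]), np.zeros_like(feats))  # aggregate self + neighbors
        h_new[a] = sigmoid(H @ r)          # smooth analogue of the circular-fingerprint hash step
        fp = fp + softmax(W @ h_new[a])    # smooth analogue of writing into the fingerprint bit vector
    return h_new, fp
```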
2 Column Networks for Collective Classification
[Pham T, 2017, 2] takes the information carried on edges (relations) into account. For vertex $i$, the context aggregated from its neighbors under relation $r$ is:
$$
\vec{c}_{i,r}^{(t)} = \frac{1}{|\mathcal{N}_r(i)|} \sum_{j \in \mathcal{N}_r(i)} \vec{h}_j^{(t-1)}.
\tag{2.1}
$$
$$
\vec{h}_i^{(t)} = g \left( \vec{b}^{(t)} + W^{(t)} \vec{h}_i^{(t-1)} + \frac{1}{z} \sum_{r=1}^{R} V_r^{(t)} \vec{c}_{i,r}^{(t)} \right).
\tag{2.2}
$$
where $z$ is a hyperparameter. At the final step $T$:
$$
P(y_i = l) = \text{softmax} \left( \vec{b}_l + W_l \vec{h}_i^{(T)} \right).
\tag{2.3}
$$
In addition, the mean in Eq. (2.1) can be replaced by a weighted sum:
$$
\vec{c}_{i,r}^{(t)} = \sum_{j \in \mathcal{N}_r(i)} \alpha_j \vec{h}_j^{(t-1)}, \qquad \sum_{j} \alpha_j = 1.
\tag{2.4}
$$
The paper also uses gated (highway-style) skip connections:
$$
\begin{aligned}
\vec{h}_i^{(t)} &= \vec{\alpha} \odot \tilde{\vec{h}}_i^{(t)} + (\vec{1} - \vec{\alpha}) \odot \vec{h}_i^{(t-1)}, \\
\vec{\alpha} &= \sigma \left( \vec{b}_{\alpha}^{(t)} + W_{\alpha}^{(t)} \vec{h}_i^{(t-1)} + \frac{1}{z} \sum_{r=1}^{R} V_{\alpha r}^{(t)} \vec{c}_{i,r}^{(t)} \right).
\end{aligned}
\tag{2.5}
$$
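A minimal sketch of one Column Network layer (Eqs. 2.1–2.2), assuming dense numpy arrays, a neighbor list per relation, and $g=\tanh$; names such as `V` and `z` mirror the symbols above and everything else is illustrative:

```python
import numpy as np

def column_network_layer(h_prev, nbrs_by_rel, W, V, b, z):
    """h_prev: (N, d) states at step t-1; nbrs_by_rel[r][i]: neighbor ids of i under relation r;
    W: (d, d); V[r]: (d, d) one matrix per relation; b: (d,); z: scalar hyperparameter."""
    N, d = h_prev.shape
    R = len(V)
    h_new = np.empty_like(h_prev)
    for i in range(N):
        ctx = np.zeros(d)
        for r in range(R):
            nbrs = nbrs_by_rel[r][i]
            if nbrs:                                    # Eq. (2.1): mean over relation-r neighbors
                c_ir = h_prev[nbrs].mean(axis=0)
                ctx += V[r] @ c_ir
        h_new[i] = np.tanh(b + W @ h_prev[i] + ctx / z)  # Eq. (2.2) with g = tanh
    return h_new
```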
3 Learning Convolutional Neural Networks for Graphs
PATCHY-SAN [Niepert M, 2016, 3] applies conventional CNNs to graphs. To make this possible, the graph structure is normalized: both the number of selected vertices and the number of neighbors per selected vertex are fixed:
1. Sort the vertices by a labeling and select a fixed number $w$ of vertices;
2. For each of the $w$ vertices, collect at least $k$ neighbors via BFS;
3. Normalize each neighborhood to exactly $k$ vertices according to the best labeling.
After normalization we obtain two tensors of shapes $(w, k, a_v)$ and $(w, k, k, a_e)$, where $a_v$ and $a_e$ are the lengths of the vertex and edge feature vectors. They are reshaped to $(wk, a_v)$ and $(wk^2, a_e)$; treating $a_v$ and $a_e$ as channels, a 1-D convolution is applied along the first dimension of each.
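A minimal sketch of the vertex branch, assuming a node tensor of shape $(w, k, a_v)$ already produced by the normalization. A 1-D convolution with kernel size $k$ and stride $k$ over the $(wk, a_v)$ sequence touches each receptive field exactly once, so it reduces to a dense map applied field by field; the function and variable names are illustrative:

```python
import numpy as np

def patchy_san_vertex_conv(node_tensor, kernel):
    """node_tensor: (w, k, a_v) normalized receptive fields;
    kernel: (k * a_v, out_channels) weights of the stride-k 1-D convolution."""
    w, k, a_v = node_tensor.shape
    flat = node_tensor.reshape(w * k, a_v)     # the (wk, a_v) sequence from the text
    fields = flat.reshape(w, k * a_v)          # one row per receptive field
    return fields @ kernel                     # (w, out_channels) feature map
```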
The best labeling: the best labeling $\hat{l}$ is the one for which, in expectation over pairs of graphs, the distance between the labeling-induced adjacency matrices stays closest to the graph distance:
$$
\hat{l} = \arg \min_{l} \mathbb{E}_{\mathcal{G}} \left[ \left| d_A \left( A^l(G), A^l(G')\right) - d_G \left( G, G'\right) \right| \right].
$$
4 Diffusion-Convolutional Neural Networks
[Atwood J, 2016, 4] computes a degree-normalized transition matrix $P_t$ from the adjacency matrix $A_t$; its entries give the probability of hopping from vertex $i$ to vertex $j$.
$P_{t}^{*} \in \mathbb{R}^{N_t \times H \times N_t}$ collects the power series (the first $H$ powers) of $P_t$. With input features $X_t \in \mathbb{R}^{N_t \times F}$, the diffusion activations are $Z \in \mathbb{R}^{N_t \times H \times F}$, $\mathbb{R}^{H \times F}$, or $\mathbb{R}^{M_t \times H \times F}$ for vertex, graph, and edge classification, respectively.
$$
\begin{aligned}
Z_{t,ijk} &= f \left( W_{jk}^{c} \sum_{l=1}^{N_t} P_{t,ijl}^{*} X_{t,lk} \right), \\
\text{i.e. } Z_t &= f \left( W^{c} \odot P_{t}^{*} X_{t} \right).
\end{aligned}
\tag{4.1}
$$
where $W^c \in \mathbb{R}^{H \times F}$.
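A minimal sketch of the diffusion activations in Eq. (4.1) for vertex classification, assuming the power series contains the first $H$ powers of the transition matrix and $f=\tanh$; array names follow the symbols above and are otherwise illustrative:

```python
import numpy as np

def diffusion_conv(A, X, Wc, H):
    """A: (N, N) adjacency; X: (N, F) vertex features; Wc: (H, F) diffusion weights;
    H: number of hops. Returns Z of shape (N, H, F)."""
    deg = A.sum(axis=1, keepdims=True)
    P = A / np.maximum(deg, 1e-12)                 # degree-normalized transition matrix
    P_star = np.stack([np.linalg.matrix_power(P, h) for h in range(1, H + 1)], axis=1)  # (N, H, N)
    Z = np.einsum('nhm,mf->nhf', P_star, X)        # diffuse features over 1..H hops
    return np.tanh(Wc[None, :, :] * Z)             # elementwise weights W^c, f = tanh
```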
Output:
$$
\begin{aligned}
\hat{Y} &= \arg \max \left( f(W^D \odot Z)\right), \\
\mathbb{P}(Y \mid X) &= \text{softmax} \left( f(W^D \odot Z)\right).
\end{aligned}
\tag{4.2}
$$
For graph classification, the diffusion activations are averaged over all vertices:
$$
Z_t = f \left( W^c \odot \frac{\vec{1}_{N_t}^T P_{t}^{*} X_{t}}{N_t} \right).
\tag{4.3}
$$
Edge Classification and Edge Features: an augmented adjacency matrix is used,
$$
A_t' =
\begin{pmatrix}
A_t & B_t^T\\
B_t & 0
\end{pmatrix}
\tag{4.4}
$$
and the transition matrix $P_t'$ computed from $A_t'$ replaces $P_t$ in the convolution, so that either vertices or edges can be classified.
5 Quantum-Chemical Insights from Deep Tensor Neural Networks
[Schutt K T, 2017, 5] aggregates both vertex and edge information. A distance matrix $D=(\hat{\vec{d}}_{ij})_{|V| \times |V|}$ over the vertices $V$ is built by expanding pairwise distances in a Gaussian basis. Each vertex $v_i \in V$ has a feature vector $\vec{x}_i$, and each edge $e_{ij} \in E$ carries a vector $\vec{y}_{ij}$ representing the influence of vertex $v_j$ on $v_i$.
$$
\vec{x}_i^{(t+1)} = \vec{x}_i^{(t)} + \sum_{j \neq i} \vec{y}_{ij}.
\tag{5.1}
$$
The edge feature vector $\vec{y}_{ij}$ is updated by:
$$
\vec{y}_{ij} = \tanh \left[ W^{fc} \left( \left( W^{cf} \vec{x}_j + \vec{v}^{f_1} \right) \odot \left( W^{df} \hat{\vec{d}}_{ij} + \vec{v}^{f_2} \right) \right) \right].
\tag{5.2}
$$
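A minimal sketch of one interaction pass (Eqs. 5.1–5.2), assuming precomputed Gaussian-expanded distances `d_hat` and illustrative weight names:

```python
import numpy as np

def dtnn_interaction(x, d_hat, Wcf, Wdf, Wfc, v1, v2):
    """x: (N, c) atom embeddings; d_hat: (N, N, g) Gaussian-expanded distances;
    Wcf: (f, c); Wdf: (f, g); Wfc: (c, f); v1, v2: (f,) biases."""
    N = x.shape[0]
    x_new = x.copy()
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            gate = (Wcf @ x[j] + v1) * (Wdf @ d_hat[i, j] + v2)   # Eq. (5.2): elementwise product of factors
            x_new[i] += np.tanh(Wfc @ gate)                       # Eq. (5.1): accumulate messages y_ij
    return x_new
```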
6 Interaction Networks for Learning about Objects, Relations and Physics
[Battaglia P W, 2016, 6] applies graph networks to the natural sciences. The IN module is:
$$
\begin{aligned}
IN(G) &= \phi_{O} \left( a \left( G, X, \phi_{R} \left( m(G) \right) \right) \right);\\
m(G) &= B = \{b_k\}_{k = 1, \cdots, N_R}, \\
\phi_{R}(B) &= E = \{e_k\}_{k = 1, \cdots, N_R}, \\
a(G,X,E) &= C = \{c_j\}_{j = 1, \cdots, N_O}, \\
\phi_{O}(C) &= P = \{p_j\}_{j = 1, \cdots, N_O}.
\end{aligned}
\tag{6.1}
$$
Let $N_O$ be the number of vertices, $N_R$ the number of edges (relations), and $D_S, D_R$ the lengths of the vertex and edge feature vectors. Then $O \in \mathbb{R}^{D_S \times N_O}$ stacks the vertex feature column vectors and $R_a \in \mathbb{R}^{D_R \times N_R}$ stacks the edge feature column vectors. Numbering the vertices, the receiver matrix $R_r \in \mathbb{R}^{N_O \times N_R}$ and the sender matrix $R_s \in \mathbb{R}^{N_O \times N_R}$ are 0-1 indicator matrices.
$\phi_R$ and $\phi_O$ are implemented as multilayer fully connected networks:
$$
\begin{aligned}
B &= m(G) = [OR_r; OR_s; R_a] &&\in \mathbb{R}^{(2D_S + D_R) \times N_R}, \\
E &= \phi_R(B) &&\in \mathbb{R}^{D_E \times N_R}, \\
\bar{E} &= E R_r^T &&\in \mathbb{R}^{D_E \times N_O}, \\
C &= a(G,X,E) = [O; X; \bar{E}] &&\in \mathbb{R}^{(D_S + D_X + D_E) \times N_O}, \\
P &= \phi_O(C) &&\in \mathbb{R}^{D_P \times N_O}.
\end{aligned}
\tag{6.2}
$$
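A minimal sketch of the marshalling and aggregation steps in Eq. (6.2), with `phi_R` and `phi_O` left as callables (e.g. small MLPs applied column-wise); shapes follow the matrices above:

```python
import numpy as np

def interaction_network(O, X, Ra, Rr, Rs, phi_R, phi_O):
    """O: (D_S, N_O) object states; X: (D_X, N_O) external effects;
    Ra: (D_R, N_R) relation attributes; Rr, Rs: (N_O, N_R) 0/1 receiver/sender matrices;
    phi_R, phi_O: callables mapping a matrix to a matrix column-wise."""
    B = np.concatenate([O @ Rr, O @ Rs, Ra], axis=0)   # marshalling m(G)
    E = phi_R(B)                                       # per-relation effects
    E_bar = E @ Rr.T                                   # aggregate effects onto receivers
    C = np.concatenate([O, X, E_bar], axis=0)          # a(G, X, E)
    return phi_O(C)                                    # per-object predictions P
```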
7 Geometric deep learning on graphs and manifolds using mixture model cnns
[F. Monti, 2017, 7] proposes a convolution that can be used on both manifolds and graphs. Each neighbor $y \in \mathcal{N}(x)$ of a vertex $x$ is associated with a pseudo-coordinate vector $\vec{u}(x,y)$. A weight function $\vec{w}_{\Theta}(\vec{u}) = (w_1(\vec{u}), \cdots, w_J(\vec{u}))$ parameterized by learnable $\Theta$ defines the patch operator:
$$
D_j(x)f = \sum_{y \in \mathcal{N}(x)} w_j(\vec{u}(x,y)) f(y), \quad j = 1, \cdots, J.
\tag{7.1}
$$
where $J$ is the dimension of the extracted patch. In the paper, $w_j(\vec{u})$ is chosen as a Gaussian kernel with learnable $\Sigma_j, \vec{\mu}_j$:
$$
w_j(\vec{u}) = \exp \left( -\frac{1}{2} \left( \vec{u} - \vec{\mu}_j \right)^T \Sigma_j^{-1} \left( \vec{u} - \vec{\mu}_j \right) \right).
\tag{7.2}
$$
The convolution is then:
$$
\left( f \star g \right)(x) = \sum_{j=1}^{J} g_j D_j(x) f.
\tag{7.3}
$$
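A minimal sketch of Eqs. (7.1)–(7.3) for a single scalar signal on a graph, restricted to diagonal Gaussian kernels; `pseudo_coords[(x, y)]` and the parameter names are illustrative:

```python
import numpy as np

def monet_conv(f, neighbors, pseudo_coords, mu, sigma, g):
    """f: (N,) signal; neighbors[x]: list of neighbor ids; pseudo_coords[(x, y)]: (d,) vector;
    mu: (J, d) kernel means; sigma: (J, d) kernel std devs (diagonal Sigma_j); g: (J,) filter coefficients."""
    J = mu.shape[0]
    out = np.zeros_like(f)
    for x in range(len(f)):
        for j in range(J):
            Djf = 0.0                                   # Eq. (7.1): patch operator D_j(x)f
            for y in neighbors[x]:
                u = pseudo_coords[(x, y)]
                w = np.exp(-0.5 * np.sum(((u - mu[j]) / sigma[j]) ** 2))  # Eq. (7.2), diagonal covariance
                Djf += w * f[y]
            out[x] += g[j] * Djf                        # Eq. (7.3): convolution
    return out
```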
8 Molecular Graph Convolutions: Moving Beyond Fingerprints
[Kearnes S, 2016, 8] states three properties that a graph model should satisfy:
Property 1 (Order invariance). The output of the model should be invariant to the order that the atom and bond information is encoded in the input.
Property 2 (Atom and pair permutation invariance). The values of an atom layer and a pair layer permute with the original input layer order. More precisely, if the inputs are permuted with a permutation operator $Q$, then for all layers $x$, $y$, $A^x$ and $P^y$ are permuted with operator $Q$ as well.
Property 3 (Pair order invariance). For all pair layers $y$, $P^y_{(a,b)} = P^y_{(b,a)}$.
$A^x$ and $P^y$ denote the values of the $x$-th atom layer and the $y$-th pair layer, respectively. The network is built from four basic operations:
($A \rightarrow A$):
$$
A_a^y = f \left( A_a^{x_1}, A_a^{x_2}, \cdots, A_a^{x_n} \right).
\tag{8.1}
$$
($A \rightarrow P$):
$$
P_{a,b}^y = g \left( f \left( A_{a}^{x}, A_{b}^{x} \right), f \left( A_{b}^{x}, A_{a}^{x} \right) \right).
\tag{8.2}
$$
($P \rightarrow A$):
$$
A_{a}^y = g \left( f \left(P_{(a,b)}^{x}\right), f \left(P_{(a,c)}^{x}\right), f \left(P_{(a,d)}^{x}\right), \cdots \right).
\tag{8.3}
$$
($P \rightarrow P$):
$$
P_{a,b}^y = f \left( P_{a,b}^{x_1}, P_{a,b}^{x_2}, \cdots, P_{a,b}^{x_n} \right).
\tag{8.4}
$$
where $f(\cdot)$ is an arbitrary function and $g(\cdot)$ is an arbitrary commutative (order-invariant) function.
9 Dynamic Edge-Conditioned Filters in Convolutional Neural Networks on Graphs
[Simonovsky M, 2017, 9] uses edge features to generate the convolution weights between vertices. A filter-generating network $F^l$ maps each edge feature $L(j,i) \in \mathbb{R}^{s}$ to a weight matrix:
$$
\Theta_{ji}^{l} = F^l(L(j,i)) \quad \in \mathbb{R}^{d_l \times d_{l-1}}.
\tag{9.1}
$$
The convolution is:
$$
\begin{aligned}
X^l(i) &= \frac{1}{|\mathcal{N}(i)|} \sum_{j \in \mathcal{N}(i)} F^l(L(j,i)) X^{l-1}(j) + b^l\\
&= \frac{1}{|\mathcal{N}(i)|} \sum_{j \in \mathcal{N}(i)} \Theta_{ji}^{l} X^{l-1}(j) + b^l.
\end{aligned}
\tag{9.2}
$$
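A minimal sketch of Eq. (9.2), where the filter-generating network `F_l` is any callable returning a $(d_l, d_{l-1})$ matrix from an edge feature; all names are illustrative:

```python
import numpy as np

def ecc_conv(X_prev, edge_feat, neighbors, F_l, b_l):
    """X_prev: (N, d_prev) previous-layer features; edge_feat[(j, i)]: (s,) edge feature;
    neighbors[i]: list of neighbor ids j; F_l: callable edge feature -> (d_l, d_prev) matrix; b_l: (d_l,)."""
    N = X_prev.shape[0]
    d_l = b_l.shape[0]
    X = np.zeros((N, d_l))
    for i in range(N):
        nbrs = neighbors[i]
        for j in nbrs:
            Theta = F_l(edge_feat[(j, i)])       # Eq. (9.1): edge-conditioned filter
            X[i] += Theta @ X_prev[j]
        X[i] = X[i] / max(len(nbrs), 1) + b_l    # Eq. (9.2): average over neighbors plus bias
    return X
```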
The pooling procedure described in the paper:
1. Downsample or merge vertices;
2. Create a new edge structure $E^{(h)}$ with labels $L^{(h)}$ (the so-called reduction);
3. Map the original vertices to the new ones via $M^{(h)} : V^{(h-1)} \rightarrow V^{(h)}$.
10 Neural Message Passing for Quantum Chemistry
[Gilmer J, 2017, 10] proposes the MPNN framework, which splits a GNN into two phases: a message-passing phase and a readout phase.
The message-passing phase itself has two parts: a message function $M_t(\cdot)$ that aggregates the features of a vertex's neighbors and incident edges, and a vertex update function $U_t(\cdot)$:
$$
\begin{aligned}
m_v^{(t+1)} &= \sum_{w \in \mathcal{N}(v)} M_t \left( h_v^{(t)}, h_w^{(t)}, e_{vw} \right), \\
h_v^{(t+1)} &= U_t \left( h_v^{(t)}, m_v^{(t+1)} \right).
\end{aligned}
\tag{10.1}
$$
Finally, a readout function $R(\cdot)$ is applied over all vertices of the graph:
$$
\hat{y} = R \left( \{ h_v^{(T)} \mid v \in \mathcal{V}_G \}\right).
\tag{10.2}
$$
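A minimal sketch of the framework in Eqs. (10.1)–(10.2), with `M_t`, `U_t`, and `R` left as callables; the data layout is illustrative:

```python
def mpnn(h, edges, neighbors, M_t, U_t, R, T):
    """h: dict vertex -> state; edges[(v, w)]: edge feature; neighbors[v]: neighbor list;
    M_t(h_v, h_w, e_vw) / U_t(h_v, m_v): message and update functions; R(states): readout; T: steps."""
    for _ in range(T):
        # Message-passing phase, Eq. (10.1); isolated vertices receive a zero message.
        m = {v: sum(M_t(h[v], h[w], edges[(v, w)]) for w in neighbors[v]) for v in h}
        h = {v: U_t(h[v], m[v]) for v in h}
    # Readout phase, Eq. (10.2).
    return R(list(h.values()))
```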
11 Inductive Representation Learning on Large Graphs
[Hamilton W L, 2017, 11] proposes GraphSAGE, which scales better and performs strongly on node classification and link prediction. GraphSAGE can likewise be described within the MPNN framework:
$$
h_{\mathcal{N}(v)}^{(k)} \leftarrow \text{AGGREGATE}_k \left( h_u^{(k-1)}, \forall u \in \mathcal{N}(v) \right).
\tag{11.1}
$$
$$
\begin{aligned}
h_v^{(k)} & \leftarrow \sigma \left( W^{(k)} \cdot \text{CONCAT} \left( h_v^{(k-1)}, h_{\mathcal{N}(v)}^{(k)} \right)\right), \\
h_v^{(k)} & \leftarrow \frac{h_v^{(k)}}{\| h_v^{(k)} \|_2}, \quad \forall v \in \mathcal{V}.
\end{aligned}
\tag{11.2}
$$
$$
z_v \leftarrow h_v^{(K)}, \quad \forall v \in \mathcal{V}.
\tag{11.3}
$$
The paper gives three aggregator functions:
Mean aggregator:
$$
h_v^{(k)} \leftarrow \sigma \left( W \cdot \text{MEAN} \left( \{ h_v^{(k-1)} \} \cup \{ h_u^{(k-1)}, \forall u \in \mathcal{N}(v) \} \right) \right).
\tag{11.4}
$$
LSTM aggregator.
Pooling aggregator:
$$
\text{AGGREGATE}_k^{\text{pool}} = \max \left( \left\{ \sigma \left( W_{\text{pool}} h_{u_i}^{(k)} + b \right), \forall u_i \in \mathcal{N}(v) \right\} \right).
\tag{11.5}
$$
where $W_{\text{pool}}$ is a learnable parameter.
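A minimal sketch of one GraphSAGE layer with the mean aggregator (Eqs. 11.1–11.2, aggregation in the spirit of Eq. 11.4); neighbor sampling is omitted, $\sigma$ is taken as ReLU, and the names are illustrative:

```python
import numpy as np

def graphsage_layer(h_prev, neighbors, W):
    """h_prev: (N, d_in) states; neighbors[v]: list of neighbor ids; W: (d_out, 2 * d_in)."""
    N = h_prev.shape[0]
    out = []
    for v in range(N):
        nbrs = neighbors[v]
        h_nbr = h_prev[nbrs].mean(axis=0) if nbrs else np.zeros_like(h_prev[v])  # Eq. (11.1)
        h_v = np.maximum(W @ np.concatenate([h_prev[v], h_nbr]), 0.0)            # Eq. (11.2), sigma = ReLU
        out.append(h_v / (np.linalg.norm(h_v) + 1e-12))                          # L2 normalization
    return np.stack(out)
```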
12 Robust Spatial Filtering With Graph Convolutional Neural Networks
[Such F P, 2017, 12] proposes a network that handles both homogeneous and heterogeneous graph datasets.
A separate adjacency matrix can be defined for each edge feature; stacking them gives $\mathcal{A} = (A_1,\cdots, A_L) \in \mathbb{R}^{N \times N \times L}$.
The convolution filter $H \in \mathbb{R}^{N \times N \times C}$ is approximated as a learned combination of the adjacency slices:
$$
H^{(c)} \approx \sum_{l=1}^{L} h_l^{(c)} A_l.
\tag{12.1}
$$
The convolution is:
$$
V_{\text{out}} = \sum_{c=1}^{C} H^{(c)} V_{\text{in}}^{(c)} + b.
\tag{12.2}
$$
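A minimal sketch of Eqs. (12.1)–(12.2) for a single output channel, assuming dense arrays; the mixing coefficients `h` correspond to $h_l^{(c)}$ and the rest of the names are illustrative:

```python
import numpy as np

def spatial_filter_conv(A, V_in, h, b):
    """A: (L, N, N) stacked adjacency matrices; V_in: (N, C) input vertex features;
    h: (C, L) learnable mixing coefficients; b: scalar or (N,) bias. Returns one output channel (N,)."""
    N, C = V_in.shape
    V_out = np.zeros(N) + b
    for c in range(C):
        H_c = np.tensordot(h[c], A, axes=(0, 0))   # Eq. (12.1): H^(c) = sum_l h_l^(c) A_l
        V_out += H_c @ V_in[:, c]                  # Eq. (12.2): accumulate over input channels
    return V_out
```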
Graph embedding pooling: to obtain $V_{emb} \in \mathbb{R}^{N \times N'}$, a filter $H_{emb} \in \mathbb{R}^{N \times N \times C \times N'}$ is used:
$$
\begin{aligned}
V_{emb}^{(n')} &= \sum_{c=1}^{C} H_{emb}^{(c,n')} V_{in}^{(c)} + b, \\
V_{emb}^{*} &= \sigma (V_{emb}), \\
V_{out} &= V_{emb}^{*T} V_{in}, \\
A_{out} &= V_{emb}^{*T} A_{in} V_{emb}^{*}.
\end{aligned}
\tag{12.3}
$$
13 Large-Scale Learnable Graph Convolutional Networks
[Gao H, 2018, 13] likewise applies conventional CNNs to vertex feature vectors.
Let $F_l$ be the length of the vertex feature vectors at layer $l$, and let each vertex select $k$ neighbors.
The convolution proceeds as follows (see the sketch after this list):
1. Concatenate the feature vectors of a vertex's neighbors $\mathcal{N}$ into a tensor in $\mathbb{R}^{|\mathcal{N}| \times F_l}$;
2. Sort each column of this tensor in descending order and keep the top $k$ entries, giving a tensor in $\mathbb{R}^{k \times F_l}$;
3. Concatenate the vertex's own feature vector, giving $\mathbb{R}^{(k+1) \times F_l}$;
4. $CONV_1 : \mathbb{R}^{(k+1) \times F_l} \rightarrow \mathbb{R}^{(\frac{k}{2}+1) \times k}$;
5. $CONV_2 : \mathbb{R}^{(\frac{k}{2}+1) \times k} \rightarrow \mathbb{R}^{1 \times F_{l+1}}$.
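A minimal sketch of the k-largest node selection that builds the $(k+1) \times F_l$ grid fed to the 1-D convolutions; zero-padding for vertices with fewer than $k$ neighbors is an assumption, and the names are illustrative:

```python
import numpy as np

def lgcn_select_k(x_v, neighbor_feats, k):
    """x_v: (F_l,) the vertex's own features; neighbor_feats: (|N|, F_l) neighbor features.
    Returns a (k + 1, F_l) grid: per-column top-k neighbor values with the vertex on top."""
    F_l = neighbor_feats.shape[1]
    padded = np.zeros((max(k, neighbor_feats.shape[0]), F_l))
    padded[:neighbor_feats.shape[0]] = neighbor_feats          # zero-pad if fewer than k neighbors
    topk = -np.sort(-padded, axis=0)[:k]                       # column-wise descending sort, keep top k
    return np.vstack([x_v[None, :], topk])                     # ready for the 1-D convolutions CONV_1, CONV_2
```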
14 Signed Graph Convolutional Network
[Derr T, 2018, 14] builds a neural network on signed graphs. In unsigned graphs, convolution simply aggregates neighbor information; in a signed graph, positive and negative edges carry different meanings, so paths are split into balanced and unbalanced sets.
The split uses balance theory; intuitively: 1) a friend of my friend is my friend, and 2) a friend of my enemy is my enemy. A path containing an even number of negative edges is balanced; one containing an odd number of negative edges is considered unbalanced (the paper illustrates this with a figure).
After the split, convolution aggregates each part separately. Balanced part:
$$
h_i^{B(l)} =
\begin{cases}
\sigma \left( W^{B(1)} \left[ \sum_{j \in \mathcal{N}_{i}^{+}} \frac{h_j^{(0)}}{|\mathcal{N}_{i}^{+}|}, h_i^{(0)} \right] \right) & l = 1,\\
\sigma \left( W^{B(l)} \left[ \sum_{j \in \mathcal{N}_{i}^{+}} \frac{h_j^{B(l-1)}}{|\mathcal{N}_{i}^{+}|}, \sum_{k \in \mathcal{N}_{i}^{-}} \frac{h_k^{U(l-1)}}{|\mathcal{N}_{i}^{-}|}, h_i^{B(l-1)} \right] \right) & l \neq 1.
\end{cases}
\tag{14.1}
$$
Unbalanced part:
$$
h_i^{U(l)} =
\begin{cases}
\sigma \left( W^{U(1)} \left[ \sum_{j \in \mathcal{N}_{i}^{-}} \frac{h_j^{(0)}}{|\mathcal{N}_{i}^{-}|}, h_i^{(0)} \right] \right) & l = 1,\\
\sigma \left( W^{U(l)} \left[ \sum_{j \in \mathcal{N}_{i}^{+}} \frac{h_j^{U(l-1)}}{|\mathcal{N}_{i}^{+}|}, \sum_{k \in \mathcal{N}_{i}^{-}} \frac{h_k^{B(l-1)}}{|\mathcal{N}_{i}^{-}|}, h_i^{U(l-1)} \right] \right) & l \neq 1.
\end{cases}
\tag{14.2}
$$
Readout:
$$
z_i \leftarrow [h_i^{B(L)}, h_i^{U(L)}], \quad \forall u_i \in \mathcal{U}.
\tag{14.3}
$$
15 GeniePath: Graph Neural Networks with Adaptive Receptive Paths
[Liu Z, 2018, 15] adaptively selects which vertices to aggregate over, both in breadth and in depth.
A GAT-style attention learns the importance of one-hop neighbor (hidden) features and decides which direction to keep exploring:
$$
h_i^{(\text{tmp})} = \tanh \left( {W^{(t)}}^{T} \sum_{j \in \mathcal{N}(i) \cup \{ i \}} \alpha \left( h_i^{(t)}, h_j^{(t)} \right) \cdot h_j^{(t)} \right).
\tag{15.1}
$$
where $\alpha \left( h_i^{(t)}, h_j^{(t)} \right)$ is the attention mechanism:
$$
\begin{aligned}
\alpha (x,y) &= \text{softmax}_y \left( v^T \tanh \left( W_s^T x + W_d^T y \right) \right), \\
\text{softmax}_y \left( f(\cdot, y) \right) &= \frac{\exp f(\cdot,y)}{\sum_{y'} \exp f(\cdot, y')}.
\end{aligned}
\tag{15.2}
$$
Depth is controlled by LSTM-style gating over $h_i^{(\text{tmp})}$:
$$
\begin{aligned}
i_i &= \sigma \left( {W_i^{(t)}}^{T} h_i^{(\text{tmp})} \right), \\
f_i &= \sigma \left( {W_f^{(t)}}^{T} h_i^{(\text{tmp})} \right), \\
o_i &= \sigma \left( {W_o^{(t)}}^{T} h_i^{(\text{tmp})} \right), \\
\widetilde{C} &= \tanh \left( {W_c^{(t)}}^{T} h_i^{(\text{tmp})} \right),\\
C_i^{(t+1)} &= f_i \odot C_i^{(t)} + i_i \odot \widetilde{C}, \\
h_i^{(t+1)} &= o_i \odot \tanh(C_i^{(t+1)}).
\end{aligned}
\tag{15.3}
$$
The paper also gives a variant:
$$
\begin{aligned}
\mu_i^{(0)} &= W_x^T X_i,\\
i_i &= \sigma \left( {W_i^{(t)}}^{T} \text{CONCAT} \left( h_i^{(t)}, \mu_i^{(t)} \right) \right), \\
f_i &= \sigma \left( {W_f^{(t)}}^{T} \text{CONCAT} \left( h_i^{(t)}, \mu_i^{(t)} \right) \right), \\
o_i &= \sigma \left( {W_o^{(t)}}^{T} \text{CONCAT} \left( h_i^{(t)}, \mu_i^{(t)} \right) \right), \\
\widetilde{C} &= \tanh \left( {W_c^{(t)}}^{T} \text{CONCAT} \left( h_i^{(t)}, \mu_i^{(t)} \right) \right),\\
C_i^{(t+1)} &= f_i \odot C_i^{(t)} + i_i \odot \widetilde{C}, \\
\mu_i^{(t+1)} &= o_i \odot \tanh(C_i^{(t+1)}).
\end{aligned}
$$
16 Fast Learning with Graph Convolutional Networks via Importance Sampling
[Chen J, 2018, 16] speeds up training by sampling: only a subset of vertices participates in each training step.
The paper first presents a baseline algorithm that samples vertices uniformly at random.
It then proposes importance sampling, which it proves has lower variance.
The sampling probability is:
$$
q(u) = \frac{\| \hat{A}(:,u) \|^2}{ \sum_{u' \in \mathcal{V}} \| \hat{A}(:,u') \|^2}, \quad u \in \mathcal{V}.
\tag{16.1}
$$
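A minimal sketch of Eq. (16.1) and the resulting layer-wise Monte-Carlo estimate, assuming a dense normalized adjacency `A_hat`; the rescaling follows the importance-sampling idea, and the implementation details are illustrative:

```python
import numpy as np

def fastgcn_sample(A_hat, H, n_samples, rng=None):
    """A_hat: (N, N) normalized adjacency; H: (N, F) lower-layer activations.
    Returns an unbiased estimate of A_hat @ H using n_samples sampled columns."""
    rng = np.random.default_rng() if rng is None else rng
    col_norms = np.linalg.norm(A_hat, axis=0) ** 2
    q = col_norms / col_norms.sum()                      # Eq. (16.1): importance distribution over vertices
    idx = rng.choice(A_hat.shape[1], size=n_samples, p=q)
    # Monte-Carlo estimate: rescale each sampled column by 1 / (n_samples * q(u)).
    return (A_hat[:, idx] / (n_samples * q[idx])) @ H[idx]
```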
17 Stochastic Training of Graph Convolutional Networks with Variance Reduction
[Chen J, 2018, 17] argues that the importance sampling of [Chen J, 2018, 16] can behave poorly in practice, because some vertices may receive many samples while others receive none.
Let $\hat{A} = A + I_N$, $\hat{D}_{vv} = \sum_u \hat{A}_{uv}$, and $P = \hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}}$. $\hat{P}$ is an unbiased estimator of $P$ obtained by sampling $D^{(l)}$ neighbors per vertex and rescaling, $\hat{P}_{uv}^{(l)} = \frac{|\mathcal{N}(u)|}{D^{(l)}} P_{uv}$. The paper decomposes $h_v^{(l)} = \Delta h_v^{(l)} + \bar{h}_v^{(l)}$: because computing $h_v^{(l)}$ recursively in the original convolution is too expensive, the activation computed at each layer is stored as a historical approximation $\bar{h}_v^{(l)}$. Then
$$
\begin{aligned}
(PH^{(l)})_u &= \sum_{v \in \mathcal{N}(u)} P_{uv} \Delta h_v^{(l)} + \sum_{v \in \mathcal{N}(u)} P_{uv} \bar{h}_v^{(l)} \\
&\approx \frac{|\mathcal{N}(u)|}{D^{(l)}} \sum_{v \in \hat{\mathcal{N}}(u)} P_{uv} \Delta h_v^{(l)} + \sum_{v \in \mathcal{N}(u)} P_{uv} \bar{h}_v^{(l)}.
\end{aligned}
$$
That is, only the $\Delta h_v^{(l)}$ term is estimated from randomly sampled neighbors; the historical term is summed exactly.
The convolution becomes:
$$
Z^{(l+1)} = \left( \hat{P}^{(l)} \left( H^{(l)} - \bar{H}^{(l)} \right) + P \bar{H}^{(l)} \right) W^{(l)}.
\tag{17.1}
$$
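A minimal sketch of the control-variate propagation in Eq. (17.1), where `P_hat` is a sampled, rescaled estimate of `P` and `H_bar` stores historical activations; all names are illustrative:

```python
import numpy as np

def cv_propagation(P, P_hat, H, H_bar, W):
    """P: (N, N) exact propagation matrix; P_hat: (N, N) sampled, rescaled estimate of P;
    H: (N, d) current activations; H_bar: (N, d) stored historical activations; W: (d, d_out).
    Only the small difference H - H_bar goes through the sampled matrix, which reduces variance."""
    Z = (P_hat @ (H - H_bar) + P @ H_bar) @ W      # Eq. (17.1)
    H_bar_new = H.copy()                           # refresh the history with the freshly computed activations
    return Z, H_bar_new
```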
Stochastic GCN training loop:
1. Randomly select a mini-batch of vertices $\mathcal{V}_B \subset \mathcal{V}_L$;
2. Build a computation graph containing only the $h_v^{(l)}$ and $\bar{h}_v^{(l)}$ needed for the current mini-batch;
3. Forward-propagate through Eq. (17.1) to obtain predictions;
4. Back-propagate to obtain gradients and update the parameters with SGD;
5. Update the historical values $\bar{h}_v^{(l)}$.
18 Adaptive Sampling Towards Fast Graph Representation Learning
[Huang W, 2018, 18] likewise accelerates training by sampling, using an adaptive layer-wise sampler.
The generic convolution is:
$$
h^{(l+1)}(v_i) = \sigma \left( \sum_{j=1}^{N} \hat{a}(v_i,u_j) h^{(l)}(u_j) W^{(l)} \right), \quad i = 1, \cdots, N.
\tag{18.1}
$$
Defining $N(v_i) := \sum_{j=1}^{N} \hat{a}(v_i,u_j)$ and $p(u_j \mid v_i) := \frac{\hat{a}(v_i,u_j)}{N(v_i)}$, Eq. (18.1) can be rewritten as an expectation:
$$
\begin{aligned}
h^{(l+1)}(v_i) &= \sigma \left( N(v_i) \left( \sum_{j=1}^{N} \frac{\hat{a}(v_i,u_j)}{ N(v_i) } h^{(l)}(u_j) \right) W^{(l)} \right) \\
&= \sigma \left( N(v_i) \left( \sum_{j=1}^{N} p(u_j \mid v_i) h^{(l)}(u_j) \right) W^{(l)} \right) \\
&= \sigma \left( N(v_i) \, \mathbb{E}_{p(u_j \mid v_i)} \left[ h^{(l)}(u_j) \right] W^{(l)} \right), \quad i = 1, \cdots, N.
\end{aligned}
\tag{18.2}
$$
Using Monte-Carlo sampling for $\mu_p(v_i) = \mathbb{E}_{p(u_j \mid v_i)} \left[ h^{(l)}(u_j) \right]$:
$$
\hat{\mu}_p(v_i) = \frac{1}{n} \sum_{j=1}^{n} h^{(l)} (\hat{u}_j), \quad \hat{u}_j \sim p(u_j \mid v_i).
\tag{18.3}
$$
Layer-Wise Sampling:
Similarly, with a proposal distribution $q(u_j \mid v_1,\cdots,v_n)$ shared by all vertices of the layer:
$$
h^{(l+1)}(v_i) = \sigma \left( N(v_i) \, \mathbb{E}_{q(u_j \mid v_1,\cdots,v_n)} \left[ \frac{p(u_j \mid v_i)}{q(u_j \mid v_1,\cdots,v_n)} h^{(l)}(u_j) \right] W^{(l)} \right), \quad i = 1, \cdots, N.
\tag{18.4}
$$
Again by Monte-Carlo sampling, with $\mu_q(v_i) = \mathbb{E}_{q(u_j \mid v_1,\cdots,v_n)} \left[ \frac{p(u_j \mid v_i)}{q(u_j \mid v_1,\cdots,v_n)} h^{(l)}(u_j) \right]$:
$$
\hat{\mu}_q(v_i) = \frac{1}{n} \sum_{j=1}^{n} \frac{p(\hat{u}_j \mid v_i)}{q(\hat{u}_j \mid v_1,\cdots,v_n)} h^{(l)} (\hat{u}_j), \quad \hat{u}_j \sim q(u_j \mid v_1,\cdots,v_n).
\tag{18.5}
$$
The paper derives the variance-optimal choice of $q(u_j)$:
$$
q^{*}(u_j) = \frac{ \sum_{i=1}^{n} p(u_j \mid v_i) \left| g(x(u_j)) \right| }{ \sum_{j=1}^{N} \sum_{i=1}^{n} p(u_j \mid v_i) \left| g(x(u_j)) \right| }.
\tag{18.6}
$$
where $g(\cdot)$ is a linear function.
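A minimal sketch of the layer-wise importance-sampling estimate in Eqs. (18.4)–(18.5), assuming dense arrays; the self-normalized proposal in the middle mirrors Eq. (18.6) with $|g(x(u_j))|$ supplied by the caller, and all names are illustrative:

```python
import numpy as np

def layerwise_sample_estimate(a_hat, H, g_abs, n, rng=None):
    """a_hat: (n_parent, N) rows a_hat(v_i, .); H: (N, F) lower-layer activations;
    g_abs: (N,) values |g(x(u_j))|; n: number of sampled nodes.
    Returns (n_parent, F) Monte-Carlo estimates of N(v_i) * E_p[h(u_j)] for each parent v_i."""
    rng = np.random.default_rng() if rng is None else rng
    N_v = a_hat.sum(axis=1, keepdims=True)                 # N(v_i)
    p = a_hat / np.maximum(N_v, 1e-12)                     # p(u_j | v_i), Eq. (18.2)
    q = (p * g_abs).sum(axis=0)                            # unnormalized q*, Eq. (18.6)
    q = q / q.sum()
    idx = rng.choice(p.shape[1], size=n, p=q)
    weights = p[:, idx] / q[idx]                           # importance weights p / q, Eq. (18.5)
    return N_v * (weights @ H[idx]) / n                    # Eq. (18.4) estimate, scaled by N(v_i)
```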
References
1 Duvenaud D, Maclaurin D, Aguileraiparraguirre J, et al. Convolutional networks on graphs for learning molecular fingerprints[C]. neural information processing systems, 2015: 2224-2232.
2 Pham T, Tran T, Phung D, et al. Column Networks for Collective Classification[C]. national conference on artificial intelligence, 2017: 2485-2491.
3 Niepert M, Ahmed M H, Kutzkov K, et al. Learning Convolutional Neural Networks for Graphs[J]. arXiv: Learning, 2016.
4 Atwood J, Towsley D. Diffusion-Convolutional Neural Networks[C]. neural information processing systems, 2016: 1993-2001.
5 Schutt K T, Arbabzadah F, Chmiela S, et al. Quantum-chemical insights from deep tensor neural networks[J]. Nature Communications, 2017, 8(1): 13890-13890.
6 Battaglia P W, Pascanu R, Lai M, et al. Interaction networks for learning about objects, relations and physics[C]. neural information processing systems, 2016: 4509-4517.
7 F. Monti, D. Boscaini, J. Masci, E. Rodola, J. Svoboda, and M. M.Bronstein, “Geometric deep learning on graphs and manifolds using mixture model cnns,” in Proc. of CVPR, 2017, pp. 5115–5124.
8 Kearnes S, Mccloskey K, Berndl M, et al. Molecular graph convolutions: moving beyond fingerprints[J]. Journal of Computer-aided Molecular Design, 2016, 30(8): 595-608.
9 Simonovsky M, Komodakis N. Dynamic Edge-Conditioned Filters in Convolutional Neural Networks on Graphs[C]. computer vision and pattern recognition, 2017: 29-38.
10 Gilmer J, Schoenholz S S, Riley P, et al. Neural Message Passing for Quantum Chemistry[C]. international conference on machine learning, 2017: 1263-1272.
11 Hamilton W L, Ying Z, Leskovec J, et al. Inductive Representation Learning on Large Graphs[C]. neural information processing systems, 2017: 1024-1034.
12 Such F P, Sah S, Dominguez M, et al. Robust Spatial Filtering With Graph Convolutional Neural Networks[J]. IEEE Journal of Selected Topics in Signal Processing, 2017, 11(6): 884-896.
13 Gao H, Wang Z, Ji S, et al. Large-Scale Learnable Graph Convolutional Networks[C]. knowledge discovery and data mining, 2018: 1416-1424.
14 Derr T, Ma Y, Tang J, et al. Signed Graph Convolutional Network[J]. arXiv: Social and Information Networks, 2018.
15 Liu Z, Chen C, Li L, et al. GeniePath: Graph Neural Networks with Adaptive Receptive Paths[J]. arXiv: Learning, 2018.
16 Chen J, Ma T, Xiao C, et al. FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling[C]. international conference on learning representations, 2018.
17 Chen J, Zhu J, Song L, et al. Stochastic Training of Graph Convolutional Networks with Variance Reduction[C]. international conference on machine learning, 2018: 941-949.
18 Huang W, Zhang T, Rong Y, et al. Adaptive Sampling Towards Fast Graph Representation Learning[C]. neural information processing systems, 2018: 4558-4567.