Paper Reading 2017/04/18 Yuan Xin
Papers: Unitary Evolution Recurrent Neural Networks (ICML 16, Bengio et al.); Learning Convolutional Neural Networks for Graphs (ICML 16, NEC Labs)
Unitary Evolution Recurrent Neural Networks
Recurrent neural networks (RNNs) are notoriously difficult to train: when the eigenvalues of the recurrent weight matrix deviate from absolute value 1, signals grow or decay exponentially with the sequence length, so gradients vanish or explode, especially over long-term dependencies. The paper learns a unitary recurrent weight matrix. Because directly parametrizing unitary matrices requires expensive computations, it instead constructs an expressive unitary weight matrix by composing several simple structured matrices.
Orthogonal Weights and Bounding the Long-Term Gradient
By the chain rule applied to the RNN recursion h_{k+1} = σ(W h_k + V x_{k+1}),
∂C/∂h_t = ∂C/∂h_T · ∏_{k=t}^{T-1} ∂h_{k+1}/∂h_k = ∂C/∂h_T · ∏_{k=t}^{T-1} D_{k+1} W,
where D_{k+1} = diag(σ'(W h_k + V x_{k+1})) is the Jacobian of the pointwise nonlinearity. Since the norm of a matrix product is at most the product of the norms,
||∂C/∂h_t|| ≤ ||∂C/∂h_T|| · ∏_{k=t}^{T-1} ||D_{k+1}|| ||W||.
Orthogonal Weights and Bounding the Long-Term Gradient
If W is orthogonal (or unitary), ||W|| = 1, so the bound above reduces to ||∂C/∂h_t|| ≤ ||∂C/∂h_T|| · ∏_k ||D_k||. Here ||D_k|| is the largest absolute value of the activation's derivative at step k; whenever ||D_k|| is not 1, gradients can still vanish or explode. This makes the rectified linear unit (ReLU) an attractive choice: its derivative is either 0 or 1, so as long as not all units are zeroed out, ||D_k|| = 1 and the product stays bounded.
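A minimal NumPy sketch (my own illustration, not from the paper) of why this bound matters: with a random orthogonal W and all ReLU units active (the worst case ||D_k|| = 1), the backpropagated gradient norm is preserved exactly, whereas a matrix with spectral norm slightly above 1 explodes. With actual ReLU masks, some derivatives are 0, which can only shrink the product further.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 64, 200

# Random orthogonal matrix via QR decomposition; its spectral norm is exactly 1.
W_orth, _ = np.linalg.qr(rng.standard_normal((n, n)))
# A non-orthogonal matrix whose spectral norm is slightly above 1.
W_gen = 1.05 * W_orth

def backprop_norm(W, T):
    """Relative gradient norm after backpropagating T steps, assuming all
    ReLU units are active (D_k = I, the worst case ||D_k|| = 1)."""
    g = rng.standard_normal(n)
    g0 = np.linalg.norm(g)
    for _ in range(T):
        g = W.T @ g          # one backprop step through h_{k+1} = relu(W h_k + ...)
    return np.linalg.norm(g) / g0

print("orthogonal W:", backprop_norm(W_orth, T))  # ~1.0, norm preserved
print("scaled W    :", backprop_norm(W_gen,  T))  # ~1.05**T, explodes
```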
Unitary Evolution RNNs
A gradient step will typically yield a matrix that is no longer unitary, and projecting back onto the unitary matrices costs O(n^3) computation, so directly parametrizing a full unitary matrix in a gradient-based method is impractical. What we want to exploit is the property that every eigenvalue of an orthogonal or unitary matrix has absolute value 1.
Lemma: A complex square matrix W is unitary if and only if it has an eigendecomposition of the form W = VDV*, with V unitary and D diagonal with |D(j,j)| = 1; W is a real orthogonal matrix if and only if, in addition, every eigenvalue D(j,j) = λ_j with eigenvector v_j is paired with its complex conjugate λ̄_j with eigenvector v̄_j.
Parametrizing W through V and D requires O(n^2) memory; setting V to the identity reduces the memory, but then W remains diagonal and is not expressive.
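A tiny NumPy check (my own illustration, not from the paper) of the property being exploited: a random unitary matrix has all of its eigenvalues on the unit circle.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8

# Build a random unitary matrix via QR of a complex Gaussian matrix.
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Q, _ = np.linalg.qr(A)

# Unitary  <=>  eigendecomposition W = V D V* with |D_jj| = 1.
eigvals = np.linalg.eigvals(Q)
print(np.abs(eigvals))                          # all ~1.0
print(np.allclose(Q.conj().T @ Q, np.eye(n)))   # True: Q* Q = I
```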
Unitary Evolution RNNs
Alternative strategy to parameterize unitary matrices: compose several simple, parametric, unitary matrices to construct a single, expressive unitary matrix:
W = D_3 R_2 F^{-1} D_2 Π R_1 F D_1,
where the D_j are diagonal matrices with unit-modulus entries, the R_j are Householder reflections, Π is a fixed permutation, and F, F^{-1} are the discrete Fourier transform and its inverse. D, R and Π each take O(n) time and O(n) memory; F and F^{-1} take O(n log n) time and carry no parameters. The intermediate quantities are complex, but we represent them with real numbers; as long as the final cost is real-valued and differentiable with respect to the parameters, the composition can be learned by gradient descent. uRNN: the recurrent matrix W in h_t = σ(W h_{t-1} + V x_t) is parameterized in exactly this way.
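A rough NumPy sketch (parameter names and the example usage are mine) of applying the composed operator W = D3 R2 F^{-1} D2 Π R1 F D1 to a complex hidden state; every factor costs O(n) or O(n log n), and the norm check confirms the composition is unitary.

```python
import numpy as np

def diag_step(theta, h):
    """Diagonal unitary: multiply elementwise by exp(i * theta_j)."""
    return np.exp(1j * theta) * h

def reflection_step(v, h):
    """Householder reflection h - 2 v (v* h) / ||v||^2, unitary for any complex v."""
    return h - 2.0 * v * (np.vdot(v, h) / np.vdot(v, v))

def urnn_W(h, theta1, theta2, theta3, v1, v2, perm):
    """Apply W = D3 R2 F^{-1} D2 Pi R1 F D1 to the complex hidden state h."""
    h = diag_step(theta1, h)
    h = np.fft.fft(h, norm="ortho")       # F (unitary with norm="ortho")
    h = reflection_step(v1, h)
    h = h[perm]                            # Pi, a fixed permutation
    h = diag_step(theta2, h)
    h = np.fft.ifft(h, norm="ortho")       # F^{-1}
    h = reflection_step(v2, h)
    return diag_step(theta3, h)

n = 16
rng = np.random.default_rng(1)
h = rng.standard_normal(n) + 1j * rng.standard_normal(n)
params = dict(theta1=rng.uniform(-np.pi, np.pi, n),
              theta2=rng.uniform(-np.pi, np.pi, n),
              theta3=rng.uniform(-np.pi, np.pi, n),
              v1=rng.standard_normal(n) + 1j * rng.standard_normal(n),
              v2=rng.standard_normal(n) + 1j * rng.standard_normal(n),
              perm=rng.permutation(n))
out = urnn_W(h, **params)
print(np.linalg.norm(h), np.linalg.norm(out))   # equal norms: W is unitary
```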
Complex hidden units
In the implementation, complex numbers are actually represented with real values, written in terms of their real and imaginary parts. This allows us to implement everything using real-valued operations, compatible with any deep learning framework with automatic differentiation, such as Theano.
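A brief sketch showing how a complex matrix-vector product decomposes into real-valued operations, which is how the complex hidden state can be stored and updated as real arrays (function and variable names are mine):

```python
import numpy as np

def complex_matvec_real(W_re, W_im, h_re, h_im):
    """(W_re + i W_im)(h_re + i h_im) computed with real arithmetic only."""
    out_re = W_re @ h_re - W_im @ h_im
    out_im = W_re @ h_im + W_im @ h_re
    return out_re, out_im

rng = np.random.default_rng(2)
n = 4
W = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
h = rng.standard_normal(n) + 1j * rng.standard_normal(n)
re, im = complex_matvec_real(W.real, W.imag, h.real, h.imag)
print(np.allclose(re + 1j * im, W @ h))   # True
```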
Input to Hidden, Nonlinearity, Hidden to Output
Input to hidden: the input is mapped into the complex hidden space and fed into the same hidden-to-hidden mapping as in the equation above.
Nonlinearity: a variation of the ReLU named modReLU, which acts on the modulus of each complex unit while preserving its phase. Applying an ordinary ReLU separately to the real and imaginary parts works particularly poorly, presumably because it destroys the original phase information.
Hidden to output: the output matrix U is real-valued, giving a real output that can then be fed into a softmax or whatever loss follows.
Because of the orthogonality constraint enforced by the unitary matrix, training is very stable and not very sensitive to initialization, although the paper still specifies an initialization scheme.
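A short NumPy sketch of modReLU as defined in the paper (the per-unit bias b is a learned parameter; the values here are placeholders):

```python
import numpy as np

def modrelu(z, b, eps=1e-8):
    """modReLU(z) = (|z| + b) * z/|z| if |z| + b >= 0 else 0.
    Rescales the modulus of each complex unit, leaving its phase untouched."""
    mag = np.abs(z)
    scale = np.maximum(mag + b, 0.0) / (mag + eps)
    return scale * z

rng = np.random.default_rng(3)
z = rng.standard_normal(5) + 1j * rng.standard_normal(5)
b = rng.uniform(-1.0, 1.0, 5)          # one learned bias per hidden unit
out = modrelu(z, b)
print(np.angle(z))    # phases of surviving units are unchanged...
print(np.angle(out))  # ...zeroed units report angle 0 by convention
```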
Experiments See Paper
Learning Convolutional Neural Networks for Graphs
Motivation:
- Given a collection of graphs, learn a function that can be used for classification and regression problems on unseen graphs (e.g. chemical compounds).
- Given a large graph, learn graph representations that can be used to infer unseen graph properties such as node types and missing edges (e.g. social networks, unlabeled graphs).
This ICML 2016 paper offers a new way of applying CNNs to graphs. Overall, the input graph is represented by w fixed-size subgraphs of k+1 nodes each; after normalizing these w subgraphs, a w*(k+1)-dimensional representation is produced and fed into a conventional CNN for learning. In effect, this is a preprocessing step that maps a graph to a vector. The paper first draws an analogy between graphs and images, where the neighborhood subgraph plays the role of the receptive field / convolution kernel.
Learning Convolutional Neural Networks for Graphs
An image can be represented as a square grid graph whose nodes represent pixels. A CNN can then be seen as traversing a node sequence (nodes 1-4 in Figure 1(a)) and generating fixed-size neighborhood graphs (the 3x3 grids in Figure 1(b)) for each of the nodes. The neighborhood graphs serve as the receptive fields to read feature values from the pixel nodes.
PATCHY-SAN
The PATCHY-SAN architecture has several advantages over existing approaches: First, it is highly efficient, naively parallelizable, and applicable to large graphs. Second, for a number of applications, ranging from computational biology to social network analysis, it is important to visualize learned network motifs (Milo et al., 2002).
The pipeline: (1) select a fixed-length sequence of nodes from the graph; (2) assemble a fixed-size neighborhood for each node in the selected sequence; (3) normalize the extracted neighborhood graph; (4) learn neighborhood representations with convolutional neural networks from the resulting sequence of patches.
In short: select a node sequence from the graph, build a neighborhood subgraph for each node in the sequence and normalize it to obtain receptive fields, then apply ordinary convolution. Easy to parallelize and applicable at large scale. Input: an arbitrary graph. Output: w receptive fields per channel, on which the usual convolution can then be performed. The following slides go through each of these operations in detail.
Node Sequence Selection
Step 1: graph labeling (label the nodes of the graph, e.g. by node degree; this kind of partitioning is also called color refinement or vertex classification). The paper uses the Weisfeiler-Lehman algorithm for the partition, which yields a rank value for every node, so that different graphs get a canonical, comparable ordering.
Step 2: sort the labeled nodes and take the top w nodes as the node sequence to process (this maps graphs of different sizes to the same size). If there are fewer than w nodes, add all-zero receptive fields to the output, which amounts to padding.
Step 3: traverse these w nodes with stride s. The paper uses s = 1 (if s > 1, pad as in Step 2 so that the output still has w receptive fields).
Given labeling: the Weisfeiler-Lehman algorithm is used for graph labeling. A sketch of this step follows below.
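A rough Python sketch of node sequence selection, assuming a plain adjacency-dict graph and using node degree as a simple stand-in for the Weisfeiler-Lehman labeling (the helper name is mine, not from the paper):

```python
def node_sequence(graph, w):
    """Select the top-w nodes of `graph` (dict: node -> set of neighbors),
    ranked by a labeling; here we rank by degree as a simple stand-in
    for the Weisfeiler-Lehman labeling used in the paper."""
    ranked = sorted(graph, key=lambda v: (-len(graph[v]), v))
    seq = ranked[:w]
    # If the graph has fewer than w nodes, pad with None; each None later
    # becomes an all-zero receptive field.
    seq += [None] * (w - len(seq))
    return seq

# Example usage on a tiny graph.
G = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
print(node_sequence(G, w=6))   # [1, 0, 2, 3, None, None]
```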
Neighborhood Assembly
For each visited node v (called the root), use breadth-first search to collect k nodes from its 1-neighborhood; if there are fewer than k, continue with the 1-neighborhood of the 1-neighborhood (one more hop), level by level, until k nodes have been collected or all reachable neighbors have been exhausted. The root and its k collected neighbors form the neighborhood graph.
Breadth-first search: first the 1-neighborhood, then the neighbors' neighborhoods, ... until k nodes are found or the search ends.
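A minimal sketch of the breadth-first neighborhood assembly, using the same adjacency-dict convention as above (helper name is mine):

```python
from collections import deque

def assemble_neighborhood(graph, root, k):
    """Collect up to k nodes around `root` by breadth-first search,
    expanding one hop at a time until root + k nodes are found or the
    component is exhausted. Returns visited nodes in BFS order, root first."""
    visited, queue, seen = [root], deque([root]), {root}
    while queue and len(visited) < k + 1:     # root + k neighbors
        u = queue.popleft()
        for nbr in graph[u]:
            if nbr not in seen:
                seen.add(nbr)
                visited.append(nbr)
                queue.append(nbr)
    return visited

G = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1, 4}, 4: {3}}
print(assemble_neighborhood(G, root=0, k=3))   # e.g. [0, 1, 2, 3]
```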
Graph Normalization
But optimal graph normalization is NP-hard! So we only compare candidate labelings rather than solve Problem 1 exactly.
Explanation of Algorithm 1: finding a labeling is really about finding a good structural description, i.e. similar structures should remain similar after labeling.
Step 5: Step 4 produced w (with s = 1) neighborhood graphs. These w graphs now need to be labeled; each node's rank is computed from its distance to the root node v, and according to Algorithm 4 the closer to v, the higher the rank. If a neighborhood graph has fewer than k nodes, pad it with dummy zero nodes.
Normalization: Step 5 yields labeled graphs; to make the labeling injective, so that each node's label is unique, the NAUTY canonical-labeling algorithm is used. A small sketch of this step is given below.
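A simplified sketch of the normalization step: rank the assembled nodes by BFS distance to the root and pad with dummy nodes. The NAUTY-based canonical tie-breaking used in the paper is omitted; ties are broken by node id here, and the helper name is mine.

```python
from collections import deque

def normalize_neighborhood(graph, nodes, root, k):
    """Order the assembled `nodes` by BFS distance from `root` (closer =
    higher rank), keep the first k + 1, and pad with a dummy node (None)
    if the neighborhood is too small."""
    # BFS distances from the root, restricted to the assembled node set.
    dist, queue = {root: 0}, deque([root])
    while queue:
        u = queue.popleft()
        for nbr in graph[u]:
            if nbr in nodes and nbr not in dist:
                dist[nbr] = dist[u] + 1
                queue.append(nbr)
    ordered = sorted(nodes, key=lambda v: (dist.get(v, float("inf")), v))
    ordered = ordered[:k + 1]
    ordered += [None] * (k + 1 - len(ordered))   # dummy zero nodes as padding
    return ordered

G = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1, 4}, 4: {3}}
print(normalize_neighborhood(G, nodes=[0, 1, 2, 3], root=0, k=4))
# [0, 1, 2, 3, None]
```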
Convolutional Architecture
For each input graph G, after the algorithm above we have a blob of shape (batch, height, width, channel): vertices as (w, k, 1, a_v) and edges as (w, k, k, a_e), where a_v and a_e are the numbers of vertex and edge attributes. How do we plug this into a CNN framework? (Here I describe it as a blob rather than using the tensor formulation in the paper.) The vertex blob can be flattened into a sequence of length w*k with a_v channels and convolved with a 1D convolution of kernel size k and stride k, so that each filter application sees exactly one normalized receptive field; the edge blob is handled analogously with kernel size k*k.
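A small NumPy sketch (dimensions and filter weights are placeholders) of how the vertex blob can be flattened and convolved so that each filter application covers exactly one normalized receptive field:

```python
import numpy as np

rng = np.random.default_rng(4)
w, k, a_v, n_filters = 8, 5, 3, 16

# Vertex blob produced by PATCHY-SAN: w receptive fields, each with k nodes
# carrying a_v attributes (batch dimension omitted).
vertices = rng.standard_normal((w, k, 1, a_v))

# Flatten to a length w*k sequence with a_v channels.
seq = vertices.reshape(w * k, a_v)

# 1D convolution with kernel size k and stride k: each filter application
# sees exactly one normalized receptive field.
filters = rng.standard_normal((n_filters, k * a_v))
out = np.stack([filters @ seq[i * k:(i + 1) * k].ravel() for i in range(w)])
print(out.shape)   # (w, n_filters): one feature vector per receptive field
```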
Experiments See Paper
Thank You