CVPR 2018 Paper Notes (3): PPFNet, Part 3
This week I focused on Section 4 of the paper, which is a full walkthrough of PPFNet.
4.PPFNet
(1)Overview
The overview lays out what the rest of the section covers: first, how the input is prepared; second, the architecture of PPFNet; third, the training method, built around a new loss function that solves the combinatorial correspondence problem in a globally aware way.
【Loss function】A loss function measures how far a model's prediction f(x) deviates from the ground truth Y. It is a non-negative real-valued function, usually written L(Y, f(x)); the smaller the loss, the more robust the model. (Definition adapted from https://blog.****.net/hk121/article/details/71465469)
(2)Encoding of Local Geometry
1. Given a reference point xr ∈ X on the point cloud, define a local region Ω ⊂ X, collect a set of points {mi} ∈ Ω in this neighborhood, and compute the normals of the point set;
2. Align the patch with the canonical axes of the associated local reference frame [41]; the oriented set {xr ∪ {xi}} represents a local geometry, which we call a local patch;
3. Pair each neighboring point i with the reference point r and compute the PPFs;
4. Note that, in terms of complexity, this is no different from using the points themselves: because the central reference point xr is fixed, the quadratic pairing is avoided.
As shown in Fig. 3, the final description of the local geometry, and the input to PPFNet, is the combined set of points, normals, and PPFs:
【Local reference frame】In theoretical physics, a local reference frame (local frame) refers to a coordinate system or frame of reference that is only expected to function over a small region or a restricted region of space or spacetime.(From Wikipedia)
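Step 3 above pairs each neighbor with the reference point and computes a four-dimensional point pair feature from the two positions and normals. A minimal numpy sketch (function names are my own, not from the paper's code):

```python
import numpy as np

def angle(v1, v2):
    """Angle in [0, pi] between two vectors."""
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def ppf(x_r, n_r, x_i, n_i):
    """4D point pair feature between the reference (x_r, n_r) and a
    neighbor (x_i, n_i): with d = x_i - x_r, the feature is
    (||d||, angle(n_r, d), angle(n_i, d), angle(n_r, n_i))."""
    d = x_i - x_r
    return np.array([np.linalg.norm(d),
                     angle(n_r, d),
                     angle(n_i, d),
                     angle(n_r, n_i)])
```

Because the feature is built from distances and angles only, it is invariant to rigid motions of the patch, which is what makes it useful for describing local geometry.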
(3)Network architecture
The overall architecture of PPFNet is shown in Fig. 2. The input consists of N local patches uniformly sampled from a fragment. Thanks to the sparsity of the point-wise data representation and PointNet's efficient use of the GPU, PPFNet can absorb N patches simultaneously. PPFNet's first module is a set of mini-PointNets that extract features from the local patches. All PointNets share weights and gradients during training. A max-pooling layer then aggregates all local features into a single global feature, summarizing the distinct local information into the global context of the whole fragment. This global feature is then concatenated to every local feature. A group of MLPs further fuses the global and local features into the final global-context-aware local descriptors.
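The data flow just described (shared mini-PointNets → per-patch max-pool → fragment-wide max-pool → concatenation → fusing MLPs) can be sketched in a few lines of numpy; the layer sizes below are toy values of my own choosing, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, weights):
    """Pointwise MLP with ReLU activations; x has shape (..., d_in)."""
    for W in weights:
        x = np.maximum(x @ W, 0.0)
    return x

def ppfnet_forward(patches, w_point, w_fuse):
    """patches: (N, K, d) -- N local patches of K points each.
    Returns (N, d_out) global-context-aware local descriptors."""
    # Shared mini-PointNet: pointwise MLP, then max-pool within each patch
    local = mlp(patches, w_point).max(axis=1)        # (N, d_mid)
    # Max-pool over all patches -> one global feature for the fragment
    global_feat = local.max(axis=0, keepdims=True)   # (1, d_mid)
    # Concatenate the global feature to every local feature, then fuse
    fused = np.concatenate(
        [local, np.repeat(global_feat, local.shape[0], axis=0)], axis=1)
    return mlp(fused, w_fuse)                        # (N, d_out)

# Toy shapes: 8 patches, 16 points each, 10-D input per point
patches = rng.normal(size=(8, 16, 10))
w_point = [rng.normal(size=(10, 32)), rng.normal(size=(32, 64))]
w_fuse = [rng.normal(size=(128, 64)), rng.normal(size=(64, 32))]
desc = ppfnet_forward(patches, w_point, w_fuse)
print(desc.shape)  # (8, 32)
```

The key design choice is that max-pooling is permutation-invariant, so neither the ordering of points within a patch nor the ordering of patches changes the descriptors.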
(4)N-tuple loss
Our goal is to use PPFNet to extract features for local patches, a process of mapping from a high-dimensional non-linear data space into a low-dimensional linear feature space. The distinctiveness of the resulting features is closely related to their separability in the embedded space. Ideally, the proximity of neighboring patches in the data space should be preserved in the feature space, as illustrated in Fig. 4:
Figure 4. Illustration of N-tuple sampling in feature space. Green lines link similar pairs, which are coerced to stay close. Red lines connect non-similar pairs, which are pushed further apart. Without the N-tuple loss, some non-similar patches remain close in the feature space while some similar patches remain distant. Our novel N-tuple method pairs each patch with all the others, guaranteeing that all similar patches stay close and non-similar ones stay distant.
To this end, the state of the art adopts two loss functions: contrastive [48] and triplet [23], which consider pairs and triplets respectively. Yet a fragment consists of more than 3 patches, and in that case the widely followed practice is to train networks by randomly retrieving 2-/3-tuples of patches from the dataset. Networks trained in this manner only learn to differentiate at most 3 patches, preventing them from uncovering the true matching, which is combinatorial in the patch count.
Generalizing these losses to N patches, we propose the N-tuple loss, an N-to-N contrastive loss, which correctly learns to solve this combinatorial problem by catering for the many-to-many relations depicted in Fig. 4. Given the ground-truth transformation T, the N-tuple loss operates by constructing a correspondence matrix M ∈ RN×N on the points of the aligned fragments, M = (mij), where:
Here 1 is an indicator function. Likewise, we compute a feature-space distance matrix D ∈ RN×N, D = (dij), where:
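Since the equations themselves are not reproduced in this note, here is how I understand M and D: mij indicates whether point xi and the transformed point T(yj) land within some threshold τ of each other, and dij is the Euclidean distance between the learned features. A sketch (τ and the exact metric are my assumptions):

```python
import numpy as np

def correspondence_matrix(x, y, T, tau=0.1):
    """M in {0,1}^{N x N}: m_ij = 1 iff x_i and the rigidly transformed
    T(y_j) are within tau of each other (my reading of the indicator)."""
    y_t = y @ T[:3, :3].T + T[:3, 3]   # apply 4x4 rigid transform to y
    dist = np.linalg.norm(x[:, None, :] - y_t[None, :, :], axis=-1)
    return (dist < tau).astype(np.float64)

def feature_distance_matrix(f_x, f_y):
    """D: d_ij = ||f(x_i) - f(y_j)||_2 in the embedded feature space."""
    return np.linalg.norm(f_x[:, None, :] - f_y[None, :, :], axis=-1)
```

With a correct ground-truth pose, M marks exactly the patch pairs that should end up with small entries in D.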
The N-tuple loss then operates on the two distance matrices to solve the correspondence problem. For simplicity of expression, we define an operation ∑*(·) that sums up all the elements of a matrix. The N-tuple loss can then be written as:
Here ◦ stands for the Hadamard product (element-wise multiplication), α is a hyper-parameter balancing the weight between matching and non-matching pairs, and θ is the lower bound on the expected distance between non-correspondent pairs. We train PPFNet via the N-tuple loss, as shown in Fig. 5, by drawing random pairs of fragments instead of patches. This also eases the preparation of training data.
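Combining M and D, the pull/push structure of the N-tuple loss can be sketched as below; the paper's exact normalization terms are not reproduced in this note, so treat the per-count averaging here as an illustrative choice of mine:

```python
import numpy as np

def n_tuple_loss(M, D, alpha=1.0, theta=1.0):
    """Sketch of an N-to-N contrastive loss: pull corresponding pairs
    (M=1) together by penalizing their feature distance, and push
    non-corresponding pairs (M=0) apart until their distance exceeds the
    margin theta. .sum() plays the role of the sum-all-elements operator."""
    n_match = M.sum()                      # number of corresponding pairs
    n_nonmatch = M.size - n_match          # number of non-corresponding pairs
    pull = (M * D).sum() / max(n_match, 1.0)
    push = ((1.0 - M) * np.maximum(theta - D, 0.0)).sum() / max(n_nonmatch, 1.0)
    return pull + alpha * push
```

Note how the Hadamard products select the two populations of pairs at once, which is what lets the loss handle all N×N relations in a single expression instead of sampled pairs or triplets.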
Figure 5. Overall training pipeline of PPFNet. Local patches are sampled from a pair of fragments and fed into PPFNet to obtain local features. From these features, a feature distance matrix is computed for all patch pairs. Meanwhile, a distance matrix of the local patches is formed based on the ground-truth rigid pose between the fragments. By binarizing this distance matrix, we obtain a correspondence matrix that indicates all matching and non-matching relationships between patches. The N-tuple loss is then calculated by coupling the feature distance matrix with the correspondence matrix, guiding PPFNet toward an optimal feature space.
A large part of this section is devoted to the N-tuple loss, and I still need to spend more time to fully understand how it works. This week, everything got on track.