MVSNet

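Abstract

Overall MVSNet pipeline

First extract deep image features with a network.
Build a 3D cost volume on the reference camera frustum via homography warping.
Regress an initial depth map with 3D convolutions, then refine it with the reference image to produce the final result.
(Works with an arbitrary number of input views.)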

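Introduction

Traditional methods give good results in **Lambertian scenarios (ideal diffuse surfaces that look equally bright from every viewing direction)**, but they struggle with specular reflections, low-textured regions and the like, so the completeness of the reconstruction still has a lot of room for improvement.
CNN-based reconstruction can bring in global semantic information, such as priors about reflections, which gives it stronger matching ability. It has already been tried with good results in two-view stereo matching, where it outperforms traditional methods. There the matching problem becomes pixel-wise disparity estimation along the horizontal direction (to do: look up the definition of disparity estimation).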
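Two existing learned approaches to multi-view stereo reconstruction:
SurfaceNet
Learned Stereo Machine (LSM)
Both are suited only to small-scale reconstruction and take too long.
MVSNet (turns reconstruction into a depth estimation problem)
Input: one reference image + n source images
Output: the depth map of the reference image (a single map)
1. First extract deep image features with a network
2. Build a 3D cost volume on the reference camera frustum via homography warping (the key step)
3. Regress an initial depth map with 3D convolutions, then refine it with the reference image to produce the final result
4. A variance-based cost metric maps multiple features into one cost feature in the volume
  (so it works with an arbitrary number of input views)
5. Post-processing: reconstruct the point cloud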

MVSNet

training
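1. Data preparation (image height and width must be multiples of 32)
  1. images [None, 3, None, None, 3] (batch, views, height, width, channels)
  2. cam [None, 3, 2, 4, 4] (camera parameters; see the camera parameters section for the exact layout)
  3. depth_img [None, None, None, 1]
    (None dimensions are set dynamically at run time)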
  4. depth_start = tf.reshape(
    tf.slice(cams, [0, 0, 1, 3, 0], [1, 1, 1, 1, 1]), [1])
    depth_interval = tf.reshape(
    tf.slice(cams, [0, 0, 1, 3, 1], [1, 1, 1, 1, 1]), [1])
2. Network structure
  1. Image Features + Differentiable Homography
    1. Image Features: UNetDS2GN
      1. Definitions
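        ref_image = tf.squeeze(tf.slice(images, [0, 0, 0, 0, 0], [-1, 1, -1, -1, 3]), axis=1)
        ref_cam = tf.squeeze(tf.slice(cams, [0, 0, 0, 0, 0], [-1, 1, 2, 4, 4]), axis=1)
        ref_tower = UNetDS2GN({'data': ref_image}, is_training=True, reuse=True)
        view_towers = []
        for view in range(1, FLAGS.view_num):
            # same slicing as ref_image, but at view index `view`
            view_image = tf.squeeze(tf.slice(images, [0, view, 0, 0, 0], [-1, 1, -1, -1, 3]), axis=1)
            view_tower = UNetDS2GN({'data': view_image}, is_training=True, reuse=True)
            view_towers.append(view_tower)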
      2. Structure
        UNetDS2GN: unet + uniNetDS2GN (7 conv_gn + 1 conv; kernel sizes [3,3,5,3,3,5,3,3], strides [1,1,2,1,1,2,1,1])
    2. Differentiable Homography
      1. Definitions
        depth_num = 192
        depth_end = depth_start + (tf.cast(depth_num, tf.float32) - 1) * depth_interval
      2. Code
        view_homographies = []
        for view in range(1, FLAGS.view_num):
            view_cam = tf.squeeze(tf.slice(cams, [0, view, 0, 0, 0], [-1, 1, 2, 4, 4]), axis=1)
            homographies = get_homographies(ref_cam, view_cam, depth_num=depth_num,
                                            depth_start=depth_start, depth_interval=depth_interval)
            view_homographies.append(homographies)
      3. get_homographies
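        depth = depth_start + tf.cast(tf.range(depth_num), tf.float32) * depth_interval  # 1-D array of depth_num values
        R (3x3), T (3x1), depth_mat (1x192x3x3)
        Formula: H_i(d) = K_i R_i (I - (c_i - c_ref) fronto_direction / d) R_ref^T K_ref^-1, where c = -R^T T is the camera center, fronto_direction is the third row of R_ref (1x3), and d runs over depth_mat
        Result size: (1x192x3x3)
        Each of the 192 depths has its own 3x3 homography matrix
        Size of view_homographies: [2 (= view_num - 1), 192, 3, 3]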
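
The per-depth homography above can be checked numerically. The following is a minimal NumPy sketch, not the repository's get_homographies: the function name plane_sweep_homographies, the camera values and the test point are all assumptions made for illustration. It builds one 3x3 matrix per depth hypothesis and confirms that warping a reference pixel through the matrix for depth d gives the same view pixel as projecting the 3D point directly.

```python
import numpy as np


def plane_sweep_homographies(K_ref, R_ref, t_ref, K_view, R_view, t_view, depths):
    """Return an array (len(depths), 3, 3) mapping reference pixels to view pixels."""
    c_ref = -R_ref.T @ t_ref                  # camera centers in world coordinates
    c_view = -R_view.T @ t_view
    fronto_direction = R_ref[2:3, :]          # 1x3: reference optical axis (plane normal)
    offset = (c_view - c_ref).reshape(3, 1) @ fronto_direction   # 3x3
    K_ref_inv = np.linalg.inv(K_ref)
    return np.stack([
        K_view @ R_view @ (np.eye(3) - offset / d) @ R_ref.T @ K_ref_inv
        for d in depths
    ])


def project(K, R, t, X):
    """Pinhole projection of world point X to pixel coordinates."""
    p = K @ (R @ X + t)
    return p[:2] / p[2]


# Hypothetical cameras: identical intrinsics, view camera rotated 5 degrees and shifted.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R_ref, t_ref = np.eye(3), np.zeros(3)
angle = np.deg2rad(5.0)
R_view = np.array([[np.cos(angle), 0.0, np.sin(angle)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(angle), 0.0, np.cos(angle)]])
t_view = np.array([-0.2, 0.0, 0.05])

depths = 2.0 + np.arange(4) * 0.5             # depth_start + range(depth_num) * depth_interval
H = plane_sweep_homographies(K, R_ref, t_ref, K, R_view, t_view, depths)

X = np.array([0.3, -0.2, 3.0])                # lies at depth 3.0 in the reference camera
p_ref = project(K, R_ref, t_ref, X)
warped = H[2] @ np.array([p_ref[0], p_ref[1], 1.0])   # depths[2] == 3.0
warped = warped[:2] / warped[2]
homography_matches_projection = np.allclose(warped, project(K, R_view, t_view, X))
print(H.shape, homography_matches_projection)
```

The match only holds for the matrix whose depth equals the true depth of the point, which is exactly what the cost volume exploits when it sweeps over all 192 hypotheses.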
    3. cost Metric — cost volume
      1. Definitions
        feature_c = 32
        feature_h = FLAGS.max_h / 4
        feature_w = FLAGS.max_w / 4
        ave_feature = ref_feature = ref_tower.get_output()
        ave_feature2 = ref_feature2 = tf.square(ref_feature)
        view_features = tf.stack(view_features, axis=0)
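      2. Process
        Using the computed homography matrices, warp the other views (their feature maps) into the reference view
        Compute the cost volume across all N views with the variance metric: C = (1/N) * sum_i (V_i - V_mean)^2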
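
The running sums ave_feature and ave_feature2 suggest the variance is computed as E[V^2] - E[V]^2. Here is a small NumPy sketch of that identity, assuming the feature volumes have already been warped into the reference view; variance_cost and the array sizes are made up for the example.

```python
import numpy as np


def variance_cost(features):
    """features: array (N, ...) holding N feature volumes already warped to the reference view."""
    features = np.asarray(features, dtype=np.float64)
    ave_feature = features.mean(axis=0)                # E[V]
    ave_feature2 = np.square(features).mean(axis=0)    # E[V^2]
    return ave_feature2 - np.square(ave_feature)       # Var[V] = E[V^2] - E[V]^2


rng = np.random.default_rng(0)
views = rng.normal(size=(3, 4, 8, 8, 32))   # hypothetical: 3 views, D=4, H=W=8, C=32
cost = variance_cost(views)
print(cost.shape)                            # one cost feature per voxel, whatever N is
print(np.allclose(cost, views.var(axis=0)))  # agrees with NumPy's population variance
```

Because the mean and the mean of squares are both plain averages over the view axis, the output shape does not depend on N, which is why the metric accepts any number of input views.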
    4. cost volume regularization
      filtered cost volume, size of (B, D, H, W, 1)
      Network: RegNetUS0
      output filtered_cost_volume
    5. depth map and probability map
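      network output -> multiply by -1 -> softmax() over the depth axis -> weight each depth hypothesis by its probability and sum -> depth map
      
      For each pixel, locate the estimated depth among the depth indices [0, 192), take the 4 nearest slices of filtered_cost_volume (after the softmax above) at that position, and sum them to get the probability map
    6. refined depth map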
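
The two steps just described (expected depth from a softmax, then a 4-neighbour probability sum) can be sketched in NumPy. This is an illustrative reading of the notes, not the repository code: regress_depth, the toy cost volume and the depth range values are assumptions.

```python
import numpy as np


def regress_depth(filtered_cost, depth_start, depth_interval):
    """filtered_cost: array (D, H, W). Returns (depth_map, prob_map), each of shape (H, W)."""
    filtered_cost = np.asarray(filtered_cost, dtype=np.float64)
    D = filtered_cost.shape[0]
    logits = -filtered_cost                               # low cost -> high probability
    logits = logits - logits.max(axis=0, keepdims=True)   # numerical stability
    prob = np.exp(logits)
    prob = prob / prob.sum(axis=0, keepdims=True)         # softmax over the depth axis

    depths = depth_start + np.arange(D) * depth_interval
    depth_map = (depths[:, None, None] * prob).sum(axis=0)   # expected depth per pixel

    # Sum the probabilities of the 4 hypotheses nearest to the estimate.
    index = (depth_map - depth_start) / depth_interval    # fractional depth index
    left0 = np.floor(index).astype(int)
    rows, cols = np.indices(depth_map.shape)
    prob_map = np.zeros_like(depth_map)
    for k in (left0 - 1, left0, left0 + 1, left0 + 2):
        valid = (k >= 0) & (k < D)                        # ignore neighbours outside the volume
        prob_map += np.where(valid, prob[np.clip(k, 0, D - 1), rows, cols], 0.0)
    return depth_map, prob_map


# Toy volume: every pixel has a sharp cost minimum at depth index 5.
cost = 10.0 * np.abs(np.arange(16) - 5)[:, None, None] * np.ones((16, 2, 3))
depth_map, prob_map = regress_depth(cost, depth_start=425.0, depth_interval=2.5)
print(depth_map[0, 0], prob_map[0, 0])
```

With a sharply peaked volume the estimate sits at depth_start + 5 * depth_interval and the probability map is close to 1; a flat or multi-modal volume would give a low value, which is what makes the map usable as a confidence filter.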
    7. loss

Evaluation metrics for the reconstruction

f-score https://www.tanksandtemples.org/tutorial/
accuracy
completeness
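
As far as I understand the Tanks and Temples tutorial linked above, the F-score is the harmonic mean of precision (the share of reconstructed points within a distance threshold of the ground truth, related to accuracy) and recall (the share of ground-truth points within the threshold of the reconstruction, related to completeness). A minimal sketch under that assumption, with made-up distances and a hypothetical function name f_score:

```python
def f_score(dist_rec_to_gt, dist_gt_to_rec, threshold):
    """Return (precision, recall, F-score), each in [0, 1], from nearest-neighbour distances."""
    precision = sum(d < threshold for d in dist_rec_to_gt) / len(dist_rec_to_gt)
    recall = sum(d < threshold for d in dist_gt_to_rec) / len(dist_gt_to_rec)
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical distances, in the same unit as the threshold.
p, r, f = f_score([0.1, 0.2, 0.9, 0.3], [0.1, 0.8, 0.7, 0.2], threshold=0.5)
print(p, r, f)   # precision 3/4, recall 2/4, F-score 0.6
```

The harmonic mean penalises a reconstruction that is accurate but sparse (high precision, low recall) as much as one that is dense but noisy.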

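Camera parameters

cam [2,4,4]
1. cam[0] [4,4]: extrinsics, R + T
[[r1,r2,r3,t1]
[r4,r5,r6,t2]
[r7,r8,r9,t3]
[0,0,0,1]]
2. cam[1] [4,4]: K, the intrinsic matrix, with the depth range stored in the last row
[[fx,s,x0,0]
[0,fy,y0,0]
[0,0,1,0]
[depth_start,depth_interval,0,1]]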
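
To make the layout concrete, here is a small NumPy sketch that packs one camera into the [2,4,4] form above and reads the pieces back out, mirroring the tf.slice calls on cams in the training section. All numeric values are invented for the example.

```python
import numpy as np

# Hypothetical values, used only to illustrate the [2, 4, 4] layout.
extrinsic = np.eye(4)                       # cam[0]: [R | T] with a [0, 0, 0, 1] last row
extrinsic[:3, 3] = [0.1, 0.0, 0.0]
intrinsic = np.array([
    [600.0,   0.0, 320.0, 0.0],             # fx, s, x0
    [  0.0, 600.0, 256.0, 0.0],             # 0, fy, y0
    [  0.0,   0.0,   1.0, 0.0],
    [425.0,   2.5,   0.0, 1.0],             # depth_start, depth_interval
])
cam = np.stack([extrinsic, intrinsic])      # shape (2, 4, 4)

R, T = cam[0, :3, :3], cam[0, :3, 3]        # rotation and translation
K = cam[1, :3, :3]                          # 3x3 intrinsic matrix
depth_start, depth_interval = cam[1, 3, 0], cam[1, 3, 1]

depth_num = 192
depth_end = depth_start + (depth_num - 1) * depth_interval
print(cam.shape, depth_start, depth_interval, depth_end)
```

Indices [1, 3, 0] and [1, 3, 1] are the same positions that the slices [0, 0, 1, 3, 0] and [0, 0, 1, 3, 1] pick out of the batched cams tensor once the batch and view dimensions are dropped.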

Disparity
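
For a rectified two-view pair, disparity is the horizontal pixel offset of the same scene point between the left and right images, and depth follows from Z = f * B / d (f: focal length in pixels, B: baseline between the cameras, d: disparity). A minimal sketch with made-up numbers; depth_from_disparity is a hypothetical helper, not part of MVSNet:

```python
def depth_from_disparity(x_left, x_right, focal_px, baseline):
    """Depth of a point seen at column x_left in the left image and x_right in the right image."""
    disparity = x_left - x_right            # horizontal pixel shift between the two views
    if disparity <= 0:
        raise ValueError("a point in front of both cameras has positive disparity")
    return focal_px * baseline / disparity


# Hypothetical numbers: f = 800 px, baseline = 0.125 m, the point shifts by 20 px.
depth = depth_from_disparity(x_left=320.0, x_right=300.0, focal_px=800.0, baseline=0.125)
print(depth)   # 5.0, i.e. 800 * 0.125 / 20
```

Nearby points shift more between the views than distant ones, so disparity is inversely proportional to depth.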

TensorFlow

tf.slice(input_, begin, size, name=None): extracts a contiguous sub-tensor of shape size starting at index begin
input = [[[1, 1, 1], [2, 2, 2]],
[[3, 3, 3], [4, 4, 4]],
[[5, 5, 5], [6, 6, 6]]]
tf.slice(input, [1, 0, 0], [1, 1, 3]) ==> [[[3, 3, 3]]]
tf.slice(input, [1, 0, 0], [1, 2, 3]) ==> [[[3, 3, 3],
[4, 4, 4]]]
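
The begin/size convention is easy to confuse with Python's start:stop slicing. The sketch below is a NumPy stand-in written for illustration (slice_like_tf is not a TensorFlow function) that reproduces the two examples above and the size = -1 case used throughout the MVSNet code.

```python
import numpy as np


def slice_like_tf(array, begin, size):
    """NumPy stand-in for tf.slice: size[i] == -1 means 'to the end of axis i'."""
    index = tuple(
        slice(b, None if s == -1 else b + s)
        for b, s in zip(begin, size)
    )
    return array[index]


data = np.array([[[1, 1, 1], [2, 2, 2]],
                 [[3, 3, 3], [4, 4, 4]],
                 [[5, 5, 5], [6, 6, 6]]])
print(slice_like_tf(data, [1, 0, 0], [1, 1, 3]).tolist())   # [[[3, 3, 3]]]
print(slice_like_tf(data, [1, 0, 0], [1, 2, 3]).tolist())   # [[[3, 3, 3], [4, 4, 4]]]
print(slice_like_tf(data, [0, 0, 0], [-1, 1, -1]).shape)    # (3, 1, 3)
```

The last call has the same pattern as slicing one view out of images with size [-1, 1, -1, -1, 3]: the sliced axis keeps length 1, which is why the code follows it with tf.squeeze(..., axis=1).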
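tensor.set_shape(shape)
The shape may contain None; those dimensions are filled in from the actual input size
tf.tile(input, multiples, name=None)
tile() expands a tensor by repeating its data: dimension i is copied multiples[i] times. The output has the same rank as the input.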
a = tf.constant([[1, 2], [3, 4], [5, 6]], dtype=tf.float32)
a1 = tf.tile(a,[2,3])
[[1,2,1,2,1,2],[3,4,3,4,3,4],[5,6,5,6,5,6],
[1,2,1,2,1,2],[3,4,3,4,3,4],[5,6,5,6,5,6]]
tf.squeeze(input, axis=None, name=None, squeeze_dims=None): removes dimensions of size 1

't' is a tensor of shape [1, 2, 1, 3, 1, 1]

tf.shape(tf.squeeze(t)) # [2, 3]

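tf.Variable
In TensorFlow, a Variable is a special kind of Tensor whose value can be a tensor of any type and shape.
v = tf.Variable([1,2,3])  # create variable v holding a 1-D array
tf.stack
tf.stack(values, axis=0, name='stack'): stacks a list of rank-R tensors along the given axis into one rank-(R+1) tensor, i.e. the tensors are joined along a new dimension and the rank increases by one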
a = tf.constant([[1,1],[2,2]]) #2x2
b = tf.constant([[3,3],[4,4]]) #2x2
c = tf.stack([a,b],axis=0) # 2x2x2
[[[1,1],[2,2]],[[3,3],[4,4]]]
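tf.linspace(start, stop, num, name=None)
Returns a tensor of num evenly spaced values from start to stop
tf.reduce_sum(): sums along the given axis (axis=0, axis=1, ...)
tf.clip_by_value(A, min, max): clamps every element of tensor A to the range [min, max]; values below min become min and values above max become max
tf.gather_nd
Like tf.gather, but allows indexing across multiple dimensions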
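
Multi-dimensional gathering is what lets the probability map pick a different depth slice for every pixel. The following NumPy stand-in (gather_nd_like_tf is a name made up here, not a TensorFlow function) is a sketch of the indexing rule: each row of indices addresses one element, or one slice when the row is shorter than the rank of params.

```python
import numpy as np


def gather_nd_like_tf(params, indices):
    """NumPy stand-in for tf.gather_nd: the last axis of indices holds coordinates into params."""
    indices = np.asarray(indices)
    return params[tuple(np.moveaxis(indices, -1, 0))]


params = np.array([[10, 11], [20, 21]])
print(gather_nd_like_tf(params, [[0, 0], [1, 1]]).tolist())   # [10, 21]
print(gather_nd_like_tf(params, [[1], [0]]).tolist())         # [[20, 21], [10, 11]]
```

Full-length index rows return single elements, while shorter rows return whole slices along the remaining axes.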

Websites

https://www.eth3d.net/view_multi_view_result?dataset=12&tid=1 (sphere)
http://zhyan.tk/2017/07/03/mvs-learn-1-middlebury/
(A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms)
https://www.cnblogs.com/gemstone/archive/2011/12/19/2293806.html disparity / parallax

Questions

Is it hard to reconstruct a mesh from a point cloud?
Poisson reconstruction
Read the papers on Poisson surface reconstruction and screened Poisson reconstruction
For easier understanding, read about marching cubes reconstruction first
