Unsupervised Scale-Consistent Depth Estimation and Visual SLAM

Notes taken after watching https://www.bilibili.com/video/av77782083?t=2318

1. Principles of unsupervised monocular depth estimation

1. Stereo Pair

Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue
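Pros: absolute (metric) scale
Cons: occlusion issue (content visible in the left image may not appear in the right image; the same problem arises in stereo matching)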
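
Why a stereo pair gives absolute scale: for a rectified pair with a known baseline, depth follows from disparity as depth = focal length × baseline / disparity. A minimal sketch of that relation, assuming a rectified pinhole stereo rig; the function name and the numbers are illustrative and do not come from the talk:

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_px, baseline_m, eps=1e-6):
    """Convert horizontal disparity (pixels) to metric depth (meters).

    Assumes a rectified stereo pair; eps guards against division by zero.
    """
    disparity_px = np.asarray(disparity_px, dtype=np.float64)
    return focal_px * baseline_m / np.maximum(disparity_px, eps)

# Illustrative values only (hypothetical camera).
focal_px = 720.0      # focal length in pixels
baseline_m = 0.5      # distance between the two cameras in meters
disparity = np.array([36.0, 18.0, 9.0])

depth = disparity_to_depth(disparity, focal_px, baseline_m)
print(depth)
```

Because the baseline is measured in meters, the recovered depth is metric as well. A monocular video has no such known baseline.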

2. Monocular Video

Unsupervised Learning of Depth and Ego-Motion from Video
Cons:
no absolute scale (scale ambiguity)
occlusion issue
dynamic objects
scale inconsistency (depth learned from different sequences comes out at different scales)

3. Stereo Video

Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction
Pros:
absolute scale
can be used for VO
Cons:
occlusion issue
dynamic objects

2. Inconsistent output scale

Symptoms and impact:
Depth and pose are predicted with varying scales along a sequence
Depth cannot be fused together for mapping
Poses cannot be concatenated for camera localization
Causes:
Scale ambiguity
Photometric loss is scale-invariant
Training samples are independently processed
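
The second cause can be checked numerically: if the depth and the translation are multiplied by the same factor, every pixel reprojects to exactly the same location, so a loss that only compares warped and real images cannot tell the two solutions apart. A minimal sketch, assuming a pinhole camera; the intrinsics, pose, and depths are made-up values:

```python
import numpy as np

def reproject(pixels, depth, K, R, t):
    """Project pixels from view a into view b.

    pixels: (N, 2) pixel coordinates in view a
    depth:  (N,) depth of each pixel in view a
    K:      (3, 3) camera intrinsics
    R, t:   rotation (3, 3) and translation (3,) from view a to view b
    """
    ones = np.ones((pixels.shape[0], 1))
    homog = np.hstack([pixels, ones])            # homogeneous pixel coordinates
    rays = homog @ np.linalg.inv(K).T            # back-projected rays with z = 1
    points_a = rays * depth[:, None]             # 3D points in frame a
    points_b = points_a @ R.T + t                # 3D points in frame b
    proj = points_b @ K.T
    return proj[:, :2] / proj[:, 2:3]            # perspective division

# Hypothetical camera and motion, for illustration only.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
theta = 0.05                                     # small rotation about the y axis
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.1, 0.0, 0.02])
pixels = np.array([[100.0, 120.0], [320.0, 240.0], [500.0, 400.0]])
depth = np.array([5.0, 10.0, 20.0])

s = 3.7                                          # arbitrary scale factor
uv_original = reproject(pixels, depth, K, R, t)
uv_scaled = reproject(pixels, s * depth, K, R, s * t)
print(np.allclose(uv_original, uv_scaled))
```

Since each training sample is processed independently, nothing forces the network to pick the same factor s for every sample, which is where the per-frame scale variation comes from.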

3. The authors' solution

Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video
Geometry Consistency Loss (for scale consistency)
Self Discovered Mask (for handling occlusion and dynamics)

Relative depth error: [figure]
Geometry Consistency Loss: [figure]
Self Discovered Mask: [figure]
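
As I understand the paper, the two components share one quantity: a normalized depth inconsistency D_diff(p) = |D_computed(p) - D_interpolated(p)| / (D_computed(p) + D_interpolated(p)). Here D_computed is the depth of frame b obtained by warping frame a's depth with the predicted pose, and D_interpolated is frame b's own predicted depth sampled at the same locations. The geometry consistency loss is the mean of D_diff over valid pixels, and the self-discovered mask is the weight 1 - D_diff. A minimal NumPy sketch under those assumptions; the warping step is omitted, and the function names are my own rather than the authors' code:

```python
import numpy as np

def depth_inconsistency(computed_depth, interpolated_depth, eps=1e-7):
    """Normalized difference between two depth maps; values lie in [0, 1)."""
    diff = np.abs(computed_depth - interpolated_depth)
    return diff / (computed_depth + interpolated_depth + eps)

def geometry_consistency_loss(computed_depth, interpolated_depth, valid):
    """Mean depth inconsistency over the valid pixels (boolean mask)."""
    d_diff = depth_inconsistency(computed_depth, interpolated_depth)
    return d_diff[valid].mean()

def self_discovered_mask(computed_depth, interpolated_depth):
    """Per-pixel weight: close to 1 where the depths agree, lower where they do not."""
    return 1.0 - depth_inconsistency(computed_depth, interpolated_depth)

# Toy 2x2 example: one pixel disagrees (6.0 vs 2.0), e.g. a moving object
# or an occluded region.
computed = np.array([[2.0, 4.0], [6.0, 3.0]])
interpolated = np.array([[2.0, 4.0], [2.0, 3.0]])
valid = np.ones_like(computed, dtype=bool)

loss = geometry_consistency_loss(computed, interpolated, valid)
mask = self_discovered_mask(computed, interpolated)
print(loss)
print(mask)
```

The loss penalizes any scale difference between the depth predicted for adjacent frames, which is what ties the scale together along a sequence. The mask down-weights pixels where the two depths disagree, which tends to happen at occlusions and on dynamic objects.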
Remaining problems:
Depth estimation:
Although consistent, the scale is still unknown
Visual odometry:
Lack of multi-view optimization
Heavy drift in long videos
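
On the depth limitation above: once the predictions are scale-consistent, a single global factor is enough to bring them to metric units, provided some reference depth is available. One common choice in monocular depth evaluation is median scaling. This is a sketch of that general practice, not something shown in the talk; the function name and the numbers are illustrative:

```python
import numpy as np

def median_scale(pred_depth, ref_depth):
    """Global scale factor that aligns predicted depth to reference depth.

    Pixels where the reference is missing are assumed to be marked with 0.
    """
    valid = ref_depth > 0
    return np.median(ref_depth[valid]) / np.median(pred_depth[valid])

# Toy example: the last pixel has no reference measurement.
pred = np.array([1.0, 2.0, 4.0, 0.5])
ref = np.array([2.5, 5.0, 10.0, 0.0])

s = median_scale(pred, ref)
metric_pred = s * pred
print(s)
print(metric_pred)
```

With scale-inconsistent predictions, a separate factor would be needed for every frame. With consistent ones, the same s can be reused across the whole sequence.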

4. Visual SLAM using the scale-consistent depth output