[Reading] -- PillarFlow: End-to-end Birds-eye-view Flow Estimation for Autonomous Driving

A paper from Toyota Research Institute, published at IROS 2020.

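This paper proposes a LiDAR-based scene motion estimator that is decoupled from object detection and is therefore complementary to it. Our method takes two consecutive full LiDAR sweeps as input. Each sweep is encoded into a discretized 2D BeV representation of learned feature vectors (so-called "pillars"). An optical flow network adapted from PWC-Net then locally matches pillars between the two consecutive BeV feature grids. The whole architecture is trained end to end, and the final output is a 2D flow vector for every grid cell.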

Why BEV?

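For autonomous driving, and for motion planning in particular, we care mainly about motion that happens on the road and on adjacent surfaces.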

This Euclidean representation lets us design network architectures that exploit spatial priors about scene motion.

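Compared with volumetric methods, the 2D representation is more computationally efficient. It also makes it easy to share the representation with an object detector that runs in parallel with our object-agnostic flow network.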

Related Work

<1> Scene / Point Flow Estimation
[1] S. A. Baur, F. Moosmann, S. Wirges, and C. B. Rist, “Real-time 3D LiDAR flow for autonomous vehicles,” in IV, 2019.
[2] V. Vaquero, A. Sanfeliu, and F. Moreno-Noguer, “Deep Lidar CNN to understand the dynamics of moving vehicles,” in ICRA, 2018.

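3D scene flow methods focus on accurately predicting point-to-point correspondences. They often suffer from high computational cost, which is a key obstacle to real-time deployment on robotic platforms.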

<2> Occupancy Grid Maps
[1] D. Nuss, S. Reuter, M. Thom, T. Yuan, G. Krehl, M. Maile, A. Gern, and K. Dietmayer, “A random finite set approach for dynamic occupancy grid maps with real-time application,” IJRR, vol. 37, no. 8, pp. 841–866, July 2018.
[2] S. Hoermann, M. Bach, and K. Dietmayer, “Dynamic occupancy grid prediction for urban autonomous driving: A deep learning approach with fully automatic labeling,” in ICRA, 2018.
[3] F. Piewak, T. Rehfeld, M. Weber, and J. M. Zöllner, “Fully convolutional neural networks for dynamic object detection in grid maps,” in IV, 2017.

These works address dynamic occupancy grid map (DOGMa) estimation.

Method
Step 1: Voxelization
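The LiDAR point clouds are projected into a common coordinate frame and discretized into a BEV grid of pillars. The resulting pillar structure is a tensor of shape $(D, P, N)$, where D is the number of point descriptors, P is the number of pillars, and N is the number of points per pillar.
In this paper, $D = 9$, with descriptors $(x, y, z, r, x_c, y_c, z_c, x_p, y_p)$: the point coordinates x, y, z; the reflectance r; the offsets x_c, y_c, z_c of the point from the arithmetic mean of all points in its pillar; and the offsets x_p, y_p of the point from the pillar's x, y center.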
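The grouping step above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions and is not the authors' implementation. The function name `build_pillar_tensor`, the grid range of ±50 m, the 0.25 m pillar size, the cap of 32 points per pillar, and the choice to keep the first N points of an overfull pillar are all hypothetical.

```python
import numpy as np

def build_pillar_tensor(points, x_range=(-50.0, 50.0), y_range=(-50.0, 50.0),
                        pillar_size=0.25, max_points=32):
    # points: (M, 4) array of (x, y, z, r), already in a common coordinate frame.
    # Returns a (9, P, max_points) tensor and the (P, 2) integer (ix, iy) pillar indices.
    # Grid range, pillar size and max_points are hypothetical defaults.
    x, y = points[:, 0], points[:, 1]
    in_range = (x >= x_range[0]) & (x < x_range[1]) & (y >= y_range[0]) & (y < y_range[1])
    pts = points[in_range]
    # Integer BEV cell of every point
    ix = np.floor((pts[:, 0] - x_range[0]) / pillar_size).astype(np.int64)
    iy = np.floor((pts[:, 1] - y_range[0]) / pillar_size).astype(np.int64)
    cells, inverse = np.unique(np.stack([ix, iy], axis=1), axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # flat pillar id for each point
    P = len(cells)
    tensor = np.zeros((9, P, max_points), dtype=np.float32)  # unused slots stay zero-padded
    for p in range(P):
        members = pts[inverse == p]
        mean_xyz = members[:, :3].mean(axis=0)  # mean over all points in the pillar
        members = members[:max_points]  # hypothetical: keep the first max_points points
        center_x = x_range[0] + (cells[p, 0] + 0.5) * pillar_size
        center_y = y_range[0] + (cells[p, 1] + 0.5) * pillar_size
        feats = np.concatenate([
            members[:, :4],              # x, y, z, r
            members[:, :3] - mean_xyz,   # x_c, y_c, z_c: offset from pillar mean
            members[:, 0:1] - center_x,  # x_p: offset from pillar center
            members[:, 1:2] - center_y,  # y_p: offset from pillar center
        ], axis=1)                       # (n, 9)
        tensor[:, p, :len(members)] = feats.T
    return tensor, cells
```

Most pillars in a sparse LiDAR sweep hold far fewer than N points, so the dense $(D, P, N)$ layout trades some zero padding for a fixed tensor shape that batches well.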

Step 2: Encoder
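The voxelized pillars are fed into a simplified PointNet, which produces a feature tensor of shape $(C, P, N)$. A max operation over the last dimension then yields encoded features of shape $(C, P)$, so each pillar is described by a C-dimensional feature vector. Finally, these features are scattered back to their original pillar locations to form a pseudo-image of shape $(C, H, W)$.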
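The encoder steps above can be sketched in NumPy as a single shared linear layer with ReLU, a max over points, and a scatter into a BEV canvas. This is a hedged illustration, not the paper's network. The function name `encode_pillars`, the single-layer design with no batch normalization, the `num_points` mask for padded slots, and the row = iy, column = ix layout are all assumptions.

```python
import numpy as np

def encode_pillars(pillars, indices, num_points, weight, bias, grid_hw):
    # pillars: (D, P, N); indices: (P, 2) integer (ix, iy); num_points: (P,) real points per pillar.
    # weight: (C, D) and bias: (C,) form one shared linear layer (hypothetical parameters).
    N = pillars.shape[2]
    # Shared per-point linear layer + ReLU -> (C, P, N)
    feats = np.einsum("cd,dpn->cpn", weight, pillars) + bias[:, None, None]
    feats = np.maximum(feats, 0.0)
    # Zero out padded slots so they cannot win the max (valid because ReLU outputs are >= 0)
    valid = np.arange(N)[None, :] < num_points[:, None]  # (P, N)
    feats = feats * valid[None, :, :]
    # Max over the points of each pillar -> (C, P)
    pooled = feats.max(axis=2)
    # Scatter pillar features back to their BEV cells -> pseudo-image (C, H, W)
    H, W = grid_hw
    canvas = np.zeros((weight.shape[0], H, W), dtype=pooled.dtype)
    canvas[:, indices[:, 1], indices[:, 0]] = pooled  # row = iy, column = ix
    return canvas
```

The max is permutation invariant, so the pillar feature does not depend on the order of points inside the pillar. Cells with no points stay zero in the pseudo-image.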

Step 3: 2D BeV Flow Network
Flow is estimated with a network adapted from PWC-Net, as shown in the figure.
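The pillar features are passed through a feature pyramid network for further encoding. A cost volume layer then estimates flow, with the matching cost defined as the correlation between the two feature maps. Finally, a context network uses contextual information to refine the flow further. The context network is a feed-forward CNN based on dilated convolutions, with batch normalization and ReLU.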
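The cost volume layer described above can be sketched as a local correlation in NumPy. Every cell of the first feature map is compared, by a channel-averaged dot product, with the cells of the second map inside a square search window. This is a minimal sketch, not the paper's layer. The name `cost_volume` and the search radius `max_disp=2` are hypothetical. The sketch also leaves out the warping of the second feature map by the upsampled coarser-level flow that PWC-Net performs before correlation.

```python
import numpy as np

def cost_volume(feat1, feat2, max_disp=2):
    # feat1, feat2: (C, H, W) BEV feature maps from the two sweeps.
    # Returns ((2*max_disp+1)**2, H, W): one correlation channel per displacement (dy, dx),
    # ordered row-major over dy, then dx, each in [-max_disp, max_disp].
    C, H, W = feat1.shape
    d = max_disp
    padded = np.pad(feat2, ((0, 0), (d, d), (d, d)))  # zeros outside the map
    out = np.empty(((2 * d + 1) ** 2, H, W), dtype=np.result_type(feat1, feat2))
    k = 0
    for dy in range(-d, d + 1):
        for dx in range(-d, d + 1):
            # shifted[:, y, x] == feat2[:, y + dy, x + dx] (or 0 if out of bounds)
            shifted = padded[:, d + dy:d + dy + H, d + dx:d + dx + W]
            out[k] = (feat1 * shifted).sum(axis=0) / C  # channel-averaged dot product
            k += 1
    return out
```

For example, if a feature at cell (2, 2) in the first map reappears at cell (3, 4) in the second, the highest cost at (2, 2) falls on the channel for (dy, dx) = (1, 2). The flow decoder turns this kind of evidence into a flow estimate.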

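Training
$\mathcal{L} = \sum_{l=l_0}^{L} \alpha_l \sum_{x} \left\| \hat{f}^l_\theta(x) - f^l_{gt}(x) \right\|_2$
This is a multi-scale loss over the pyramid levels. Here $l$ runs over levels $l_0$ to $L$, $\alpha_l$ is the weight of level $l$, $x$ ranges over the BEV grid cells, and $\hat{f}^l_\theta(x)$ and $f^l_{gt}(x)$ are the predicted and ground-truth 2D flow vectors at cell $x$.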
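The loss above can be sketched in NumPy. This is a hedged illustration. The function name `multiscale_flow_loss` and the example weights are hypothetical. The sketch also assumes the ground-truth flow is already available at each level's resolution, which leaves open how it is downsampled.

```python
import numpy as np

def multiscale_flow_loss(preds, gts, alphas):
    # preds, gts: lists of (2, H_l, W_l) flow fields, one per pyramid level l.
    # alphas: per-level weights alpha_l (hypothetical values in the usage below).
    total = 0.0
    for pred, gt, alpha in zip(preds, gts, alphas):
        per_cell = np.linalg.norm(pred - gt, axis=0)  # ||f_hat(x) - f_gt(x)||_2 at every cell x
        total += alpha * per_cell.sum()               # sum over cells, weighted by alpha_l
    return float(total)
```

Each cell contributes the unsquared Euclidean norm of its flow error. A single badly wrong cell therefore adds error linearly rather than quadratically.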

Experiments

Dataset: nuScenes

Two sets of experiments are reported: flow estimation and tracking.

