K-d树进行最近邻搜索的过程演示和详细分解

K-d树进行最近邻搜索的过程演示和详细分解

In this tutorial we will go over how to use a KdTree for finding the K nearest neighbors of a specific point or location, and then we will also go over how to find all neighbors within some radius specified by the user (in this case random).

本文将演示我们如何用KdTree去寻找查找特定点或位置的k个最近邻,然后我们还将介绍如何查找用户指定半径内的所有邻(在本例中是随机的)。

K-d树的组织

A k-d tree, or k-dimensional tree, is a data structure used in computer science for organizing some number of points in a space with k dimensions. It is a binary search tree with other constraints imposed on it. K-d trees are very useful for range and nearest neighbor searches. Each level of a k-d tree splits all children along a specific dimension, using a hyperplane that is perpendicular to the corresponding axis. At the root of the tree all children will be split based on the first dimension (i.e. if the first dimension coordinate is less than the root it will be in the left-sub tree and if it is greater than the root it will obviously be in the right sub-tree). Each level down in the tree divides on the next dimension, returning to the first dimension once all others have been exhausted. They most efficient way to build a k-d tree is to use a partition method like the one Quick Sort uses to place the median point at the root and everything with a smaller one dimensional value to the left and larger to the right. You then repeat this procedure on both the left and right sub-trees until the last trees that you are to partition are only composed of one element.

From~Wikipedia[1]^[1]

K-d树,或称K维树,在计算机科学中常用来组织K维空间中若干点的数据结构。它是一个带有其他约束的二进制搜索树。K-D树在范围查询(range search)最近邻(nearest neighbor search)中非常有效。K-D树的每个级别(level)都使用垂直于相应轴的超平面沿特定维度分割所有子级。在树的根上,所有子级都将根据第一个维度进行拆分(即,如果第一个维度坐标小于根,则它将位于左子树中,如果它大于根,则它显然位于右子树中)。树中的每一层都在下一个维度上进行划分,当所有其他维度都用尽后,将返回到第一个维度。构建K-D树最有效的方法是使用一种类似于快速排序的分区方法,将中间点放在根上,所有东西的一维值越小越好。然后在左树和右树上重复此过程,直到要分区的最后一个树只由一个元素组成。
K-d树进行最近邻搜索的过程演示和详细分解

图1 k-d 树组织和空间划分

图1展示的是一个二维k-d树,对点集(2,3),(5,4),(9,6),(4,7),(8,1),(7,2){(2,3),(5,4),(9,6),(4,7),(8,1),(7,2)}进行空间划分,首先从A节点(7,2)(7,2)开始,以垂直X轴超平面(二维坐标系中的超平面是直线])进行划分,坐标小于A节点的被划分到左子树,横坐标大于A节点的被划分到右子树.同理对其他节点进行划分.

利用K-d树进行最近邻搜索

搜索过程如以下动图2所示:

K-d树进行最近邻搜索的过程演示和详细分解

图2 kd树搜索动态演示

对动图2过程进行拆解如下:

(1)对于给定的kd tree,其根节点是A,以垂直于X轴的直线划分二维空间,在X轴上坐标比A小的被划分到左子树,比A大的被划分到右子树.对第二级(level)节点B,C通过垂直Y轴的直线划分空间.

K-d树进行最近邻搜索的过程演示和详细分解
(2)给定一个点(图上打叉的十字交叉点),如何利用kd树找到其最近邻(Nearest Neighbor,NN)节点及距离呢?当前是不知道的.

K-d树进行最近邻搜索的过程演示和详细分解

(3)首先从kd树的根节点(即A节点)出发,进行深度优先搜索(维持一个栈结构来保存父节点),将查询点到根节点的距离先设为最优估计(best estimate),然后遍历A的左子树.

K-d树进行最近邻搜索的过程演示和详细分解

(4)计算查询点到根节点的左孩子节点(left child,即B节点)的距离,和之前的最优估计(best estimate)进行比较,发现到B节点的距离更短,因此更新最优估计(best estimate),再按照先左子树后右子树的方式依次遍历.

K-d树进行最近邻搜索的过程演示和详细分解

(5)由于B的子节点D和E到查询点的距离都没有B近,因此在B的子分支(sub-branch)上,B是最优估计点,对左子树的遍历完成,按照深度优先搜索的原理,将返回有子树进行遍历.

K-d树进行最近邻搜索的过程演示和详细分解

(6)同理对右子树遍历完,此时A节点的所有孩子节点都已经被搜索过,B节点是整棵树的最优估计,即B节点是查询点的最近邻,同时可以计算得到查询点到最近邻的距离.

K-d树进行最近邻搜索的过程演示和详细分解

Reperence

[1] k-d tree - Wikipedia
[2] 代码实现可参考 http://pointclouds.org/documentation/tutorials/kdtree_search.php