Range Searching and Multi-Dimensional Data|Quadtrees|K-d Trees|CS 61B Data Structures, Spring 2019
Multi-Dimensional Data
Motivation: 2D Range Finding and Nearest Neighbors
Suppose we want to perform operations on a set of Body objects in space?
- 2D Range Searching: How many objects are in the highlighted rectangle?
- Nearest Neighbors: What is the closest object to the space horse?
A silly way to find the Answer:
Suppose the set is stored as a hash table, where the hash code of each Body is given below.
How would nearest( ) be implemented?
- The bucket that each object is effectively random. Have to iterate over all N items.
What would be runtime of nearest?
- Θ(N)
The problem with hash tables is that the bucket # of an item is effectively random.
How to fix it?
Uniform Partitioning
The problem with hash tables is that the bucket # of an item is effectively random
- One fix is to ensure that the bucket #s depend only on position!
Simplest idea: Partition space into uniform rectangular buckets (also called “bins”).
- Example on right for 4x4 grid of such buckets.
Question:
What is the runtime for nearest assuming points are evenly spread out?
Still Θ(N).
On average, runtime will be 16 times faster than without the spatial partitioning, but N/16 is still Θ(N).
Note: Often works well in practice. But let’s see a better way
QuadTrees
Trees vs. Hash Tables
One key advantage of Search Trees over Hash Tables is that trees explicitly track the order of items.
Examples:
- Finding the minimum item in a BST is Θ(log N) time, but Θ(N) in a hash table.
- Can relatively easy modify BSTs to support operations like rank(5), which returns the 5th largest item in Θ(log N) time (see optional textbook).
- Impossible to do efficiently in a hash table.
Basic idea:
Build a BST where every node has 4 neighbors, corresponding to northwest, northeast, southwest, and southeast.
Quadtree Insertion Demo
Below: Quadtree Representation of 5 objects in 2D space.
- Insertion Demo: Link
Quadtrees are also spatial partitioning in disguise.
- Earlier approach: Uniform partitioning (perfect grid of rectangles).
- Quadtrees: Hierarchical partitioning. Each node “owns” 4 subspaces.
- Space is more finely divided in regions where there are more points.(空间划分的更为精细,可以准确的找到某个点所在的位置)
- Results in better runtime in many circumstances
Quadtree Range Finding Demo
Quadtrees allow us to better prune when performing a rectangle search.
- Same optimization as before: Prune (i.e. don’t explore) subspaces that don’t intersect the query rectangle.(不探索不与感兴趣区域相交的部分)
- Range Search Demo: Link
Higher Dimensional Data and K-d Trees
3D Data and Even Higher Dimensional Space
- Suppose we want to store objects in 3D space.
- Quadtrees only have four directions, but in 3D, there are 8.
One approach: Use an Oct-tree or Octree.
- Very widely used in practice.
- Even Higher Dimensional Space
you may want to organize data on a large number of dimensions.
Example: Want to find songs with the following features:
- Length between 3 minutes and 6 minutes.
- Between 1000 and 20,000 listens.(有1000到两万人听过)
- Between 120 to 150 BPM.(歌曲的速度又称“”曲速“”)
- Were recorded after 2004.
In these cases, one somewhat common solution is a k-d tree.
- Fascinating data structure that handles arbitrary numbers of dimensions.
- k-d means “k dimensional”.
- For the sake of simplicity in lecture, we’ll use 2D data, but the idea generalizes naturally.
K-d Trees
k-d tree example for 2-d:
- Basic idea, root node partitions entire space into left and right (by x).
- All depth 1 nodes partition subspace into up and down (by y).
- All depth 2 nodes partition subspace into left and right (by x).
若根节点按照x排序,则其第一级子节点按y排序,第二级子节点按x排序,以此类推
- Let’s see an example of how to build a k-d tree.
K-d tree insertion demo.
Each point owns 2 subspaces.
- Similar to a quadtree.
- Example: D owns the two subspaces shown.
- The top subspace is infinitely large.
红线代表在该节点,其所在空间被按照该节点x周划分,蓝线则代表空间被所在节点的y轴划分。
K-d Trees and Nearest Neighbor
Just like spatial partitioning and quadtrees, k-d trees support an efficient nearest method.
- Optimization: Do not explore subspaces that can’t possible have a better answer than your current best.
Example: Find the nearest point to (0, 7).
Answer is (1, 5).
Nearest Pseudocode
Summary and Applications
Multi-Dimensional Data Summary
Multidimensional data has interesting operations:
- Range Finding: What are all the objects inside this (rectangular) subspace?
- Nearest: What is the closest object to a specific point?
- Can be generalized to k-nearest.
The most common approach is spatial partitioning:
- Uniform Partitioning: Analogous to hashing.
- Quadtree: Generalized 2D BST where each node “owns” 4 subspaces.
- K-d Tree: Generalized k-d BST where each node “owns” 2 subspaces.
- Dimension of ownership cycles with each level of depth in tree.
Spatial partitioning allows for pruning of the search space.