Range Searching and Multi-Dimensional Data|Quadtrees|K-d Trees|CS 61B Data Structures, Spring 2019

Multi-Dimensional Data

Motivation: 2D Range Finding and Nearest Neighbors

Suppose we want to perform operations on a set of Body objects in space?

2D Range Searching: How many objects are in the highlighted rectangle?
Nearest Neighbors: What is the closest object to the space horse?

A silly way to find the Answer:

Suppose the set is stored as a hash table, where the hash code of each Body is given below.

How would nearest( ) be implemented?

The bucket that each object is effectively random. Have to iterate over all N items.

What would be runtime of nearest?

Θ(N)

The problem with hash tables is that the bucket # of an item is effectively random.

How to fix it?

Uniform Partitioning

The problem with hash tables is that the bucket # of an item is effectively random

One fix is to ensure that the bucket #s depend only on position!

Simplest idea: Partition space into uniform rectangular buckets (also called “bins”).

Example on right for 4x4 grid of such buckets.

Range Searching and Multi-Dimensional Data|Quadtrees|K-d Trees|CS 61B Data Structures, Spring 2019

Question:

What is the runtime for nearest assuming points are evenly spread out?
Still Θ(N).
On average, runtime will be 16 times faster than without the spatial partitioning, but N/16 is still Θ(N).
Note: Often works well in practice. But let’s see a better way

QuadTrees

Trees vs. Hash Tables

One key advantage of Search Trees over Hash Tables is that trees explicitly track the order of items.

Examples:

Finding the minimum item in a BST is Θ(log N) time, but Θ(N) in a hash table.
Can relatively easy modify BSTs to support operations like rank(5), which returns the 5th largest item in Θ(log N) time (see optional textbook).
- Impossible to do efficiently in a hash table.

Basic idea:

Build a BST where every node has 4 neighbors, corresponding to northwest, northeast, southwest, and southeast.

Quadtree Insertion Demo

Below: Quadtree Representation of 5 objects in 2D space.

Insertion Demo: Link

Range Searching and Multi-Dimensional Data|Quadtrees|K-d Trees|CS 61B Data Structures, Spring 2019

Quadtrees are also spatial partitioning in disguise.

Earlier approach: Uniform partitioning (perfect grid of rectangles).
Quadtrees: Hierarchical partitioning. Each node “owns” 4 subspaces.
- Space is more finely divided in regions where there are more points.(空间划分的更为精细，可以准确的找到某个点所在的位置)
- Results in better runtime in many circumstances

Quadtree Range Finding Demo

Quadtrees allow us to better prune when performing a rectangle search.

Same optimization as before: Prune (i.e. don’t explore) subspaces that don’t intersect the query rectangle.（不探索不与感兴趣区域相交的部分）
Range Search Demo: Link

Higher Dimensional Data and K-d Trees

3D Data and Even Higher Dimensional Space

Suppose we want to store objects in 3D space.

Quadtrees only have four directions, but in 3D, there are 8.

One approach: Use an Oct-tree or Octree.

Very widely used in practice.

Even Higher Dimensional Space
you may want to organize data on a large number of dimensions.
Example: Want to find songs with the following features:

Length between 3 minutes and 6 minutes.
Between 1000 and 20,000 listens.（有1000到两万人听过）
Between 120 to 150 BPM.(歌曲的速度又称“”曲速“”)
Were recorded after 2004.

In these cases, one somewhat common solution is a k-d tree.

Fascinating data structure that handles arbitrary numbers of dimensions.
- k-d means “k dimensional”.
For the sake of simplicity in lecture, we’ll use 2D data, but the idea generalizes naturally.

K-d Trees

k-d tree example for 2-d:

Basic idea, root node partitions entire space into left and right (by x).
All depth 1 nodes partition subspace into up and down (by y).
All depth 2 nodes partition subspace into left and right (by x).

若根节点按照x排序，则其第一级子节点按y排序，第二级子节点按x排序，以此类推

Let’s see an example of how to build a k-d tree.
K-d tree insertion demo.

Each point owns 2 subspaces.

Similar to a quadtree.
- Example: D owns the two subspaces shown.
The top subspace is infinitely large.

红线代表在该节点，其所在空间被按照该节点x周划分，蓝线则代表空间被所在节点的y轴划分。

K-d Trees and Nearest Neighbor

Just like spatial partitioning and quadtrees, k-d trees support an efficient nearest method.

Optimization: Do not explore subspaces that can’t possible have a better answer than your current best.

Example: Find the nearest point to (0, 7).
Answer is (1, 5).

K-d tree nearest demo.

Nearest Pseudocode

Range Searching and Multi-Dimensional Data|Quadtrees|K-d Trees|CS 61B Data Structures, Spring 2019

Summary and Applications

Multi-Dimensional Data Summary

Multidimensional data has interesting operations:

Range Finding: What are all the objects inside this (rectangular) subspace?
Nearest: What is the closest object to a specific point?
- Can be generalized to k-nearest.

The most common approach is spatial partitioning:

Uniform Partitioning: Analogous to hashing.
Quadtree: Generalized 2D BST where each node “owns” 4 subspaces.
K-d Tree: Generalized k-d BST where each node “owns” 2 subspaces.
- Dimension of ownership cycles with each level of depth in tree.

Spatial partitioning allows for pruning of the search space.

Range Searching and Multi-Dimensional Data|Quadtrees|K-d Trees|CS 61B Data Structures, Spring 2019

Multi-Dimensional Data

Motivation: 2D Range Finding and Nearest Neighbors

A silly way to find the Answer:

How to fix it?

Uniform Partitioning

Simplest idea: Partition space into uniform rectangular buckets (also called “bins”).

Question:

QuadTrees

Trees vs. Hash Tables

Basic idea:

Quadtree Insertion Demo

Quadtree Range Finding Demo

Higher Dimensional Data and K-d Trees

3D Data and Even Higher Dimensional Space

K-d Trees

K-d Trees and Nearest Neighbor

Nearest Pseudocode

Summary and Applications

Multi-Dimensional Data Summary

相关推荐