The following problem appeared as an assignment in the *coursera course Algorithm-I* by *Prof.Robert Sedgewick ** *from the Princeton University few years back (and also in the course *cos226* offered at *Princeton*). The problem definition and the description is taken from the course website and lectures. The original assignment was to be done in java, where in this article both the *java* and a corresponding *python* implementation will also be described.

- Use a
to support*2d-tree*- efficient
(find*range search**all of the points*contained in a*query rectangle*) (find a*nearest-neighbor search**closest*point to a*query point*).

2d-trees have numerous applications, ranging from classifying astronomical objects to computer animation to speeding up neural networks to mining data to image retrieval. The figure below describes the problem:

**2d-tree implementation**: Ais a generalization of a*2d-tree***BST**to two-dimensional keys. The idea is to build a BST with points in the nodes, using the*x*– and*y*-coordinates of the points as keys in strictly alternating sequence, starting with the*x*-coordinates, as shown in the next figure.The algorithms for search and insert are similar to those for BSTs, but at the root we use the*Search and insert.**x*-coordinate (if the point to be inserted has a smaller*x*-coordinate than the point at the root, go left; otherwise go right); then at the next level, we use the*y*-coordinate (if the point to be inserted has a smaller*y*-coordinate than the point in the node, go left; otherwise go right); then at the next level the*x*-coordinate, and so forth.- The prime advantage of a 2d-tree over a BST is that it supports efficient implementation of
and*range search*. Each node corresponds to an axis-aligned rectangle, which encloses all of the points in its subtree. The root corresponds to the entire plane [(−∞, −∞), (+∞, +∞ )]; the left and right children of the root correspond to the two rectangles split by the*nearest-neighbor search**x*-coordinate of the point at the root; and so forth.To find all points contained in a given query rectangle, start at the root and recursively search for points in**Range search**:*both*subtrees using the following*pruning rule*: if the query rectangle does not intersect the rectangle corresponding to a node, there is no need to explore that node (or its subtrees). That is, search a subtree only if it might contain a point contained in the query rectangle.To find a closest point to a given query point, start at the root and recursively search in**Nearest-neighbor search**:*both*subtrees using the following*pruning rule*: if the closest point discovered so far is closer than the distance between the query point and the rectangle corresponding to a node, there is no need to explore that node (or its subtrees). That is, search a node only if it might contain a point that is closer than the best one found so far. The effectiveness of the pruning rule depends on quickly finding a nearby point. To do this, organize the recursive method so that when there are two possible subtrees to go down, you choose first*the subtree that is on the same side of the splitting line as the query point*; the closest point found while exploring the first subtree may enable pruning of the second subtree.**k-nearest neighbors search**: This method returns the*k*points that are closest to the query point (in any order); return all*n*points in the data structure if*n*≤*k*. It must do this in an efficient manner, i.e. using the technique from kd-tree nearest neighbor search, not from brute force.**BoidSimulator**: Once the**k-nearest neighbors search**we can simulate boids: how a flock of birds flies together and a hawk predates. Behold their flocking majesty.The following figures show the theory that are going to be used, taken from the lecture slides of the same course.

## Results

The following figures and animations show how the

**2-d-tree**is grown with*recursive space-partioning*for a few sample datasets. - efficient
**Circle 10 dataset**

/>/>>

**Circle 100 dataset**

The following figure shows the result of the * range search algorithm* on the same dataset after the

The next animations show the * nearest neighbor search algorithm* for a

The next animation shows how the kd-tree is traversed for **nearest-neighbor search **for a different query point (0.04, 0.7).

The next figures show the result of * k-nearest-neighbor search*, by extending the previous algorithm with different values of

As can be seen from the next figure, the time complexity of 2-d tree building (insertion), nearest neighbor search and k-nearest neighbor query depend not only on the *size* of the datasets but also on the *geometry* of the datasets.

**Applications**

The **flocking*** boids simulator* is implemented with

**Training phase**

Build a *2d-tree* from a *labeled* 2D training dataset (points marked with red or blue represent 2 different *class labels*).

**Testing phase**

- For a
*query point*(new test point with*unknown class label*) run*k-nearest neighbor search*on the*2d-tree*with the query point (for a fixed value of k, e.g., 3). - Take a
*majority vote*on the class labels of the*k-nearest neighbors*(with known class labels) obtained by querying the 2d-tree.*Label*the*query point*with the class label that*majority*of its*neighbors*have. - Repeat for different values of k.

The following figures show how the *kd tree *built can be used to *classify* (randomly generated) 2D datasets and the *decision boundaries* are learnt with *k=3, 5* and *10**respectively*.

© 2019 Data Science Central ® Powered by

Badges | Report an Issue | Privacy Policy | Terms of Service

**Most Popular Content on DSC**

To not miss this type of content in the future, subscribe to our newsletter.

- Book: Statistics -- New Foundations, Toolbox, and Machine Learning Recipes
- Book: Classification and Regression In a Weekend - With Python
- Book: Applied Stochastic Processes
- Long-range Correlations in Time Series: Modeling, Testing, Case Study
- How to Automatically Determine the Number of Clusters in your Data
- New Machine Learning Cheat Sheet | Old one
- Confidence Intervals Without Pain - With Resampling
- Advanced Machine Learning with Basic Excel
- New Perspectives on Statistical Distributions and Deep Learning
- Fascinating New Results in the Theory of Randomness
- Fast Combinatorial Feature Selection

**Other popular resources**

- Comprehensive Repository of Data Science and ML Resources
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- 100 Data Science Interview Questions and Answers
- Cheat Sheets | Curated Articles | Search | Jobs | Courses
- Post a Blog | Forum Questions | Books | Salaries | News

**Archives:** 2008-2014 |
2015-2016 |
2017-2019 |
Book 1 |
Book 2 |
More

**Most popular articles**

- Free Book and Resources for DSC Members
- New Perspectives on Statistical Distributions and Deep Learning
- Time series, Growth Modeling and Data Science Wizardy
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- Comprehensive Repository of Data Science and ML Resources
- Advanced Machine Learning with Basic Excel
- Difference between ML, Data Science, AI, Deep Learning, and Statistics
- Selected Business Analytics, Data Science and ML articles
- How to Automatically Determine the Number of Clusters in your Data
- Fascinating New Results in the Theory of Randomness
- Hire a Data Scientist | Search DSC | Find a Job
- Post a Blog | Forum Questions

## You need to be a member of Data Science Central to add comments!

Join Data Science Central