The following problem appeared as an assignment in the Coursera course *Algorithms, Part I* by *Prof. Robert Sedgewick* from Princeton University a few years back (it also appears in the course *COS226* offered at Princeton). The problem definition and description are taken from the course website and lectures. The original assignment was to be done in Java; in this article both a *Java* and a corresponding *Python* implementation will be described.

- Use a *2d-tree* to support two efficient operations:
  - *range search* (find *all of the points* contained in a *query rectangle*), and
  - *nearest-neighbor search* (find a *closest* point to a *query point*).

2d-trees have numerous applications, ranging from classifying astronomical objects to computer animation, speeding up neural networks, data mining, and image retrieval. The figure below describes the problem:

**2d-tree implementation**: A *2d-tree* is a generalization of a **BST** to two-dimensional keys. The idea is to build a BST with points in the nodes, using the *x*- and *y*-coordinates of the points as keys in strictly alternating sequence, starting with the *x*-coordinates, as shown in the next figure.

**Search and insert**: The algorithms for search and insert are similar to those for BSTs, but at the root we use the *x*-coordinate (if the point to be inserted has a smaller *x*-coordinate than the point at the root, go left; otherwise go right); then at the next level we use the *y*-coordinate (if the point to be inserted has a smaller *y*-coordinate than the point in the node, go left; otherwise go right); then at the next level the *x*-coordinate again, and so forth.

The prime advantage of a 2d-tree over a BST is that it supports efficient implementation of *range search* and *nearest-neighbor search*. Each node corresponds to an axis-aligned rectangle, which encloses all of the points in its subtree. The root corresponds to the entire plane [(−∞, −∞), (+∞, +∞)]; the left and right children of the root correspond to the two rectangles split by the *x*-coordinate of the point at the root; and so forth.

**Range search**: To find all points contained in a given query rectangle, start at the root and recursively search for points in *both* subtrees using the following *pruning rule*: if the query rectangle does not intersect the rectangle corresponding to a node, there is no need to explore that node (or its subtrees). That is, search a subtree only if it might contain a point contained in the query rectangle.

**Nearest-neighbor search**: To find a closest point to a given query point, start at the root and recursively search in *both* subtrees using the following *pruning rule*: if the closest point discovered so far is closer than the distance between the query point and the rectangle corresponding to a node, there is no need to explore that node (or its subtrees). That is, search a node only if it might contain a point that is closer than the best one found so far. The effectiveness of the pruning rule depends on quickly finding a nearby point. To do this, organize the recursive method so that when there are two possible subtrees to go down, you choose first *the subtree that is on the same side of the splitting line as the query point*; the closest point found while exploring the first subtree may enable pruning of the second subtree.

**k-nearest neighbors search**: This method returns the *k* points that are closest to the query point (in any order); return all *n* points in the data structure if *n* ≤ *k*. It must do this efficiently, i.e., using the pruning technique from kd-tree nearest-neighbor search, not brute force.

**BoidSimulator**: Once the **k-nearest neighbors search** is implemented, we can simulate boids: how a flock of birds flies together and a hawk predates. Behold their flocking majesty.

The following figures show the theory that is going to be used, taken from the lecture slides of the same course.
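The alternating-coordinate insertion described above can be sketched in a few lines of Python (a minimal sketch with illustrative names, not the assignment's Java API):

```python
# Minimal 2d-tree insertion sketch: compare x at even depths, y at odd depths.
class Node:
    def __init__(self, point):
        self.point = point   # (x, y)
        self.left = None     # points with a smaller coordinate on this level's axis
        self.right = None    # points with an equal or larger coordinate

def insert(root, point, depth=0):
    """Insert a point into the 2d-tree, alternating the comparison axis per level."""
    if root is None:
        return Node(point)
    axis = depth % 2         # 0 -> compare x, 1 -> compare y
    if point[axis] < root.point[axis]:
        root.left = insert(root.left, point, depth + 1)
    else:
        root.right = insert(root.right, point, depth + 1)
    return root

# Build a small tree from sample points.
pts = [(0.7, 0.2), (0.5, 0.4), (0.2, 0.3), (0.4, 0.7), (0.9, 0.6)]
root = None
for p in pts:
    root = insert(root, p)
# (0.5, 0.4) goes left of the root (smaller x than 0.7); at the next level
# (0.2, 0.3) goes left of (0.5, 0.4) because its y (0.3) is smaller than 0.4.
```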

## Results

The following figures and animations show how the **2-d tree** is grown with *recursive space-partitioning* for a few sample datasets.
**Circle 10 dataset**


**Circle 100 dataset**

The following figure shows the result of the *range search algorithm* on the same dataset after the 2d-tree is built.
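Range search with the pruning rule can be sketched as follows (a minimal illustration: the `build`/`range_search` names and the balanced median-split construction are assumptions for brevity, not the assignment's API):

```python
def build(points, depth=0):
    """Build a balanced 2d-tree by median-splitting on alternating axes."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid], build(points[:mid], depth + 1), build(points[mid + 1:], depth + 1))

def range_search(node, rect, depth=0, found=None):
    """Collect all points inside rect = (xmin, ymin, xmax, ymax)."""
    if found is None:
        found = []
    if node is None:
        return found
    (x, y), left, right = node
    xmin, ymin, xmax, ymax = rect
    if xmin <= x <= xmax and ymin <= y <= ymax:
        found.append((x, y))
    axis = depth % 2
    lo, hi = (xmin, xmax) if axis == 0 else (ymin, ymax)
    coord = (x, y)[axis]
    # Pruning rule: explore a subtree only if the query rectangle
    # reaches past the splitting line on that side.
    if lo <= coord:
        range_search(left, rect, depth + 1, found)
    if coord <= hi:
        range_search(right, rect, depth + 1, found)
    return found

pts = [(0.1, 0.1), (0.3, 0.6), (0.5, 0.5), (0.7, 0.2), (0.9, 0.9)]
tree = build(pts)
inside = range_search(tree, (0.25, 0.4, 0.6, 0.7))  # -> the two points in the rectangle
```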

The next animations show the *nearest neighbor search algorithm* for a given query point.

The next animation shows how the kd-tree is traversed for **nearest-neighbor search** for a different query point (0.04, 0.7).
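The nearest-neighbor traversal with its pruning rule, including the "same side of the splitting line first" ordering, can be sketched as below (illustrative names; the balanced median-split build is an assumption for brevity):

```python
def build(points, depth=0):
    """Balanced 2d-tree via median split on alternating axes."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid], build(points[:mid], depth + 1), build(points[mid + 1:], depth + 1))

def nearest(node, query, depth=0, best=None):
    """Return (closest_point, squared_distance); descend the query's side first."""
    if node is None:
        return best
    point, left, right = node
    d2 = (point[0] - query[0]) ** 2 + (point[1] - query[1]) ** 2
    if best is None or d2 < best[1]:
        best = (point, d2)
    axis = depth % 2
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, depth + 1, best)   # same side as the query first
    if diff * diff < best[1]:                      # pruning rule: the far side can
        best = nearest(far, query, depth + 1, best)  # only help if the splitting
    return best                                      # line is closer than the best

pts = [(0.1, 0.1), (0.3, 0.6), (0.5, 0.5), (0.7, 0.2), (0.9, 0.9)]
tree = build(pts)
closest, _ = nearest(tree, (0.04, 0.7))  # the query point used in the animation
```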

The next figures show the result of *k-nearest-neighbor search*, obtained by extending the previous algorithm, with different values of *k*.

As can be seen from the next figure, the time complexity of 2-d tree building (insertion), nearest neighbor search and k-nearest neighbor queries depends not only on the *size* of the datasets but also on the *geometry* of the datasets.

**Applications**

The **flocking** *boids simulator* is implemented with the *k-nearest-neighbor search* on the *2d-tree*.

**Training phase**

Build a *2d-tree* from a *labeled* 2D training dataset (points marked with red or blue represent 2 different *class labels*).

**Testing phase**

- For a *query point* (a new test point with an *unknown class label*), run *k-nearest-neighbor search* on the *2d-tree* with the query point (for a fixed value of k, e.g., 3).
- Take a *majority vote* on the class labels of the *k* nearest neighbors (with known class labels) obtained by querying the 2d-tree. *Label* the *query point* with the class label that the *majority* of its *neighbors* have.
- Repeat for different values of k.

The following figures show how the *kd-tree* built can be used to *classify* (randomly generated) 2D datasets, and how the *decision boundaries* are learnt with *k = 3, 5* and *10*, respectively.

© 2020 Data Science Central ®
