
Geometry : 
Subject: Re: [geometry] Spatial Indexes: build and knn query performance for bulk loading/packing vs. linear balancing
From: Adam Wulkiewicz (adam.wulkiewicz_at_[hidden])
Date: 20131111 08:42:41
Hi Mikkel,
Mikkel B. Stegmann wrote:
> Hi Adam,
>
> thanks for your input. I've now trimmed the test and converted it back to
> using actual boost 2d point instances. The build time differences remain the
> same. See pdf's and code below for details (the
> "SpatialIndexTrimmedUtilities.h" include is solely for the time and memory
> measurements done in the brief test code).
>
> One thing worth noticing  and this is entirely my bad  the search times
> are improved compared to earlier. I (wrongfully) assumed that a run time of
> 510 us was fine for a single search test, as my timings were highly
> reproducible. However, there seems to be some initial cost of doing the
> first search, as the search time is much lower when looking at an average
> across 100 searches (which I do now). The difference between bulk/nonbulk
> however, remains subtle.
>
> As the building times are my main concern, I will try to add some jitter to
> my uniform points and redo the test. (But, the problem is that my actual
> application is not that far from a fairly regular grid.)
>
Yes, in your specific case the linear algorithm may be the best.
But to be honest the Rtree probably isn't the best spatial index type
in your case. If your points are generated uniformly (more or less) e.g.
one point per cell of some grid then you may predict exactly which
points must be checked. Maybe you don't need a spatial index at all?
Also you may create the spatial index using the info about all points
(you musn't insert or remove them later). I'd choose a different
datastructure in this case. However I understand that you don't want to
reinvent the wheel and just use some existing library.
You could consider implementing some simple spatial index. Which one?
This strongly depends on your needs.
If points are distributed across some grid and fill all cells, you might
store them in a Regular Grid, e.g. using vector as a storage (or vector
of vectors if you'd like to store more than one point in each cell) and
directly extract points from the area of interest.
If there are some empty areas where there are no points (and you'd like
to save some space) you might store them in a vector and sort them
according to the index of a cell on a Space Filling Curve (e.g.
http://en.wikipedia.org/wiki/Zorder_curve) which maps the nd
coordinates to 1d, then just find your points in this sorted order.
You could also go for KDTree classic variant in which points are stored
as nodes partitioning the space. This binary tree may be stored in e.g.
a vector where the node's children are stored at indexes 2*n and 2*n+1.
Since your points are uniformly distributed I'd go for a simplest
splitting algorithm using object median (std::nth_element) splitting
objects on a half (for next dimension) for each new level of nodes. The
idea is very simple (divide and conquer) and it would be easy to implement.
The choose also depends on what you need to get as an output. E.g. knn
queries are very simple to implement using KDtree.
Regards,
Adam
Geometry list run by mateusz at loskot.net